Monday, January 14, 2013

Who's Connecting to My Servers?

I have been reading through the book Clojure Programming. While reading it, I've looked for opportunities to apply Clojure at work. Most of the time, I've used Clojure to implement convenience scripts that help with my day job. I typically use Clojure when I have to work with Java-based platforms and F# for .NET-based platforms. Occasionally, I develop scripts that do not have any platform dependency, so I can choose any language. What typically happens is that I choose the programming language I used last. This strategy, unfortunately, ends up biasing me toward one programming language, and lately it has been biasing me toward Clojure. After noticing this trend, I have decided to deliberately choose the less frequently used language so I don't become completely rusty in the other programming languages.

Recently, I had the opportunity to write a small script. I was managing an infrastructure upgrade and needed to know the downstream impact. The infrastructure component had a lot of persistent inbound connections, but unfortunately, the inbound connections were neither monitored nor documented. One way to check the connections is to ask the network engineers to set up monitoring on the servers and collect information on the incoming connections. Our network engineers are generally pretty busy and we hate to add to their existing workloads. However, we can effectively do the same thing by running netstat -an on each of the target servers, dumping the output to a file, and parsing it for incoming connections. We would do this over a period of time to try to capture most of the client connections.

The following Clojure script loads all the netstat dump output files and generates a list of all the hosts that are connected to the target servers:

(import '(java.net InetAddress))
(use '[clojure.string :only (join)])
(use '[clojure.java.io :as io])

; Load all the data from all *.data files in c:\work\servers folder
(def data (->> (io/file "c:\\work\\servers")
   (file-seq)
   (map #(.getAbsolutePath %))
   (filter #(re-matches #".*\.data$" %))
   (map #(slurp %))
   (join " ")))

; Find all ip addresses in the netstat dump
; Discard duplicates, perform hostname lookup, sort the hostnames
(def hosts (->> data
   (re-seq #"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.\d+")
   (map second)
   (distinct)
   (map #(.getCanonicalHostName (InetAddress/getByName %)))
   (sort)
   (join "\n")))

; Dump output to results.out file
(spit "c:\\work\\servers\\results.out" hosts)

The above script assumes that all data fits into memory. If that becomes a problem, it is fairly straightforward to read and process the netstat dumps one file at a time and combine the results before writing the output.
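As a rough sketch of that file-at-a-time approach (not the script I actually ran), the addresses can be accumulated into a set while only one dump is held in memory at once. The names ip-port-pattern, ips-in-text and ips-in-folder are made up for this illustration; the regular expression is the same one used in the script.

```clojure
(require '[clojure.java.io :as io])

;; Same pattern as the script: an IPv4 address followed by ".port".
(def ip-port-pattern #"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.\d+")

;; Hypothetical helper: the set of IP addresses found in one netstat dump.
(defn ips-in-text [text]
  (set (map second (re-seq ip-port-pattern text))))

;; Hypothetical helper: read the *.data files under folder one at a time
;; and merge the addresses of each file into a single set.
(defn ips-in-folder [folder]
  (->> (file-seq (io/file folder))
       (filter #(and (.isFile %) (re-matches #".*\.data$" (.getName %))))
       (reduce (fn [ips file] (into ips (ips-in-text (slurp file)))) #{})))
```

The hostname lookup, sort and join steps would then run over the set returned by (ips-in-folder "c:\\work\\servers") instead of over one large concatenated string.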

The F# version is similar to the Clojure version. Grabbing the files from the folder is easier, but the need to explicitly handle exceptions adds back enough lines to put it about on par with the Clojure version in verbosity.

open System.IO
open System.Net
open System.Text.RegularExpressions

// Load all the data from all *.data files in c:\work\servers folder
let data = Directory.GetFiles(@"c:\work\servers","*.data")
           |> Seq.map File.ReadAllText
           |> String.concat " "

// Return hostname if it can be resolved
// otherwise return the ip address
let getHostEntry (ipaddress:string) =
   try
      Dns.GetHostEntry(ipaddress).HostName
   with
      | err -> ipaddress

// Find all ip addresses in the netstat dump
// Perform hostname lookup, discard duplicates, sort the hostnames
let hosts = Regex.Matches(data,@"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.\d+")
            |> Seq.cast<Match>
            |> Seq.map (fun m -> m.Groups.[1].Value)
            |> Set.ofSeq
            |> Seq.map getHostEntry
            |> Seq.sort
            |> String.concat "\n"

