Showing posts with label fsharp. Show all posts

Monday, June 24, 2013

Riak CAP Tuning and F#

Riak provides the ability to tune CAP. CAP stands for Consistency, Availability, and Partition tolerance, which do not sound like tunable controls. The terms seem to evoke binary choices: either you have the property or you don't. The CAP terms are also ambiguously defined. I'm not the only one who feels that way, as can be seen in Daniel Abadi's blog post. I found it more helpful to think of the tradeoffs as consistency latency (time needed to achieve eventual consistency), performance (read/write latency), and node failure tolerance (how many nodes can fail and still leave a working cluster).

Riak exposes its CAP tuning controls via the named variables N, R, and W. These variables are defined as follows:

N
Number of nodes to which a piece of data is replicated
R
Number of nodes that must respond to a read for it to be considered successful (read failure tolerance)
W
Number of nodes that must respond to a write for it to be considered complete (write fault tolerance)

In addition, Riak exposes these additional tuning controls:

PR
Number of primary, non-fallback nodes that must return results for a successful read
PW
Number of primary, non-fallback nodes that must accept a write
DW
Number of nodes which have received an acknowledgement of the write from the storage backend
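
One way to see how N, R, and W trade off against each other is to compute what a given combination implies. The sketch below is a hypothetical helper (the CapSettings type and the function names are illustrative, not part of Riak or CorrugatedIron). It assumes the usual quorum reasoning: a read set is only certain to share a replica with the latest write set when R + W > N, reads keep working with up to N - R replicas down, and writes with up to N - W replicas down. It ignores fallback nodes, so treat the numbers as a rough guide rather than a statement of what Riak guarantees.

```fsharp
// Hypothetical helper for reasoning about N/R/W; not part of Riak or CorrugatedIron.
type CapSettings = { N: int; R: int; W: int }

// Majority quorum for a given replica count, e.g. 2 when N = 3
let quorum n = n / 2 + 1

// True when every read set must share at least one replica with every write set
let readsOverlapWrites s = s.R + s.W > s.N

// How many replicas can be unavailable before reads (or writes) start failing
let readFailureTolerance s = s.N - s.R
let writeFailureTolerance s = s.N - s.W

// Values matching the bucket example in this post: N = 3, W = 2, R = 1
let animals = { N = 3; R = 1; W = 2 }

printfn "overlap: %b, reads tolerate %d down, writes tolerate %d down"
    (readsOverlapWrites animals)
    (readFailureTolerance animals)
    (writeFailureTolerance animals)
```

With these values R + W equals N, so a read may land on a replica that has not yet seen the latest write. Raising R to quorum 3 (which is 2) makes the read and write sets overlap, at the cost of tolerating one fewer failed replica on reads.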

Bucket Level CAP Controls in Riak

Here's an example of how to set bucket-level CAP settings in Riak with CorrugatedIron:


// Get existing bucket properties
let properties = ciClient.GetBucketProperties("animals",true).Value

// Set # of nodes a write must ultimately replicate to
// This should be set at the creation of the bucket
properties.SetNVal(3u)

// Set # of nodes that must acknowledge a write before a successful write response
properties.SetWVal(2u)

// Set # of nodes required to read a value successfully
properties.SetRVal(1u)

// Set primary read value
properties.SetPrVal(1u)

// Set primary write value
properties.SetPwVal(1u)

// Set durable write value
properties.SetDwVal(1u)

// Change bucket properties with these new CAP control values
ciClient.SetBucketProperties("animals",properties)

Per Request CAP Controls in Riak

Riak allows you to tune CAP controls at the per-request level:

// Setting W & DW on puts
let options = new RiakPutOptions()
options.SetW(3u).SetDw(1u)
let data = new RiakObject("animals","toto",{nickname="Toto"; breed="Cairn Terrier"; score=5})
ciClient.Put(data,options)

// Get item with R value set to 1 (the third argument)
ciClient.Get("animals","toto",1u).Value.GetObject<Animal>()

// Specify quorum
let getOptions = new RiakGetOptions()
getOptions.SetR("quorum")

// Need to convert IRiakClient to RiakClient in order to set RiakGetOptions
let client = ciClient :?> RiakClient
client.Get("animals","toto",getOptions).Value.GetObject<Animal>()

Sunday, May 12, 2013

Riak Links and Link Walking with F# and CorrugatedIron

Continuing my journey through the book Seven Databases in Seven Weeks, I explore links and link walking in this blog post. Riak can establish one-way relationships between entries via Links, providing some of the capabilities of a graph database (the Riak documentation calls it a lightweight graph database). The Riak documentation hints that the number of links per object should be kept low, on the order of dozens, not thousands. Using the Twitter example from the Riak Handbook, you can probably easily find out whom Don Syme follows, whom those people follow, and so on. But it would be difficult to find all the people who follow Don Syme using Riak's link capability.


Adding Links

Here's how you would add links with the CorrugatedIron library:

type Cage = { room : int }

(*
Linking cage 1 to polly via contains, equivalent to doing the following
curl -X PUT http://localhost:8098/riak/cages/1 \
-H "Content-Type: application/json" \
-H "Link: </riak/animals/polly>; riaktag=\"contains\"" \
-d '{"room" : 101}'
*)
client.Put(
  let cage = new RiakObject("cages","1",{room=101})
  cage.LinkTo("animals","polly","contains")
  cage)

(*
Putting ace in cage 2 and setting cage 2 next to cage 1.
Equivalent to the following sample code:
curl -X PUT http://localhost:8091/riak/cages/2 \
-H "Content-Type: application/json" \
-H "Link:</riak/animals/ace>;riaktag=\"contains\",</riak/cages/1>;riaktag=\"next_to\"" \
-d '{"room" : 101}'
*)
// Adding more than one link  
client.Put(
  let cage = new RiakObject("cages","2",{room=101})
  [("animals","ace","contains");("cages","1","next_to")]
  |> List.iter(fun (bucket,key,tag) -> cage.LinkTo(bucket,key,tag))
  cage)  


Link Walking

With CorrugatedIron, I can use the API to perform link walking and, as an extra bonus, I don't have to worry about extracting data from the multipart/mixed MIME types. The CorrugatedIron library takes care of all of that for us.

(*
Link walking, equivalent to 
curl http://riakhost1:8098/riak/cages/1/_,_,_
or in the new version:
curl http://riakhost1:8098/buckets/cages/keys/1/_,_,_
*)
 
let results = client.WalkLinks(new RiakObject("cages","1"),
                               [|new RiakLink(null,null,null)|])
         
// Dump results, which returns a list of RiakObject(s)
results.Value

// Since in this particular case, we have only one result, we can do the following
// and get back polly
results.Value.[0].GetObject<Animal>()

Here is the output result for results.Value:

// results.Value
val it : IList<RiakObject> =
  seq
    [CorrugatedIron.Models.RiakObject
       {BinIndexes = dict [];
        Bucket = "animals";
        CharSet = null;
        ContentEncoding = null;
        ContentType = "application/json";
        HasChanged = false;
        IntIndexes = dict [];
        Key = "polly";
        LastModified = 1359758822u;
        LastModifiedUsec = 523420u;
        Links = seq [];
        Siblings = seq [];
        UserMetaData = dict [];
        VTag = "2BTveSKTYDNOZNCiOxyryw";
        VTags = seq ["2BTveSKTYDNOZNCiOxyryw"];
        Value = [|123uy; 32uy; 34uy; 110uy; 105uy; 99uy; 107uy; 110uy; 97uy;
                  109uy; 101uy; 34uy; 32uy; 58uy; 32uy; 34uy; 83uy; 119uy;
                  101uy; 101uy; 116uy; 32uy; 80uy; 111uy; 108uy; 108uy; 121uy;
                  32uy; 80uy; 117uy; 114uy; 101uy; 98uy; 114uy; 101uy; 100uy;
                  34uy; 32uy; 44uy; 32uy; 34uy; 98uy; 114uy; 101uy; 101uy;
                  100uy; 34uy; 32uy; 58uy; 32uy; 34uy; 80uy; 117uy; 114uy;
                  101uy; 98uy; 114uy; 101uy; 100uy; 34uy; 32uy; 125uy|];
        VectorClock = [|107uy; 206uy; 97uy; 96uy; 96uy; 96uy; 204uy; 96uy;
                        202uy; 5uy; 82uy; 28uy; 169uy; 111uy; 239uy; 241uy;
                        7uy; 114uy; 230uy; 184uy; 103uy; 48uy; 37uy; 50uy;
                        230uy; 177uy; 50uy; 60uy; 59uy; 216uy; 112uy; 138uy;
                        47uy; 11uy; 0uy|];}]
      
// results.Value.[0].GetObject<Animal>()
val it : Animal = {nickname = "Sweet Polly Purebred";
                   breed = "Purebred";}     

There seem to be some limitations with link walking using the CorrugatedIron library. There is a WalkLinks() method as part of RiakClient, but its input is a RiakLink object, which takes bucket, key, and tag as arguments, whereas the link spec expects bucket, tag, and keep flag. I notice that, according to the Riak documentation, link walking is not available as part of the Protocol Buffers Client (PBC) API, so I'm guessing the WalkLinks() method in CorrugatedIron either uses the HTTP protocol or a modified form of MapReduce. Since link walking is a special case of MapReduce querying, it may not matter much that CorrugatedIron has some limitations on link walking. One other issue with CorrugatedIron and link walking is that when I add multiple links with the same tag and then walk the links with CorrugatedIron, I do not get back all the links; I only get one of them. In order to follow the examples in the book Seven Databases in Seven Weeks, I can fall back to using the ASP.NET MVC REST API.

Link walking using the CorrugatedIron library:

// Link walking, equivalent to 
// curl http://localhost:8098/riak/cages/2/animals,_,_
client.WalkLinks(new RiakObject("cages","2"),
                 [|new RiakLink("animals",null,null)|])
     
// curl http://localhost:8098/riak/cages/2/_,next_to,0/animals,_,_
client.WalkLinks(new RiakObject("cages","2"),
                 [|new RiakLink(null,null,"next_to");
                   new RiakLink("animals",null,null);|])

// I can't seem to specify the keep flag with CorrugatedIron, which keeps
// intermediate results as you walk beyond primary links
// curl http://localhost:8091/riak/cages/2/_,next_to,1/_,_,_     

The book Seven Databases in Seven Weeks is still using the old format for HTTP Link Walking. The new format is as follows:

GET /riak/bucket/key/[bucket],[tag],[keep]            # Old format
GET /buckets/bucket/keys/key/[bucket],[tag],[keep]    # New format

Link walking using REST API:


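// HttpClient lives in System.Net.Http (reference the assembly with #r
// if your script host does not load it by default)
open System.Net.Http

let riakurl = "http://myriakhost1:8098"
let restClient = new HttpClient()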

type LinkWalkSpec = 
    { bucket: string; tag: string;  keep: string; }

    member x.Link = (sprintf "%s,%s,%s" x.bucket x.tag x.keep)

let linkWalker url bucket key (links:LinkWalkSpec list) =
    let baseurl = sprintf "%s/buckets/%s/keys/%s" url bucket key
    let rec buildLinkWalkUrl (linklist:LinkWalkSpec list) baseUrl =
      match linklist with
      | [] -> baseUrl
      | [x] -> sprintf "%s/%s" baseUrl (x.Link)
      | h::t -> let newUrl = sprintf "%s/%s" baseUrl (h.Link)
                buildLinkWalkUrl t newUrl

    buildLinkWalkUrl links baseurl
    |> restClient.GetStringAsync

// Equiv to : curl http://localhost:8091/riak/cages/2/_,next_to,1/_,_,_  
   
[{bucket="_";tag="next_to";keep="1"};{bucket="_";tag="_";keep="_"}] 
|> linkWalker riakurl "cages" "2" 

Link walking results from using REST API:

val it : Task<string> =
  System.Threading.Tasks.Task`1[System.String]
    {AsyncState = null;
     CreationOptions = None;
     Exception = null;
     Id = 1;
     IsCanceled = false;
     IsCompleted = false;
     IsFaulted = false;
     Result = "
--CveXyss6PAqBxOOWxeWBCf6eXii
Content-Type: multipart/mixed; boundary=AvYXKrJYDlkeNxh1bQyqDvBAuBF

--AvYXKrJYDlkeNxh1bQyqDvBAuBF
X-Riak-Vclock: a85hYGBgzGDKBVIcqW/v8Qdy5rhnMCUy5rEy9Oi+P8WXBQA=
Location: /buckets/cages/keys/1
Content-Type: application/json
Link: </buckets/animals/keys/polly>; riaktag="contains", </buckets/cages>; rel="up"
Etag: 6gyXgkIgzvwBGRRotqHK3b
Last-Modified: Fri, 26 Apr 2013 16:55:40 GMT

{"room":101}
--AvYXKrJYDlkeNxh1bQyqDvBAuBF--

--CveXyss6PAqBxOOWxeWBCf6eXii
Content-Type: multipart/mixed; boundary=SaWqmmho48dzhMDmJy3BVcCWrzu

--SaWqmmho48dzhMDmJy3BVcCWrzu
X-Riak-Vclock: a85hYGBgzGDKBVIcqW/v8Qdy5rhnMCUy5rEyPDvYcIovCwA=
Location: /buckets/animals/keys/polly
Content-Type: application/json; charset=utf-8
Link: </buckets/animals>; rel="up"
Etag: 2BTveSKTYDNOZNCiOxyryw
Last-Modified: Fri, 01 Feb 2013 22:47:02 GMT

{ "nickname" : "Sweet Polly Purebred" , "breed" : "Purebred" }
--SaWqmmho48dzhMDmJy3BVcCWrzu--

--CveXyss6PAqBxOOWxeWBCf6eXii--
";
     Status = RanToCompletion;}      
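
If you do go through the REST API, the JSON payloads have to be pulled out of the multipart/mixed response yourself. Below is a deliberately crude sketch of that step. jsonBodies is a hypothetical helper, and it assumes each part's body is a single-line JSON object, as in the output above; a proper multipart parser would split on the boundary strings named in the Content-Type headers instead.

```fsharp
// Hypothetical helper: crude extraction of single-line JSON bodies from a
// multipart/mixed link-walking response. Assumes each body fits on one line.
let jsonBodies (multipart: string) : string list =
    multipart.Split([| '\n' |])
    |> Array.map (fun line -> line.Trim())
    |> Array.filter (fun line -> line.StartsWith("{") && line.EndsWith("}"))
    |> Array.toList

// Trimmed-down stand-in for the response shown above
let sample =
    String.concat "\n"
        [ "--outer"
          "Content-Type: multipart/mixed; boundary=inner"
          ""
          "--inner"
          "Content-Type: application/json"
          ""
          "{\"room\":101}"
          "--inner--"
          "--outer--" ]

jsonBodies sample |> List.iter (printfn "%s")   // prints {"room":101}
```

Against the real response, the same function could be applied to the Result string of the task returned by linkWalker.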

Friday, April 26, 2013

Exploring Riak with F# and CorrugatedIron

Many thanks to David and OJ for recommending the CorrugatedIron .NET Riak client library. This blog entry documents my experiments with Riak CRUD operations using CorrugatedIron.

Pinging Riak

I tested CorrugatedIron with an F# script. Here is the setup code, along with a test of the ping capability:

// Needed to load the following libraries to get F# script to work
#r @"c:\dev\FsRiak\packages\CorrugatedIron.1.3.0\lib\net40\CorrugatedIron.dll"
#r @"c:\dev\FsRiak\packages\protobuf-net.2.0.0.621\lib\net40\protobuf-net.dll"
#r @"c:\dev\FsRiak\packages\Newtonsoft.Json.4.5.11\lib\net40\Newtonsoft.Json.dll"

open CorrugatedIron
open CorrugatedIron.Models
open Newtonsoft.Json
open System

// Setup connections
let cluster = RiakCluster.FromConfig("riakConfig", @"c:\dev\FsRiak\App.config");
let client = cluster.CreateClient();

// Ping the Riak Cluster
client.Ping()

Here is my App.config file used by CorrugatedIron:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <configSections>
    <section name="riakConfig" type="CorrugatedIron.Config.RiakClusterConfiguration, CorrugatedIron" />
  </configSections>
  <riakConfig nodePollTime="5000" defaultRetryWaitTime="200" defaultRetryCount="3">
    <nodes>
      <node name="dev1"  hostAddress="mydevhost-a" pbcPort="8087" restScheme="http" restPort="8098" poolSize="10" />
      <node name="dev2" hostAddress="mydevhost-b" pbcPort="8087" restScheme="http" restPort="8098" poolSize="10" />
      <node name="dev3" hostAddress="mydevhost-c" pbcPort="8087" restScheme="http" restPort="8098" poolSize="10" />
    </nodes>
  </riakConfig>
</configuration>

Here's the result from running ping:

val it : RiakResult = CorrugatedIron.RiakResult {ErrorMessage = null;
                                                 IsSuccess = true;
                                                 ResultCode = Success;}

Get List of Buckets

The following method call gets you the list of buckets along with metadata for the call status:

client.ListBuckets()

This method returns the following RiakResult object:

val it : RiakResult<seq<string>> =
  CorrugatedIron.RiakResult`1[System.Collections.Generic.IEnumerable`1[System.String]]
    {ErrorMessage = null;
     IsSuccess = true;
     ResultCode = Success;
     Value = seq ["photos"; "favs"; "animals"; "cages"; ...];}

Get Bucket Keys

Getting a list of keys for a bucket is also simple:

client.ListKeys("animals")

For the novice, this library warns you not to do this in production environments:

*** [CI] -> ListKeys is an expensive operation and should not be used in Production scenarios. ***
val it : RiakResult<seq<string>> =
  CorrugatedIron.RiakResult`1[System.Collections.Generic.IEnumerable`1[System.String]]
    {ErrorMessage = null;
     IsSuccess = true;
     ResultCode = Success;
     Value = seq ["ace"; "polly"];}

Retrieve Content from Riak

Getting a value from Riak is pretty easy with this library:

client.Get("animals","ace")

A dump of the return object shows the actual data plus metadata about the Get operation:

val it : RiakResult<Models.RiakObject> =
  CorrugatedIron.RiakResult`1[CorrugatedIron.Models.RiakObject]
    {ErrorMessage = null;
     IsSuccess = true;
     ResultCode = Success;
     Value = CorrugatedIron.Models.RiakObject;}

A deeper dive into the Value field of RiakResult object gives the following:

val it : Models.RiakObject =
  CorrugatedIron.Models.RiakObject
    {BinIndexes = dict [];
     Bucket = "animals";
     CharSet = null;
     ContentEncoding = null;
     ContentType = "application/json";
     HasChanged = false;
     IntIndexes = dict [];
     Key = "ace";
     LastModified = 1359744019u;
     LastModifiedUsec = 788127u;
     Links = seq [];
     Siblings = seq [];
     UserMetaData = dict [];
     VTag = "7aPFusRQHlQ36ZP6G6GSyE";
     VTags = seq ["7aPFusRQHlQ36ZP6G6GSyE"];
     Value = [|123uy; 32uy; 34uy; 110uy; 105uy; 99uy; 107uy; 110uy; 97uy;
               109uy; 101uy; 34uy; 32uy; 58uy; 32uy; 34uy; 84uy; 104uy; 101uy;
               32uy; 87uy; 111uy; 110uy; 100uy; 101uy; 114uy; 32uy; 68uy;
               111uy; 103uy; 34uy; 32uy; 44uy; 32uy; 34uy; 98uy; 114uy; 101uy;
               101uy; 100uy; 34uy; 32uy; 58uy; 32uy; 34uy; 71uy; 101uy; 114uy;
               109uy; 97uy; 110uy; 32uy; 83uy; 104uy; 101uy; 112uy; 104uy;
               101uy; 114uy; 100uy; 34uy; 32uy; 125uy|];
     VectorClock = [|107uy; 206uy; 97uy; 96uy; 96uy; 96uy; 204uy; 96uy; 202uy;
                     5uy; 82uy; 28uy; 172uy; 90uy; 225uy; 175uy; 2uy; 57uy;
                     59uy; 156uy; 51uy; 152uy; 18uy; 25uy; 243uy; 88uy; 25uy;
                     132uy; 59uy; 26uy; 78uy; 241uy; 101uy; 1uy; 0uy|];}

The data I really wanted is embedded in another Value field, where it is represented as an array of bytes, which is not the final format I want. I wanted to get back the JSON representation of the data I put in. In my previous blog post, I had rolled my own JSON serializer/deserializer, but now I wanted to leverage the other libraries bundled with CorrugatedIron, namely Json.NET. To do so, I define the Animal type and use the Json.NET serializer/deserializer to convert between the objects and their corresponding JSON representation:

type Animal =
    { nickname: string; breed: string}

// Getting ace from animals bucket 
client.Get("animals","ace").Value.GetObject<Animal>() 
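
Before reaching for Json.NET, it can help to confirm what the Value byte array actually contains. The sketch below decodes the first twelve bytes from the dump above, assuming the stored value is UTF-8 text (a reasonable guess given the application/json content type, but still an assumption). decodeValue is a hypothetical helper name, not a CorrugatedIron method.

```fsharp
open System.Text

// Hypothetical helper: view a RiakObject's raw Value bytes as text (assumes UTF-8)
let decodeValue (bytes: byte[]) = Encoding.UTF8.GetString(bytes)

// First twelve bytes of the Value array shown above
let prefix = [| 123uy; 32uy; 34uy; 110uy; 105uy; 99uy; 107uy; 110uy; 97uy; 109uy; 101uy; 34uy |]

printfn "%s" (decodeValue prefix)   // prints: { "nickname"
```

Decoding the whole array the same way yields the JSON document that GetObject<Animal>() then deserializes.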

Adding Content to Riak

Adding content to Riak is also pretty easy once you define a type whose fields Json.NET can serialize:

new RiakObject("animals","delta",{nickname="Snoopy"; breed="Beagle"}) 
|> client.Put

If you dump the RiakResult object and the Value field of RiakResult you get the following:


val it : RiakResult =
  CorrugatedIron.RiakResult`1[CorrugatedIron.Models.RiakObject]
    {ErrorMessage = null;
     IsSuccess = true;
     ResultCode = Success;
     Value = CorrugatedIron.Models.RiakObject;}
  
val it : RiakObject =
  CorrugatedIron.Models.RiakObject
    {BinIndexes = dict [];
     Bucket = "animals";
     CharSet = null;
     ContentEncoding = null;
     ContentType = "application/json";
     HasChanged = false;
     IntIndexes = dict [];
     Key = "delta";
     LastModified = 1366930771u;
     LastModifiedUsec = 271921u;
     Links = seq [];
     Siblings = seq [];
     UserMetaData = dict [];
     VTag = "OvZlH7bsYKdO8zL76QdDY";
     VTags = seq ["OvZlH7bsYKdO8zL76QdDY"];
     Value = [|123uy; 34uy; 110uy; 105uy; 99uy; 107uy; 110uy; 97uy; 109uy;
               101uy; 34uy; 58uy; 34uy; 83uy; 110uy; 111uy; 111uy; 112uy;
               121uy; 34uy; 44uy; 34uy; 98uy; 114uy; 101uy; 101uy; 100uy; 34uy;
               58uy; 34uy; 66uy; 101uy; 97uy; 103uy; 108uy; 101uy; 34uy; 125uy|];
     VectorClock = [|107uy; 206uy; 97uy; 96uy; 96uy; 96uy; 204uy; 96uy; 202uy;
                     5uy; 82uy; 28uy; 169uy; 111uy; 239uy; 241uy; 7uy; 114uy;
                     230uy; 184uy; 103uy; 48uy; 37uy; 50uy; 230uy; 177uy; 50uy;
                     4uy; 27uy; 190uy; 59uy; 197uy; 151uy; 5uy; 0uy|];}

To verify interoperability, I checked the newly added data with curl:

$ curl -X GET http://192.168.56.1:8098/riak/animals/delta
{"nickname":"Snoopy","breed":"Beagle"}

Delete Riak Contents

Delete content by calling the intuitively named Delete method:

client.Delete("animals","delta")

One thing that I wasn't sure about is how CorrugatedIron talks to the Riak cluster. In my simple REST API example, I knew exactly which host I was talking to since I explicitly specified the URL. For CorrugatedIron, I configure a pool of connections in the App.config file, so I wasn't sure which of the nodes CorrugatedIron was talking to. I fired up Wireshark and noticed that I was connected to the Riak cluster via port 8087, which is the Protocol Buffers Client (PBC) interface, and that explains the need to load the PBC libraries. The Protocol Buffers client was something new to me, and the bundled protobuf-net library seemed to be code developed by Google (many thanks to OJ for correcting me on this: it is NOT a Google library but a library written by Marc Gravell). This, in turn, led me to look at Google Protocol Buffers. This little side jaunt into the CorrugatedIron library led to the discovery (for me) of a whole new set of network communication protocols. In any case, after checking the Wireshark output, it seems that the requests are spread across the different nodes and not restricted to a single node. I'm guessing that there is some built-in load-balancing code in CorrugatedIron that sends my requests to different nodes in the Riak cluster.

For the basic CRUD operations on Riak, CorrugatedIron has made it easier to work with Riak than having to come up with my own helper functions. This has been a good start, and I hope to work through more of the Riak examples from the book Seven Databases in Seven Weeks with CorrugatedIron in future blog posts.

Tuesday, February 19, 2013

Exploring VMware vSphere PowerCLI with F#

I was recently asked to vet some VMware cluster capacity numbers. It seemed like a task that might be repeated in the future, and I really hate manually transcribing data and comparing it, so I needed a script to automate the job. Fortunately, VMware has vSphere PowerCLI for this. Unfortunately, the documentation for it is rather sparse. I also looked at the book VMware vSphere PowerCLI Reference: Automating vSphere Administration, but I really did not want to script in PowerShell, simply because I'm not that familiar with it. However, PowerCLI is also accessible from .NET, so I can write my script in F#.

The following lines of script simply drill down from the Datacenter object to the VirtualMachine object. Once you grab all the objects, you can explore the properties of each object.

#r @"C:\pkg\VMware\Infrastructure\vSphere PowerCLI\VMware.Vim.dll"

open System
open VMware.Vim
open System.Collections.Specialized

let serviceUrl = "https://myVSphereHost/sdk"
let userId = "someUserId"
let password="somePassword"

let client = new VimClient()
let service = client.Connect(serviceUrl)

// Must login to do anything - if you are getting null values, it means the session automatically timed out
client.Login(userId,password)

// Let us get all the datacenters
let dataCenters = client.FindEntityViews(typeof<Datacenter>,null,null,null)

// Drill down into the first datacenter
let dc = dataCenters |> Seq.cast<Datacenter> |> Seq.head

// Get a cluster
let cluster = client.GetView(dc.Parent,null) :?> ClusterComputeResource

// Get the first host in the cluster
let host = client.GetView(cluster.Host |> Seq.head,null) :?> HostSystem

// Get the first VM on the physical host
let vm = client.GetView(host.Vm |> Seq.head,null) :?> VirtualMachine

Let's do something interesting with PowerCLI. In the following script, I wanted to grab the capacity information at the VMware cluster level and the combined allocation/utilization info of all the virtual machines hosted on the VMware cluster.


// Utility function to help us get vSphere Entities
let getEntityViews viewType searchParams =
    match searchParams with
    | Some(searchParams0) ->
        let filters = new NameValueCollection()
        searchParams0 |> Seq.iter (fun (k,v) -> filters.Add(k,v))
        client.FindEntityViews(viewType,null,filters,null) 

    | None ->
        client.FindEntityViews(viewType,null,null,null) 

// Get vSphere Entities with specific properties (reduces returned data)
// Don't know how to define a function with multiple arities in F# - clumsy workaround
let getEntityViews2 viewType searchParams props =
    match searchParams with
    | Some(searchParams0) ->
        let filters = new NameValueCollection()
        searchParams0 |> Seq.iter (fun (k,v) -> filters.Add(k,v))
        client.FindEntityViews(viewType,null,filters,props) 
    | None ->
        client.FindEntityViews(viewType,null,null,props) 


// Get Cluster usage summary!
let getUsageSummary clusterName =

    let clusterProps = [|"Summary"; "Host"|]
    let hostProps = [|"Vm";"Name";"Hardware";"Runtime"|]
    let vmProps = [|"Name";"Config";"Runtime";"Summary";"ResourceConfig"|]

    let toMB memoryInBytes = memoryInBytes / (1024L*1024L)

    // Get cluster - Expect only one result
    printfn "Getting cluster data..."
    let cluster = 
        let filters = Some([("name",clusterName)])
        getEntityViews2 typeof<ClusterComputeResource> filters clusterProps
            |> Seq.cast<ClusterComputeResource>
            |> Seq.head


    // Get all Hosts for this cluster
    printfn "Getting host list..."
    let hostList =
        cluster.Host
        |> Seq.map(fun moRef -> client.GetView(moRef,hostProps))
        |> Seq.cast<HostSystem>
        |> Seq.cache


    printfn "Getting VM list..."
    let vmList =
        hostList
        |> Seq.map (fun host -> host.Vm)
        |> Seq.concat
        |> Seq.map (fun moRef -> client.GetView(moRef,vmProps))
        |> Seq.cast<VirtualMachine>
        |> Seq.cache

    let clusterCores = cluster.Summary.NumCpuCores
    let clusterCPU = cluster.Summary.TotalCpu
    let clusterMemory = cluster.Summary.TotalMemory


    // Utility function to get summation results from selected fields
    let inline total (extractor:VirtualMachine -> Nullable<'b>) =
        vmList
        |> Seq.map extractor
        |> Seq.map (fun x -> x.GetValueOrDefault())
        |> Seq.sum


    printfn "Getting Cluster Summary Info..."
    let cpuUsed        = total (fun vm -> vm.Summary.Config.NumCpu)
    let memoryUsed     = total (fun vm -> vm.Summary.Config.MemorySizeMB)
    let cpuReserved    = total (fun vm -> vm.Summary.Config.CpuReservation)
    let memoryReserved = total (fun vm -> vm.Summary.Config.MemoryReservation)
    let maxMemoryUsage = total (fun vm -> vm.Runtime.MaxMemoryUsage)
    let maxCpuUsage    = total (fun vm -> vm.Runtime.MaxCpuUsage)

    let cpuReservation    = total (fun vm -> vm.ResourceConfig.CpuAllocation.Reservation) 
    let cpuLimit          = total (fun vm -> vm.ResourceConfig.CpuAllocation.Limit) 
    let memoryReservation = total (fun vm -> vm.ResourceConfig.MemoryAllocation.Reservation)
    let memoryLimit       = total (fun vm -> vm.ResourceConfig.MemoryAllocation.Limit) 

    
    printfn "Cluster Name               : %s" clusterName
    printfn "Number of Hosts in Cluster : %i" (Seq.length hostList)
    printfn "Number of VMs in Cluster   : %i" (Seq.length vmList)
    printfn "Cluster Total Cores        : %i" clusterCores
    printfn "Cluster Total CPU (MHz)    : %i" clusterCPU
    printfn "Cluster Total Memory       : %i" clusterMemory
    printfn "Cluster Total Memory (MB)  : %i" (toMB clusterMemory)
    printfn "CPU Used by VMs            : %i" cpuUsed
    printfn "Memory Used by VMs         : %i" memoryUsed
    printfn "CPU Reserved by VMs        : %i" cpuReserved
    printfn "Memory Reserved by VMs     : %i" memoryReserved
    printfn "Max Memory Usage by VMs    : %i" maxMemoryUsage
    printfn "Max CPU Usage by VMs (MHz) : %i" maxCpuUsage
    printfn "Total Allocated CPU Reservations    : %i" cpuReservation
    printfn "Total Allocated CPU Limits          : %i" cpuLimit
    printfn "Total Allocated Memory Reservations : %i" memoryReservation
    printfn "Total Allocated Memory Limits       : %i" memoryLimit
    printfn "Memory Used / Cluster Total Memory  : %f" (double(memoryUsed) / double(toMB clusterMemory))
    printfn "vCPU Allocated / Cluster Total Cores : %f" (float(cpuUsed) / float(clusterCores))    
    printfn "Done!"

// Invoke getUsageSummary to get the summary info on my cluster
getUsageSummary "MyTestCluster"

Sample results:

Number of Hosts in Cluster : 5
Number of VMs in Cluster   : 10
Cluster Total Cores        : 120
Cluster Total CPU (MHz)    : 276000
Cluster Total Memory       : 1374369136640
Cluster Total Memory (MB)  : 1310700
CPU Used by VMs            : 38
Memory Used by VMs         : 253952
CPU Reserved by VMs        : 0
Memory Reserved by VMs     : 0
Max Memory Usage by VMs    : 253952
Max CPU Usage by VMs (MHz) : 87400
Total Allocated CPU Reservations    : 0
Total Allocated CPU Limits          : -10
Total Allocated Memory Reservations : 0
Total Allocated Memory Limits       : -10
Memory Used / Cluster Total Memory  : 0.193753
vCPU Allocated / Cluster Total Cores : 0.316667
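
On the comment in the script about functions with multiple arities: let-bound functions in F# cannot be overloaded, but methods can take optional arguments, which would let getEntityViews and getEntityViews2 collapse into one entry point. The sketch below shows only the shape of that approach. The Finder type and its string result are stand-ins invented for illustration, and nothing here calls VMware.Vim.

```fsharp
// Hypothetical stand-in for the two getEntityViews helpers; it only reports
// what it was asked for instead of calling client.FindEntityViews.
type Finder =
    static member Find(viewType: string, ?filters: (string * string) list, ?props: string[]) =
        // defaultArg supplies a fallback when an optional argument is omitted
        let filters = defaultArg filters []
        let props = defaultArg props [||]
        sprintf "%s filters=%d props=%d" viewType (List.length filters) (Array.length props)

printfn "%s" (Finder.Find("VirtualMachine"))
printfn "%s" (Finder.Find("ClusterComputeResource", filters = [ ("name", "MyTestCluster") ]))
printfn "%s" (Finder.Find("HostSystem", props = [| "Name"; "Vm" |]))
```

In the real script, the body of Find would build the NameValueCollection from filters and pass props through, so callers could name only the arguments they care about.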

I wish VMware had PowerCLI class documentation similar to that for the .NET library hosted on MSDN. If VMware does have such documentation, I can't seem to find it. The lack of documentation has forced me to explore PowerCLI interactively with F#. Thankfully, I can do this in F#; I shudder at the thought of exploring PowerCLI in C#.

Monday, January 14, 2013

Who's Connecting to My Servers?

I have been reading through the book Clojure Programming. In the course of reading this book, I've looked for all sorts of opportunities to apply Clojure at work. Most of the time, I've used Clojure to implement convenience scripts to help with my day job. I would typically use Clojure whenever I have to work with Java-based platforms and F# for .NET-based platforms. Occasionally, I would develop scripts that do not have any dependencies, so I could choose any language for the implementation. What would typically happen is that I would choose the programming language that I used last. This strategy, unfortunately, would end up biasing me toward one programming language, and lately it has been biasing me toward Clojure. After noticing this trend, I have decided to deliberately choose the less frequently used language so I don't become completely rusty in the other programming languages.

Recently, I had the opportunity to write a small script. I was managing an infrastructure upgrade and needed to know the downstream impact. It was an infrastructure component that had a lot of persistent inbound connections, but unfortunately, the inbound connections were neither monitored nor documented. One way to check the connections is to ask the network engineers to set up monitoring on the servers and collect information on the incoming connections. Our network engineers are generally pretty busy and we hate to add to their existing workloads. However, we can effectively do the same thing by running netstat -an on each of the target servers and parsing the output dump for incoming connections. We would do this over a period of time to try to capture most of the client connections.

The following Clojure script loads all the netstat dump output files and generates a list of all the hosts that are connected to the target servers:

(import '(java.net InetAddress))
(use '[clojure.string :only (join)])
(require '[clojure.java.io :as io])

; Load all the data from all *.data files in c:\work\servers folder
(def data (->> "c:\\work\\servers"
   (io/file)
   (file-seq)
   (map #(.getAbsolutePath %))
   (filter #(re-matches #".*\.data$" %))
   (map #(slurp %))
   (join " ")))

; Find all ip addresses in the netstat dump
; Perform hostname lookup, discard duplicates, sort the hostnames   
(def hosts (->> data
   (re-seq #"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.\d+")
   (map #(second %))
   (set)
   (map #(.getCanonicalHostName (InetAddress/getByName %)))
   (sort)
   (join "\n")))

; Dump output to results.out file
(spit "c:\\work\\servers\\results.out" hosts)

The above script runs with the assumption that all data fits into memory. If that becomes a problem, it is fairly trivial to read and process the netstat dumps sequentially, one file at a time, and combine the results before writing the output.

The F# version is similar to the Clojure version. Grabbing the files from the folder is easier, but the need to explicitly handle exceptions adds back enough lines of code to put it about on par with the verbosity of the Clojure version.

open System.IO
open System.Net
open System.Text.RegularExpressions

// Load all the data from all *.data files in c:\work\servers folder
let data = Directory.GetFiles(@"c:\work\servers","*.data")
           |> Seq.map File.ReadAllText
           |> String.concat " "

// Return hostname if it can be resolved
// otherwise return the ip address
let getHostEntry (ipaddress:string) =
    try
        Dns.GetHostEntry(ipaddress).HostName
    with
      | err -> ipaddress

// Find all ip addresses in the netstat dump
// Perform hostname lookup, discard duplicates, sort the hostnames
let hosts = Regex.Matches(data,@"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.\d+")
            |> Seq.cast<Match>
            |> Seq.map (fun m -> m.Groups.[1].Value)
            |> Set.ofSeq
            |> Seq.map getHostEntry
            |> Seq.sort
            |> String.concat "\n"

File.WriteAllText(@"c:\work\servers\results.out",hosts)
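
The earlier note about data not fitting in memory can be sketched in F# as follows: extract the addresses from each dump separately and union the per-file sets, so only one file's text needs to be held at a time. extractIps and combineIps are hypothetical names introduced for this sketch; the regular expression is the one used in both scripts above, and the sample strings stand in for real netstat dumps.

```fsharp
open System.Text.RegularExpressions

// Same pattern as the scripts above: an IPv4 address followed by ".port"
let ipPattern = Regex(@"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.\d+")

// Distinct IP addresses found in one dump's text
let extractIps (text: string) : Set<string> =
    ipPattern.Matches(text)
    |> Seq.cast<Match>
    |> Seq.map (fun m -> m.Groups.[1].Value)
    |> Set.ofSeq

// Union of the per-dump sets; each dump is processed on its own
let combineIps (dumps: seq<string>) : Set<string> =
    dumps |> Seq.map extractIps |> Set.unionMany

// In-memory stand-ins for two netstat dump files
let dumps =
    [ "10.0.0.5.1521   10.0.0.9.40112   ESTABLISHED"
      "10.0.0.5.1521   10.0.0.7.51000   ESTABLISHED" ]

combineIps dumps |> Set.iter (printfn "%s")
```

For real files, the input could be Directory.EnumerateFiles(@"c:\work\servers","*.data") |> Seq.map File.ReadAllText. Because Seq.map is lazy, each file is read only as the union consumes it, and the hostname lookup can then run once over the combined set.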