Monday, June 11, 2012

Quick and easy way to monitor for memory leaks in Java application servers

I was part of a postmortem team that performed a root cause analysis on why a particular node in an HA pair failed, and we needed to provide recommendations to prevent future occurrences. Initial examination of the logs showed that the failed node had run out of PermGen space. The management team also wanted to know whether the application had memory leaks.

Unfortunately, the Java application server was configured neither with GC logging nor to perform a heap dump on an out-of-memory condition. Without the GC log data or a heap dump, it is much harder to explain why the application server ran out of PermGen space when it failed. In the interest of collecting data and preventing future node failures, we recommended increasing the PermGen space and adding the following startup parameters to the Java application server:

    -XX:+PrintGCTimeStamps
    -XX:+PrintGCDetails
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:HeapDumpPath=/myapp/heapdumps
    -Xloggc:/myapp/logs/gc.log

One question that arose during the writeup of the analysis was whether the application had a memory leak. The application team could not reproduce the problem in a non-production environment but wanted some assurance that the application did not leak memory in PermGen space or anywhere else in the heap. I proposed that we track heap usage after every full GC over a period of time and examine the trend, which should be a good indicator of whether the application has a problematic memory leak.

To do so, I wrote a small Clojure script that parses just the heap and PermGen usage after each full GC out of the GC log:

; Define the regex pattern to parse each line in gc log.
; In a full GC line the order is: young gen, old gen, total heap, permgen.
(defn full-gc-pattern []
    (let [timestamp  "([\\d\\.]+): .* "
          space      "(\\d+)K->(\\d+)K\\((\\d+)K.+ "
          new-gen    space
          old-gen    space
          heap       space
          perm-gen   "(\\d+)K->(\\d+)K\\((\\d+)K\\)\\], "
          exec-stat  "(\\d+\\.\\d+).*" ]
         (re-pattern (str timestamp new-gen old-gen heap perm-gen exec-stat))))

; We only want to dump timestamp, total heap and perm gen space
; after each full GC
; Variable definitions:
;     ts - timestamp (in seconds)
;     ys - YoungGen space starting heap size (in KB)
;     ye - YoungGen space ending heap size (in KB)
;     ym - YoungGen space max heap size (in KB)
;     os - OldGen space starting heap size (in KB)
;     oe - OldGen space ending heap size (in KB)
;     om - OldGen space max heap size (in KB)
;     hs - Total heap space starting heap size (in KB)
;     he - Total heap space ending heap size (in KB)
;     hm - Total heap space max heap size (in KB)
;     ps - PermGen space starting heap size (in KB)
;     pe - PermGen space ending heap size (in KB)
;     pm - PermGen space max heap size (in KB)
(defn process-full-gc [entry]
    (let [parsed (first (re-seq (full-gc-pattern) entry))
         [a ts ys ye ym os oe om hs he hm ps pe pm & e] parsed]
        (println (format "%s,%s,%s" ts he pe))))

; Load the gc log data
(def gcdata (line-seq (clojure.java.io/reader (clojure.java.io/file "gc.log"))))
        
; Process each full GC entry after filtering out minor GC entries          
(doseq [line (keep #(re-find #".*Full GC.*" %) gcdata)]
    (process-full-gc line))

Running this script against the GC log and loading the output into a spreadsheet, I can plot the data and fit a trendline to see whether memory usage is growing. Here's the heap usage plot over approximately a month.

As seen on the chart, the slopes of the trendlines are small and negative, which is a good indication that this application does not have a memory leak problem.
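The spreadsheet's trendline is just an ordinary least-squares fit over the (timestamp, heap-after-full-GC) points the script emits. As a cross-check that doesn't need a spreadsheet, the slope can be computed directly; here's a minimal Java sketch using made-up sample data:

```java
public class TrendLine {
    // Ordinary least-squares slope of y over x. A near-zero or negative slope
    // of heap-after-full-GC over time suggests no leak; a persistent positive
    // slope suggests memory is accumulating between collections.
    static double slope(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx  += x[i];
            sy  += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    public static void main(String[] args) {
        // Made-up (timestamp seconds, heap KB after full GC) samples
        double[] ts   = {100, 2000, 40000, 90000, 200000};
        double[] heap = {52000, 51800, 52100, 51700, 51900};
        System.out.printf("slope = %.6f KB/sec%n", slope(ts, heap));
    }
}
```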

Monday, May 28, 2012

Character Encoding Troubles

In the past, I encountered an issue with a legacy application that was migrated from WebLogic Server running on an old Windows server to a newer Tomcat application server running in a Red Hat Linux environment. The application had been running fine for weeks until the developer started to complain that it was no longer saving the registered trademark ® and copyright © symbols to the database. The developer also said that when he ran the application from his Windows laptop, he was able to save these two symbols into the database.

Earlier in my career, I was a developer at a software company that built multilingual software specializing in Asian languages, so I recognized this as a character encoding issue. I asked the developer to send me the source code related to the problem; here's the relevant part:

    // Sending updates to the database
    update.setValue("data",encodeString(text));
    // Inserting the data to the database
    insert.setValue("data",encodeString(text));

That seemed odd; what does the encodeString method do? Here's the implementation:

public String encodeString(String value) throws java.io.UnsupportedEncodingException
{
  log("encodeString", "Begin");
  if (value == null)
  {
    log("encodeString", "value == null");
    return value;
  }
  
  byte [] btValue = value.getBytes();
  String encodedValue = new String(btValue, _ISO88591);
  
  /*
  Charset utf8charset = Charset.forName("UTF-8");
  Charset iso88591charset = Charset.forName("ISO-8859-1");

  ByteBuffer inputBuffer = ByteBuffer.wrap(btValue);

  // decode UTF-8
  CharBuffer data = utf8charset.decode(inputBuffer);

  // encode ISO-8559-1
  ByteBuffer outputBuffer = iso88591charset.encode(data);
  byte[] outputData = outputBuffer.array();
  byte[] inputData = inputBuffer.array();
 
  log("ISO-8859-1: ", new String(outputData));
  log("UTF-8: ", new String(inputData));
  
  //String encodedValue = new String(btValue, _ISO88591);
  String encodedValue = new String(inputData);
  String encodedValue_ISO88591 = new String(inputData, _ISO88591);
  //encodedValue.getBytes("UTF-8");
  log("Encoded UTF: ", encodedValue);
  log("Encoded ISO88591: ", encodedValue_ISO88591);
  */
  
  return encodedValue;
}

Wow! I can see that the developer was trying to get a handle on this encoding business, hence all the commented-out R&D code, but clearly he was not familiar with character set encoding issues.

The main problem is with the following line of code:

  byte [] btValue = value.getBytes();

According to the Java API documentation, the getBytes() method encodes the string into a sequence of bytes using the platform's default charset. The old Windows server this application originally ran on probably used Windows code page 1252, which is essentially ISO-8859-1, so the registered and copyright symbols were encoded correctly. On Red Hat Linux, however, the default encoding was ASCII, so those symbols were converted into question marks.
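To make the failure mode concrete, here's a small Java sketch (with a hypothetical input string) that simulates getBytes() under each platform's default charset, followed by the ISO-8859-1 decode that encodeString() performed:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Simulates encodeString(): value.getBytes() under a given "platform
    // default" charset, then decoding those bytes as ISO-8859-1.
    static String encodeString(String value, Charset platformDefault) {
        byte[] btValue = value.getBytes(platformDefault);
        return new String(btValue, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "Acme\u00AE \u00A9 2012";   // contains the (R) and (C) symbols
        // Windows cp1252 encodes these symbols at the same byte values as
        // ISO-8859-1, so the round trip is lossless:
        System.out.println(encodeString(text, Charset.forName("windows-1252")));
        // With an ASCII default charset, the unmappable symbols are replaced
        // by '?' during getBytes(), and the data is permanently corrupted:
        System.out.println(encodeString(text, StandardCharsets.US_ASCII));
    }
}
```

The second call reproduces exactly the question marks the developer saw in the database.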

Java strings are internally Unicode (UTF-16), and JDBC drivers typically handle the appropriate conversions to and from the database. The fix for this application is therefore simple: change the following two lines of code and get rid of the encodeString() method entirely:

    // Changed from : update.setValue("data",encodeString(text));
    update.setValue("data",text);
    // Changed from : insert.setValue("data",encodeString(text));
    insert.setValue("data",text);

Monday, May 14, 2012

Working with Tibco EMS Message Topics using F# and Clojure

In my previous blog post, I showed how to connect to a Tibco EMS queue from F# and Clojure to demonstrate integration interoperability between the .NET and Java platforms. Message queues are one way to implement the Request-Reply pattern, one of the many patterns described in the book Enterprise Integration Patterns. Another basic pattern is Publish-Subscribe, which can be implemented via JMS topics. This blog post shows how to connect to a JMS topic from .NET and Java using Tibco EMS as the JMS provider.
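Stripped of the JMS machinery, topic semantics come down to this: every subscriber receives every published message, whereas with a queue each message goes to exactly one consumer. A tiny in-memory Java sketch of the idea (purely illustrative, not the Tibco API):

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

public class MiniTopic {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    // Register a subscriber; it will see all messages published afterward.
    public void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    // Deliver the message to every registered subscriber.
    public void publish(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }

    public static void main(String[] args) {
        MiniTopic topic = new MiniTopic();
        topic.subscribe(m -> System.out.println(".NET client got: " + m));
        topic.subscribe(m -> System.out.println("Java client got: " + m));
        topic.publish("Barrayar");   // both subscribers receive it
    }
}
```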

Here is the F# version:

#r @"C:\tibco\ems\6.3\bin\TIBCO.EMS.dll"

open System
open TIBCO.EMS

let serverUrl = "tcp://localhost:7222"
let producer = "producer"
let consumer = "consumer"
let password = "testpwd"
let topicName = "testTopic"


let subscribeToTopic serverUrl userid password topicName messageProcessor =
    async {
        let connection = (userid,password)
                         |> (new TopicConnectionFactory(serverUrl)).CreateTopicConnection
        let session = connection.CreateTopicSession(false,Session.AUTO_ACKNOWLEDGE)
        let topic = session.CreateTopic(topicName)
        let subscriber =  session.CreateSubscriber(topic)
        connection.Start()
        printf "Subscriber connected!\n"
        while true do
            try
                subscriber.Receive() |> messageProcessor
            with _ ->  ()
        connection.Close()
    }

let publishTopicMessages serverUrl  userid password topicName messages =
    let connection = (userid,password)
                     |> (new TopicConnectionFactory(serverUrl)).CreateTopicConnection
    let session = connection.CreateTopicSession(false,Session.AUTO_ACKNOWLEDGE)
    let topic = session.CreateTopic(topicName)
    let publisher = session.CreatePublisher(topic)
    connection.Start()

    messages
    |> Seq.iter (fun item -> session.CreateTextMessage(Text=item)
                             |> publisher.Send)
                             
    connection.Close()

// Just dump message to console for now
let myMessageProcessor (msg:Message) =
    msg.ToString() |> printf "%s\n"

let consumeMessageAsync = subscribeToTopic "tcp://localhost:7222" "consumer" "testpwd"


let produceMessages topicName messages = publishTopicMessages "tcp://localhost:7222" "producer" "testpwd" topicName messages 


// Asynchronously start the topic subscriber
Async.Start(consumeMessageAsync "testTopic" myMessageProcessor)


// Publish messages to the Tibco EMS topic
[ "Aslund"; "Barrayar"; "Beta Colony"; "Cetaganda"; "Escobar"; "Komarr"; "Marilac"; "Pol"; "Sergyar"; "Vervain"]
|> produceMessages "testTopic"


printf "Done!"

One thing to point out is that Tibco, unfortunately, did not implement IDisposable on its connection objects, perhaps in a bid to stay faithful to the Java API. That design choice seems unfortunate to me because I can no longer leverage C#'s using keyword or F#'s use keyword to close connections automatically. I suppose it would be fairly trivial to subclass the QueueConnection and TopicConnection classes and add the IDisposable interface, but I feel Tibco should have done this itself and designed its .NET API around .NET idioms.
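Incidentally, the Java side has the same gap: javax.jms.Connection only became AutoCloseable in JMS 2.0, so older code resorts to the same wrapping trick I'm suggesting for the .NET side. A hypothetical Java sketch (LegacyConnection is a stand-in interface, not a real JMS type):

```java
// Stand-in for a connection type that has close() but is not AutoCloseable.
interface LegacyConnection {
    void start();
    void close();
}

public class ClosableConnection implements AutoCloseable {
    private final LegacyConnection inner;

    public ClosableConnection(LegacyConnection inner) {
        this.inner = inner;
    }

    public void start() {
        inner.start();
    }

    // try-with-resources calls this automatically, even when an exception escapes.
    @Override
    public void close() {
        inner.close();
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        LegacyConnection fake = new LegacyConnection() {
            public void start() { }
            public void close() { closed[0] = true; }
        };
        try (ClosableConnection connection = new ClosableConnection(fake)) {
            connection.start();
        }
        System.out.println("connection closed: " + closed[0]);
    }
}
```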

Putting my rants aside, here is the equivalent Clojure code to connect to Tibco Topics:

(import '(java.util Enumeration)
        '(com.tibco.tibjms TibjmsTopicConnectionFactory)
        '(javax.jms Message JMSException  Session
                    Topic TopicConnectionFactory
                    TopicConnection TopicSession
                    TopicSubscriber))
                  
(def serverUrl "tcp://localhost:7222")
(def producer "producer")
(def consumer "consumer")
(def password "testpwd")
(def topicName "testTopic")

;------------------------------------------------------------------------------
; Subscribe to Topic asynchronously
;------------------------------------------------------------------------------
(defn subscribe-topic [server-url user password topic-name process-message]
    (future
        (with-open [connection (-> (TibjmsTopicConnectionFactory. server-url)
                                   (.createTopicConnection user password))]
            (let [session (.createTopicSession connection false Session/AUTO_ACKNOWLEDGE)
                  topic (.createTopic session  topic-name)]
                (with-open [subscriber (.createSubscriber session topic)]
                    (.start connection)
                    (loop []                       
                        (process-message (.receive subscriber))
                        (recur)))))))

;------------------------------------------------------------------------------
; Publishing to a Topic
;------------------------------------------------------------------------------
(defn publish-to-topic [server-url user password topic-name messages]
    (with-open [connection (-> (TibjmsTopicConnectionFactory. server-url)
                               (.createTopicConnection user password))]
        (let [session (.createTopicSession connection false Session/AUTO_ACKNOWLEDGE)
              topic (.createTopic session  topic-name)
              publisher (.createPublisher session topic)]
            (.start connection)
            (doseq [item messages]
                (let [message (.createTextMessage session)]
                    (.setText message item)
                    (.publish publisher message))))))
                    
                      
; Create function aliases with connection information embedded                    
(defn produce-messages [topic-name messages]
    (publish-to-topic "tcp://localhost:7222" "producer" "testpwd" topic-name messages))

(defn consume-messages [topic-name message-processor]
    (subscribe-topic "tcp://localhost:7222" "consumer" "testpwd" topic-name message-processor))

; Just dump messages to console for now
(defn my-message-processor [message]
    (println (.toString message)))
    
; Start subscribing messages asynchronously
(consume-messages "testTopic" my-message-processor)                            
    
; Publish to topic
(def my-messages '("alpha" "beta" "gamma" "delta"
                   "epsilon" "zeta" "eta" "theta"
                   "iota" "kappa" "lambda" "mu" "nu"
                   "xi" "omicron" "pi" "rho"
                   "sigma" "tau" "upsilon" "phi"
                   "chi" "psi" "omega"))

(produce-messages  "testTopic"  my-messages)    

When I fire up both scripts, messages published to the topic are received by both the .NET and Java clients. With these scripts, I can easily swap out the message generators or message processors as needed for any future testing scenarios.

Monday, April 30, 2012

F#, Clojure and Message Queues on Tibco EMS

It looks like I will be getting much more hands-on with Tibco EMS. Since the Tibco EMS system in use will have connections from both the .NET and Java platforms, I wanted to write some scripts to run engineering tests against it. I decided to simulate the .NET-side connections with F# and the Java-side connections with Clojure. Starting from the sample C# code in the Tibco installation, I created the following F# script that sends messages to a queue:

#r @"C:\tibco\ems\6.3\bin\TIBCO.EMS.dll"

open System
open TIBCO.EMS

let serverUrl = "tcp://localhost:7222"
let producer = "producer"
let consumer = "consumer"
let password = "testpwd"
let queueName = "testQueue"


let getQueueTextMessages serverUrl  userid password queueName messageProcessor =
    async {
        let connection = (userid,password)
                         |> (new QueueConnectionFactory(serverUrl)).CreateQueueConnection
        let session = connection.CreateQueueSession(false,Session.AUTO_ACKNOWLEDGE)
        let queue = session.CreateQueue(queueName)
        let receiver =  session.CreateReceiver(queue)
        connection.Start()
        printf "Queue connection established!"
        while true do
            try
                receiver.Receive() |> messageProcessor
            with _ ->  ()
    }


let sendQueueTextMessages serverUrl  userid password queueName messages =
    let connection = (userid,password)
                     |> (new QueueConnectionFactory(serverUrl)).CreateQueueConnection
    let session = connection.CreateQueueSession(false,Session.AUTO_ACKNOWLEDGE)
    let queue = session.CreateQueue(queueName)
    let sender = session.CreateSender(queue)
    connection.Start()

    messages
    |> Seq.iter (fun item -> session.CreateTextMessage(Text=item)
                             |> sender.Send)
                             
    connection.Close()



// Just dump message to console for now
let myMessageProcessor (msg:Message) =
    msg.ToString() |> printf "%s\n"


let consumeMessageAsync = getQueueTextMessages "tcp://localhost:7222" "consumer" "testpwd"
let produceMessages queueName messages = sendQueueTextMessages "tcp://localhost:7222" "producer" "testpwd" queueName messages 

// Start message consumer asynchronously
Async.Start(consumeMessageAsync "testQueue" myMessageProcessor)


// Send messages to the Tibco EMS   
[ "Aslund"; "Barrayar"; "Beta Colony"; "Cetaganda"; "Escobar"; "Komarr"; "Marilac"; "Pol"; "Sergyar"; "Vervain"]
|> produceMessages "testQueue"

The queue consumer is implemented asynchronously so it won't block execution of subsequent statements. To test Tibco JMS from Java, here is the equivalent Clojure code:

(import '(java.util Enumeration)
        '(com.tibco.tibjms TibjmsQueueConnectionFactory)
        '(javax.jms Message JMSException  Session
                    Queue QueueBrowser 
                    QueueConnection QueueReceiver 
                    QueueSession QueueSender))
                  
(def serverUrl "tcp://localhost:7222")
(def producer "producer")
(def consumer "consumer")
(def password "testpwd")
(def queueName "testQueue")

; Consume Queue Text messages asynchronously
(defn get-queue-text-messages [server-url user password queue-name process-message]
    (future
        (with-open [connection (-> (TibjmsQueueConnectionFactory. server-url)
                                   (.createQueueConnection user password))]
            (let [session (.createQueueSession connection false Session/AUTO_ACKNOWLEDGE)
                  queue (.createQueue session  queue-name)]
                (with-open [receiver (.createReceiver session queue)]              
                    (.start connection)
                    (loop []                       
                        (process-message (.receive receiver))
                        (recur)))))))
                   
; Send multiple Text messages
(defn send-queue-text-messages [server-url user password queue-name messages]
    (with-open [connection (-> (TibjmsQueueConnectionFactory. server-url)
                               (.createQueueConnection user password))]
        (let [session (.createQueueSession connection false Session/AUTO_ACKNOWLEDGE)
              queue (.createQueue session  queue-name)
              sender (.createSender session queue)]
            (.start connection)
            (doseq [item messages]
                (let [message (.createTextMessage session)]
                    (.setText message item)
                    (.send sender message))))))


; Create function aliases with connection information embedded                    
(defn consume-messages [queue-name message-processor]
    (get-queue-text-messages  serverUrl consumer password queue-name message-processor))

(defn produce-messages [queue-name messages]
    (send-queue-text-messages  serverUrl producer password queue-name messages))

; Just dump messages to console for now
(defn my-message-processor [message]
    (println (.toString message)))

    
; Start consuming messages asynchronously
(consume-messages "testQueue" my-message-processor)                            

; Send messages to queue
(def my-messages '("alpha" "beta" "gamma" "delta"
                   "epsilon" "zeta" "eta" "theta"
                   "iota" "kappa" "lambda" "mu" "nu"
                   "xi" "omicron" "pi" "rho"
                   "sigma" "tau" "upsilon" "phi"
                   "chi" "psi" "omega"))

(produce-messages  "testQueue"  my-messages)    

With these scripts, I can easily swap in different message generators and message processors as needed for testing. When I ran both scripts up to the point where the queue consumers were running in both the F# and Clojure versions and then sent the messages, I could see that Tibco EMS delivered half the messages to my F# script and the other half to my Clojure script. Since both scripts run in a REPL environment, I can easily adjust my level of testing as I get results.
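The half-and-half split is the competing-consumers behavior of a point-to-point queue: each message is delivered to exactly one of the attached receivers. The same behavior can be sketched in plain Java with an in-memory queue and two consumer threads (illustrative only, not the EMS API):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

public class CompetingConsumers {
    // Pull messages off the shared queue until it is empty; each poll()
    // removes the message, so no other consumer can receive it.
    static void drain(BlockingQueue<String> queue, List<String> received) {
        String message;
        while ((message = queue.poll()) != null) {
            received.add(message);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (String m : new String[]{"alpha", "beta", "gamma", "delta", "epsilon", "zeta"}) {
            queue.add(m);
        }

        List<String> fsharpSide = new CopyOnWriteArrayList<>();
        List<String> clojureSide = new CopyOnWriteArrayList<>();
        Thread t1 = new Thread(() -> drain(queue, fsharpSide));
        Thread t2 = new Thread(() -> drain(queue, clojureSide));
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // The exact split varies run to run, but together the two consumers
        // receive every message exactly once.
        System.out.println("total received: " + (fsharpSide.size() + clojureSide.size()));
    }
}
```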

Monday, April 16, 2012

F# and SharePoint 2010 Object Hierarchy/Properties

I had the opportunity to take a SharePoint 2010 class recently. In the class, the labs were mostly a cut-and-paste affair due to time limitations. Those lab exercises helped me become familiar with working in Visual Studio and seeing some of the SharePoint API, but did not really engage me in thinking about what I was actually doing. After the class, I decided to write F# equivalents of the lab exercises to gain a deeper understanding. The class did impress on me that the tooling for SharePoint 2010 development is far superior in C#, so I decided to build only the logic in F# and retain the C# SharePoint project. This blog post describes my effort to get the first class exercise working with F# code.

I ended up creating an empty SharePoint 2010 C# project that references an F# library project. I added two application pages to the SharePoint 2010 project, the first being FarmHierarchy.aspx. In FarmHierarchy.aspx, I added the following markup between the opening and closing tags of the <asp:Content> element that has an ID of Main:

<h2>My Farm</h2>
<asp:TreeView ID="farmHierarchyViewer" runat="server"
    ShowLines="true" EnableViewState="true"></asp:TreeView>

The second application page I created was PropertyChanger.aspx. I added the following markup between the opening and closing tags of the <asp:Content> element that has an ID of Main:

    <h2>Properties:</h2>
    <asp:Label ID="objectName" runat="server" Text=""></asp:Label><br/><br/>
    <asp:Panel ID="webProperties" runat="server" Visible="false" BorderColor="Orange" BorderStyle="Dashed" BorderWidth="1">
        <asp:Label ID="WebLabel" runat="server" Text="Web Title"></asp:Label>
        <br/>
        <asp:TextBox ID="webTitle" runat="server" EnableViewState="true"></asp:TextBox>
        &nbsp;
        <asp:Button ID="webTitleUpdate" runat="server" Text="Update"/>
        &nbsp;
        <asp:Button ID="webCancel" runat="server" Text="Cancel" />
    </asp:Panel>
    <asp:Panel ID="listProperties" runat="server" Visible="false" BorderColor="Orange" BorderStyle="Dashed" BorderWidth="1">
        <asp:Label ID="ListLabel" runat="server" Text="List Properties"></asp:Label>
        <br/>
        <asp:CheckBox ID="listVersioning" runat="server" EnableViewState="true" Text="Enable Versioning" />
        <br/>
        <asp:CheckBox ID="listContentTypes" runat="server" EnableViewState="true" Text="Enable Content Types" />
        &nbsp;
        <asp:Button ID="listPropertiesUpdate" runat="server" Text="Update" />
        &nbsp;
        <asp:Button ID="listCancel" runat="server" Text="Cancel"/>
    </asp:Panel>

The F# code would iterate through the services, Web applications, site collections, and lists in the SharePoint farm, with the details added to nodes in the TreeView control:


module FsLab01

open System
open System.Web.UI.WebControls
open Microsoft.SharePoint
open Microsoft.SharePoint.Administration
open System.Web.UI

// Convenience methods
let nullfunc _ = ()

// Copied from Clojure
let cond clauses =
    let (_,func) = clauses |> Seq.find (fun (pred,func) -> pred)
    func()


let navListUrl url (id:Guid) =
    sprintf "%s/_layouts/lab01/PropertyChanger.aspx?type=list&objectID=%s" url <| id.ToString()


let navWebUrl url (id:Guid) =
    sprintf "%s/_layouts/lab01/PropertyChanger.aspx?type=web&objectID=%s" url <| id.ToString()

// Recursively add SPWeb & SPList objects
let rec addWeb (web:SPWeb) (parentNode:TreeNode) =

    let node = new TreeNode(web.Title,null,null,navWebUrl web.Url web.ID, "_self")
    node |> parentNode.ChildNodes.Add

    [0..(web.Lists.Count-1)]
    |> Seq.map (fun i -> web.Lists.[i])
    |> Seq.iter (fun item -> 
                      new TreeNode(item.Title,null,null,navListUrl web.Url item.ID,"_self")
                      |> node.ChildNodes.Add)
        
    [0..(web.Webs.Count-1)]
    |> Seq.map (fun i -> web.Webs.[i])
    |> Seq.iter (fun item -> try addWeb item node finally item.Dispose())


// Main function to be called by FarmHierarchy.aspx code to build TreeView control
let loadViewer (viewer:TreeView) (farm:SPFarm) =

    let processSite (webappnode:TreeNode) (site:SPSite) =
        site.CatchAccessDeniedException <- false
        try
            let node = new TreeNode(Text=site.Url)
            node |> webappnode.ChildNodes.Add

            addWeb site.RootWeb node

        finally
            site.CatchAccessDeniedException <- false
            

    let processWebApp (svcNode:TreeNode) (webapp:SPWebApplication) = 
        let node = new TreeNode(Text=webapp.DisplayName )
        node |> svcNode.ChildNodes.Add

        cond [(not webapp.IsAdministrationWebApplication,
               fun _ -> webapp.Sites |> Seq.iter (processSite node));
              (true,nullfunc)]
            
                

    let processService (svc:SPService) =
        let label = sprintf "FarmService (Type=%s; Status=%s)" (svc.TypeName) (svc.Status.ToString())
        let node = new TreeNode(Text=label)
        node |> viewer.Nodes.Add
        match svc with
        | :? SPWebService as websvc -> websvc.WebApplications |> Seq.iter (processWebApp node)
        | _ -> ()

    
    viewer.Nodes.Clear()
    farm.Services |> Seq.iter processService
    viewer.ExpandAll()

As I was writing the loadViewer function, I was bothered by the if-then-else clauses in the code. I had written a previous blog post about this issue, and it struck me that what I really wanted was something similar to the cond macro in Clojure; hence the convenience function called cond in the F# code above.
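For readers who haven't seen Clojure's cond, the helper simply evaluates the body of the first clause whose test is true. A rough Java analogue (hypothetical, purely for illustration):

```java
import java.util.function.Supplier;

public class Cond {
    // A (test, body) pair; body runs only if this clause is selected.
    static final class Clause<T> {
        final boolean test;
        final Supplier<T> body;

        Clause(boolean test, Supplier<T> body) {
            this.test = test;
            this.body = body;
        }
    }

    // Return the value of the first clause whose test is true.
    static <T> T cond(Clause<T>... clauses) {
        for (Clause<T> clause : clauses) {
            if (clause.test) {
                return clause.body.get();
            }
        }
        throw new IllegalStateException("no matching clause");
    }

    public static void main(String[] args) {
        int n = 7;
        String parity = cond(
            new Clause<>(n % 2 == 0, () -> "even"),
            new Clause<>(true,       () -> "odd"));   // catch-all, like (true, nullfunc)
        System.out.println(parity);
    }
}
```

Only the selected clause's body runs, which is what makes this preferable to eagerly computing every branch.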

With the F# code written and packaged as a library, I can now use the above F# function in FarmHierarchy.aspx as follows:

        protected void Page_Load(object sender, EventArgs e)
        {
            SPFarm thisFarm = SPFarm.Local;
            FsLab01.loadViewer(farmHierarchyViewer, thisFarm);
        }

The second part of this lab was to create code to manipulate the properties of SharePoint SPWeb or SPList objects. The F# code that manipulates the SharePoint object properties and the UI is as follows:



let changeProperty (page:Page) (objectName:Label) (webtitle:TextBox) 
                   (listpanel:Panel) (webpanel:Panel) 
                   (listVersioning:CheckBox) (listContentTypes:CheckBox) 
                   (webUpdateBtn:Button) (listUpdateBtn:Button)
                   (webCancelBtn:Button) (listCancelBtn:Button)   =    


    let homeurl  baseurl = sprintf "%s/_layouts/lab01/FarmHierarchy.aspx" baseurl

    let wrapupdates (web:SPWeb) action =
        web.AllowUnsafeUpdates <- true
        action()
        web.AllowUnsafeUpdates <- false
        

    let hidepanels () =
        listpanel.Visible <- false
        webpanel.Visible <- false

    let cancel (web:SPWeb) =
        page.Response.Redirect(homeurl web.Url)
    
        
    let checkNull item = 
        cond [(item=null,  fun _ -> objectName.Text <- "Malformed URL"
                                    hidepanels()
                                    "");
              (item<>null, fun _ -> item.ToString())]


    try
        let objectType = checkNull page.Request.["type"]
        let objectId = checkNull page.Request.["objectID"]

        match objectType with
        | "web" -> 
            listpanel.Visible <- false
            webpanel.Visible <- true
            use web = SPContext.Current.Site.OpenWeb(new Guid(objectId))

            // Hook up the events
            let myupdates _ =
                web.Title <- webtitle.Text
                web.Update()

            webUpdateBtn.Click.Add(
                fun _ ->
                    try 
                        wrapupdates web myupdates

                        page.Response.Redirect(homeurl web.Url)
                    with ex ->
                        objectName.Text <- ex.Message
                        hidepanels()
                )

            webCancelBtn.Click.Add(fun _ -> cancel web)

            objectName.Text <- sprintf "Web: %s" web.Title

            cond [(not page.IsPostBack,
                   fun () -> webtitle.Text <- web.Title);
                   (true,nullfunc)]

        | "list" -> 
            listpanel.Visible <- true
            webpanel.Visible <- false
            let web = SPContext.Current.Web
            let splist = web.Lists.[new Guid(objectId)]


            let myupdates _ =
                splist.EnableVersioning <- listVersioning.Checked
                splist.ContentTypesEnabled <- listContentTypes.Checked
                splist.Update()

            listUpdateBtn.Click.Add(
                fun _ ->
                    try 
                        wrapupdates web myupdates
                        page.Response.Redirect(homeurl web.Url)
                    with ex ->
                        objectName.Text <- ex.Message
                        hidepanels())

            listCancelBtn.Click.Add(fun _ -> cancel web)

            cond [(not page.IsPostBack,
                   fun _ -> listVersioning.Checked   <- splist.EnableVersioning
                            listContentTypes.Checked <- splist.ContentTypesEnabled)]
        | _ -> ()


    with  ex-> objectName.Text <- ex.Message

I needed to sign my F# code for it to work in SharePoint. Since the C# SharePoint 2010 project had already created the key, all I had to do was add --keyfile:<Location to keyfile>\key.snk to the "Other Flags" property in the F# build settings. In addition, before I could build and deploy, I needed to add the F# library DLL to the C# SharePoint project's package, which I did by clicking on Package in the SharePoint project and then clicking the Advanced button.

I can now call this F# function from PropertyChanger.aspx code as follows:

            FsLab01.changeProperty(this.Page, objectName, webTitle,
                                   listProperties, webProperties, 
                                   listVersioning, listContentTypes,
                                   webTitleUpdate, listPropertiesUpdate,
                                   webCancel,listCancel);

Here's what FarmHierarchy.aspx looks like:

Here's what PropertyChanger.aspx looks like when modifying an SPWeb object:

Here's what PropertyChanger.aspx looks like when modifying an SPList object:

Monday, March 26, 2012

F# and Windows Server AppFabric Cache

I recently started investigating Windows Server AppFabric (which is different from Windows Azure AppFabric), trying to understand how to use it, its functional capabilities and limitations, and some of the operational support implications. To get going, I have been reading the book Pro Windows Server AppFabric by Stephen Kaufman and working through the Windows Server AppFabric Training Kit. The free downloadable training kit was the more valuable of the two in helping me understand Windows Server AppFabric.

As always, the only way to truly learn a new technology is to take it out for a spin. Here are some of the PowerShell commands I used to create the cache and get it going:

# Import the necessary administration modules
Import-Module DistributedCacheAdministration
Import-Module DistributedCacheConfiguration

# List the available powershell commands
Get-Command -module DistributedCacheAdministration   
Get-Command -module DistributedCacheConfiguration


# These were run on the localhost of AppFabric
Use-CacheCluster

# Start the cache cluster
Start-CacheCluster

# Check to make sure it is up and running
Get-CacheHost

# Create a new cache 
New-Cache -CacheName MyTestCache -TimeToLive 60 -Expirable true

# Check cache is created
Get-Cache

# Grant local user access to cache
Grant-CacheAllowedClientAccount MyUserId

# Check security settings
Get-CacheAllowedClientAccounts

# After a few runs, check Cache statistics
Get-CacheStatistics -CacheName MyTestCache 

# Get Cache configuration information
Get-CacheConfig MyTestCache 


# Creating a cache with HA - all servers need to be on
# Windows Server Enterprise Edition
# Any host not on Enterprise Edition will not start the cache cluster
New-Cache MyNewHACache -Secondaries 1

# Modifying the cache to enable callbacks
Set-CacheConfig -CacheName MyTestCache -NotificationsEnabled true -TimeToLive 180

For the client, I wrote some F# scripts to experiment with Windows Server AppFabric. I ran these on the local server with the loopback adapter enabled:

// I copied the libraries from c:\windows\system32\appfabric
#r @"C:\lib\Microsoft.ApplicationServer.Caching.Client.dll"
#r @"C:\lib\Microsoft.ApplicationServer.Caching.Core.dll"

open System
open System.Collections.Generic
open Microsoft.ApplicationServer.Caching

//  Make sure you grant
// grant-cacheallowedclientaccount my-user-id

// Expensive operation, do this once on startup
let dcf = new DataCacheFactory(
            let servers = new List<DataCacheServerEndpoint>()
            new DataCacheServerEndpoint("localhost",22233) |> servers.Add
            new DataCacheFactoryConfiguration(Servers=servers))
printfn "Data Cache Factory Created!"

// Testing add/get in default region
let cache = dcf.GetCache("MyTestCache")
let retval = cache.Add("mykey","hello app fabric!") 
cache.Get("mykey")

// Create a new region in the cache, this will pin to a single cache node
cache.CreateRegion("stocks")

// Helper function to create DataCacheTags
let createTags tags = Seq.map (fun tag -> new DataCacheTag(tag)) tags

   
type Company =
  { Symbol : string; Name: string; Address : string;  Phone : string; Tags :seq<string> }

// Create a bunch of company info and put it in the "stocks" region
[{Symbol="AAPL"; Name="Apple Inc."; Phone="408-996-1010"; 
  Address="1 Infinite Loop, Cupertino, CA 95014";
  Tags = ["Technology";"Nasdaq";"Personal Computers"]};
 {Symbol="CAT"; Name="Caterpillar, Inc."; Phone="309-675-1000"; 
  Address="100 North East Adams Street, Peoria, IL 61629";
  Tags = ["Dow";"Industrial Goods";"Farm & Construction Machinery"]};
 {Symbol="ACI"; Name="Arch Coal, Inc."; Phone="314-994-2700"; 
  Address="One City Place Drive, Suite 300, St. Louis, MO 63141";
  Tags = ["Basic Materials";"Industrial Metals & Materials"]};
 {Symbol="HP"; Name="Hewlett-Packard Company"; Phone="650-857-1501"; 
  Address="3000 Hanover Street, Palo Alto, Ca 94304";
  Tags = ["Dow";"Technology"; "Diversified Computer Systems"]};
 {Symbol="JPM"; Name="JP Morgan Chase & Co."; Phone="212-270-6000"; 
  Address="270 Park Avenue, New York, NY 10017";
  Tags = ["Dow";"Financial"; "Money Center Banks"]};
 {Symbol="XOM"; Name="Exxon Mobil Corporation"; Phone="972-444-1000"; 
  Address="5959 Las Colinas Boulevard, Irving, TX 75039-2298";
  Tags = ["Dow";"Basic Materials"; "Major Integrated Oil & Gas"]};]
|> Seq.iter (fun item -> cache.Put(item.Symbol, item, (createTags item.Tags),"stocks") |> ignore)


// Get all stocks in "stocks" region as a HashMap 
let stocks = cache.GetObjectsInRegion("stocks")


// Bulk get, only available with defined regions not default region
let myportfolio = ["AAPL";"CAT";"HP";"JPM"]
let mystocks = cache.BulkGet(myportfolio,"stocks")

// Getting objects from cache by tags (again, only available in defined region)
cache.GetObjectsByTag(new DataCacheTag("Dow"),"stocks")


// Getting stuff by all tags (AND filter)
let candidateTags = ["Technology";  "Personal Computers"] |> createTags
cache.GetObjectsByAllTags(candidateTags,"stocks") 

// Getting stuff by any tags (OR filter)
let interestTags = ["Technology";  "Financial"; "Basic Materials"] |> createTags
cache.GetObjectsByAnyTag(interestTags,"stocks")   

// Concurrency policy is strictly done by the client
// No explicit cache server control on concurrency

// Optimistic Concurrency example
cache.Remove("mykey")
let version = cache.Add("mykey","mystuff")

// Pretend another thread snuck in and modified this cache item.
cache.Put("mykey","changed value")

// This will check that somebody else modified this and throw an exception
cache.Put("mykey","another value",version)

// Pessimistic locking - locks across all cache nodes
let mutable lockHandle:DataCacheLockHandle = null
let item = cache.GetAndLock("mykey",TimeSpan.FromMinutes(60.0), &lockHandle,true)   


// This will throw error due to the first lockHandle
let mutable lockHandle2:DataCacheLockHandle = null

// This will throw an exception as it was already previously locked
let item2 = cache.GetAndLock("mykey",TimeSpan.FromMinutes(60.0),&lockHandle2, true)   

// This will totally ignore locking policies and overwrite things
// Locking is controlled by diligence on the client side.
cache.Put("mykey","no enforced concurrency on the server")


// Finally, this will update the data and unlock this item in the cache
// Need to redo the GetAndLock for this to work.  Previous statement would wipe out the lock
cache.PutAndUnlock("mykey","new stuff",lockHandle)


// Working with Callbacks
(DataCacheOperations.AddItem,
 fun cacheName regionName key version cacheOperation nd -> 
   printfn "Item added to cache : %s\n" key)
|> cache.AddCacheLevelCallback
|> ignore


// When this is called, it could be seconds or minutes before anything 
// is printed to the console
cache.Add("newkey","new stuff")

Monday, March 12, 2012

F# and SharePoint 2010

I bought a bunch of SharePoint 2010 books and haven't had the time to go through them. Recently, I finally had some time to take a look at those books and try things out in F#. Trying out examples with SharePoint 2007 was fairly painless, as I got to do everything in a 32-bit environment. Getting Visual Studio 2010 F# to work with SharePoint 2010 was more challenging. While I was able to get compiled F# code to work, I couldn't overcome the hurdles to get SharePoint 2010 to work with F# Interactive, which is really my preferred way to investigate the SharePoint server object model. I can tell from this Stack Overflow entry that I was not alone in having trouble getting F# Interactive to work with the SharePoint API. The combination of the .NET 3.5 framework and 64-bit defeated my efforts. I tried Igor Dvorkin's steps to run F# Interactive in 64-bit and then took the command line arguments for the F# compiler and applied them to F# Interactive where applicable. So I would start up F# Interactive with the following command line parameters:

"C:\Program Files (x86)\Microsoft F#\v4.0\fsi.exe" ^
  --debug:full ^
  --noframework ^
  --define:DEBUG ^
  --define:TRACE ^
  --optimize- ^
  --tailcalls- ^
  -r:"C:\Program Files (x86)\Reference Assemblies\Microsoft\FSharp\2.0\Runtime\v2.0\FSharp.Core.dll" ^
  -r:"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\ISAPI\Microsoft.SharePoint.dll" ^
  -r:C:\Windows\Microsoft.NET\Framework\v2.0.50727\mscorlib.dll ^
  -r:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\v3.5\System.Core.dll" ^
  -r:C:\Windows\Microsoft.NET\Framework\v2.0.50727\System.dll ^
  -r:"C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\v3.0\System.ServiceModel.dll" 

When I try to run with the above arguments, I get the following error:

error FS0084: Assembly reference 
'C:\Program Files (x86)\Reference Assemblies\Microsoft\FSharp\2.0\Runtime\v2.0\FSharp.Compiler.Interactive.Settings.dll' 
was not found or is invalid 

So something in FSharp.Compiler.Interactive.Settings.dll does not work with the .NET 3.5 framework. I wish there were an option in F# Interactive to target a specific .NET runtime (e.g. --target:3.5).

The good news is that the F# compiler works fine as long as you make sure you are targeting the .NET 3.5 runtime and x64 CPU. Here's the F# version of some sample code that I pulled from the book Inside Microsoft SharePoint 2010 by Ted Pattison, Andrew Connell, Scott Hillier and David Mann:

open System
open Microsoft.SharePoint

let siteUrl = "http://localhost/"
let sites = new SPSite(siteUrl)

let site = sites.RootWeb
seq {for i in 0..(site.Lists.Count-1) do
       if site.Lists.[i].Hidden <> true then
         yield (site.Lists.[i]) }
|> Seq.iter (fun item -> printf "%s\n" item.Title)

For those who have read my past posts on F# and SharePoint, including my post Revisiting the SharePoint collection adapter for F#, I am sorry to say that I was led astray by C# example code, and my brain temporarily malfunctioned as it produced the SharePoint adapter nonsense in that previous blog post. There was no need to write any SharePoint utility library in C# to get F# to work with SPListCollection as a sequence. In the above example, I used sequence expressions to generate an SPList sequence from SPListCollection. Another way is to simply create a sequence using the map function, as shown in the following example:

// Create a sequence by mapping over all items in the collection
let spcollection = site.Lists
[0..spcollection.Count-1] 
|> Seq.map (fun i -> spcollection.[i])
|> Seq.iter (fun item -> printf "%s\n" item.Title)

There was no need to leave the confines of F# development, and this approach is also type safe compared to my previous clumsy attempt at converting SPListCollection into a sequence. If you really do want to use an adapter, it can also be implemented in F# as follows:

// SPCollectionAdapter written in F#
let fromSPCollection<'a,'b when 'b :> SPBaseCollection> (collection:'b) =
    let enumerator = collection.GetEnumerator()
    let rec enumerate (e:Collections.IEnumerator) acc =
        let flag = e.MoveNext()
        if flag = false then
            acc
        else
            enumerate e (e.Current :?> 'a :: acc)
    enumerate enumerator [] 

// Example usages
fromSPCollection<SPList,SPListCollection> site.Lists
|> Seq.iter (fun item  -> printf "%s\n" item.Title)
    

// webapp is an instance of SPWebApplication object
fromSPCollection<SPContentDatabase,SPContentDatabaseCollection>  webapp.ContentDatabases
|> Seq.iter (fun db -> 
              printf "Content Database : %s\n" db.Name
              printf "Connection String : %s\n" db.DatabaseConnectionString)
    

Tuesday, March 06, 2012

Adventures in troubleshooting out of memory errors with a Coherence cluster.

One day, an application team manager called me and said that their application caused an out of memory error condition in their Oracle Coherence cluster. The same code base had run in the old Coherence 3.1 environment for months without running into out of memory conditions, but it was now failing in the new Coherence 3.6 environment on a regular basis, within a matter of weeks. He said that he had heap dumps and logs and asked whether I could take a look and troubleshoot.

Initially, I was skeptical about being able to help this team manager. After all, I knew almost nothing about their application code and, in practical terms, I had no previous development experience with Coherence, except that I had read the book Oracle Coherence 3.5 by Aleksandar Seovic. I had previously participated in testing Coherence performance on VMware, and that really did not require me to delve into the Coherence API at all.

Despite these misgivings, I decided to provide my support and told the application team manager that I'd try my best.

The system with problems was a multi-node Coherence cluster. When I took a look at the logs, all of them had similar verbose GC output:

[GC [1 CMS-initial-mark: 1966076K(1966080K)] 2083794K(2084096K), 0.1923110 secs] [Times: user=0.18 sys=0.00, real=0.19 secs] 
[Full GC [CMS[CMS-concurrent-mark: 1.624/1.626 secs] [Times: user=3.22 sys=0.00, real=1.62 secs] 
 (concurrent mode failure): 1966079K->1966078K(1966080K), 6.6177340 secs] 2084093K->2084082K(2084096K), [CMS Perm : 13617K->13617K(23612K)], 6.6177900 secs] [Times: user=8.21 sys=0.00, real=6.62 secs] 
[Full GC [CMS: 1966078K->1966078K(1966080K), 4.1110330 secs] 2084093K->2084089K(2084096K), [CMS Perm : 13617K->13615K(23612K)], 4.1111070 secs] [Times: user=4.11 sys=0.00, real=4.11 secs] 
[Full GC [CMS: 1966078K->1966078K(1966080K), 4.2973090 secs] 2084092K->2084087K(2084096K), [CMS Perm : 13615K->13615K(23612K)], 4.2973630 secs] [Times: user=4.28 sys=0.00, real=4.30 secs] 
[Full GC [CMS: 1966078K->1966078K(1966080K), 4.1831450 secs] 2084093K->2084093K(2084096K), [CMS Perm : 13615K->13615K(23612K)], 4.1831970 secs] [Times: user=4.18 sys=0.00, real=4.18 secs] 
[Full GC [CMS: 1966078K->1966078K(1966080K), 4.2524850 secs] 2084093K->2084093K(2084096K), [CMS Perm : 13615K->13615K(23612K)], 4.2525380 secs] [Times: user=4.24 sys=0.00, real=4.25 secs] 
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid23607.hprof ...
Heap dump file created [2274434953 bytes in 28.968 secs]

This garbage collection log output tells me that they are using the CMS garbage collector. The concurrent mode failure entries certainly grabbed my attention. Normally one would fix concurrent mode failures by tuning the CMS initiating occupancy fraction via the -XX:CMSInitiatingOccupancyFraction flag, but in this case, looking at the heap numbers in the lines labeled "Full GC" showed that GC could not clean up any memory at all, so this problem could not be solved by GC tuning. By the way, for a great book on tuning garbage collection, I would recommend Charlie Hunt's book Java Performance.
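This "Full GC frees nothing" signature is easy to scan for mechanically rather than by eyeballing the log. Below is a rough Ruby sketch; the regex and the exhaustion thresholds are my own assumptions, tuned to the CMS log lines shown above, and the sample lines stand in for a real gc.log:

```ruby
# Flag Full GC entries where the collector reclaimed (almost) nothing.
# The pattern matches the HotSpot CMS output format shown above and
# would need adjusting for other collectors or log formats.
FULL_GC = /\[Full GC \[CMS: (\d+)K->(\d+)K\((\d+)K\)/

# Hypothetical sample lines; replace with File.foreach("gc.log")
sample_log = [
  "[Full GC [CMS: 1966078K->1966078K(1966080K), 4.1110330 secs] ...",
  "[Full GC [CMS: 1966078K->1200345K(1966080K), 2.0312000 secs] ...",
]

sample_log.each do |line|
  next unless (m = FULL_GC.match(line))
  before, after, capacity = m.captures.map(&:to_i)
  reclaimed = before - after
  # Heap is effectively exhausted when a Full GC frees under 1% of
  # capacity and the heap remains over 99% occupied afterwards.
  if reclaimed < capacity / 100 && after * 100.0 / capacity > 99.0
    puts "Heap exhausted: #{after}K of #{capacity}K still live after Full GC"
  end
end
# => Heap exhausted: 1966078K of 1966080K still live after Full GC
```

A stream of such warnings right before an OutOfMemoryError is a strong hint that the live set simply no longer fits and GC tuning alone will not help.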

My next step was to take a look at the heap. The heap was slightly over 2 GB, which was expected since each Coherence cluster node was configured with a 2 GB heap. That presented a problem for me because I was still mainly working on a 32-bit Windows laptop, so I needed to find a 64-bit system with preferably 4 GB of RAM or more to look at this. Once I got such a machine, I fired up the Eclipse Memory Analyzer Tool (MAT), and the biggest memory offender was pretty obvious: a top-level hashmap chewing up 1.6 GB of memory. Delving further into that hashmap revealed that the Coherence caching structure is a hash of hashes. Looking at the hashes, I noticed over 2000 entries in the top-level hash, which would imply over 2000 caches in the Coherence cluster. Studying each individual cache, I noticed cache names like

  • alpha-FEB-21
  • alpha-FEB-22
  • alpha-FEB-23
  • alpha-FEB-24
  • beta-FEB-21
  • beta-FEB-22
  • beta-FEB-23
  • beta-FEB-24

and so forth. I asked the application team manager whether he expected to have this many caches in the cluster. He said no; he expected a much smaller set of caches, since the application normally destroys caches older than 2 days. The developers provided me their code related to the creation and destruction of caches, and the following lines seemed pretty innocuous:

    public static void destroyCache(String name) {
        Collection listOfCacheNames = getListOfCacheNames(name, false);
        Iterator iterator = listOfCacheNames.iterator();
        while (iterator.hasNext()) {
            String cacheName = (String) iterator.next();
            NamedCache namedCache = CacheFactory.getCache(cacheName);
            namedCache.destroy();
        }
    }

I went back to the memory analyzer tool, performed a path-to-GC-roots analysis, and saw the top-level object holding onto this heap:

com.tangosol.coherence.component.net.Cluster$IpMonitor @ 0x77ff4b18 

with the label "Busy Monitor" next to it. This line item seems to suggest that there's a monitor lock on this cache. Looking at the Coherence API documentation, I see the following entry:


destroy

void destroy()
Release and destroy this instance of NamedCache.

Warning: This method is used to completely destroy the specified cache across the cluster. All references in the entire cluster to this cache will be invalidated, the cached data will be cleared, and all internal resources will be released.

Caches should be destroyed by the same mechanism in which they were obtained. For example:

  • new Cache() - cache.destroy()
  • CacheFactory.getCache() - CacheFactory.destroyCache()
  • ConfigurableCacheFactory.ensureCache() - ConfigurableCacheFactory.destroyCache()
Except for the case where the application code explicitly allocated the cache, this method should not be called by application code.

Looking at this documentation, we initially thought that since the cache was obtained via CacheFactory, it should also be destroyed via CacheFactory, and that CacheFactory therefore held a monitor lock on the underlying collections. The code provided by the developers used one mechanism to create the cache and another to destroy it, so we presumed that was the problem. I implemented a test script to test that theory and, surprisingly, even destroying via CacheFactory, I still encountered out of memory issues. Only by clearing the cache before destroying it was I able to avoid out of memory errors. Here's the script that I developed in Clojure to test my theories:

(import '(org.apache.commons.lang3 RandomStringUtils) 
        '(java.math BigInteger)
        '(java.util Random Date HashMap)
        '(com.tangosol.net NamedCache CacheFactory CacheService Cluster))

(defn random-text [] (RandomStringUtils/randomAlphanumeric 1048576))
(defn random-key [] (RandomStringUtils/randomAlphanumeric 12))
        
(CacheFactory/ensureCluster)
(def buffer (new HashMap))

(defn print-horizontal-line [c] (println (apply str (repeat 80 c))))

(def caches '("alpha" "beta" "gamma" "delta" 
              "epsilon" "zeta" "eta" "theta" 
              "iota" "kappa" "lambda" "mu" "nu"
              "xi" "omicron" "pi" "rho"
              "sigma" "tau" "upsilon" "phi"
              "chi" "psi" "omega"))

(defn load-cache [cache-name n]
    (let [cache (CacheFactory/getCache cache-name)]
         (print-horizontal-line  "=")
         (println "Creating cache : " cache-name)
         (print-horizontal-line  "=")         
         (.clear buffer)
         (dotimes [_ n] (.put buffer (random-key) (random-text)))
         (.putAll cache buffer)))


(defn recreate-oom-problem [cache-name]
    (let [cache (CacheFactory/getCache cache-name)]
         (load-cache cache-name 200)
         (.destroy cache)))
         
(defn try-fix-oom-1 [cache-name]
    (let [cache (CacheFactory/getCache cache-name)]
         (load-cache cache-name 200)
         (CacheFactory/destroyCache cache)))

(defn try-fix-oom-2 [cache-name]
    (let [cache (CacheFactory/getCache cache-name)]
         (load-cache cache-name 200)
         (.clear cache)
         (CacheFactory/destroyCache cache)))
         
; Test run: recreate the original problem. Was able to reproduce OOM issues
(doseq [cache caches] (recreate-oom-problem cache))

; Surprise! Still have OOM issues
(doseq [cache caches] (try-fix-oom-1 cache))

; No longer have OOM issues, but memory is still leaking (slowly)
(doseq [cache caches] (try-fix-oom-2 cache))
         

However, I still suspected a memory leak; it's just that the leak was a lot smaller now. To verify it, I would run my Clojure test script and deliberately create and fill a cache without clearing it. I then forced a full garbage collection followed by a heap dump. In the memory analyzer tool, I would look up the cache that I did not clear and list all the incoming references. Then I would look for a HashMap in the incoming references, select one of those, and check its outgoing references. In those outgoing references, I could see keys containing the names of caches that I had called CacheFactory.destroyCache() on, with retained heap sizes ranging anywhere from 24 to 160 bytes, seemingly proportional to the length of the cache name.

In conclusion, it would seem Oracle Coherence does have a memory leak issue in its cache creation and destruction process. If we clear the cache before destroying it, I suspect it would be a long time before the leak is even noticeable by this particular application.

To verify that this leak did not exist in the older 3.1 version, we ran this test code against it and were unable to reproduce the out of memory errors. We also tested against Oracle Coherence 3.7.1 and were unable to reproduce the error. So it looks like this memory error is specific to Oracle Coherence 3.6.

Throughout this entire process, the secret sauce that enabled me to quickly learn Coherence and to reproduce and troubleshoot the out of memory problem was Clojure. Clojure allowed me to interactively manipulate Coherence clusters and explore the API, which would have been a lot slower if I had to go through the normal edit-compile-run cycle with plain old Java.

Tuesday, February 21, 2012

Coding for reuse versus coding for specific use

One day, a C# developer came to me and asked me to troubleshoot their application, which involves some data transformation code. In the course of troubleshooting the application's runtime issue, I saw this code:

private string[] SplitTextwithSpecialCharacter(string lineFromGUI, int splitLength) 
{

    List<string> lines = new List<string>();
    int start = 0;


    while (start < lineFromGUI.Length)
    {
        string temp = string.Empty;
      
        //if length is longer it will throw exception so size the last segment appropriately
        string line = lineFromGUI.Substring(start, Math.Min(splitLength, lineFromGUI.Length - start));
        
        //temp = line.Replace(SpecialCharacter, string.Empty);

        temp = line.Replace("^", ""); //New 6/30
        if (temp.Length > 0) 
        {
            //lines.Add(line.Replace(SpecialCharacter, string.Empty));
            lines.Add(temp);
        }
        //lines.Add(line);
        start += splitLength;
    }
    return lines.ToArray();
}

When I saw this code, I inwardly cringed at the very narrow scope of the implementation for what is a seemingly generic problem. This code screams for refactoring. From the commented-out sections, I could see that one of the developers had tried to implement some reuse before reverting back to the specific use.

I see a couple of issues with this particular implementation. The first is that this method is trying to do two very different things: splitting the text into lines, and removing all the caret characters from the text. In my opinion, these two operations should be separated into two different methods. The second issue is that the caret removal should be generalized so that you can remove any character, or even a set of characters, beyond just the caret symbol. With the specific implementation, what happens if they need to remove an asterisk instead of a caret? What happens if they need to remove more than one special character? Do they add additional methods? Do they modify the existing method? If they modify the existing method to take parameters, they would have to change all the existing client code that calls it. So it seems easier for future maintenance to go ahead and make these two functions reusable.

Here's how I would refactor this method: move the functionality into a utilities namespace and use extension methods to add the capability to the string object. Here's how that code would look:

public static class Extensions
{
    public static string FilterOut(this string line, ISet<char> filter)
    {

        if (line == null ||  filter == null || filter.Count == 0) return line;


        return new String(line.Where(c => !filter.Contains(c)).ToArray());
    }


    public static IList<string> SplitIntoLines(this string data, int splitLength) 
    {

        // Check preconditions
        if (data == null) throw new NullReferenceException();

        if (splitLength <= 0) throw new IndexOutOfRangeException("Must be greater than 0!");

        IList<string> lines = new List<string>();

        while (data.Length > splitLength)
        {
            string line = data.Substring(0,splitLength);
            data = data.Substring(splitLength, data.Length-splitLength);
            lines.Add(line);
        }
        lines.Add(data);
        return lines;
    }
}

With these extension methods, you can now write the following:

int linesize = 30;
ISet<char> filter = new HashSet<char> { '^', '*', '&', '$' };
string[] results = data.SplitIntoLines(linesize)
                       .Select(line => line.FilterOut(filter))
                       .ToArray();

Saturday, February 04, 2012

Detect multiple occurrences of Java classes in jar files

A developer came to me for help and said that a recent upgrade to a newer Java library broke their code; the developer was getting a class not found exception. The developer said that the old jar file had been replaced with the newer jar file. I knew this same library upgrade had not caused any problems in other applications, so this sounded like an issue with duplicate occurrences of the same class in the classpath. Since my Ruby Programming Language book was within handy reach when this developer came to me, I ended up writing a small Ruby script that went through all the jar files packed in the web archive and dumped each fully qualified class name along with the jar files it can be found in and a count of the number of occurrences. Here's the Ruby script, running on a Windows platform:

java_home="C:\\sdk\\jdk1.6.0_24\\bin"

classmaps = Hash.new(0)

Dir["*.jar"].each do |file|
  cmd = sprintf("%s\\jar -tf %s", java_home, file)
  lines = `#{cmd}`
  lines.each_line do |line|
    if line =~ /\.class/
      key = line.chomp
      if classmaps.key?(key) then
        old = classmaps[key]
        data = sprintf("%s,%s", old, file)
        classmaps[key] = data 
      else
        classmaps[key] = file
      end
    end
  end
end


classmaps.each do |k,v|
    tokens = v.split(",")
    printf("%s,%i,%s\n",k,tokens.size,v)
end

I then loaded the output in CSV format into Excel, sorted the occurrences in descending order, and was flabbergasted to find numerous entries that looked something like the following:

Class Name: kodo/jdo/KodoExtent.class
Occurrences: 6
Found in jar files: mylib.jar, mylib.jar, mylib.jar, kodo-api.jar, kodo-runtime.jar, kodo.jar

Somehow, this application team managed to add the same class to the same jar file multiple times. I never thought it was possible to add the same class to a jar file multiple times, nor would I ever want to do that. When I confronted the application team about this, they recognized that their build process was broken and needed fixing. As a quick test, the developer removed some of the duplicate classes related to the library upgrade and the problems went away. Until Java 8 arrives with modularity capabilities, this tool will be a handy way for me to check for duplicate classes given a list of jar files.
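The Excel step could also be skipped by filtering and sorting in Ruby directly. Here's a small sketch; the rows are hypothetical stand-ins for the script's CSV-style output:

```ruby
# Rows in the script's output format: class name, occurrence count, jar list.
# (Hypothetical sample data standing in for the real output.)
rows = [
  ["kodo/jdo/KodoExtent.class", 6, "mylib.jar,mylib.jar,mylib.jar,kodo-api.jar,kodo-runtime.jar,kodo.jar"],
  ["com/example/OnlyOnce.class", 1, "app.jar"],
  ["org/example/Twice.class", 2, "a.jar,b.jar"],
]

# Keep only classes that occur more than once, worst offenders first
duplicates = rows.select { |_, count, _| count > 1 }
                 .sort_by { |_, count, _| -count }

duplicates.each { |name, count, jars| printf("%s,%i,%s\n", name, count, jars) }
```

This prints only the suspect classes in descending order of occurrence, which is exactly the view I wanted from Excel.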

Friday, January 20, 2012

Translating If-Then-Else Control Flow Idiom to F#

I was reading through Juval Löwy's Programming WCF Services book and wondering if I should do a series of WCF blog posts in F# based on Löwy's book when I ran into a common construct found in C# programs. That construct looks something like the following C# code:

    public static void MyMethod(String oldstuff, String newstuff, bool flag)
    {
        if (oldstuff == null)
         throw new Exception("oldstuff is null!");

        if (newstuff == null)  {
            DoSomething("Default");
            return;
        }
        if (flag == false)  {
            DoSomething(oldstuff);
            return;
        }
        DoSomething(newstuff);
    }

This is a construct that I have often used in the past and never thought much about. But when you translate the above code directly into F#, it becomes a lot more verbose, because F# has no early return and forces you into nested if-then-else expressions. A direct translation to F# looks like this:

let mymethod oldstuff newstuff flag =
    if oldstuff = null then 
        raise (new Exception("oldstuff is null!"))
    else
        if newstuff = null then
            DoSomething("Default")
        else
            if flag = false then
                DoSomething(oldstuff)
            else
                DoSomething(newstuff)

If I had a lot of these if-then-else statements in my C# method, the F# version would run off the right edge of the screen if I implemented it by direct translation. I thought about how I could implement this in F# and came up with the following possibility:

let revised_mymethod oldstuff newstuff flag =
    let (_,action) =
        [(oldstuff=null,  lazy (raise (new Exception("oldstuff is null!"))));
         (newstuff=null,  lazy (DoSomething("Default")));
         (flag=false,     lazy (DoSomething(oldstuff)));
         (flag=true,      lazy (DoSomething(newstuff)))]
        |> List.filter fst
        |> List.head
    action.Force()

Rewriting the C# code in this fashion makes me think of rules engines. After refactoring out some common code, I could rewrite the above F# code as follows:

let followrules (xs:(bool*Lazy<unit>) list) =
    (xs |> List.filter fst |> List.head |> snd).Force()

let revised_mymethod2 oldstuff newstuff flag =
    [(oldstuff=null , lazy (raise (new Exception("oldstuff is null!"))));
     (newstuff=null,  lazy (DoSomething("Default")));
     (flag=false,     lazy (DoSomething(oldstuff)));
     (flag=true,      lazy (DoSomething(newstuff)))]
    |> followrules

With this new construct, I can easily re-arrange the order of evaluation, add or remove new conditions. This new construct just seems to have more advantages than the old if-then-else construct in F#.

Wednesday, January 11, 2012

Testing Coherence with Clojure

A developer came to me the other day asking for help in diagnosing some issues with their application and its interaction with Oracle's Coherence product. I wanted to write a testing harness to quickly test some Coherence configurations, and I gave some thought to how I would try to replicate the issues that the application had. I wanted a REPL environment so that I could interactively manipulate the Coherence API and dump outputs on demand. I decided to use Clojure to experiment with Coherence, although I could have used JRuby, Jython, Groovy or Scala. From purely a familiarity perspective, I would rank my usage of these languages in the order of Ruby first, Python second, Groovy third, Clojure fourth and Scala last. But for some unknown, deep-seated and probably emotional reasons, I like Clojure more and relish any opportunity to use it. One of the first things I tried with Clojure and Coherence was a timing test on adding data to a 2-node distributed cache in serial vs concurrent mode. Here's the example Clojure script:

(import '(org.apache.commons.lang3 RandomStringUtils) 
        '(java.math BigInteger)
        '(java.util Random Date HashMap)
        '(com.tangosol.net NamedCache CacheFactory CacheService Cluster))

(CacheFactory/ensureCluster)
(def cache (CacheFactory/getCache "sandbox"))

(defn random-text [] (RandomStringUtils/randomAlphanumeric 1048576))
(defn random-key [] (RandomStringUtils/randomAlphanumeric 12))


; Testing serial puts
(new Date)
(dotimes [_ 200] (.put cache (random-key) (random-text)))
(new Date)

; Testing concurrent puts
(def buffer (new HashMap))
(new Date)
(dotimes [_ 200] (.put buffer (random-key) (random-text)))
(.putAll cache buffer)
(new Date)

Running this code showed a 2x gain in data load speed. On one of the Coherence nodes, I had JVisualVM connected and watched the realtime GC behavior with VisualGC. It has been fascinating to watch the behavioral differences between serial and parallel data loads, as well as the memory activities of the Coherence node when my Clojure script was idle. I hope to conduct more tests in the future looking at GC behavior, leveraging my Clojure script as a load driver in my testing efforts to assist in GC tuning of Coherence instances.