Monday, May 09, 2011

Porting the Log Analysis Code to Haskell

A coworker approached me the other day and asked which open source log analysis tools I would recommend. I don't have much experience with general-purpose open source log analysis tools, so I would probably have recommended that he take a look at Splunk. Since I had recently written some customized log analysis software, I became curious and asked him what he intended to do with such a tool.

My coworker said that he needed to analyze Tibco EMS logs. Tibco EMS logs incoming messages in the order it receives them. He is interested in sets of related messages, each identified by a message ID tag. His particular issue is that the log entries he's interested in are interspersed with other entries that he's not interested in. He wanted a log file where the entries are grouped by message ID and, within each group, kept in chronological order.

Once I understood his needs, I realized that he did not need Splunk and that I could quickly adapt the F# log analysis software from my previous blog post. When I gave him the modified F# code, he asked me if I could port it to Linux. That threw me for a loop. I briefly entertained the idea of setting up Mono and compiling the F# code on it, but decided against it for now. I thought it would be easier to just port it to Haskell, which I already have on Linux.

Here's the ported Haskell log analysis software with modifications to work with Tibco log entries.


import Data.Time.Calendar
import Data.Time.LocalTime
import Data.Time.Parse
import Data.List (sortBy)  -- the Haskell 98 module List now lives at Data.List
import System.Environment

type Category = String
type Entry = [String]
type TimeStamp = (LocalTime,String)
type LogHeader = (TimeStamp, Category)

alphaTime = LocalTime (fromGregorian 2000 1 1) midnight 

data LogEntry = LogEntry (TimeStamp, String) [String]
                deriving (Show)

{- Grab label: the third whitespace-separated token of a header line -}
categorize (_ : _ : label : _) = label
categorize _ = ""

{- Grab timestamp: the first two tokens, parsed as date and time -}
timestamp (date : time : _) = strptime "%Y-%m-%d %H:%M:%S" (date ++ " " ++ time)
timestamp _ = Nothing

{- A line is a header only if it starts with a parseable timestamp -}
header :: String -> (Maybe TimeStamp, Category)
header line = (timestamp tokens, categorize tokens)
    where tokens = words line

{- Concrete implementation of Tibco log parser -}    
logparser :: [String] -> LogHeader -> [String] -> [LogEntry] -> [LogEntry]
logparser (line : rest) xheader entry entries  = process (header line)    
    where process (Just (ts),label) = 
              logparser rest h [line] ((LogEntry xheader (reverse entry)):entries ) where h = (ts,label)
          process (Nothing,_) = logparser rest xheader (line : entry) entries

{- End of input: flush the last entry, restoring its lines to file order -}
logparser [] xheader entry entries = reverse ((LogEntry xheader (reverse entry)) : entries)

{- Utility functions to pull items out of a LogEntry -}
entry (LogEntry _ entries) = entries    
category (LogEntry (_,label) _) = label

{- comparator based on category; sortBy is stable, so entries with the
   same category keep their original (chronological) order -}
categorysort a b = compare (category a) (category b)

parselog parser lines = parser lines ((alphaTime,".000"),"STARTFLAG") [] []
    
processlog = unlines
             . map (unlines . entry)
             . sortBy categorysort
             . (parselog logparser)
             . lines 

main = do (filename:_) <- getArgs
          contents <- readFile filename
          putStr (processlog contents)
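
The grouping above relies on sortBy being a stable sort: entries that share a category come out in the same order they went in, which is what keeps each message's entries in chronological order. Here is a minimal sketch of just that property, using only the base library. The Sample type, the sample values, and the helper names are all made up for illustration; they are not real Tibco EMS output and not part of the program above.

```haskell
import Data.List (sortBy)
import Data.Ord (comparing)

-- Hypothetical, simplified entries: (sequence number, message ID, text).
-- These are invented sample values, not real Tibco EMS log lines.
type Sample = (Int, String, String)

samples :: [Sample]
samples =
  [ (1, "msgB", "received")
  , (2, "msgA", "received")
  , (3, "msgB", "routed")
  , (4, "msgA", "routed")
  , (5, "msgB", "acknowledged")
  ]

-- Pull the grouping key out of an entry.
messageId :: Sample -> String
messageId (_, mid, _) = mid

-- Sort on the message ID only. Because sortBy is stable, entries with
-- the same ID keep their relative input order.
groupById :: [Sample] -> [Sample]
groupById = sortBy (comparing messageId)

-- Helper to inspect the resulting order by sequence number.
sequenceNumbers :: [Sample] -> [Int]
sequenceNumbers xs = [n | (n, _, _) <- xs]
```

Loading this in GHCi and evaluating sequenceNumbers (groupById samples) gives [2,4,1,3,5]: the msgA entries come first, then the msgB entries, and each group is still in its original order.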