Monday, February 04, 2013

Java Garbage Collection Log Analysis with R

In the past, I generally perform Java GC log analysis using a combination of scripts such as the one I have written in my previous blog post for parsing the gc log and transforming it into csv format, GCHisto tool as mentioned in the book Java Performance, and Microsoft Excel for mostly plotting purposes. However, as I find myself doing more and more GC analysis for troubleshooting performance issues, I'm finding that these tools are not flexible enough for my purposes.

I heard about The R Project from one my coworker. In addition, I see R gaining traction in the industry; I decided to try using R for GC analysis. This blog post is to document some of my experiences with using R for GC analysis.

There are some posts in the blogosphere that seem to suggest that it is difficult to learn R. I did not find that to be the case. Whether that's due to knowing functional programming or past experience with Matlab, I don't know. I did purchase a bunch of books on R to learn it including R in a Nutshell, R by Example, R in Action, Introductory Statistics with R. I got the most mileage with the last book.

In working with R, I still would use a script to parse out raw GC log and converted into csv file. So far, most of my GC analysis are on non-CMS GC logs so this blog post is restricted to discussing analysis of mostly parallel GC logs. In order to be able to better read my R scripts, I have changed my script to output the csv file with the following headers:

timestamp,gc.type,young.start,young.end,young.max,heap.start,heap.end,heap.max,pause.time,old.start,old.end,old.max,perm.start,perm.end,perm.max,time.user,time.sys,time.real

I can load the csv file in R as follows:

data.file <- "gc.log.csv"
data.gc <- read.csv(data.file,sep=',')

I created the following utility functions because I find that I often want to work in MB or GB for heap space rather than in KB. Also, I want to work with timestamp in hours rather than seconds.

sec.to.hours  <- function(x) x/(60.0*60.0)
kb.to.mb <- function(x) x/1024.0
kb.to.gb <- function(x) x/1024.0^2

Printing GC Pause Time Statistics

With this, I can dump the GC pause time statistical information as follows:

# Dump mean, standard deviation, GC counts for GC pause time
attach(data.gc)
 by(pause.time,gc.type,mean)
 by(pause.time,gc.type,sd)
 by(pause.time,gc.type,length)

 # GC Frequency statistics
 summary(diff(timestamp))

 # Time spent in GC
 sum(pause.time)

 # Percentage of time spent in GC
 sum(pause.time)*100.0/max(timestamp)
 
detach(data.gc)

For the above code, a sample output would be:

# by(pause.time,gc.type,mean) output 
# GC Average Pause Time
gc.type: full
[1] 2.125116
-------------------------------------------------------------- 
gc.type: minor
[1] 0.06472383

# by(pause.time,gc.type,sd) output 
# GC pause time distribution
gc.type: full
[1] 1.01972
-------------------------------------------------------------- 
gc.type: minor
[1] 0.02684703

# by(pause.time,gc.type,length) output 
# GC count by type
gc.type: full
[1] 7
-------------------------------------------------------------- 
gc.type: minor
[1] 2172

# summary(diff(timestamp)) output 
# GC Frequency statistics - (e.g. GC occurs on average every 84 seconds)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.08   22.44   71.34   84.83  143.10  252.90 
   
# sum(pause.time) output
# Total time spent in GC   
[1] 155.456

# sum(pause.time)*100.0/max(timestamp) output 
# Percentage of time spent in GC
[1] 0.08412511

Plotting GC Timeline

Here's how to plot GC Timeline:

with(data.gc, {
 plot.title <-"GC Timeline"
 gc.minor <- pause.time[gc.type == "minor"]
 gc.full <- pause.time[gc.type == "full"]

 # Plot bars for full GC pause time 
 plot(sec.to.hours(timestamp[gc.type == "full"]),
   gc.full*1000.0,
   xlab="Elapsed Time (hrs)",
   ylab="GC Pause Time Time(ms)",type='h',
   main=plot.title,col="red",ylim=c(0,1000),xlim=c(0,18))
   
 # Plot bars for minor GC pause time
 lines(sec.to.hours(timestamp[gc.type == "minor"]),
       gc.minor*1000.0,col="dark green",type="h")

 # Set tick marks for every hour
 axis(1,seq(0,18,1))
 grid()
})

A sample output of the above code would be:

Plotting GC Pause Time Distribution

To plot GC pause time distribution:

# Plot GC Pause Time Distribution
plot.title <- "GC Pause Time Distribution"
with(data.gc, {
 hist(pause.time,breaks=100, 
   xlab="GC Pause Time(sec)",
   xlim=c(0,1.0),
   main=plot.title)
})

Plotting GC Pause Time Distribution as a Boxplot

Alternatively, you can view this as a boxplot:

plot.title <- "GC Pause Time Distribution as Boxplot"
with(data.gc, {
 boxplot(pause.time ~ gc.type,main=plot.title)
}) 

Plotting Promoted Heap Memory

One particular case, I was analyzing a situation where minor GC pause time was unusually large. My original hypothesis is that a lot of memory was being promoted into old generation space. Therefore, I wanted a plot of heap memory promoted into old gen space:

plot.title <- "Heap Promoted into Old Gen Space"
with(data.gc, {
 old.end <- kb.to.mb(heap.end - young.end)
 x <- sec.to.hours(timestamp)
 # If there is a full GC, promoted heap size would go negative
 y <- diff(old.end)
 plot(x[-1],y,
   xlab="Elapsed Time (hrs)",
   ylab="Promoted Heap (MB)",type='h',
   main=plot.title)
 grid()
})

In this particular case, the promoted heap was not all that large as shown in the diagram below:

Plotting Young Gen Survived Heap

I then investigated survivor heap spaces. Here's the script that plots the survivor heap space along with GC pause time on the same chart:

# Plot survived Heap
xtitle <- 'Elapsed Time (hrs)'
ytitle <-'Survivor Heap (MB)'
plot.title <- "Survived Heap after Minor GC"

with(data.gc, {
   
 x <- sec.to.hours(timestamp)
 y <- kb.to.mb(young.end)
 plot(x,y,
  xlab=xtitle,ylab=ytitle,
  main=plot.title,type='o',col='blue')
  
 # Plot on secondary axis
 par(new=TRUE)
 plot(x,pause.time,
   axes=FALSE,type='o',
   col="green",pch=23,xlab="",ylab="")
 axis(4)
 mtext("GC Pause Time (sec)",4)
})

In addition, I can also check the correlation coefficient via :

> with(data.gc,cor(young.end[gc.type == 'minor'],pause.time[gc.type == 'minor']))
[1] 0.9659936

As we can see from the result, the correlation was pretty high that size of the survivor heap and its effect on GC pause time.

Checking for Memory Leaks

Another plot that I often have to generate is used as a quick check for memory leaks. Here's the R script that plots old gen heap space usage after every full GC:

# Check for memory leak
xtitle <- 'Elapsed Time (hrs)'
ytitle <-'Old Gen Heap (MB)'


with(data.gc, {
 plot.title <- "Old Gen Space after Full GC"
 x <- sec.to.hours(timestamp[gc.type=="full"])
 y <- kb.to.mb(old.end[gc.type=="full"])
 plot(x,y,
   xlab=xtitle,ylab=ytitle,main=plot.title,type='o',col='blue')

 # Add fitted line
 abline(lm(y ~ x),col="red")
 retval <<- lm(y ~ x)
})
retval

With the trend analysis line, we can potentially project when the system will run out of memory. For the following plot, checking the linear model fit along with eyeballing the chart suggests that we don't really have a memory leak issue:

Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x  
   1285.978        0.298  

I am delighted by the power afforded by R to perform GC analysis. It has now become my preferred tool to perform GC analysis. In addition, because I learned how to use R, I'm beginning to use R in other capacity, including inventory , trending, and capacity planning analysis.

6 comments:

Unknown said...

Hi John,

This looks like a flexible and powerful approach for GC log analysis. Thanks for sharing it. Could you please also share the clojure script used for generating the csv file?

TIA

Cheers,
-Ajit

Ryan said...

The chewiebug GC Viewer has a very good GC log parser in it, and you can export whatever it parses (including log files for every current Oracle JVM collector - e.g. CMS, G1) and have it spit out CSV for you

Justin Kamerman said...

Great article. Nice to see data tools being applied to tech problems in a practical manner. I'll definitely make use of some of these snippets.

Melbourne Mobile Developer said...

I like your blog a lot. Its informative and full of information. Thank you for sharing.

surya said...

Usefull information, Interesting. GCeasy is a tool that analyzes logs related to garbage collection. It is an easy way to detect memory leak problems as it goes through and analyze garbage collection logs. Another reason to use GCeasy is that it is available online; there is no need for you to install it on your machine to use it.

Briana said...

Great readding your post