Tuesday, February 21, 2012

Coding for reuse versus coding for specific use

One day, a C# developer came to me asked me to troubleshoot their application, which involves some data transformation code. In the course of troubleshooting this particular application's runtime issue, I saw this particular code implementation:

private string[] SplitTextwithSpecialCharacter(string lineFromGUI, int splitLength) 

    List<string> lines = new List<string>();
    int start = 0;

    while (start < lineFromGUI.Length)
        string temp = string.Empty;
        //if length is longer it will throw exception so size the last segment appropriately
        string line = lineFromGUI.Substring(start, Math.Min(splitLength, lineFromGUI.Length - start));
        //temp = line.Replace(SpecialCharacter, string.Empty);

        temp = line.Replace("^", ""); //New 6/30
        if (temp.Length > 0) 
            //lines.Add(line.Replace(SpecialCharacter, string.Empty));
        start += splitLength;
    return lines.ToArray();

When I saw this code, I inwardly cringed due to the very narrow scope in the code implementation to solve a seemingly generic problem. This code screams for refactoring to me. From the commented out section of the code, I could see that one of the developers tried to implement some reuse before reverting back to the specific use.

I see a couple of issues with this particular implementation. The first issue being that this method is trying to do two very different things; one is trying to split the text into lines and the second is removing all the caret characters from the text. In my opinion, these 2 different operations should be separated out into 2 different methods. The second issue is that the caret character removal function should be generalized such that you should be able to remove any character or perhaps even a set of characters beyond just the caret symbol. In the specific implementation, what would happen if they needed code to remove an asterisk instead of caret? What would happen if they need to remove more than one special character? Do they add additional methods? Do they modify the existing method? If they modify the existing method to take parameters, then they would have to change all the existing client code that calls this method. So it would seem to be easier for future maintenance to go ahead and make these 2 functions reusable.

Here's how I would refactor this particular method into some utilities namespace and use extension methods to add capability to the string object. Here's how that code would look like:

public static class Extensions
    public static string FilterOut(this string line, ISet<char> filter)

        if (line == null ||  filter == null || filter.Count == 0) return line;

        return new String(line.Where(c => !filter.Contains(c)).ToArray());

    public static IList<string> SplitIntoLines(this string data, int splitLength) 

        // Check preconditions
        if (data == null) throw new NullReferenceException();

        if (splitLength <= 0) throw new IndexOutOfRangeException("Must be greater than 0!");

        IList<string> lines = new List<string>();

        while (data.Length > splitLength)
            string line = data.Substring(0,splitLength);
            data = data.Substring(splitLength, data.Length-splitLength);
        return lines;

With this extension method, you can now use this utility method as follows:

int linesize = 30;
ISet<char> filter = new HashSet<char> { '^', '*', '&', '$' };
string[] results = data.SplitIntoLines(linesize)
                       .Select(line => line.FilterOut(filter))

Saturday, February 04, 2012

Detect multiple occurrences of Java classes in jar files

A developer came to me for help and said that a recent upgrade to a newer Java library broke the code. The developer was getting was a class not found exception. The developer said that the old jar file was replaced with a newer jar file. I know this same library upgrade in other applications did not have any problems, so this sounded like an issue with duplicate occurrence of the same class in the classpath. Since my Ruby Programming Language book was in handy access when this developer came to me, I ended up writing a small ruby script that went through all the jar files packed in the web archive and dumped an output of the fully qualified class along with the jar file that it can be found in and a count of the number of occurrences. Here's the ruby script running on a Windows platform:


classmaps = Hash.new(0)

Dir["*.jar"].each do |file|
  cmd = sprintf("%s\\jar  -tf %s",java_home,file)
  lines = `#{cmd}`
  lines.each do |line|
    if line =~ /.class/
      key = line.chomp
      if classmaps.key?(key) then
        old = classmaps[key]
        data = sprintf("%s,%s",old,file)
        classmaps[key] = data 
        classmaps[key] = file

classmaps.each do |k,v|
    tokens = v.split(",")

I then loaded the output file in csv format into Excel and sorted the occurrences in descending order and was flabbergasted to find numerous entries that look something like the following entry:

Class NameOccurrenceFound in Jar files:

Somehow, for this application team, they were able to add the same class to the same jar file multiple times. I never thought that it was possible to add the same class to the same jar file multiple times nor would I ever want to do that. When I finally confronted the application team about this, they recognize that their build process is broken and need to fix their build process. But as a quick test, the developer removed some of the duplicate classes related to the library upgrade and the problems went away. Until Java 8 is introduced with Modularity capabilities, this tool has will be a handy way for me to check duplicate classes given a list of jar files.