Recently at work, I found the need to remove duplicate lines from a file and sort the result. I've must have rewritten this hundreds of times and in a variety of programming languages: java,c#,perl,ruby,vb.
I know that you can do this easily if you have unix shell and the unix tools. Then this can be done with a simplesort dups.txt | uniqHowever, I want to code and my latest fancy is with functional programming language. So my latest iteration of the code is done in Haskell. Haskell took me a while to grok. It really warped my way of thinking about how to program. Reminds me of the first time that I had to develop proofs in math in high school. I'm slowly getting the hang of programming simple stuffs in Haskell.
Code written in Haskell
import List main = readFile "dups.txt" >>= (\fileContents -> putStr (unlines (sort (nub (lines fileContents)))))lines
- breaks the file content into lines
- nub
- removes the duplicate
- unlines
- combines the lines into one giant string
Here's how I would have implemented it in other languages:
Code written in Perl
open(FIN,"< dups.txt");
@lines=<FIN>;
close(FIN);
chomp (@lines);
my %uniques;
foreach $line (@lines)
{
$uniques{$line}++;
}
foreach $key (sort keys %uniques) {
print $key,"\n";
}
Code written in Ruby
File.open("dups.txt") { |file|
lines = file.readlines
lines.uniq!
lines.sort.each { |line|
puts line
}
}
Code written in F#
Okay, for self enlightment, I tried to implement this in F#,which is mostly OCaml with interop to the .Net Framework. Maybe it's me but OCaml's mixed imperative and functional style confuses me or maybe I just don't have a really good grasp of OCaml syntax. In any case, it took me a while to figure out how to do this in OCaml.
#light
open System.IO;;
let main() =
let lines = File.ReadAllLines("dups.txt")
let s = List.fold_right Set.add (Array.to_list lines) Set.empty
Set.iter print_endline s
do main()
Modified on 11/23/2007...after working in F# for a while, I can rewrite the above F# code as
#light
open System.IO;;
File.ReadAllLines("dups.txt")
|> Set.of_array
|> Set.iter print_endline
As a developer, I love the code density of Haskell. It's also readable so you are not sacrificing code readability for code density.