Recently at work, I needed to remove duplicate lines from a file and sort the result. I must have written this hundreds of times, in a variety of programming languages: Java, C#, Perl, Ruby, VB.
I know you can do this easily if you have a Unix shell and the standard Unix tools. Then it is a simple

sort dups.txt | uniq

(uniq only drops adjacent duplicates, which is why the input is sorted first.) However, I want to code, and my latest fancy is functional programming languages. So my latest iteration of the code is in Haskell. Haskell took me a while to grok. It really warped the way I think about programming; it reminds me of the first time I had to write proofs in high school math. I'm slowly getting the hang of programming simple stuff in Haskell.
Code written in Haskell
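import Data.List (nub, sort)

-- read dups.txt, split it into lines, drop duplicates, sort, and print one per line
main = readFile "dups.txt" >>= (\fileContents -> putStr (unlines (sort (nub (lines fileContents)))))

- lines: breaks the file contents into a list of lines
- nub: removes the duplicates
- sort: sorts the remaining lines
- unlines: combines the lines into one giant string, ending each line with a newline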
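To make the list concrete, here is the same pipeline applied step by step to a small in-memory string rather than to dups.txt. The sample text is made up, and printing the intermediate lists is only there to show what each function contributes:

```haskell
import Data.List (nub, sort)

-- Hypothetical sample input standing in for the contents of dups.txt.
sample :: String
sample = "pear\napple\npear\nfig\napple\n"

main :: IO ()
main = do
  let ls     = lines sample  -- split on '\n'; the trailing newline adds no empty entry
      uniq   = nub ls        -- keep the first occurrence of each line, in original order
      sorted = sort uniq     -- order the surviving lines
  print ls                   -- ["pear","apple","pear","fig","apple"]
  print uniq                 -- ["pear","apple","fig"]
  print sorted               -- ["apple","fig","pear"]
  putStr (unlines sorted)    -- prints apple, fig, pear, one per line
```

Note that nub keeps the first occurrence of each line in its original position, so the order only becomes alphabetical after sort.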
Here's how I would have implemented it in other languages:
Code written in Perl
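use strict;
use warnings;

open(my $fin, "<", "dups.txt") or die "Cannot open dups.txt: $!";
my @lines = <$fin>;
close($fin);
chomp(@lines);

# hash keys are unique, so counting each line collapses the duplicates
my %uniques;
foreach my $line (@lines) {
    $uniques{$line}++;
}
foreach my $key (sort keys %uniques) {
    print $key, "\n";
}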
Code written in Ruby
File.open("dups.txt") do |file|
  # chomp so a last line without a trailing newline still matches its duplicates
  lines = file.readlines.map(&:chomp)
  lines.uniq!
  lines.sort.each { |line| puts line }
end
Code written in F#
Okay, for self-enlightenment, I tried to implement this in F#, which is essentially OCaml with interop to the .NET Framework. Maybe it's me, but OCaml's mix of imperative and functional styles confuses me, or maybe I just don't have a good grasp of OCaml syntax yet. In any case, it took me a while to figure out how to do this in F#.
open System.IO

let main () =
    let lines = File.ReadAllLines("dups.txt")
    // folding every line into an immutable Set drops duplicates; a Set keeps its elements ordered
    let s = List.foldBack Set.add (Array.toList lines) Set.empty
    Set.iter (printfn "%s") s

main ()
Modified on 11/23/2007: after working in F# for a while, I can rewrite the above F# code as:
open System.IO

// Set.ofArray drops duplicates and keeps the lines in sorted order
File.ReadAllLines("dups.txt")
|> Set.ofArray
|> Set.iter (printfn "%s")
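The F# versions lean on a set to deduplicate and sort in one step. The Haskell version uses nub, which compares each line against every line already kept and so takes quadratic time. Building a set takes O(n log n). Below is a sketch of the same set-based idea in Haskell. It assumes Data.Set from the containers package, which ships with GHC. uniqueSorted is a hypothetical helper name, and the sample string stands in for dups.txt so the sketch runs on its own:

```haskell
import qualified Data.Set as Set

-- Hypothetical helper: building a Set drops duplicate lines, and
-- Set.toAscList hands them back in ascending (sorted) order.
uniqueSorted :: String -> String
uniqueSorted = unlines . Set.toAscList . Set.fromList . lines

-- Hypothetical sample input; with a real file this would be
--   main = readFile "dups.txt" >>= putStr . uniqueSorted
sample :: String
sample = "pear\napple\npear\nfig\napple\n"

main :: IO ()
main = putStr (uniqueSorted sample)  -- prints apple, fig, pear, one per line
```

For small files the difference does not matter, but with many lines the set-based version avoids nub's pairwise comparisons.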
As a developer, I love the code density of Haskell. It's also readable, so you don't sacrifice readability for density.