January 03, 2011
One of the prominent features of Clojure are a core set of immutable data structures with efficient manipulation operations. Two of the most innovative and important are the persistent vector and persistent hash map.
As a little project I set myself in order to get to know Haskell better, I have been porting these structures to Haskell. I think it's now at a state where the basics are there and usable, so I've put it up on my Github. The API provides Data.PVector (the persistent vector) and Data.PHashMap (the persistent hash map). The interface for both has been kept as consistent as possible with Data.Map.
These two data structures are:
Here's a demo of what you can do with a PVector:
ghci> :m + Data.PVector
ghci> empty -- the empty pvector
fromList []
ghci> append 1 it
fromList [1]
ghci> append 42 it
fromList [1,42]
ghci> append 13 it
fromList [1,42,13]
ghci> let a = it
ghci> a ! 0 -- indexes are from 0 to n-1
1
ghci> a ! 1
42
ghci> a ! 2
13
ghci> set 1 71 a -- a new PVector with the element replaced
fromList [1,71,13]
ghci> adjust succ 2 a -- apply a function to a single element
fromList [1,42,14]
ghci> Data.PVector.map succ a -- apply a function to all elements
fromList [2,43,14]
ghci> fromList [1..10] -- convert a list to a PVector
fromList [1,2,3,4,5,6,7,8,9,10]
ghci> elems it -- convert a PVector to a list
[1,2,3,4,5,6,7,8,9,10]
And here's a demo of the basic functionality of PHashMap:
ghci> :m + Data.PHashMap
ghci> empty Data.HashTable.hashString
-- an empty PHashMap (requires a key hash function)
fromList hashFn []
ghci> insert "foo" 1 it
fromList hashFn [("foo",1)]
ghci> insert "bar" 42 it
fromList hashFn [("foo",1),("bar",42)]
ghci> insert "qux" 123 it
fromList hashFn [("qux",12),("foo",1),("bar",42)]
ghci> insert "qux" 13 it -- inserting an existing key overwrites by
default
fromList hashFn [("qux",13),("foo",1),("bar",42)]
ghci> let a = it
ghci> a ! "foo"
1
ghci> a ! "baz" -- using (!) is unsafe
*** Exception: array index out of range: element not in the map
ghci> Data.PHashMap.lookup "bar" a
Just 42
ghci> Data.PHashMap.lookup "baz" a -- 'lookup' returns a safe Maybe
Nothing
ghci> adjust succ "foo" a -- apply a function to a value
fromList hashFn [("qux",13),("foo",2),("bar",42)]
ghci> Data.PHashMap.map succ a -- apply a function to all values
fromList hashFn [("qux",14),("foo",2),("bar",43)]
ghci> keys a
["qux","foo","bar"]
ghci> elems a
[13,1,42]
ghci> fromList Data.HashTable.hashString [("a", 1), ("b", 2), ("c", 3)]
fromList hashFn [("b",2),("c",3),("a",1)]
ghci> toList it
[("b",2),("c",3),("a",1)]
To try it yourself, just clone it from my Github repo, then configure, build and install it:
runhaskell Setup.hs configure --user
runhaskell Setup.hs build
runhaskell Setup.hs install
The single-element operations for each of these structures technically run in logarithmic time. However, they implemented as 32-ary trees which never exceed a depth of 7 nodes, so you can treat them as constant-time operations (for relatively large constants).
I wrote this code after reading the following explanatory blog posts on how the structures work in Clojure. I haven't veered too far from Clojure's implementation, so these posts should also provide a decent birds-eye overview of my Haskell implementation.
Things that I haven't gotten round to but would be good to have are:
Performance tuning
This is my first proper project in Haskell, so I'm sure there are many problems with it. If you have any suggestions, I'd love to hear your comments. And Github pull requests would be very welcome!