The [paper](http://static.usenix.org/event/imc05/tech/full_papers/lee/lee_html/paper.html) describes an algorithm that uses less RAM and is more accurate. There's an [example implementation](https://gist.github.com/3229420) put together by some nice chaps on IRC.