Know Your Storage Options - Benchmarking Tokyo Cabinet
Tokyo Cabinet is what I’ve been focusing on lately. It is a fast, persistent key-value store that offers a choice of database engines: hash, fixed-length array, B+ tree with a user-specified ordering function, and table, which is built on top of hash. Storage is either in-memory or file-based. The table engine is particularly interesting because it acts like a relational database without a schema: you are free to insert arbitrary data, and it still supports indexing and queries. TC is written in C with APIs for Perl, Ruby, Java, and Lua. The author is Mikio Hirabayashi, and Tokyo Cabinet’s largest application is mixi.jp, Japan’s largest social network.
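To make the "relational database without a schema" idea concrete, here is a toy in-memory model of such a table in plain Ruby. It is only a sketch of the data model: the ToyTable class and its where method are hypothetical names invented for this illustration, not part of Tokyo Cabinet or rufus-tokyo, and it does no indexing or persistence.

```ruby
# Toy model of a schema-less table: primary key => hash of columns.
# ToyTable and #where are hypothetical, NOT the Tokyo Cabinet API.
class ToyTable
  def initialize
    @rows = {} # primary key => { column name => value }
  end

  # Store any set of columns under a primary key.
  # Column names and values are normalized to strings.
  def []=(pk, columns)
    @rows[pk] = columns.map { |k, v| [k.to_s, v.to_s] }.to_h
  end

  # Fetch the row stored under a primary key (nil if absent).
  def [](pk)
    @rows[pk]
  end

  # Return the primary keys of rows whose columns equal every given value.
  # This is a full scan; a real engine would consult an index instead.
  def where(conditions)
    wanted = conditions.map { |k, v| [k.to_s, v.to_s] }
    @rows.select { |_pk, row| wanted.all? { |k, v| row[k] == v } }.keys
  end
end

table = ToyTable.new
table['u1'] = { name: 'alice', city: 'Tokyo' }
table['u2'] = { name: 'bob', city: 'Tokyo', age: 31 } # extra column, no schema change
table['u3'] = { name: 'carol' }                       # missing column is fine too

p table.where(city: 'Tokyo')
```

Rows with different column sets live side by side, and a query simply skips rows that lack the column it asks about.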
There is also Tokyo Tyrant, the accompanying server. Among other things, it offers thread-safe access and supports the memcached protocol and asynchronous replication. The spec describes, for example, how to run two instances in a master-slave setup.
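Memcached protocol support means that, in principle, any memcached client can talk to the server. As a rough illustration, the sketch below builds the two most basic memcached text-protocol requests by hand. The helper names memcached_set and memcached_get are made up for this example, and the flags and expiry values of 0 are just placeholders.

```ruby
# Build memcached text-protocol requests as plain strings.
# memcached_set / memcached_get are hypothetical helper names for this sketch.

# "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
def memcached_set(key, value, flags = 0, exptime = 0)
  data = value.to_s
  "set #{key} #{flags} #{exptime} #{data.bytesize}\r\n#{data}\r\n"
end

# "get <key>\r\n"
def memcached_get(key)
  "get #{key}\r\n"
end

# Show the raw bytes that would go over the wire.
puts memcached_set('greeting', 'hello').inspect
puts memcached_get('greeting').inspect
```

Writing these strings to a TCP socket connected to a running server should store and then fetch the value; in practice you would use an existing memcached client library rather than formatting requests yourself.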
None of the above would be worth much if it weren’t for the speed: the specification claims that Tokyo Cabinet can store one million records in 0.7 seconds in a hash database. I wanted to check that myself, so I wrote a script that compares the table engine against ActiveRecord with a MySQL backend, and put it on GitHub. I had already seen benchmarks comparing table to hash et al.; in this case I wanted to measure near equivalents.
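The general shape of such a measurement can be sketched with Ruby's standard Benchmark module. This is not the script from the repository: time_inserts and records_per_second are hypothetical helpers, and a plain Hash stands in for the real store so the example runs anywhere. Anything that responds to []= could be passed in instead.

```ruby
require 'benchmark'

# Records per second for a run that stored `count` records in `seconds`.
def records_per_second(count, seconds)
  count / seconds.to_f
end

# Time `count` inserts into any store that responds to []=.
# Returns the elapsed wall-clock time in seconds.
def time_inserts(store, count)
  Benchmark.realtime do
    count.times { |i| store["key#{i}"] = "value#{i}" }
  end
end

store = {} # stand-in for a real database handle (assumption for this sketch)
elapsed = time_inserts(store, 100_000)

puts format('%d inserts in %.3f s (%.0f records/s)',
            store.size, elapsed, records_per_second(store.size, elapsed))

# The claimed figure, one million records in 0.7 seconds, as a rate:
puts format('claimed: %.0f records/s', records_per_second(1_000_000, 0.7))
```

Expressed as a rate, the claim works out to roughly 1.4 million records per second, which gives a convenient number to compare a measured run against.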
For Ruby, there is a choice between the official bindings, which compile to native code, and the rufus-tokyo gem, which uses ruby-ffi to dynamically provide a somewhat more Ruby-ish API that is portable across all Ruby implementations. John, author of rufus-tokyo, ran some benchmarks and wrote a more detailed explanation on his blog. I used rufus-tokyo’s wrapper around the native bindings for file access, and its API for Tyrant.
The full results are available as a gist. Highlights: the Tokyo Cabinet table engine can be about 300x faster than MySQL on inserts and 20x faster at retrieving all records, and is roughly equivalent when looking up by foreign key.
In general, the areas where I (today) see Tokyo Cabinet as definitely worth using are persistent caches and frequently inserted data that can grow quickly in size. The presentation available on the project homepage lists a few more interesting applications, such as storing images and a timestamp database.
Other notable resources: