After reading about Opendedup on Slashdot this weekend, I decided to try it out to see how well it all really worked. My test server was an install of Ubuntu 9.10 x64. If you happen to be using that stack, the installation isn't too difficult:
Download the required files (I'm linking to the most recent version of each; check for newer versions as necessary):
chmod +x jdk-7-ea-bin-b87-linux-x64-25_mar_2010.bin
./jdk-7-ea-bin-b87-linux-x64-25_mar_2010.bin
(follow the installer's instructions; afterwards, be sure to set the JAVA_HOME variable)
tar zxf debian-fuse.tar.gz
cd debian-fuse
dpkg --install *.deb
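For the JAVA_HOME step mentioned above, a minimal sketch might look like the following. The install path here is purely an assumption for illustration; use whatever directory the JDK installer actually created on your machine.

```shell
# Hypothetical install location; replace with the directory the JDK installer created.
export JAVA_HOME=/usr/local/jdk1.7.0
export PATH="$JAVA_HOME/bin:$PATH"

# Warn if the path does not actually contain a java binary.
if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "warning: no java binary found under $JAVA_HOME" >&2
fi
```

To make the setting survive a new login, the two export lines can go in ~/.bashrc.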
Next, extract the SDFS package:
tar zxf sdfs-latest.tar.gz
Now, we make our filesystem and mount it:
./mkfs.sdfs --volume-name=deduped --volume-capacity=5000MB
./mount.sdfs -m /srv -v deduped
Assuming all goes well, you should now have a deduplicated filesystem mounted at /srv.
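One way to confirm the mount took is to look for the mount point in the kernel's mount table. This is a generic sketch rather than anything SDFS-specific: is_mounted is a helper name I made up, and I'm assuming the FUSE mount shows up in /proc/mounts like any other.

```shell
# Report whether a directory is a mount point, using the kernel's mount table.
is_mounted() {
    # /proc/mounts lists one mount per line: device, mount point, type, options...
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' /proc/mounts
}

if is_mounted /srv; then
    # Show size and usage for the filesystem backing /srv.
    df -h /srv
else
    echo "/srv is not a mount point" >&2
fi
```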
Great results from testing in the small
As a test, I copied over a sample song from my music collection (what nerd doesn't enjoy a little Weird Al?). Sitting in /root, the file was 2.9MB. Once I copied it to my deduped /srv directory, it took just 46K on disk! Not too shabby. Just as a sanity check, I copied the file back off the deduped filesystem and the file size grew back to normal.
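If you want to repeat this kind of check, something along these lines should do it. It's a rough sketch: show_sizes and same_content are helper names I made up, the paths in the usage comments are placeholders, and I haven't confirmed how SDFS reports disk usage to du, so treat the numbers as indicative only.

```shell
# Print a file's apparent size and the space it actually occupies on disk.
show_sizes() {
    echo "apparent: $(du -h --apparent-size "$1" | cut -f1)"
    echo "on disk:  $(du -h "$1" | cut -f1)"
}

# Succeed only if two files are byte-for-byte identical.
same_content() {
    cmp -s "$1" "$2"
}

# Example usage (placeholder paths):
#   show_sizes /srv/song.mp3
#   same_content /root/song.mp3 /srv/song.mp3 && echo "round trip OK"
```

Comparing contents with cmp is a stronger sanity check than comparing sizes, since it catches corruption that leaves the length unchanged.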
Things not all rosy in Opendedup-land
I decided to throw a little more data at it and copied over the Documents directory from my desktop. The folder was slightly over 600MB of docs, text files, images, and a few other file types. During the copy, Opendedup took a significant amount of memory (usage hung around the 90% mark). My test machine was a small virtual machine (1 CPU, 2GB of RAM), and the file transfer slowed it down significantly. Eventually, I got curious as to how much had been transferred. I cd'd to the test dir and ran an 'ls', which never completed, and I could no longer open a new shell via SSH to the VM either. I'm sure this would go much better if I could throw a little more RAM and CPU at it (since I'm running the minimum), but I don't have the time or the resources to try at the moment.
Overall, the technology seems really promising and is pretty straightforward to use. If my compression rates hold true, this could dramatically cut down on the amount of disk space needed to store my backups and virtual machine templates. Judging by the performance I've seen thus far, though, I wouldn't want to run this in production just yet.