Fun with torrents and Amazon S3
I've been watching Amazon's S3 service since it was first announced (the land grab is over, BTW: S3 announced a revision that lets you host content under your own domain name). I'm generally a fan of pay-as-you-go services, especially when they're built on good technology that follows best practices. S3 ticks all of these boxes. And it's about as cheap as reliable bandwidth, storage, and scale come these days, too.
But since I don't have any Web 2.0 startup ideas, nor any large files to distribute, I haven't gotten to play around with S3 much. That all changed this morning, when a new Nerd Vittles tutorial went up (check it out here). This was perfect: there was no torrent download, just a plea for help (in the form of a "downloads only available while the $ for bandwidth remain" notice). Since the post covers VoIP with open source tools and virtual machines, and this was a chance to tinker with S3, most who know me can guess I was salivating at the thought of putting all three together.
And that's what I did. I downloaded the original file and uploaded it to a bucket I have on S3. Then I copied the generated S3 .torrent to my own server and told the Nerd Vittles folks where to find it. Meanwhile, I had also copied the original file up to my hosted server and started a seed there as well (which saved me from paying for an extra download from S3).
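The ?torrent trick itself is simple. For a public-read object, S3 returns a generated .torrent when you request the object's URL with ?torrent appended. Below is a minimal Python sketch that builds that URL and saves one copy of the .torrent so you can host it on your own server. It assumes the path-style endpoint http://s3.amazonaws.com/<bucket>/<key>, and both that endpoint and ?torrent availability may differ on today's S3. The bucket, key, and function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: path-style S3 endpoint; other hostnames/regions may be required today.
S3_ENDPOINT = "http://s3.amazonaws.com"


def object_url(bucket: str, key: str) -> str:
    """Plain HTTP URL of an S3 object (a full-price download if it is public-read)."""
    return f"{S3_ENDPOINT}/{bucket}/{quote(key)}"


def torrent_url(bucket: str, key: str) -> str:
    """The same object, asking S3 for a generated .torrent via the ?torrent subresource."""
    return object_url(bucket, key) + "?torrent"


def save_torrent(bucket: str, key: str, dest_path: str) -> None:
    """Fetch the generated .torrent once so it can be served from your own host."""
    with urlopen(torrent_url(bucket, key)) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only (no network access here).
    print(torrent_url("my-bucket", "vm-images/asterisk vm.zip"))
```

The two URL helpers differ only in the ?torrent suffix. This is why publishing the S3 torrent URL also publishes a plain HTTP download link to the same object.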
So now I had Amazon's super-reliable service providing both a seed and a tracker, and a BitTorrent flash mob was forming. I could augment S3's seed with one of my own servers and whatever other spare bandwidth I had around, but I could also sit back knowing that S3 would keep things alive no matter what else I did.
S3 actually makes a pretty reliable backing store: it appears to provide about 75-100 Kb/s to each peer that shows up. This can add up quickly, at least if there are enough people in the mob to keep things moving and you have other (relatively) free bandwidth to contribute. So, not content just to watch people transfer the torrent while my S3 bill slowly ticked up (the entire project has cost a little over $1 so far today), I had to experiment with one more variable. Objects in S3 have an ACL associated with them. According to the docs, you have to make an object "public-read" for it to become available as a torrent source. That is true, if only because you can't get the initial .torrent created otherwise.
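To get a feel for how quickly S3's per-peer rate adds up on the bill, here is a back-of-envelope sketch in Python. Every input is a parameter you supply. The example reads the post's "75-100 Kb/s" as roughly 100,000 bytes/s per peer, which is an assumption. The peer count, file size, and per-GB price are made up. The model is deliberately crude: every peer pulls from S3 at a steady rate, and no peer can receive more than one full copy.

```python
def s3_seed_estimate(peers: int, per_peer_bytes_per_s: float, hours: float,
                     file_size_bytes: int, price_per_gb: float) -> tuple[float, float]:
    """Return (gigabytes served by the S3 seeder, transfer cost).

    Assumptions: a steady per-peer rate for the whole window, decimal GB
    (1e9 bytes; the provider's billing definition may differ), and a
    caller-supplied price rather than any real price list.
    """
    seconds = hours * 3600
    # Steady-state guess: every peer pulls from S3 at the same rate the whole time.
    streamed = peers * per_peer_bytes_per_s * seconds
    # The seeder can never send more than one full copy of the file per peer.
    served = min(streamed, peers * file_size_bytes)
    gb = served / 1e9
    return gb, gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical swarm: 20 peers, ~100 KB/s each, one hour, 500 MB file.
    gb, cost = s3_seed_estimate(20, 100_000, 1, 500_000_000, price_per_gb=0.20)
    print(f"{gb:.1f} GB served, about ${cost:.2f}")
```

With these made-up numbers S3 would serve 7.2 GB in an hour. Other seeds in the swarm reduce that figure, and pulling S3 out of the swarm removes it entirely.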
However, it occurred to me that S3's tracker might keep running even if you change the ACL so that the S3 seeder has to drop out of the swarm. So I removed "public-read" from the ACL on the original S3 object and waited. To check that things were still swarming along nicely, I even started a download on another machine. Happily, S3 is still playing the tracker role, but the S3 seeder has stopped racking up bandwidth. Since I'm still running another seed on my own machines, I can be sure the swarm will stay healthy, while paying Amazon only for the super-reliable tracker infrastructure and the (modest) cost of storing an inert copy of the file.
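Dropping the S3 seeder means switching the object's canned ACL from public-read back to private. In the S3 REST API, that is a PUT to the object's ?acl subresource with an x-amz-acl header. The minimal Python sketch below only builds that request and assumes path-style addressing. Request signing and sending are deliberately left out. The S3Request type and set_canned_acl function are hypothetical helpers, not part of any AWS library.

```python
from dataclasses import dataclass, field
from urllib.parse import quote

# Only the two canned ACLs discussed in this post.
ALLOWED_ACLS = ("public-read", "private")


@dataclass
class S3Request:
    """Hypothetical, unsigned description of an S3 REST request."""
    method: str
    path: str
    headers: dict = field(default_factory=dict)


def set_canned_acl(bucket: str, key: str, acl: str) -> S3Request:
    """Build (but do not sign or send) a PUT ?acl request for one object."""
    if acl not in ALLOWED_ACLS:
        raise ValueError(f"unsupported canned ACL: {acl!r}")
    # Path-style addressing is an assumption; the ?acl subresource targets the ACL.
    return S3Request("PUT", f"/{bucket}/{quote(key)}?acl", {"x-amz-acl": acl})


if __name__ == "__main__":
    # Knock the S3 seeder out: make the object private (hypothetical bucket/key).
    req = set_canned_acl("my-bucket", "vm-images/asterisk-vm.zip", "private")
    print(req.method, req.path, req.headers)
```

With the AWS SDK for Python (boto3), the equivalent is a single `put_object_acl` call with `ACL="private"`. The code only changes permissions. Whether S3's tracker keeps serving the swarm afterwards is the behavior observed in this experiment, not something the code guarantees.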
Now I just gotta point a directional antenna at the newly launched Google WiFi, which I can't currently get indoors, and bump up the seeding a bit more. That is, assuming Google wasn't savvy enough to limit BitTorrent bandwidth on its WiFi. More on that if I get a chance to test it.
Some conclusions:
Host the .torrent yourself rather than using S3's url?torrent REST trick. That way you can keep distributing the .torrent file even after you knock the S3 seeder out of the swarm.
Hosting the .torrent also makes it difficult (impossible?) for leechers to find the original URL on S3. If you link directly to the torrent URL on S3 instead, savvy users can simply drop the ?torrent suffix, download the file over plain HTTP, and make you foot the bill for the entire download (S3's HTTP is very fast, so this would be a tempting trick, depending on the state of the BT swarm).
If you have other seeds, or trust that your swarm is established enough to keep running on its own, you can remove the "public-read" ACL and still have S3 host the tracker on its reliable infrastructure. There are several open requests with Amazon for tools to manage BitTorrent usage. So far, this is the only technique I know of that helps, and it's not exactly automatic.
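If you host the .torrent yourself, it is worth knowing what is inside it. A .torrent file is a bencoded dictionary (see the BitTorrent protocol specification, BEP 3), so a small decoder is enough to check its announce (tracker) URL and file info before you publish it. The Python below is a minimal sketch. The summarize_torrent helper and the sample bytes are hypothetical, and only the standard announce, info, name, and length keys are read.

```python
def bdecode(data: bytes):
    """Decode one complete bencoded value (int, bytes, list, or dict)."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next offset)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<values>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e, keys are byte strings
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            result[key], i = _decode(data, i)
        return result, i + 1
    if c.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        if start + length > len(data):
            raise ValueError("truncated byte string")
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode at offset {i}")


def summarize_torrent(raw: bytes) -> dict:
    """Pull out the fields worth eyeballing before publishing a .torrent."""
    meta = bdecode(raw)
    info = meta.get(b"info", {})
    return {
        "announce": meta.get(b"announce", b"").decode("utf-8", "replace"),
        "name": info.get(b"name", b"").decode("utf-8", "replace"),
        "length": info.get(b"length"),
        "top_level_keys": sorted(k.decode("ascii", "replace") for k in meta),
    }


if __name__ == "__main__":
    # Hand-made sample; a real file would be read with open(path, "rb").read().
    sample = b"d8:announce23:http://tracker/announce4:infod6:lengthi42e4:name5:a.isoee"
    print(summarize_torrent(sample))
```

Running it against the .torrent you copied off S3 shows which tracker it points to and which other top-level keys it carries, so you can decide whether it reveals anything you would rather keep to yourself.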
Technorati Tags: amazons3, googlewifi, voip