2012-12-19

Networks as a bottleneck

Something I've been watching with interest recently is the failure of cross-continent network speeds to keep pace with the rapid growth in storage. And why, you might ask, does this keep anyone up at night? Suppose you have been storing lots of videos of dancing cats with hosting firm A. If firm A starts to exhibit financial wobbles, or the country in which firm A is based decides to pass new laws about access to data, and you decide to store your videos with hosting firm B instead, you are going to find that network transmission is a very real bottleneck.

Some baseline numbers first of all:

  • A regular 2-layer Blu-ray disc holds about 50GB (gigabytes: 10^9 bytes) of data, enough for just over 3 hours of 1080p (HD) video, i.e. about 15GB/hour.
  • Leasing 10Gbps (gigabits per second; 1 byte == 8 bits) of bandwidth between New York and London will cost you around $10,000 per month.

Let's suppose that we have about 100,000 hours of video, which is 1,500,000 GB, i.e. 1.5 PB ("petabytes": 10^15 bytes). A 10Gbps link carries at most 10 / 8 = 1.25 GB per second. It would take (1,500,000 / 1.25) seconds (about 333 hours, or a little under 14 days) to transmit that raw data, assuming perfect transmission; add a reasonable 20% overhead, and you're looking at around 17 days non-stop to send your data over the Atlantic.
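For anyone who wants to redo this arithmetic with their own numbers, here is a minimal Python sketch. The function name and the default 20% overhead are illustrative assumptions; the link rate used is the nominal 10 gigabits / 8 = 1.25 GB per second.

```python
def transfer_days(data_gb, link_gbps, overhead=0.20):
    """Days needed to push data_gb gigabytes over a link of link_gbps gigabits/s."""
    gb_per_second = link_gbps / 8        # 8 bits per byte
    seconds = data_gb / gb_per_second    # perfect transmission
    seconds *= 1 + overhead              # allowance for protocol overhead and retries
    return seconds / 86_400              # 86,400 seconds in a day


hours_of_video = 100_000
data_gb = hours_of_video * 15            # 15GB per hour of 1080p video

print(f"raw: {transfer_days(data_gb, 10, overhead=0):.1f} days")
print(f"with 20% overhead: {transfer_days(data_gb, 10):.1f} days")
```

Changing the overhead or the link speed shows quickly how sensitive the answer is: doubling the link to 20Gbps halves the time, but you are still counting in days rather than hours.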

Ironically, cost isn't the main issue: leasing the line for that time (assuming you can lease the exact period at the pro-rata cost) will cost you somewhere between $5,000 and $6,000, and compared to the $90,000 that the 750 x 2TB disks needed to hold your data will cost, that's just noise. But you are going to have tremendous problems shifting your data significant distances.
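The cost comparison is easy to sketch as well. The helper names below are made up for illustration, and the 30-day billing month is an assumption; the $10,000 monthly rate, the 2TB disk size and the $90,000 disk bill are the figures quoted above.

```python
def prorata_lease_cost(days, monthly_rate=10_000, days_per_month=30):
    """Cost of leasing the link for `days`, assuming strict pro-rata billing."""
    return monthly_rate * days / days_per_month


def disk_count(data_tb, disk_tb=2):
    """Number of disks needed to hold data_tb terabytes, rounded up."""
    return -(-data_tb // disk_tb)        # ceiling division on integers


disks = disk_count(1500)                 # 1.5 PB == 1500 TB
disk_bill = 90_000                       # total quoted in the text
print(f"{disks} disks at ${disk_bill / disks:.0f} each")

for days in (15, 17, 18):
    lease = prorata_lease_cost(days)
    print(f"{days} days of link: ${lease:,.0f} ({lease / disk_bill:.1%} of the disk bill)")
```

Whichever transfer estimate you plug in, the lease comes out at a few percent of what the disks cost.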

That's OK, though; as Andrew Tanenbaum famously said:

"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway."

You can easily get 1.5TB per backup tape, so 1000 tapes (a cube of 10x10x10 tapes, about 50cm x 100cm x 220cm, fairly easily managed in one person's checked luggage on a reputable airline) should be fine for this data. For redundancy we take two copies of the data and send two gofers on separate LDN-to-NYC flights at, say, $600/person. Within 10 hours we can get the data across the Atlantic with a tiny data loss: if, say, 5% of the tapes in each set fail independently, only 2-3 tapes' worth of data will be lost from both sets, and we can send that data via the network for a tiny cost in time (about 20 minutes per tape). Problem solved?
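The "2-3 tapes" and "20 minutes" figures can be checked with a short sketch. It assumes that tape failures in the two sets are independent (so a given tape is lost from both with probability 0.05 x 0.05) and that the 10Gbps link runs at its nominal 1.25 GB per second; the function names are illustrative.

```python
def tapes_lost_in_both(n_tapes, failure_rate):
    """Expected number of tapes unreadable in BOTH copies, assuming independence."""
    return n_tapes * failure_rate ** 2


def resend_minutes(tape_tb, link_gbps):
    """Minutes to resend one tape's worth of data over the network link."""
    gb_per_second = link_gbps / 8        # 8 bits per byte
    return tape_tb * 1000 / gb_per_second / 60


lost = tapes_lost_in_both(1000, 0.05)
print(f"expected tapes lost from both sets: {lost:.1f}")
print(f"resend time per tape: {resend_minutes(1.5, 10):.0f} minutes")
print(f"total resend time: {lost * resend_minutes(1.5, 10):.0f} minutes")
```

If failures are correlated (both suitcases meet the same baggage handler, say), the independence assumption breaks down and the loss could be considerably worse.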

Well, you've still got to get the data onto the tapes. Tape writing speeds are about 900GB/hour at the top end, so with one tape drive it will take 1666+ hours to write all the data to one set of tapes, or 3333 hours for both sets. With 100 tape drives, you could get this down to 33 hours of writing, but don't forget the reading at the other end, which means you need another 100 tape drives. At $1000+ per tape drive for a high-speed device you're looking at $200K just for your reading/writing devices, more than double the $90,000 cost of the disks themselves.
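To see how the writing time scales with the number of drives, here is a rough sketch. It assumes the work divides perfectly across drives with no time lost to swapping tapes; the function name and parameters are illustrative, and the 900GB/hour and $1000 figures are the ones used above.

```python
def tape_hours(data_gb, copies, drives, gb_per_hour=900):
    """Hours to write `copies` full sets of the data using `drives` drives in parallel."""
    return data_gb * copies / (gb_per_hour * drives)


data_gb = 1_500_000                      # 1.5 PB

print(f"one set, one drive:   {tape_hours(data_gb, 1, 1):.0f} hours")
print(f"two sets, one drive:  {tape_hours(data_gb, 2, 1):.0f} hours")
print(f"two sets, 100 drives: {tape_hours(data_gb, 2, 100):.0f} hours")

drives_each_end = 100
drive_price = 1000
print(f"drive bill for both ends: ${2 * drives_each_end * drive_price:,}")
```

The trade-off is linear: halve the number of drives and you halve the hardware bill but double the hours.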

So if you have money to burn, you can get 98%+ of the data across the Atlantic in 33 + 10 + 33 = 76 hours, a little over 3 days and less than a quarter of the time required for direct network transfer. However it's going to cost you a lot. A cheaper option would be ripping the hard drives out of firm A and duplicating them onto two other sets of drives, using very cheap hardware with multiple SATA interfaces and minimum-wage goons to swap drives in and out. SATA 2.0 transfer speeds of 300MB/s (about 1TB/hour) are much better than tape; the limiting factor is the hard drive write speed, which is still about 100MB/s (about 360GB/hour), and the marginal cost of an extra motherboard for the transfer is of the order of $200 instead of $1000 for a tape drive, so you still end up about 40% faster this way for the same cost as tape. You can then send your two gofers with the hard drives instead of tapes, although they're going to be a lot heavier (around 300kg of luggage per person), so they'll need to pay quite a bit extra; I'd estimate an additional $400 per 100kg (essentially, one person plus luggage). As a bonus, I'd expect the disk failure rate to be lower.
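The end-to-end time for the tape route can be re-derived by adding the three stages and converting to days. This is only a sketch: the function name is illustrative, and the network figure it compares against is recomputed from the nominal 1.25 GB per second link rate plus the 20% overhead allowance.

```python
def courier_days(write_hours, flight_hours, read_hours):
    """Total elapsed days for the write, fly, read pipeline (stages run back to back)."""
    return (write_hours + flight_hours + read_hours) / 24


tape_days = courier_days(33, 10, 33)     # figures from the text, in hours

# Network estimate: 1.5 PB over 10Gbps (1.25 GB/s) with 20% overhead.
network_days = 1_500_000 / (10 / 8) * 1.2 / 86_400

print(f"tape route:    {tape_days:.1f} days")
print(f"network route: {network_days:.1f} days")
print(f"tape takes {tape_days / network_days:.0%} of the network time")
```

The flight itself is a small slice of the total; almost all of the courier time goes on writing and reading the media.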

High-speed Internet is definitely here for the masses, but it hasn't yet really impacted long-distance network connections. Tanenbaum's maxim can be considered safely debunked; tape read/write speeds have not kept pace with tape capacities, and even cheap disks need expensive interfaces.
