I'm not entirely sure where this post is going to go, so bear with me. I've been thinking about encryption over the past few days, and how it relates to "cloud" services (Amazon Web Services, Google Compute Engine, Microsoft Azure, etc.). If you can't trust your cloud storage provider to repel attempted security breaches, whether from government agencies or pirate enterprise, what can you do about it?
First, a bit of background. "The cloud" is commonly perceived as an amorphous blob of computing power, spread across cities / countries / continents as appropriate. The supposed killer feature of the cloud is that it removes the need for a company to rent space in several data centers, build out an appropriate security system, build out a load balancing system to handle one or more data centers being down, and employ a dedicated set of data center operations and systems staff. Instead it can just "rent" 1 PB of storage (one petabyte (PB) = 1024 terabytes (TB), and 1 TB is a standard hard disk size) and specify that it be split across at least four geographic locations, of which at least two are in Europe and at least two are in the USA. That way it can keep four copies of its key information, and (since data centers get periodically taken down for maintenance or by accidents) at least one copy of the information will almost certainly be available very close to a given user, and at least one copy is practically always available somewhere in the world. Job done, right?
Well, not quite. The first problem is that the company's users in (say) the UK and Germany may well be talking to a data center in France most of the time. That means that their key data is flowing across national borders, and is very vulnerable to being tapped by a random bad guy who may or may not represent a government. Since industrial espionage and national wiretapping are uncomfortably close, how can the company protect against this?
For high-security data, the problem can usually be solved with SSL (Secure Sockets Layer, these days standardised as TLS, Transport Layer Security) communication. The idea here is that the server (in the cloud) and the client (in the company site) negotiate to establish a shared secret key before starting to communicate data. Essential to the security of this approach is that the server can "sign" a request by the client, proving that it knows a secret ("private key") that only the server should know, without giving away to an observer what that private key actually is. So the server has to be provided with a suitable private key as part of the cloud set-up process, and the client needs to be told what a valid server "signature" looks like.
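The client side of that negotiation can be sketched with Python's standard `ssl` module. This is a minimal sketch, not a hardened client: the function names `make_client_context` and `fetch_tls_info` are my own, and the choice of TLS 1.2 as the minimum version is an assumption rather than anything the discussion above requires. "Being told what a valid server signature looks like" corresponds here to trusting the operating system's certificate authorities and checking the server's host name.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client context that insists on a verified server certificate."""
    # create_default_context() loads the system CA certificates and turns on
    # certificate verification and host name checking.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse anything older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def fetch_tls_info(host: str, port: int = 443) -> dict:
    """Connect to host:port, complete the handshake, report what was agreed."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # The handshake happens inside wrap_socket(); it raises
        # ssl.SSLCertVerificationError if the server cannot prove its identity.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return {
                "protocol": tls_sock.version(),
                "cipher": tls_sock.cipher()[0],
            }
```

Calling `fetch_tls_info("storage.example.com")` (a placeholder host name) would either return the negotiated protocol and cipher or raise an exception if the server's certificate does not check out.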
One aspect of encryption that is often overlooked is that, for a user's data to be available in data centers A, B, C and D, whenever the data changes in one data center it has to be copied ("replicated") to all the other data centers. Because this often happens after a user's conversation with a data center has finished, it can't be done as part of the user's SSL connection; it has to be managed separately. If you don't encrypt this later communication, a very clever eavesdropper who has worked out something about the structure of your data and messages can read it straight off the inter-datacenter links. This is what Google has announced recently: they now encrypt all traffic between their data centers, to foil such eavesdropping. This way, every "pipe" between the user and any computer on which they might store data is encrypted.
So far so good, but the user's data is sitting on several hard drives scattered across the globe. What if an intruder gains access to one of the machines with this data? Well, currently he or she can read the user's data with impunity. We can try to fix this with encryption; pick a secret key, store the data in encrypted form, then decrypt it as it's read. The problem here is that you have to keep the secret key somewhere, and access it whenever the user wants to read their data; this means, in practice, storing a copy of the secret key in each data center and having a very robust way of checking whether a machine is authorised to know it just before decrypting the data. Now you've just shifted the hacker's problem; he or she has to find the key store and compromise it. This is probably not much harder than compromising the original machine.
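To make the key store idea concrete, here is a minimal sketch of one way the authorisation check could work: each storage machine holds its own credential and must answer a fresh challenge before the store releases a data key. Everything here (`KeyStore`, `enroll_machine`, `release_key`, the challenge-response scheme itself) is hypothetical and chosen for illustration; it is not how any particular provider does it.

```python
import hashlib
import hmac
import secrets


class KeyStore:
    """Hypothetical per-data-center key store (illustration only)."""

    def __init__(self) -> None:
        self._data_keys = {}        # customer id -> data encryption key
        self._machine_secrets = {}  # machine id -> that machine's credential
        self._pending = {}          # machine id -> outstanding challenge

    def enroll_machine(self, machine_id: str) -> bytes:
        """Register a storage machine and hand it its credential."""
        credential = secrets.token_bytes(32)
        self._machine_secrets[machine_id] = credential
        return credential

    def create_data_key(self, customer_id: str) -> None:
        self._data_keys[customer_id] = secrets.token_bytes(32)

    def new_challenge(self, machine_id: str) -> bytes:
        """Issue a one-time random challenge for the machine to answer."""
        challenge = secrets.token_bytes(16)
        self._pending[machine_id] = challenge
        return challenge

    def release_key(self, machine_id: str, response: bytes, customer_id: str) -> bytes:
        """Release the data key only if the challenge was answered correctly."""
        credential = self._machine_secrets.get(machine_id)
        challenge = self._pending.pop(machine_id, None)  # each challenge is single use
        if credential is None or challenge is None:
            raise PermissionError("unknown machine or no outstanding challenge")
        expected = hmac.new(credential, challenge, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, response):
            raise PermissionError("machine failed the authorisation check")
        return self._data_keys[customer_id]


def answer_challenge(credential: bytes, challenge: bytes) -> bytes:
    """What an authorised storage machine computes to prove itself."""
    return hmac.new(credential, challenge, hashlib.sha256).digest()
```

Note what the sketch makes obvious: `_data_keys` holds every customer's key in one place, so an intruder who gets into the key store process gets everything, which is exactly the shifted problem described above.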
The other problem is compelled access; if your cloud hosting company is given a court order to provide some entity with your data then they can decrypt your data at will and hand it over in the clear. So how do you protect your data?
The obvious answer is that you should keep the secret key yourself. There are two flavours of approach here, and each has its problems. The superficially attractive approach is that you pass in a copy of your key each time you want to access your data; because your connection to the server is secure (see above) this is nominally safe. The server keeps your key in memory, reads the encrypted data into memory, decrypts it and sends it back to you in clear text, and then wipes its memory to overwrite the key and clear text. An attacker would have to have access to the machine at the time you are accessing your data, and be able to read the relevant segment of memory to get hold of the key, or alternatively compromise the server software itself and get it to squirrel away copies of keys as they are received; this is significantly harder, but still feasible. So this is better, but far from perfect.
The "bullet-proof" approach is never to send your encryption key to the server at all. Instead you send and receive encrypted data, keeping a copy of the key on your personal machine, and encrypt/decrypt the data as it passes from and to your machine. This is essentially a perfect defence against a compromised cloud provider. One problem, though, is that it negates many of the benefits of cloud hosting; any processing (e.g. indexing) of the data you provide has to be done on your own machine, as the cloud systems will never have access to the clear text to be able to index it. All they are doing is providing a distributed, expensive and slow hard drive for you, which is not worth very much money to most people. And, of course, your machine will now be the target of crackers who will mail you malware and try to get you to browse malware-serving web pages to compromise your system and send them your key, or even key-log your keyboard to catch your key as you enter it.
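The client-side approach can be sketched as follows. A loud caveat: Python's standard library has no block cipher, so to keep the example self-contained I have built a toy cipher out of HMAC-SHA256 (a keystream XORed with the data, plus an authentication tag). That construction is my own stand-in for illustration, not a vetted scheme; for real data you would use an audited implementation of something like AES-GCM. `CloudStore` is likewise a hypothetical stand-in for the provider.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
TAG_LEN = 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over nonce plus a block counter."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out.extend(hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest())
        counter += 1
    return bytes(out[:length])


def _subkeys(key: bytes):
    """Derive separate encryption and authentication keys from the master key."""
    enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc_key, mac_key


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Runs on YOUR machine; returns nonce + ciphertext + tag."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Runs on YOUR machine; raises ValueError if the blob was altered."""
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("blob too short")
    enc_key, mac_key = _subkeys(key)
    nonce = blob[:NONCE_LEN]
    ciphertext = blob[NONCE_LEN:-TAG_LEN]
    tag = blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed: wrong key or tampered data")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


class CloudStore:
    """Hypothetical provider: it only ever sees opaque blobs."""

    def __init__(self) -> None:
        self._blobs = {}

    def put(self, name: str, blob: bytes) -> None:
        self._blobs[name] = blob

    def get(self, name: str) -> bytes:
        return self._blobs[name]


key = secrets.token_bytes(32)  # never leaves the client
store = CloudStore()
store.put("notes.txt", encrypt(key, b"meet at noon"))
print(decrypt(key, store.get("notes.txt")))
```

The provider in this sketch cannot index or search `notes.txt` because all it holds is the blob, which is the trade-off described above: the cloud is reduced to a remote hard drive.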
Worse, if you are responsible for your encryption key, you had better make sure you don't lose it. For a key to be strong (proof against distributed cracking) it has to have a lot of information and hence will tend to be hard to remember exactly. If you forget it, you are screwed - your encrypted data is just wasted space on a hard drive. Perhaps you will use a "password safe" program to store these keys, but then a) you need to trust that the password safe program has not itself been compromised and b) you have to remember the password safe key and keep it safe from keylogging...
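To put rough numbers on "a lot of information": the strength of a randomly chosen key is its length times the base-2 logarithm of the number of symbols it could be drawn from. The sketch below computes that, plus the average time to search the key space. The guessing rate of 10^12 keys per second is purely an assumption I picked to stand in for a well-resourced distributed cracker, and the function names are my own.

```python
import math
import secrets

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def entropy_bits(length: int, alphabet_size: int) -> float:
    """Bits of information in a key of `length` symbols chosen uniformly at random."""
    return length * math.log2(alphabet_size)


def average_years_to_crack(bits: float, guesses_per_second: float) -> float:
    """On average an attacker has to search half of the key space."""
    return (2 ** bits / 2) / guesses_per_second / SECONDS_PER_YEAR


def generate_key() -> bytes:
    """A 256-bit key from the operating system's secure random source."""
    return secrets.token_bytes(32)


ASSUMED_RATE = 1e12  # guesses per second; an assumption, not a measurement

for label, length, alphabet in [
    ("8 lowercase letters", 8, 26),
    ("12 mixed-case letters and digits", 12, 62),
    ("32 random bytes", 32, 256),
]:
    bits = entropy_bits(length, alphabet)
    years = average_years_to_crack(bits, ASSUMED_RATE)
    print(f"{label}: {bits:.1f} bits, about {years:.3g} years at the assumed rate")
```

The memorable key falls in well under a year at that rate, while the 32 random bytes are far out of reach, and are also far beyond what most people could remember, which is the bind described above.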
All of this goes to show that keeping data in the cloud securely is a very hard problem if you are defending it against a capable and determined opponent. The best approach seems to be to have a reasonably robust encryption scheme (say, server-side encryption) and accept the risk of hosting company compromise; to defend against this, try not to have data that anyone would actually find interesting enough to try to decrypt.