Think for a minute about the security challenges involved in creating a health-centered social network or, more generally, any web application that has to handle sensitive user data. What if the database server is compromised? How do you make sure that, even if the database is stolen, your users' secrets remain confidential? How can any cloud computing application provide any significant measure of privacy?
At Clipperz they claim to have found the solution: they call it "zero-knowledge web applications". Since the term "zero knowledge" has a precise meaning in cryptography, that's a bit confusing.
What Clipperz promotes boils down to evangelism for the "host-proof hosting" AJAX programming pattern.
client-side encryption
You still with me? It's a very simple concept, actually. Encrypt any sensitive data in the user's browser before sending it to the web server. Data is never stored in plaintext. Users can retrieve their data in encrypted form, and only in their own browser is it decrypted and made accessible. Should the database server be compromised, an attacker finds only encrypted gibberish in the database.
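To make the flow concrete, here is a minimal sketch of what the client would do before anything leaves the machine. Python stands in for the browser-side JavaScript, and all names (`derive_keys`, `encrypt`, `decrypt`) are made up for illustration. The cipher is a toy keystream built from HMAC-SHA256, chosen only so the example runs on the standard library; a real application would use a vetted authenticated cipher such as AES-GCM instead.

```python
import hashlib
import hmac
import secrets


def derive_keys(passphrase: str, salt: bytes) -> tuple[bytes, bytes]:
    """Stretch the passphrase into an encryption key and a MAC key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode(), salt, 100_000, dklen=64
    )
    return material[:32], material[32:]


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (HMAC over nonce + counter); a stand-in for a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(passphrase: str, plaintext: bytes) -> dict[str, bytes]:
    """Runs on the client. The returned record is all the server ever sees."""
    salt = secrets.token_bytes(16)
    nonce = secrets.token_bytes(16)
    enc_key, mac_key = derive_keys(passphrase, salt)
    stream = keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return {"salt": salt, "nonce": nonce, "ciphertext": ciphertext, "tag": tag}


def decrypt(passphrase: str, record: dict[str, bytes]) -> bytes:
    """Runs on the client after fetching the stored record back."""
    enc_key, mac_key = derive_keys(passphrase, record["salt"])
    expected = hmac.new(
        mac_key, record["nonce"] + record["ciphertext"], hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, record["tag"]):
        raise ValueError("wrong passphrase or tampered record")
    stream = keystream(enc_key, record["nonce"], len(record["ciphertext"]))
    return bytes(c ^ k for c, k in zip(record["ciphertext"], stream))
```

The server stores the record as an opaque blob; only somebody who knows the passphrase can turn it back into readable data.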
Wow! Total privacy in the cloud computing age! Why don't we rewrite all our web applications to use this neat trick?
Yes, why don't we? Read on to find several answers to that question.
crippled technology
As a programmer, you need access to the data you're manipulating. That's what programming is about: performing operations on data. If the data is protected by a wall of encryption on the client side, you can't manipulate it in any meaningful way on the server side. You'll have to do all serious programming on the client side, in JavaScript. That presents several serious drawbacks:
- Flaky programming platform
- Like any veteran of the browser wars, I've learned to deeply distrust (and dislike) web browsers as a client-side programming platform. We've seen it with Java: write once, debug everywhere. JavaScript has an even more checkered history of browser incompatibilities and bugs.
- Data doesn't belong in the GUI
- AJAX is very nice for fetching data and plugging it into a display. Nothing wrong with that. However, actually handling that data should be done on the server side, where the database environment is. It's no coincidence that the available JavaScript frameworks are all geared mainly towards display manipulation: that's what this technology was made for.
- Crippled databases, crippled applications
- Reducing a database to mere disk space for storing encrypted bytes severely cripples your ability to do anything actually interesting in your web application. The server can't relate one piece of user data to another, and it can't index or search any of it.
- Isolated data, isolated users
- There's no point in creating a social network if people can't even share their interests because those are locked away in each user's private browser.
- Usability problems
- The decryption passphrase is held in memory by the client browser but is lost on page refresh. This means you'll either have to ask for the passphrase again on each page, or write a pure-AJAX application that never refreshes the page, which brings significant usability disadvantages of its own.
broken trust model
Apart from these technical objections, the underlying philosophy also doesn't make sense.
The basic idea is: don't trust web hosts. Nothing wrong with that, actually, it's a dangerous world out there.
But then the proposed solution is: don't trust web hosts, trust web applications. Which may make sense in a purely theoretical way if you're a geek and squint a bit. But which doesn't, in any meaningful way, translate into a practical solution, not even for a propeller head like yours truly.
First of all, non-techies don't distinguish between a host and the software served by that host. And they certainly don't distinguish between plain data stored on the host and encrypted data, interleaved in the same GUI, processed by the same software. If the host is trustworthy, so is the software, right? And if the host isn't trustworthy, why would I even consider typing my medical details into its software?
Next, even if you're one of the three people who haven't stopped reading this blog entry yet and you understand the issues involved, how are you going to ensure that your confidential data is not surreptitiously sent out without your knowing? How are you going to trust the application, if you don't trust the host that's serving you the application code on each page? You're going to audit the full client-side JavaScript application manually? You're going to do that on each page reload, to see if they maybe injected some malicious script? You're going to set up an auditing service that alerts you if anything changes in the JavaScript application code tomorrow?
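For what it's worth, the core of such an auditing service is easy to sketch: pin a SHA-256 digest of every script you once audited, and raise an alert when the served bytes no longer match. The function names and the `pinned`/`served` dictionaries below are hypothetical, and fetching the scripts is left out.

```python
import hashlib


def digest(script: bytes) -> str:
    """Hex SHA-256 digest of a script's raw bytes."""
    return hashlib.sha256(script).hexdigest()


def changed_scripts(pinned: dict[str, str], served: dict[str, bytes]) -> list[str]:
    """Names of scripts that are new, missing, or differ from the pinned digest.

    pinned maps script name -> digest recorded when the code was audited.
    served maps script name -> bytes the host is serving right now.
    """
    alerts = []
    for name in sorted(set(pinned) | set(served)):
        if name not in pinned or name not in served:
            alerts.append(name)  # script appeared or disappeared
        elif digest(served[name]) != pinned[name]:
            alerts.append(name)  # contents changed since the audit
    return alerts
```

Note what this does and doesn't buy you: it tells you that something changed, not whether the new code is benign, and it only sees the copy served to the auditor, not necessarily the copy served to you.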
transforming the concept
In short, programming a host-proof hosting application creates more problems than it solves.
It's against the grain in several respects:
- It fosters user isolation in a web context that naturally seeks to connect users;
- It violates basic MVC software architecture principles by pulling data processing into the GUI layer;
- It tries to replace a well-understood trust mechanism (a host is a brand, for chrissake!) with an obscure alternative that just doesn't cut it.
However, if you ditch the threat model in which the web host is the user's adversary, new opportunities emerge.
Envision a different threat model, in which user and web host cooperate to protect sensitive data against a third-party adversary.
a confidential health care web application
Say we have a health care context and wish to separate personally identifiable data (name, address, date of birth) from the actual health records (symptoms, diagnosis, treatment). The two data sets can only be connected by the user, who stores her own connection key safely encrypted on the client side, as described above. Additionally, the user provides the web host with a separate connection key for billing and/or emergency access. The web host's connection key is protected by public-key encryption, which is one-way as far as the host software is concerned: it can encrypt the key but cannot decrypt it on its own.
In this scenario, the web host software can only connect the two data sets (address and medical) if a human operator of the web host provides the secret key that unlocks the connection. This secret key is not stored on the web host. Consequently, should the web host be compromised and the database stolen, a hacker will not be able to link address information to sensitive medical data. Only the user or the web host operator can make that connection, which keeps the medical data confidential.
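A minimal sketch of the separation idea, under loud assumptions: it uses a single connection key instead of separate user and host keys, and it swaps the public-key step for a simpler keyed hash (HMAC-SHA256) so it runs on the standard library. The health records are filed under a pseudonym that can only be recomputed by whoever holds the connection key, and that key is never stored next to the data. All names here are hypothetical.

```python
import hashlib
import hmac
import secrets

# The two data sets as they would sit in the (stealable) database.
identities: dict[str, dict[str, str]] = {}   # person id -> name, address, ...
health_records: dict[str, list[str]] = {}    # pseudonym -> medical entries


def pseudonym(connection_key: bytes, person_id: str) -> str:
    """Derive the health-record key; impossible to recompute without the key."""
    return hmac.new(connection_key, person_id.encode(), hashlib.sha256).hexdigest()


def add_health_entry(connection_key: bytes, person_id: str, entry: str) -> None:
    """File a medical entry under the pseudonym. The key is used, never stored."""
    health_records.setdefault(pseudonym(connection_key, person_id), []).append(entry)


def entries_for(connection_key: bytes, person_id: str) -> list[str]:
    """Link identity to health data; only works with the right connection key."""
    return health_records.get(pseudonym(connection_key, person_id), [])


# Usage: the key lives with the user (or the operator), not in the database.
key = secrets.token_bytes(32)
identities["p-001"] = {"name": "Jane Doe", "address": "1 Example Street"}
add_health_entry(key, "p-001", "diagnosis: asthma")
```

Somebody who walks off with both tables sees names in one and diagnoses in the other, but nothing that says which diagnosis belongs to which name.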