Monday, March 1, 2021
Latest News from Cambridge and England

MongoDB's field-level encryption protects private data, even from DBAs


By admin, in Tech, at April 1, 2020

Encrypted coffee is likely poisonous and should never be consumed raw. Decrypt and validate responsibly before human consumption. (Brett Hoerner / Flickr)

In December 2019, the popular document database MongoDB added a fairly radical new feature to the platform: field-level database encryption. At first glance, one might wonder whether this is a meaningful feature in a world that already has at-rest storage encryption and in-flight transport encryption. After a closer look, though, the answer is a resounding yes.

One of MongoDB's first customers to use the new technology is Apervita, a vendor that handles confidential data for well over 2,000 hospitals and nearly 2 million individual patients. Apervita worked side by side with MongoDB during development and refinement of the technology.

Since reaching general availability in December, the technology has also been adopted by several government agencies and Fortune 50 companies, including some of the largest pharmacies and insurance providers.

Field-level encryption in a nutshell

MongoDB's field-level encryption (FLE) lets applications store selected fields of a document in encrypted form. The free Community edition of MongoDB supports explicit encryption of fields in client-side applications.

Enterprise versions of MongoDB (and Atlas, MongoDB's cloud-based Database-as-a-Service) also support automatic encryption. MongoDB Enterprise and Atlas can additionally enforce encryption of protected fields on the server side, preventing a terminally clueless application developer from accidentally storing sensitive data in clear text. In both the free and enterprise versions, encrypted fields can be decrypted automatically on read, provided the application has the key.
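Server-side enforcement can be pictured as a check that runs before any write is accepted. The sketch below is a toy model under stated assumptions, not how MongoDB implements it: `PROTECTED_FIELDS`, `EnforcingCollection`, and the field names are hypothetical, and a plain `bytes` value stands in for the encrypted binary value a real driver would send.

```python
# Toy model of server-side enforcement: the "server" refuses any document
# whose protected fields arrive as plaintext instead of ciphertext.
# PROTECTED_FIELDS, EnforcingCollection, and treating `bytes` as "ciphertext"
# are assumptions made for illustration; this is not MongoDB's mechanism.

PROTECTED_FIELDS = {"ssn", "diagnosis"}  # hypothetical field names


class PlaintextRejected(ValueError):
    """Raised when a protected field is not encrypted."""


class EnforcingCollection:
    def __init__(self) -> None:
        self.docs: list[dict] = []

    def insert_one(self, doc: dict) -> None:
        # Check every protected field present in the incoming document.
        for field in doc.keys() & PROTECTED_FIELDS:
            # In this toy, only raw bytes count as "encrypted".
            if not isinstance(doc[field], bytes):
                raise PlaintextRejected(f"field {field!r} must be encrypted")
        self.docs.append(doc)


coll = EnforcingCollection()
coll.insert_one({"name": "Alice", "ssn": b"\x8f\x02\x1c\x9a"})  # pretend ciphertext: accepted
try:
    coll.insert_one({"name": "Bob", "ssn": "123-45-6789"})  # plaintext: refused
except PlaintextRejected as err:
    print("rejected:", err)
```

The careless developer's plaintext insert fails loudly instead of silently landing in the database, which is the whole point of enforcing the rule where the developer cannot skip it.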

Setting up an automatically encrypted database is a little too chewy to poke through in code here. But to understand how and when the encryption occurs, it may help to take a quick look at the Python code to do a single, explicitly encrypted MongoDB insertion:

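```python
from pymongo.encryption import Algorithm

# Explicitly encrypt a field. client_encryption (a ClientEncryption instance),
# data_key_id, and coll are assumed to have been set up earlier.
encrypted_field = client_encryption.encrypt(
    "123456789",
    Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
    key_id=data_key_id)
coll.insert_one({"encryptedField": encrypted_field})
```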

The explicit call here makes it pretty clear what's going on: the data is encrypted on the client application side, then sent to and stored by the MongoDB server instance. This obviously gives us most of the benefit of both in-flight and at-rest encryption, but there's another layer of defense offered here that might not be as immediately obvious.
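The algorithm named in that call is the Deterministic variant: encrypting the same value with the same key always yields the same ciphertext, which is what lets a server match an encrypted query value against stored encrypted values without decrypting anything (MongoDB also offers a Random variant, whose fields cannot be queried that way). The stdlib-only sketch below is a toy that reproduces only that property; it is not MongoDB's AES-256-CBC/HMAC-SHA-512 construction, and every function name in it is hypothetical.

```python
import hashlib
import hmac

# Toy deterministic cipher built only from the standard library. It mimics one
# property of MongoDB's Deterministic algorithm (same key + same plaintext ->
# same ciphertext) but is NOT AEAD_AES_256_CBC_HMAC_SHA_512 and must not be
# used to protect real data. All names here are hypothetical.


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive independent sub-keys so IV derivation and encryption don't share a key."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Expand (key, iv) into `length` pseudorandom bytes using HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_deterministic(key: bytes, plaintext: str) -> bytes:
    data = plaintext.encode()
    # The IV is derived from the plaintext, so equal inputs give equal IVs
    # and therefore equal ciphertexts.
    iv = hmac.new(_subkey(key, b"iv"), data, hashlib.sha256).digest()[:16]
    stream = _keystream(_subkey(key, b"enc"), iv, len(data))
    return iv + bytes(a ^ b for a, b in zip(data, stream))


def decrypt_deterministic(key: bytes, blob: bytes) -> str:
    iv, body = blob[:16], blob[16:]
    stream = _keystream(_subkey(key, b"enc"), iv, len(body))
    data = bytes(a ^ b for a, b in zip(body, stream))
    # Recomputing the IV from the recovered plaintext doubles as an integrity check.
    expected_iv = hmac.new(_subkey(key, b"iv"), data, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(iv, expected_iv):
        raise ValueError("ciphertext was modified or the key is wrong")
    return data.decode()


key = bytes(range(32))  # demo key only; never hard-code a real key
first = encrypt_deterministic(key, "123456789")
second = encrypt_deterministic(key, "123456789")
print(first == second)                    # True: equal plaintexts, equal ciphertexts
print(decrypt_deterministic(key, first))  # 123456789
```

The trade-off behind determinism shows up here too: anyone who can read the stored data can tell which documents share a value, even without the key, which is why deterministic encryption is usually reserved for fields that actually need to be queried.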

A closer look at the sysadmin problem

System administrators (and database administrators) represent one of the thorniest problems of data confidentiality. A computer needs a human operator with all the privileges necessary to start, stop, maintain, and monitor services, which means the sysadmin effectively has access to any data stored on or processed by that system.

Similarly, databases (particularly large-scale databases) must have database administrators. A DBA may not have the low-level root access to a system that a sysadmin does, but they do have access to the inner workings of the database itself. In addition to designing the initial structure of the database, a DBA must be able to log and monitor the running database engine in order to identify "hot spots" in the data.

Those hot spots might call for restructuring or indexing to alleviate performance problems as they arise. Troubleshooting them properly also frequently requires the DBA to replay troublesome queries, to see whether the DBA's changes have had a positive or negative impact on performance.

At-rest encryption does very little to solve either the sysadmin problem or the DBA problem. Although sysadmins can't get meaningful data by cloning the raw disks of the system, they can easily copy the unencrypted data from the running system once its storage has been unlocked.

If the storage encryption key is held in hardware (for example, built into a Trusted Platform Module, or TPM), at-rest encryption does little or nothing to mitigate the sysadmin problem, since the sysadmin has access to the running system. As Apervita CTO Michael Oltman told us, "[we're] not worried about someone walking out of an AWS data center with our server."

An at-rest encryption system that requires a remote operator to unlock storage with a key provided at boot mitigates this problem somewhat. But a local system administrator will likely still have opportunities to compromise the running machine. Availability may also suffer: if the remote key operator is unavailable, services won't come back up automatically after a maintenance window that involves a reboot.

This inability to secure private data from system and database administrators makes it more difficult and expensive to scale a large operation without potentially breaching confidentiality.

Field-level encryption enables scale by segmenting access

  • The top view is the actual data, visible from the application server that possesses the key. The bottom view is all you're going to get on the DB server itself. (Image: MongoDB)
  • The full schematic of the query flow shows us how the sausage is made. Encryption and decryption are handled transparently by the MongoDB-provided drivers, even for query construction. (Image: MongoDB)
  • If the flat query flow diagram is a bit much to follow, the same process is shown step by step in a simple animation. (Image: MongoDB)

Now that we understand the sysadmin problem, we can look at how field-level encryption mitigates it. With FLE, the application encrypts data before ever sending it to the database, and the database stores it exactly as-is. Similarly, when encrypted data is queried, it's retrieved and sent back to the application still encrypted. Decryption never happens at the server level; in fact, the server doesn't have access to the keys necessary to decrypt it.
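That division of labor can be modeled in a few lines. This is a toy sketch under stated assumptions: `Server` and `Client` are hypothetical classes, the `ssn` field is an invented example, and the cipher is a stdlib-only stand-in rather than MongoDB's implementation. Because it uses a fresh random nonce per value, it behaves like MongoDB's Random algorithm (equal values encrypt differently) rather than the Deterministic one. The point is simply that the only object ever holding a key is the client.

```python
import hashlib
import hmac
import os

# Toy model of the FLE data flow: Server never sees a key, Client encrypts
# before insert and decrypts after find. The cipher (random nonce, HMAC-SHA256
# keystream, HMAC tag) is a stdlib-only stand-in, not MongoDB's algorithm.
# Class, method, and field names are hypothetical.


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode, truncated to `length` bytes."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


class Server:
    """Stores and returns documents exactly as received; holds no keys."""

    def __init__(self) -> None:
        self.docs: dict[int, dict] = {}

    def insert(self, doc_id: int, doc: dict) -> None:
        self.docs[doc_id] = dict(doc)

    def find(self, doc_id: int) -> dict:
        return dict(self.docs[doc_id])


class Client:
    """Application side: the only place the key exists."""

    def __init__(self, server: Server, key: bytes) -> None:
        self.server = server
        self.enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
        self.mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()

    def _encrypt(self, text: str) -> bytes:
        data = text.encode()
        nonce = os.urandom(16)  # fresh per call, so equal values encrypt differently
        body = bytes(a ^ b for a, b in zip(data, _keystream(self.enc_key, nonce, len(data))))
        tag = hmac.new(self.mac_key, nonce + body, hashlib.sha256).digest()
        return nonce + body + tag

    def _decrypt(self, blob: bytes) -> str:
        nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
        expected = hmac.new(self.mac_key, nonce + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("ciphertext failed authentication")
        data = bytes(a ^ b for a, b in zip(body, _keystream(self.enc_key, nonce, len(body))))
        return data.decode()

    def insert(self, doc_id: int, name: str, ssn: str) -> None:
        # Only the sensitive field is encrypted, and only here on the client.
        self.server.insert(doc_id, {"name": name, "ssn": self._encrypt(ssn)})

    def find(self, doc_id: int) -> dict:
        doc = self.server.find(doc_id)  # arrives still encrypted
        doc["ssn"] = self._decrypt(doc["ssn"])
        return doc


server = Server()
client = Client(server, os.urandom(32))
client.insert(1, "Alice", "123-45-6789")
print(server.docs[1]["ssn"].hex()[:32])  # opaque bytes: all the server's admins can see
print(client.find(1)["ssn"])             # recovered only by the key holder
```

Anyone with root on the machine running `Server` can dump `docs` at will and still learn nothing about the `ssn` values; the plaintext only ever exists inside `Client`.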

With data encrypted before it ever reaches the database, and not decrypted until it comes back out, the sysadmin problem is largely solved, for DBAs as well as sysadmins. A system administrator with local root access can stop, start, and upgrade services without ever getting access to the data, and a database administrator can view and replay running queries without seeing the private contents, either.
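The DBA's side of this can be sketched as a server that logs every query and can replay the log with timings. As an assumption for brevity, the sketch swaps in a different technique for the encrypted field: a keyed HMAC token, sometimes called a blind index, which shares the property that matters here (equal values produce equal tokens) but, unlike real FLE ciphertext, cannot be decrypted even by the application. `LoggingServer`, `token`, and the document fields are hypothetical.

```python
import hashlib
import hmac
import time

# Toy model of a DBA replaying logged queries against protected data. A keyed
# HMAC token (a "blind index") stands in for MongoDB's deterministic encryption:
# equal values give equal tokens, but the server cannot reverse them.
# All names here are hypothetical.


def token(key: bytes, value: str) -> str:
    """Application side: turn a sensitive value into an opaque, matchable token."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


class LoggingServer:
    def __init__(self) -> None:
        self.docs: list[dict] = []
        self.query_log: list[dict] = []  # visible to the DBA

    def _match(self, query: dict) -> list[dict]:
        return [d for d in self.docs if all(d.get(k) == v for k, v in query.items())]

    def insert(self, doc: dict) -> None:
        self.docs.append(doc)

    def find(self, query: dict) -> list[dict]:
        self.query_log.append(dict(query))
        return self._match(query)

    def replay(self) -> list[tuple[dict, int, float]]:
        """Re-run every logged query and time it, as a DBA tuning performance might."""
        results = []
        for query in self.query_log:
            start = time.perf_counter()
            hits = self._match(query)
            results.append((query, len(hits), time.perf_counter() - start))
        return results


app_key = b"held only by the application"  # never sent to the server
server = LoggingServer()
server.insert({"patient": token(app_key, "Alice"), "ward": "3B"})
server.insert({"patient": token(app_key, "Bob"), "ward": "3B"})
server.find({"patient": token(app_key, "Alice")})

for query, hits, seconds in server.replay():
    # The DBA sees a token and a match count, never the name "Alice".
    print(query["patient"][:16], hits, f"{seconds:.6f}s")
```

Replaying the log reproduces the same matches and gives the DBA timings to compare before and after a change such as a new index, yet nothing in the log or the stored documents reveals a patient's name.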

To be fair, we've only kicked this particular can a little further down the road. Sysadmins and developers with access to the production application server can still see data they shouldn't; after all, the application itself must handle the raw data.

The segmentation is st…