Nonenumerable database

Sometimes it is useful to first define an abstraction. A good abstraction can reveal that some problems could be solved by the same solution. A good abstraction also makes it easier for us to communicate about the solution and understand its properties.

One such abstraction which I think should be defined and named is nonenumerable database. A database where you can query a value by a key, but you cannot enumerate all keys (nor all values). Even a database operator/administrator is not be able to do so.

At the same time, some problems are more intuitive than others and can help us better understand or even inform the abstraction. We will look at three examples where nonenumerable database can help: DNS/DNSSEC, gun ownership registry, and voting.

Not just a key-value database

A nonenumerable database is not just a key-value database and does not just provide an efficient and durable way of storing and retrieving the value given a key, but it also prevents one to learn about all keys stored in the database. Without knowing a key it is also not possible to obtain the value or really any value from the database. All this holds even for a database operator/administrator (or cloud provider). This last requirement should hold even when the traffic to the database is observed. Many key-value databases still support various forms of querying, but a nonenumerable database by design does not.

DNS, DNSSEC and enumeration of names in a zone

Domain Name System Security Extensions (DNSSEC) adds cryptographic authentication and integrity to DNS, preventing various attacks when resolving a domain name to its IP. The issue is that because DNSSEC also has to cryptographically prove that a name does not resolve (when it does not), it opens a door to domain walking: enumerating all names available.

Could DNS then be seen as an example of a nonenumerable database? Given interest in addressing domain walking issue introduced by DNSSEC it seems many would like to see it as such, but it is not quite: longer names might not be enumerable, but for short names one can enumerate them simply by trying them (e.g., aa.com, bb.com, cc.com). Note as well that, in DNS, author of the value (DNS record) picks the key (name) as well: this is probably a realistic requirement for a nonenumerable database despite allowing the author to subvert security guarantees otherwise afforded by the database. This informs us that for our nonenumerable database abstraction to be practically secure we have to put some guidelines/limits on the minimal key size/entropy and/or maximal allowed query rate.

DNSSEC made domain walking easier than just trying names but it also raises a question: what should a nonenumerable database return when a key does not exist in the database? DNSSEC requires an authoritative (negative) answer, but our definition of nonenumerable database is silent on that. Moreover, it does not even require authenticated answers. Should it?

DNSSEC does not provide confidentiality and large DNS providers (e.g., Google at 8.8.8.8 and CloudFlare at 1.1.1.1) probably know all names stored in DNS, without having to enumerate them themselves. Users did that for them. A nonenumerable database could prevent that.

Because DNS is also a distributed database it might be hard to guarantee all properties of a nonenumerable database, but future implementations of a nonenumerable database might still inform future versions of DNSSEC.

Gun ownership registry

In USA, there is no searchable database of gun owners, no national gun registry. Tracing a gun owner from the serial number of the gun is all done by hand, searching through boxes and boxes of forms. The gun lobby worries that a searchable registry would give the government a tool to confiscate guns.

A nonenumerable database addresses both the gun lobby’s concerns and allow fast tracing of a gun owner when trace is required. In fact, if the worry of the gun lobby is that the government could confiscate guns of everyone, then current approach with boxes full of paper forms does not really prevent that: the government can still go box by box and confiscate guns from everyone found in the box. It would be not very efficient (each box probably contains forms from people living in different places) but it is doable. With a nonenumerable database even that is not be possible.

On the other hand, once a gun is found and tracing the owner is required, lookup by gun’s serial number would be very fast. Moreover, having database online could allow sellers to directly input ownership information immediately after the sale, which would help in cases when a crime happens soon after selling the gun. This could be a good use case for a blockchain solution, where records could also be updated in a decentralized way as guns are resold.

To make enumeration harder by simply trying serial numbers in order, guns’ serial numbers should probably be changed to something closer to UUID, e.g., a random 128-bit number.

Verifiable voting

Online voting is hard: it is hard to ensure anonymity of votes, it is hard to prevent vote-buying and coercion, it is hard to prevent compromised identities and computers, it is hard to prevent phishing, it is hard to prevent compromised servers and manipulating the vote tally, etc. One piece of a solution might be a nonenumerable database.

Simply storing votes into a nonenumerable database is useless by design: one cannot tally such votes. Instead, you could store receipts for cast votes and proofs that they have been included in the tally, keyed by a ballot number.

Only knowing the ballot number one could check the cast vote. Based on the guarantees of the nonenumerable database, just checking the vote using the ballot number does not reveal your identity. If the vote does not match what you cast or if the proof is invalid, you can make the ballot number public and others can check as well that there was something wrong during the vote.

A downside is that it makes vote-buying easier: a voter can give their ballot number to the buyer for them to verify the (bought) vote. Because it is hard to obtain a ballot number of someone else, the buyer can have high confidence they are spending their money well.

Possible design

One possible design of a nonenumerable database is to use (a hash of) an encryption key as a database key and the corresponding value is then encrypted with that key. Without knowing the (encryption) key, you cannot obtain the corresponding value. You cannot enumerate values as well.

As noted above, we should also allow a mapping between an arbitrary key and the corresponding encryption key. This would allow storing a DNS name which maps to an encryption key which can decrypt the corresponding IP address stored as a value. But how do we store this mapping so that it cannot be used to enumerate the database?

Moreover, our definition of a nonenumerable database requires that even when the traffic to the database is observed, its guarantees should still hold. Both this and the mapping are the core design issues of a nonenumerable database. Maybe they can be addressed by using a trusted execution environment.

When interoperability is needed between multiple parties using a nonenumerable database, a blockchain-based solution might be reasonable. Especially if querying such blockchain-based nonenumerable database requires a non-free transaction: then there are both technical and economical limits on query rate which helps prevent enumeration by trying keys.

If you have any relevant link, project, idea, comment, feedback, critique,
language fix, or any other information, please share it bellow. Thanks.
Subscribe