Sometimes it is useful to first define an abstraction. A good abstraction
can reveal that some problems could be solved by the same solution. A good
abstraction also makes it easier for us to communicate about the solution
and understand its properties.
One such abstraction which I think should be defined and named
is nonenumerable database. A database where you can query a value by a
key, but you cannot enumerate all keys (nor all values). Even a database
operator/administrator is not be able to do so.
At the same time, some problems are more intuitive than others and can
help us better understand or even inform the abstraction. We will look at
three examples where nonenumerable database can help: DNS/DNSSEC, gun
ownership registry, and voting.
Not just a key-value database
A nonenumerable database is not just a key-value database
and does not just provide an efficient and durable way of storing and retrieving the value given a key, but it also
prevents one to learn about all keys stored in the database. Without knowing a key it is also
not possible to obtain the value or really any value from the database.
All this holds even for a database operator/administrator (or cloud provider).
This last requirement should hold even when the traffic to the database is observed.
Many key-value databases still support various forms of querying, but a nonenumerable database by design does not.
DNS, DNSSEC and enumeration of names in a zone
Domain Name System Security Extensions (DNSSEC)
adds cryptographic authentication and integrity to DNS, preventing various attacks when resolving a domain name to its IP.
The issue is that because DNSSEC also has to cryptographically prove that a name does not resolve (when it does not),
it opens a door to domain walking: enumerating all names available.
Could DNS then be seen as an example of a nonenumerable database? Given interest in addressing domain walking issue introduced by
DNSSEC it seems many would like to see it as such, but it is not quite: longer names might not be enumerable,
but for short names one can enumerate them simply by trying them (e.g.,
cc.com). Note as well that, in DNS, author of the value
(DNS record) picks the key (name) as well: this is probably a realistic requirement for a nonenumerable database
despite allowing the author to subvert security guarantees otherwise afforded by the database. This informs us that for
our nonenumerable database abstraction to be practically secure we have to put some guidelines/limits on the minimal
key size/entropy and/or maximal allowed query rate.
DNSSEC made domain walking easier than just trying names but it also raises a question: what should a nonenumerable
database return when a key does not exist in the database? DNSSEC requires an authoritative (negative) answer, but
our definition of nonenumerable database is silent on that. Moreover, it does not even require authenticated answers.
DNSSEC does not provide confidentiality and large DNS providers (e.g., Google at 184.108.40.206 and CloudFlare at 220.127.116.11)
probably know all names stored in DNS, without having to enumerate them themselves. Users did that for them.
A nonenumerable database could prevent that.
Because DNS is also a distributed database it might be hard to guarantee all properties of a nonenumerable database,
but future implementations of a nonenumerable database might still inform future versions of DNSSEC.
Gun ownership registry
In USA, there is no searchable database of gun owners,
no national gun registry. Tracing a gun owner from the serial number of the gun is all done by hand, searching through boxes
and boxes of forms. The gun lobby worries that a searchable registry would give the government a tool to confiscate guns.
A nonenumerable database addresses both the gun lobby’s concerns and allow fast tracing of a gun owner when trace is required.
In fact, if the worry of the gun lobby is that the government could confiscate guns of everyone, then current approach with boxes
full of paper forms does not really prevent that: the government can still go box by box and confiscate guns from everyone found
in the box. It would be not very efficient (each box probably contains forms from people living in different places) but it is doable.
With a nonenumerable database even that is not be possible.
On the other hand, once a gun is found and tracing the owner is required, lookup by gun’s serial number would be very fast.
Moreover, having database online could allow sellers to directly input ownership information immediately after the sale, which
would help in cases when a crime happens soon after selling the gun. This could be a good use case for a blockchain solution, where
records could also be updated in a decentralized way as guns are resold.
To make enumeration harder by simply trying serial numbers in order, guns’ serial numbers should probably be changed to something closer
to UUID, e.g., a random 128-bit number.
Online voting is hard: it is hard to ensure anonymity of votes, it is hard to prevent vote-buying and coercion, it is hard to
prevent compromised identities and computers, it is hard to prevent phishing, it is hard to prevent compromised servers and manipulating the vote
tally, etc. One piece of a solution might be a nonenumerable database.
Simply storing votes into a nonenumerable database is useless by design: one cannot tally such votes.
Instead, you could store receipts for cast votes and proofs that they have been included in the tally, keyed by a
Only knowing the ballot number one could check the cast vote.
Based on the guarantees of the nonenumerable database, just checking the vote
using the ballot number does not reveal your identity.
If the vote does not match what you
cast or if the proof is invalid, you can make the ballot number public and others can check as well that there was
something wrong during the vote.
A downside is that it makes vote-buying easier: a voter can give their ballot number to the buyer for them to
verify the (bought) vote. Because it is hard to obtain a ballot number of someone else, the buyer
can have high confidence they are spending their money well.
One possible design of a nonenumerable database is to use (a hash of) an encryption key as a database key and the corresponding value
is then encrypted with that key. Without knowing the (encryption) key, you cannot obtain the corresponding value. You cannot enumerate
values as well.
As noted above, we should also allow a mapping between an arbitrary key and the corresponding
encryption key. This would allow storing a DNS name which maps to an encryption key which can decrypt the corresponding
IP address stored as a value. But how do we store this mapping so that it cannot be used to enumerate the database?
Moreover, our definition of a nonenumerable database requires that even when the traffic to the database is observed,
its guarantees should still hold. Both this and the mapping are the core design issues of a nonenumerable database.
Maybe they can be addressed by using a trusted execution environment.
When interoperability is needed between multiple parties using a nonenumerable database, a blockchain-based solution
might be reasonable. Especially if querying such blockchain-based nonenumerable database requires a non-free transaction:
then there are both technical and economical limits on query rate which helps prevent enumeration
by trying keys.