Nonenumerable database

Sometimes it is useful to first define an abstraction. A good abstraction can reveal that some problems could be solved by the same solution. A good abstraction also makes it easier for us to communicate about the solution and understand its properties.

One such abstraction, which I think should be defined and named, is the nonenumerable database: a database where you can query a value by a key, but you cannot enumerate all keys (nor all values). Even a database operator/administrator is not able to do so.

At the same time, some problems are more intuitive than others and can help us better understand or even inform the abstraction. We will look at three examples where a nonenumerable database can help: DNS/DNSSEC, a gun ownership registry, and voting.
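To make the definition concrete, here is a minimal toy sketch in Python. It is not the design discussed in the post: the class name `NonenumerableStore`, the helper names, and the choice of SHA-256 digests plus a SHAKE-256 keystream are all assumptions made for illustration. It only shows the shape of the interface: lookup by key works, while a dump of the storage reveals neither keys nor values. Keys that are easy to guess could still be tested one by one, so this is not a secure construction.

```python
import hashlib
from typing import Dict, Optional


def _slot(key: str) -> str:
    # Storage index: a one-way digest of the key, so stored rows do not reveal keys.
    return hashlib.sha256(b"slot:" + key.encode()).hexdigest()


def _keystream(key: str, length: int) -> bytes:
    # Toy keystream derived from the key; a real system would use a vetted cipher.
    return hashlib.shake_256(b"value:" + key.encode()).digest(length)


class NonenumerableStore:
    """Hypothetical toy store: lookup by key works, listing keys or values does not."""

    def __init__(self) -> None:
        self._rows: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Mask the value with a keystream that only someone knowing the key can rebuild.
        stream = _keystream(key, len(value))
        self._rows[_slot(key)] = bytes(a ^ b for a, b in zip(value, stream))

    def get(self, key: str) -> Optional[bytes]:
        masked = self._rows.get(_slot(key))
        if masked is None:
            return None
        stream = _keystream(key, len(masked))
        return bytes(a ^ b for a, b in zip(masked, stream))

    def operator_view(self) -> Dict[str, bytes]:
        # What an administrator dumping the storage would see: digests and masked bytes.
        return dict(self._rows)
```

A caller who knows the key `"example.com"` can `put` and `get` a record, while `operator_view()` contains only digests and masked bytes.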

PeerDoc – Scaling real-time text editing

PeerDoc is a collaborative real-time rich-text editor with undo/redo, cursor tracking, inline comments, permissions/sharing control over documents, and a change history: the things you would expect from a modern collaborative editor on the web.

But the main difference is that it combines two types of collaboration:

  • real-time collaboration between collaborators on the draft of the document (push-based collaboration)
  • fork and merge request style of collaboration with others, allowing collaboration to scale beyond a small group of collaborators (pull-based collaboration)

JSON decoding in node is fast

I made a benchmark comparing how long it takes to decode JSON in node.js and Go. As I have noticed in the past, JSON decoding in node.js is really fast.

Google search across languages

In 2013, I wrote about how crowdsourcing could help us break out of the language bubble we all find ourselves in when we search online, where our search results only come from the language we write our search keywords in. But since then, machine translation between languages has greatly improved. So why are we still in our language bubbles?

There are some rays of hope. Google is reportedly able to return you results from other languages when there are no good hits in your primary language. I have to say, though, that I have never experienced this. Maybe because I usually search in English. I would guess that when you search in smaller languages, you sometimes get English results added.

But I want the opposite. I want to see ideas and thoughts and solutions that might be available in other languages, the languages I do not speak, when I search in English. I want diversity. I think all the pieces to build this are available. Why is this not already available? Am I overlooking any obstacle to this?

Store IDs in MongoDB as binary or as string?

I was curious whether MongoDB compression can efficiently store IDs if they are represented as strings instead of in a more compact binary form. So I made a benchmark and measured the compression performance of the three available compressors: zlib, snappy, and zstd.
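The question can be illustrated outside MongoDB with a rough Python sketch. This is not the benchmark from the post: it compresses a plain concatenation of IDs rather than MongoDB's storage format, it uses only zlib (the one of the three compressors available in the Python standard library), and the 16-byte random IDs are an assumption chosen for illustration.

```python
import random
import zlib
from typing import List, Tuple

rng = random.Random(0)  # fixed seed so runs are repeatable

# 10,000 hypothetical 16-byte random IDs; real IDs may have more structure.
binary_ids = [rng.getrandbits(128).to_bytes(16, "big") for _ in range(10_000)]
# The same IDs as hexadecimal strings, twice as long before compression.
string_ids = [raw.hex().encode("ascii") for raw in binary_ids]


def sizes(ids: List[bytes]) -> Tuple[int, int]:
    """Return (raw size, zlib-compressed size) of the concatenated IDs."""
    blob = b"".join(ids)
    return len(blob), len(zlib.compress(blob, 6))


binary_raw, binary_compressed = sizes(binary_ids)
string_raw, string_compressed = sizes(string_ids)

print(f"binary: {binary_raw} bytes raw, {binary_compressed} bytes compressed")
print(f"string: {string_raw} bytes raw, {string_compressed} bytes compressed")
```

Comparing the two compressed sizes shows how much of the overhead of the string form a general-purpose compressor recovers for this particular kind of ID.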

Can humans grow up in zero gravity?

Gravity seems to be our first and most important teacher. Patient, consistent and always present. Children can learn cause and effect through it. What happens if I lift an object and let go? Over and over again. On repeat. Always the same thing.

When we think about life in space, can we raise children there? Can children grow up in zero gravity? Can they develop their mental abilities without having this teacher around them? Or will they develop in other ways? A different kind of logic?

Database-abstraction APIs should not exist

Database-abstraction APIs where you write a query using the host programming language should not exist. Or more precisely, should not have to exist. For example, in Django you can query the database using the following Python code:

Entry.objects.filter(is_draft=True)

Which Django translates (roughly) into the following SQL:

SELECT *
  FROM blog_entry
  WHERE is_draft = true;

But why can we not write an SQL query directly as an SQL query, while retaining all the other features Django offers through its database-abstraction API (database-agnostic code, inputs to queries and outputs from queries being Python objects, etc.)? I claim there is no longer any reason for that.
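As a small illustration of the idea, the following sketch writes the query as plain SQL while still passing Python values in and getting Python objects out. It uses the standard library `sqlite3` module instead of Django, and the `Entry` dataclass and the `query_entries` helper are hypothetical names introduced here. It does not address database-agnostic code.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    id: int
    title: str
    is_draft: bool


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blog_entry (id INTEGER PRIMARY KEY, title TEXT, is_draft INTEGER)"
)
conn.executemany(
    "INSERT INTO blog_entry (title, is_draft) VALUES (?, ?)",
    [("First post", False), ("Work in progress", True)],
)


def query_entries(sql: str, params: tuple = ()) -> List[Entry]:
    """Run a query written as SQL and map each result row to an Entry object."""
    rows = conn.execute(sql, params).fetchall()
    return [Entry(id=r[0], title=r[1], is_draft=bool(r[2])) for r in rows]


# The query is plain SQL; the input (True) and the outputs are Python objects.
drafts = query_entries(
    "SELECT id, title, is_draft FROM blog_entry WHERE is_draft = ?",
    (True,),
)
print(drafts)
```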

Towards Automatic Machine Learning Pipeline Design

I recently finished my PhD thesis and it is now available online. Most of the code related to the thesis is available in this repository.

In node.js, always query in JSON from PostgreSQL

Recently I was exploring the use of PostgreSQL as a replacement for MongoDB. Recent versions of PostgreSQL have great support for JSON. You can store JSON values and you can even create indices on JSON fields. Combined with node.js and its PostgreSQL driver, things look almost magical: you read from PostgreSQL and automatically get a JavaScript object, with JSON fields already embedded. But can we also use JSON for transporting the results of queries themselves, especially joins? In MongoDB the idea is to embed such related documents. In PostgreSQL we could also embed them instead of joining them, but would that be faster? I made a benchmark to get answers.

Reactive queries in PostgreSQL

I am a big fan of the application architecture promoted by Meteor. I like declarative programming: you describe what you want, not how, and the system does the rest. Reactive programming is very similar. You define how outputs should be computed from inputs, but when this is computed and how it is composed with other computations is left to the system. So you can define what is read from the database and sent to the client, and how it is read on the client, transformed, and sent to the UI library. The UI library can then render this data. And every time something changes, the rest gets automatically recomputed, refreshed, re-rendered.

Meteor is tightly linked with MongoDB. They developed a complex piece of technology to provide reactive queries. Reactive queries are queries which, after providing initial results, continue to provide any changes to those results as the input data used in the queries changes. While I like MongoDB, I still prefer the consistency tools provided by traditional SQL databases: transactions, foreign keys, joins, and triggers. They are close to declarative programming as well. You define relations between data once and then the system makes sure the data is consistent. I had to implement many of those features on top of MongoDB myself, for example in my package PeerDB.

This is why I made the reactive-postgres node.js package. It provides exactly such reactive queries, but for PostgreSQL, an open source database. Its API is simple, on purpose and because it should be. You provide a query, you get initial data, and then you get all changes. Try it out.
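The "initial data, then changes" contract can be illustrated with a deliberately naive Python sketch. It does not show the reactive-postgres API or how that package is implemented: it uses SQLite and simply re-runs the query and diffs the results, and `PolledQuery` is a hypothetical name. With this approach an updated row shows up as one insert plus one delete, and identical duplicate rows are collapsed.

```python
import sqlite3
from typing import List, Set, Tuple

Change = Tuple[str, tuple]  # ("insert" or "delete", row)


class PolledQuery:
    """Hypothetical helper: re-runs a query and reports rows added or removed."""

    def __init__(self, conn: sqlite3.Connection, sql: str) -> None:
        self._conn = conn
        self._sql = sql
        self._known: Set[tuple] = set()

    def refresh(self) -> List[Change]:
        # Compare the current result set with the one seen on the previous call.
        current = set(self._conn.execute(self._sql).fetchall())
        changes: List[Change] = [("insert", row) for row in sorted(current - self._known)]
        changes += [("delete", row) for row in sorted(self._known - current)]
        self._known = current
        return changes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO posts (title) VALUES ('Hello')")

query = PolledQuery(conn, "SELECT id, title FROM posts")
initial = query.refresh()  # the first call reports the initial data as inserts

conn.execute("INSERT INTO posts (title) VALUES ('World')")
conn.execute("DELETE FROM posts WHERE id = 1")
changes = query.refresh()  # later calls report only what changed

print(initial)
print(changes)
```

A real reactive query system pushes changes instead of waiting to be polled; the sketch only shows the shape of the data a consumer receives.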
