Recently I was exploring the use of PostgreSQL as a replacement for MongoDB.
Recent versions of PostgreSQL have great support for JSON: you can store JSON values
and even create indices on JSON fields. Combined with node.js and its driver,
things look almost magical: JSON fields are automatically parsed into JavaScript objects.
But can we also use JSON for transporting the results of queries themselves, especially joins?
In MongoDB the idea is to embed such related documents. In PostgreSQL we could also
embed them instead of joining them, but would that be faster?
I made a benchmark to get answers.
Source code and results are available, for both the JS driver and the native driver.
First I populated the database with two tables, posts and comments, where each comment
has a related post. I generated 10,000 posts, each with 100 comments.
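A minimal sketch of how such a dataset could be generated. The table and column names here are my assumptions, not necessarily the ones used in the benchmark:

```sql
-- Hypothetical schema: each comment references its post.
CREATE TABLE posts (
  id    serial PRIMARY KEY,
  title text NOT NULL,
  body  text NOT NULL
);

CREATE TABLE comments (
  id      serial PRIMARY KEY,
  post_id integer NOT NULL REFERENCES posts (id),
  body    text NOT NULL
);

-- 10,000 posts ...
INSERT INTO posts (title, body)
SELECT 'Title ' || i, 'Body ' || i
FROM generate_series(1, 10000) AS i;

-- ... each with 100 comments.
INSERT INTO comments (post_id, body)
SELECT p.id, 'Comment ' || c
FROM posts AS p, generate_series(1, 100) AS c;
```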
I tested these queries:

- JOIN + array_agg
- JOIN + to_json(array_agg)
- JOIN + to_jsonb(array_agg)
- SUBQUERY + array_agg
- SUBQUERY + to_json(array_agg)
- SUBQUERY + to_jsonb(array_agg)
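To illustrate the difference between the two families, here is roughly what the JOIN and SUBQUERY variants with to_json(array_agg) could look like. This is a sketch assuming a posts table and a comments table with a post_id foreign key; the benchmark's exact queries may differ:

```sql
-- JOIN variant: join comments to posts, then group and aggregate
-- all comments of each post into a JSON array.
SELECT p.*, to_json(array_agg(c)) AS comments
FROM posts AS p
JOIN comments AS c ON c.post_id = p.id
GROUP BY p.id;

-- SUBQUERY variant: aggregate comments in a correlated subquery,
-- so the post's columns are never repeated per comment.
SELECT p.*,
  (SELECT to_json(array_agg(c))
   FROM comments AS c
   WHERE c.post_id = p.id) AS comments
FROM posts AS p;
```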
In the results, green marks combinations I would recommend, while red shows that JSONB should be avoided.
My insights from the benchmark are:
- We can immediately see that converting query results to JSON in the database and then sending them over is
generally much faster than sending raw PostgreSQL fields. I believe this is because parsing JSON in node.js is much faster than parsing PostgreSQL's own wire format.
- Doing a traditional JOIN without aggregation is really slow. I attribute this to the fact that a traditional join repeats all
column values from the main table: for 100 comments, the related post's row is repeated 100 times.
Aggregating comments into an array really improves things. MongoDB's idea of embedding is really
powerful, and you can see its benefits even here.
- While using JSONB for storing JSON in PostgreSQL, and even having indices on JSON fields, is very cool, using it to convert and transport query results is slower than plain JSON and should be avoided.
- Subquery is faster than JOIN. A surprise, because the word is that subqueries can be at best as fast as joins, but
sometimes worse. Here, they are faster.
- It seems that the best approach is to use subqueries to get related documents, aggregate them into an array, and
convert the array to JSON. Or, you can be lazy and simply always convert the whole result (with aggregated arrays) to JSON.
- It is important to note that while using a subquery with plain aggregation looks fastest, those results are skewed: the
driver does not parse the embedded records. Parsing them properly would probably add quite a bit of overhead.
Conclusion. If you are using PostgreSQL fields that can reasonably be converted to JSON and you are using node.js,
it seems you should simply always convert results to JSON before sending them over to the client. If you want to fetch
related documents, do not do a JOIN; instead, use a subquery and aggregate the results into an array. If you are already converting
the whole result to JSON, then that is it. If not, then at least convert that aggregated array to JSON.
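Putting the conclusion together, the recommended query shape might look like this. Again a sketch under the same assumed schema (posts, comments, post_id); row_to_json converts each whole row, aggregated comment array included, into a single JSON value the driver can parse cheaply:

```sql
SELECT row_to_json(t) AS result
FROM (
  SELECT p.*,
    (SELECT array_agg(c)
     FROM comments AS c
     WHERE c.post_id = p.id) AS comments
  FROM posts AS p
) AS t;
```

Each row then arrives as one JSON document with the post's fields and an embedded comments array, much like a MongoDB document.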