Recently I was exploring the use of PostgreSQL as a replacement for MongoDB. PostgreSQL has in recent versions great support for JSON. You can store JSON values and you can even make indices on JSON fields. When combined with node.js and its driver things look almost magical. You read from PostgreSQL and you get automatically a JavaScript object, JSON fields automatically embedded. But can we also use JSON for transporting results of queries themselves, especially joins? In MongoDB the idea is to embed such related documents. In PostgreSQL we could also embed them instead of joining them, but would that be faster? I made a benchmark to get answers.
Source code and results are available, for JS driver and for native driver. First I populated the database with two tables, posts and comments, where each comment has a related post. I generated 10000 posts, each with 100 comments. I tested these queries.
js | native | |||||
---|---|---|---|---|---|---|
raw | json | jsonb | raw | json | jsonb | |
posts | 48.6 | 24.1 | 28.2 | 45.5 | 33.3 | 40.2 |
comments | 4208.5 | 2957.2 | 3752.9 | 4457.8 | 4636.8 | 5214.7 |
JOIN | 7387.2 | 4661.5 | 5747.4 | 8509.2 | 7137.4 | 8165.3 |
JOIN + array_agg | 2828.6 | 2967.4 | 3919.1 | 2956.3 | 4015.4 | 5091.2 |
JOIN + to_json(array_agg) | 2847.6 | 2822.4 | 4305.4 | 4278.5 | ||
JOIN + to_jsonb(array_agg) | 3640 | 4246.4 | 5015.8 | 5824.2 | ||
SUBQUERY + array_agg | 2696.4 | 2726.3 | 3530.5 | 3030.1 | 4125.5 | 4513.3 |
SUBQUERY + to_json(array_agg) | 2690.4 | 2729.9 | 3751.2 | 3765.6 | ||
SUBQUERY + to_jsonb(array_agg) | 3525.8 | 4776.6 | 4828.2 | 6047.6 |
Green color marks combinations I would recommend. Red color shows JSONB should be avoided.
My insights from the benchmark are:
Conclusion. If you are using PostgreSQL fields which can be reasonably converted to JSON and you are using node.js, it seems you should simply always convert results to JSON before sending them over to the client. If you want to fetch related documents, do not do JOIN but do a subquery and aggregate results into an array. If you are always converting results to JSON, then this is it. If not, then at least convert that aggregated array to JSON.