Written by David Rowley

February 8, 2024

PostgreSQL 16 introduces quite a few improvements to the query planner and makes many SQL queries run faster than they did on previous versions of PostgreSQL.

If you look at the PG16 release notes, you’ll see some of these planner improvements. But with the volume of changes made in each PostgreSQL release, it’s not possible to provide enough detail about each and every change. So maybe you might need a bit more detail to know what the change is about—before you understand if it’s relevant to you.

In this blog post, assuming you’ve already got a handle on the basics of EXPLAIN, you’ll get a deep dive into the 10 improvements made in the PostgreSQL 16 query planner. For each of the improvements to the PG16 planner (the planner is often called an optimizer in other relational databases), you’ll also get comparisons between PG15 and PG16 planner output—plus examples of what changed, in the form of a self-contained test you can try for yourself.

Let’s dive into these 10 improvements to the PostgreSQL planner in PG16:

Incremental sorts for DISTINCT queries Faster ORDER BY / DISTINCT aggregates Memoize for UNION ALL queries Support Right Anti Join Parallel Hash Full and Right Joins Optimize window function frame clauses Optimize various window functions JOIN removals for partitioned tables Short circuit trivial DISTINCT queries Incremental Sort after Merge Join, in more cases 1. Allow incremental sorts in more cases, including DISTINCT (David Rowley) Incremental sorts were first added in PostgreSQL 13. These incremental sorts reduce the effort required to get sorted results. How? By exploiting the knowledge that some given result set is already sorted by 1 or more leading columns—and only performing a sort on the remaining columns.

For example, if there’s a btree index on column a and we need the rows ordered by a,b, then we can use the btree index (which provides presorted results on column a) and sort the rows seen so far only when the value of a changes. With the quicksort algorithm used by PostgreSQL, sorting many smaller groups is more efficient than sorting one large group.

The PostgreSQL 16 query planner now considers performing incremental sorts for SELECT DISTINCT queries. Prior to PG16, when the sorting method was chosen for SELECT DISTINCT queries, the planner only considered performing a full sort (which is more expensive than an incremental sort.)

— Setup
CREATE TABLE distinct_test (a INT, b INT);
INSERT INTO distinct_test
SELECT x,1 FROM generate_series(1,1000000)x;
CREATE INDEX on distinct_test(a);
VACUUM ANALYZE distinct_test;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT DISTINCT a,b FROM distinct_test;
PG15 EXPLAIN output QUERY PLAN
—————————————————————
HashAggregate (actual rows=1000000 loops=1)
Group Key: a, b
Batches: 81 Memory Usage: 11153kB Disk Usage: 31288kB
-> Seq Scan on distinct_test (actual rows=1000000 loops=1)
Planning Time: 0.065 ms
Execution Time: 414.226 ms
(6 rows)
PG16 EXPLAIN output QUERY PLAN
——————————————————————
Unique (actual rows=1000000 loops=1)
-> Incremental Sort (actual rows=1000000 loops=1)
Sort Key: a, b
Presorted Key: a
Full-sort Groups: 31250 Sort Method: quicksort Average Memory: 26kB Peak Memory: 26kB
-> Index Scan using distinct_test_a_idx on distinct_test (actual rows=1000000 loops=1)
Planning Time: 0.108 ms
Execution Time: 263.167 ms
(8 rows)
In the PostgreSQL 16 EXPLAIN output above, you can see the planner chose to use the distinct_test_a_idx index on the a column and then performed an Incremental Sort to sort all of the equal values of a by b. The Presorted Key: a indicates this. Because the INSERT statements above only added a single value of b for each value of a, each batch of tuples sorted by incremental sort only contains a single row.

The EXPLAIN output for PostgreSQL 16 above shows that the Peak Memory for the Incremental Sort was just 26 kilobytes, whereas the hashing method used by PostgreSQL 15 needed much memory, so much so that it needed to spill about 30 megabytes of data to disk. The query executed 63% faster on PostgreSQL 16.

2. Add the ability for aggregates having ORDER BY or DISTINCT to use pre-sorted data (David Rowley) In PostgreSQL 15 and earlier, aggregate functions containing an ORDER BY or DISTINCT clause would result in the executor always performing a sort inside the Aggregate node of the plan. Because the sort was always performed, the planner would never try to form a plan to provide presorted input to aggregate the rows in order.

The PostgreSQL 16 query planner now tries to form a plan which feeds the rows to the plan’s Aggregate node in the correct order. And the executor is now smart enough to recognize this and forego performing the sort itself when the rows are already pre-sorted in the correct order.

— Setup
CREATE TABLE aggtest (a INT, b text);
INSERT INTO aggtest SELECT a,md5((b%100)::text) FROM generate_series(1,10) a, generate_series(1,100000)b;
CREATE INDEX ON aggtest(a,b);
VACUUM FREEZE ANALYZE aggtest;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, BUFFERS)
SELECT a,COUNT(DISTINCT b) FROM aggtest GROUP BY a;
PG15 EXPLAIN output QUERY PLAN
—————————————————————
GroupAggregate (actual rows=10 loops=1)
Group Key: a
Buffers: shared hit=892, temp read=4540 written=4560
-> Index Only Scan using aggtest_a_b_idx on aggtest (actual rows=1000000 loops=1)
Heap Fetches: 0
Buffers: shared hit=892
Planning Time: 0.122 ms
Execution Time: 302.693 ms
(8 rows)
PG16 EXPLAIN output QUERY PLAN
—————————————————————
GroupAggregate (actual rows=10 loops=1)
Group Key: a
Buffers: shared hit=892
-> Index Only Scan using aggtest_a_b_idx on aggtest (actual rows=1000000 loops=1)
Heap Fetches: 0
Buffers: shared hit=892
Planning Time: 0.061 ms
Execution Time: 115.534 ms
(8 rows)
Aside from PostgreSQL 16 executing the query over twice as fast as in PG15, the only indication of this change in the EXPLAIN ANALYZE output above is from the temp read=4540 written=4560 that’s not present in the PostgreSQL 16 output. In PG15, this is caused by the implicit sort spilling to disk.

3. Allow memoize atop a UNION ALL (Richard Guo) Memoize plan nodes were first introduced in PostgreSQL 14. The Memoize plan node acts as a cache layer between a parameterized Nested Loop and the Nested Loop’s inner side. When the same value needs to be looked up several times, Memoize can give a nice performance boost as it can skip executing its subnode when the required rows have been queried already and are cached.

The PostgreSQL 16 query planner will now consider using Memoize when a UNION ALL query appears on the inner side of a parameterized Nested Loop.

— Setup
CREATE TABLE t1 (a INT PRIMARY KEY);
CREATE TABLE t2 (a INT PRIMARY KEY);
CREATE TABLE lookup (a INT);

INSERT INTO t1 SELECT x FROM generate_Series(1,10000) x;
INSERT INTO t2 SELECT x FROM generate_Series(1,10000) x;
INSERT INTO lookup SELECT x%10+1 FROM generate_Series(1,1000000)x;

ANALYZE t1,t2,lookup;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) t
INNER JOIN lookup l ON l.a = t.a;
PG15 EXPLAIN output QUERY PLAN
——————————————————————————-
Nested Loop (actual rows=2000000 loops=1)
-> Seq Scan on lookup l (actual rows=1000000 loops=1)
-> Append (actual rows=2 loops=1000000)
-> Index Only Scan using t1_pkey on t1 (actual rows=1 loops=1000000)
Index Cond: (a = l.a)
Heap Fetches: 1000000
-> Index Only Scan using t2_pkey on t2 (actual rows=1 loops=1000000)
Index Cond: (a = l.a)
Heap Fetches: 1000000
Planning Time: 0.223 ms
Execution Time: 1926.151 ms
(11 rows)
PG16 EXPLAIN output QUERY PLAN
———————————————————————————
Nested Loop (actual rows=2000000 loops=1)
-> Seq Scan on lookup l (actual rows=1000000 loops=1)
-> Memoize (actual rows=2 loops=1000000)
Cache Key: l.a
Cache Mode: logical
Hits: 999990 Misses: 10 Evictions: 0 Overflows: 0 Memory Usage: 2kB
-> Append (actual rows=2 loops=10)
-> Index Only Scan using t1_pkey on t1 (actual rows=1 loops=10)
Index Cond: (a = l.a)
Heap Fetches: 10
-> Index Only Scan using t2_pkey on t2 (actual rows=1 loops=10)
Index Cond: (a = l.a)
Heap Fetches: 10
Planning Time: 0.229 ms
Execution Time: 282.120 ms
(15 rows)
In the PostgreSQL 16 EXPLAIN output above, you can see the Memoize node is put atop of the Append node—which caused a reduction in the number of loops in the Append from 1 million in PG15 down to 10 in PG16. Each time the Memoize node has a cache hit, there’s no need to execute the Append to fetch records. This results in the query running around 6 times faster on PostgreSQL 16.

4. Allow anti-joins to be performed with the non-nullable input as the inner relation (Richard Guo) When performing a Hash Join for an INNER JOIN, PostgreSQL prefers to build the hash table on the smaller of the two tables. Smaller hash tables are better as it’s less work to build them. Smaller tables are also better as they’re more cache-friendly for the CPU, and it’s less likely that the CPU will stall while waiting for data to arrive from main memory.

In PostgreSQL versions before 16, an Anti Join—as you might see if you use NOT EXISTS in your queries—would always put the table mentioned in the NOT EXISTS part on the inner side of the join. This meant there was no flexibility to hash the smaller of the two tables, resulting in possibly having to build a hash table on the larger table.

The PostgreSQL 16 query planner can now choose to hash the smaller of the two tables. This can now be done because PostgreSQL 16 supports Right Anti Join.

— Setup
CREATE TABLE small(a int);
CREATE TABLE large(a int);
INSERT INTO small
SELECT a FROM generate_series(1,100) a;
INSERT INTO large
SELECT a FROM generate_series(1,1000000) a;
VACUUM ANALYZE small,large;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM small s
WHERE NOT EXISTS(SELECT 1 FROM large l WHERE s.a = l.a);
PG15 EXPLAIN output QUERY PLAN
—————————————————————
Hash Anti Join (actual rows=0 loops=1)
Hash Cond: (s.a = l.a)
-> Seq Scan on small s (actual rows=100 loops=1)
-> Hash (actual rows=1000000 loops=1)
Buckets: 262144 Batches: 8 Memory Usage: 6446kB
-> Seq Scan on large l (actual rows=1000000 loops=1)
Planning Time: 0.103 ms
Execution Time: 139.023 ms
(8 rows)
PG16 EXPLAIN output QUERY PLAN
———————————————————–
Hash Right Anti Join (actual rows=0 loops=1)
Hash Cond: (l.a = s.a)
-> Seq Scan on large l (actual rows=1000000 loops=1)
-> Hash (actual rows=100 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 12kB
-> Seq Scan on small s (actual rows=100 loops=1)
Planning Time: 0.094 ms
Execution Time: 77.076 ms
(8 rows)
You can see from the EXPLAIN ANALYZE output above that due to PG16’s planner opting to use a Hash Right Anti Join, the Memory Usage in PostgreSQL 16 was much less than in PostgreSQL 15 and the Execution Time was almost halved.

5. Allow parallelization of FULL and internal right OUTER hash joins (Melanie Plageman, Thomas Munro) PostgreSQL 11 saw the introduction of Parallel Hash Join. This allows multiple parallel workers in a parallel query to assist in the building of a single hash table. In versions prior to 11, each worker would have built its own identical hash table, resulting in additional memory overheads.

In PostgreSQL 16, Parallel Hash Join has been improved and now supports FULL and RIGHT join types. This allows queries that have a FULL OUTER JOIN to be executed in parallel and also allows Right Joins plans to execute in parallel.

— Setup
CREATE TABLE odd (a INT);
CREATE TABLE even (a INT);
INSERT INTO odd
SELECT a FROM generate_series(1,1000000,2) a;
INSERT INTO even
SELECT a FROM generate_series(2,1000000,2) a;
VACUUM ANALYZE odd, even;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT COUNT(o.a),COUNT(e.a) FROM odd o FULL JOIN even e ON o.a = e.a;
PG15 EXPLAIN output QUERY PLAN
——————————————————————-
Aggregate (actual rows=1 loops=1)
-> Hash Full Join (actual rows=1000000 loops=1)
Hash Cond: (o.a = e.a)
-> Seq Scan on odd o (actual rows=500000 loops=1)
-> Hash (actual rows=500000 loops=1)
Buckets: 262144 Batches: 4 Memory Usage: 6439kB
-> Seq Scan on even e (actual rows=500000 loops=1)
Planning Time: 0.079 ms
Execution Time: 220.677 ms
(9 rows)
PG16 EXPLAIN output QUERY PLAN
——————————————————————————–
Finalize Aggregate (actual rows=1 loops=1)
-> Gather (actual rows=2 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Partial Aggregate (actual rows=1 loops=2)
-> Parallel Hash Full Join (actual rows=500000 loops=2)
Hash Cond: (o.a = e.a)
-> Parallel Seq Scan on odd o (actual rows=250000 loops=2)
-> Parallel Hash (actual rows=250000 loops=2)
Buckets: 262144 Batches: 4 Memory Usage: 6976kB
-> Parallel Seq Scan on even e (actual rows=250000 loops=2)
Planning Time: 0.161 ms
Execution Time: 129.769 ms
(13 rows)
The EXPLAIN output shows that PostgreSQL 16 was able to perform the join in parallel and this resulted in a significant reduction in the query’s Execution Time.

6. Allow window functions to use faster ROWS mode when RANGE mode active but unnecessary (David Rowley) When a query contains a window function such as row_number(), rank(), dense_rank(), percent_rank(), cume_dist() and ntile(), if the window clause did not specify the ROWSoption, then PostgreSQL would always use the default RANGE option. The RANGE option causes the executor to look ahead until it finds the first “non-peer” row. A peer row is a row in the window frame which compares equally according to the window clause’s ORDER BY clause. When there is no ORDER BY clause, all rows within the window frame are peers. When processing records which have many rows which sort equally according to the window clause’s ORDER BY clause, the additional processing to identify these peer rows could be costly.

The window functions mentioned above don’t behave any differently whether ROWS or RANGE is specified in the query’s window clause. However, the executor in PostgreSQL versions prior to 16 didn’t know that, and because some window functions do care about the ROWS/RANGE option, the executor had to perform checks for peer rows in all cases.

The PostgreSQL 16 query planner knows which window functions care about the ROWS/RANGE option and it passes this information along to the executor so that it can skip the needless additional processing.

This optimization works particularly well when row_number() is being used to limit the number of results in the query as shown in the example below.

— Setup
CREATE TABLE scores (id INT PRIMARY KEY, score INT);
INSERT INTO scores SELECT s,random()*10 FROM generate_series(1,1000000)s;
CREATE INDEX ON scores(score);
VACUUM ANALYZE scores;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT * FROM (
SELECT id,ROW_NUMBER() OVER (ORDER BY score) rn,score
FROM scores
) m WHERE rn Index Only Scan using part_tab_p1_pkey on part_tab_p1 pt_2 (actual rows=0 loops=1)
Heap Fetches: 0
-> Sort (actual rows=0 loops=1)
Sort Key: nt.part_tab_id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on normal_table nt (actual rows=0 loops=1)
Planning Time: 0.325 ms
Execution Time: 0.037 ms
(14 rows)
PG16 EXPLAIN output QUERY PLAN
—————————————————–
Seq Scan on normal_table nt (actual rows=0 loops=1)
Planning Time: 0.244 ms
Execution Time: 0.015 ms
(3 rows)
The important thing to notice here is that the PostgreSQL 16 plan does not include a join to part_tab meaning all there is to do is scan normal_table.

9. Use Limit instead of Unique to implement DISTINCT, when possible (David Rowley) The PostgreSQL query planner is able to avoid including plan nodes to de-duplicate the results when it can detect that all rows contain the same value. Detecting this is trivial and when the optimization can be applied it can result in huge performance gains.

— Setup
CREATE TABLE abc (a int, b int, c int);
INSERT INTO abc SELECT a%10,a%10,a%10 FROM generate_series(1,1000000)a;
VACUUM ANALYZE abc;

EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF)
SELECT DISTINCT a,b,c FROM abc WHERE a = 5 AND b = 5 AND c = 5;
PG15 EXPLAIN output QUERY PLAN
————————————————————————
Unique (actual rows=1 loops=1)
-> Gather (actual rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Unique (actual rows=1 loops=3)
-> Parallel Seq Scan on abc (actual rows=33333 loops=3)
Filter: ((a = 5) AND (b = 5) AND (c = 5))
Rows Removed by Filter: 300000
Planning Time: 0.114 ms
Execution Time: 30.381 ms
(10 rows)
PG16 EXPLAIN output QUERY PLAN
—————————————————
Limit (actual rows=1 loops=1)
-> Seq Scan on abc (actual rows=1 loops=1)
Filter: ((a = 5) AND (b = 5) AND (c = 5))
Rows Removed by Filter: 4
Planning Time: 0.109 ms
Execution Time: 0.025 ms
(6 rows)
If you look carefully at the SQL query, you’ll notice that each column in the DISTINCT clause also has an equality condition in the WHERE clause. This means that all the output rows in the query will have the same values in each column. The PostgreSQL 16 query planner is able to take advantage of this knowledge and simply LIMIT the query results to 1 row. PostgreSQL 15 produced the same query result by reading the entire results and using the Unique operator to reduce all the rows down to a single row. The Execution Time for PostgreSQL 16 was more than 1200 times faster than PostgreSQL 15.

10. Relax overly strict rules in select_outer_pathkeys_for_merge() (David Rowley) Before PostgreSQL 16, when the query planner considered performing a Merge Join, it would check if the sort order of the merge would suit any upper-level plan operation (such as DISTINCT, GROUP BY or ORDER BY) and only use that order if it matched the requirements for the upper-level exactly. This ch
Read More