-
Notifications
You must be signed in to change notification settings - Fork 754
Description
sqlancer found the following error, where a query rewritten based on TLP (Ternary Logic Partitioning) returns different cardinalities. It looks like the problem is caused by the condition (num_nulls(t1.c0) IS DISTINCT FROM t1.c0) being interpreted differently when they are pushed down to workers.
We have the following distributed tables:
...
SELECT create_distributed_table('t1', 'c0');
SELECT create_distributed_table('t2', 'c1');
SELECT create_distributed_table('t4', 'c0');
SELECT create_distributed_table('t5', 'c0');
...I tried to simplify the original sqlancer query as a smaller one which can still reproduce the error:
Query 1:
SELECT t5.c0
FROM t1*, t0
LEFT OUTER JOIN t5* ON (True)
-- cardinality: 52Query 2
SELECT ALL t5.c0
FROM t1, t0
LEFT OUTER JOIN t5* ON (True)
WHERE (num_nulls(t1.c0) IS DISTINCT FROM t1.c0)
UNION ALL
-- PART 2: WHERE condition IS FALSE (NOT condition)
SELECT ALL t5.c0
FROM t1*, t0*
LEFT OUTER JOIN t5 ON (True)
WHERE NOT (num_nulls(t1.c0) IS DISTINCT FROM t1.c0)
UNION ALL
-- PART 3: WHERE condition IS NULL
SELECT ALL t5.c0
FROM t1*, t0
LEFT OUTER JOIN t5* ON (True)
WHERE (num_nulls(t1.c0) IS DISTINCT FROM t1.c0) IS NULL;
-- cardinality: 44
-- ========================================
-- TLP EXPLANATION:
-- The TLP technique transforms a query Q into Q₁ ∪ Q₂ ∪ Q₃ where:
-- Q₁: SELECT ... WHERE condition IS TRUE (condition evaluates to TRUE)
-- Q₂: SELECT ... WHERE condition IS FALSE (condition evaluates to FALSE)
-- Q₃: SELECT ... WHERE condition IS NULL (condition evaluates to NULL)
--
-- In SQL's three-valued logic, every condition can evaluate to TRUE, FALSE, or NULL.
-- The union of these three partitions should be equivalent to the original query.
-- This is a fundamental property used by SQLancer to detect query processing bugs.
--
-- BUG DETECTED: Original query returns 52 rows, TLP query returns 44 rows!
-- This might indicate a bug in Citus's distributed query processing.
-- ========================================
Please see the attached file database5.log for a full log of sqlancer, which can be used to reproduce the case.
Note: If we comment out the create_distribute_table commands, both queries return 52 rows. So, this works in vanilla postgres.
Postgres version 17.7, citus version:
test=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------
citus | 14.0-1 | pg_catalog | Citus distributed database
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)
update: the error also exists in citus 13.2-1 and 13.1.1