REINDEX
ingby Christoph Schiessl on PostgreSQL
Last week, I published an article explaining how to alphabetically order by columns using a custom ENUM
type. For illustration, I defined the type marital_status
and the table people
.
postgres=# CREATE TYPE marital_status_type
postgres-# AS ENUM ('single', 'married', 'divorced', 'other');
CREATE TYPE
postgres=# CREATE TABLE people (
postgres(# id SERIAL PRIMARY KEY,
postgres(# marital_status marital_status_type NOT NULL
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX index_people_on_marital_status
postgres-# ON people (marital_status);
CREATE INDEX
Furthermore, I also defined an IMMUTABLE
function to typecast the ENUM
values to TEXT
so that I could create an index for the alphabetical ordering.
postgres=# CREATE FUNCTION marital_status_as_text(marital_status marital_status_type)
postgres-# RETURNS TEXT LANGUAGE SQL IMMUTABLE AS $sql$ SELECT CAST(marital_status AS TEXT); $sql$;
CREATE FUNCTION
postgres=# CREATE INDEX index_people_on_marital_status_text
postgres-# ON people (marital_status_as_text(marital_status));
CREATE INDEX
This function was needed because the typecast itself is mutable! You can think about this typecast as a function whose output depends on its single parameter and external factors. Another way to say this is that the typecast to TEXT
is not referentially transparent precisely because of its dependency on factors other than its parameter(s). You may get different results if you call the function multiple times, even though you use the same parameters. In PostgreSQL terminology, referentially transparent functions are called IMMUTABLE
; certain operations work only with such functions. The creation of INDEX
es is a prime example of this restriction.
postgres=# INSERT INTO people (marital_status)
postgres-# VALUES ('single'), ('married'), ('divorced'), ('other');
INSERT 0 4
postgres=# SET SESSION enable_seqscan = off;
SET
postgres=# -- Query using the index `index_people_on_marital_status` ...
postgres=# SELECT * FROM people WHERE marital_status = 'other'::marital_status_type;
id | marital_status
----+----------------
4 | other
(1 row)
postgres=# -- Query using the index `index_people_on_marital_status_text` ...
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::TEXT;
id | marital_status
----+----------------
4 | other
(1 row)
So, we told PostgreSQL the function marital_status_as_text
is IMMUTABLE
, which was needed so that the function could be used to define an INDEX
. However, given that the function is just a wrapper around the typecast, which is not referentially transparent, it cannot be IMMUTABLE
. So, essentially, we have lied to PostgreSQL, but I have not yet explained the external factors that influence the typecast. Well, the typecast returns the name of the ENUM
value as it was defined when we created the custom type. This is a problem because these values can be renamed later on. For instance, we may rename the value other
to unknown
.
postgres=# ALTER TYPE marital_status_type
postgres-# RENAME VALUE 'other' TO 'unknown';
ALTER TYPE
The problem arises because we already had data in the people
table before the rename. Now, we have a funny situation — note that I'm setting enable_seqscan=off
to force the Query Planner to use our indexes:
postgres=# SET SESSION enable_seqscan = off;
SET
postgres=# SELECT * FROM people WHERE marital_status = 'unknown'::marital_status_type;
id | marital_status
----+----------------
4 | unknown
(1 row)
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'unknown'::TEXT;
id | marital_status
----+----------------
(0 rows)
So, we get different results for essentially the same query. If the query uses the ENUM
value directly, we get a single row as a result, but if we search based on the corresponding TEXT
value, we get an empty result set. This is clearly not the expected behavior! But it gets even worse because depending on the planned query (with or without index usage), we can also get different results for one-to-one the same query.
postgres=# -- turn off sequential scans to force index scans ...
postgres=# SET SESSION enable_seqscan = off;
SET
postgres=# SET SESSION enable_indexscan = DEFAULT;
SET
postgres=# SET SESSION enable_bitmapscan = DEFAULT;
SET
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::TEXT;
id | marital_status
----+----------------
4 | unknown
(1 row)
postgres=# -- turn off index scans to force sequential scans ...
postgres=# SET SESSION enable_seqscan = DEFAULT;
SET
postgres=# SET SESSION enable_indexscan = off;
SET
postgres=# SET SESSION enable_bitmapscan = off;
SET
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::TEXT;
id | marital_status
----+----------------
(0 rows)
Crazy stuff, right? The data tracked by the index is out of sync with the data in the table — the index is now said to be corrupted. This is 100% our fault because we incorrectly declared our function as IMMUTABLE
, and PostgreSQL took us at our word. So, there is no way it could have known that renaming an ENUM
value may affect the index. Anyway, you should never do something like this in the first place, but what should you do if you nonetheless find yourself in a situation like this? Well, there's only one thing that you can do, really, ...
REINDEX
ingYour only option is to rebuild the corrupted indexes. In this case, we have only one corrupted index, and we can start a rebuild with a simple REINDEX
statement.
postgres=# REINDEX INDEX index_people_on_marital_status_text;
REINDEX
We can rerun the same experiment as before to confirm that the index is back in sync with the table's data.
postgres=# SET SESSION enable_seqscan = off;
SET
postgres=# SET SESSION enable_indexscan = DEFAULT;
SET
postgres=# SET SESSION enable_bitmapscan = DEFAULT;
SET
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::text;
id | marital_status
----+----------------
(0 rows)
postgres=# SET SESSION enable_seqscan = DEFAULT;
SET
postgres=# SET SESSION enable_indexscan = off;
SET
postgres=# SET SESSION enable_bitmapscan = off;
SET
postgres=# SELECT * FROM people WHERE marital_status_as_text(marital_status) = 'other'::text;
id | marital_status
----+----------------
(0 rows)
VoilĂ ! The problem is fixed, but not its root cause. If we now renamed another ENUM
value, we would run into the same problem again. The lesson here is that you must be careful when defining custom functions. Namely, you can run into issues if you set your functions' volatility class to be stricter than they really are.
That's everything for today! Thank you for reading, and see you soon!
I send two weekly emails on building performant and resilient Web Applications with Python, JavaScript and PostgreSQL. No spam. Unscubscribe at any time.
ENUM
Type Columns and ORDER BY
Learn how custom ENUM
types in PostgreSQL can affect SELECT
queries and ORDER BY
clauses. Be mindful of indexing and typecasting.
By Christoph Schiessl on PostgreSQL
Learn how to manipulate PostgreSQL's query planner to force it to use your indexes while working on optimizing the performance of your queries.
By Christoph Schiessl on PostgreSQL
MATERIALIZED VIEW
s
Learn how to use PostgreSQL's MATERIALIZED VIEW
s to improve performance of complex queries. Persist query results and refresh them manually or automatically.
By Christoph Schiessl on PostgreSQL