Yesterday I spent two very unpleasant hours debugging the weirdest SQL error I’ve seen in my life, running the below query (simplified for this post).
psql -qAt --no-psqlrc <<BACKUP
DO
$$
DECLARE r record;
BEGIN
RAISE INFO '%', 'info';
END
$$;
BACKUP
Running this in your terminal will result in a nasty syntax error.
ERROR: syntax error at or near "1111"
LINE 2: 1111
^
ERROR: syntax error at or near "RAISE"
LINE 2: RAISE INFO '%', 'info';
^
ERROR: syntax error at or near "1111"
LINE 2: 1111;
You stare on the screen for a while, absolutely sure that number 1111
is nowhere close to the data you work with. You try again. Another error. You save the code into a file and try again. It works. What the heck? You try again using the bash heredoc. Another failure.
The minute you realize $$
is being substituted with the ID of the current process, you feel like the dumbest person on Earth. Yet the happiest one at the same time.
The solution is trivial.
psql -qAt --no-psqlrc <<BACKUP
DO
\$\$
DECLARE r record;
BEGIN
RAISE INFO '%', 'info';
END
\$\$;
BACKUP
Thanks to pg_upgrade
tool the PostgreSQL upgrade on Ubuntu is pretty straightforward. Different PostGIS versions might cause troubles though. This post covers PostgreSQL 9.5, PostGIS 2.2 to PostgreSQL 9.6, PostGIS 2.3 migration.
First of all, install the PostgreSQL 9.6 with PostGIS 2.3.
apt install postgresql-9.6 postgresql-9.6-postgis-2.3
Mind that newly installed database cluster runs on port 5433
.
If you run pg_upgrade
at this stage, it will fail with the following error.
could not load library "$libdir/postgis_topology-2.2":
ERROR: could not access file "$libdir/postgis_topology-2.2": No such file or directory
pg_upgrade
can’t run the upgrade because PostGIS versions don’t match. Install the PostGIS 2.3 for PostgreSQL 9.5 and update extensions in all your databases.
apt install postgresql-9.5-postgis-2.3
:::sql
ALTER EXTENSION postgis UPDATE;
With both clusters using the same PostGIS version, the upgrade can begin. First, stop them with
Then, run the actual pg_upgrade
command as postgres
user. Make sure the pg_hba.conf
file is set to allow local connections.
/usr/lib/postgresql/9.6/bin/pg_upgrade \
-b /usr/lib/postgresql/9.5/bin/ \
-B /usr/lib/postgresql/9.6/bin/ \
-d /var/lib/postgresql/9.5/main \
-D /var/lib/postgresql/9.6/main \
-o ' -c config_file=/etc/postgresql/9.5/main/postgresql.conf' \
-O ' -c config_file=/etc/postgresql/9.6/main/postgresql.conf'
The following result means the upgrade was smooth.
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
Checking database user is the install user ok
Checking database connection settings ok
Checking for prepared transactions ok
Checking for reg* system OID user data types ok
Checking for contrib/isn with bigint-passing mismatch ok
Checking for roles starting with 'pg_' ok
Creating dump of global objects ok
Creating dump of database schemas
ok
Checking for presence of required libraries ok
Checking database user is the install user ok
Checking for prepared transactions ok
If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.
Performing Upgrade
------------------
Analyzing all rows in the new cluster ok
Freezing all rows on the new cluster ok
Deleting files from new pg_clog ok
Copying old pg_clog to new server ok
Setting next transaction ID and epoch for new cluster ok
Deleting files from new pg_multixact/offsets ok
Copying old pg_multixact/offsets to new server ok
Deleting files from new pg_multixact/members ok
Copying old pg_multixact/members to new server ok
Setting next multixact ID and offset for new cluster ok
Resetting WAL archives ok
Setting frozenxid and minmxid counters in new cluster ok
Restoring global objects in the new cluster ok
Restoring database schemas in the new cluster
ok
Copying user relation files
ok
Setting next OID for new cluster ok
Sync data directory to disk ok
Creating script to analyze new cluster ok
Creating script to delete old cluster ok
Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
./analyze_new_cluster.sh
Running this script will delete the old cluster's data files:
./delete_old_cluster.sh
The old cluster can be removed and the new one switched back to port 5432
. Run /usr/lib/postgresql/9.6/bin/vacuumdb -p 5433 --all --analyze-in-stages
to collect statistics.
Spatial indexes are absolutely crucial part of any spatial database and - as I tend to say quite often - only a fool would try to query spatial data without building spatial indexes beforehand.
Spatial indexes are based on bounding box comparisons, which are generally very fast. Yet, there are situations when spatial indexes don’t help much (or they don’t help as much as they could, if you wish).
Bounding box comparisons are effective with lots of small bounding boxes rather then few large ones. Why? See the picture above. The curved line (imagine it’s a pipeline for example) clearly demonstrates when the spatial index/bounding box comparison might fall short of what you’d expect.
Once the bounding box gets really big, it intersects so many other geometries’ bounding boxes that the whole comparison starts to slow down.
Luckily, PostGIS 2.2 introduced a ST_Subdivide function that can lend a helping hand in here.
Until today, we delivered the parcel geometries into our real estate acquisition process system with the following query, that takes all the geometries from the req_geom
table (pipelines, remember?) and intersects them with cadastral parcels. The second part of the query adds those parcels that haven’t been digitalized and were created manually by one of my workmates.
INSERT INTO requested_parcels (uid, par_id)
SELECT
reqs.uid,
b.id par_id
FROM
running_requests reqs
JOIN
req_geom a ON (reqs.uid = a.uid)
JOIN
pargeo b ON (ST_Intersects(a.geom, b.geom))
UNION
SELECT
reqs.uid,
a.idpar::numeric
FROM
running_requests reqs
JOIN
req_man a ON (reqs.uid = a.uid);
It’s a perfectly standard query that intersects several request geometries with ~20M parcels, nothing really fancy. Except that it takes 25 minutes to finish. Why? Pipelines, remember?
Yet, the query below takes only 30 seconds to finish (that’s a huge time saver considering that the whole process used to take ~40 minutes)! Why? Because the ST_Subdivide
effectively shrinks the req_geom
geometries until they have 50 vertices each at most. Such small geometries are perfect input for the bounding box comparison. Remember to call DISTINCT
when using ST_Subdivide
, you’d probably get duplicate parcel ids otherwise.
I also replaced the UNION
with the WHERE NOT EXISTS
expression, as it’s reasonable to assume that numeric ids comparison will be faster.
INSERT INTO requested_parcels (uid, par_id)
SELECT DISTINCT
reqs.uid,
b.id par_id
FROM
running_requests reqs
JOIN
(
SELECT
uid,
ST_Subdivide(geom, 50) geom
FROM
req_geom
) a ON (reqs.uid = a.uid)
JOIN
pargeo b ON (ST_Intersects(a.geom, b.geom));
INSERT INTO requested_parcels (uid, par_id)
SELECT
reqs.uid,
a.idpar::numeric
FROM
running_requests reqs
JOIN
req_man a ON (reqs.uid = a.uid)
WHERE NOT EXISTS (
SELECT 1
FROM pozadovane_parcely pp
WHERE pp.par_id = a.idpar
);
I got to count occurrences of / character today and found out no built-in function exists in PostgreSQL, so here’s my shot at it. Pretty simple, yet useful.
CREATE OR REPLACE FUNCTION how_many(IN text, IN varchar, OUT integer)
RETURNS integer
AS
$how_many$
SELECT length($1) - length(replace($1, $2, ''));
$how_many$
LANGUAGE SQL
SECURITY DEFINER;
-- SELECT how_many('test', 't'); -- returns number 2