Articles tagged with postgis tag
Among all the sensitive spatial data being collected through cellphones and credit cards, our address of residency is probably the most delicate one. Can it be anonymized/pseudonymized/obscured before you share it with your business partners?
Imagine given a set of address points for each of your clients and the set of all address points in the country, you should adjust it in the following way:
- find the two nearest address points for each address point of your client
- find the center of these two and the client address point
- measure the distance of the computed center to each of three points and keep the maximum value
- make the biggest distance even bigger by adding 10 % of its value
- ceil the value
- output the new position and the ceiled distance
This shifts each address point by a dynamic distance, giving us at least three points within the given distance (one of them being the original address point).
SELECT
tmp.code,
ST_X(tmp.new_position) x,
ST_Y(tmp.new_position) y,
ceil(MAX(biggest_distance) + MAX(biggest_distance) * 0.1) round_distance
FROM (
SELECT
tmp.code,
tmp.geom,
ST_Centroid((ST_Union(two_closest_points, tmp.geom))) new_position,
-- get distance to two closest points and the client address point
ST_Centroid((ST_Union(two_closest_points, tmp.geom))) <-> (ST_DumpPoints(ST_Union(two_closest_points, tmp.geom))).geom biggest_distance
FROM (
SELECT
r1.code,
r1.geom,
ST_Union(neighbours.geom) two_closest_points
FROM address_points r1,
LATERAL (
-- keep two closest points to each client address point
SELECT
r2.code,
r2.geom,
r1.geom <-> r2.geom distance
FROM address_points r2
WHERE r1.code <> r2.code
ORDER BY r1.geom <-> r2.geom ASC
LIMIT 2
) neighbours
GROUP BY
r1.code,
r1.geom
) tmp
) tmp
GROUP BY
tmp.code,
tmp.geom,
tmp.new_position;
You might want to use LATERAL
for tasks like this.
PostGIS upgrades used to be a nightmare. Broken dependencies, version mismatches, you name it. Upgrading PostgreSQL 10 with PostGIS 2.4 to PostgreSQL 11 on CentOS has been my mission impossible for two days. And it doesn’t seem to come to an end.
What? Why?
We’re running fairly large spatially enabled PostgreSQL 10 database cluster. To keep up with pretty fast development, I was hoping to pg_upgrade
it to PostgreSQL 11.
Tried and failed
I’ve been trying different upgrade strategies with PostgreSQL 11 already running to no avail. Here comes the list.
Install PostGIS 2.4 to PostgreSQL 11 and pg_upgrade
yum install postgis24_11
systemctl stop postgresql-11
su postgres
/usr/pgsql-11/bin/pg_upgrade \
--check \
-b /usr/pgsql-10/bin/ -B /usr/pgsql-11/bin/ \
-d /var/lib/pgsql/10/data -D /var/lib/pgsql/11/data \
--link \
-U root \
-o ' -c config_file=/var/lib/pgsql/10/data/postgresql.conf' -O ' -c config_file=/var/lib/pgsql/11/data/postgresql.conf'
This results in:
Your installation references loadable libraries that are missing from the
new installation. You can add these libraries to the new installation,
or remove the functions using them from the old installation. A list of
problem libraries is in the file: loadable_libraries.txt
loadable_libraries.txt
says the following:
could not load library "$libdir/postgis-2.4": ERROR: could not load library "/usr/pgsql-11/lib/postgis-2.4.so": /usr/pgsql-11/lib/postgis-2.4.so: undefined symbol: geod_polygon_init
Duckduckgoing I found the related PostgreSQL mailing list thread.
Build and install PostGIS 2.4 from source to PostgreSQL 11 and pg_upgrade
The bug report says there’s something wrong with proj4
version, so I chose proj49
and geos37
.
yum install proj49 proj49-devel
wget https://download.osgeo.org/postgis/source/postgis-2.4.6.tar.gz
tar -xzvf postgis-2.4.6.tar.gz
cd postgis-2.4.6
./configure \
--with-pgconfig=/usr/pgsql-11/bin/pg_config \
--with-geosconfig=/usr/geos37/bin/geos-config \
--with-projdir=/usr/proj49/
make && make install
CREATE EXTENSION postgis
fails with could not load library "/usr/pgsql-11/lib/postgis-2.4.so": /usr/pgsql-11/lib/postgis-2.4.so: undefined symbol: geod_polygon_init
. Oh my.
Install PostGIS 2.5 to PostgreSQL 10 and pg_upgrade
Running out of ideas, I tried to install PostGIS 2.5 to our PostgreSQL 10 cluster and pg_upgrade.
The resulting error appeared almost instantly:
Transaction check error:
file /usr/pgsql-10/bin/shp2pgsql-gui from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/lib/liblwgeom.so from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/lib/postgis-2.4.so from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer.sql from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer_data_us.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/address_standardizer_data_us.sql from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_sfcgal.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_tiger_geocoder.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
file /usr/pgsql-10/share/extension/postgis_topology.control from install of postgis25_10-2.5.1-1.rhel7.x86_64 conflicts with file from package postgis24_10-2.4.5-1.rhel7.x86_64
What the…
Build and install PostGIS 2.5 from source to PostgreSQL 10 and pg_upgrade
wget https://download.osgeo.org/postgis/source/postgis-2.5.1.tar.gz
tar -xzvf postgis-2.5.1.tar.gz
cd postgis-2.5.1
./configure \
--with-pgconfig=/usr/pgsql-10/bin/pg_config \
--with-geosconfig=/usr/geos37/bin/geos-config
make && make install
CREATE EXTENSION postgis
fails with ERROR: could not load library "/usr/pgsql-10/lib/postgis-2.5.so": /usr/pgsql-10/lib/postgis-2.5.so: undefined symbol: GEOSFrechetDistanceDensify
. Again? Really?
GEOSFrechetDistanceDensify
was added in GEOS 3.7 (linked in ./configure
), yet ldd /usr/pgsql-10/lib/postgis-2.5.so
says:
linux-vdso.so.1 => (0x00007ffd4c5fa000)
libgeos_c.so.1 => /usr/geos36/lib64/libgeos_c.so.1 (0x00007f68ddf5a000)
libproj.so.0 => /lib64/libproj.so.0 (0x00007f68ddd07000)
libjson-c.so.2 => /lib64/libjson-c.so.2 (0x00007f68ddafc000)
libxml2.so.2 => /lib64/libxml2.so.2 (0x00007f68dd792000)
libm.so.6 => /lib64/libm.so.6 (0x00007f68dd48f000)
libSFCGAL.so.1 => /lib64/libSFCGAL.so.1 (0x00007f68dc9c0000)
libc.so.6 => /lib64/libc.so.6 (0x00007f68dc5f3000)
libgeos-3.6.3.so => /usr/geos36/lib64/libgeos-3.6.3.so (0x00007f68dc244000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f68dbf3d000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f68dbd27000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f68dbb22000)
libz.so.1 => /lib64/libz.so.1 (0x00007f68db90c000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f68db6e6000)
/lib64/ld-linux-x86-64.so.2 (0x000055960f119000)
libCGAL.so.11 => /usr/lib64/libCGAL.so.11 (0x00007f68db4bd000)
libCGAL_Core.so.11 => /usr/lib64/libCGAL_Core.so.11 (0x00007f68db284000)
libmpfr.so.4 => /usr/lib64/libmpfr.so.4 (0x00007f68db029000)
libgmp.so.10 => /usr/lib64/libgmp.so.10 (0x00007f68dadb0000)
libboost_date_time-mt.so.1.53.0 => /usr/lib64/libboost_date_time-mt.so.1.53.0 (0x00007f68dab9f000)
libboost_thread-mt.so.1.53.0 => /usr/lib64/libboost_thread-mt.so.1.53.0 (0x00007f68da988000)
libboost_system-mt.so.1.53.0 => /usr/lib64/libboost_system-mt.so.1.53.0 (0x00007f68da783000)
libboost_serialization-mt.so.1.53.0 => /usr/lib64/libboost_serialization-mt.so.1.53.0 (0x00007f68da517000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f68da2fa000)
librt.so.1 => /usr/lib64/librt.so.1 (0x00007f68da0f2000)
I’m nearly desperate after spending two days trying to break through. I have ~ 300 GB of PostgreSQL data to migrate to the current version and there seems to be no possible way to do it in CentOS.
One more thing to note: using yum install postgis25_11
and CREATE EXTENSION postgis
in v11 database fails with the exact same error like the one above. I really enjoy working with PostgreSQL and PostGIS, yet there’s hardly something I fear more than trying to upgrade those two things together.
Since version 2.4.0, PostGIS can serve MVT data directly. MVT returning queries put heavy workload on the database though. On top of that, each of the query has to be run again every time a client demands the data. This leaves us with plenty of room to optimize the process.
During the last week, while working on the Czech legislative election data visualization, I’ve struggled with the server becoming unresponsive far too often due to the issues mentioned above.
According to the schema, the first client to come to the server:
- goes through filesystem unstopped, because there are no cached files yet,
- continues to the Flask backend and asks for a file at
{z}/{x}/{y}
,
- Flask backend asks the database to return the MVT for the given tile,
- Flask backend writes the response to the filesystem and sends it to the client.
Other clients get tiles directly from the filesystem, leaving the database at ease.
Nginx
Nginx is fairly simple to set up, once you know what you’re doing. The /volby-2017/municipality/
location serves static MVT from the given alias directory. If not found, the request is passed to @postgis
location, that asks the Flask backend for the response.
server election {
location /volby-2017/municipality {
alias /opt/volby-cz-2017/server/cache/;
try_files $uri @postgis;
}
location @postgis {
include uwsgi_params;
uwsgi_pass 127.0.0.1:5000;
}
}
Flask backend
Generating static MVT in advance
If you’re going to serve static tiles that don’t change often, it might be a good idea to use PostGIS to create files in advance and serve them with Nginx.
CREATE TABLE tiles (
x integer,
y integer,
z integer,
west numeric,
south numeric,
east numeric,
north numeric,
geom geometry(POLYGON, 3857)
);
Using mercantile, you can create the tiles
table holding the bounding boxes of the tiles you need. PostGIS them inserts the actual MVT into the mvt
table.
CREATE TEMPORARY TABLE tmp_tiles AS
SELECT
muni.muni_id,
prc.data,
ST_AsMVTGeom(
muni.geom,
TileBBox(z, x , y, 3857),
4096,
0,
false
) geom,
x,
y,
z
FROM muni
JOIN (
SELECT
x,
y,
z,
geom
FROM tiles
) bbox ON (ST_Intersects(muni.geom, bbox.geom))
JOIN party_results_cur prc ON (muni.muni_id = prc.muni_id);
CREATE TABLE mvt (mvt bytea, x integer, y integer, z integer);
DO
$$
DECLARE r record;
BEGIN
FOR r in SELECT DISTINCT x, y, z FROM tmp_tiles LOOP
INSERT INTO mvt
SELECT ST_AsMVT(q, 'municipality', 4096, 'geom'), r.x, r.y, r.z
FROM (
SELECT
muni_id,
data,
geom
FROM tmp_tiles
WHERE (x, y, z) = (r)
) q;
RAISE INFO '%', r;
END LOOP;
END;
$$;
Once filled, the table rows can be written to the filesystem with the simple piece of Python code.
#!/usr/bin/env python
import logging
import os
import time
from sqlalchemy import create_engine, text
CACHE_PATH="cache/"
e = create_engine('postgresql:///')
conn = e.connect()
sql=text("SELECT mvt, x, y, z FROM mvt")
query = conn.execute(sql)
data = query.cursor.fetchall()
for d in data:
cachefile = "{}/{}/{}/{}".format(CACHE_PATH, d[3], d[1], d[2])
print(cachefile)
if not os.path.exists("{}/{}/{}".format(CACHE_PATH, d[3], d[1])):
os.makedirs("{}/{}/{}".format(CACHE_PATH, d[3], d[1]))
with open(cachefile, "wb") as f:
f.write(bytes(d[0]))
Conclusion
PostGIS is a brilliant tool for generating Mapbox vector tiles. Combined with Python powered static file generator and Nginx, it seems to become the only tool needed to get you going.
PostGIS 2.4.0 was released recently bringing the possibilities to generate Mapbox Vector Tiles without any third party tools. I got a shot at it with Node.js and docker. Even if it’s not as straightforward as solely using ST_AsMVT, it still looks pretty great.
Docker container
There are no Ubuntu or Debian based PostGIS 2.4.0 packages as far as I know. As installation from source (especially considering GIS software) is always a bit risky, I prefer using Docker to stay away from trouble. The image is based on Ubuntu 17.04, has PostgreSQL 9.6 and PostGIS 2.4.0 installed. It exposes port 5432 to the host, so you can access the database from the outside the container.
FROM ubuntu:17.04
RUN apt update
RUN apt install -y wget less systemd
RUN touch /etc/apt/sources.list.d/pgdg.list
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ zesty-pgdg main" > /etc/apt/sources.list.d/pgdg.list
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
RUN apt update
RUN apt -y install postgresql-9.6 postgresql-server-dev-9.6
USER postgres
RUN /usr/lib/postgresql/9.6/bin/pg_ctl -D /var/lib/postgresql/9.6/main -l /tmp/logfile start
USER root
RUN echo "host all all 0.0.0.0/0 trust" >> /etc/postgresql/9.6/main/pg_hba.conf && \
echo "listen_addresses='*'" >> /etc/postgresql/9.6/main/postgresql.conf
EXPOSE 5432
RUN apt install -y netcat build-essential libxml2 libxml2-dev libgeos-3.5.1 libgdal-dev gdal-bin libgdal20 libgeos-dev libprotobuf-c1 libprotobuf-c-dev libprotobuf-dev protobuf-compiler protobuf-c-compiler
RUN wget http://download.osgeo.org/postgis/source/postgis-2.4.0alpha.tar.gz
RUN tar -xvzf postgis-2.4.0alpha.tar.gz
RUN cd postgis-2.4.0alpha && ./configure && make && make install
USER postgres
RUN service postgresql start && psql -c "CREATE EXTENSION postgis"
USER root
COPY start.postgis.sh /start.postgis.sh
RUN chmod 0755 /start.postgis.sh
CMD ["/start.postgis.sh"]
start.postgis.sh
file starts the database server and keeps it running forever.
#!/bin/bash
DATADIR="/var/lib/postgresql/9.6/main"
CONF="/etc/postgresql/9.6/main/postgresql.conf"
POSTGRES="/usr/lib/postgresql/9.6/bin/postgres"
su postgres sh -c "$POSTGRES -D $DATADIR -c config_file=$CONF" &
until nc -z localhost 5432;
do
echo ...
sleep 5
done
sleep 5 # just for sure
su - postgres -c "psql -c \"CREATE EXTENSION IF NOT EXISTS postgis\""
echo database up and running
wait $!
Data
I got a cadastre area dataset of the Czech Republic for testing, which contains ~ 13,000 polygons. The geometries should come in Web Mercator a.k.a. EPSG:3857 to work with MVT.
Vector tiles
I got a bit confused by the docs of ST_AsMVT and ST_AsMVTGeom. Especially the latter one took me a few hours to get it right. What is essential (I guess) about Mapbox Vector Tiles is that you have to abstract from the real world coordinates and start thinking inside the tile coordinates. What PostGIS does with ST_AsMVTGeom
(and what any other MVT implemenation should do for you) is that it takes real world coordinates and put them inside a tile.
To make this work, you need to know every bounding box of every tile on every zoom level in a Web Mercator projection. Or you can use TileBBox procedure by Mapbox, if you wish.
The SQL query itself is pretty simple (this comes from an express route I’ll be discussing shortly).
SELECT ST_AsMVT('cadastre', 4096, 'geom', q)
FROM (
SELECT
code,
name,
ST_AsMVTGeom(
geom,
TileBBox(${req.params.z}, ${req.params.x}, ${req.params.y}, 3857),
4096,
0,
false
) geom
FROM cadastre_area
WHERE ST_Intersects(geom, (SELECT ST_Transform(ST_MakeEnvelope($1, $2, $3, $4, $5), 3857)))
) q
When filled with proper arguments instead of placeholders, it returns a bytea.
\x1aa5dbd0070a047465737412e216120400000101180322d7160987913f8db38e01aa59160e2a010412012a0624060e001410420a1a00203b0a3914190e15085912010a0f0c0f06370804080a0e0e0234090e0
This can be consumed by a Leaflet map using Leaflet.VectorGrid plugin. To keep it short, the frontend code actually boils down to three lines of code.
var url = 'http://localhost:3000/mvt/{x}/{y}/{z}';
var cadastre = L.vectorGrid.protobuf(url);
map.addLayer(cadastre);
The server MVP is available as a GitHub gist.
For a long time I’ve wanted to play with pgRouting and that time has finally come. Among many other routing functions there is one that caught my eye, called pgr_drivingdistance
. As the documentation says, it returns the driving distance from a start node using Dijkstra algorithm. The aforementioned distance doesn’t need to be defined in Euclidean space (the real distance between two points), it might be calculated in units of time, slopeness etc. How to get it going?
Data
OSM will do as it always does. There is a tool called osm2pgrouting
to help you load the data, the pure GDAL seems to be a better way to me though. Importing the downloaded data is trivial.
ogr2ogr -f "PostgreSQL" PG:"dbname=pgrouting active_schema=cze" \
-s_srs EPSG:4326 \
-t_srs EPSG:5514 \
roads.shp \
-nln roads \
-lco GEOMETRY_NAME=the_geom \
-lco FID=id \
-gt 65000 \
-nlt PROMOTE_TO_MULTI \
-clipsrc 16.538 49.147 16.699 49.240
To route the network, it has to be properly noded. Although pgRouting comes with built-in pgr_nodenetwork
, it didn’t seem to work very well. To node the network, use PostGIS ST_Node
. Note this doesn’t consider bridges and tunnels.
CREATE TABLE cze.roads_noded AS
SELECT
(ST_Dump(geom)).geom the_geom
FROM (
SELECT
ST_Node(geom) geom
FROM (
SELECT ST_Union(the_geom) geom
FROM cze.roads
) a
) b;
After noding the network, all the information about speed limits and oneways is lost. If needed, it can be brought back with following:
CREATE INDEX ON cze.roads_noded USING gist(the_geom);
ALTER TABLE cze.roads_noded ADD COLUMN id SERIAL PRIMARY KEY;
ALTER TABLE cze.roads_noded ADD COLUMN maxspeed integer;
UPDATE cze.roads_noded
SET maxspeed = a.maxspeed
FROM (
SELECT DISTINCT ON (rn.id)
rn.id,
r.maxspeed
FROM cze.roads_noded rn
JOIN cze.roads r ON (ST_Intersects(rn.the_geom, r.the_geom))
ORDER BY rn.id, ST_Length(ST_Intersection(rn.the_geom, r.the_geom)) DESC
) a
WHERE cze.roads_noded.id = a.id;
With everything set, the topology can be built.
ALTER TABLE cze.roads_noded ADD COLUMN source integer;
ALTER TABLE cze.roads_noded ADD COLUMN target integer;
SELECT pgr_createTopology('cze.roads_noded', 1);
This function creates the cze.roads_noded_vertices_pgr
that contains all the extracted nodes from the network.
As already mentioned, measures other than length can be used as a distance, I chose the time to get to a given node on foot.
ALTER TABLE cze.roads_noded ADD COLUMN cost_minutes integer;
UPDATE cze.roads_noded
SET cost_minutes = (ST_Length(the_geom) / 83.0)::integer; -- it takes average person one minute to walk 83 meters
UPDATE cze.roads_noded
SET cost_minutes = 1
WHERE cost_minutes = 0;
Routing
Now the interesting part. All the routing functions are built on what’s called inner queries that are expected to return a certain data structure with no geometry included. As I want to see the results in QGIS immediately, I had to use a simple anonymous PL/pgSQL block that writes polygonal catchment areas to a table (consider it a proof of concept, not the final solution).
DROP TABLE IF EXISTS cze.temp;
CREATE TABLE cze.temp AS
SELECT *
FROM cze.roads_noded_vertices_pgr ver
JOIN (
SELECT *
FROM pgr_drivingDistance(
'SELECT id, source, target, cost_minutes as cost, cost_minutes as reverse_cost FROM cze.roads_noded',
6686,
10,
true
)
)dist ON ver.id = dist.node;
DO $$
DECLARE
c integer;
BEGIN
DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp (
agg_cost integer,
geom geometry(MULTIPOLYGON, 5514)
);
-- order by the biggest area so the polygons are not hidden beneath the bigger ones
FOR c IN SELECT agg_cost FROM cze.temp GROUP BY agg_cost HAVING COUNT(1) > 3 ORDER BY 1 DESC LOOP
RAISE INFO '%', c;
INSERT INTO tmp (agg_cost, geom)
SELECT
c,
ST_Multi(ST_SetSRID(pgr_pointsAsPolygon(
'SELECT
temp.id::integer,
ST_X(temp.the_geom)::float AS x,
ST_Y(temp.the_geom)::float AS y
FROM cze.temp
WHERE agg_cost = ' || c
), 5514));
END LOOP;
END$$;
Using pgr_pointsAsPolygon
renders resulting nodes accessible in 10-minute walk in polygons, but weird looking ones. Not bad, could be better though.
How about seeing only nodes instead of polygons?
SELECT
agg_cost,
ST_PointN(geom, i)
FROM (
SELECT
agg_cost,
ST_ExteriorRing((ST_Dump(geom)).geom) geom,
generate_series(0,ST_NumPoints(ST_ExteriorRing((ST_Dump(geom)).geom))) i
FROM tmp
) a;
Looks good, could be better though.
How about creating concave hulls from the extracted nodes?
SELECT
agg_cost,
ST_ConcaveHull(ST_Union(geom)) geom
FROM (
SELECT
agg_cost,
ST_PointN(geom, i) geom
FROM (
SELECT
agg_cost,
ST_ExteriorRing((ST_Dump(geom)).geom) geom,
generate_series(0,ST_NumPoints(ST_ExteriorRing((ST_Dump(geom)).geom))) i
FROM tmp
) a
) b
GROUP BY agg_cost
ORDER BY agg_cost DESC;
This one looks the best I guess.
Remarks
- The documentation doesn’t help much.
- I’d expect existing functions to return different data structures to be easy-to-use, actually.
LATERAL
might be really handy with those inner queries, have to give it a shot in the future.
- Pedestrians usually don’t follow the road network.
- Bridges and tunnels might be an issue.