I spend a lot of time reading PostgreSQL docs. It occurred to me just a few weeks ago that those versioned manuals are a great opportunity to get an insight into the history of PostgreSQL development. Using PostgreSQL, of course.
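All the queries below read from a single table. Its definition is not part of the post, but judging from the columns used, it boils down to something like this (a sketch; the column types are my guess):
-- assumed shape of the scraped dataset: one row per command and manual version
CREATE SCHEMA IF NOT EXISTS postgresql_development;
CREATE TABLE postgresql_development.data (
    version text,          -- manual version, e.g. '9.6'
    func text,             -- documented command, e.g. 'CREATE TABLE'
    letter_count integer   -- length of the command's reference page
);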
TOP 5 functions with the most verbose docs in each version
SELECT
    version,
    string_agg(func, ' | ' ORDER BY letter_count DESC)
FROM (
    SELECT
        version,
        func,
        letter_count,
        row_number() OVER (PARTITION BY version ORDER BY letter_count DESC)
    FROM postgresql_development.data
) a
WHERE row_number <= 5
GROUP BY version
ORDER BY version DESC;
Seems like a huge comeback for CREATE TABLE.
VERSION | 1st | 2nd | 3rd | 4th | 5th
10.0 | CREATE TABLE | ALTER TABLE | REVOKE | GRANT | SELECT
9.6 | REVOKE | ALTER TABLE | GRANT | CREATE TABLE | SELECT
9.5 | REVOKE | ALTER TABLE | GRANT | CREATE TABLE | SELECT
9.4 | REVOKE | GRANT | ALTER TABLE | CREATE TABLE | SELECT
9.3 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | ALTER DEFAULT PRIVILEGES
9.2 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | ALTER DEFAULT PRIVILEGES
9.1 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | ALTER DEFAULT PRIVILEGES
9.0 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | ALTER DEFAULT PRIVILEGES
8.4 | REVOKE | GRANT | CREATE TABLE | ALTER TABLE | SELECT
8.3 | REVOKE | CREATE TABLE | GRANT | ALTER TABLE | COMMENT
8.2 | REVOKE | CREATE TABLE | GRANT | ALTER TABLE | SELECT
8.1 | REVOKE | CREATE TABLE | GRANT | ALTER TABLE | SELECT
8 | CREATE TABLE | REVOKE | GRANT | SELECT | ALTER TABLE
7.4 | CREATE TABLE | REVOKE | ALTER TABLE | GRANT | SELECT
7.3 | CREATE TABLE | SELECT | ALTER TABLE | REVOKE | GRANT
7.2 | CREATE TABLE | SELECT INTO | SELECT | ALTER TABLE | CREATE TYPE
7.1 | CREATE TABLE | SELECT INTO | SELECT | CREATE TYPE | ALTER TABLE
7.0 | SELECT | SELECT INTO | CREATE TYPE | CREATE TABLE | COMMENT
Number of functions available in each version
SELECT
version,
count(func),
sum(letter_count)
FROM postgresql_development.data
GROUP BY version ORDER BY version;
The most verbose docs in each version
SELECT DISTINCT ON (version)
version,
func,
letter_count
FROM postgresql_development.data
ORDER BY version, letter_count DESC;
Poor REVOKE, the defeated champion.
VERSION | FUNCTION | LETTER COUNT
10 | CREATE TABLE | 3142
9.6 | REVOKE | 2856
9.5 | REVOKE | 2856
9.4 | REVOKE | 2856
9.3 | REVOKE | 2856
9.2 | REVOKE | 2856
9.1 | REVOKE | 2508
9 | REVOKE | 2502
8.4 | REVOKE | 2105
8.3 | REVOKE | 1485
8.2 | REVOKE | 1527
8.1 | REVOKE | 1312
8 | CREATE TABLE | 1251
7.4 | CREATE TABLE | 1075
7.3 | CREATE TABLE | 929
7.2 | CREATE TABLE | 929
7.1 | CREATE TABLE | 871
7 | SELECT | 450
CREATE TABLE docs evolution
SELECT
version,
letter_count
FROM postgresql_development.data
WHERE func = 'CREATE TABLE'
ORDER BY func, version;
Something’s clearly going on in the upcoming 10.0 version.
All the data was obtained with a Python script and processed inside the PostgreSQL database. Plots were done with Bokeh, though I probably wouldn’t use it again: the docs site is absurdly sluggish and the info is just all over the place.
PostGIS 2.4.0 was released recently, bringing the possibility to generate Mapbox Vector Tiles without any third-party tools. I gave it a shot with Node.js and Docker. Even if it’s not as straightforward as solely using ST_AsMVT, it still looks pretty great.
Docker container
There are no Ubuntu- or Debian-based PostGIS 2.4.0 packages as far as I know. As installation from source (especially of GIS software) is always a bit risky, I prefer using Docker to stay away from trouble. The image is based on Ubuntu 17.04 and has PostgreSQL 9.6 and PostGIS 2.4.0 installed. It exposes port 5432 to the host, so you can access the database from outside the container.
FROM ubuntu:17.04
RUN apt update
RUN apt install -y wget less systemd
RUN touch /etc/apt/sources.list.d/pgdg.list
RUN echo "deb http://apt.postgresql.org/pub/repos/apt/ zesty-pgdg main" > /etc/apt/sources.list.d/pgdg.list
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
RUN apt update
RUN apt -y install postgresql-9.6 postgresql-server-dev-9.6
USER postgres
RUN /usr/lib/postgresql/9.6/bin/pg_ctl -D /var/lib/postgresql/9.6/main -l /tmp/logfile start
USER root
RUN echo "host all all 0.0.0.0/0 trust" >> /etc/postgresql/9.6/main/pg_hba.conf && \
echo "listen_addresses='*'" >> /etc/postgresql/9.6/main/postgresql.conf
EXPOSE 5432
RUN apt install -y netcat build-essential libxml2 libxml2-dev libgeos-3.5.1 libgdal-dev gdal-bin libgdal20 libgeos-dev libprotobuf-c1 libprotobuf-c-dev libprotobuf-dev protobuf-compiler protobuf-c-compiler
RUN wget http://download.osgeo.org/postgis/source/postgis-2.4.0alpha.tar.gz
RUN tar -xvzf postgis-2.4.0alpha.tar.gz
RUN cd postgis-2.4.0alpha && ./configure && make && make install
USER postgres
RUN service postgresql start && psql -c "CREATE EXTENSION postgis"
USER root
COPY start.postgis.sh /start.postgis.sh
RUN chmod 0755 /start.postgis.sh
CMD ["/start.postgis.sh"]
The start.postgis.sh file starts the database server and keeps it running forever.
#!/bin/bash
DATADIR="/var/lib/postgresql/9.6/main"
CONF="/etc/postgresql/9.6/main/postgresql.conf"
POSTGRES="/usr/lib/postgresql/9.6/bin/postgres"

# start the server in the background
su postgres sh -c "$POSTGRES -D $DATADIR -c config_file=$CONF" &

# wait until the server accepts connections
until nc -z localhost 5432
do
    echo ...
    sleep 5
done
sleep 5 # just to be sure

su - postgres -c "psql -c \"CREATE EXTENSION IF NOT EXISTS postgis\""

echo database up and running
wait $!
Data
I got a cadastre area dataset of the Czech Republic for testing, which contains ~ 13,000 polygons. The geometries should come in Web Mercator a.k.a. EPSG:3857 to work with MVT.
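If the source data doesn’t come in EPSG:3857 already (Czech cadastre data typically ships in S-JTSK, EPSG:5514), a one-off reprojection saves transforming on every tile request. A sketch, assuming the cadastre_area table from the query below with a MultiPolygon geom column:
-- reproject the layer to Web Mercator once and index it;
-- the source SRS (5514) and the geometry type are assumptions
ALTER TABLE cadastre_area
    ALTER COLUMN geom TYPE geometry(MultiPolygon, 3857)
    USING ST_Transform(geom, 3857);
CREATE INDEX cadastre_area_geom_idx ON cadastre_area USING gist (geom);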
Vector tiles
I got a bit confused by the docs of ST_AsMVT and ST_AsMVTGeom. Especially the latter took me a few hours to get right. What is essential (I guess) about Mapbox Vector Tiles is that you have to abstract from real-world coordinates and start thinking inside the tile coordinates. What PostGIS does with ST_AsMVTGeom (and what any other MVT implementation should do for you) is take real-world coordinates and put them inside a tile.
To make this work, you need to know the bounding box of every tile on every zoom level in the Web Mercator projection. Or you can use the TileBBox procedure by Mapbox, if you wish.
The SQL query itself is pretty simple (this comes from an Express route I’ll be discussing shortly).
SELECT ST_AsMVT('cadastre', 4096, 'geom', q)
FROM (
    SELECT
        code,
        name,
        ST_AsMVTGeom(
            geom,
            TileBBox(${req.params.z}, ${req.params.x}, ${req.params.y}, 3857),
            4096,
            0,
            false
        ) geom
    FROM cadastre_area
    WHERE ST_Intersects(geom, (SELECT ST_Transform(ST_MakeEnvelope($1, $2, $3, $4, $5), 3857)))
) q
When filled with proper arguments instead of placeholders, it returns a bytea.
\x1aa5dbd0070a047465737412e216120400000101180322d7160987913f8db38e01aa59160e2a010412012a0624060e001410420a1a00203b0a3914190e15085912010a0f0c0f06370804080a0e0e0234090e0
This can be consumed by a Leaflet map using the Leaflet.VectorGrid plugin. To keep it short, the frontend code actually boils down to three lines.
var url = 'http://localhost:3000/mvt/{x}/{y}/{z}';
var cadastre = L.vectorGrid.protobuf(url);
map.addLayer(cadastre);
The server MVP is available as a GitHub gist.
As I’m still running Ubuntu 16.04-based Linux Mint, I have no access to GDAL 2.x repositories (except for ubuntugis, which I really don’t like to use). Provided with a GeoPackage raster file recently, I had to find a way to load it into QGIS somehow. The solution is simple: Docker with gdal_translate.
Preparing the Docker container
I like using Docker for experiments that might leave the OS in an unexpected state (which is exactly what happens to me with the ubuntugis repository whenever I use it; that’s why I don’t anymore). A very simple Dockerfile keeps the trouble away from you.
FROM ubuntu:17.04
RUN apt update
RUN apt install -y gdal-bin
cd into the folder and build the image with docker build -t gdal . (note the trailing dot). Once ready, summon the daemon, run the container, mount the GeoPackage file into the container directory and you’re ready to rock.
docker run -v /path/to/geopackage:/home/ -it gdal
Raster GeoPackage to GeoTiff translation
With the container running, the raster GeoPackage to GeoTiff translation can be done easily with gdal_translate. Note that I chose to cut the source file into tiles, because gdal_translate was choking on the resulting size.
#!/bin/bash
SIZE=10000
ULX=-630000
ULY=-1135450
LRX=-560000
LRY=-1172479
COUNTER_X=0
COUNTER_Y=0

while [[ $ULX -lt $LRX ]]
do
    while [[ $ULY -gt $LRY ]]
    do
        echo $ULX, $(($ULX+$SIZE)), $ULY, $(($ULY-$SIZE))
        gdal_translate \
            -co TILED=YES \
            -co COMPRESS=DEFLATE \
            -co TFW=YES \
            -co NUM_THREADS=ALL_CPUS \
            -a_nodata 0 \
            -of GTiff \
            -projwin $ULX, $ULY, $(($ULX+$SIZE)), $(($ULY-$SIZE)) \
            -projwin_srs EPSG:5514 \
            data/detected.gpkg data/detected_${COUNTER_X}_${COUNTER_Y}.tiff
        ULY=$(($ULY-$SIZE))
        COUNTER_Y=$((COUNTER_Y+1))
    done
    ULX=$(($ULX+$SIZE))
    ULY=-1135450
    COUNTER_Y=0 # reset the row counter for the next column of tiles
    COUNTER_X=$((COUNTER_X+1))
done
Final Touch: Raster to Vector
After the GeoTiffs are written to the hard drive, inotifywait can be used to generate overviews. And with gdal_polygonize.py called on each of the GeoTiffs, a vector layer is at your service.
Recently I bought a book called Maps by Aleksandra Mizielinska and Daniel Mizielinski for my nephew. The book is absolutely wonderful and made me want to try crafting a map with a similar look. I don’t do maps much at CleverMaps, so this was a great opportunity to find out what new features became available during the last months of QGIS development.
Result
A map of North America at a scale of 1:22,000,000, featuring the biggest lakes, rivers, mountain ranges and the basic administrative units of the North American countries. I aimed for a visually appealing overview map rather than a perfectly correct topographic one.
Data
I used my beloved Natural Earth dataset for both cultural (boundaries, cities) and physical (rivers, lakes) map features. Different scales came into play for different map layers, as some seemed either too simplified or not simplified enough for the given scale.
Fonts
I usually use built-in system fonts (Ubuntu Condensed or such), but this kind of map needed a more handwritten-looking, sort of childish font. After searching dafont.com I chose PreCursive by RaseOne Full Time Artists and KG Primary Penmanship by Kimberly Geswein.
Symbols
The mountain point symbol was one of the two custom symbols used on the map; it comes from BSGStudio. I made the ocean wave symbol myself.
QGIS effects
I’ve used several techniques I find interesting enough to be listed here.
Coastlines
For a long time I’ve considered coastlines a field for cartographic invention. They can be emphasized by shading or 3D effects. I chose a set of four parallel coastlines subtly disappearing into the sea, hopefully evoking the feeling of waves coming to the shore.
It’s done by dissolving all the features and buffering them again and again.
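For the record, the same dissolve-and-buffer idea can be sketched in PostGIS terms (land is a hypothetical landmass table, and the buffer distances are arbitrary and depend on the layer's CRS; in QGIS it is simply the Dissolve and Buffer tools applied repeatedly):
-- one dissolved landmass, buffered at increasing distances;
-- each resulting ring then gets a progressively lighter style
SELECT
    n AS ring,
    ST_Buffer(ST_Union(geom), n * 10000) AS geom
FROM land
CROSS JOIN generate_series(1, 4) AS n
GROUP BY n;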
Buffered labels
Buffered labels are usually hard to get right, because they take up so much space if the buffer color doesn’t correspond to the surroundings. But choosing the proper color can be a real struggle at times.
On this map, almost all the labels are buffered with the color of their surroundings, which makes them more legible, yet not too expressive. This is possible thanks to QGIS expression-based properties that let you define unique styling for different map features.
Where it isn’t possible to choose just one buffer color (e.g. the Bahamas or Honduras), the label is not buffered at all (or a semi-transparent white buffer is used).
Note that the Rocky Mountains label is split on the border of the U.S.A. and Canada, and both its parts match the background color.
Tapered rivers
Rivers are tapered based on Natural Earth’s width attribute value for each river segment.
Labels in separate layers
I’m used to putting labels into separate layers in more complicated map compositions, especially when you need to draw a label along a path for areal features (such as countries or states).
It becomes a bit harder to keep the features in sync with the labels though. I’d like to use only one layer for all the map layers in the future, as I feel that’s the way to go for the best labeling.
Labels wrapped on character
Some labels just can’t fit the feature they belong to, and QGIS lets you deal with this by wrapping labels on a special character (\ in my case).
Layer blending mode
The mechanics behind layer blending modes are still a mystery to me, but they can add that little extra to a map very easily. Thanks to the Overlay blending mode, the Rocky Mountains may remain very subtle on different kinds of background.
Wifileaks is a project by Jakub Čížek aimed at mapping Czech wi-fi networks with an Android/iOS app. The data gathered by people using the app is available to download and features ~90,000,000 records, each representing the position of a cellphone at the moment of connecting to a network. Just about perfect to craft some maps!
Using PostgreSQL cstore_fdw
I ran out of disk space immediately after loading the dataset into the PostgreSQL database. After fiddling around I remembered that a columnar store should be a bit more space-friendly than old-fashioned relational storage. Thus, I installed cstore_fdw by Citus Data in just a few steps.
sudo apt install libprotobuf-c-dev libprotobuf-c1 protobuf-c-compiler postgresql-server-dev-9.6
git clone git@github.com:citusdata/cstore_fdw.git
cd cstore_fdw
PATH=/usr/bin/:$PATH make
PATH=/usr/bin/:$PATH make install
# when the cstore_fdw installation finishes, add the following line to your postgresql.conf and restart the database cluster
shared_preload_libraries = 'cstore_fdw'
This makes another FDW available to you inside PostgreSQL. The actual foreign server has to be created before loading the data into a foreign table.
cat <<END | psql -qAt --no-psqlrc
-- the extension has to exist before its foreign data wrapper can be referenced
CREATE EXTENSION IF NOT EXISTS cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE SCHEMA data_cstore;
CREATE FOREIGN TABLE data_cstore.wifi (
    id integer,
    mac text,
    ssid text,
    signal_strength numeric,
    security integer,
    lat numeric,
    lon numeric,
    alt numeric,
    unixtime bigint,
    filename text
)
SERVER cstore_server
OPTIONS (compression 'pglz');
END
The foreign table is 3× smaller than its standard counterpart. However, this comes with some costs:
- neither UPDATE nor DELETE can be used
- no CREATE INDEX
- no SERIAL
To overcome these shortcomings, I used a COPY statement to spit out the slightly modified table and immediately loaded it back in (the load itself is sketched after the snippet).
cat <<END | psql -qAt --no-psqlrc
COPY (
    SELECT
        row_number() OVER (),
        mac,
        ssid,
        signal_strength,
        security,
        split_part(filename, '_', 2)::integer,
        to_timestamp(unixtime),
        ST_Transform(ST_SetSRID(ST_MakePoint(lon, lat, alt), 4326), 32633)
    FROM data_cstore.wifi
    WHERE lon BETWEEN 0 AND 20
        AND lat BETWEEN 18 AND 84
) TO '/tmp/wifileaks.db' WITH CSV DELIMITER ';';

DROP SCHEMA IF EXISTS data_cstore CASCADE;
CREATE SCHEMA data_cstore;
CREATE FOREIGN TABLE data_cstore.wifi (
    id integer,
    mac text,
    ssid text,
    signal_strength numeric,
    security integer,
    userid integer,
    unixtime timestamp without time zone,
    geom geometry(POINTZ, 32633)
)
SERVER cstore_server
OPTIONS (compression 'pglz');
END
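The load itself isn’t shown above; since cstore_fdw supports loading foreign tables with the COPY command, it boils down to something like this (same path as the export above):
-- load the exported rows back into the freshly created columnar table
COPY data_cstore.wifi FROM '/tmp/wifileaks.db' WITH CSV DELIMITER ';';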
Putting the networks on the map
As mentioned, each row of data represents the cellphone’s location at the moment of connecting to a wi-fi network. To get the real wi-fi transmitter position, I averaged the locations of all cellphones that ever connected to it (although the signal strength should be taken into account here as well; a weighted variant is sketched below).
CREATE UNLOGGED TABLE data_cstore.wifi_avg_loc AS
SELECT
    row_number() OVER () id,
    mac,
    ST_SetSRID(ST_MakePoint(x, y), 32633) geom
FROM (
    SELECT
        mac,
        AVG(ST_X(geom)) x,
        AVG(ST_Y(geom)) y
    FROM data_cstore.wifi
    GROUP BY 1
) a;
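To illustrate the remark about signal strength, a weighted average could look roughly like the following sketch. It assumes signal_strength is stored as a positive quality value (dBm readings would have to be remapped first), and the weighting itself is my guess, not the author’s method.
-- weight every observation by its signal strength, so stronger readings
-- pull the estimated transmitter position closer to themselves
SELECT
    mac,
    SUM(ST_X(geom) * signal_strength) / SUM(signal_strength) AS x,
    SUM(ST_Y(geom) * signal_strength) / SUM(signal_strength) AS y
FROM data_cstore.wifi
GROUP BY mac;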