I needed to create a polygon from a point, defining its size along both axes, so here’s a little syntax sugar to make life easier.
CREATE OR REPLACE FUNCTION ST_PolygonFromCentroid(centroid geometry, xsize numeric, ysize numeric)
RETURNS geometry
AS $ST_PolygonFromCentroid$
SELECT ST_MakeEnvelope(
    ST_X(ST_Translate($1, -$2, -$3)),
    ST_Y(ST_Translate($1, -$2, -$3)),
    ST_X(ST_Translate($1, $2, $3)),
    ST_Y(ST_Translate($1, $2, $3)),
    ST_SRID($1) -- keep the SRID of the input point
);
$ST_PolygonFromCentroid$
LANGUAGE SQL;
Run it as:
SELECT ST_PolygonFromCentroid(ST_SetSRID(ST_MakePoint(13.912,50.633),4326), 1, 1);
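Or run it over a whole table of points - for instance, to build a 2° × 2° box around every point in a hypothetical poi table (the table and column names here are made up):

```sql
-- hypothetical table poi(id integer, geom geometry(Point, 4326))
SELECT
    id,
    ST_PolygonFromCentroid(geom, 1, 1) AS box
FROM poi;
```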
Doing overlays (ST_Intersection()) in PostGIS based on spatial relationships (ST_Intersects(), ST_Contains(), …) is so easy it is not something you get particularly excited about.
Today I faced a more interesting task: given two polygon layers, get all the polygons from layer A that lie across the polygons from layer B and… a picture is worth a thousand words, right?
I hope you got the idea, it is fairly simple:
- Intersect A (red, blue) with B (green)
- Subtract the result of previous from layer A
- Combine results from steps 1 and 2
- Keep a polygon only if its id occurs more than twice (that means it went straight through layer B)
- Profit!
WITH overlays AS (
/* nothing fancy here */
SELECT
A.ogc_fid a_id,
B.ogc_fid b_id,
ST_Intersection(A.geom, B.geom) geom,
ST_Area(ST_Intersection(A.geom, B.geom)) area_shared
FROM A
JOIN B ON (ST_Intersects(A.geom, B.geom))
),
diffs AS (
/* note this is a 1:1 relationship in ST_Difference */
/* a little hack is needed to prevent PostGIS from returning its usual difference mess */
SELECT
o.a_id,
o.b_id,
(ST_Dump(ST_Difference(ST_Buffer(A.geom, -0.0001), o.geom))).geom, -- ugly hack
o.area_shared
FROM overlays o
JOIN A ON (o.a_id = A.ogc_fid)
),
merged AS (
/* put those two result sets together */
SELECT * FROM overlays
UNION ALL
SELECT * FROM diffs
),
merged_reduced AS (
/* get only those A polygons that consist of three parts at least for each intersection with B polygon */
SELECT
m.*
FROM merged m
JOIN (
SELECT
a_id,
b_id
FROM merged
GROUP BY a_id, b_id
HAVING COUNT(1) > 2
) a ON (a.a_id = m.a_id AND a.b_id = m.b_id)
)
/* do as you wish with the result */
SELECT *
FROM merged_reduced;
In my case, centerlines of layer B were also included, and the area of the smallest part was divided by their length inside each intersection. It was fun, actually.
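A sketch of that follow-up step (swap it in for the final SELECT of the CTE chain above): DISTINCT ON picks the smallest part of each A/B intersection.

```sql
/* sketch: smallest part per A/B pair, appended to the CTEs above */
SELECT DISTINCT ON (a_id, b_id)
    a_id,
    b_id,
    ST_Area(geom) smallest_part_area
FROM merged_reduced
ORDER BY a_id, b_id, ST_Area(geom); -- smallest area first per (a_id, b_id)
```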
Since PostgreSQL 9.3 there has been a handy little keyword called LATERAL, which - combined with JOIN - might rock your GIS world in a second. To keep it simple, a LATERAL JOIN enables a subquery in the FROM part of a query to reference columns from preceding expressions in the FROM list. What the heck?
Imagine that not-so-unusual request to generate random points in polygons (something I needed to do today), done automatically, without your favorite piece of desktop GIS software.
It is pretty easy using LATERAL JOIN:
SELECT
a.id,
b.*
FROM (
VALUES(
1,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((0 0, -1 0, -1 -1, 0 -1, 0 0))'
),
4326)
)
UNION ALL
VALUES(
2,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'
),
4326)
)
) a(id, geom)
CROSS JOIN LATERAL (
SELECT ST_SetSRID(ST_MakePoint(tmp.x, tmp.y), 4326) geom
FROM (
SELECT
random() * (ST_XMax(a.geom) - ST_XMin(a.geom)) + ST_XMin(a.geom) x,
random() * (ST_YMax(a.geom) - ST_YMin(a.geom)) + ST_YMin(a.geom) y
FROM generate_series(0,200)
) tmp
) b;
What actually happened over there? If you want to put points inside polygons, you need… polygons. We will do just fine with two of them created inside this query:
VALUES(
1,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((0 0, -1 0, -1 -1, 0 -1, 0 0))'
),
4326)
)
UNION ALL
VALUES(
2,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'
),
4326)
)
All the magic happens inside the LATERAL JOIN part of the query:
CROSS JOIN LATERAL (
SELECT ST_SetSRID(ST_MakePoint(tmp.x, tmp.y), 4326) geom
FROM (
SELECT
random() * (ST_XMax(a.geom) - ST_XMin(a.geom)) + ST_XMin(a.geom) x,
random() * (ST_YMax(a.geom) - ST_YMin(a.geom)) + ST_YMin(a.geom) y
FROM generate_series(0,200)
) tmp
) b;
The inner SELECT calculates random points based on the extent of the polygon. Note it directly references a.geom, a value that comes from the preceding item in the FROM list! The LATERAL JOIN part is thus run once for every row of that preceding SELECT. This means it will return 201 points for each of the two polygons (run the query inside QGIS to see the result).
Note that all the points fall inside the polygons only by accident, because the polygons are squares identical to their own bounding boxes. For other shapes, ST_Contains or ST_Within should be used in an outer WHERE clause to filter the outliers. This part could use some tweaking.
Imagine you run two database machines hosting structurally identical databases on two separate servers and you need to transfer data from one to the other. Not very often, let’s say once a month. Your tables aren’t small nor huge, let’s say millions of rows in general.
You’re going to use pg_dump and pipe it to psql, but the indices on your tables will slow you down a lot.
That’s why you’ll want to drop all indices and constraints (drop_indices_constraints.sql):
SELECT DISTINCT 'ALTER TABLE ' || -- DISTINCT: multi-column constraints yield one ccu row per column
tc.table_schema ||
'.' ||
tc.table_name ||
' DROP CONSTRAINT ' ||
tc.constraint_name ||
';'
FROM information_schema.table_constraints tc
JOIN information_schema.constraint_column_usage ccu ON (tc.constraint_catalog = ccu.constraint_catalog AND tc.constraint_schema = ccu.constraint_schema AND tc.constraint_name = ccu.constraint_name)
WHERE tc.table_schema IN (SELECT unnest(string_to_array(:'schemas', ',')))
UNION ALL
SELECT
'DROP INDEX IF EXISTS ' || schemaname || '.' || indexname || ';'
FROM pg_indexes
WHERE schemaname IN (SELECT unnest(string_to_array(:'schemas', ',')));
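With hypothetical schema, table and index names, the script produces statements such as:

```sql
-- hypothetical output of drop_indices_constraints.sql
ALTER TABLE schema1.rivers DROP CONSTRAINT rivers_pkey;
DROP INDEX IF EXISTS schema1.rivers_geom_idx;
```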
Then you will transfer the data:
pg_dump -a -t "schema1.*" -t "schema2.*" -O -d source -v | psql -h localhost -d target
And restore the already dropped indices and constraints (create_indices_constraints.sql):
WITH constraints AS (
SELECT 'ALTER TABLE ' ||
tc.table_schema ||
'.' ||
tc.table_name ||
' ADD CONSTRAINT ' ||
tc.constraint_name ||
' ' ||
tc.constraint_type ||
'(' ||
string_agg(ccu.column_name::text, ', ')|| -- column order should be taken into account here
');' def,
tc.table_schema,
tc.table_name,
tc.constraint_name
FROM information_schema.table_constraints tc
JOIN information_schema.constraint_column_usage ccu ON (tc.constraint_catalog = ccu.constraint_catalog AND tc.constraint_schema = ccu.constraint_schema AND tc.constraint_name = ccu.constraint_name)
WHERE tc.table_schema IN (SELECT unnest(string_to_array(:'schemas', ',')))
AND tc.constraint_type = 'PRIMARY KEY'
GROUP BY
tc.table_schema,
tc.table_name,
tc.constraint_name,
tc.constraint_type
)
SELECT def FROM constraints
UNION ALL
SELECT indexdef || ';'
FROM pg_indexes
WHERE schemaname IN (SELECT unnest(string_to_array(:'schemas', ',')))
AND NOT EXISTS (
SELECT 1 FROM
constraints c
WHERE pg_indexes.schemaname = c.table_schema
AND pg_indexes.tablename = c.table_name
AND pg_indexes.indexname = c.constraint_name
);
A few sidenotes
- Run the second piece of code (the one generating the restore statements) first - you need those definitions saved before you drop anything. If you forget, run it against the source database.
- Notice the :schemas variable. Variable assignment is one of the psql features I really like.
- Notice DROP INDEX IF EXISTS and the CTE used in the restore code - that’s because dropping a constraint drops the underlying index as well, and you don’t want to drop something that doesn’t exist or create something that already exists.
The bash script proposal might look as follows:
# store indices and constraint definitions
psql -qAt -d target -v schemas='schema1','schema2' -f create_indices_constraints.sql > create.sql
# drop indices and constraints
psql -qAt -d target -v schemas='schema1','schema2' -f drop_indices_constraints.sql | psql -d target
# load data
pg_dump -a -t "schema1.*" -t "schema2.*" -O -d source -v | psql -h localhost -d target
# renew indices and constraints
psql -qAt -d target -f create.sql
PostgreSQL foreign data wrappers are used to connect a PostgreSQL database to different data sources, e.g. other SQL databases, CSV files, or XLS spreadsheets.
The one I’ve been interested in for several months is Paul Ramsey’s OGR FDW - it gives you access to OGR-supported spatial formats directly from your database. No more shapefiles lying around?
Each foreign data wrapper should have three basic components:
- foreign server object
- foreign user mapping - not necessary if you’re not connecting to another database
- foreign table(s)
I got some data about rivers and dams from DIBAVOD open datasets to play with.
First define the foreign server object:
CREATE SERVER dibavod
FOREIGN DATA WRAPPER ogr_fdw
OPTIONS (
datasource '/downloads/dibavod',
format 'ESRI Shapefile',
config_options 'SHAPE_ENCODING=CP1250'
);
Note that OGR-specific driver configuration options are available inside config_options. In case of ESRI Shapefiles, the datasource is the directory your files reside in.
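Newer versions of ogr_fdw can also create the foreign tables for you via IMPORT FOREIGN SCHEMA (a sketch; check that your ogr_fdw version supports it before relying on it):

```sql
-- sketch: let ogr_fdw create a foreign table for every layer in the datasource
IMPORT FOREIGN SCHEMA ogr_all
FROM SERVER dibavod
INTO public;
```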
Let’s create the PostgreSQL tables (use ogrinfo or Paul’s ogr_fdw_info to list the columns):
CREATE FOREIGN TABLE rivers (
fid integer,
utokj_id numeric,
utokjn_id numeric,
utokjn_f numeric,
prprop_z integer,
ex_jh integer,
pozn text,
shape_leng numeric,
naz_tok text,
idvt integer,
tok_id numeric,
shape_len numeric,
geom geometry(LINESTRING, 5514)
)
SERVER dibavod
OPTIONS (layer 'A02_Vodni_tok_JU');
CREATE FOREIGN TABLE dams (
fid integer,
objectid integer,
naz_na text,
nadr_gid numeric,
kota_hladi numeric,
hloubka numeric,
zatop_ploc numeric,
objem numeric,
kota_hraz numeric,
kota_preli numeric,
kota_vypus numeric,
plocha_m2 numeric,
shape_area numeric,
shape_len numeric,
geom geometry(MULTIPOLYGON, 5514)
)
SERVER dibavod
OPTIONS (layer 'A05_Vodni_nadrze');
Note the fid column - required for write access to the underlying datasource.
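Assuming the underlying OGR driver supports writes, edits go straight through to the shapefile; a sketch against the dams table above (the value is made up):

```sql
-- sketch: writing through the foreign table updates the shapefile itself
UPDATE dams
SET naz_na = 'Lipno' -- hypothetical value
WHERE fid = 1;       -- fid identifies the feature in the underlying datasource
```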
Things to remember:
- foreign tables mean no constraints nor indices
- no indices mean spatial queries are terribly slow compared to PostGIS
- I like the idea of CREATE UNLOGGED TABLE dams2 AS SELECT * FROM dams, not sure what to use it for though
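One use for that copy might be exactly the missing-index problem: materialize the foreign table locally and index the copy (a sketch, not benchmarked):

```sql
-- sketch: pull the foreign data into a local unlogged table and index it
CREATE UNLOGGED TABLE dams2 AS SELECT * FROM dams;
CREATE INDEX dams2_geom_idx ON dams2 USING GIST (geom); -- spatial queries can now use the index
ANALYZE dams2; -- refresh planner statistics for the new table
```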