Spatial indexes are absolutely crucial part of any spatial database and - as I tend to say quite often - only a fool would try to query spatial data without building spatial indexes beforehand.
Spatial indexes are based on bounding box comparisons, which are generally very fast. Yet, there are situations when spatial indexes don’t help much (or they don’t help as much as they could, if you wish).
Bounding box comparisons are effective with lots of small bounding boxes rather then few large ones. Why? See the picture above. The curved line (imagine it’s a pipeline for example) clearly demonstrates when the spatial index/bounding box comparison might fall short of what you’d expect.
Once the bounding box gets really big, it intersects so many other geometries’ bounding boxes that the whole comparison starts to slow down.
Luckily, PostGIS 2.2 introduced a ST_Subdivide function that can lend a helping hand in here.
Until today, we delivered the parcel geometries into our real estate acquisition process system with the following query, that takes all the geometries from the req_geom
table (pipelines, remember?) and intersects them with cadastral parcels. The second part of the query adds those parcels that haven’t been digitalized and were created manually by one of my workmates.
INSERT INTO requested_parcels (uid, par_id)
SELECT
reqs.uid,
b.id par_id
FROM
running_requests reqs
JOIN
req_geom a ON (reqs.uid = a.uid)
JOIN
pargeo b ON (ST_Intersects(a.geom, b.geom))
UNION
SELECT
reqs.uid,
a.idpar::numeric
FROM
running_requests reqs
JOIN
req_man a ON (reqs.uid = a.uid);
It’s a perfectly standard query that intersects several request geometries with ~20M parcels, nothing really fancy. Except that it takes 25 minutes to finish. Why? Pipelines, remember?
Yet, the query below takes only 30 seconds to finish (that’s a huge time saver considering that the whole process used to take ~40 minutes)! Why? Because the ST_Subdivide
effectively shrinks the req_geom
geometries until they have 50 vertices each at most. Such small geometries are perfect input for the bounding box comparison. Remember to call DISTINCT
when using ST_Subdivide
, you’d probably get duplicate parcel ids otherwise.
I also replaced the UNION
with the WHERE NOT EXISTS
expression, as it’s reasonable to assume that numeric ids comparison will be faster.
INSERT INTO requested_parcels (uid, par_id)
SELECT DISTINCT
reqs.uid,
b.id par_id
FROM
running_requests reqs
JOIN
(
SELECT
uid,
ST_Subdivide(geom, 50) geom
FROM
req_geom
) a ON (reqs.uid = a.uid)
JOIN
pargeo b ON (ST_Intersects(a.geom, b.geom));
INSERT INTO requested_parcels (uid, par_id)
SELECT
reqs.uid,
a.idpar::numeric
FROM
running_requests reqs
JOIN
req_man a ON (reqs.uid = a.uid)
WHERE NOT EXISTS (
SELECT 1
FROM pozadovane_parcely pp
WHERE pp.par_id = a.idpar
);
I’ve come across the beautiful GIS StackExchange question recently, asking how to draw a wind rose within PostGIS.
It’s pretty easy to accomplish this with a custom PLPGSQL procedure below, that takes line geometry, number of sections and radius of the inner circle as parameters.
CREATE OR REPLACE FUNCTION ST_WindRose(
line geometry,
directions int,
radius numeric
)
RETURNS TABLE (
id integer,
geom geometry(LINESTRING)
)
AS $ST_WindRose$
BEGIN
IF directions % 2 <> 0 THEN
RAISE EXCEPTION 'Odd number of directions found, please provide even number of directions instead.';
END IF;
IF radius > ST_Length(line) THEN
RAISE EXCEPTION 'Inner circle radius is bigger than the wind rose diameter, please make it smaller.';
END IF;
RETURN QUERY
WITH rose AS (
SELECT
ST_Rotate(_line, radians(360) / directions * dirs.id, ST_Centroid(_line)) _line
FROM (
SELECT line _line
) a
CROSS JOIN (
SELECT generate_series(1, directions / 2) id
) dirs
)
SELECT
row_number() OVER ()::integer id,
_line geom
FROM (
SELECT _line FROM rose
UNION ALL
SELECT ST_ExteriorRing(ST_Buffer(ST_Centroid(line), radius, 30)) -- inner circle
UNION ALL
SELECT ST_ExteriorRing(ST_Buffer(ST_Centroid(line), ST_Length(line)/2, 30)) -- outer circle
) a;
END
$ST_WindRose$
LANGUAGE PLPGSQL;
Wind rose created with this function might look like the one below.
Run it as follows. The line
parameter should be a simple straight line made of just two vertices.
SELECT * FROM ST_WindRose(ST_MakeLine(ST_MakePoint(0,0), ST_MakePoint(0,1)), 12, 0.01);
Needed to create a polygon from a point defining its size in both axes, here’s a little syntax sugar to make life easier.
CREATE OR REPLACE FUNCTION ST_PolygonFromCentroid(centroid geometry, xsize numeric, ysize numeric)
RETURNS geometry
AS $ST_PolygonFromCentroid$
SELECT ST_MakeEnvelope(
ST_X(ST_Translate($1, -$2, -$3)),
ST_Y(ST_Translate($1, -$2, -$3)),
ST_X(ST_Translate($1, $2, $3)),
ST_Y(ST_Translate($1, $2, $3))
);
$ST_PolygonFromCentroid$
LANGUAGE SQL;
Run it as:
SELECT ST_PolygonFromCentroid(ST_SetSRID(ST_MakePoint(13.912,50.633),4326), 1, 1);
Doing overlays (ST_Intersection()
) in PostGIS based on spatial relationships (ST_Intersects()
, ST_Contains()
, …) is so easy it is not something you get particularly excited about.
Today I faced a bit more interesting task: given two polygon layers, get me all the polygons from layer A such that they lie across the polygons from layer B and… a picture worth a thousand words, right?
I hope you got the idea, it is fairly simple:
- Intersect A (red, blue) with B (green)
- Subtract the result of previous from layer A
- Combine results from steps 1 and 2
- Keep polygon only if its id occurs more than twice (that means it went straight through the layer B)
- Profit!
WITH overlays AS (
/* nothing fancy here */
SELECT
A.ogc_fid a_id,
B.ogc_fid b_id,
ST_Intersection(A.geom, B.geom) geom,
ST_Area(ST_Intersection(A.geom, B.geom) area_shared
FROM A
JOIN B ON (ST_Intersects(A.geom, B.geom)
),
diffs AS (
/* note this is a 1:1 relationship in ST_Difference */
/* a little hack is needed to prevent PostGIS from returning its usual difference mess */
SELECT
o.a_id,
o.b_id,
(ST_Dump(ST_Difference(ST_Buffer(A.geom, -0.0001), o.geom))).geom, -- ugly hack
o.area_shared
FROM overlays o
JOIN A ON (o.a_id = A.id)
),
merged AS (
/* put those two result sets together */
SELECT * FROM overlays
UNION ALL
SELECT * FROM diffs
),
merged_reduced AS (
/* get only those A polygons that consist of three parts at least for each intersection with B polygon */
SELECT
m.*
FROM merged m
JOIN (
SELECT
a_id,
b_id
FROM merged
GROUP BY a_id, b_id
HAVING COUNT(1) > 2
) a ON (a.a_id = m.a_id AND a.b_id = m.b_id)
)
/* do as you wish with the result */
SELECT *
FROM merged_reduced;
In my case, centerlines of layer B were also included and their length inside each intersection was used to divide the area of the smallest part with. It was fun, actually.
Since PostgreSQL 9.3 there has been a handy little keyword called LATERAL
, which - combined with JOIN
- might rock your GIS world in a second. To keep it simple, a LATERAL JOIN
enables a subquery in the FROM
part of a query to reference columns from preceding expressions in the FROM
list. What the heck?
Imagine that not so unusual request to generate random points in polygons (something I needed to do today). Do it automatically without your favorite piece of desktop GIS software.
It is pretty easy using LATERAL JOIN
:
SELECT
a.id,
b.*
FROM (
VALUES(
1,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((0 0, -1 0, -1 -1, 0 -1, 0 0))'
),
4326)
)
UNION ALL
VALUES(
2,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'
),
4326)
)
) a(id, geom)
CROSS JOIN LATERAL (
SELECT ST_SetSRID(ST_MakePoint(tmp.x, tmp.y), 4326) geom
FROM (
SELECT
random() * (ST_XMax(a.geom) - ST_XMin(a.geom)) + ST_XMin(a.geom) x,
random() * (ST_YMax(a.geom) - ST_YMin(a.geom)) + ST_YMin(a.geom) y
FROM generate_series(0,200)
) tmp
) b;
What actually happened over there? If you want to put points inside polygons, you need… polygons. We will do just fine with two of them created inside this query:
VALUES(
1,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((0 0, -1 0, -1 -1, 0 -1, 0 0))'
),
4326)
)
UNION ALL
VALUES(
2,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'
),
4326)
)
All the magic happens inside the LATERAL JOIN
part of the query:
CROSS JOIN LATERAL (
SELECT ST_SetSRID(ST_MakePoint(tmp.x, tmp.y), 4326) geom
FROM (
SELECT
random() * (ST_XMax(a.geom) - ST_XMin(a.geom)) + ST_XMin(a.geom) x,
random() * (ST_YMax(a.geom) - ST_YMin(a.geom)) + ST_YMin(a.geom) y
FROM generate_series(0,200)
) tmp
) b;
The inner SELECT
calculates random points based on the extent of the polygon. Note it directly calls a.geom
, a value that comes from the previous SELECT
! The LATERAL JOIN
part is thus run for every row of the previous SELECT
statement inside FROM
part of the query. This means it will return 201 points for each of the two polygons (run the query inside QGIS to see the result).
Note all the points fall inside the polygons by accident, because they are square. Otherwise a ST_Contains
or ST_Within
should be used inside the outermost WHERE
query to filter outliers. This part could use some tweaking.