Is there a way of using PostgreSQL + PostGIS for finding the number of self intersections in a linestring? was a question that made me think of this problem. I came up with a solution that takes just a few lines of code.
Assume the following geometries:
CREATE TABLE test2 (
id integer NOT NULL,
wkb_geometry geometry(LineString,5514)
);
COPY test2 (id, wkb_geometry) FROM stdin;
1 01020000208A15000004000000CCDC7845E339EEBFF2003B4A8A08E1BFE4154DAB7C31DCBF24C2042773E3E53F2287BA2CC591E43F604749BFE3B2E2BF2AE9770A11B8F0BF9C91435D56C0C63F
2 01020000208A1500000600000050212BF9E63EC03F1FA046FD69F1EA3F504D44212915EA3F74A99EDF44E3F33F2CE2805DFAB1F33F805D24B1B189DC3F9834DE5938C1F53FB56F1FBF8AAFEC3F24D0C85B4666EA3FF311B0D8D75BE93F306EAA073894D23FA841B27E3404F33F
\.
Note that those geometries are valid while not being simple, thus, ST_IsValidReason()
wouldn’t help much. What if we compared it to their single counterparts? Those would have had vertices at intersections. Once you know the original number of vertices and the number of simple geometry vertices, it is fairly easy to subtract those two.
WITH noded AS (
SELECT id, COUNT(id)
FROM (
SELECT DISTINCT (ST_DumpPoints(ST_Node(wkb_geometry))).geom, id
FROM test
) tmp group by id
),
test AS (
SELECT id, COUNT(id)
FROM (
SELECT DISTINCT (ST_DumpPoints(wkb_geometry)).geom, id
FROM test
) tmp group by id
)
SELECT noded.id, noded.count - test.count cnt FROM noded JOIN test USING (id);
This query gives you geometry id and the difference in number of vertices between the original and simple geometry. Note the DISTINCT
in the noded
CTE - with ST_Node()
you get one vertex x number of intersecting lines
for each intersection. DISTINCT
gives you just one of them.
The query result on my test
table:
Creating a rectangular grid to cover a given extent with same sized cells is one of the basic GIS tasks I’ve had to solve several times so far. I used QGIS or some Python to give me a bunch of INSERT
statements to run in PostGIS database, now I’ve come with a final solution at last.
CREATE OR REPLACE FUNCTION cm_grid(
blx float8, -- bottom left x coordinate
bly float8, -- bottom left y coordinate
trx float8, -- top right x coordinate
try float8, -- top right y coordinate
xsize float8, -- cell width
ysize float8, -- cell height
srid integer DEFAULT 5514,
OUT col varchar,
OUT "row" varchar,
OUT geom geometry
) RETURNS SETOF record AS
$$
DECLARE
width float8; -- total area width
height float8; -- total area height
cols integer;
rows integer;
BEGIN
width := @($1 - $3); -- absolute value
height := @($2 - $4); -- absolute value
cols := ceil(width / xsize);
rows := ceil(height / ysize);
RETURN QUERY
SELECT
cast(
lpad((i)::varchar,
CASE WHEN
char_length(rows::varchar) > char_length(cols::varchar)
THEN char_length(rows::varchar)
ELSE char_length(cols::varchar)
END,
'0')
AS varchar
) AS row,
cast(
lpad((j)::varchar,
CASE WHEN
char_length(rows::varchar) > char_length(cols::varchar)
THEN char_length(rows::varchar)
ELSE char_length(cols::varchar)
END,
'0') AS varchar
) AS col,
ST_SetSRID(
ST_GeomFromText(
'POLYGON((' ||
array_to_string(
ARRAY[i * xsize + blx, j * ysize + bly],
' '
) || ',' ||
array_to_string(
ARRAY[i * xsize + blx, (j+1) * ysize + bly],
' '
) || ',' ||
array_to_string(
ARRAY[(i+1) * xsize + blx, (j+1) * ysize + bly],
' '
) || ',' ||
array_to_string(
ARRAY[(i+1) * xsize + blx, j * ysize + bly],
' '
) || ',' ||
array_to_string(
ARRAY[i * xsize + blx, j * ysize + bly],
' '
) || '
))'
)
, srid) AS geom
FROM
generate_series(0, cols) AS i,
generate_series(0, rows) AS j;
END;
$$
LANGUAGE plpgsql;
And you call it like this:
CREATE TABLE grid AS
SELECT *
FROM cm_grid(-675593.69, -1057711.19, -672254.69, -1054849.19, 333.47, 333.47);
Few notes:
- it takes bounding box coordinates (bottom left, top right) for an extent,
- followed by cell width and height,
- and optional SRID (defaults to 5514 which is Czech national grid),
- each cell is indexed with
row
and col
number
The messy CASE
statement makes sure both row
and col
are of the same length. I used array_to_string
for better readability. It might not be the fastest way, didn’t do any benchmarks.
Being part of CleverMaps means doing lot of nasty work with PostGIS. Recently, I’ve been given a following task that needed to be done for a really big project dealing with agricultural parcels:
- given a polygonal shapefile of agricultural parcels, create 20m wide buffers around all of them,
- extract holes from these parcels,
- clip buffers so they don’t overlap with other parcels,
- get rid of overlaps between nearby parcels (e.g. dissolve them),
- create output combined from holes and buffers,
- the output must not contain features having more than ~1,000,000 vertices
This process is going to be run ~20× on layers with ~40,000-70,000 polygons.
Input data
- polygonal layer of agricultural parcels
- rectangular grid (7.5 × 7.5 km) for cutting the output
First try
My initial effort was to union all the buffers and then clip them with a rectangular grid. Long story short: Don’t do that. Never. Ever. I mean it.
It works fine until you end up with one huge multipolygon having like ~2,000,000 vertices. But then you need to split it somehow so you meet the 1,000,000 limit rule (see list above). Spatial index doesn’t help you much in such cases, so that really huge polygon is being cut by every rectangle it intersects and it takes hours and hours. It’s just a no go.
The other way round
Let’s put it the other way round. First, split buffers by rectangular grid, doing union on each cell separately.
Import
Using the swiss knife of GIS to import the data:
export SHAPE_ENCODING="ISO-8859-1"
ogr2ogr -f PostgreSQL PG:"dbname=db user=postgres" parcels.shp -lco ENCODING=UTF-8 -t_srs "EPSG:2154"
ogr2ogr -f PostgreSQL PG:"dbname=db user=postgres" grid.shp -lco ENCODING=UTF-8 -t_srs "EPSG:2154"
PostGIS processing
Recently I stumbled upon a psql \set
command. Launching several queries on the same table, it might be useful to define it’s name with \set table tablename
:
\set table 'parcels'
-- prepare separate table for holes (inner rings)
DROP TABLE IF EXISTS holes;
CREATE TABLE holes (
id serial,
ilot_id varchar,
wkb_geometry geometry('Polygon', 2154),
path integer);
I found the following query an easy way to get all the rings from geometries having more than one ring:
INSERT INTO holes (ilot_id, wkb_geometry, path) (
SELECT id,
(ST_DumpRings(wkb_geometry)).geom::geometry('Polygon', 2154) as wkb_geometry,
unnest((ST_DumpRings(wkb_geometry)).path) as path
FROM :table
WHERE ST_NRings(wkb_geometry) > 1
);
Here’s a little trick. Doing some checks I found out that some of the polygons had two rings without having any inner ring, both of them being the same. I guess this comes from some kind of human error. This query thus deletes all rings with path = 0
(exterior rings). At the same time, it deals with that invalid polygons by checking their spatial relationship to parcels.
DELETE FROM holes
WHERE path = 0
OR id IN (
SELECT holes.id
FROM holes
JOIN :table ON
ST_Within(
ST_Buffer(holes.wkb_geometry,-1),
:table.wkb_geometry
)
AND holes.wkb_geometry && :table.wkb_geometry
);
To my surprise, it is possible that parcel has a hole with another parcel being in that hole. Argh. Find those and get rid of them.
DROP TABLE IF EXISTS ints;
CREATE TABLE ints AS
SELECT holes.*
FROM holes
JOIN :table ON ST_Intersects(holes.wkb_geometry, :table.wkb_geometry)
AND ST_Relate(holes.wkb_geometry, :table.wkb_geometry, '2********');
DELETE FROM holes
WHERE id IN (SELECT id FROM ints);
I still need to get the difference between the hole geometry and the parcel that resides inside it - this difference is the actual hole I’m looking for.
DROP TABLE IF EXISTS diff_ints;
CREATE TABLE diff_ints AS
SELECT
ints.id,
ST_Difference(
ints.wkb_geometry,
ST_Collect(:table.wkb_geometry)
) wkb_geometry
FROM ints, :table
WHERE ST_Within(:table.wkb_geometry, ints.wkb_geometry)
GROUP BY ints.wkb_geometry, ints.id;
And I’m done with holes. Get back to buffers.
DROP TABLE IF EXISTS buffer;
CREATE TABLE buffer AS
SELECT id, ST_Buffer(wkb_geometry, 20) wkb_geometry
FROM :table;
CREATE INDEX buffer_gist_idx ON buffer USING gist(wkb_geometry);
ALTER TABLE buffer ADD PRIMARY KEY(id);
VACUUM ANALYZE buffer;
Combine all the parts together.
DROP TABLE IF EXISTS diff;
CREATE TABLE diff AS
SELECT a.id, ST_Difference(a.wkb_geometry, ST_Union(ST_MakeValid(b.wkb_geometry))) as wkb_geometry
FROM buffer a, :table b
WHERE ST_Intersects(a.wkb_geometry, b.wkb_geometry)
GROUP BY a.id, a.wkb_geometry
UNION
SELECT id::varchar, wkb_geometry FROM holes
UNION
SELECT id::varchar, wkb_geometry FROM diff_ints;
CREATE INDEX diff_gist_idx ON diff USING gist(wkb_geometry);
VACUUM ANALYZE diff;
Collect the geometries in every cell, simplify them a little, snap them to 3 decimal numbers, make them valid and dump them to simple features. This query takes ~300,000 ms which is orders of magnitude faster than my initial attempt.
DROP TABLE IF EXISTS uni;
CREATE TABLE uni AS
SELECT
g.ogc_fid AS grid_id,
(ST_Dump(
ST_MakeValid(
ST_SnapToGrid(
ST_SimplifyPreserveTopology(
ST_CollectionExtract(
ST_Buffer(
ST_Collect(
ST_Intersection(a.wkb_geometry, g.wkb_geometry)
)
, 0)
, 3)
, 0.1)
, 0.001)
)
)).geom as wkb_geometry
FROM diff a, grid g
WHERE ST_Intersects(a.wkb_geometry, g.wkb_geometry)
GROUP BY g.ogc_fid;
After running the query it is reasonable to check the results. I’m only interested in polygonal geometries, ST_GeometryType()
would tell me of any other geometry type. Invalid geometries are not allowed.
SELECT DISTINCT ST_GeometryType(wkb_geometry) FROM uni;
SELECT COUNT(1) FROM uni WHERE NOT ST_IsValid(wkb_geometry);
Add primary key on serial column as a last SQL step.
ALTER TABLE uni ADD COLUMN id serial;
ALTER TABLE uni ADD PRIMARY KEY(id);
Export
And spit it out as a shapefile.
ogr2ogr -f "ESRI Shapefile" output.shp PG:"dbname=ign user=postgres" uni -s_srs "EPSG:2154" -t_srs "EPSG:2154" -lco ENCODING=UTF-8
Lesson learned
- More of little seems to be faster than less of bigger.
- Never stop learning and trying different approaches.
- Although using
CTE
might be tempting, creating separate tables for separate steps of the whole process is much more comfortable for debugging.
Using WMS in real time might easily become pain in the ass due to low connection speed or slow server response. Downloading images beforehand seems to be a reasonable choice both to avoid any slowdowns and to improve user experience when working with WMS layers.
OWSLib is a great tool to help you get images from WMS server. Code and some comments follow.
import math
import os
import random
import time
from owslib.wms import WebMapService
BOTTOM_LEFT = (-679363,-1120688)
TOP_RIGHT = (-565171,-1042703)
SRS_WIDTH = math.fabs(-639084 - -638825) # tile width in units of crs => 259 m
SRS_HEIGHT = math.fabs(-1070426 - -1070273) # tile height in units of crs => 153 m
PX_WIDTH = 977
PX_HEIGHT = 578
FORMAT = 'image/png'
LAYERS = ['KN', 'RST_PK']
SIZE = (PX_WIDTH, PX_HEIGHT)
SRS = 'EPSG:5514'
STYLES = ['default', 'default']
TRANSPARENT = True
DIRECTORY = 'tiles/'
SLEEP = random.randint(0,20) # seconds
dx = math.fabs(BOTTOM_LEFT[0] - TOP_RIGHT[0]) # area width in units of crs
dy = math.fabs(BOTTOM_LEFT[1] - TOP_RIGHT[1]) # area height in units of crs
cols = int(math.ceil(dx / SRS_WIDTH)) + 1
rows = int(math.ceil(dy / SRS_HEIGHT)) + 1
counter = 0
with open('world_file.pngw', 'r') as wld_template:
tmpl = wld_template.read()
wms = WebMapService('http://services.cuzk.cz/wms/wms.asp', version='1.1.1')
for i in xrange(0, rows):
if not os.path.exists(DIRECTORY + str(i)):
os.mkdir(DIRECTORY + str(i))
for j in xrange(0, cols):
if os.path.exists(DIRECTORY + str(i) +'/kn_' + str(i) + '_' + str(j) + '.png'):
counter += 1
continue
bbox = (
i * SRS_WIDTH + BOTTOM_LEFT[0],
j * SRS_HEIGHT + BOTTOM_LEFT[1],
(i + 1) * SRS_WIDTH + BOTTOM_LEFT[0],
(j + 1) * SRS_HEIGHT + BOTTOM_LEFT[1]
)
img = wms.getmap(
layers=LAYERS,
styles=STYLES,
srs=SRS,
bbox=bbox,
size=SIZE,
format=FORMAT,
transparent=TRANSPARENT
)
with open(DIRECTORY + str(i) +'/kn_' + str(i) + '_' + str(j) + '.png', 'wb') as png:
png.write(img.read())
with open(DIRECTORY + str(i) + '/kn_' + str(i) + '_' + str(j) + '.pngw', 'w') as wld_file:
wld_file.write(tmpl)
wld_file.write('\n' + str(i * SRS_WIDTH + BOTTOM_LEFT[0]))
wld_file.write('\n' + str((j+1) * SRS_HEIGHT + BOTTOM_LEFT[1]))
counter += 1
print str(counter), ' out of ', str(rows * cols)
time.sleep(SLEEP)
First, always make sure you are not violating terms of use defined by service provider. If you are not, here are the necessary steps:
- Define your area of interest with bottom left and top right coordinates.
- Calculate width of single image both in pixels and units of CRS to get the rightsized image. Note that there may be image size restrictions defined by provider (2048 × 2048 px is usually the biggest you can get).
- Define template world file for referencing images. OWSLib doesn’t provide world files to saved images, these have to be created by you. I recommend to use a template file for creating real world files.
- Be nice! Don’t overload the service. I use
time.sleep()
for this.
- Profit.
The trouble with WMS is that you can’t set an arbitrary scale you want to obtain images in (e.g. 1:1 000). It’s fairly easy to get all values needed to imitate this behavior though.
Using QGIS you can:
- Get bounding box of area you’re interested in.
- Save current view as an image (together with the world file!) and use it as a specimen for your own world files.
- Derive image width (CRS, pixels) from the saved image, thus getting the same zoom level you were using in QGIS.
Code given is not bulletproof, it will fail on any network error. However, if you restart it after such a crash, it checks for existing files and starts with the first missing, so you don’t have to download all the tiles again.
I decided to migrate my web to OpenShift. It was a bit frustrating but I got it working eventually.
Things to know before taking the leap
Some domain providers don’t support CNAME changes for root domains (zimmi.cz in my case). This means you can’t simply tell your domain to serve content from OpenShift address. But what you can do is to tell your www
subdomain to do so:
www.zimmi.cz CNAME hp-zimmi.rhcloud.com
Which is great until you realize you’ve just created two different websites. That’s where wwwizer lends you a hand and lets you redirect your naked domain to your www
domain:
zimmi.cz A 174.129.25.170
Now everything works fine and you have your www.domain.tld
up and running.
OpenShift subdomains
I wasn’t successful creating a subdomain on the same application where I run my domain. This can be easily solved by creating another application and pointing DNS to it:
posts.zimmi.cz A 174.179.25.170
www.posts.zimmi.cz CNAME posts-zimmi.rhcloud.com
Just don’t forget to handle both naked and www
version. When Google reindexes new URLs (http://www.zimmi.cz/posts instead of http://posts.zimmi.cz) subdomain application might be deleted.