Articles tagged with postgresql tag

Plotting the Czech Cadastre Land Use with d3: Data Extraction (part I)

This post is the first part of the upcoming series summarizing the process of visualizing landuse data with bash, PostgreSQL and d3.js. Read other parts:

you’re reading it now
Plotting the Czech Cadastre Land Use with d3: Data Transformation (part II)
Plotting the Czech Cadastre Land Use with d3: Data Transformation (part III)

Czech Office for Surveying, Mapping and Cadastre has recently published lot of data via Atom feed. There’s pretty small and a bit boring dataset included, featuring quarterly updated landuse-related values for all 13,091 cadastral areas:

absolute number of land lots within given category (arable land, forests, etc.)
absolute area of land lots within given category

Data are published as CSV files linked from the Atom feed. Sadly, they come windows-1250 encoded, using Windows line endings, with trailing semicolons and header rows using diacritics.

ETL process

Before the d3 viz can be crafted, it’s necessary to:

extract CSV data from the URLs provided via the Atom feed
transform those data into a relational database, do some math
load data into a d3.js viz
profit (as usual)

Extract

#!/bin/bash
# extract.sh -f YYYYMMDD

while [[ $# -gt 1 ]]
do
key="$1"

case $key in
    -f|--file)
    FILE="$2"
    shift # past argument
    ;;
    *)
        # unknown option
    ;;
esac
shift # past argument or value
done

URL=http://services.cuzk.cz/sestavy/UHDP/UHDP-
CSVFILE=$FILE.csv
CSVUTF8FILE=${CSVFILE%.*}.utf.csv
URL+=$CSVFILE

echo "downloading $URL"
wget -q $URL -O $CSVFILE

if [[ $? != 0 ]]; then
    rm -f $CSVFILE
    echo "download failed"
    exit
fi

echo "converting to utf-8"
iconv -f WINDOWS-1250 -t UTF-8 $CSVFILE -o $CSVUTF8FILE &amp;&amp; \
echo "modifying ${FILE}"
sed -i 's/^M$//' $CSVUTF8FILE &amp;&amp; \
sed -i 's/\r$//' $CSVUTF8FILE &amp;&amp; \
sed -i 's/;*$//g' $CSVUTF8FILE &amp;&amp; \
sed -i '1d' $CSVUTF8FILE

echo "importing to database"
sed -e "s/\${DATE}/$FILE/g" extract.sql | psql -qAt --no-psqlrc

rm $CSVFILE $CSVUTF8FILE

This script downloads CSV file, deals with all the pitfalls mentioned above and, when done, copy command within extract.sql loads the data into a data_YYYYMMDD table. Putting all the files into the one table would have saved me a lot of transformation SQL, yet it didn’t feel quite right though.

Transform

See Plotting the Czech Cadastre Land Use with d3: Data Transformation (part II).

Load

See Plotting the Czech Cadastre Land Use with d3: Data Transformation (part III).

PostGIS Custom Function to Create Wind Rose

I’ve come across the beautiful GIS StackExchange question recently, asking how to draw a wind rose within PostGIS.

It’s pretty easy to accomplish this with a custom PLPGSQL procedure below, that takes line geometry, number of sections and radius of the inner circle as parameters.

CREATE OR REPLACE FUNCTION ST_WindRose(
    line geometry,
    directions int,
    radius numeric
)
RETURNS TABLE (
    id integer,
    geom geometry(LINESTRING)
)
AS $ST_WindRose$
BEGIN
    IF directions % 2 &lt;&gt; 0 THEN
        RAISE EXCEPTION 'Odd number of directions found, please provide even number of directions instead.';
    END IF;

IF radius &gt; ST_Length(line) THEN
    RAISE EXCEPTION 'Inner circle radius is bigger than the wind rose diameter, please make it smaller.';
END IF;

RETURN QUERY
WITH rose AS (
    SELECT
        ST_Rotate(_line, radians(360) / directions * dirs.id, ST_Centroid(_line)) _line
    FROM (
        SELECT line _line
    ) a
    CROSS JOIN (
        SELECT generate_series(1, directions / 2) id
    ) dirs
)
SELECT
    row_number() OVER ()::integer id,
    _line geom
FROM (
    SELECT _line FROM rose
    UNION ALL
    SELECT ST_ExteriorRing(ST_Buffer(ST_Centroid(line), radius, 30)) -- inner circle
    UNION ALL
    SELECT ST_ExteriorRing(ST_Buffer(ST_Centroid(line), ST_Length(line)/2, 30)) -- outer circle
) a;
END
$ST_WindRose$
LANGUAGE PLPGSQL;

Wind rose created with this function might look like the one below.

Run it as follows. The line parameter should be a simple straight line made of just two vertices.

SELECT * FROM ST_WindRose(ST_MakeLine(ST_MakePoint(0,0), ST_MakePoint(0,1)), 12, 0.01);

PostGIS Custom Function to Create Polygon from Centroid

Needed to create a polygon from a point defining its size in both axes, here’s a little syntax sugar to make life easier.

CREATE OR REPLACE FUNCTION ST_PolygonFromCentroid(centroid geometry, xsize numeric, ysize numeric)
RETURNS geometry
AS $ST_PolygonFromCentroid$
SELECT ST_MakeEnvelope(
    ST_X(ST_Translate($1, -$2, -$3)),
    ST_Y(ST_Translate($1, -$2, -$3)),
    ST_X(ST_Translate($1, $2, $3)),
    ST_Y(ST_Translate($1, $2, $3))
);
$ST_PolygonFromCentroid$
LANGUAGE SQL;

Run it as:

SELECT ST_PolygonFromCentroid(ST_SetSRID(ST_MakePoint(13.912,50.633),4326), 1, 1);

Finding Polygons Lying across Other Polygons with PostGIS

Doing overlays (ST_Intersection()) in PostGIS based on spatial relationships (ST_Intersects(), ST_Contains(), …) is so easy it is not something you get particularly excited about.

Today I faced a bit more interesting task: given two polygon layers, get me all the polygons from layer A such that they lie across the polygons from layer B and… a picture worth a thousand words, right?

I hope you got the idea, it is fairly simple:

Intersect A (red, blue) with B (green)
Subtract the result of previous from layer A
Combine results from steps 1 and 2
Keep polygon only if its id occurs more than twice (that means it went straight through the layer B)
Profit!

WITH overlays AS (
/* nothing fancy here */
    SELECT
        A.ogc_fid a_id,
        B.ogc_fid b_id,
        ST_Intersection(A.geom, B.geom) geom,
        ST_Area(ST_Intersection(A.geom, B.geom) area_shared
    FROM A
    JOIN B ON (ST_Intersects(A.geom, B.geom)
),
diffs AS (
/* note this is a 1:1 relationship in ST_Difference */
/* a little hack is needed to prevent PostGIS from returning its usual difference mess */
    SELECT
        o.a_id,
        o.b_id,
        (ST_Dump(ST_Difference(ST_Buffer(A.geom, -0.0001), o.geom))).geom, -- ugly hack
        o.area_shared
    FROM overlays o
    JOIN A ON (o.a_id = A.id)
),

merged AS (
/* put those two result sets together */
    SELECT * FROM overlays
    UNION ALL
    SELECT * FROM diffs
),

merged_reduced AS (
/* get only those A polygons that consist of three parts at least for each intersection with B polygon */
  SELECT
    m.*
  FROM merged m
  JOIN (
    SELECT
      a_id,
      b_id
    FROM merged
    GROUP BY a_id, b_id
    HAVING COUNT(1) &gt; 2
  ) a ON (a.a_id = m.a_id AND a.b_id = m.b_id)
)
/* do as you wish with the result */
SELECT *
FROM merged_reduced;

In my case, centerlines of layer B were also included and their length inside each intersection was used to divide the area of the smallest part with. It was fun, actually.

Dead Simple Random Points in Polygons with PostGIS

Since PostgreSQL 9.3 there has been a handy little keyword called LATERAL, which - combined with JOIN - might rock your GIS world in a second. To keep it simple, a LATERAL JOIN enables a subquery in the FROM part of a query to reference columns from preceding expressions in the FROM list. What the heck?

Imagine that not so unusual request to generate random points in polygons (something I needed to do today). Do it automatically without your favorite piece of desktop GIS software.

It is pretty easy using LATERAL JOIN:

SELECT
    a.id,
    b.*
FROM (
    VALUES(
        1,
        ST_SetSRID(
            ST_GeomFromText(
                'POLYGON((0 0, -1 0, -1 -1, 0 -1, 0 0))'
            ),
        4326)
    )
    UNION ALL
    VALUES(
        2,
        ST_SetSRID(
            ST_GeomFromText(
                'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'
            ),
        4326)
    )
) a(id, geom)
CROSS JOIN LATERAL (
    SELECT ST_SetSRID(ST_MakePoint(tmp.x, tmp.y), 4326) geom
    FROM (
        SELECT
            random() * (ST_XMax(a.geom) - ST_XMin(a.geom)) + ST_XMin(a.geom) x,
            random() * (ST_YMax(a.geom) - ST_YMin(a.geom)) + ST_YMin(a.geom) y
        FROM generate_series(0,200)
    ) tmp
) b;

What actually happened over there? If you want to put points inside polygons, you need… polygons. We will do just fine with two of them created inside this query:

VALUES(
    1,
    ST_SetSRID(
        ST_GeomFromText(
            'POLYGON((0 0, -1 0, -1 -1, 0 -1, 0 0))'
        ),
    4326)
)
UNION ALL
VALUES(
    2,
    ST_SetSRID(
        ST_GeomFromText(
            'POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'
        ),
    4326)
)

All the magic happens inside the LATERAL JOIN part of the query:

CROSS JOIN LATERAL (
    SELECT ST_SetSRID(ST_MakePoint(tmp.x, tmp.y), 4326) geom
    FROM (
        SELECT
            random() * (ST_XMax(a.geom) - ST_XMin(a.geom)) + ST_XMin(a.geom) x,
            random() * (ST_YMax(a.geom) - ST_YMin(a.geom)) + ST_YMin(a.geom) y
        FROM generate_series(0,200)
    ) tmp
) b;

The inner SELECT calculates random points based on the extent of the polygon. Note it directly calls a.geom, a value that comes from the previous SELECT! The LATERAL JOIN part is thus run for every row of the previous SELECT statement inside FROM part of the query. This means it will return 201 points for each of the two polygons (run the query inside QGIS to see the result).

Note all the points fall inside the polygons by accident, because they are square. Otherwise a ST_Contains or ST_Within should be used inside the outermost WHERE query to filter outliers. This part could use some tweaking.