Michal Zimmermann: Pieces of knowledge from the world of GIS.

Bash: Prepend To Filename

for f in *; do mv "$f" "prepend_$f"; done

Whenever you need to prepend anything to your files.

Migrating Geoserver And Checking For Missing Data

I’ve upgraded a handful of Geoserver installations and it has never been flawless. If you’re lucky, you end up with just a few layers missing; if you’re not, you’ll lose a bunch of them, together with layer groups and some stores, and workspaces might get screwed up as well.

But how do you check for missing data before switching to the newer version? Thanks to the REST API implemented within Geoserver, it’s rather easy.

import requests
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth

req = requests.get(
    'http://example.com/geoserver/rest/layers',
    auth=HTTPBasicAuth('username', 'password'))

# the response is an HTML page with one <a> element per layer
html = BeautifulSoup(req.text, 'html.parser')

with open('list.txt', 'a') as f:
    for link in html.find_all('a'):
        f.write(link.get_text())
        f.write('\n')

We needed to migrate ~17,000 layers last week, and yes, we could have just shut the door and spent a couple of nights checking them one by one, if we were the dumbest GIS company ever.

As I wanted to make it a bit easier, I wrote the simple Python script above that just authenticates against Geoserver and downloads the list of layers. I actually had to do that twice, once for the old instance and once for the new one. A simple file comparison followed and I had a list of missing layers in less than two minutes.
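The comparison step itself can be sketched in a few lines of Python; the file names here are made up for the example:

```python
def missing_layers(old_path, new_path):
    """Return layer names present in old_path but absent from new_path."""
    with open(old_path) as f:
        old = {line.strip() for line in f if line.strip()}
    with open(new_path) as f:
        new = {line.strip() for line in f if line.strip()}
    # set difference gives the layers that did not survive the migration
    return sorted(old - new)
```

Run it against the two downloaded lists and you get the layers to investigate before flipping the switch.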

If you do the same for workspaces, stores and layer groups, your chances of not losing any data during the switch are pretty high.

I guess it’s reasonable to check your maps by hand as well, but this gives you a picture of the current state of your data really quickly.

ogr2ogr: UNIX vs Windows

GDAL, with its ogr2ogr, ogrinfo and many more tools, is one of the best open source toolkits for doing anything to your spatial data. It is a command line suite, which sort of predestines it for UNIX systems, but once in a while you might bump into a Windows guy trying to use it as well.

Be careful, it behaves differently on different operating systems. Let’s say you do something like this on UNIX:

ogr2ogr -f GeoJSON -where "attribute IN ('value1', 'value2')" output.json input.json

What you might get is a big nothing. Executed on Windows, it gives you the result you expected. Aargh, what is that supposed to mean?

Well, that’s ogr2ogr’s way of telling you: Hello there, you need to switch single quotes for double quotes and vice versa, you dummy! On UNIX, the very same filter works once the quotes are swapped:

ogr2ogr -f GeoJSON -where 'attribute IN ("value1", "value2")' output.json input.json

I don’t know why and I find it really annoying. Just in case you get stuck with ogr2ogr (or probably any other command line tool), try swapping the quotes.

Notify When Average of 10 Subsequent Numbers Is Bigger Than Given Value

I found an interesting question on StackOverflow asking for help with exactly what I have already mentioned in the title, in PHP. I gave it a try before reading the answers and came up with the following code:

$avg  = 0; // the average value we are looking for
$size = count($numbers);
$val  = 0;

for ($i = 0; $i < $size; $i += 1) {
    if ($i + 9 < $size) { // a full set of ten numbers is still available
        $val += $numbers[$i];
        for ($j = $i + 1; $j < 10 + $i; $j += 1) {
            $val += $numbers[$j];
        }
        if ($val / 10 >= $avg) { // hit
            // do something
        }
        $val = 0;
    }
}

That was the first thing I could think of. And it worked. The answer given by Dave Chen was much more elegant than my solution (although I think it does something a bit different, but that’s not the point here):

$number = 10; //numbers in a set
$max = 0;
$index = 0;

$size = sizeof($numbers) - $number;
for ($i = 0; $i < $size; $i++) {
    $tmp = array_sum(array_slice($numbers, $i, $number)) / $number;
    if ($tmp > $max) {
        $max = $tmp;
        $index = $i;
    }
}
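For the curious: consecutive windows overlap in nine of their ten numbers, so the whole check can also be done in a single pass with a rolling sum. A quick sketch in Python (my own, not from the StackOverflow thread):

```python
def max_window_average(numbers, window=10):
    """Return (best average, start index) over all windows of `window` items."""
    if len(numbers) < window:
        raise ValueError('need at least %d numbers' % window)
    current = sum(numbers[:window])  # sum of the first window
    best, index = current, 0
    for i in range(window, len(numbers)):
        # slide the window: add the new number, drop the oldest one
        current += numbers[i] - numbers[i - window]
        if current > best:
            best, index = current, i - window + 1
    return best / window, index
```

Comparing the returned average against the given value answers the original question without re-summing ten numbers at every step.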

I made a simple benchmark with microtime() and found out that my solution (run 100k times) took about ~12.3 seconds, while Dave’s took only ~7.4 seconds to finish. That makes his code almost twice as fast as mine.

Lesson learned: do not stop learning!

PostgreSQL Remote Access

PostgreSQL is set to listen only for connections coming from localhost by default. I guess that’s fine as long as you don’t need access to the database from anywhere else (like your work network). If you do, you need to log in via SSH or use some online database management tool (go for Adminer and forget about anything called php[pg|my]admin). Or you can set the server up to accept connections from other locations.

You need to:

  1. set listen_addresses to * in your postgresql.conf. That does not mean anyone can connect to your database; it means the server will listen for connections coming from any available IP interface.
  2. insert a new entry into pg_hba.conf looking like this: host database user xxx.xxx.xxx.xxx md5. Now we’re saying we only want to accept connections coming from the IP xxx.xxx.xxx.xxx.
  3. add a rule to iptables allowing access to the database server. The number 5 says it will be the fifth rule in the order. It must come before the final REJECT ALL rule if present.

    iptables -I INPUT 5 -p tcp --dport 5432 -s xxx.xxx.xxx.xxx -j ACCEPT

  4. just to be sure no one else is able to connect, reject everything else on port 5432.

    iptables -I INPUT 6 -p tcp --dport 5432 -j REJECT
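For reference, newer PostgreSQL releases expect the address in a host entry to carry a CIDR mask; with the same placeholders, the pg_hba.conf entry from step 2 would look like this:

```
# TYPE  DATABASE  USER  ADDRESS             METHOD
host    database  user  xxx.xxx.xxx.xxx/32  md5
```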

You’re set to remotely connect to your database server.