Wifileaks Wi-Fi Networks Dataviz
Wifileaks is a project by Jakub Čížek aimed to map the Czech wi-fi networks with Android/iOS app. The data gathered by people using the app is available to download and features ~ 90,000,000 records, each representing the position of the cellphone when connecting to the network. Just about perfect to craft some maps!
Using PostgreSQL cstore_fdw
I ran out of disk space immediately after loading the dataset into the PostgreSQL database. After fiddling around I remembered that columnar store should be a bit space-friendlier than the old fashioned relational database. Thus, I installed the cstore_fdw by Citus Data in just few steps.
sudo apt install libprotobuf-c-dev libprotobuf-c1 protobuf-c-compiler postgresql-server-dev-9.6
git clone [email protected]:citusdata/cstore_fdw.git
PATH=/usr/bin/:$PATH make
PATH=/usr/bin/:$PATH make install
# when the cstore_fdw installation finishes, add the following line to your postgresql.conf and restart the database cluster
shared_preload_libraries = 'cstore_fdw'
This makes another FDW available to you inside the PostgreSQL. The actual foreign server has to be created before loading the data into a foreign table.
cat <<END | psql -qAt --no-psqlrc
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE SCHEMA data_cstore;
CREATE FOREIGN TABLE data_cstore.wifi (
id integer,
mac text,
ssid text,
signal_strength numeric,
security integer,
lat numeric,
lon numeric,
alt numeric,
unixtime bigint,
filename text
)
SERVER cstore_server
OPTIONS (compression 'pglz');
END
The foreign table is 3× smaller than it’s standard counterpart. However, this comes with some costs:
- neither
UPDATE
norDELETE
can be used - no
CREATE INDEX
- no
SERIAL
To overcome these shortcomings I used COPY
statement to spit out the slightly modified table and immediately loaded it back in.
cat <<END | psql -qAt --no-psqlrc
COPY (
SELECT
row_number() OVER (),
mac,
ssid,
signal_strength,
security,
split_part(filename, '_', 2)::integer,
to_timestamp(unixtime),
ST_Transform(ST_SetSRID(ST_MakePoint(lon, lat, alt), 4326), 32633)
FROM data_cstore.wifi
WHERE lon BETWEEN 0 AND 20
AND lat BETWEEN 18 AND 84
) TO '/tmp/wifileaks.db' WITH CSV DELIMITER ';'
DROP SCHEMA IF EXISTS data_cstore CASCADE;
DROP SCHEMA data_cstore;
CREATE SCHEMA data_cstore;
CREATE FOREIGN TABLE data_cstore.wifi (
id integer,
mac text,
ssid text,
signal_strength numeric,
security integer,
userid integer,
unixtime timestamp without time zone,
geom geometry(POINTZ, 32633)
)
SERVER cstore_server
OPTIONS (compression 'pglz');
END
Putting the networks on the map
As mentioned, each row of data represents the cellphone’s location when connecting to a wi-fi network. To get real wi-fi transmitter position, I calculated the average of location of each cellphone ever connected (although the signal strength should be taken into account here as well).
CREATE UNLOGGED TABLE data_cstore.wifi_avg_loc AS
SELECT
row_number() OVER () id,
mac,
ST_SetSRID(ST_MakePoint(x, y), 32633) geom
FROM (
SELECT
mac,
AVG(ST_X(geom)) x,
AVG(ST_Y(geom)) y
FROM data_cstore.wifi_loc
GROUP BY 1
) a;