PostGIS: Finding Biggest Land Users Nearby

At CleverMaps we heavily rely on the cadastre of real estate, which is probably the biggest data source in my country. Using their excellent knowledge of this data set, my teammates often supply me with all kinds of weird challenges.

Give me the biggest land users nearby

Find the biggest land users in surrounding cadastral communities for each cadastral community (~ 13K) being the latest task, here’s the query I tackled it with:

WITH users_ AS (
    SELECT
        cad_code,
        id,
        zipcode,
        city,
        concat_ws(' ',street, concat_ws('/', house_number, street_number)) as street,
        name,
        'Users with more than 10 ha'::text note,
        SUM(acreage) area
        FROM land_blocks -- being a table with info about all the agricultural land blocks
        JOIN users u ON id_uz = id
        GROUP BY cad_code, u.id
        HAVING SUM(acreage) &gt; 10
),
ints AS (
    SELECT
        ku.cad_code as community,
        ku2.cad_code as surrounding,
        ku2.cad_name
    FROM cadastral_community ku
    JOIN cadastral_community ku2
        ON ST_Intersects(ku.geom, ku2.geom)
    WHERE ku.cad_code &lt;&gt; ku2.cad_code
)
SELECT
    DISTINCT ON (surrounding, cad_name, u.zipcode, u.city, u.street, u.name)
    surrounding,
    cad_name,
    u.zipcode,
    u.city,
    u.street,
    u.name,
    u.note,
    u.area
FROM
    users_ u
JOIN ints
    ON cad_code = community;

Few things to note:

concat_ws() is a great function for joining values that might be NULL. If such a value is found, it is skipped and the function continues with the next one (if any). Thus, you’ll never get a string ending with a trailing slash (Street name 55/).
With users_ CTE I get a list of owners having more than 10 hectares of land for each cadastral community. This gives me the inverse result of what I want (if I know the biggest owners in the cadastral community, I know these are the ones that should be listed for surrounding c. communities).
This putting-it-all-together step is done with ints CTE that outputs the list of surrounding c. communities for each of them.
DISTINCT ON cleans up the list so the same owners don’t appear more than once for any given c. community.

Writing this makes me realize the list should be probably sorted by area so only the occurence with the biggest value is kept for each c. community. Simple ORDER BY should deal with this just fine. Or even more sophisticated, using GROUP BY to output the total acreage in all surrounding c. communities.