QGIS Planet

OSM quality assessment with QGIS: positional accuracy

Over the last few years, research on OpenStreetMap data quality has become increasingly popular. At this year’s FOSS4G, I had the honor of presenting some work we did at the AIT to assess OSM quality in Vienna, Austria. In the meantime, our paper “Towards an Open Source Analysis Toolbox for Street Network Comparison” has been published for early access. Thanks to the conference organizers who made this possible! I have implemented comparison tools found in the related OSM literature as well as new tools for oneway street and turn restriction comparison, using Sextante scripts and models for QGIS 1.8. All code is available on GitHub to enable collaboration. If you are interested in OSM data quality research, I’d like to invite you to give the tools a try.

Since most users probably don’t have access to QGIS 1.8 anymore, I’ll be updating the tools to QGIS 2.0 Processing. I’m starting today with the positional accuracy comparison tool. It is based on a method described by Goodchild & Hunter (1997). Here’s the corresponding slide from my FOSS4G presentation:


The basic idea is to evaluate the positional accuracy of a street graph by comparing it with a reference graph. To do that, we check how much of the graph lies within a certain tolerance (buffer) of the reference graph.

The Processing model uses the following inputs: the two street graphs which should be compared, the size of the buffer (the tolerance for positional accuracy), a polygon layer with analysis regions, and the field containing the region id. This is how the model looks in the Processing modeler:


First, all layers are reprojected into a common CRS. This will have to be adjusted if the tool is used in other geographic regions. Then the reference graph is buffered. Since I found that dissolving buffers directly in the buffer tool can become very slow with big datasets, the faster difference tool is used to dissolve the buffers instead. Next, we calculate the graph length inside the buffer (inbufLEN) as well as the total graph length in the analysis region (totalLEN). Finally, the two results are joined based on the region id field and the percentage of graph length within the buffered reference graph (inbufPERC) is calculated. A high percentage shows that both graphs agree very well geometrically.
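For readers who prefer working in a database, the same core steps can be sketched in PostGIS SQL. This is not the Processing model itself, just a rough equivalent; the table and column names (eval_graph, ref_graph, regions, region_id) are placeholders, and all layers are assumed to share a common metric CRS:

-- Rough PostGIS sketch of the comparison, not the actual Processing model.
-- eval_graph : the street graph to evaluate
-- ref_graph  : the reference street graph
-- regions    : analysis regions with a region_id column
-- All geometries are assumed to share a metric CRS; buffer tolerance = 10 m.
WITH ref_buffer AS (
    SELECT r.region_id,
           ST_Union(ST_Buffer(g.geom, 10)) AS geom
    FROM ref_graph g
    JOIN regions r ON ST_Intersects(g.geom, r.geom)
    GROUP BY r.region_id
),
lengths AS (
    SELECT r.region_id,
           SUM(ST_Length(ST_Intersection(e.geom, r.geom))) AS total_len,
           SUM(ST_Length(ST_Intersection(ST_Intersection(e.geom, r.geom), b.geom))) AS inbuf_len
    FROM eval_graph e
    JOIN regions r    ON ST_Intersects(e.geom, r.geom)
    JOIN ref_buffer b ON b.region_id = r.region_id
    GROUP BY r.region_id
)
SELECT region_id,
       total_len AS "totalLEN",
       inbuf_len AS "inbufLEN",
       100.0 * inbuf_len / NULLIF(total_len, 0) AS "inbufPERC"
FROM lengths;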

The following image shows the tool applied to a sample of OpenStreetMap (red) and official data published by the city of Vienna (purple) at Wien Handelskai. OSM was used as a reference graph and the buffer size was set to 10 meters.


In general, both graphs agree quite well. The percentage of the official graph within 10 meters of the OSM graph is 93% in the 20th district. In the above image, we can see that some links available in OSM are not contained in the official graph (mostly pedestrian/bike links), and there also seem to be some connectivity issues in the upper-right corner of the image.

In my opinion, Processing models are a great way to document geoprocessing workflows and share them with others. If you want to collaborate on building more models for OSM-related analysis, just leave a comment below.

Public transport isochrones with pgRouting

This post covers a simple approach to calculating isochrones in a public transport network using pgRouting and QGIS.

For this example, I’m using the public transport network of Vienna, which is loaded into a pgRouting-enabled database as network.publictransport. To create the routable network, run:

select pgr_createTopology('network.publictransport', 0.0005, 'geom', 'id');

Note that the tolerance parameter 0.0005 (in degrees) controls how far apart link start and end points can be and still be considered the same topological network node.
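If the tolerance is chosen too small, nearby endpoints end up as separate nodes and the network falls apart into disconnected pieces. One optional way to spot such problems is pgRouting’s pgr_analyzeGraph, called here with the same parameters as above:

-- Reports isolated segments, dead ends and potential gaps in the topology;
-- many gaps usually mean the tolerance was chosen too small.
SELECT pgr_analyzeGraph('network.publictransport', 0.0005, 'geom', 'id');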

To create a view with the network nodes run:

create or replace view network.publictransport_nodes as
select id, st_centroid(st_collect(pt)) as geom
from (
	(select source as id, st_startpoint(geom) as pt
	from network.publictransport)
	union
	(select target as id, st_endpoint(geom) as pt
	from network.publictransport)
) as foo
group by id;

To calculate isochrones, we need a cost attribute for our network links. To calculate travel times for each link, I used speed averages: 15 km/h for buses and trams and 32 km/h for metro lines (similar to data published by the city of Vienna).

alter table network.publictransport add column length_m integer;
update network.publictransport set length_m = st_length(st_transform(geom,31287));

alter table network.publictransport add column traveltime_min double precision;
update network.publictransport set traveltime_min = length_m  / 15000.0 * 60; -- average is 15 km/h
update network.publictransport set traveltime_min = length_m  / 32000.0 * 60 where "LTYP" = '4'; -- average metro is 32 km/h
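As a quick plausibility check of the new cost attribute, we can look at the average link length and travel time per line type (assuming, as above, that "LTYP" distinguishes metro lines from the other modes):

-- Average link length and travel time per line type, as a sanity check.
SELECT "LTYP",
       count(*) AS links,
       round(avg(length_m)) AS avg_length_m,
       round(avg(traveltime_min)::numeric, 2) AS avg_traveltime_min
FROM network.publictransport
GROUP BY "LTYP"
ORDER BY "LTYP";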

That’s all the preparation we need. Next, we can calculate our isochrone data using pgr_drivingdistance, e.g. for network node #1:

create or replace view network.temp as
 SELECT seq, id1 AS node, id2 AS edge, di.cost, pt.geom
  FROM pgr_drivingdistance(
    'SELECT id, source, target, traveltime_min as cost FROM network.publictransport',
    1, 100000, false, false
  ) as di
  JOIN network.publictransport_nodes pt
  ON = di.id1;

The resulting view contains all network nodes which are reachable within 100,000 cost units (which are minutes in our case).

Let’s load the view into QGIS to visualize the isochrones:


The trick is to use data-defined size to calculate the different walking circles around the public transport stops. For example, we can set up 10-minute isochrones which take into account how much time was used to travel by public transport and show how far we can get by walking in the time that is left:

1. We want to scale the circle radius to reflect the time left for walking. Therefore, enable Scale diameter in Advanced | Size scale field:


2. In the Simple marker properties change size units to Map units.
3. Go to data defined properties to set up the dynamic circle size.


The expression makes sure that only nodes reachable within 10 minutes are displayed. It then calculates the remaining time (10 - "cost") and assumes that we can walk 100 meters for every minute that is left. It additionally multiplies by 2 since we are scaling the diameter instead of the radius.
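The exact expression is only visible in the screenshot, but based on this description a QGIS expression along these lines should do the job (a sketch; the field name "cost" and the 10-minute threshold are taken from above):

CASE WHEN "cost" <= 10 THEN (10 - "cost") * 100 * 2 END

Nodes with a cost above 10 minutes get no size value and are therefore not drawn; for all others, the remaining minutes are converted into a walking distance at 100 meters per minute and doubled to obtain the diameter.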

To calculate isochrones for different start nodes, we simply update the definition of the view network.temp.
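For example, to compute the isochrones from node #42 instead (the node id is arbitrary here), we recreate the view with a different start node:

-- Same query as above, only the start node parameter changes.
create or replace view network.temp as
 SELECT seq, id1 AS node, id2 AS edge, di.cost, pt.geom
  FROM pgr_drivingdistance(
    'SELECT id, source, target, traveltime_min as cost FROM network.publictransport',
    42, 100000, false, false
  ) as di
  JOIN network.publictransport_nodes pt
  ON = di.id1;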

While this approach certainly has its limitations, it’s a good place to start learning how to create isochrones. A better solution should take into account that it takes time to change between different lines. While preparing the network, more care should be taken to ensure that possible interchange nodes are modeled correctly. Some network links might only be usable in one direction. Not to mention that there are timetables which could be accounted for ;)

Improving Population Density Maps Using Dasymetric Mapping

Yesterday, I described my process for generating a basic population density map from the city of Vienna’s open government data. At the end of that post, I listed some ideas for further improvement. Today, I want to follow up on those ideas using what is known as dasymetric mapping. The GIS Dictionary defines it well (much better than Wikipedia):

Dasymetric mapping is a technique in which attribute data that is organized by a large or arbitrary area unit is more accurately distributed within that unit by the overlay of geographic boundaries that exclude, restrict, or confine the attribute in question.
For example, a population attribute organized by census tract might be more accurately distributed by the overlay of water bodies, vacant land, and other land-use boundaries within which it is reasonable to infer that people do not live.

That’s exactly what I want to do: Based on subdistricts with population density values and auxiliary data – Corine Land Cover to be exact – I want to create an improved representation of population density within the city.

This is the population density map I start out with:

… and this is the Corine Land Cover dataset for the same area:

It shows built-up areas (red), parks and natural areas (green) as well as water-covered regions (blue). For further analysis, I follow the assumption that people only live in areas with Corine codes 111 “Continuous urban fabric” and 112 “Discontinuous urban fabric”. Therefore, I use the Intersection tool to clip only these residential areas from the subdistrict polygons. The subdistrict population can now be distributed over these new, smaller areas using the Field Calculator to create a more realistic visualization of population density.
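A sketch of the Field Calculator step: assuming the clipped residential parts are first dissolved by subdistrict (so that there is one residential polygon per subdistrict) and the population field is called "pop2012" as in the previous post, the new density in inhabitants per hectare can be recalculated with:

"pop2012" / ($area / 10000)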

For easier comparison, I put the original density map and the dasymetric map into a looping animation. Some subdistricts change their population density values quite drastically, especially where large areas covered by water or rail infrastructure were removed:

Corine Land Cover is not very detailed, but I think it is still usable at this scale. One thing to note is that I combined land cover data from 2006 with population data from 2012, so some areas in the outer districts will have become residential in the meantime. But I hope this doesn’t distort the overall picture too much.

Mapping OGDWien Population Density

The city of Vienna provides both subdistrict geometries and population statistics. Mapping the city’s population density should be straightforward, right? Let’s see …

We should be able to join on ZBEZ and SUB_DISTRICT_CODE, check! But what about the actual population counts? Unfortunately, there is no file which simply lists population per subdistrict. The file I found contains four lines for each subdistrict: females 2011, males 2011, females 2012 and males 2012. That’s not the perfect format for mapping general population density.

A quick way to prepare our input data is to apply a pivot table, e.g. in OpenOffice. The goal is to have one row per subdistrict and columns for the population in 2011 and 2012:

Export as CSV, add CSVT and load into QGIS. Finally, we can join geometries and CSV table:

A quick look at the joined data confirms that each subdistrict now has a population value. But visualizing absolute values results in misleading maps. Big subdistricts with only average density will overrule smaller but much denser subdistricts:

That’s why we need to calculate population density. This is easy to do using the Field Calculator. The subdistrict file already contains area values, but even if they were missing, we could calculate them using the $area operator: "pop2012" / ($area / 10000). The resulting population density in inhabitants per hectare finally shows which subdistricts are the most densely populated:

One could argue that this is still no accurate representation of population density: Big parts of some subdistricts are actually covered by water – especially along the Danube – and therefore uninhabited. There are also big parks which could be excluded from the subdistrict area. But that’s going to be the topic of another post.

If you want to use my results so far, you can download the GeoJSON file from Github.

WFS to PostGIS in 3 Steps

This is a quick note on how to download features from a WFS and import them into a PostGIS database. The first line downloads a zipped Shapefile from the WFS. The second one unzips it and the last one loads the data into my existing “gis_experimental” database:

wget "" -O
unzip -d /tmp
shp2pgsql -s 4326 -I -S -c -W Latin1 "/tmp/BEZIRKSGRENZEOGD.shp" | psql gis_experimental

Now, I’d just need a loop through the WFS Capabilities to automatically fetch all offered layers … Ideas anyone?

Thanks to Tim for his post “Batch importing shapefiles into PostGIS” which was very useful here.

Update: Many readers have pointed out that ogr2ogr is a great tool for this kind of use case and can do the above in one line. That’s true – if it works. Unfortunately, it is picky about the supported encodings, e.g. it doesn’t want to parse ISO-8859-15. In such cases, the three lines above can be a good alternative.

Green Vienna – Tree Cadastre

The Viennese tree cadastre (“Baumkataster”) contains all trees growing on city property. Amongst the attributes, you can find the tree species and planting year as well as the diameter of its crown.

Using the crown diameter, we can create a foliage map of the city of Vienna. (Note that trees on private ground are missing.)

trees in Vienna scaled by crown diameter

Here’s a high-resolution version of the map: PDF (10 MB)

The tree cadastre is still under construction. If you miss a tree, let the OGD team know.

Finding Open Data

The site aims to be the most comprehensive list of open data catalogs in the world. [...] Its alpha version was launched at OKCon 2011 in Berlin.

As of today, 134 data catalogs have been registered. From Dalian, China to Rhode Island, data from all over the world is ready to be discovered.

Open data and open GIS – 2011 is an exciting year!

Open Government Data Wien in QGIS

Vienna’s Open Government Data initiative is publishing an increasing amount of geodata, and the best thing is: they’re publishing it via open, standardized services! Both WMS and WFS are available and ready to be used in QGIS.

Let’s see how we can use the data available through their WFS using the district information layer “Bezirksgrenzen” as an example. The page lists a GML and a GeoJSON version of the WFS output. For now, we’ll use GeoJSON.

In QGIS, the layer can be loaded via “Add vector layer” – “Protocol” by inserting the GeoJSON URL there. The encoding should be changed to ISO8859-15 so that German umlauts are displayed correctly.

The loaded GeoJSON layer "Bezirksgrenzen"

Now, we have geodata. Let’s add some attribute data too! Attribute data is available in CSV format. After downloading e.g. some information on district population, check the file content and remove the extra header lines so that only a single header line containing the attribute names remains. Then, you can load the CSV file into QGIS too (“Add vector layer”).

Last step: Joining geodata and attribute data! Go to the vector layer’s properties – Join tab and add the following join relation:

Joining GeoJSON and attribute layer

Now, the attribute table of the vector layer contains the additional CSV attributes – ready for further analysis. If you want to classify based on numerical CSV attributes, you’ll have to create a .csvt file first; otherwise, all fields are interpreted as text.
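A .csvt file is simply a one-line text file with the same base name as the CSV that declares the type of each column. For a table with a textual district code followed by two numeric population columns it could look like this (the exact content depends on the columns in your CSV):

"String","Integer","Integer"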

Works great. Thumbs up for this great initiative!
