Tag: en_gb

New Year's present - QField 1.0 RC1

It was a long and winding road but we are very excited to announce the general availability of QField 1.0 Release Candidate 1.

Packed with loads of useful features like online and offline features digitizing, geometry and attributes editing, attribute search, powerful forms, theme switching, GPS support, camera integration and much more, QField is the powerful tool for those who need to edit on the go and would like to avoid standing in the swamp with a laptop or paper charts.

With a slick user interface, QField allows using QGIS projects on tablets and mobile devices. Thanks to the QGIS rendering engine, the map-results are identical and come with the full range of styling possibilities available on the desktop.

We ask you to help us test as much as possible this Release Candidate so that we can iron out as many bugs as possible before the final release of QField 1.0.

You can easily install QField using the playstore ( https://qfield.org/get), find out more on the documentation site ( https://qfield.org) and report problems to our issues tracking system ( https://qfield.org/issues)

QField, like QGIS, is an open source project. Everyone is welcome to contribute to make the product even better - whether it is with financial support, enthusiastic programming, translation and documentation work or visionary ideas.

If you want to help us build a better QField or QGIS, or need any services related to the whole QGIS stack don’t hesitate to contact us.

Learn More

Thoughts on “FOSS4G/SOTM Oceania 2018”, and the PyQGIS API improvements which it caused

Last week the first official “FOSS4G/SOTM Oceania” conference was held at Melbourne University. This was a fantastic event, and there’s simply no way I can extend sufficient thanks to all the organisers and volunteers who put this event together. They did a brilliant job, and their efforts are even more impressive considering it was the inaugural event!

Upfront — this is not a recap of the conference (I’m sure someone else is working on a much more detailed write up of the event!), just some musings I’ve had following my experiences assisting Nathan Woodrow deliver an introductory Python for QGIS workshop he put together for the conference. In short, we both found that delivering this workshop to a group of PyQGIS newcomers was a great way for us to identify “pain points” in the PyQGIS API and areas where we need to improve. The good news is that as a direct result of the experiences during this workshop the API has been improved and streamlined! Let’s explore how:

Part of Nathan’s workshop (notes are available here) focused on a hands-on example of creating a custom QGIS “Processing” script. I’ve found that preparing workshops is guaranteed to expose a bunch of rare and tricky software bugs, and this was no exception! Unfortunately the workshop was scheduled just before the QGIS 3.4.2 patch release which fixed these bugs, but at least they’re fixed now and we can move on…

The bulk of Nathan’s example algorithm is contained within the following block (where “distance” is the length of line segments we want to chop our features up into):

for input_feature in enumerate(features):
    geom = feature.geometry().constGet()
    if isinstance(geom, QgsLineString):
        continue
    first_part = geom.geometryN(0)
    start = 0
    end = distance
    length = first_part.length()

    while start < length:
        new_geom = first_part.curveSubstring(start,end)

        output_feature = input_feature
        output_feature.setGeometry(QgsGeometry(new_geom))
        sink.addFeature(output_feature)

        start += distance
        end += distance

There’s a lot here, but really the guts of this algorithm breaks down to one line:

new_geom = first_part.curveSubstring(start,end)

Basically, a new geometry is created for each trimmed section in the output layer by calling the “curveSubstring” method on the input geometry and passing it a start and end distance along the input line. This returns the portion of that input LineString (or CircularString, or CompoundCurve) between those distances. The PyQGIS API nicely hides the details here – you can safely call this one method and be confident that regardless of the input geometry type the result will be correct.

Unfortunately, while calling the “curveSubstring” method is elegant, all the code surrounding this call is not so elegant. As a (mostly) full-time QGIS developer myself, I tend to look over oddities in the API. It’s easy to justify ugly API as just “how it’s always been”, and over time it’s natural to develop a type of blind spot to these issues.

Let’s start with the first ugly part of this code:

geom = input_feature.geometry().constGet()
if isinstance(geom, QgsLineString):
    continue
first_part = geom.geometryN(0)
# chop first_part into sections of desired length
...

This is rather… confusing… logic to follow. Here the script is fetching the geometry of the input feature, checking if it’s a LineString, and if it IS, then it skips that feature and continues to the next. Wait… what? It’s skipping features with LineString geometries?

Well, yes. The algorithm was written specifically for one workshop, which was using a MultiLineString layer as the demo layer. The script takes a huge shortcut here and says “if the input feature isn’t a MultiLineString, ignore it — we only know how to deal with multi-part geometries”. Immediately following this logic there’s a call to geometryN( 0 ), which returns just the first part of the MultiLineString geometry.

There’s two issues here — one is that the script just plain won’t work for LineString inputs, and the second is that it ignores everything BUT the first part in the geometry. While it would be possible to fix the script and add a check for the input geometry type, put in logic to loop over all the parts of a multi-part input, etc, that’s instantly going to add a LOT of complexity or duplicate code here.

Fortunately, this was the perfect excuse to improve the PyQGIS API itself so that this kind of operation is simpler in future! Nathan and I had a debrief/brainstorm after the workshop, and as a result a new “parts iterator” has been implemented and merged to QGIS master. It’ll be available from version 3.6 on. Using the new iterator, we can simplify the script:

geom = input_feature.geometry()
for part in geom.parts():
    # chop part into sections of desired length
    ...

Win! This is simultaneously more readable, more Pythonic, and automatically works for both LineString and MultiLineString inputs (and in the case of MultiLineStrings, we now correctly handle all parts).

Here’s another pain-point. Looking at the block:

new_geom = part.curveSubstring(start,end)
output_feature = input_feature
output_feature.setGeometry(QgsGeometry(new_geom))

At first glance this looks reasonable – we use curveSubstring to get the portion of the curve, then make a copy of the input_feature as output_feature (this ensures that the features output by the algorithm maintain all the attributes from the input features), and finally set the geometry of the output_feature to be the newly calculated curve portion. The ugliness here comes in this line:

output_feature.setGeometry(QgsGeometry(new_geom))

What’s that extra QgsGeometry(…) call doing here? Without getting too sidetracked into the QGIS geometry API internals, QgsFeature.setGeometry requires a QgsGeometry argument, not the QgsAbstractGeometry subclass which is returned by curveSubstring.

This is a prime example of a “paper-cut” style issue in the PyQGIS API. Experienced developers know and understand the reasons behind this, but for newcomers to PyQGIS, it’s an obscure complexity. Fortunately the solution here was simple — and after the workshop Nathan and I added a new overload to QgsFeature.setGeometry which accepts a QgsAbstractGeometry argument. So in QGIS 3.6 this line can be simplified to:

output_feature.setGeometry(new_geom)

Or, if you wanted to make things more concise, you could put the curveSubstring call directly in here:

output_feature = input_feature
output_feature.setGeometry(part.curveSubstring(start,end))

Let’s have a look at the simplified script for QGIS 3.6:

for input_feature in enumerate(features):
    geom = feature.geometry()
    for part in geom.parts():
        start = 0
        end = distance
        length = part.length()

        while start < length:
            output_feature = input_feature
            output_feature.setGeometry(part.curveSubstring(start,end))
            sink.addFeature(output_feature)

            start += distance
            end += distance

This is MUCH nicer, and will be much easier to explain in the next workshop! The good news is that Nathan has more niceness on the way which will further improve the process of writing QGIS Processing script algorithms. You can see some early prototypes of this work here:

So there we go. The process of writing and delivering a workshop helps to look past “API blind spots” and identify the ugly points and traps for those new to the API. As a direct result of this FOSS4G/SOTM Oceania 2018 Workshop, the QGIS 3.6 PyQGIS API will be easier to use, more readable, and less buggy! That’s a win all round!

Learn More

Add Realistic Mist and Fog to Topography in QGIS 3.2

I recently came across a great tutorial by in which he demonstrated how to create map of Switzerland in the style of Edward Imhof, the famed Swiss cartographer renowned for his hand painted maps of Switzerland and other mountainous regions of the world. John’s map used traditional hillshading, multidirectional hillshading and crucially, a translucent topographic layer that created a mist like appearance he likened to the sfumato technique used by painters since the Renascence.

I followed John’s tutorial in QGIS 3.2 and I was quite pleased with the initial result below. However, the process creating it is a bit too complicated for a tutorial so I set about simplifying the process and rather than imitating Imhof’s distinct style, my goal this time is realism.

The heart of the effect involves the very clever idea of using the topographic layer as a subtle opacity mask to simulate mist, fog and atmospheric haze. Have a look at the image below taken on March 17th, 2005 by NASA’s Terra satellite. This is the industrialised Po valley of northern Italy, surrounded by the Alps and Apennine Mountains that rise above the valley’s hazy pollution. The haze adds a sense of depth to the surrounding hills and mountains. It’s not uncommon to see fog and pollution in satellite imagery that gives way to the clear air in high mountains e.g. northern India and Nepal, China, Pakistan and India. Creating a similar mist effect in QGIS is actually quite simple.

First download topography for the Alps and Po region (a 68.55 Mb GeoTiff file derived from freely available EU-DEM data I resampled from 25 to 100m resolution). Next, make sure you have the plugin QuickMapServics (QMS) installed (menu Plugins – Manage and Install Plugins). This great plugin provides access to over 1000 basemaps.

Load the GeoTiff file into QGIS (Raster – Load) and rename the layer Hillshade. Right click the layer to open the Layer Properties window. In the Symbology panel, next to Render Type, choose Hillshade. Change the altitude to 35 degrees, Azimuth to 300 degrees and Z Factor of 1.5 (illuminating the landscape from the top left). Finally, change the Blending mode to Multiply. Click OK to close the dialogue.

To add the basemap layer, Esri World Imagery (Clarity), type “ESRI clarity” in the QMS search bar to find and add the basemap; Go to View – Panels and activate the QMS search bar if it isn’t initially visible. Make sure it’s the bottommost layer.

Oh, that’s a bit disappointing, we only increased the relief little a bit. It’s missing the vitally important mist layer.

To create mist, right click the Hillshade layer and choose Duplicate. Rename the new layer Mist and make sure it’s above the Hillshade layer. Now open the Layer Properties window of the layer, we’re going edit it’s attributes to make it look like mist.

Change the Render type to Singleband Pseudocolor and use 0 and 3000 for the min and max values (limiting maximum latitude of the mist to 3000 meters). Then open the colour ramp window by clicking on the Color ramp and enter these values:

  • Left Gradient – HSV 215 15 50 and 75% transparency
  • Right Gradient – HSV 215 15 50 and 0% transparency

Close the Color Ramp dialogue. In the Layer Properties window, and this is very important, change the Blending mode to Lighten. Click OK to close the Layer Properties window.

Wow, we have mist!

The mist effect looks great. It certainly adds a lot of realism to the topographic map, it now looks quite like NASA’s images. This is just a quick and basic map so there’s lots of scope to improve the effect. Play around with the colour of the mist layer and its opacity, or even brighten the Hillshade layer underneath. See what effects these changes have.

Here’s another example below. In this example I duplicated the hillshade layer and set the second hillshade layer to Multidirectional Hillshading (yes, QGIS 3.2 has Multidirectional Hillshading). I then adjusted the transparency of both hillshade layers so they blended together nicely. I then replaced the basemap with another duplicated topography layer that I coloured using the gradient sd-a (by Jim Mossman, 2005) using the cpt-city plugin. And lastly, I doubled the opacity of the mist layer turning it into a milky fog. I think it looks great!

What next? Well, there’s lots of possibilities. Perhaps download Martian topography and add mist to the bottom of Valles Marineris?

References:

Eduard Imhof – Biography

TV documentary about Eduard Imhof

The Map as an Artistic Territory: Relief Shading Works and Studies by Eduard Imhof

Haze in northern Italy – NASA Terra Satellite

Tzvetkov, J., 2018. Relief visualization techniques using free and open source GIS tools. Polish Cartographical Review, 50(2), pp.61-71.
Learn More

OpenCL acceleration now available in QGIS

What is OpenCL?

From https://en.wikipedia.org/wiki/OpenCL:

OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages (based on C99 and C++11) for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.

Basically, you write a program and you execute it on a GPU (or, less frequently, on a CPU or on a DSP) taking advantage of the huge parallel programming capabilities of the modern graphic cards.

Depending on many different factors, the speed gain can vary to a great extent, but it is typically around one order of magnitude.

How QGIS benefits from OpenCL?

The work I’ve done consisted in integrating OpenCL support into QGIS and writing all the utilities to load, build and run OpenCL programs.

For now, I’ve ported the following QGIS core algorithms, all of them are availabe in processing:

  • slope
  • aspect
  • hillshade
  • ruggedness

Since the framework to support OpenCL is now in place, I think that more algorithms will be ported over the time.

During this development, even if was not in scope, the hillshade renderer has been optimized for speed and it can also benefit of OpenCL acceleration.

How to activate OpenCL support

OpenCL support is optional and opt-in, to use it, you need to activate it into the QGIS options dialog like shown in the screenshot below:

How much performance gain can I expect?

Well, YMMV, but here are some figures for a big DEM raster, low values mean faster execution.

GDAL means CPU execution using the GDAL processing algorithm.

How to install the OpenCL drivers?

Of course it depends on your specific hardware and on your O.S., AMD, NVidia and Intel have different distributions channels, in general the driver for your graphic card will also provide the OpenCL driver, if your GPU is compatible, if OpenCL is not available on your current machine, try to Google for OpenCL, your O.S. and graphic card.

If there is no OpenCL support for your graphic card, you might try to install a driver for your GPU (Intel for example provides them) and you will probably have a decent acceleration even if not as much as you can get on a real graphic card.

This fact worths some more explanation: you might ask your self why running and algorithm directly on the CPU and running it on the same CPU but using OpenCL would make any difference and the reason why it is generally faster by using OpenCL is that OpenCL will run the algorithm in parallel on all cores of your CPU, while a normal application (and QGIS does not make an exception here) will use a single core.

How to build QGIS with OpenCL support on Ubuntu

Just a quick note: you’ll need to install the OpenCL headers and the ICD library:

sudo apt-get install opencl-headers ocl-icd-opencl-dev

 

Credits

I started this work as a proof of concept in my spare time (that it is not much, lately) and when I realized that it was promising, I submitted a QGIS grant proposal in order to allocate some working time to port more algorithms, write tests and polish the implementation.

This work would not be possible without all the generous sponsors and donors that feed the QGIS grant program year after year, many thanks to the QGIS community for this amazing support!

Jürgen Fischer was as usual very helpful and took care of the windows builds, now available in OSGeo4W packages.

Nyall Dawson helped with the code review and with testing the implementation on different cards and machines.

Matthias Kuhn reviewed the code.

Even Rouault pointed me to some highly efficient GDAL algorithm optimizations that I’ve been able to integrate in QGIS.

 

 

Learn More

Create a QGIS vector data provider in Python is now possible

 

Why python data providers?

My main reasons for having Python data provider were:

  • quick prototyping
  • web services
  • why not?

 

This topic has been floating in my head for a while since I decided to give it a second look and I finally implemented it and merged for the next 3.2 release.

 

How it’s been done

To make this possible I had to:

  • create a public API for registering the providers
  • create the Python bindings (the hard part)
  • create a sample Python vector data provider (the boring part)
  • make all the tests pass

 

First, let me say that it wasn’t like a walk in the park: the Python bindings part is always like diving into woodoo and black magic recipes before I can get it to work properly.

For the Python provider sample implementation I decided to re-implement the memory (aka: scratch layers) provider because that’s one of the simplest providers and it does not depend on any external storage or backend.

 

How to and examples

For now, the main source of information is the API and the tests:

To register your own provider (PyProvider in the snippet below) these are the basic steps:

metadata = QgsProviderMetadata(PyProvider.providerKey(), PyProvider.description(), PyProvider.createProvider)
QgsProviderRegistry.instance().registerProvider(metadata)

To create your own provider you will need at least the following components:

  • the provider class itself (subclass of QgsVectorDataProvider)
  • a feature source (subclass of QgsAbstractFeatureSource)
  • a feature iterator (subclass of QgsAbstractFeatureIterator)

Be aware that the implementation of a data provider is not easy and you will need to write a lot of code, but at least you could get some inspiration from the existing example.

 

Enjoy wirting data providers in Python and please let me know if you’ve fond this implementation useful!

Learn More

How to filter features in QGIS using the graphical processing modeler

This article describes a new algorithm for the processing modeler called feature filter algorithm. If you are already familiar with ETL concepts and the graphical modeler, you can directly jump to the section the feature filter algorithm.

Building workflows for repetitive tasks

When building workflows for simple or complex geodata infrastructures, one of the most common tasks one encounters is to extract some of the features and copy them to another destination. Sometimes they need to be modified and a few attributes calculated or deleted, maybe even the geometry needs to be adjusted or in some fancy situations one even wants to generate a couple of objects from one input object. This process is often called ETL (Extract, Transform, Load) and it is something that is worth mastering as a GIS expert. Let’s imagine a situation where we sent a field worker out to collect information about public infrastructure, equipped with a brand-new tablet and the latest and greatest version of QField. To make his task super easy, we prepare one single layer for him with an attribute type which can be set to Bus Station, Car Parking or Train Station. Now back in the office we want to integrate this back into our spatially enabled database which has been designed with 3 target tables.

Easy enough to go to QGIS and select those features by type one after the other and use a bit of copy-paste. And maybe fiddling a bit with the attributes. But hey, after all we are a bit lazy and on the one hand like to have an ice cream later on that afternoon and on the other hand like to avoid errors. Those who are lucky enough to know SQL and have full access to the database are well enough equipped to do the job.

Short introduction to the graphical modeler

For those who just want to quickly do this job visually within QGIS, there is a tool called modeler in the processing plugin. With the help of this tool it is straightforward for everyone to automate processes. To get started with the modeler, simply enable the processing plugin and click on Processing > Graphical Modeler. Within the modeler, there are Inputs and Algorithms available. Inputs are there to define variables, algorithms on the other hand transform those variables. In its most simple form, there is one vector feature source (a layer) as input and one algorithm, for example a fixed distance buffer which in turn has one output layer with all buffered features. Such a model can be saved and reused. To run a model directly from the modeler, click the play button on top. Once saved, it appears in the processing toolbox. Every time a model is run, the input layer can be handed to the model. Or it can even run in batch mode on a list of layers or files. With this in place, the job of doing the buffer can now be run on 200 input layers without any manual interaction. Simple as that. Pro tip: processing models do not have to be complex. They can also be used to preconfigure single algorithms so when an algorithm is run, the parameters which you never change are predefined already. For example you can add a Simplify geometries to 1 meter algorithm which only takes a layer as parameter and has the 1 meter tolerance built-in.

The feature filter algorithm

Now back to the job of splitting the infrastructure layer into 3 different layers. To do this job visually and easily within QGIS, there is now a new algorithm available in QGIS 3.2. It is called Feature Filter and available in the processing modeler. To make use of it, we open the processing modeler and first add a new Vector Features input and name it Infrastructure. Since we know in this project we will always deal with points, we can make already specify that in this first dialog.

Let’s now add a Feature Filter algorithm and use the following configuration: The Infrastructure layer is set as input, and we define three outputs for Train Stations, Bus Stations and Car Parking. All layers will be final outputs on which no further transformations will be applied within this model and they will be directly written to a new layer.

Now it’s time to run our new model and check that it does what it promised. We can also uncheck the final output checkbox and send filtered features to further processing algorithms. For example sending them through a buffer based on an attribute size (although as a QGIS professional you know you should rather be using styles than modifying the geometry in most situations in such cases).

Conclusion

With this new algorithm built directly inside the core of QGIS, the processing framework is now able to transform and refine features of a dataset with the same precision as an open heart surgery. Of course you can get more creative in the filter criteria. Apart from the obvious ones to do geometry modifications, there are two particularly interesting ones if you liked this one

  • The Refactor Fields algorithm allows calculating new fields or rename fields based on expressions
  • The Append plugin allows adding those features to an existing vector layer such as a database table

The data from this walkthrough is available for download as [download id=“3917”]. If you would like to test this new feature but do not yet have a concrete use-case in mind, here is a task for you: get an openstreetmap extract, import it using ogr2ogr and split the lines into different layers roads, rivers and railways, the polygons into lakes, forests and cities, the points according to your own liking. If there is big enough interest for this, we might write another blog post on this topic.

We would like to thank the QGIS user group Switzerland for making this project possible through funding.

Learn More

QGIS 3 Server deployment showcase with Python superpowers

Recently I was invited by the colleagues from OpenGIS.ch to lend a hand in a training session about QGIS server.

This was a good opportunity to update my presentation for QGIS3, to fix a few bugs and to explore the powerful capabilities of QGIS server and Python.

As a result, I published the full recipe of a Vagrant VM on github: https://github.com/elpaso/qgis3-server-vagrant

The presentation is online here: http://www.itopen.it/bulk/qgis3-server/

What’s worth mentioning is the sample plugins (I’ll eventually package and upload them to the official plugin site):

 

The VM uses 4 different (although similar) deployment strategies:

  • good old Apache + mod_fcgi and plain CGI
  • Nginx + Fast CGI
  • Nginx + standalone HTTP Python wrapped server
  • Nginx + standalone WSGI Python wrapped server

Have fun with QGIS server: it was completely refactored in QGIS 3 and it’s now better than ever!

 

Learn More

Use your android phone’s GPS in QGIS

Do you want to share your GPS data from your phone to QGIS? Here is how:   QGIS comes with a core plugin named GPS Tools that can be enabled in the Plugin installer dialog:   There are several ways to forward data from your phone and most of them are very well described in the QGIS manual page: https://docs.qgis.org/testing/en/docs/user_manual/working_with_gps/plugins_gps.html What I’m going to describe here is mostly useful when your phone and your host machine running QGIS are on the same network (for example they are connected to the same WiFi access point) and it is based on the simple application GPS 2 NET   Once the application is installed and started on your phone, you need to know the IP address of the phone, on a linux box you can simply run a port scanner and it will find all devices connected to the port 6000 (the default port used by GPS 2 NET):  
# Assuming your subnet is 192.168.9

nmap -p 6000 192.168.1.*

Nmap scan report for android-8899989888d02271.homenet.telecomitalia.it (192.168.99.50)
Host is up (0.0093s latency).
PORT STATE SERVICE
6000/tcp open X11

  Now, in QGIS you can open the plugin dialog through Vector -> GPS -> GPS Tools and enter the IP address and port of your GPS device:   Click on Connect button on the top right corner (mouse over the gray square for GPS status information)   Start digitizing!
Learn More

Welcome QGIS 3 and bye bye Madeira

Last week I’ve been in Madeira at the hackfest, like all the past events this has been an amazing happening, for those of you who have never been there, a QGIS hackfest is typically an event where QGIS developers and other pasionate contributors like documentation writers, translators etc. gather together to discuss the future of their beloved QGIS software. QGIS hackfest are informal events where meetings are scheduled freely and any topic relevant to the project can be discussed. This time we have brought to the table some interesting topics like:
  • the future of processing providers: should they be part of QGIS code or handled independently as plugins?
  • the road forward to a better bug reporting system and CI platform: move to gitlab?
  • the certification program for QGIS training courses: how (and how much) training companies should give back to the project?
  • SWOT analysis of current QGIS project: very interesting discussion about the status of the project.
  • QGIS Qt Quick modules for mobile QGIS app
Tehre were also some mentoring sessions where I presented:
  • How to set up a development environment and make your first pull request
  • How to write tests for QGIS (in both python and C++)
  At this link you can find all the video recordings of the sessions: https://github.com/qgis/QGIS/wiki/DeveloperMeetingMadeira2018   Here is a link to the Vagrant QGIS developer VM I’ve prepared for the session: https://github.com/elpaso/qgis-dev-vagrant/   I’ve got a good feedback from other devs about my sessions and I’m really happy that somebody found them useful, one of the main goals of a QGIS hackfest should really be to help other developers to ramp up quicly into the project. Other than that, I’ve also find the time to update to QGIS 3.0 some of my old plugins like GeoCoding and QuickWKT.   Thanks to Giovanni Manghi and to Madeira Government for the organizazion and thanks to all QGIS sponsors and donors!   About me: I started as a QGIS plugin author, continued as the developer of the plugin official repository at https://plugins.qgis.org and now I’m one of the top 5 QGIS core contributors. After almost 10 years that I’m in the QGIS project I’m now not only a proud member of the QGIS community but also an advocate for the open source GIS software movement.
Learn More

PostgreSQL back end solution for quality assurance and data archive

Did you know that the possibilities to make a full QGIS back end solution for quality assurance and archiving in PostgreSQL are immense? SQL has it’s well known limitations, but with a little bit creativity you can make quite nice solutions just using triggers and rules. In this post I’ll explain what we did lately based on a project with a customer. He needed to assure the consistency of data but still give his employees the possibility of a fast feeding of the data collected on the field to the database. Another request was to keep every status of the data with the information about the changes (archiving). It’s always the question, where to put the logical part of the solution. QGIS is quite powerful with constraints, but the undeniable advantage of a back end solution is, that you can use any front end - no matter what configuration you have on QGIS or what Feature Manipulation Engine (FME) you use - without influencing the guarantee of data validity.

Situation

It’s all about trees

At least for that customer we got lately. The customer owns pieces of land all over Switzerland. On this pieces are forests and in the forests are - as expected - trees. Well, mostly - if you are not a bark beetle or a squirrel - you don’t care about a single tree. Except if there is something special with it. For example, a branch that could fell down on your brand new Citroën DS or if the tree has a disease that could kill the whole forest, that is actually needed to convert the carbon dioxide (from your DS) into oxygen. The issuetrees (yellow) lie on the forest (green) - and the forest lies on the land piece (brown). And the (Entity Relationship Model) ERM looks like this. A land can have zero, one or more forests - and a forest can have zero, one or more trees with issues.

It’s not really about trees

The situation is, that a lot of field workers (so called tree-inspectors) work with our mobile solution QField, where they can collect the data while standing in the middle of a wild forest with one foot in a rabbit hole and the other one in the stinging nettle. It’s quite possible and usual that there can be some problems entering all the data correctly. Typing issues on the tablet while running away from wolves or just lack of concentration because of the beauty of the swiss forests.

And it’s about lots of front ends

But there are not only the tree-inspectors. There are the office-clerks working with QGIS and planning, when the problems on the tree has to be solved. And finally there are the woodsmen solving the issues and setting the status to done on QField again. So there have to be a lot of projects using the same data but with different configurations. If you make all the quality assurance on the front end you won’t have time to care about the trees anymore and beside of that it’s fault-prone.

Quality assurance in the back end

Data integrity with constraint functions

There are simple constraints like that a field is not empty and more complex constraints with a lot of logic regarding the content of the field.

Simple constraints

Lots of data integrity issues can be solved by using simple constraints like NOT NULL (column must not assume null), UNIQUE (column must be unique among all the rows in table) or Primary Key and Foreign Keys constraints.

CREATE TABLE live.issuetree (
  issuetree_id integer UNIQUE NOT NULL,
  gps_id text NOT NULL,
  issue text,
  assignee text,
  done boolean,
  donedate date,
  forest_id NOT NULL,
  CONSTRAINT issuetree_pkey PRIMARY KEY (issuetree_id),
  CONSTRAINT forest_fkey FOREIGN KEY (forest_id) REFERENCES live.forest;
);

Checks and constraint functions

For more special cases or not really technical constraints, we can use checks. Here for example: If the issue is done, then it needs to have a donedate. But not if done is not TRUE (NULL or FALSE).

CHECK (done IS NOT TRUE OR donedate IS NOT NULL )

And if these cases are more complex and not technical at all, we can put it to a function and use the return value (for example the error message) as condition. In the following example we want to assure that assignee is the name of one of the employed woodsmen. Of course it can be NULL too.

CONSTRAINT chk_assignee_valid CHECK (live.chk_assignee_valid(assignee) = ''::text)

And the function live.chk_assignee_valid:

CREATE OR REPLACE FUNCTION live.chk_assignee_valid(a_assignee text ) RETURNS text
LANGUAGE plpgsql AS $$
DECLARE
result text;
BEGIN
  IF (
    SELECT TRUE
    WHERE a_assignee NOT IN ( 'Fritz Fangorn' ,  'Fiona Finglas',  'Fred Fladrif', 'Barbara Beechbone', 'Berthold Bregalad' )
          AND a_assignee IS NOT NULL
  ) THEN
    result='The assignee has to be one of these guys: Fritz Fangorn, Fiona Finglas, Fred Fladrif, Barbara Beechbone, Berthold Bregalad';
    RAISE EXCEPTION '%', result;
  ELSE
    RAISE NOTICE 'CHECK chk_assignee_valid SUCCESSFUL';
    result='';
  END IF;
  RETURN result;
END;

So with many of these constraints, we can assure a lot and the data are fully correct. But this is not always comfortable to use. Why? Go on reading…

Using of a “data quarantine”

Let’s imagine that the tree-inspector collected all day data in QField. Standing in the middle of the mentioned stinging nettle and rabbit holes, running from wolves etc… Of course he made some mistakes while collecting data. In the evening he returns tired to the office, already thinking about the dinner meal his wife is cooking (or his husband, of course), and wants to upload the data from the QField project to the database. And what happens? Lot’s of error messages. He thinks about to solve them tomorrow, because his wife (or his husband) can get quite angry when he is late for dinner. But if he does it tomorrow, the data are only stored on the device and nowhere else overnight. He need to have them in the database. No matter, if correct or not. And this leads to the idea of the “data quarantine”.

Use Case

All data entered to the database (valid or not) need to be stored. The entries accepted from the so called live tables with all constraints, are stored normally. The entries failed because of the constraint, are stored in another table. In the so called quarantine table. So you have for every live table another quarantine table. This means, we need another table structure existing parallel to the live tables. We do it in two schemas: The live schema and the quarantine schema. So the tree-inspector synchronizes his QField without any problem to the database. The correct entries are written into the live tables. The incorrect into the quarantine. Actually all the data are coming into the quarantine and there is a Trigger passing them through to the live table. If they success, they will be stored in live and removed from quarantine. Otherwise they keeps staying in the quarantine. Same situation when the quarantine-clerk later corrects the data entries in the quarantine. On an update they are pushed into the live-table. If success, all good. Otherwise the entry keeps staying in the quarantine.

Structure

And how we do that?

It’s all solved by using triggers. SQL triggers are procedural code that are automatically executed on an action on a table or view. For this solution we actually need two trigger per quarantine table. After insert into or update quarantine table, a trigger should be fired for every entry, doing this: Insert the same entry into the live table. If success, then delete the entry in the quarantine table. Else write the info to the current entry in the quarantine table. Probably you noticed the problem with the recursion, but let’s not think about it at the moment :-)

Code

In PostgreSQL we can use trigger functions. Means you have the triggers on the table calling the functions.

Trigger on table quarantine.issuetree after update

CREATE TRIGGER pushtolive
AFTER UPDATE
ON quarantine.issuetree
FOR EACH ROW
EXECUTE PROCEDURE quarantine.pushtolive();

Trigger function (simplified)

CREATE OR REPLACE FUNCTION quarantine.pushtolive() RETURNS trigger AS $BODY$
BEGIN
  INSERT INTO live.issuetree
  SELECT * FROM quarantine.issuetree
  WHERE quarantine_serial = NEW.quarantine_serial
  RETURN NEW;
END; $BODY$
LANGUAGE plpgsql;

Trigger function used for the solution when inserting into live

And this is the function with the logical part with success and failing.

CREATE OR REPLACE FUNCTION quarantine.pushtolive() RETURNS trigger AS $BODY$
BEGIN
-- insert into live
INSERT INTO live.issuetree
SELECT * FROM quarantine.issuetree
WHERE quarantine_serial = NEW.quarantine_serial
RAISE NOTICE 'Inserted row in live.issuetree';
-- delete in quarantine
DELETE FROM quarantine.issuetree
WHERE quarantine_serial = NEW.quarantine_serial
RAISE NOTICE 'Deleted row in quarantine.issuetree';
-- return when here
RETURN NEW;
-- if it failed:
EXCEPTION WHEN OTHERS THEN
  RAISE NOTICE 'EXCEPTION: %', SQLERRM;
  UPDATE quarantine.issuetree
  SET fail_info = SQLERRM,
  WHERE NEW.quarantine_serial = quarantine_serial;
  RAISE NOTICE 'Updated row in quarantine.issuetree';
  RETURN NEW;
END; $BODY$

As you can see, we use here an id called quarantine_serial. We can not use the primary key in the quarantine, because here everything is accepted and so nothing of the entered data (not even issuetree_id) has to be be unique. But to identify the entry in the quarantine table we create the serial quarantine_serial.

Trigger function used for the solution when inserting into or update live

Actually the trigger function before is not usable. Because it works only to insert new data into the live system. Now we remember the use case. The trigger here in the quarantine does not know if the tree-inspector created a new issuetree or updated an old one. On synchronization he made an INSERT INTO to the quarantine with all entries. But these could be new entries (new trees) or already existing ones in the live table. So the trigger function has to decide, if it’s an insert or an update on the live table.

CREATE OR REPLACE FUNCTION quarantine.pushtolive() RETURNS trigger AS $BODY$
BEGIN
-- check if an entry with this id is already existing
IF( SELECT TRUE FROM live.issuetree WHERE issuetree_id = NEW.issuetree ) THEN
  -- update into live
  UPDATE live.issuetree
  SET gps_id = NEW.gps_id, issue = NEW.issue, assignee = NEW.assignee, done = NEW.done, date = NEW.date, forest_id = NEW.forest_id
  WHERE issuetree_id = NEW.issuetree_id
  RAISE NOTICE 'Updated row in live.issuetree';
ELSE
  -- insert into live
  INSERT INTO live.issuetree ( issuetree_id, gps_id, issue, assignee, done, date, forest_id )
  VALUES ( NEW.issuetree_id, NEW.gps_id, NEW.issue, NEW.assignee, NEW.done, NEW.date, NEW.forest_id )
  RAISE NOTICE 'Inserted row in live.issuetree';
END IF;
-- delete in quarantine
DELETE FROM quarantine.issuetree
WHERE quarantine_serial =NEW.quarantine_serial
RAISE NOTICE 'Deleted row in quarantine.issuetree';
-- return when here
RETURN NEW;
-- if it failed:
EXCEPTION WHEN OTHERS THEN
  RAISE NOTICE 'EXCEPTION: %', SQLERRM;
  UPDATE quarantine.issuetree
  SET fail_info=SQLERRM,
  WHERE NEW.quarantine_serial = quarantine_serial;
  RAISE NOTICE 'Updated row in quarantine.issuetree';
  RETURN NEW;
END; $BODY$

Recursion problem

The problem with the recursion is that we have a trigger after update of table issuetree in quarantine. This trigger calls the function, and the function (in case of fail updating live) updates the quarantine.issuetree with the error-message. So there is another update and the trigger is fired again, and again, and again… ♪Across the universe♬ We could solve the problem by checking the depth of triggers in PostgreSQL:

CREATE TRIGGER pushtolive AFTER UPDATE ON quarantine.issuetree
FOR EACH ROW
WHEN pg_trigger_depth() = 0
EXECUTE PROCEDURE quarantine.pushtolive();

And it looks like this

The yellow points are the issue trees in the live. If we create another one and have a mistake in it (GPS Id wrong), then it’s stored in the quarantine (pink). When we correct the data it’s written over the quarantine trigger into live. If succeeded, the point changes the color to yellow. Actually the yellow point appears (live) and the pink point(quarantine) disappears, because the entry is inserted into live and deleted in quarantine.

Archiving all data

There are different reasons why you need to archive data. Maybe somewhen you want to show your grandchildren, how much forest we still had today before the sky got dark. But this was not the reason for the mentioned customer, but legal reasons: When the woodsman cuts the last bamboo tree of the forest and this was the only food for the very last living panda bear of Switzerland, we need to know who created or changed this entry in the database and what tree should have been chopped down instead.

Third schema “archive”

So we created a third schema parallel to live and quarantine. The archive schema. This means every table in live does not only have a quarantine table accordingly, but also an archive table too where all the old status of entries including the timestamp, when it has been archived. Of course not only the changed live data are stored in the archive, but also every changed data from quarantine.

Use Case 1

The tree-inspector enters an entry of an issue tree that already existed in the live table to the quarantine (1). The after insert trigger is fired and it tries to write to the live table. And with success. The entry is written to the live table (2). This means, before the entry in live is updated, the old one was copied to the archive table (3). Then in the same transaction the entry in the quarantine is deleted (1). Means the old status is copied to the archive too (4). So there will be the updated entry in the live-table (2), no entry in the quarantine-table (1) and two entries (3 and 4) in the archive table.

Use Case 2

The tree-inspector enters an entry of an issue tree that already existed in the live table to the quarantine (1). The after insert trigger is fired and it tries to write to the live table. And it fails. The entry in the quarantine will be updated with the error-message (2). The old status is copied to archive (1). The office clerk makes no the changes to this entries. The trigger is fired and this time it could write into the live-table with success (3). So the old entry is copied to the archive (4) and after deleting the entry in the quarantine, there will be the second old status of quarantine (5) in archive too. So there will be the updated entry in the live-table (3), no entry in the quarantine-table (1 and 2) and three entries (1, 4 and 5) in the archive table.

Structure

And how we do that?

It’s solved by using triggers too. We actually need only one trigger per table, but not only in quarantine, but also in live. It has to be fired before every update of every entry, doing this: Insert a copy of the current entry into the archive table with the status it had until the update we are doing right now.

Code

It’s the same code for the live and the quarantine table triggers. So only the ones for the quarantine are explained.

Trigger on table quarantine.issuetree before update

CREATE TRIGGER archiving
BEFORE UPDATE
ON quarantine.issuetree
FOR EACH ROW
EXECUTE PROCEDURE quarantine.archiving();

Trigger Function (simplified)

CREATE OR REPLACE FUNCTION quarantine.archiving() RETURNS trigger AS $BODY$
BEGIN
  INSERT INTO archive.issuetree
  SELECT *
  FROM quarantine.issuetree
  WHERE NEW.quarantine_serial = quarantine_serial;
END; $BODY$
LANGUAGE plpgsql;

And the archive-tables have a default time-column to store the time, when the entry has been archived:

ar_time timestamp without time zone DEFAULT now()

That’s it

That’s what I just needed to tell you. It was a very interesting project and I liked working on it. Thanks for reading so far. If you have questions, improvement suggestions or anything else to tell me, then please comment it. See yah! :-)

Learn More