Page 1 of 4 (76 posts)

  • talks about »
  • python

Tags

Last update:
Fri Nov 15 16:40:14 2019

A Django site.

QGIS Planet

Folium vs. hvplot for interactive maps of Point GeoDataFrames

In the previous post, I showed how Folium can be used to create interactive maps of GeoPandas GeoDataFrames. Today’s post continues this theme. Specifically, it compares Folium to another dataviz library called hvplot. hvplot also recently added support for GeoDataFrames, so it’s interesting to see how these different solutions compare.

Minimum viable

The following snippets show the minimum code I found to put a GeoDataFrame of Points onto a map with either Folium or hvplot.

Folium does not automatically zoom to the data extent and I didn’t find a way to add the whole GeoDataFrame of Points without looping through the rows individually:

Hvplot on the other hand registers the hvplot function directly with the GeoDataFrame. This makes it as convenient to use as the original GeoPandas plot function. It also zooms to the data extent:

Standard interaction and zoom to area of interest

The following snippets ensure that the map is set to a useful extent and the map tools enable panning and zooming.

With Folium, we have to set the map center and the zoom. The map tools are Leaflet defaults, so panning and zooming work as expected:

Since hvplot does not come with mouse wheel zoom enabled by default, we need to set that:

Color by attribute

Finally, for many maps, we want to show the point location as well as an attribute value.

To create a continuous color ramp for a numeric value, we can use branca.colormap to define the marker fill color:

In hvplot, it is sufficient to specify the attribute of interest:

I’m pretty impressed with hvplot. The integration with GeoPandas is very smooth. Just don’t forget to set the geo=True parameter if you want to plot lat/lon geometries.

Folium seems less straightforward for this use case. Maybe I missed some option similar to the Choropleth function that I showed in the previous post.

Interactive plots for GeoPandas GeoDataFrames of LineStrings

GeoPandas makes it easy to create basic visualizations of GeoDataFrames:

However, if we want interactive plots, we need additional libraries. Folium (which is built on Leaflet) is a great option. However, all examples for plotting GeoDataFrames that I found focused on point or polygon data. So here is what I found to work for GeoDataFrames of LineStrings:

First, some imports:

import pandas as pd
import geopandas
import folium

Loading the data:

graph = geopandas.read_file('data/population_test-routes-geom.csv')
graph.crs = {'init' :'epsg:4326'}

Creating the map using folium.Choropleth:

m = folium.Map([48.2, 16.4], zoom_start=10)

folium.Choropleth(
    graph[graph.geometry.length>0.001],
    line_weight=3,
    line_color='blue'
).add_to(m)

m

I also tried using folium.PolyLine which seemed like the more obvious choice but does not seem to accept GeoDataFrames as input. Instead, it expects a list of coordinate pairs and of course it expects them to be in the opposite order that Shapely.LineString.coords provides … Oh the joys of geodata!

In any case, I had to limit the number of features that get plotted because Folium refuses to plot all 8778 features at once. I decided to filter by line length because drawing really short lines is pointless for my overview visualization anyway.

Configure editing form widgets using PyQGIS

PT | EN

As I was preparing a QGIS Project to read a database structured according to the new rules and technical specifications for the Portuguese Cartography, I started to configure the editing forms for several layers, so that:

  1. Make some fields read-only, like for example an identifier field.
  2. Configure widgets better suited for each field, to help the user and avoid errors. For example, date-time files with a pop-up calendar, and value lists with dropdown selectors.

Basically, I wanted something like this:

Peek 2019-09-30 15-04_2

Let me say that, in PostGIS layers, QGIS does a great job in figuring out the best widget to use for each field, as well as the constraints to apply. Which is a great help. Nevertheless, some need some extra configuration.

If I had only a few layers and fields, I would have done them all by hand, but after the 5th layer my personal mantra started to chime in:

“If you are using a computer to perform a repetitive manual task, you are doing it wrong!”

So, I began to think how could I configure the layers and fields more systematically. After some research and trial and error, I came up with the following PyQGIS functions.

Make a field Read-only

The identifier field (“identificador”) is automatically generated by the database. Therefore, the user shouldn’t edit it. So I had better make it read only

Layer Properties - cabo_electrico | Attributes Form_103

To make all the identifier fields read-only, I used the following code.

def field_readonly(layer, fieldname, option = True):
    fields = layer.fields()
    field_idx = fields.indexOf(fieldname)
    if field_idx >= 0:
        form_config = layer.editFormConfig()
        form_config.setReadOnly(field_idx, option)
        layer.setEditFormConfig(form_config)

# Example for the field "identificador"

project = QgsProject.instance()
layers = project.mapLayers() 

for layer in layers.values():
    field_readonly(layer,'identificador')

Set fields with DateTime widget

The date fields are configured automatically, but the default widget setting only outputs the date, and not date-time, as the rules required.

I started by setting a field in a layer exactly how I wanted, then I tried to figure out how those setting were saved in PyQGIS using the Python console:

>>>layer = iface.mapCanvas().currentLayer()
>>>layer.fields().indexOf('inicio_objeto')
1
>>>field = layer.fields()[1]
>>>field.editorWidgetSetup().type()
'DateTime'
>>>field.editorWidgetSetup().config()
{'allow_null': True, 'calendar_popup': True, 'display_format': 'yyyy-MM-dd HH:mm:ss', 'field_format': 'yyyy-MM-dd HH:mm:ss', 'field_iso_format': False}

Knowing this, I was able to create a function that allows configuring a field in a layer using the exact same settings, and apply it to all layers.

def field_to_datetime(layer, fieldname):
    config = {'allow_null': True,
              'calendar_popup': True,
              'display_format': 'yyyy-MM-dd HH:mm:ss',
              'field_format': 'yyyy-MM-dd HH:mm:ss',
              'field_iso_format': False}
    type = 'Datetime'
    fields = layer.fields()
    field_idx = fields.indexOf(fieldname)
    if field_idx >= 0:
        widget_setup = QgsEditorWidgetSetup(type,config)
        layer.setEditorWidgetSetup(field_idx, widget_setup)

# Example applied to "inicio_objeto" e "fim_objeto"

for layer in layers.values():
    field_to_datetime(layer,'inicio_objeto')
    field_to_datetime(layer,'fim_objeto')

Setting a field with the Value Relation widget

In the data model, many tables have fields that only allow a limited number of values. Those values are referenced to other tables, the Foreign keys.

In these cases, it’s quite helpful to use a Value Relation widget. To configure fields with it in a programmatic way, it’s quite similar to the earlier example, where we first neet to set an example and see how it’s stored, but in this case, each field has a slightly different settings

Luckily, whoever designed the data model, did a favor to us all by giving the same name to the fields and the related tables, making it possible to automatically adapt the settings for each case.

The function stars by gathering all fields in which the name starts with ‘valor_’ (value). Then, iterating over those fields, adapts the configuration to use the reference layer that as the same name as the field.

def field_to_value_relation(layer):
    fields = layer.fields()
    pattern = re.compile(r'^valor_')
    fields_valor = [field for field in fields if pattern.match(field.name())]
    if len(fields_valor) > 0:
        config = {'AllowMulti': False,
                  'AllowNull': True,
                  'FilterExpression': '',
                  'Key': 'identificador',
                  'Layer': '',
                  'NofColumns': 1,
                  'OrderByValue': False,
                  'UseCompleter': False,
                   'Value': 'descricao'}
        for field in fields_valor:
            field_idx = fields.indexOf(field.name())
            if field_idx >= 0:
                print(field)
                try:
                    target_layer = QgsProject.instance().mapLayersByName(field.name())[0]
                    config['Layer'] = target_layer.id()
                    widget_setup = QgsEditorWidgetSetup('ValueRelation',config)
                    layer.setEditorWidgetSetup(field_idx, widget_setup)
                except:
                    pass
            else:
                return False
    else:
        return False
    return True
    
# Correr função em todas as camadas
for layer in layers.values():
    field_to_value_relation(layer)

Conclusion

In a relatively quick way, I was able to set all the project’s layers with the widgets I needed.Peek 2019-09-30 16-06

This seems to me like the tip of the iceberg. If one has the need, with some search and patience, other configurations can be changed using PyQGIS. Therefore, think twice before embarking in configuring a big project, layer by layer, field by fields.

Advertisements

QGIS Versioning now supports foreign keys!

QGIS-versioning is a QGIS and PostGIS plugin dedicated to data versioning and history management. It supports :

  • Keeping full table history with all modifications
  • Transparent access to current data
  • Versioning tables with branches
  • Work offline
  • Work on a data subset
  • Conflict management with a GUI

QGIS versioning conflict management

In a previous blog article we detailed how QGIS versioning can manage data history, branches, and work offline with PostGIS-stored data and QGIS. We recently added foreign key support to QGIS versioning so you can now historize any complex database schema.

This QGIS plugin is available in the official QGIS plugin repository, and you can fork it on GitHub too !

Foreign key support

TL;DR

When a user decides to historize its PostgreSQL database with QGIS-versioning, the plugin alters the existing database schema and adds new fields in order to track down the different versions of a single table row. Every access to these versioned tables are subsequently made through updatable views in order to automatically fill in the new versioning fields.

Up to now, it was not possible to deal with primary keys and foreign keys : the original tables had to be constraints-free.  This limitation has been lifted thanks to this contribution.

To make it simple, the solution is to remove all constraints from the original database and transform them into a set of SQL check triggers installed on the working copy databases (SQLite or PostgreSQL). As verifications are made on the client side, it’s impossible to propagate invalid modifications on your base server when you “commit” updates.

Behind the curtains

When you choose to historize an existing database, a few fields are added to the existing table. Among these fields, versioning_ididentifies  one specific version of a row. For one existing row, there are several versions of this row, each with a different versioning_id but with the same original primary key field. As a consequence, that field cannot satisfy the unique constraint, so it cannot be a key, therefore no foreign key neither.

We therefore have to drop the primary key and foreign key constraints when historizing the table. Before removing them, constraints definitions are stored in a dedicated table so that these constraints can be checked later.

When the user checks out a specific table on a specific branch, QGIS-versioning uses that constraint table to build constraint checking triggers in the working copy. The way constraints are built depends on the checkout type (you can checkout in a SQLite file, in the master PostgreSQL database or in another PostgreSQL database).

What do we check ?

That’s where the fun begins ! The first thing we have to check is key uniqueness or foreign key referencing an existing key on insert or update. Remember that there are no primary key and foreign key anymore, we dropped them when activating historization. We keep the term for better understanding.

You also have to deal with deleting or updating a referenced row and the different ways of propagating the modification : cascade, set default, set null, or simply failure, as explained in PostgreSQL Foreign keys documentation .

Nevermind all that, this problem has been solved for you and everything is done automatically in QGIS-versioning. Before you ask, yes foreign keys spanning on multiple fields are also supported.

What’s new in QGIS ?

You will get a new message you probably already know about, when you try to make an invalid modification committing your changes to the master database

Error when foreign key constraint is violated

Partial checkout

One existing Qgis-versioning feature is partial checkout. It allows a user to select a subset of data to checkout in its working copy. It avoids downloading gigabytes of data you do not care about. You can, for instance, checkout features within a given spatial extent.

So far, so good. But if you have only a part of your data, you cannot ensure that modifying a data field as primary key will keep uniqueness. In this particular case, QGIS-versioning will trigger errors on commit, pointing out the invalid rows you have to modify so the unique constraint remains valid.

Error when committing non unique key after a partial checkout

Tests

There is a lot to check when you intend to replace the existing constraint system with your own constraint system based on triggers. In order to ensure QGIS-Versioning stability and reliability, we put some special effort on building a test set that cover all use cases and possible exceptions.

What’s next

There is now no known limitations on using QGIS-versioning on any of your database. If you think about a missing feature or just want to know more about QGIS and QGIS-versioning, feel free to contact us at [email protected]. And please have a look at our support offering for QGIS.

Many thanks to eHealth Africa who helped us develop these new features. eHealth Africa is a non-governmental organization based in Nigeria. Their mission is to build stronger health systems through the design and implementation of data-driven solutions.

Movement data in GIS #24: MovingPandas hands-on tutorials

Last week, I had the pleasure to give a movement data analysis workshop at the OpenGeoHub summer school at the University of Münster in Germany. The workshop materials consist of three Jupyter notebooks that have been designed to also support self-study outside of a workshop setting. So you can try them out as well!

All materials are available on Github:

  • Tutorial 0 provides an introduction to the MovingPandas Trajectory class.
  • Tutorials 1 and 2 provide examples with real-world datasets covering one day of ship movement near Gothenburg and multiple years of gull migration, respectively.

Here’s a quick preview of the bird migration data analysis tutorial (click for full size):

Tutorial 2: Bird migration data analysis

You can run all three Jupyter notebooks online using MyBinder (no installations required).

Alternatively or if you want to dig deeper: installation instructions are available on movingpandas.org

The OpenGeoHub summer school this year had a strong focus on spatial analysis with R and GRASS (sometimes mixing those two together). It was great to meet @mdsumner (author of R trip) and @edzerpebesma (author of R trajectories) for what might have well been the ultimate movement data libraries geek fest. In the ultimate R / Python cross-over,  0_getting_started.Rmd

Both talks and workshops have been recorded. Here’s the introduction:

and this is the full workshop recording:

(Fr) Oslandia recrute : développeur(se) C++ et Python

Sorry, this entry is only available in French.

Movement data in GIS #23: trajectories in context

Today’s post continues where “Why you should be using PostGIS trajectories” leaves off. It’s the result of a collaboration with Eva Westermeier. I had the pleasure to supervise her internship at AIT last year and also co-supervised her Master’s thesis [0] on the topic of enriching trajectories with information about their geographic context.

Context-aware analysis of movement data is crucial for different domains and applications, from transport to ecology. While there is a wealth of data, efficient and user-friendly contextual trajectory analysis is still hampered by a lack of appropriate conceptual approaches and practical methods. (Westermeier, 2018)

Part of the work was focused on evaluating different approaches to adding context information from vector datasets to trajectories in PostGIS. For example, adding land cover context to animal movement data or adding information on anchoring and harbor areas to vessel movement data.

Classic point-based model vs. line-based model

The obvious approach is to intersect the trajectory points with context data. This is the classic point data model of contextual trajectories. It’s straightforward to add context information in the point-based model but it also generates large numbers of repeating annotations. In contrast, the line data model using, for example, PostGIS trajectories (LinestringM) is more compact since trajectories can be split into segments at context borders. This creates one annotation per segment and the individual segments are convenient to analyze (as described in part #12).

Spatio-temporal interpolation as provided by the line data model offers additional advantages for the analysis of annotated segments. Contextual segments start and end at the intersection of the trajectory linestring with context polygon borders. This means that there are no gaps like in the point-based model. Consequently, while the point-based model systematically underestimates segment length and duration, the line-based approach offers more meaningful segment length and duration measurements.

Schematic illustration of a subset of an annotated trajectory in two context classes, a) systematic underestimation of length or duration in the point data model, b) full length or duration between context polygon borders in the line data model (source: Westermeier (2018))

Another issue of the point data model is that brief context changes may be missed or represented by just one point location. This makes it impossible to compute the length or duration of the respective context segment. (Of course, depending on the application, it can be desirable to ignore brief context changes and make the annotation process robust towards irrelevant changes.)

Schematic illustration of context annotation for brief context changes, a) and b)
two variants for the point data model, c) gapless annotation in the line data model (source: Westermeier (2018) based on Buchin et al. (2014))

Beyond annotations, context can also be considered directly in an analysis, for example, when computing distances between trajectories and contextual point objects. In this case, the point-based approach systematically overestimates the distances.

Schematic illustration of distance measurement from a trajectory to an external
object, a) point data model, b) line data model (source: Westermeier (2018))

The above examples show that there are some good reasons to dump the classic point-based model. However, the line-based model is not without its own issues.

Issues

Computing the context annotations for trajectory segments is tricky. The main issue is that ST_Intersection drops the M values. This effectively destroys our trajectories! There are ways to deal with this issue – and the corresponding SQL queries are published in the thesis (p. 38-40) – but it’s a real bummer. Basically, ST_Intersection only provides geometric output. Therefore, we need to reconstruct the temporal information in order to create usable trajectory segments.

Finally, while the line-based model is well suited to add context from other vector data, it is less useful for context data from continuous rasters but that was beyond the scope of this work.

Conclusion

After the promising results of my initial investigations into PostGIS trajectories, I was optimistic that context annotations would be a straightforward add-on. The line-based approach has multiple advantages when it comes to analyzing contextual segments. Unfortunately, generating these contextual segments is much less convenient and also slower than I had hoped. Originally, I had planned to turn this work into a plugin for the Processing toolbox but the results of this work motivated me to look into other solutions. You’ve already seen some of the outcomes in part #20 “Trajectools v1 released!”.

References

[0] Westermeier, E.M. (2018). Contextual Trajectory Modeling and Analysis. Master Thesis, Interfaculty Department of Geoinformatics, University of Salzburg.


This post is part of a series. Read more about movement data in GIS.

Easy Processing scripts comeback in QGIS 3.6

When QGIS 3.0 was release, I published a Processing script template for QGIS3. While the script template is nicely pythonic, it’s also pretty long and daunting for non-programmers. This fact didn’t go unnoticed and Nathan Woodrow in particular started to work on a QGIS enhancement proposal to improve the situation and make writing Processing scripts easier, while – at the same time – keeping in line with common Python styles.

While the previous template had 57 lines of code, the new template only has 26 lines – 50% less code, same functionality! (Actually, this template provides more functionality since it also tracks progress and ensures that the algorithm can be cancelled.)

from qgis.processing import alg
from qgis.core import QgsFeature, QgsFeatureSink

@alg(name="ex_new", label=alg.tr("Example script (new style)"), group="examplescripts", group_label=alg.tr("Example Scripts"))
@alg.input(type=alg.SOURCE, name="INPUT", label="Input layer")
@alg.input(type=alg.SINK, name="OUTPUT", label="Output layer")
def testalg(instance, parameters, context, feedback, inputs):
    """
    Description goes here. (Don't delete this! Removing this comment will cause errors.)
    """
    source = instance.parameterAsSource(parameters, "INPUT", context)

    (sink, dest_id) = instance.parameterAsSink(
        parameters, "OUTPUT", context,
        source.fields(), source.wkbType(), source.sourceCrs())

    total = 100.0 / source.featureCount() if source.featureCount() else 0
    features = source.getFeatures()
    for current, feature in enumerate(features):
        if feedback.isCanceled():
            break
        out_feature = QgsFeature(feature)
        sink.addFeature(out_feature, QgsFeatureSink.FastInsert)
        feedback.setProgress(int(current * total))

    return {"OUTPUT": dest_id}

The key improvement are the new decorators that turn an ordinary function (such as testalg in the template) into a Processing algorithm. Decorators start with @ and are written above a function definition. The @alg decorator declares that the following function is a Processing algorithm, defines its name and assigns it to an algorithm group. The @alg.input decorator creates an input parameter for the algorithm. Similarly, there is a @alg.output decorator for output parameters.

For a longer example script, check out the original QGIS enhancement proposal thread!

For now, this new way of writing Processing scripts is only supported by QGIS 3.6 but there are plans to back-port this improvement to 3.4 once it is more mature. So give it a try and report back!

Movement data in GIS #20: Trajectools v1 released!

In previous posts, I already wrote about Trajectools and some of the functionality it provides to QGIS Processing including:

There are also tools to compute heading and speed which I only talked about on Twitter.

Trajectools is now available from the QGIS plugin repository.

The plugin includes sample data from MarineCadastre downloads and the Geolife project.

Under the hood, Trajectools depends on GeoPandas!

If you are on Windows, here’s how to install GeoPandas for OSGeo4W:

  1. OSGeo4W installer: install python3-pip
  2. Environment variables: add GDAL_VERSION = 2.3.2 (or whichever version your OSGeo4W installation currently includes)
  3. OSGeo4W shell: call C:\OSGeo4W64\bin\py3_env.bat
  4. OSGeo4W shell: pip3 install geopandas (this will error at fiona)
  5. From https://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona: download Fiona-1.7.13-cp37-cp37m-win_amd64.whl
  6. OSGeo4W shell: pip3 install path-to-download\Fiona-1.7.13-cp37-cp37m-win_amd64.whl
  7. OSGeo4W shell: pip3 install geopandas
  8. (optionally) From https://www.lfd.uci.edu/~gohlke/pythonlibs/#rtree: download Rtree-0.8.3-cp37-cp37m-win_amd64.whl and pip3 install it

If you want to use this functionality outside of QGIS, head over to my movingpandas project!

Call for testing: GRASS GIS with Python 3

Please help us testing the Python3 support in the yet unreleased GRASS GIS trunk (i.e., version “grass77” which will be released as “grass78” in the near future).

1. Why Python 3?

Python 2 is end-of-life (EOL); the current Python 2.7 will retire in 11 months from today (see https://pythonclock.org). We want to follow the “Moving to require Python 3” and complete the change to Python 3. And we need a broader community testing.

2. Download and test!

Packages are available at time:

3. Instructions for testing

4. Problems found? Please report them to us

Problems and bugs can be reported in the GRASS GIS trac. Code changes are welcome!

Thanks for testing grass77!

The post Call for testing: GRASS GIS with Python 3 appeared first on GFOSS Blog | GRASS GIS and OSGeo News.

Dealing with delayed measurements in (Geo)Pandas

Yesterday, I learned about a cool use case in data-driven agriculture that requires dealing with delayed measurements. As Bert mentions, for example, potatoes end up in the machines and are counted a few seconds after they’re actually taken out of the ground:

Therefore, in order to accurately map yield, we need to take this temporal offset into account.

We need to make sure that time and location stay untouched, but need to shift the potato count value. To support this use case, I’ve implemented apply_offset_seconds() for trajectories in movingpandas:

    def apply_offset_seconds(self, column, offset):
        self.df[column] = self.df[column].shift(offset, freq='1s')

The following test illustrates its use: you can see how the value column is shifted by 120 second. Geometry and time remain unchanged but the value column is shifted accordingly. In this test, we look at the row with index 2 which we access using iloc[2]:

    def test_offset_seconds(self):
        df = pd.DataFrame([
            {'geometry': Point(0, 0), 't': datetime(2018, 1, 1, 12, 0, 0), 'value': 1},
            {'geometry': Point(-6, 10), 't': datetime(2018, 1, 1, 12, 1, 0), 'value': 2},
            {'geometry': Point(6, 6), 't': datetime(2018, 1, 1, 12, 2, 0), 'value': 3},
            {'geometry': Point(6, 12), 't': datetime(2018, 1, 1, 12, 3, 0), 'value':4},
            {'geometry': Point(6, 18), 't': datetime(2018, 1, 1, 12, 4, 0), 'value':5}
        ]).set_index('t')
        geo_df = GeoDataFrame(df, crs={'init': '31256'})
        traj = Trajectory(1, geo_df)
        traj.apply_offset_seconds('value', -120)
        self.assertEqual(traj.df.iloc[2].value, 5)
        self.assertEqual(traj.df.iloc[2].geometry, Point(6, 6))

From CSV to GeoDataFrame in two lines

Pandas is great for data munging and with the help of GeoPandas, these capabilities expand into the spatial realm.

With just two lines, it’s quick and easy to transform a plain headerless CSV file into a GeoDataFrame. (If your CSV is nice and already contains a header, you can skip the header=None and names=FILE_HEADER parameters.)

usecols=USE_COLS is also optional and allows us to specify that we only want to use a subset of the columns available in the CSV.

After the obligatory imports and setting of variables, all we need to do is read the CSV into a regular DataFrame and then construct a GeoDataFrame.

import pandas as pd
from geopandas import GeoDataFrame
from shapely.geometry import Point

FILE_NAME = "/temp/your.csv"
FILE_HEADER = ['a', 'b', 'c', 'd', 'e', 'x', 'y']
USE_COLS = ['a', 'x', 'y']

df = pd.read_csv(
    FILE_NAME, delimiter=";", header=None,
    names=FILE_HEADER, usecols=USE_COLS)
gdf = GeoDataFrame(
    df.drop(['x', 'y'], axis=1),
    crs={'init': 'epsg:4326'},
    geometry=[Point(xy) for xy in zip(df.x, df.y)])

It’s also possible to create the point objects using a lambda function as shown by weiji14 on GIS.SE.

Movement data in GIS #21: new interactive notebook to get started with MovingPandas

MovingPandas is my attempt to provide a pure Python solution for trajectory data handling in GIS. MovingPandas provides trajectory classes and functions built on top of GeoPandas. 

To lower the entry barrier to getting started with MovingPandas, there’s now an interactive iPython notebook hosted on MyBinder. This notebook provides all the necessary imports and demonstrates how to create a Trajectory object.

Launch MyBinder for MovingPandas to get started!

PyQGIS101 part 10 published!

PyQGIS 101: Introduction to QGIS Python programming for non-programmers has now reached the part 10 milestone!

Beyond the obligatory Hello world! example, the contents so far include:

If you’ve been thinking about learning Python programming, but never got around to actually start doing it, give PyQGIS101 a try.

I’d like to thank everyone who has already provided feedback to the exercises. Every comment is important to help me understand the pain points of learning Python for QGIS.

I recently read an article – unfortunately I forgot to bookmark it and cannot locate it anymore – that described the problems with learning to program very well: in the beginning, it’s rather slow going, you don’t know the right terminology and therefore don’t know what to google for when you run into issues. But there comes this point, when you finally get it, when the terminology becomes clearer, when you start thinking “that might work” and it actually does! I hope that PyQGIS101 will be a help along the way.

Movement data in GIS #18: creating evaluation data for trajectory predictions

We’ve seen a lot of explorative movement data analysis in the Movement data in GIS series so far. Beyond exploration, predictive analysis is another major topic in movement data analysis. One of the most obvious movement prediction use cases is trajectory prediction, i.e. trying to predict where a moving object will be in the future. The two main categories of trajectory prediction methods I see are those that try to predict the actual path that a moving object will take versus those that only try to predict the next destination.

Today, I want to focus on prediction methods that predict the path that a moving object is going to take. There are many different approaches from simple linear prediction to very sophisticated application-dependent methods. Regardless of the prediction method though, there is the question of how to evaluate the prediction results when these methods are applied to real-life data.

As long as we work with nice, densely, and regularly updated movement data, extracting evaluation samples is rather straightforward. To predict future movement, we need some information about past movement. Based on that past movement, we can then try to predict future positions. For example, given a trajectory that is twenty minutes long, we can extract a sample that provides five minutes of past movement, as well as the actually observed position five minutes into the future:

But what if the trajectory is irregularly updated? Do we interpolate the positions at the desired five minute timestamps? Do we try to shift the sample until – by chance – we find a section along the trajectory where the updates match our desired pattern? What if location timestamps include seconds or milliseconds and we therefore cannot find exact matches? Should we introduce a tolerance parameter that would allow us to match locations with approximately the same timestamp?

Depending on the duration of observation gaps in our trajectory, it might not be a good idea to simply interpolate locations since these interpolated locations could systematically bias our evaluation. Therefore, the safest approach may be to shift the sample pattern along the trajectory until a close match (within the specified tolerance) is found. This approach is now implemented in MovingPandas’ TrajectorySampler.

def test_sample_irregular_updates(self):
    df = pd.DataFrame([
        {'geometry':Point(0,0), 't':datetime(2018,1,1,12,0,1)},
        {'geometry':Point(0,3), 't':datetime(2018,1,1,12,3,2)},
        {'geometry':Point(0,6), 't':datetime(2018,1,1,12,6,1)},
        {'geometry':Point(0,9), 't':datetime(2018,1,1,12,9,2)},
        {'geometry':Point(0,10), 't':datetime(2018,1,1,12,10,2)},
        {'geometry':Point(0,14), 't':datetime(2018,1,1,12,14,3)},
        {'geometry':Point(0,19), 't':datetime(2018,1,1,12,19,4)},
        {'geometry':Point(0,20), 't':datetime(2018,1,1,12,20,0)}
        ]).set_index('t')
    geo_df = GeoDataFrame(df, crs={'init': '4326'})
    traj = Trajectory(1,geo_df)
    sampler = TrajectorySampler(traj, timedelta(seconds=5))
    past_timedelta = timedelta(minutes=5)
    future_timedelta = timedelta(minutes=5)
    sample = sampler.get_sample(past_timedelta, future_timedelta)
    result = sample.future_pos.wkt
    expected_result = "POINT (0 19)"
    self.assertEqual(result, expected_result)
    result = sample.past_traj.to_linestring().wkt
    expected_result = "LINESTRING (0 9, 0 10, 0 14)"
    self.assertEqual(result, expected_result)

The repository also includes a demo that illustrates how to split trajectories using a grid and finally extract samples:

 

Thoughts on “FOSS4G/SOTM Oceania 2018”, and the PyQGIS API improvements which it caused

Last week the first official “FOSS4G/SOTM Oceania” conference was held at Melbourne University. This was a fantastic event, and there’s simply no way I can extend sufficient thanks to all the organisers and volunteers who put this event together. They did a brilliant job, and their efforts are even more impressive considering it was the inaugural event!

Upfront — this is not a recap of the conference (I’m sure someone else is working on a much more detailed write up of the event!), just some musings I’ve had following my experiences assisting Nathan Woodrow deliver an introductory Python for QGIS workshop he put together for the conference. In short, we both found that delivering this workshop to a group of PyQGIS newcomers was a great way for us to identify “pain points” in the PyQGIS API and areas where we need to improve. The good news is that as a direct result of the experiences during this workshop the API has been improved and streamlined! Let’s explore how:

Part of Nathan’s workshop (notes are available here) focused on a hands-on example of creating a custom QGIS “Processing” script. I’ve found that preparing workshops is guaranteed to expose a bunch of rare and tricky software bugs, and this was no exception! Unfortunately the workshop was scheduled just before the QGIS 3.4.2 patch release which fixed these bugs, but at least they’re fixed now and we can move on…

The bulk of Nathan’s example algorithm is contained within the following block (where “distance” is the length of line segments we want to chop our features up into):

for input_feature in enumerate(features):
    geom = feature.geometry().constGet()
    if isinstance(geom, QgsLineString):
        continue
    first_part = geom.geometryN(0)
    start = 0
    end = distance
    length = first_part.length()

    while start < length:
        new_geom = first_part.curveSubstring(start,end)

        output_feature = input_feature
        output_feature.setGeometry(QgsGeometry(new_geom))
        sink.addFeature(output_feature)

        start += distance
        end += distance

There’s a lot here, but really the guts of this algorithm breaks down to one line:

new_geom = first_part.curveSubstring(start,end)

Basically, a new geometry is created for each trimmed section in the output layer by calling the “curveSubstring” method on the input geometry and passing it a start and end distance along the input line. This returns the portion of that input LineString (or CircularString, or CompoundCurve) between those distances. The PyQGIS API nicely hides the details here – you can safely call this one method and be confident that regardless of the input geometry type the result will be correct.

Unfortunately, while calling the “curveSubstring” method is elegant, all the code surrounding this call is not so elegant. As a (mostly) full-time QGIS developer myself, I tend to look over oddities in the API. It’s easy to justify ugly API as just “how it’s always been”, and over time it’s natural to develop a type of blind spot to these issues.

Let’s start with the first ugly part of this code:

geom = input_feature.geometry().constGet()
if isinstance(geom, QgsLineString):
    continue
first_part = geom.geometryN(0)
# chop first_part into sections of desired length
...

This is rather… confusing… logic to follow. Here the script is fetching the geometry of the input feature, checking if it’s a LineString, and if it IS, then it skips that feature and continues to the next. Wait… what? It’s skipping features with LineString geometries?

Well, yes. The algorithm was written specifically for one workshop, which was using a MultiLineString layer as the demo layer. The script takes a huge shortcut here and says “if the input feature isn’t a MultiLineString, ignore it — we only know how to deal with multi-part geometries”. Immediately following this logic there’s a call to geometryN( 0 ), which returns just the first part of the MultiLineString geometry.

There’s two issues here — one is that the script just plain won’t work for LineString inputs, and the second is that it ignores everything BUT the first part in the geometry. While it would be possible to fix the script and add a check for the input geometry type, put in logic to loop over all the parts of a multi-part input, etc, that’s instantly going to add a LOT of complexity or duplicate code here.

Fortunately, this was the perfect excuse to improve the PyQGIS API itself so that this kind of operation is simpler in future! Nathan and I had a debrief/brainstorm after the workshop, and as a result a new “parts iterator” has been implemented and merged to QGIS master. It’ll be available from version 3.6 on. Using the new iterator, we can simplify the script:

geom = input_feature.geometry()
for part in geom.parts():
    # chop part into sections of desired length
    ...

Win! This is simultaneously more readable, more Pythonic, and automatically works for both LineString and MultiLineString inputs (and in the case of MultiLineStrings, we now correctly handle all parts).

Here’s another pain-point. Looking at the block:

new_geom = part.curveSubstring(start,end)
output_feature = input_feature
output_feature.setGeometry(QgsGeometry(new_geom))

At first glance this looks reasonable – we use curveSubstring to get the portion of the curve, then make a copy of the input_feature as output_feature (this ensures that the features output by the algorithm maintain all the attributes from the input features), and finally set the geometry of the output_feature to be the newly calculated curve portion. The ugliness here comes in this line:

output_feature.setGeometry(QgsGeometry(new_geom))

What’s that extra QgsGeometry(…) call doing here? Without getting too sidetracked into the QGIS geometry API internals, QgsFeature.setGeometry requires a QgsGeometry argument, not the QgsAbstractGeometry subclass which is returned by curveSubstring.

This is a prime example of a “paper-cut” style issue in the PyQGIS API. Experienced developers know and understand the reasons behind this, but for newcomers to PyQGIS, it’s an obscure complexity. Fortunately the solution here was simple — and after the workshop Nathan and I added a new overload to QgsFeature.setGeometry which accepts a QgsAbstractGeometry argument. So in QGIS 3.6 this line can be simplified to:

output_feature.setGeometry(new_geom)

Or, if you wanted to make things more concise, you could put the curveSubstring call directly in here:

output_feature = input_feature
output_feature.setGeometry(part.curveSubstring(start,end))

Let’s have a look at the simplified script for QGIS 3.6:

for input_feature in enumerate(features):
    geom = feature.geometry()
    for part in geom.parts():
        start = 0
        end = distance
        length = part.length()

        while start < length:
            output_feature = input_feature
            output_feature.setGeometry(part.curveSubstring(start,end))
            sink.addFeature(output_feature)

            start += distance
            end += distance

This is MUCH nicer, and will be much easier to explain in the next workshop! The good news is that Nathan has more niceness on the way which will further improve the process of writing QGIS Processing script algorithms. You can see some early prototypes of this work here:

So there we go. The process of writing and delivering a workshop helps to look past “API blind spots” and identify the ugly points and traps for those new to the API. As a direct result of this FOSS4G/SOTM Oceania 2018 Workshop, the QGIS 3.6 PyQGIS API will be easier to use, more readable, and less buggy! That’s a win all round!

GRASS GIS 7.4.2 released

We are pleased to announce the GRASS GIS 7.4.2 release

What’s new in a nutshell

After a bit more than four months of development the new update release GRASS GIS 7.4.2 is available. It provides more than 50 stability fixes and improvements compared to the previous stable version 7.4.1. An overview of the new features in the 7.4 release series is available at New Features in GRASS GIS 7.4.

Efforts have concentrated on making the user experience even better, providing many small, but useful additional functionalities to modules and further improving the graphical user interface. Segmentation now support extremely large raster maps. Dockerfile and Windows support received updates. Also the manual was improved. For a detailed overview, see the list of new features. As a stable release series, 7.4.x enjoys long-term support.

Binaries/Installer download:

Source code download:

More details:

See also our detailed announcement:

About GRASS GIS

The Geographic Resources Analysis Support System (https://grass.osgeo.org/), commonly referred to as GRASS GIS, is an Open Source Geographic Information System providing powerful raster, vector and geospatial processing capabilities in a single integrated software suite. GRASS GIS includes tools for spatial modeling, visualization of raster and vector data, management and analysis of geospatial data, and the processing of satellite and aerial imagery. It also provides the capability to produce sophisticated presentation graphics and hardcopy maps. GRASS GIS has been translated into about twenty languages and supports a huge array of data formats. It can be used either as a stand-alone application or as backend for other software packages such as QGIS and R geostatistics. It is distributed freely under the terms of the GNU General Public License (GPL). GRASS GIS is a founding member of the Open Source Geospatial Foundation (OSGeo).

The GRASS Development Team, October 2018

The post GRASS GIS 7.4.2 released appeared first on GFOSS Blog | GRASS GIS and OSGeo News.

Geocoding with Geopy

Need to geocode some addresses? Here’s a five-lines-of-code solution based on “An A-Z of useful Python tricks” by Peter Gleeson:

from geopy import GoogleV3
place = "Krems an der Donau"
location = GoogleV3().geocode(place)
print(location.address)
print("POINT({},{})".format(location.latitude,location.longitude))

For more info, check out geopy:

geopy is a Python 2 and 3 client for several popular geocoding web services.
geopy includes geocoder classes for the OpenStreetMap Nominatim, ESRI ArcGIS, Google Geocoding API (V3), Baidu Maps, Bing Maps API, Yandex, IGN France, GeoNames, Pelias, geocode.earth, OpenMapQuest, PickPoint, What3Words, OpenCage, SmartyStreets, GeocodeFarm, and Here geocoder services.

Using Threads in PyQGIS3

While porting a plugin to QGIS3 I decided to also move all it’s threading infrastructure to QgsTasks. Here three possible variants to implement this.
the first uses the static method QgsTask.fromFunction and is simpler to use. A great quick solution. If you want need control you can look at the second solution that subclasses QgsTask. In this solution I also show how to create subtasks with interdependencies. The third variant, illustrates how to run a processing algorithm in a separate thread.
One thing to be very careful about is never to create widgets or alter gui in a task. This is a strict Qt guideline – gui must never be altered outside of the main thread. So your progress dialog must operate on the main thread, connecting to the progress report signals from the task which operates in the background thread. This also applies to “print” statements — these aren’t safe to use from a background thread in QGIS and can cause random crashes. Use the thread safe QgsMessageLog.logMessage() approach instead. Actually you should forget print and always use QgsMessageLog.

using QgsTask.fromFunction

this is a quick and simple way of running a function in a separate thread. When calling QgsTask.fromFunction() you can pass an on_finished argument with a callback to be executed at the end of run.

from time import sleep
import random
MESSAGE_CATEGORY = 'My tasks from a function'
def run(task, wait_time):
"""a dumb test function
to break the task raise an exception
to return a successful result return it. This will be passed together
with the exception (None in case of success) to the on_finished method
"""
QgsMessageLog.logMessage('Started task {}'.format(task.description()),
MESSAGE_CATEGORY, Qgis.Info)
wait_time = wait_time / 100
total = 0
iterations = 0
for i in range(101):
sleep(wait_time)
# use task.setProgress to report progress
task.setProgress(i)
total += random.randint(0, 100)
iterations += 1
# check task.isCanceled() to handle cancellation
if task.isCanceled():
stopped(task)
return None
# raise exceptions to abort task
if random.randint(0, 500) == 42:
raise Exception('bad value!')
return {
'total': total, 'iterations': iterations, 'task': task.description()
}
def stopped(task):
QgsMessageLog.logMessage(
'Task "{name}" was cancelled'.format(name=task.description()),
MESSAGE_CATEGORY, Qgis.Info)
def completed(exception, result=None):
"""this is called when run is finished. Exception is not None if run
raises an exception. Result is the return value of run."""
if exception is None:
if result is None:
QgsMessageLog.logMessage(
'Completed with no exception and no result '\
'(probably the task was manually canceled by the user)',
MESSAGE_CATEGORY, Qgis.Warning)
else:
QgsMessageLog.logMessage(
'Task {name} completed\n'
'Total: {total} ( with {iterations} '
'iterations)'.format(
name=result['task'],
total=result['total'],
iterations=result['iterations']),
MESSAGE_CATEGORY, Qgis.Info)
else:
QgsMessageLog.logMessage("Exception: {}".format(exception),
MESSAGE_CATEGORY, Qgis.Critical)
raise exception
# a bunch of tasks
task1 = QgsTask.fromFunction(
'waste cpu 1', run, on_finished=completed, wait_time=4)
task2 = QgsTask.fromFunction(
'waste cpu 2', run, on_finished=completed, wait_time=3)
QgsApplication.taskManager().addTask(task1)
QgsApplication.taskManager().addTask(task2)

Subclassing QgsTask

this solution gives you the full control over the task behaviour. In this example I also illustrate how to create subtasks dependencies.

import random
from time import sleep
from qgis.core import (
QgsApplication, QgsTask, QgsMessageLog,
)
MESSAGE_CATEGORY = 'My subclass tasks'
class MyTask(QgsTask):
"""This shows how to subclass QgsTask"""
def __init__(self, description, duration):
super().__init__(description, QgsTask.CanCancel)
self.duration = duration
self.total = 0
self.iterations = 0
self.exception = None
def run(self):
"""Here you implement your heavy lifting. This method should
periodically test for isCancelled() to gracefully abort.
This method MUST return True or False
raising exceptions will crash QGIS so we handle them internally and
raise them in self.finished
"""
QgsMessageLog.logMessage('Started task "{}"'.format(
self.description()), MESSAGE_CATEGORY, Qgis.Info)
wait_time = self.duration / 100
for i in range(101):
sleep(wait_time)
# use setProgress to report progress
self.setProgress(i)
self.total += random.randint(0, 100)
self.iterations += 1
# check isCanceled() to handle cancellation
if self.isCanceled():
return False
# simulate exceptions to show how to abort task
if random.randint(0, 500) == 42:
# DO NOT raise Exception('bad value!')
# this would crash QGIS
self.exception = Exception('bad value!')
return False
return True
def finished(self, result):
"""This method is automatically called when self.run returns. result
is the return value from self.run.
This function is automatically called when the task has completed (
successfully or otherwise). You just implement finished() to do whatever
follow up stuff should happen after the task is complete. finished is
always called from the main thread, so it's safe to do GUI
operations and raise Python exceptions here.
"""
if result:
QgsMessageLog.logMessage(
'Task "{name}" completed\n' \
'Total: {total} ( with {iterations} iterations)'.format(
name=self.description(),
total=self.total,
iterations=self.iterations),
MESSAGE_CATEGORY, Qgis.Success)
else:
if self.exception is None:
QgsMessageLog.logMessage(
'Task "{name}" not successful but without exception '\
'(probably the task was manually canceled by the user)'.format(
name=self.description()),
MESSAGE_CATEGORY, Qgis.Warning)
else:
QgsMessageLog.logMessage(
'Task "{name}" Exception: {exception}'.format(
name=self.description(), exception=self.exception),
MESSAGE_CATEGORY, Qgis.Critical)
raise self.exception
def cancel(self):
QgsMessageLog.logMessage(
'Task "{name}" was cancelled'.format(name=self.description()),
MESSAGE_CATEGORY, Qgis.Info)
super().cancel()
t1 = MyTask('waste cpu long', 10)
t2 = MyTask('waste cpu short', 6)
t3 = MyTask('waste cpu mini', 4)
st1 = MyTask('waste cpu Subtask 1', 5)
st2 = MyTask('waste cpu Subtask 2', 2)
st3 = MyTask('waste cpu Subtask 3', 4)
t2.addSubTask(st1, [t3, t1])
t1.addSubTask(st2)
t1.addSubTask(st3)
QgsApplication.taskManager().addTask(t1)
QgsApplication.taskManager().addTask(t2)
QgsApplication.taskManager().addTask(t3)

NEVER, EVER, EVER use print in the QgsTask outside from finished(). finished() is called on the main event loop

class MyTask(QgsTask):
def __init__(self, description, flags):
super().__init__(description, flags)
def run(self):
QgsMessageLog.logMessage('Started task {}'.format(self.description()))
#print('crashandburn')
return True
t1 = MyTask('waste cpu', QgsTask.CanCancel)
QgsApplication.taskManager().addTask(t1)

Call a Processing algorithm in a separate thread

You can simply execute a processing algorithm in a separate thread thanks to QgsProcessingAlgRunnerTask. This class takes a processing algorithm, its parameters, a context and a feedback objects and execute the algorithm. QgsProcessingAlgRunnerTask offers an executed signal to which you can connect and execute further code. executed sends two arguments bool successful and dict results. If you want to retrieve a memory layer you can pass the context as well by using partial or lambda.
If you’re wondering what parameter values you need to specify for an algorithm, and what values are acceptable, try running processing.algorithmHelp('qgis:randompointsinextent') in the python console. In QGIS 3.2 you’ll get a detailed list of all the parameter options for the algorithm and a summary of acceptable value types and formats for each. Another nice possibility is to run the algorithm from the gui and check the history after.

from functools import partial
from qgis.core import (QgsTaskManager, QgsMessageLog, QgsProcessingAlgRunnerTask,
QgsApplication, QgsProcessingContext, QgsProcessingFeedback, QgsProject)
MESSAGE_CATEGORY = 'My processing tasks'
def task_finished(context, successful, results):
if not successful:
QgsMessageLog.logMessage('Task finished unsucessfully', MESSAGE_CATEGORY, Qgis.Warning)
output_layer = context.getMapLayer(results['OUTPUT'])
# because getMapLayer doesn't transfer ownership the layer will be
# deleted when context goes out of scope and you'll get a crash.
# takeMapLayer transfers ownership so it's then safe to add it to the
# project and give the project ownership.
if output_layer.isValid():
QgsProject.instance().addMapLayer(
context.takeResultLayer(output_layer.id()))
alg = QgsApplication.processingRegistry().algorithmById(
'qgis:randompointsinextent')
context = QgsProcessingContext()
feedback = QgsProcessingFeedback()
params = {
'EXTENT': '4.63,11.57,44.41,48.78 [EPSG:4326]',
'MIN_DISTANCE': 0.1,
'POINTS_NUMBER': 100,
'TARGET_CRS': 'EPSG:4326',
'OUTPUT': 'memory:My random points'
}
task = QgsProcessingAlgRunnerTask(alg, params, context, feedback)
task.executed.connect(partial(task_finished, context))
QgsApplication.taskManager().addTask(task)

I hope this post can help you porting your plugins to QGIS3 and again if you need professional help for your plugins, don’t hesitate to contact us.

PyQGIS for non-programmers

If you’re are following me on Twitter, you’ve certainly already read that I’m working on PyQGIS 101 a tutorial to help GIS users to get started with Python programming for QGIS.

I’ve often been asked to recommend Python tutorials for beginners and I’ve been surprised how difficult it can be to find an engaging tutorial for Python 3 that does not assume that the reader already knows all kinds of programming concepts.

It’s been a while since I started programming, but I do teach QGIS and Python programming for QGIS to university students and therefore have some ideas of which concepts are challenging. Nonetheless, it’s well possible that I overlook something that is not self explanatory. If you’re using PyQGIS 101 and find that some points could use further explanations, please leave a comment on the corresponding page.

PyQGIS 101 is a work in progress. I’d appreciate any feedback, particularly from beginners!

  • Page 1 of 4 ( 76 posts )
  • >>
  • python

Back to Top

Sponsors