Page 1 of 1 (19 posts)

  • talks about »
  • postgres

Tags

Last update:
Mon Dec 22 07:25:08 2014

A Django site.

QGIS Planet

Dynamically updating layers in QGIS with PostGIS views

View here on youtube: https://www.youtube.com/watch?v=oOhbbEkl4Kg

Installing PostGIS 2.0 on ubuntu

PostGIS 2.0 is out and the awesomness continues! You can install PostGIS 2.0 on Ubuntu using packages which is exactly what I am going to show you here. Read on for details on how to get up and running and do your first simple raster analysis!

Note: You should make good backups first!

Before we begin, you should uninstall your existing postgis packages:

sudo dpkg --purge postgis postgresql-9.1-postgis

Then add a new repository and install PostGIS from there (based on this post):

sudo apt-add-repository ppa:sharpie/for-science  # To get GEOS 3.3.2
sudo apt-add-repository ppa:sharpie/postgis-nightly
sudo apt-get update
sudo apt-get install postgresql-9.1-postgis

Next we should create a new template database (optional but recommended).

createdb -E UTF8 template_postgis2
createlang -d template_postgis2 plpgsql
psql -d postgres -c "UPDATE pg_database SET datistemplate='true' WHERE datname='template_postgis2'"
psql -d template_postgis2 -f /usr/share/postgresql/9.1/contrib/postgis-2.0/postgis.sql
psql -d template_postgis2 -f /usr/share/postgresql/9.1/contrib/postgis-2.0/spatial_ref_sys.sql
psql -d template_postgis2 -f /usr/share/postgresql/9.1/contrib/postgis-2.0/rtpostgis.sql
psql -d template_postgis2 -c "GRANT ALL ON geometry_columns TO PUBLIC;"
psql -d template_postgis2 -c "GRANT ALL ON geography_columns TO PUBLIC;"
psql -d template_postgis2 -c "GRANT ALL ON spatial_ref_sys TO PUBLIC;"
createdb training -T template_postgis2

Ok now we can load a raster (see sample data download below):

raster2pgsql -s 4326 srtm_4326.tif | psql training
shp2pgsql -s 4326 -d -g geom -I places.shp places| psql training

Good - now our spatial database is ready to use - and has raster support! Here is a nice example of what you can do. The query looks up the altitude from the SRTM raster for each place listed using the ST_Value function:

select ST_Value(rast, geom, true) from places, srtm_4326;

It should produce something like this:

Doing a 'point on raster' query on our raster in QGIS

Further reading: A really good place to start is the Boston GIS cheatsheets - I am really looking forward to exploring all the new features that are available in PostGIS 2.0, thanks to all those involved in building it!

Sample data for the example listed

Another bash one-liner - load all natural earth layers into postgis in one go

Having a good world dataset is kind of essential for most GIS users to have as a standard reference dataset, give context to maps and so on. Anita Graser's recent blog post about the Natural Earth Data project added another nice dataset to my world dataset collection. The Natural Earth Data set (I got the 'full' one) is provided as shapefiles. Of course as a PostGIS fan I'd rather have these in my geospatial database, so I wrote a one liner in bash to load all the datasets.

First download the full dataset from here, and then extract the contents into a working directory somewhere. Then simply run this line, adjusting your destination database and schema as needed:

for FILE in `find . -name *.shp`; do \
  BASE=`basename $FILE .shp`; \
  /usr/lib/postgresql/9.1/bin/shp2pgsql -s 4326 -I $FILE world.$BASE \
  | psql gis; done

Ok that looks like four lines above, but only because I added some line breaks so things would be readable on this blog.

Note that you should replace 'world' with the name of the schema you are importing the data into, and 'gis' with the name of the database you are importing your data into. Thanks for the awesome dataset Natural Earth Team!

Adding a counter to postgresql query results

Just a quickie because I always forget this and need to look it up again. If you have a query in postgresql and you want to include a sequential identifier in the output, you can use the very handy ROW_NUMBER() function. Note, you need Postgresql 8.4+ I believe. Here is a quick example:

create view vw_farms
as
select ROW_NUMBER() over (order by parcel.farm_name) as id,
parcel.farm_name,
parcel.geometry
from parcel
where parcel.farm_name != '' limit 10 ;

Which returns something like this:

 id |     farm_name
 ----+-------------------
  1 | ADDERLEY 66
  2 | ADDERLEY 66
  3 | ADDERLEY 66
  4 | ADDERLEY 66
  5 | ADDERLEY 66
  6 | AFDAKS RIVIER 575
  7 | AFDAKS RIVIER 575
  8 | AFDAKS RIVIER 575
  9 | AFDAKS RIVIER 575
 10 | AFDAKS RIVIER 575
(10 rows)

This function is documented in the Postgresql docs here.

One more thing - if you would like to use the above technique to bring a view into QGIS which does not otherwise meet QGIS' requirement that the view have a column derived from a primary key or a column with a unique constraint on it, you can cast the row number to int4 like this:

create view vw_farms
as
select int4(ROW_NUMBER() over (order by parcel.farm_name)) as id,
parcel.farm_name,
parcel.geometry
from parcel
where parcel.farm_name != '' limit 10 ;

The last step then is to explicitly tell QGIS to use this column as the primary key. Do this in the PostgGIS add layer dialog, by selecting the view and then selecting the id field from the combo box available in the primary key column as illustrated in the image below.

Specify the primary key column manually

Listing the number of records in all postgresql tables

I love bash and the gnu tools provided on a Linux system. I am always writing little oneliners that help me do my work. Today I was generating some fixtures for django (yes I am finally properly learning to use unit testing in django). I wanted to know how many records where in each table in my database. Here is the little one liner I came up with:

for TABLE in $(psql foo-test -c "\dt" | awk '{print $3}'); do \
psql foo-test -c "select '$TABLE', count(*) from $TABLE;" | grep -A1 "\-\-\-\-" | tail -1; done

It produces output that looks something like this:

auth_group |     0
auth_group_permissions |     0
auth_message |     0
auth_permission |   273
auth_user |   366
auth_user_groups |     0
auth_user_user_permissions |     0
etc.

Running Posgresql 8.4 and 9.0 side by side on ubuntu

Ok so I'm feeling a bit left behind - everyone else seems to have jumped over to 9.0 and I have resolutely been sticking to the stable version shipped with ubuntu or debian (depending on the machine I am working on). However a client has a bunch of gis data in pg 9.0 / postgis 1.5 and I needed to be able to replicate their environment. In this article I will describe how I got Postgresql 8.4 running side by side with Postgresql 9.0, both with Postgis installed.

I will assume you already have postgresql 8.4 and postgis installed via Ubuntu apt...

First install postgres 9.0 from ppa

Add this to your /etc/apt/sources.list (as root):

# ppa for postgis 9.x
deb http://ppa.launchpad.net/pitti/postgresql/ubuntu natty main
deb-src http://ppa.launchpad.net/pitti/postgresql/ubuntu natty main

Then do:

sudo apt-get update

Note: Although pg9.1 beta packages are available, postgis 1.5  wont work with it so use 9.0

sudo apt-get install postgresql-contrib-9.0 postgresql-client-9.0 \
postgresql-server-dev-9.0 postgresql-9.0

You should check that you have the correct pg_config in your path

pg_config --version

If it is not 9.0, link it:

cd /usr/bin
sudo mv pg_config pg_config.old
sudo ln -s /usr/lib/postgresql/9.0/bin/pg_config .

Download and install postgis

There are no packages available for postgresql 9 so we need to build from source.

cd ~/dev/cpp
wget -c http://postgis.refractions.net/download/postgis-1.5.2.tar.gz
tar xfz postgis-1.5.2.tar.gz
cd postgis-1.5.2
./configure

You should get something like this (note building against postgresql 9)

PostGIS is now configured for x86_64-unknown-linux-gnu
 -------------- Compiler Info -------------
 C compiler:           gcc -g -O2
C++ compiler:         g++ -g -O2
 -------------- Dependencies --------------
GEOS config:          /usr/bin/geos-config
GEOS version:         3.2.0
PostgreSQL config:    /usr/bin/pg_config
PostgreSQL version:   PostgreSQL 9.0.4
PROJ4 version:        47
Libxml2 config:       /usr/bin/xml2-config
Libxml2 version:      2.7.8
PostGIS debug level:  0
 -------- Documentation Generation --------
xsltproc:             /usr/bin/xsltproc
xsl style sheets:
dblatex:
convert:              /usr/bin/convert

Now build and install

make
sudo make install

Using 8.4 and 9.0 side by side

Apt will install 9.0 side by side with 8.x without issue - it just sets it to run on a different port (5433 as opposed to the default 5432). This means you can comfortably use both (at the expense of some disk and processor load on your system). In order to make sure you are using the correct database, always add a -p 5433 when you wish to connect to 9.0. Your 8.4 should be running unchanged. The 9.0 install will be 'blank'. to begin with (no user created tables). I always make myself a superuser on the database and create and account that matches my unix login account so that I can use ident authentication for my day to day work:

sudo su - postgres
createuser -p 5433 -d -E -i -l -P -r -s timlinux

Note the -p option to specify that I am connecting to postgresql 9.0! After you have entered your password twice type ```exit``` to exit the postgres user account.

Install postgis tables and functions into template1

You can create a new template, install postgis functions into each database that needs them separately, or (as I prefer), install them to template1 so that every database you creates will automatically be spatially enabled. I do this because I very seldom need a non-spatial database, and I like to have postgis there as default.

createlang -p 5433 plpgsql template1
psql -p 5433 template1 -f \
/usr/share/postgresql/9.0/contrib/postgis-1.5/postgis.sql
psql -p 5433 template1 -f \
/usr/share/postgresql/9.0/contrib/postgis-1.5/spatial_ref_sys.sql

Validating that postgis is installed

To do this we will connect to the template1 database and run a simple query:

psql -p 5433 template1
select postgis_lib_version();
\q

Should produce output like this:

WARNING: psql version 8.4, server version 9.0.
Some psql features might not work.Type "help" for help.
template1=# select postgis_lib_version();
postgis_lib_version
--------------------- 1.5.2(1 row)

Creating a new database

The last thing to do is to create a new database. As you may have noticed above, the 8.4 command line tools are still default as they have PATH preference, so when working explicitly with pg 9.0, its good to do:

export PATH=/usr/lib/postgresql/9.0/bin:$PATH

in your shell to put postgresql first in your path. A quick test will confirm that it has worked:

createdb --version
createdb (PostgreSQL) 9.0.4

With that setup nicely, lets create our new database:

createdb -p 5433 foo
psql -p 5433 -l
You can verify the database was created correctly by using psql -l (once again also explicitly setting the port no.).
Thats all there is to it! Now you can connect to the database from QGIS (just make sure to use the correct port number!), load spatial data into it using shp2pgsl and so on. I for one am looking forward to trying out some of  the new goodies in Postgresql 9.0.

Update: There are packages available for postgis (see comments below). Also, for convenience you can set your pg port to 5433 for your session by doing:

export PGPORT=5433

Fun with GeoNames

The GeoNames dataset provides a list of global placenames, their location and some additional information such as population and so on. It is published under the Creative Commons Attribution 3.0 License, which allows you to use the data for your own purposes. You can download the data by country, or the entire global dataset. In this article, I will walk you though how I downloaded the entire dataset, loaded it into PostgreSQL and added a geometry column so that I could view it in QGIS. Note that you can substitute these instructions for a specific country's data easily.

First up, lets get the data from the geonames download page!

wget -c http://download.geonames.org/export/dump/allCountries.zip

Note the download is around 196mb so if you live in an internet backwater like I do, expect it to take a little while. If the download gets disconnected, just rerun the same command again - the '-c' option tells wget to continue where it left off last time.

Once the data is downloaded, unzip it:

unzip allCountries.zip

You should now have a text file called allCountries.txt weighing in at just under 900mb. Next we can load it into PostgreSQL using a variation of this article. I highly recommend the use of schemas to partition your database into logical units. In the code listings that follow, it is assumed you have a schema called 'world'. If you need to create it, simply do:

create schema world;

From the psql prompt. Since I am only interested in the geoname table at the moment I simply do this in my database.

create table world.geoname (
         geonameid       int,
         name            varchar(200),
         asciiname        varchar(200),
         alternatenames  varchar(8000),
         latitude        float,
         longitude       float,
         fclass  char(1),
         fcode   varchar(10),
         country varchar(2),
         cc2 varchar(60),
         admin1  varchar(20),
         admin2  varchar(80),
         admin3  varchar(20),
         admin4  varchar(20),
         population      bigint,
         elevation       int,
         gtopo30         int,
         timezone varchar(40),
         moddate         date
 );

You will notice that I extended the alternatenames field size from the original tutorial's 4000 characters to 8000 characters in order to accommodate some longer entries that were causing my imports to fail with 4000 chars.

Next we can import the data (also from the psql prompt):

copy world.geoname (geonameid,name,asciiname,alternatenames,
latitude,longitude,fclass,fcode,country,cc2,
admin1,admin2,admin3,admin4,population,elevation,gtopo30,
timezone,moddate) from '/home/web/allCountries.txt' null as '';

Once again this is similar to the import line used by the original article I used, except I have used a full path to my allCountries.txt file. The import may take a little while depending on the speed of your computer.

When it is completed, you should have a bunch of data (~7.5 million records) in your table:

gis=# select count(*) from world.geoname ;
  count
---------
 7664712
(1 row)

It is going to be useful to have a unique id for every record - QGIS for one would like to have it, so lets add one (in addition to the geonameid field):

alter table world.geoname add column id serial not null;
CREATE UNIQUE INDEX idx_geoname_id ON world.geoname (id);

Because I know I will be using some other fields as basis for subset queries etc., I added some indexes on those too:

CREATE INDEX idx_geoname_population ON world.geoname (population);
CREATE INDEX idx_geoname_fcode ON world.geoname(fcode);

Ok thats all great, but there is no geometry column yet so we can't view this in QGIS easily. So lets GIS enable the table! First we add a new geometry column:

alter table world.geoname add column the_geom geometry;

Now we populate the geometry column:

update world.geoname set the_geom = st_makepoint(longitude,latitude);

Next we add a constraint to ensure that the column always contains a point record or is null

alter table world.geoname add constraint enforce_geotype_the_geom
CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL);

Finally lets index the table on the geometry field:

CREATE INDEX idx_geoname_the_geom ON world.geoname USING gist(the_geom);

Ok next we can connect to the database using QGIS and view our data! Note that you may want to filter the data or set up scale dependent visibility so that QGIS doesn't try to render all 7.5 million points when you zoom out.

I added a query filter like this to the layer properties -> general tab -> subset:

"population" >= 10000 AND fcode != 'PPLA' and fcode != 'PCLI' AND fcode != 'ADM1'

You should experiment and read the geonames documentation in order to define a filter that is useful for your purposes.

Geonames layer loaded in QGIS

The Geonames dataset is a wonderful asset to the GIS community, particularly to those of us who value Free Software and Free Data.

Update 6 Aprl 2011:

I encountered one issue with the above procedure: the SRID for the imported points is -1 when loaded instead of 4326. This cause problems e.g. in mapserver you get an error like this:

ERROR:  Operation on two geometries with different SRIDs

To fix the issue I ran the following update query to assign all points a proper SRID:

update world.geoname set the_geom = ST_SETSRID(the_geom,4326);

Note that it takes quite a while to run.  When it completes, you can verify that the points are all in the correct CRS by running this little query:

gis=# select distinct ST_SRID(the_geom) from world.geoname;

Which should produce something like this:

 st_srid
---------
 4326
(1 row)

For good measure, I also added the geoname table to my geometry columns table:

insert into geometry_columns (f_table_catalog,f_table_schema,f_table_name,f_geometry_column,
coord_dimension,srid,type) values ('','world','geoname','the_geom',2,4326,'POINT');

Lastly I also gave select permissions to my readonly user (which I use when publishing in a web environment):

grant usage on schema world to readonly;
grant select on world to readonly;

I have also fixed the formatting of some of the SQL snippets above for those who struggled to read it within the width of this page.

A week in Tanzania

I spent most of last week in Dar es Salaam, Tanzania. A lovely tropical country in the heart of Africa. I was there as part of a project I am working to create tools for Biodiversity Informatics practitioners. Of course the tools are based on Free Software:Quantum GIS and openModeller.

The attendees at the workshop were entertained by my talk about what FOSS is and why it is important, an introduction to QGIS slideshow (superbly presented by Marco Hugentobler), and ending with a tour of openModellerDesktop. We also did some live demonstrations of QGIS and openModeller, before going on to discuss details about how these tools can be used to support their Biodiversity Informatics workflows.

The meeting was funded by the Global Biodiversity Information Facility (GBIF) with Juan Bello as their representitive, and hosted by the Tanzanian Commission for Science and Technology (COSTECH).

In case you are unfamiliar with the aims of GBIF, they are facilitating the digitisation (or digitization for our american readers) of the worlds biodiversity records - herbarium records, museum collections and so on. COSTECH provides the local infrastructure and staff for the 'TanBif' node in Tanzania.

The meeting also included 'in-country' experts in the fields of GIS, Meteorology, Ecology, IT and so on. I think for all of the attendees, the concept of FOSS was a real eye-opener. African economies can't compare with those in Europe and the USA and the capital outlay for proprietary software that presents an irritation in the Western world is a major burden in the third world. So just knowing that they could dive in and use QGIS was a great revelation.

We finished our workshop a little early on the Friday so Marco and I offered to go along to the COSTECH offices and geo-enable their PostgreSQL species occurrence database and install QGIS on their desktop PC's running Windows XP. In the space of a couple of hours we were done - the major part of which was spent showing the TanBif staff members how to bring up the PostGIS layer in QGIS, perform simple queries and make maps. Having spent days in the past trying to get proprietary software like Oracle and Arc*** configured, optimised, licensed and generally usable, I was struck by just how easy and quick it is to get someone up and running with a robust enterprise ready PostGIS geospatial datastore and a user friendly Free Software desktop GIS like QGIS.

Thanks to the friendly Tanzanian folks for their hospitality - I look forward to my next visit! Here are some piccies from the trip...

image0

Juan Bello telling us about the cool things you can do with a good Biodiversity Information repository.

image1

The workshop attendees (Marco and Juan out of shot)

image2

Marco showing Godfrey how to use QGIS to bring up their PostGIS Biodiversity dataset.

image3

Godfrey proudly showing off his first map (made with QGIS)!

image4

Marco killing a mosquito - he became something of an expert!

Overpainting with Mapnik

The problem

I've been having a little poke around with Mapnik today (awesome software!). One of the things on my todo list has been to sort out rendering issues with roads we have been having. Our last iteration described roads something like this:

A style...

<Style name="Freeway30th_style">
    <Rule>
        <LineSymbolizer>
            <CssParameter name="stroke">rgb(169,170,153)</CssParameter>
            <CssParameter name="stroke-width">12.26</CssParameter>
            <CssParameter name="stroke-linejoin">bevel</CssParameter>
            <CssParameter name="stroke-linecap">round</CssParameter>
            <CssParameter name="stroke-opacity">1</CssParameter>
        </LineSymbolizer>
        <LineSymbolizer>
            <CssParameter name="stroke">rgb(255,172,88)</CssParameter>
            <CssParameter name="stroke-width">12.16</CssParameter>
            <CssParameter name="stroke-linejoin">miter</CssParameter>
            <CssParameter name="stroke-linecap">round</CssParameter>
        </LineSymbolizer>
    </Rule>
</Style>

...and this layer definition...

<Layer name="Freeway30th" srs="+init=epsg:&srid;" maxzoom="39105.90277777778">
  <StyleName>Freeway30th_style</StyleName>
  <Datasource>
      <Parameter name="dbname">&dbname;</Parameter>
      <Parameter name="estimate_extent">0</Parameter>
      <Parameter name="extent">&extent;</Parameter>
      <Parameter name="geometry_field">&geometry_field;</Parameter>
      <Parameter name="host">&host;</Parameter>
      <Parameter name="password">&password;</Parameter>
      <Parameter name="port">&port;</Parameter>
      <Parameter name="srid">&srid;</Parameter>
      <Parameter name="table">(SELECT * FROM "l_roads" WHERE "type" = \
      'Freeway' ORDER BY LENGTH(&geometry_field;) DESC) as "l_roads"</Parameter>
      <Parameter name="type">&datasourcetype;</Parameter>
      <Parameter name="user">&password;</Parameter>
  </Datasource>
</Layer>

With the idea being to render freeways with a gray outline and orange center. Unfortunately, it doesnt produce good results:

image0

The problem being those little line ends you see making gray splodges at the end of each segment.

The solution

Michael Migurski's blog discusses this issue a little in this article but doesnt directly explain how to achieve the desired effect. So here is what you do:

First the styles are split into two...

<Style name="Freeway30th_style-bottom">
   <Rule>
       <LineSymbolizer>
           <CssParameter name="stroke">rgb(169,170,153)</CssParameter>
           <CssParameter name="stroke-width">12.26</CssParameter>
           <CssParameter name="stroke-linejoin">bevel</CssParameter>
           <CssParameter name="stroke-linecap">round</CssParameter>
           <CssParameter name="stroke-opacity">1</CssParameter>
       </LineSymbolizer>
   </Rule>
 </Style>
 <Style name="Freeway30th_style-top">
   <Rule>
     <LineSymbolizer>
       <CssParameter name="stroke">rgb(255,172,88)</CssParameter>
       <CssParameter name="stroke-width">12.16</CssParameter>
       <CssParameter name="stroke-linejoin">miter</CssParameter>
       <CssParameter name="stroke-linecap">round</CssParameter>
     </LineSymbolizer>
   </Rule>
 </Style>

and then the layer is now rendered as two layers, the bottom layer first, then the top:

<Layer name="Freeway30th-bottom" srs="+init=epsg:&srid;" maxzoom="39105.90277777778">
    <StyleName>Freeway30th_style-bottom</StyleName>
    <Datasource>
        <Parameter name="dbname">&dbname;</Parameter>
        <Parameter name="estimate_extent">0</Parameter>
        <Parameter name="extent">&extent;</Parameter>
        <Parameter name="geometry_field">&geometry_field;</Parameter>
        <Parameter name="host">&host;</Parameter>
        <Parameter name="password">&password;</Parameter>
        <Parameter name="port">&port;</Parameter>
        <Parameter name="srid">&srid;</Parameter>
        <Parameter name="table">(SELECT * FROM "l_roads" WHERE "type" = \
        'Freeway' ORDER BY LENGTH(&geometry_field;) DESC) as "l_roads"</Parameter>
        <Parameter name="type">&datasourcetype;</Parameter>
        <Parameter name="user">&password;</Parameter>
    </Datasource>
</Layer>
<Layer name="Freeway30th-top" srs="+init=epsg:&srid;" maxzoom="39105.90277777778">
    <StyleName>Freeway30th_style-top</StyleName>
    <Datasource>
        <Parameter name="dbname">&dbname;</Parameter>
        <Parameter name="estimate_extent">0</Parameter>
        <Parameter name="extent">&extent;</Parameter>
        <Parameter name="geometry_field">&geometry_field;</Parameter>
        <Parameter name="host">&host;</Parameter>
        <Parameter name="password">&password;</Parameter>
        <Parameter name="port">&port;</Parameter>
        <Parameter name="srid">&srid;</Parameter>
        <Parameter name="table">(SELECT * FROM "l_roads" WHERE "type" = \
        'Freeway' ORDER BY LENGTH(&geometry_field;) DESC) as "l_roads"</Parameter>
        <Parameter name="type">&datasourcetype;</Parameter>
        <Parameter name="user">&password;</Parameter>
     </Datasource>
</Layer>

image1

A much cleaner rendering!

Note

This approach consumes more cpu time and hits your database harder than the 'messier' approach shown first.

Also you can see in the example above, I have adopted Michaels approach of rendering long lines first.

Have fun with your mapnik maps!

Introduction to PostGIS

Horst and I are spending the week up in Johannesburg at the Satellite Applications Center in Hartebeeshoek. We are doing yet another week long training course (I hope I'm not working the poor guy too hard :-P ). This time we are doing:

- Two days QGIS (with a little GRASS)
- One day PostGIS
- Two days geospatial programming with Bash, Python and QGIS

Tomorrow we start with the PostGIS component. Horst and I have been compiling some course notes for the PostGIS module which we are making available to the world as per usual. The pdf still has some rendering issues - we are aware of that. The document tries to walk the reader through the basics of using SQL and then some basic activities with PostGIS and working with geometries.

I hope some of you out there find it useful - let us know if you do! Also if you have any improvements to make, we'd love to hear from you.

Here is a quick pic or two from the course:

image0
image1
image2
image3

Automated backup of multiple PostgreSQL/PostGIS databases

I have a cron job that backs up my PG databases and emails them to me every night. Today I wanted to upgrade my PostgreSQL 8.3 databases to PG 8.4 so I made a few modifications to my script so that I could dump out my PG data, and then restore it under PG 8.4. In case you are wondering I am doing this because Ubuntu Lucid now ships with PG 8.4 (yay!). I also made the script generate another script to restore the databases. So basically the procedure is to run the script below on your 8.3 cluster, shut that down and bring up your 8.4 cluster, and then restore your databases into that. Here follows my little script:

MYDATE=`date +%d-%B-%Y`
MYBACKUPDIR=/home/timlinux/sql_backups
MYRECIPIENT=tim@linfiniti.com

DBLIST=`psql -l \
  | awk '{print $1}' | grep -v "+" | grep -v "Name" | \
  grep -v "List" | grep -v "(" | grep -v "template" | \
  grep -v "postgres"`
for DB in ${DBLIST}
do
  echo "Backing up $DB"
  FILENAME=${MYBACKUPDIR}/PG_${DB}.${MYDATE}.sql.tar.gz
  pg_dump -f ${FILENAME} -x -O -F tar ${DB}
  #If you want to email the database uncomment
  #below (will cause issues if backups are large)
  #mpack -s "Daily $DB PG backup for ${MYDATE}" $FILENAME $MYRECIPIENT
done

echo "Procedure to restore one of your backups:"
echo "createdb "
echo "pg_restore -F t .sql.tar.gz |psql "
echo "Or to restore in a batch make a script like this:"
echo "for FILE in /home/timlinux/sql_backups/*; do DB=\$(echo $FILE | \"
echo "  sed 's\/home\/timlinux\/sql_backups\/PG_//g' | sed 's/.${MYDATE}.sql.tar.gz//g'); "
echo " 'Restoring: \$DB'; createdb \$DB; pg_restore -F t \$FILE |psql \$DB; done"

Update 02 May 2010 Uncommented pg_dump line which was inadvertantly commented in my original post.

Dissolving features by an attribute

Hi, I'm Sam. I've been learning a lot here at Linfiniti (thanks to the brilliant team!) Just like to add a quick note on one of the tasks I learnt this week.

I was working on a shapefile of the suburbs in Cape Town. A client required the suburbs to be grouped by region. After the tedious part of manually grouping the suburbs (using a created field, REGION), the unioning (dissolving) of the suburbs proved to be quick and painless through PostgreSQL, using this SQL command that implements the geomunion function:

create table ct_regions as select geomunion(the_geom), "REGION" \
 from "ct_suburbs" group by "REGION";

“ct_suburbs” is the original shapefile that was loaded into the PostGIS database using the Quantum GIS 'SPIT' plugin. “REGION” is the class (attribute) that I wish to union by.  And ct_regions will be the output shapefile. See the result here:

Before dissolving (ct_suburbs shows suburbs)

Cape Town suburbs before dissolving

After dissolving by suburbs (ct_regions shows collections of suburbs that have been merged)

Cape Town suburbs after dissolving

Hopefully this will be of some use when it comes to your own mapping!

Getting up and running with PostGIS in a jiffy

I've posted this before on my old blog so this is a repeat for those looking to get going with PostGIS in a hurry. This procedure should work on Ubuntu Jaunty or Ubuntu Karmic and possibly earlier versions.

sudo apt-get install postgresql-8.3-postgis
sudo su - postgres
createuser -s -d -r -l -P -E timlinux
createdb gis
createlang plpgsql gis
psql gis < /usr/share/postgresql-8.3-postgis/lwpostgis.sql
psql gis < /usr/share/postgresql-8.3-postgis/spatial_ref_sys.sql

Having done that, you should be able to connect to the database by creating a new postgis connection in QGIS. After that you can start to load data into your spatial database using SPIT (QGIS plugin to import shapefiles into QGIS) or the shp2psql command line tool.

Visual PostgreSQL Query Builder coming in Karmic

I usually upgrade to the next-gen ubuntu release a month or two before it comes out so I can test QGIS and other software on the new platform. Today I was looking for a visual query designer for PostgreSQL because I wanted to make some complex queries with multiple joins. I'm ok for dealing with a few simple joins when hand writing my SQL but when things get complex, having a visual query builder is a great time saver.

I tried a whole bunch of options today:

- pgdesigner (linux) crashed all the time and was unusable
- SQL Maestro (win) a very nice looking app but commercial and runsonly under windows.
- Navicat (linux via wine) looked reasonable at first but had some severe rendering issues preventing me from being able to see therelation lines between entities. It also seemed to have trouble actually running the SQL it generated. Also it is commercial software and I prefer FOSS.
- OO Base (linux) with the sjdbc driver. I was able to test connect, but try as I might I couldnt get it to save my connection and let me enter the visual design mode.

Finally I took a re-look at PGAdmin III. I have used it on and off over the years but usually settle back to using psql from the command line. Imagine my surprise to see that in version 1.10 that ships with Karmic, it now includes a visual query builder.

Visual Query Builder in PGAdmin III

This is a real notch in the belt for postgres as we are now able to provide a simple solution for novice users to build their own queries visually!

Clipping data from Postgis

So you have moved your spatial data into PostGIS and everything is great....until you start wanting to clip it out again. Luckily clipping data within the postgres environment is trivial with the little help of SQL. However I wanted something a little more fancy - a way to iterate through all my spatial tables and clip out an arbitary polygon, saving the result to a shapefile. To achieve this I wrote a little bash script that does just that. The script will also optionally make sql dumps of the data and clean up after itself when it is done.

The SUFFIX option is used to append to the newly created shapefile / table / sqldump file.
The CLIPTABLE is the table containing polygons that should be used to intersect each table.
The CLIPFIELD is a field used to subset the cliptable e.g. to select a single polygon only.
The CLIPGEOMFIELD is the name of the geometry field in the CLIPTABLE.
The CLIPFIELDVALUE is the clipfield 'where' clause e.g. "Eastern Cape"
The DB is the database containing tables to be intersected.
If you set DELETETABLE to 0, it will create clip tables in the database with the suffix you specify.
If you set DELETEGEOM to 1 it will delete the original geometry, leaving only the intersected geometry.
If you set SHPDUMP to 1 it iwll create a shapefile for each intersected table.
If you set SQLDUMP to 1 it will make a sql dump file of each intersected table.

Note that for large dataset, clipping with an arbitary polygon can take a long time! Here is the script:

#!/bin/bash
SUFFIX="ecape"
CLIPTABLE="ecape"
CLIPFIELD="provname"
CLIPGEOMFIELD="the_geom"
CLIPFIELDVALUE="Eastern Cape"
DB="cdsm50k"
DELETETABLE=1
DELETEGEOM=1
SHPDUMP=1
SQLDUMP=0
pushd .
cd /tmp

for TABLE in `echo "\dt" | psql $DB \
    | awk '{print $3}' | grep "^[a-zA-Z]"`

do
  GEOMFIELD=`echo "\d $TABLE" | psql $DB | grep "geometry" \
  | grep -v "geometrytype" |awk '{print $1}'`
  if [ "$GEOMFIELD" == "" ]
  then
    echo "$TABLE has no geometry column, skipping"
  else
    echo "$TABLE -> $GEOMFIELD"
    echo "drop table \"${TABLE}_${SUFFIX}\";" | psql $DB
    # Note we use the && bounding box query first and
    # then the intersects query to avoid unneeded comparison of
    # complex geometries.
    SQL="CREATE TABLE \"${TABLE}_${SUFFIX}\" AS \
            SELECT \
            ST_Intersection(v.$GEOMFIELD, m.$CLIPGEOMFIELD) AS intersection_geom, \
            v.*, \
            m.$CLIPFIELD \
            FROM \
              \"$TABLE\" v, \
              $CLIPTABLE m \
            WHERE \
              ST_Intersects(v.$GEOMFIELD, m.$CLIPGEOMFIELD) AND \
              $CLIPFIELD='$CLIPFIELDVALUE';"

    echo $SQL  | psql $DB

    if [ $DELETEGEOM -eq 1 ]
    then
      echo "alter table \"${TABLE}_${SUFFIX}\" drop column $GEOMFIELD;" | psql $DB
    fi

    if [ $SHPDUMP -eq 1 ]
    then
      pgsql2shp -f ${TABLE}.shp -g "intersection_geom" $DB ${TABLE}_${SUFFIX}
    fi

    if [ $SQLDUMP -eq 1 ]
    then
      pg_dump -D $DB -t ${TABLE}_${SUFFIX} > ${TABLE}_${SUFFIX}.sql
    fi

    if [ $DELETETABLE -eq 1 ]
    then
      echo "drop table ${TABLE}_${SUFFIX};" | psql $DB
    fi
  fi
done
echo "vacuum analyze;" | psql $DB
popd

EAV Database Modelling

Sometimes just because a thing is possible doesn't make it a good idea. In the last week I have been helping a client whose web developer has disappeared. The developer implemented the database using a variation on the Entity Attribute Value (EAV) modelling paradigm. The database consists of only four tables: Entities, Attributes, Values, EntityTypes. So the entities in the database don't match any real-world entities - there is no products table for products, no categories table for categories and so on. This generic modelling approach might sound good but in practice it's a real pain. Look at this query diagram representing a simple query to find out what categories exist in the database:

EAVDBModel

The query consists of numerous joins and is completely opaque. The EAV wikipedia article I referenced above contains a list of downsides to the EAV approach, with which I can only concur. For a new developer approach an EAV implementation its an uphill battle to understand what is going on. Writing queries is a cumbersome task, and the logic such as foreign key constraints that would normally reside in the database layer now needs to be implemented in the application codebase.

For me EAV is something I will be avoiding in any new developments I embark on...

Automatically dumping all Postgres tables into their own SQL files

Someone asked on twitter it is possible to dump all the tables in Postgres to individual shp files. Some time ago I wrote a script to dump all tables as SQL dumps. The question prompted me to tweak that script to drop out shapefiles instead.

My original script looked like the listing below. The dump files contain data only (see the comments in the bash script below) because I use this script to create fixtures for my django projects.

#!/bin/bash

# A script to create sql formatted fixtures (serialised models)
# used to initialise the application if you install it to another
# machine. You should run this any time you change your models
# or when you need to make a backup of all your data.

# Tim Sutton 2009
mkdir bees/sql
for TABLE in `echo "\d" | psql sabio | grep -v seq | awk '{print $3}'`
do
  echo $TABLE
  # -a data only
  # -t table
  # -D dump as sql inserts
  pg_dump -a -t $TABLE -D sabio > bees/sql/${TABLE}.sql
  #bzip2 bees/sql/${TABLE}.sql
done

To make the script drop out shapefiles I modified it a bit as shown in the next listing. Obviously as we are dumping shapefiles, we should only bother dumping tables with geometry in them so I went the route of using the geometry_columns table to decide which tables to dump...

#!/bin/bash

# A script to dump shapefiles of all tables listed in geometry_columns
# Tim Sutton 2009
mkdir bees/sql
for TABLE in `echo "select f_table_name from geometry_columns;" | psql sabio \
  | head -n -2 | egrep -v "\-\-\-\-\-\-\-\-\-" | egrep -v "f_table_name"`
do
  echo $TABLE
  pgsql2shp sabio $TABLE
done

Hope this is useful to someone out there :-)

Handy tip for the day : Backing up PostgreSQL data to a remote machine

Ok so I have a few production databases that I need to back up regularly. The trick is I want to run the backup from a remote machine so that the backup lives on a separate server to the actual database system. You can run backups manually like this (assuming your database is called 'postgis'):

pg_dump -h dbhost -f postgis_`date +%d%B%Y`.sql.tar.gz -x -O -F tar postgis

When you run the above command, you will be prompted for a password. After entering the password you will find a date stamped backup. Very nice, but you may have noted that pg_dump has no option for giving it the password on the command line - it expects you to do that interactively. So what do we do if we need to automate the packup using a cron job? The solution is to use either ~/.pgpass or the PGPASSWORD environment variable. So here is how I automated the backup by placing a script in /etc/cron.daily/

export PGPASSWORD=secret
pg_dump -h dbhost -f postgis_`date +%d%B%Y`.sql.tar.gz -x -O -F tar postgis

PostGIS 1.4 Released

Good news if you are a PostGIS fan - the PostGIS team just released version 1.4! See the release announcement for full details...

  • Page 1 of 1 ( 19 posts )
  • postgres

Back to Top

Sponsors