GIS Enabled Database Part 1 – MySQL

April 4th, 2012

The first step in putting together an open source suite of GIS software is to install some sort of storage medium for your spatial data.  There are many options out there, ranging from the ESRI personal geodatabase, which uses a Microsoft Access database, to a fully relational database management system like PostGreSQL with PostGIS.  Each post in this series will look at one of these database types.  I will discuss basic installation, and then customization like changing the location of the database files, and installation of spatial functionality.

The first database in the series is MySQL.  I am starting with this database because it is arguably the most commonly installed database on most web hosting services.  It is an integral part of the LAMP (Linux, Apache, MySQL, Python/PHP) open source web-hosting application stack.  As with any database, there are pros and cons, some of which I will mention at the end of the post.

Now, on to the installation. This will be focused mostly on a Windows 7 installation, though some options will be applicable to other platforms.

  1. Download the appropriate installation package:  MySQL Downloads
    , The free version is the Community Server, and 5.5 is the current version.
  2. Start the installer program.
  3. Choose the installation type: Typical, Custom, or Complete. I usually choose the custom or complete in the cases of GIS and professional software because I do not want to have to go back and find some part of the software that is left off in the typical install, that I end up needing later. Besides, though it takes more space, wherever you install the database, whether a server or personal computer, needs to have adequate storage space anyway. In this case, I am choosing the custom install because I want to modify the standard database locations.
  4. Specify the path for database storage.

    Set Custom Data Path

  5. That completes the setup options. Now, start the installation.
  6. When the installation is finished, make sure that the box is checked to run the MySQL Instance Configuration Wizard. This will set parameters to tune the database depending on your use case.

    MySQL Finish Install - Run Instance Configuration

  7. Continue to configure the MySQL Server Instance.
  8. Set the configuration type. I chose the “Detailed” configuration as I want to specify at least one parameter.
  9. Set the server type: Developer Machine; Server Machine; or Dedicated MySQL Server Machine. I chose the Developer Machine in this case as the install is on a home computer with many other applications. If you are installing this on a server, or on a dedicated machine, then obviously, the other options would be applicable. The option chosen here affects settings related to cpu usage, and memory allocation.
  10. Specify the database usage: Multifunctional database; Transactional Database only; Non-Transactional database only. I chose Multifunctional database as this takes into account all different types of transactions you may have on the database. This setting to relates to the Database engine that is set to primary for the database. The two engines are InnoDB and MyISAM. Prior to version 5.5, the MyISAM engine was standard, and this was the only one that supported the spatial data types. This engine had no failure protection however, and thus would not ensure data integrity. The InnoDB engine supports ACID transactions, and has now been made the standard engine. As well, it now supports all the spatial data types. This is testing the limits of my knowledge, but as I understand it, the InnoDB engine works more like the PostgreSQL database engine.

    MySQL Database Usage

  11. Configure the InnoDB Tablespace location. These are indexing files created by the InnoDB engine. They should be located in the same directory as the databases.

    Configure InnoDB Table Locations

  12. The last configuration screen covers whether to set up MySQL as a Windows Service, and whether to start it automatically. It also gives you the option to automatically add the “bin” path to the Windows Path environment variable. This will allow you to run MySQL directly from the command line.

    Configure MS Windows specific options

    That completes the installation. Unlike some of the other database packages out there, no configuration is required to enable to spatial data types. Many or most of the options that were set through the wizards above, may also be specified in the “my.ini” file, located in the MySQL installation directory. A couple of the easiest ones are shown below, in case you were to run the typical or complete installs, which don’t allow for configuration. These are the options for setting the directory for database storage, and the directory for the InnoDB tables.

    MySQL ini file - Set database directory

    MySQL ini - SEt InnoDB tables directory

Hopefully this gives you a clear path for installing MySQL on your workstation or even on a basic server. There are obviously security considerations for installing on a server that are out of my expertise. There is a lot of talk on the interwebs regarding the stability and standards-compliance of MySQL compared with Postgresql and other RDBMS. I am not going to dive into that argument except to say that until more web hosting options begin to include Postgresql as an installable database option, then people will be forced to use MySQL to host spatial data that they want to use as part of a web application. I think it is possible the data stability is changing with the switch to the InnoDB engine as standard. Since this engine supports ACID compliance, which provides transactional redundancy and recovery options, it would seem the developers are at least making an attempt to address those concerns.
That is all for now. The next post in this series will cover installation and configuration of PostgreSQL. I will also be putting up some posts covering importing data into the different spatial databases, using a variety of tools.

The Command LIne – What’s the Use?

March 23rd, 2012

The title of this post is in some ways rhetorical because to some people, this is their primary means of interaction with a computer. With the majority of people, though, I think a safe assumption is that the command line is something they have heard about, but don’t really understand where it is, or even, why they would want to use it.

WIndows 7 Command Line

Command Line

To gain a better understanding of why the command line can be useful, it might be helpful to look at what software does, and how that is shown to you on the monitor. With modern software, programs use a graphical user interface with buttons and dialog boxes that allow you to enter, or choose, the appropriate parameters for a given task. Back in the days of DOS, AutoCAD and command line Arc-INFO, this was not the cause, mainly because the software didn’t have the graphical display capabilities of today. What this meant at the time, was that a person had to be able to view every parameter or command that was available, through the command line. Users also had to be able to enter commands and specify these parameters. This of course, created some standardization of format for command line entry. With the advent of the GUI, there is now a dialog box for each command. That dialog had to be designed to incorporate all of the possible parameters for each command. This need to expose all the parameters is, I think, where the strength of the command line rises. Many of these commands have so many possible parameters, and options, that it is not possible to fit them all into a single dialog. Thus what happens, is that people end up using default values where they may actually have had a different choice, they were just not aware. Had they been using a command line, all the options available to that parameter would have been exposed to them.

SQLite Help - Command Line

Command LIne - SQLite Help

Another place where the command line shines has to do with clarity. In order to save space, the options in a dialog box frequently have brief or no descriptions of what is required. The descriptions in the command line window can be more complete because there is not the limitation of space. Even if you can’t see the descriptions all at once, you can still scroll the window back up to review as necessary.
The final place where the command line shines is speed. This relates to both speed of entry, and the speed of seeing results. All of the entry in the command line is done with the keyboard. Many command line implementations have many shortcuts built in to allow you to recall commands, and do quick entry using the arrow keys or certain keystrokes. This makes it very easy to enter a large amount of information very quickly. Also, in some cases, you can write out a series of things in a text editor and copy and paste to the command line. The speed advantage arises over the dialog because you never have to move your hands from the keyboard. When you are entering data through a dialog box, there is much more movement required. You have to move the mouse around to the correct entry point. Then, you may have to use the keyboard to type something in and repeat this cycle many times. Each time you switch entry methods, it takes time and breaks continuity.

Command LIne vs GUI

SQLite Data Entry - Command Line vs GUI

The picture above is a very good example of the previous paragraph. It shows data entry into an SQLite database, using both the command line and a graphical user interface for this database type. On the left, you see that there are 3 lines showing new rows being added to the database, with values for each of the fields. On the right, you see the GUI, which contains a window in the back with the rows in the database. The front window is one of the fields with data being added to it. This window will have to be opened to enter data into each field for each row.
So, to recap. Using the command line can save time by giving more detail about possible commands, quicker access to multiple programs, and fast data entry. Many programs have the ability to be run from a command line interface. Finally, one last benefit of the command line: You may come to understand the program you are using more thoroughly because as you enter commands and their parameters, you have to make sure that you enter all the parameters necessary, and have data in the correct form for the command to execute successfully. This gives you the nuts and bolts of the software that the graphical user interface doesn’t always reveal.

QGIS – Vector Data Connection

March 24th, 2011

One of the major differences between many COTS applications and Open-source is in the area of adding layers and data to your project.  While these programs share many of the same technologies on the back end to actually connect with the various data sources, for example, the GDAL/OGR libary, the user-interface is very different.
ESRI products have the place where you connect to a data source, separate from the interface to add data to a project.  For those products, ArcCatalog is the browser where you load in various data layers, and the data connections are set up under a different menu.  Once you have established a particular data connection, then any layers available through that connection are integrated into the ArcCatalog window, or the “Add Data” window in ArcGIS.
The interface that qGIS uses to connect to data sources varies depending on what source you are connecting to.  There is not a single data connection button.  Instead, there are several, one for vector layers, raster layers, PostGIS, SpatiaLite, WMS, and WFS layers.  There are also buttons to create a new ESRI Shapefile layer, remove a layer, and for GPS tools.

QGIS Data Connection Toolbar

This post will focus on “Add Vector Layer” dialog.  Future posts will address more obscure features of some of the other options.

QGIS Add Vector Layer Dialog

Unlike in the ESRI products, the dialog in QGIS where you add a layer is also where you define connections to new data sources and choose layers from those data sources to be included.

We will review each of the major connection types, with samples of connections that worked and didn’t work.  Each connection window gives you the option to test the connection.  I will include a couple of failed connection tests to help interpret the errors so you know what to look for in the future.

Add ESRI Shape

QGIS Add Vector - ESRI Shape

Adding an ESRI Shapefile to QGIS is about as straight-forward as you can get it.  Open up the add vector data dialog, leave it on the default selection which is “file”, then browse for the shapefile.

Add ESRI Personal Geodatabase Layer

Add ESRI Personal Geodatabase Layers

Adding a layer from an ESRI Personal Geodatabase layer is also relatively simple.  Simply open up the add vector layer dialog, leave on the default of “file”, browse to the desired database.  If the database is not showing up in the browse window, change the file type filter to “.mdb”.  Once you select open, a dialog will come up that lists all of the layers present in the PGDB, with a count of the features in each layer.

Select Layer from ESRI PGDB

Select one or more of the layers that you want and click “OK”.  The layers will be loaded.

Add ArcINFO Coverage

Add Vector Layer - ArcINFO Coverage

For an ArcINFO Coverage, select the “Directory” radio button.  Under Source, Type, choose the “ArcINFO Binary Coverage” option.  Then browse and select the coverage name.  This will then open the list of vector layers within the coverage, the point, line and polygon representations of the features.

QGIS Add Vector Layer - ArcINFO

Add MySQL Spatial Layer

Of the data types that I have discussed adding so far, the MySQL process is the most complex.  This is evident by looking at the changes to the dialog box when you select the “database” option:

Add MySQL Vector Layer

The dialog has expanded to show a section for making connections.  There is an existing connection here, named “socalurisa”.  I am going to set up a new one.  First step is select “New” connection.   This brings up the following dialog, which I have already filled a number of values into.

QGIS - New Database Connection

Stepping through these values one by one, we have:

  • Type:  Pulldown options include MySQL, PostGreSQL, ODBC, etc.  I have chosen “MySQL” for this example.  One item to note here is that the option for an ESRI Personal Geodatabase seems to be implemented incorrectly.  This is the second place where you can add a PGDB.  The first is through the “File” selection option discussed above.  That option works simply.  As yet, I have not been able to successfully set up a database connection to the PGDB.
  • Name:  Choose the name for this connection.  Connections are made to individual databases, and may be local or remote, so make sure it is relatively descriptive.  It is easy enough to simply view the connection to review the details, but a descriptive name is always beneficial.
  • Host:  This describes where the database is hosted.  In this case, the MySQL installation is on the local machine, so the name is “localhost”.  If you were loading a database from a remote server, like a website, the host name might look something like this: “somedatabase.db.3111.somedomain.com”.  It can be a challenge to find the correct name of a remote database, but it is usually listed on the administration page of your remote host.
  • Database:  The name of your database.  In this case, “socalurisa”.  From a quick test, it seems that the database name is NOT case sensitive.  I do not, however know if this would raise problems when trying to execute other queries to that database from within QGIS if you do not match cases correctly.  Better to be safe and match the database name exactly.
  • Port:  3306 – Short answer – Computers have virtual ports open through the operating system that different types of data are allowed to pass through.  Some programs use a specific port to pass their data through.  RDBMS systems are embedded into the system to an extent that you actually access them by going out of the computer and creating a connection back in, through this port.  Hence, the reason for a data connection and a host and port name.  This is basically a WAG (Wild Ass Guess) on my part as operating system inner workings are definitely not a specialty of mine.
  • Username:  testuser
  • Password:  usertest

Let’s use the test connection to see if this connection is successful

QGIS - Failed Database Connection Test

There seems to be a problem with this connection.  It looks like I put in the wrong password.  Of course, the only way I know this is that I purposefully did this.  There really isn’t that much in the way of explanation as to why the connection failed, just “Access denied” in the last line.  Let’s fix the password and try the connection again:

QGIS - Successful database connection

This time, the connection was successful because the password was fixed.  This now will bring up a layer selection dialog similar to those seen previously where you may choose which database layers to load.  I have one comment about connecting to databases that is very important for people to consider.  The passwords are NOT encrypted.  If you load in a database layer, and go to the layer properties in the table of contents, the database connection properties are clearly listed.  This seems like a rather Glaring security lapse, but there may be a reason for it.  I will do some research and post a follow up.

This winds up the major vector data types you can load in to QGIS.  There are a couple of database types that I didn’t walk through here, but their connection format is going to be very similar to MySQL.  There seems to be a lack of documentation in the manual about setting up data types.  The only ones discussed there are SpatiaLite and PostGIS.  I suppose most of the others are self explanatory, but a primer never hurts.

Open Source GIS Central – OSGeo.org

March 13th, 2011

Finding open source GIS software is as simple as a Google Search.  Of course, once you do that, sorting through the hundreds of returns to find something useful, is likely to be a challenge.  If you look at the results for “open source GIS”, the first would seem likely.  That site, however, hasn’t been updated since 2008.  The 2nd return, QGIS is a much better bet.  As well, it will provide a link to what I consider the natural starting point for all open-source GIS hopefuls.That site is the Open Source Geospatial Foundation – www.osgeo.org.
Here is what the front page of their site says:

“Created to support and build the highest-quality open source geospatial software. Our goal is to encourage the use and collaborative development of community-led projects.”

Here is a quick run-down of the links and information available on the website.
The left side-bar is all about OSGeo.  There is information about the Foundation, blogs, a store, and an extensive user community.  In addition, there are translation pages for languages around the world.
The right side-bar contains the projects that OSGeo has a direct influence in, or is providing support for.  They are divided into the following sections:  Web Mapping; Desktop Applications; Geospatial Libraries; Metadata Catalog and Other Projects.
The middle section contains news and updates to a number of geospatial focused blogs.
There are a number of projects that are marked with a bullet.  These are projects that are being incubated by OSGeo.  This means that they are being given extra attention and funding to help develop them more quickly.
The programs that I am using that are referenced here include:  Quantum GIS – Desktop Mapping; GDAL/OGR – Geospatial Libraries; and OpenLayers, MapServer and Geoserver – Web Mapping.
There is a great deal of information about each project available through the various links.  I encourage you to check them out and develop your own open source GIS software suite.

Where, but also why and how

February 11th, 2011

This is the first post of the blog, so I suppose I should say what I think it is going to be about.  Initially, I was going to begin writing a series of tutorials for using open source GIS products to mimic the functions of ESRI’s ArcGIS software suite.  The more that I have started using free, open-source software though, the more that I realize that is much to high a level for where my capabilities currently are.  That requires far more expertise and experience than I currently have with the software.  For this reason, I am going to slightly change my plan for the site, though it may turn out similar in the end.

Initially, instead of doing specific tutorials to describe a function from the open source function in terms of the corresponding ArcGIS function, I am going to simply start to document my experiences with installing and starting to use the various software.  The software is all going through pretty significant development, and people are trying to add functionality to them.  Where they seem to be lacking, though, is in the area of documentation.  This is going to be my focus.  Using the software, and breaking it down as I use it, with comments about what works, and what doesn’t.  At the same time, if it fits, or there are similarities, I will provide comparisons to ArcGIS.

There are a number of programs that I have used so far.  Here is a breakdown:

  • Desktop Software:  Quantum GIS is the primary package that I am using.  I also have uDig installed, but have not used it as yet.  I have also used the GUI of the SpatiaLite database referenced below.
  • Database:  I am using SQLite for smaller functionality.  This database is an open source, single file database structure similar to Microsoft Access.  It however, complies with SQL querying standards to a large degree.  There is also a spatial component for SQLite, called SpatiaLite.  Quantum GIS is able to write directly to an SQLite database.  At this point, I see it being my data storage format of choice, since it is easy to use and install.  The other database that I am using is MySQL.  I am using this one because it is so widely installed on web hosts, and thus is a good choice for web-based mapping.  I know that a lot of people use PostGreSQL/PostGIS, and I am not knocking that software at all.  I have it installed as well and will be working with it too.  I feel that it is important to become familiar with both RDBMS.  With the ease and ability of these systems to store spatial data, I see no point in having spatial data on a website, in a format other than a database.
  • Programming Languages:  PHP, Javascript, Python.  I list all of these because from what I have seen, anyone getting involved with open source software, is going to have to do some programming.  I will be clear in saying that I am not a programmer, yet.  I think of the 3 listed, that Python is the one language that all GIS professionals should probably learn, whether or not they are using open source software.  ESRI is adopting it more and more as their language of choice as well.  PHP is a key language for accessing databases, and thus attribute data, from a web site.  It is obviously useful for many other things, but this has been my primary focus.  Javascript is the web-mapper’s customization gold mine.  The open source map serving software mostly uses javascript for implementation, and thus customization.  Fortunately, all of these are object oriented, and once you learn the syntax, the theory is pretty similar.

I have started to investigate some of the web-based GIS products, but that is a topic for another blog.

I am interested in finding other software that people are utilizing, so feel free to make comments with ideas on other software and your experiences with the ones listed here.  Future posts will have more in depth discussions of them, as I talk about installing, setting up, and eventually using the software.  This will definitely include challenges that I have encountered, as I believe that reinventing the wheel is pointless.  If I have had a problem with something, and figured out a solution, or process by which the given task was accomplished, then I will share it.  The big drawback of free and open source software, is the lack of documentation.  I think this is mainly due to the fact that development is progressing rather rapidly, and documentation is difficult, though necessary.  Hopefully this blog will help draw the curtains back and show that it is not as hard to use as people think.