Tip of the Day – Create indexes for all spatial and attribute tables

2017/03/05

Today’s Tip

Create indexes for all spatial and attribute tables

Discussion

Today’s tip arises as a result of setting up layers using a Foreign Data Wrapper in Postgres. The layers were initially loading quite slowly, and I realized that I had not added a UniqueID, which I mentioned in TOTD 002, or any sort of an index.

Before we get into it, let’s talk about what an index is. Wikipedia says here, that "An index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure."

In simpler terms, an index is essentially a table that contains the values you wish to index, and assigns each of them either an integer, or a binary number. In some cases, the values are sorted prior to the index values being created, to increase performance. Another table is then created which relates these integer values with the data table rows they correspond to.
Here is a sample data table:

Data Table
Row State       County     City
1   Washington  King       Seattle
2   Washington  Snohomish  Everett
3   Washington  King       Bellevue
4   Washington  King       Renton
5   Washington  Thurston   Olympia

Here is the index built on that data table:

Index             Index-Data Link
County       ID    Row  IndexID
King         1     1    1
Snohomish    2     2    2
Thurston     3     3    1
                   4    1
                   5    3

When you have an index and perform a search, instead of performing a text search, or searching for a complex number, the search is performed on an integer or a binary number, which is generally faster.
The example above is vastly simplified. When you move into a spatial data, the indexing ability, and thus the benefits, can be much greater. With most spatial indexes, the features are placed into smaller and smaller groupings of cells. Since the index tracks what objects are in a particular cell, it is easy to know what other objects are in the same cell, for comparison purposes. When you perform a spatial search, your search parameters are compared to the index. This comparison can narrow, and thus speed the search by excluding those cells not touched by the search extents.
Most operations you do with a database involve some form of search. This search can take many forms including:

Finding particular row or attribute for editing
Identifying particular features for intersection or merging in a spatial function
Identify function to retrieve attributes for particular feature on map
Sub-query of features by attribute or by spatial relation to other features.

The common theme among all of these is that they are searching for particular attributes or locations of features. When indexes have been created, some level of this information is already being maintained, whether the list of attributes in a field, or the coordinates of all features in a layer. If they didn’t exist, determining this information would have to be repeated with every query performed.

So, if you deal with databases, or spatial data, do yourselves a favor: Check all of your tables, and make sure that indexes have been created for all spatial fields, and any attributes where you would expect regular searches to be performed. You may very well see a dramatic increase in performance.

Let me know what features or attributes you always make sure to index.

Database GIS Tip of the Day