Today’s tip came to me after I was trying to link some tables into my database, from another database. What I saw in that database looked something like this:
Repeat this for a couple of hundred tables. This structure, I feel, is a holdover from the old days of taking a database and normalizing it to be as efficient as possible. Now, don’t get me wrong, normalizing a database for efficiency is a great thing. The issue is that it shouldn’t be automatic, and there needs to be some thought put into it.
Let’s look at what to do instead. First, another thought about the table names, as they reflect a situation that has changed with modern databases. Back in the early days of computers, software, etc., there were limitations on how long a file name or a table name could be. That required you to essentially encode information into the database table names when it could only be 8 characters long. Modern databases have gotten past that limitation, and I think it is high time we take advantage of this.
First lets talk about grouping your data. Most databases support the creation of schemas. These allow you to divide your data up into logical groups to keep similar, or related, tables together. Some common groups might be:
- CIP (Capital Improvement Planning)
In fact, some groups require that your tables be inside of a schema, with
Public being the default option. In addition, other schemas are created for individual users and also for extensions you have added to your database. A common one is Tiger for street routing and geocoding purposes. When you are browsing through your data, it could become somewhat confusing if your data schemas are intermixed with other database maintained schemas. Therefore, I generally add a prefix to the ones I create, that will introduce an alphabetical grouping.
A simple way to keep your schemas together, and at the top of the list, so to speak, would be to add “aa”. For example:
If you want to avoid the possibility that a user name schema could possibly fall into that order, then include an underscore, ie:
That covers the larger groupings of data. Now, let’s look at the table names themselves. As mentioned above, table names were restricted in length. This meant that every character was important, and generally had some specific meaning. Unfortunately, this meaning was not always clear to the uninitiated user. With this restriction having been removed, we have the opportunity to be more descriptive. At the same time, we should be logical.
When you are working with GIS data, there are a number of data types that you run across. You can have data that is in point, line, or polygon format. You might have tables that are simple attribute information. You may also have rasters in your database. In addition, you may have database structures like views, materialized views, or foreign tables.
Spatial Data Types
Let’s look at how to deal with the spatial data types first. Their handling will lead into how the others are dealt with. Here are some common ways you may see spatial layers named in a database:
What do we see in these names? There are a couple items to consider.
Many do not indicate what data group, or schema, they are part of. When you are browsing through database software, you access the tables through the schema they are contained within. Some software does not recognize schema names in their data list. A common example of this is in Esri’s ArcGIS. When you browse the list of tables in a database in order to add a “Query Layer”, no schema is shown. This could raise an issue in the layers for “building_polygon”, “pump_station_point”, and “pump_station_polygon”. What sort of buildings or pump stations are these? A building could be city hall, a gym, a fire station, or even a pump station. Someone browsing without the benefit of schema names could be left searching for what they need. The way I handle this is to include an abbrevation of the schema name as the first 2-3 letters of a layer name. For example:
- base: ba
- cip: cip
- facilities: fa
- fleet: fl
- streets: st
- wastewater: ww
That will give you the following layer names:
Spatial Type: These layers all show the spatial type written out at the end. This adds a lot of length to the name, while only imparting a simple piece of information. Often, if you are viewing these tables in GIS software, like QGIS or ArcGIS, they autodetect the spatial type and change the icon to reflect as such. It is still important to reveal something of the spatial type in the name, as other software may not do this automatically. My preferred method is to put it at the front of the name as an abbreviation. This gives the added benefit of organizing your layers by spatial type within a particular schema. Some people would rather have related features grouped together, like this:
I prefer to have the spatial types grouped together using abbreviations for point(pt), line(li), and polygon(po). It helps me narrow down what I am looking for. So, the list above would be further modified to look like this:
Other database structures
With the abbreviations for schema and spatial type included in the layer names, you now have a framework for dealing with other types of data. A couple of other common types you will run into are:
- Attribute only tables (tb) – Anything without spatial data. Obviously, there are going to be a lot of them, but it is still useful to be able to tell what is or isn’t a simple table. In keeping with the style above, a couple of names would look like this:
- Raster data (ra) – It is possible to store raster/grid data in a database as well. This sometimes makes sense for performance purposes, or the ability to share data more widely. Example:
The next database structure is the View. This is a stored query inside the database. Since it is a stored query, it can contain any of the database structures listed above. It is important to not only differentiate the fact that this is a
View, but also show the type of data represented in the view. The examples below are simply the tables above represented as views, however a view can be much more complex.
The last database structure is another form of view called a Materialized View. This is the same as a standard view in that it is a stored query in the database. The difference is that in addition to storing the query, the first time this view is run, a snapshot of the query results are stored as well. This cached data is then returned in subsequent runs of the materialized view, until such time as you refresh the cache. The naming is thus the same format as with a standard view.
Now that we’ve spent a ton of time talking about the first 6 or 7 characters of a table name, let’s move to the actual descriptive part of the table name.
Table Names – Main portion
The first part of this article dealt with a small portion, though important, of the overall table name. Now, let’s move into what you want to include, or not, in the main portion of the table name.
If you recall, from the very beginning, database names for a long time have consisted of an alphanumeric string that may or may not be overly descriptive. This then, makes up the first set of things NOT to include in your table names. Don’t include a bunch of abbreviations or character strings that are easily read. A good rule of thumb is to write table names such that someone who has never used a database before could open it up, browse through the tables, and come out with at least a basic idea of the contents.
The names listed above are some good examples of names to use. Let’s look at the
site_amenity table in more detail. At first glance it seems pretty descriptive. Stop though, and think how that could be interpreted. What sites? What amenities? Should this be broken down into different layers, say
building_amenity? That choice should be made based on the needs of the GIS. Are there sufficiently different attributes for each type that would make it worth separating them? On the other hand, are there enough amenities, that putting them all into one table for every location might get unwieldy and hard to maintain? This is a topic for a different discussion. Suffice it to say that the data in the table may need to split up, with more specific names for the new tables.
This brings up a way in which even the main portion of the table name may become a tool for further organization. Take the detailed table names above, and reorder them as follows:
This same theory can be applied to any number of different terms for grouping. One that works really well for simple attribute tables is if you are creating tables that build links between different tables. For example:
This would effectivly group all of these linking tables together. The result is that you can easily find for which layers you are tracking many-to-many relationships.
Sometimes when I have a thought about something, it ends up being a lot more involved to explain than others. I think this is one of those times. Talking about table names in a database is pretty abstract until you show some examples. Even then, the organizational structure that you use for your organization may be (will be) different than what is appropriate for another organization. However you choose to organize them, the key is to use words or defined abbreviations as much as possible in your naming.
What naming conventions do you prefer for your database? Let me know in the comments below.