Tip of the Day – Use Descriptive Database Table Names

2017/11/08

Discussion

Today’s tip came to me after I was trying to link some tables into my database, from another database. What I saw in that database looked something like this:

ce410ap
sf200xp
jj310fe

Repeat this for a couple of hundred tables. This structure, I feel, is a holdover from the old days of taking a database and normalizing it to be as efficient as possible. Now, don’t get me wrong, normalizing a database for efficiency is a great thing. The issue is that it shouldn’t be automatic, and there needs to be some thought put into it.

Let’s look at what to do instead. First, another thought about the table names, as they reflect a situation that has changed with modern databases. Back in the early days of computers, software, etc., there were limitations on how long a file name or a table name could be. That required you to essentially encode information into the database table names when it could only be 8 characters long. Modern databases have gotten past that limitation, and I think it is high time we take advantage of this.

Schemas

First lets talk about grouping your data. Most databases support the creation of schemas. These allow you to divide your data up into logical groups to keep similar, or related, tables together. Some common groups might be:

Base
Wastewater
CIP (Capital Improvement Planning)
Fleet
Facilities
Streets

In fact, some groups require that your tables be inside of a schema, with Public being the default option. In addition, other schemas are created for individual users and also for extensions you have added to your database. A common one is Tiger for street routing and geocoding purposes. When you are browsing through your data, it could become somewhat confusing if your data schemas are intermixed with other database maintained schemas. Therefore, I generally add a prefix to the ones I create, that will introduce an alphabetical grouping.

A simple way to keep your schemas together, and at the top of the list, so to speak, would be to add “aa”. For example:

aabase
aacip
aafacilities
aafleet
aastreets
aawastewaater

If you want to avoid the possibility that a user name schema could possibly fall into that order, then include an underscore, ie:

aa_base
aa_cip
aa_facilities
aa_fleet
aa_streets
aa_wastewater

Name Prefixes

That covers the larger groupings of data. Now, let’s look at the table names themselves. As mentioned above, table names were restricted in length. This meant that every character was important, and generally had some specific meaning. Unfortunately, this meaning was not always clear to the uninitiated user. With this restriction having been removed, we have the opportunity to be more descriptive. At the same time, we should be logical.

When you are working with GIS data, there are a number of data types that you run across. You can have data that is in point, line, or polygon format. You might have tables that are simple attribute information. You may also have rasters in your database. In addition, you may have database structures like views, materialized views, or foreign tables.

Spatial Data Types

Let’s look at how to deal with the spatial data types first. Their handling will lead into how the others are dealt with. Here are some common ways you may see spatial layers named in a database:

building_polygon
capital_projects_polygon
city_point
cityboundary_polygon
pump_station_point
pump_station_polygon
road_centerlines_line
sewer_mains_line
site_amenity_point

What do we see in these names? There are a couple items to consider.

Many do not indicate what data group, or schema, they are part of. When you are browsing through database software, you access the tables through the schema they are contained within. Some software does not recognize schema names in their data list. A common example of this is in Esri’s ArcGIS. When you browse the list of tables in a database in order to add a “Query Layer”, no schema is shown. This could raise an issue in the layers for “building_polygon”, “pump_station_point”, and “pump_station_polygon”. What sort of buildings or pump stations are these? A building could be city hall, a gym, a fire station, or even a pump station. Someone browsing without the benefit of schema names could be left searching for what they need. The way I handle this is to include an abbrevation of the schema name as the first 2-3 letters of a layer name. For example:
- base: ba
- cip: cip
- facilities: fa
- fleet: fl
- streets: st
- wastewater: ww
That will give you the following layer names:
- ba_city_point
- ba_cityboundary_polygon
- cip_capital_projects_polygon
- fa_building_polygon
- fa_site_amenity_point
- st_road_centerlines_line
- ww_pump_station_point
- ww_pump_station_polygon
- ww_sewer_mains_line
Spatial Type: These layers all show the spatial type written out at the end. This adds a lot of length to the name, while only imparting a simple piece of information. Often, if you are viewing these tables in GIS software, like QGIS or ArcGIS, they autodetect the spatial type and change the icon to reflect as such. It is still important to reveal something of the spatial type in the name, as other software may not do this automatically. My preferred method is to put it at the front of the name as an abbreviation. This gives the added benefit of organizing your layers by spatial type within a particular schema. Some people would rather have related features grouped together, like this:
- ww_pump_station_point
- ww_pump_station_polygon
I prefer to have the spatial types grouped together using abbreviations for point(pt), line(li), and polygon(po). It helps me narrow down what I am looking for. So, the list above would be further modified to look like this:
- bapo_cityboundary
- bapt_city
- cippo_capital_projects
- fapo_building
- fapt_site_amenity
- stli_road_centerlines
- wwli_sewer_mains
- wwpo_pump_station
- wwpt_pump_station

Other database structures

With the abbreviations for schema and spatial type included in the layer names, you now have a framework for dealing with other types of data. A couple of other common types you will run into are:

Attribute only tables (tb) – Anything without spatial data. Obviously, there are going to be a lot of them, but it is still useful to be able to tell what is or isn’t a simple table. In keeping with the style above, a couple of names would look like this:
- sttb_street_attributes
- wwtb_cctv_inspection
Raster data (ra) – It is possible to store raster/grid data in a database as well. This sometimes makes sense for performance purposes, or the ability to share data more widely. Example:
- bara_aerial_2014_county

The next database structure is the View. This is a stored query inside the database. Since it is a stored query, it can contain any of the database structures listed above. It is important to not only differentiate the fact that this is a View, but also show the type of data represented in the view. The examples below are simply the tables above represented as views, however a view can be much more complex.

bavw_pt_city
favw_pt_site_amenity
stvw_li_road_centerlines
stvw_tb_street_attributes
wwvw_li_sewer_mains
wwvw_po_pump_station

The last database structure is another form of view called a Materialized View. This is the same as a standard view in that it is a stored query in the database. The difference is that in addition to storing the query, the first time this view is run, a snapshot of the query results are stored as well. This cached data is then returned in subsequent runs of the materialized view, until such time as you refresh the cache. The naming is thus the same format as with a standard view.

bamw_pt_city
famw_pt_site_amenity
stmw_li_road_centerlines
stmw_tb_street_attributes
wwmw_li_sewer_mains
wwmw_po_pump_station

Now that we’ve spent a ton of time talking about the first 6 or 7 characters of a table name, let’s move to the actual descriptive part of the table name.

Table Names – Main portion

The first part of this article dealt with a small portion, though important, of the overall table name. Now, let’s move into what you want to include, or not, in the main portion of the table name.

If you recall, from the very beginning, database names for a long time have consisted of an alphanumeric string that may or may not be overly descriptive. This then, makes up the first set of things NOT to include in your table names. Don’t include a bunch of abbreviations or character strings that are easily read. A good rule of thumb is to write table names such that someone who has never used a database before could open it up, browse through the tables, and come out with at least a basic idea of the contents.

The names listed above are some good examples of names to use. Let’s look at the site_amenity table in more detail. At first glance it seems pretty descriptive. Stop though, and think how that could be interpreted. What sites? What amenities? Should this be broken down into different layers, say park_amenity, and building_amenity? That choice should be made based on the needs of the GIS. Are there sufficiently different attributes for each type that would make it worth separating them? On the other hand, are there enough amenities, that putting them all into one table for every location might get unwieldy and hard to maintain? This is a topic for a different discussion. Suffice it to say that the data in the table may need to split up, with more specific names for the new tables.

This brings up a way in which even the main portion of the table name may become a tool for further organization. Take the detailed table names above, and reorder them as follows:

fapt_amenity_building
fapt_amenity_park

This same theory can be applied to any number of different terms for grouping. One that works really well for simple attribute tables is if you are creating tables that build links between different tables. For example:

fatb_link_building_amenity_building_poly
fatb_link_park_amenity_park_poly
fatb_link_sewer_pump_building_poly

This would effectivly group all of these linking tables together. The result is that you can easily find for which layers you are tracking many-to-many relationships.

Wrap up

Sometimes when I have a thought about something, it ends up being a lot more involved to explain than others. I think this is one of those times. Talking about table names in a database is pretty abstract until you show some examples. Even then, the organizational structure that you use for your organization may be (will be) different than what is appropriate for another organization. However you choose to organize them, the key is to use words or defined abbreviations as much as possible in your naming.

What naming conventions do you prefer for your database? Let me know in the comments below.

Data Database Tip of the Day