The Case for (and against) Data Densification
GIS data comes in all shapes and sizes, and we all understand that the datasets we use can be far from perfect. What follows are some thoughts on data densification, and what all of us might keep in mind as we create and share geographic data.
Overview
When I say “data densification,” at the most basic level I’m referring to how many vertices or nodes make up the features in a dataset. When digitizing polyline and polygon data, a vertex is created with each mouse click, and together those vertices ultimately define the shape of the features you’re creating.
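To make that concrete, here’s a minimal sketch in Python using Shapely (my library choice here, not something prescribed by any GIS standard); the coordinates are made up:

```python
from shapely.geometry import LineString

# A "generalized" line: two clicks, two vertices.
sparse = LineString([(0, 0), (10, 10)])

# A "densified" version of the same path: many vertices along the way.
dense = LineString([(x, x) for x in range(11)])

print(len(sparse.coords))  # 2
print(len(dense.coords))   # 11
```

Both lines trace the same path on paper, but they carry very different amounts of information, and that difference matters downstream.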
Most datasets are created for an intended mapping scale or purpose, so there is usually an inherent reason why GIS data is either super detailed or more generalized in nature. The degree of densification (or generalization) and number of vertices doesn’t have anything to do with whether a dataset is “bad” or “good,” but it does have practical implications that we should be mindful of.
As data is processed, converted between formats, or edited in different software, changes can be introduced to the vertices and overall topology of a dataset. For example, back when we created lines and polygons with the pen tool in Illustrator and exported them to shapefiles using the MAPublisher plugin, the data came back super densified. Even the simplest-seeming exports were cumbersome to work with in a GIS environment.
Another example comes to mind from creating or editing lines or polygons in ArcMap using the Bezier or other curve-digitizing tools. When this data is exported, the curves we create are preserved by adding (lots of) vertices. As far as the software is concerned it’s doing the right thing, but it’s creating a dataset that might be unnecessarily bloated.
So with all of the above in mind, here are some reasons for densification:
- Preservation of data across map projections
- Ensuring distance and area calculations are accurate
- Preserving construction of geodesics/loxodromes
And the big pitfall of overly or unnecessarily densified data: it’s bigger in file size and more cumbersome to deal with in rendering, geoprocessing, and analysis.
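The geodesic case from the list above is worth a quick illustration. Here’s a hedged sketch using pyproj’s Geod class (my choice of library; the endpoints, roughly Cape Town to Perth, are purely illustrative):

```python
from pyproj import Geod

geod = Geod(ellps="WGS84")

# Two endpoints across the Indian Ocean (illustrative only):
# roughly Cape Town and Perth.
lon1, lat1 = 18.4, -33.9
lon2, lat2 = 115.9, -31.9

# With only these two vertices, software draws a straight chord in
# whatever projection happens to be in use, and the geodesic is lost.
# Densifying pins the true shortest path down with real vertices:
intermediate = geod.npts(lon1, lat1, lon2, lat2, 50)
print(len(intermediate))  # 50 intermediate (lon, lat) vertices
```

The same idea applies to loxodromes: a curve only survives projection if it’s carried by actual vertices.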
A “Real World” Example
Let’s say I’m doing GIS work for a group defining and managing Marine Protected Areas. There’s been a massive new swath of protected area defined in the Indian Ocean, so I need to create data for it and add it to an existing dataset, which will be shared and utilized by thousands of GIS professionals and cartographers in their own mapping and analysis work.
I create a new feature class, define its coordinate system as WGS84, and start creating the feature on a Mercator projection.
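In code terms, what I’ve just done looks something like this (a sketch with Shapely; the corner coordinates are hypothetical, not the real protected-area boundary):

```python
from shapely.geometry import Polygon

# Four corner clicks in WGS84 lon/lat: a huge ocean polygon defined
# by the bare minimum of vertices (coordinates are hypothetical).
corners = [(60.0, -5.0), (90.0, -5.0), (90.0, -25.0), (60.0, -25.0)]
protected_area = Polygon(corners)

print(len(protected_area.exterior.coords))  # 5 (the closing vertex repeats)
```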
Voila! I’ve defined this massive new protected area and it looks just as it should based on the coordinates I’ve been provided. My supervisor has also given my computer screen a glance and signed off on it. My work is done and I’ll merge this feature into the larger dataset of Marine Protected Areas I manage, then publish it online.
A major news outlet has written a piece on the new Marine Protected Area! Their graphics team and cartographers have even pulled the dataset I’ve just updated and created the map shown below. Being creative cartographers, they’re not using a Mercator projection, but instead a Lambert Azimuthal Equal Area (LAEA) centered on the new protected area.
But as I examine the map more closely, I shudder. Something isn’t right.
The protected area is not portrayed correctly on their map. The reason why is quite simple: My data was not preserved as intended when projected to the LAEA projection. And the reason for this can be seen in the first image of this example: I created the polygon using only four vertices. The way the protected area should look is overlaid below.
To our mapping software, rendering a feature comes down to the vertices that make it up. The data’s defined coordinate system and projection are obviously essential as well (remember to apply coordinate system transformations!), but whichever map projection is used, a feature is drawn by projecting each vertex and connecting the results with straight segments. Nothing between the vertices is preserved.
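You can see this directly by reprojecting the sparse polygon from earlier. This sketch pairs Shapely with pyproj (both my choices, and the LAEA center coordinates are assumed for illustration):

```python
from pyproj import Transformer
from shapely.geometry import Polygon
from shapely.ops import transform

corners = [(60.0, -5.0), (90.0, -5.0), (90.0, -25.0), (60.0, -25.0)]
sparse = Polygon(corners)

# WGS84 lon/lat -> LAEA centered (hypothetically) on the new area.
to_laea = Transformer.from_crs(
    "EPSG:4326",
    "+proj=laea +lat_0=-15 +lon_0=75 +datum=WGS84",
    always_xy=True,
).transform

projected = transform(to_laea, sparse)

# Only the four corners moved; the edges are still straight chords
# in LAEA space, which is not where the boundary actually runs.
print(len(projected.exterior.coords))  # still 5
```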
Back to my example. I’ve gone back into the dataset and applied some edits to the data for this protected area. With the vertices shown below added, the data looks just as it should across map projections.
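The equivalent edit in code is nearly a one-liner with Shapely 2’s segmentize (assumed available, and assuming the boundary is meant to follow lines of constant latitude and longitude between the corners):

```python
import shapely
from shapely.geometry import Polygon

corners = [(60.0, -5.0), (90.0, -5.0), (90.0, -25.0), (60.0, -25.0)]
sparse = Polygon(corners)

# Insert a vertex at least every 1 degree along the outline, so any
# projection has enough points to draw the boundary faithfully.
dense = shapely.segmentize(sparse, 1.0)

print(len(sparse.exterior.coords))  # 5
print(len(dense.exterior.coords))   # 101: the edges now carry real vertices
```

Reprojecting `dense` with the same transform as before now traces the intended outline, because each projected segment is short enough that its straight-chord approximation no longer matters.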
Conclusion
Densify your data when there is a need for it. The need may be to take accurate measurements, preserve detail at large scale, or (as exemplified above) to ensure data holds up across map projections. If there isn’t a reason for densification, don’t do it!