
The Case for (and against) Data Densification

March 28, 2019
Kevin Danaher

GIS data comes in all shapes and sizes, and we understand that the datasets we use can be far from perfect. The following goes over some thoughts on data densification, and what all of us might keep in mind as we create and share geographic data.

Overview

When I say “data densification,” at the most basic level I’m referring to how many vertices or nodes make up the features in a dataset. When digitizing or creating polyline and polygon data, these vertices are what would be created after each mouse click, and ultimately define the shape of the features you’re creating.
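As a quick illustration (not part of the original workflow, and the geometry below is made up), counting vertices is straightforward in code; the number you get back is essentially what “densification” describes:

function countVertices(geometry) {
  // Recursively walk nested coordinate arrays (LineString, Polygon, Multi*)
  // and count every [x, y] pair.
  const walk = (coords) =>
    Array.isArray(coords[0]) ? coords.reduce((n, c) => n + walk(c), 0) : 1;
  return walk(geometry.coordinates);
}

// A hypothetical boundary segment digitized with only two clicks:
const sparseLine = { type: "LineString", coordinates: [[-55.1, 2.0], [-54.0, 2.3]] };
console.log(countVertices(sparseLine)); // 2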

Vertices of the Brazil–Suriname boundary, as mapped in our Sovereign Limits database. Displayed at 1:63,360, the dataset is detailed enough that we can’t make out individual vertices at this scale.

Most datasets are created for an intended mapping scale or purpose, so there is usually an inherent reason why GIS data is either super detailed or more generalized in nature. The degree of densification (or generalization) and number of vertices doesn’t have anything to do with whether a dataset is “bad” or “good,” but it does have practical implications that we should be mindful of.

For comparison’s sake, the Natural Earth 10m-scale boundary is shown at the same scale. This dataset is intended for mapping at 1:10 million scale or smaller. So, it looks exactly as it should!

As data is processed, converted between formats, or edited in different software, changes can be introduced to the vertices and overall topology of a dataset. For example, when we used to create lines and polygons using the pen tool in Illustrator and then export them to shapefiles using the MAPublisher plugin, the data came back super densified. Even the seemingly simplest data exported was cumbersome to work with in a GIS environment.

Another example comes to mind from creating or editing lines or polygons in ArcMap using the Bézier or other curve-digitizing tools. When this data is exported, the curves we create are preserved by adding (lots of) vertices. As far as the software is concerned it’s doing the right thing, but it’s creating a dataset that might be unnecessarily bloated.

So with all of the above in mind, here are some reasons for densification:

  • Preservation of data across map projections
  • Ensuring distance and area calculations are accurate
  • Preserving construction of geodesics/loxodromes

And the big pitfall of overly/unnecessarily densified data: It’s bigger in file size and cumbersome to deal with for rendering, geoprocessing and analysis.

A “Real World” Example

Let’s say I’m doing GIS work for a group defining and managing Marine Protected Areas. There’s been a massive new swath of protected area defined in the Indian Ocean, so I need to create data for it and add it to an existing dataset, which will be shared and utilized by thousands of GIS professionals and cartographers in their own mapping and analysis work.

I create a new feature class, define its coordinate system as WGS84, and start creating the feature on a Mercator projection.
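The post doesn’t include the actual coordinates, so purely for illustration, a four-vertex version of that feature might look something like this in GeoJSON (all corner values below are hypothetical):

// Hypothetical stand-in for the new Marine Protected Area: a WGS84 polygon
// defined by only four corner vertices (plus the closing vertex).
const protectedArea = {
  type: "Feature",
  properties: { name: "New Indian Ocean MPA (hypothetical)" },
  geometry: {
    type: "Polygon",
    coordinates: [[
      [60, -10], // NW corner
      [80, -10], // NE corner
      [80, -30], // SE corner
      [60, -30], // SW corner
      [60, -10], // closing vertex
    ]],
  },
};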

Voila! I’ve defined this massive new protected area and it looks just as it should based on the coordinates I’ve been provided. My supervisor has also given my computer screen a glance and signed off on it. My work is done and I’ll merge this feature into the larger dataset of Marine Protected Areas I manage, then publish it online.

A major news outlet has written a piece on the new Marine Protected Area! Their graphics team and cartographers have even pulled the dataset I’ve just updated and created the map shown below. Being creative cartographers, they’re not using a Mercator projection, but instead a Lambert Azimuthal Equal Area (LAEA) centered on the new protected area.

But as I examine the map more closely, I shudder. Something isn’t right.

Can you tell what is wrong?

The protected area is not portrayed correctly on their map. The reason why is quite simple: My data was not preserved as intended when projected to the LAEA projection. And the reason for this can be seen in the first image of this example: I created the polygon using only four vertices. The way the protected area should look is overlaid below.

To our mapping software, a feature is rendered from the vertices that make it up. The defined coordinate system and projection of the data are obviously essential as well (remember to apply coordinate system transformations!), but regardless of which map projection is used, the features are going to be drawn based on their vertices.

Back to my example. I’ve gone back into the dataset and applied some edits to the data for this protected area. With the vertices shown below added, the data looks just as it should across map projections.
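The post doesn’t spell out the edits themselves, but conceptually the fix is just to insert intermediate vertices along each edge before the data is shared. A rough sketch, reusing the hypothetical polygon from earlier and adding a vertex roughly every degree:

// Sketch of densification: insert intermediate vertices along each edge of a
// lon/lat ring so the shape survives reprojection (e.g. to LAEA).
// maxDeg is the largest allowed gap, in degrees, between consecutive vertices.
function densifyRing(ring, maxDeg = 1) {
  const out = [];
  for (let i = 0; i < ring.length - 1; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[i + 1];
    const steps = Math.max(
      1,
      Math.ceil(Math.max(Math.abs(x2 - x1), Math.abs(y2 - y1)) / maxDeg)
    );
    for (let s = 0; s < steps; s++) {
      // Linear interpolation in geographic coordinates: the edge now follows
      // the intended parallel/meridian instead of a straight projected line.
      out.push([x1 + ((x2 - x1) * s) / steps, y1 + ((y2 - y1) * s) / steps]);
    }
  }
  out.push(ring[ring.length - 1]); // close the ring
  return out;
}

// densifyRing(protectedArea.geometry.coordinates[0], 1) turns the 5-vertex
// ring into one with a vertex roughly every degree along each edge.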

Conclusion

Densify your data when there is a need for it. The need may be to take accurate measurements, preserve detail at large scale, or (as exemplified above) to ensure data holds up across map projections. If there isn’t a reason for densification, don’t do it!

Visualizing Baltimore City’s Trees

October 18, 2018
Kevin Danaher

Some time ago, I was browsing my Baltimore neighborhood’s Nextdoor forum and came across a link someone shared to a web map of Baltimore’s Tree Inventory. This was of interest to me not only because I love trees, maps and data, but also because I have joined my neighbors in planting a number of trees throughout our community.

Planting a new tree with members of the community on 23rd Street, after “making room” in the sidewalk’s concrete.

The Baltimore Tree Inventory

Digging into the web map, there is a lot of data represented here! Map layers include trees and stumps, as well as locations suitable and not suitable for future planting. Each of the tree features contains a number of attributes including species, common name, height, DBH (diameter at breast height), and much more. All of this data has been painstakingly collected in the field for the entirety of Baltimore, which is no small task.

The Baltimore Tree Inventory web map

Built as an ArcGIS Online Web App, the map has useful tools/widgets for interacting with the data, my favorite being the “Infographic” which charts the species breakdown in the current map extent. There are other standard but useful widgets as well, including the ability to change the basemap, take measurements, and toggle layers, just to name a few.

Using the ‘Infographic’ widget to chart the distribution of tree species

Visualizing Tree Data in a Different Way

The Baltimore Tree Inventory map is an amazing resource for visualizing tree data and especially for exploring the attributes within. After spending some time with the map and data, I had a couple of thoughts. First, the web map’s performance and rendering are on the slow side given the sheer size and number of points in these map layers. Second, it’s a little hard to get a sense of the density and distribution of these trees with the map’s current symbology. The map seems more effective for interacting with the data at larger scales (zoomed in to neighborhood/street level) than at smaller scales (visualizing trees across the entire city or spanning multiple neighborhoods). This gave me the idea to visualize these points in a different way: as a heatmap.

The Baltimore Tree Inventory data in heatmap form. Click the image to explore.

Making the Map

To create the map, I first gathered the data. I wanted to keep the map as simple as possible, so I’ve made use of only three layers: buildings, water, and the subject data: trees. I also wanted to address any performance issues by using vector tiles, which can easily handle the number of points and polygons we’re dealing with here. I reprojected everything, converted it to GeoJSON, and ran it through a wonderful command line tool Mapbox has created and maintains called Tippecanoe. Put simply, the purpose of Tippecanoe is to create vector tilesets out of large or complicated datasets, optimized on a per-zoom-level basis. For example, my building dataset contains over 260,000 polygons and a number of fields. Obviously I’d like to see every individual building at my highest zoom level, but it’s okay for building features to be simplified or dropped entirely at lower zooms. I also dropped all the fields/attributes from the building and water data (preserving only the geometry), which helped keep the size of my tilesets down. In order to create an effective heatmap, I obviously needed to keep all my tree points, so for creating the tree tiles I preserved as much detail in the point data as possible. The commands I fed to Tippecanoe are below:

tippecanoe --minimum-zoom=10 --maximum-zoom=17 -o BaltimoreTrees.mbtiles -B 10 --layer=Trees -f --attribution="TreeBaltimore" Trees.geojson

tippecanoe --minimum-zoom=10 --maximum-zoom=17 -o BaltimoreBuildings.mbtiles --drop-densest-as-needed --extend-zooms-if-still-dropping --layer=Buildings --exclude-all -f -D10 Buildings.geojson

tippecanoe --minimum-zoom=10 --maximum-zoom=17 -o BaltimoreWater.mbtiles --drop-densest-as-needed --extend-zooms-if-still-dropping --layer=Water --exclude-all -f -D11 Water.geojson

tile-join -o BaltimoreTreeBlog.mbtiles -n "Baltimore Tree Blog Data" -A "Data @ TreeBaltimore, City of Baltimore" -f BaltimoreTrees.mbtiles BaltimoreWater.mbtiles BaltimoreBuildings.mbtiles

The final command above uses tile-join (bundled with Tippecanoe) to merge the tilesets into a single mbtiles file, which I uploaded to our Mapbox account to handle the hosting and serving of vector tiles. The total size of the tiles (created for zooms 10 through 17) is about 30 MB. With more time spent fiddling with Tippecanoe’s many options, I might have decreased the size of the tileset while keeping it usable for my map, but I was happy with the appearance of these tiles across zoom levels.

An oblique view of the map, looking northeast over Druid Hill Park.

Finally, I began coding with the Mapbox GL JS API. After pulling in my tileset and setting up the basics, I found the trickiest thing to get right was the colors for the heatmap data. I started with a color gradient which showed red for the densest data, as is so commonly seen in heatmaps. But this just felt wrong. We’re dealing with trees here, so the color palette should reflect a greener theme. I consulted with Tim, our graphic designer and 3D specialist, whose design decisions I would trust with my life. We toyed with a lot of ideas before landing on the fairly simple range of greens seen in the current interactive. In a heatmap, the key to getting the color right seems to be thinking about how intensities should be represented. In the case of this heatmap, I started out wrongly thinking I wanted areas with more trees to be greener, but the reality is I simply want those areas to pop. So denser areas take on a lighter, near-yellow green which really grabs a user’s attention, while less dense areas take on a base green color.
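For anyone curious how that translates to code, here is a rough sketch of a Mapbox GL JS heatmap layer along those lines; the tileset URL, source and layer names, color stops, and radius values are assumptions rather than the published map’s exact styling:

// Assumes `map` is an initialized mapboxgl.Map.
map.addSource("baltimore-trees", {
  type: "vector",
  url: "mapbox://username.baltimore-tree-blog", // hypothetical tileset ID
});

map.addLayer({
  id: "tree-heat",
  type: "heatmap",
  source: "baltimore-trees",
  "source-layer": "Trees", // layer name used in the Tippecanoe command
  maxzoom: 17,             // hand off to circles above zoom 17
  paint: {
    // Ramp from transparent, through a base green, to a light near-yellow
    // green so the densest areas "pop" rather than simply reading as greener.
    "heatmap-color": [
      "interpolate", ["linear"], ["heatmap-density"],
      0,   "rgba(0, 0, 0, 0)",
      0.2, "#1a6b3c",
      0.6, "#4fae5c",
      1,   "#d8f29b",
    ],
    // Grow the kernel radius with zoom so density reads consistently.
    "heatmap-radius": ["interpolate", ["linear"], ["zoom"], 10, 4, 17, 20],
    "heatmap-opacity": 0.85,
  },
});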

Above zoom 17, it is no longer effective to keep the “heat” in the map. I switched to symbolizing the trees as circles and included a popup on hover that provides the tree species.

Above zoom 17, trees are represented by circles, and include some interaction.
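The circle layer and hover popup can be wired up with standard Mapbox GL JS calls; again, the source, layer, and attribute names below are assumptions:

// Above the heatmap's maxzoom, draw individual trees as circles and show the
// species on hover.
map.addLayer({
  id: "tree-points",
  type: "circle",
  source: "baltimore-trees",
  "source-layer": "Trees",
  minzoom: 17,
  paint: { "circle-radius": 5, "circle-color": "#4fae5c" },
});

const popup = new mapboxgl.Popup({ closeButton: false, closeOnClick: false });

map.on("mouseenter", "tree-points", (e) => {
  const species = e.features[0].properties.COMMON; // hypothetical field name
  popup.setLngLat(e.lngLat).setHTML(`<strong>${species}</strong>`).addTo(map);
});

map.on("mouseleave", "tree-points", () => popup.remove());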

Final Thoughts, Future Improvements

Viewed in heatmap form, the data tells a somewhat different story than what we see in the Baltimore Tree Inventory map, where trees are individually symbolized. In the heatmap, some parts of the city stand out much more than others in terms of tree density. Something to bear in mind is that this data doesn’t necessarily represent all trees in Baltimore, but it does show those that the city, TreeBaltimore, Recreation & Parks, and other organizations have inventoried and maintain in a database. Visualizing the tree inventory data in this manner might show tree planting organizations which neighborhoods to focus on for future planting efforts, or show where additional field collection might be needed to improve the dataset.

My original intent for this map was to allow user filtering by species of tree. However, the data contains over 120,000 tree points and over 300 unique tree species, so to start I thought it best to just stick with mapping… trees. Another thought was to exclude tree data from the heatmap that fell within city parks, and to symbolize the parks on the map in a (green) symbology that would represent an obvious haven for trees. Doing this would put the map’s emphasis on urban trees, or those along the sidewalks of our neighborhoods. But, the park data in Baltimore’s open data portal hasn’t been accessible lately, so I gave up on that for now.
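If that species filter does get built someday, one lightweight approach in Mapbox GL JS would be map.setFilter() on the tree layers; the attribute name below is an assumption about the inventory’s schema:

// Sketch of per-species filtering with map.setFilter(); "COMMON" is a
// hypothetical attribute name for the tree's common species name.
function filterBySpecies(species) {
  const filter = species ? ["==", ["get", "COMMON"], species] : null;
  map.setFilter("tree-heat", filter);
  map.setFilter("tree-points", filter);
}

// filterBySpecies("Red Maple"); // show only one species
// filterBySpecies(null);        // clear the filter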

In just the few weeks since I pulled a snapshot of the dataset, the tree feature layer has already been updated, rendering my heatmap outdated! As a resident of Baltimore City, I am extremely pleased that this data exists at all, let alone that it is actively maintained. Now, when planting new trees in Baltimore, we can all look forward to the addition of points to this dataset and map.

Data from TreeBaltimore and the City of Baltimore. Thank you to all organizations who contribute to the greening of Baltimore.
