Making Sense of the Census: Polygons
The U.S. Census Bureau provides an intricate and sometimes unstructured hierarchy of geospatial geometries that link back to U.S. demographic data. This post dives deeper into those geometries, or polygons, as a follow-up to our previous post about the shapes of neighborhoods.
Shapefile Polygons
A shapefile is a digital vector format for storing geometric locations and their associated attribute information. The format can spatially describe vector features such as points, lines, and polygons. Combined with data attributes, shapefiles support many different representations of geographic data and precise spatial computations.
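To make the format a little more concrete, here is a minimal sketch that decodes the fixed 100-byte header at the start of a .shp file using only the Python standard library. The byte layout follows the published ESRI shapefile description; the function names and the demo header values are our own illustrative choices, not part of any Census tooling.

```python
import struct

# Shape type codes defined by the ESRI shapefile specification (subset).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def parse_shp_header(header: bytes) -> dict:
    """Decode the fixed 100-byte header of a .shp main file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    # File code and file length are big-endian 32-bit integers.
    file_code, = struct.unpack(">i", header[0:4])
    length_words, = struct.unpack(">i", header[24:28])
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    # Version, shape type, and bounding box are little-endian.
    version, shape_type = struct.unpack("<ii", header[28:36])
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"other ({shape_type})"),
        "file_bytes": length_words * 2,  # length is stored in 16-bit words
        "bbox": (xmin, ymin, xmax, ymax),
    }


def build_demo_header() -> bytes:
    """Build a synthetic polygon header (hypothetical values, for illustration)."""
    return (
        struct.pack(">i", 9994) + b"\x00" * 20           # file code + 5 unused ints
        + struct.pack(">i", 50)                          # file length: 50 words = 100 bytes
        + struct.pack("<ii", 1000, 5)                    # version 1000, shape type 5 (Polygon)
        + struct.pack("<4d", -125.0, 24.0, -66.0, 50.0)  # x/y bounding box
        + struct.pack("<4d", 0.0, 0.0, 0.0, 0.0)         # z/m ranges (unused here)
    )


print(parse_shp_header(build_demo_header()))
```

With a real file, you would pass in the first 100 bytes read in binary mode instead of the synthetic header. For anything beyond a quick inspection, a dedicated shapefile library is likely the better choice.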
TIGER Shapefiles
The Census Bureau provides the TIGER (Topologically Integrated Geographic Encoding and Referencing) cartographic boundary files as simplified representations of U.S. geographic areas.
Each shapefile from the U.S. Census Bureau unpacks into a large amount of useful data. From city blocks to entire states, there are several levels to explore. Within each level, the Census tracks several more sub-levels. All in all, the Census defines these sub-levels with 136 unique legal/statistical area description (LSAD) codes.
Ragged Hierarchies
According to the Census, the "LSAD codes describe the particular typology for each geographic entity." They help impose an order of significance on the polygon database, such that each LSAD code mostly corresponds to a single broad level (e.g. states, counties, ZIP codes).
Nevertheless, sometimes a particular LSAD code can span several levels, making it tough to determine where a given polygon lies within a spatial hierarchy.
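One way to spot these ragged cases is to collect, for every LSAD code, the set of levels in which it appears and flag any code that shows up in more than one. The sketch below does that in plain Python. The code/level pairs are made-up sample values, and the function names are our own; real values would come from the attribute table that ships with each shapefile.

```python
from collections import defaultdict

# Hypothetical (lsad_code, level) pairs, for illustration only.
records = [
    ("00", "state"),
    ("06", "county"),
    ("25", "place"),
    ("25", "county_subdivision"),
    ("57", "place"),
]


def levels_by_lsad(rows):
    """Map each LSAD code to the set of hierarchy levels it appears in."""
    levels = defaultdict(set)
    for code, level in rows:
        levels[code].add(level)
    return dict(levels)


def ragged_codes(rows):
    """Return the LSAD codes that span more than one level, sorted."""
    return sorted(
        code for code, found in levels_by_lsad(rows).items() if len(found) > 1
    )


print(ragged_codes(records))
```

Any code this returns cannot be placed in the hierarchy by its LSAD value alone, so it needs a second attribute (such as the file it came from) to resolve its level.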
Scale Variance
The visualization below aims to show the relative sizes of some of these polygons.
In the plot, each bar runs from the minimum area to the maximum area of the polygons labeled with the 2-digit LSAD code shown in color on the left. The x-axis is a log scale, where each step represents an area one order of magnitude larger than the previous step. At a glance, we can see that the polygons generally follow a hierarchy of the following levels (from largest to smallest):
- States
- Counties
- Core-Based Statistical Areas (CBSAs)
- Zip Code Tabulation Areas (ZCTAs)
- Places
- Neighborhoods (Quattroshapes, Zetashapes, Zillow)
Still, there is significant variance within each group. A Census Designated Place (CDP), for example, can be as big as a state (a level that includes all 50 states and all U.S. territories) or as small as a village.
Ultimately, the sub-level differences stem not only from a polygon's area, but also from its legal/statistical status. The level of Census Place, for instance, acts as a catch-all for the strange and varied but notable areas in the U.S. On the other hand, Census States are clearly defined legal units and Census CBSAs are clearly defined statistical units.
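The min-to-max bars described in this section boil down to a small computation: group polygon areas by LSAD code, take the minimum and maximum of each group, and measure the gap on a log10 scale. The sketch below illustrates the idea. The codes "A" and "B" and the areas are placeholder values rather than Census figures, and the function names are our own.

```python
import math
from collections import defaultdict

# Hypothetical (lsad_code, area_in_square_km) pairs, for illustration only.
polygons = [
    ("A", 1_000.0),
    ("A", 100_000.0),
    ("B", 0.5),
    ("B", 5_000.0),
    ("B", 50.0),
]


def area_ranges(rows):
    """Return {code: (min_area, max_area)} for the given rows."""
    grouped = defaultdict(list)
    for code, area in rows:
        grouped[code].append(area)
    return {code: (min(areas), max(areas)) for code, areas in grouped.items()}


def magnitude_span(min_area, max_area):
    """Length of a bar on a log10 axis: orders of magnitude from min to max."""
    return math.log10(max_area) - math.log10(min_area)


for code, (smallest, largest) in sorted(area_ranges(polygons).items()):
    span = magnitude_span(smallest, largest)
    print(f"{code}: {smallest:g} to {largest:g} sq km, {span:.1f} orders of magnitude")
```

A wide span for a single code is a hint that the code behaves like a catch-all rather than a tidy level in the hierarchy.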
Quattro - Zeta - Zillow
In the visualization, we also included neighborhood polygons from Zillow, Quattroshapes, and Zetashapes. It's interesting to note the hyper-localized focus of Zillow. The median Zillow neighborhood is 10 to 100 times smaller than the median Quattroshapes or Zetashapes neighborhood. This most likely reflects Zillow's sharper focus on high-density population areas.
As you go forward working with neighborhood geometries and U.S. polygons, we hope this guide opens a window into the initially opaque structure of Census definitions. Good luck mapping!