How do we store and query large spatial datasets? Here, we will build our understanding of the mathematical foundations of the Bing Tile System. To store large spatial datasets, we can divide them into usable sections (“tiles”) and assign them a unique identifier. One of the many systems for referencing these tiles is the Bing Tile System, a clever way to calculate identifiers for tiles of spatial data.
You may have used a web mapping application (like Bing Maps) which presents a map containing business locations, labels, and a base map. The base map is a collection of images, typically satellite imagery, and as you zoom in on the base map, it becomes more and more detailed. Obviously, your phone cannot store a high resolution image of the entire globe, so how is it able to provide exactly the high resolution imagery that you need?
To do this, Bing Maps divides the imagery into manageable sections, called tiles. Because imagery can have different "levels" of resolution, the Bing Maps Tile System also uniquely identifies tiles in a hierarchy from the most to the least detailed. To do this, Bing Maps constructs an identifier, called a “quadkey.” Using quadkeys, Bing Maps can render standard base maps for everybody who needs them and send the same map tiles to any client.
1. Gridded spatial data
A simple approach to dividing spatial data into tiles is to divide a square area into a grid of 4 tiles. This is exactly the method implemented in the Bing Tile System. To add another “level” of higher resolution data to our grid, we can divide each of the original grid tiles into another 4 tiles. Now, Level 1 has 4 tiles and Level 2 has 16 tiles. We repeat this division as many times as we want, increasing the number of tiles with each new “level.” The number of tiles in a given level will be:
Where n is the number of levels we have divided by. The Bing Tile System has 23 levels.
Dividing data into hierarchical levels of tiles is good (and reduces the size of individual tiles), but the Bing Tile System implements a clever way of identifying each tile in the hierarchy using a “quadkey.”
Quadkeys can look a bit gnarly:
In fact, they are rather simple to use and they are a very efficient way to traverse our tiled dataset. By referencing each tile with a quadkey, we reduce the amount of memory needed to store indices for specified sections of data and increase the efficiency of database indexing for tiles.
2. Creating quadkeys
How do we generate a quadkey for any section of data in our hierarchy? To do this, we use different numeral systems (binary and quaternary). If you are not familiar with these systems you can read an introduction to binary here. Understanding quaternary is just like understanding binary, but quaternary uses four numerals.
So, let's create a quadkey.
In the simplest example, imagine our grid with four sections and assign an integer to each section (Top Left: 0, Top Right: 1, Bottom Left: 2, Bottom Right: 3). In this case, the quadkey of any tile would simply be the integer that refers to its position. For the Bottom Right tile, the quadkey would be:
For another level of nesting, we would repeat this system, and add a new integer to the quadkey. For example, at the next level down (where we have 16 cells), the quadkey:
would refer to the Bottom Right tile at level 2 within the Bottom Right tile at level 1.
This is a simple way to think about quadkeys, but it is not precisely how quadkeys are constructed. In fact, we use the x, y position of tiles in a given level to create the quadkey. With the same tile as the above example (the Bottom Right tile at level 2), we can construct a quadkey using the x and y coordinates of the tile’s location in it’s level. Level 2 has 42 tiles (and we begin counting at 0) so this tile would have the position:
(x = 3, y = 3)
To construct the quadkey, we convert the x and y positions to binary and “interleave” the bits (starting with the Y position) to produce a key:
X = 11 Y = 11 Key = 1111
We then convert this key (in binary) to quaternary:
Quadkey = 33
This gives us our quadkey.
This is a relatively simple example (it always produces 3’s). But consider, for example, the tile with the position: (x = 228, y = 216). In this case, we would produce our quadkey with:
X = 11100100 Y = 11011000 Key = 1111011010010000 Quadkey = 33122100
Done! Note that the quadkey is 8 integers long. The smallest map level that would support this index is 256 * 256 tiles, or 48 tiles (level 8). This is one of the nice things about quadkeys: the length of a quadkey equals a quadkey's level of detail.
Note: Quadkeys always have the length of the level of detail they refer to. The tile at level 1 with position (0, 0) has the quadkey: 0. The tile at level 8 with position (0, 0) has the quadkey: 00000000.
3. The basics of a map projection
The idea of gridded, hierarchical data is general, but how does it actually relate to spatial data? If we use a square map projection to represent the earth, we can divide that square projection into a hierarchy of tiles. To create a square map, we transfer (“project”) the shape of the earth from a globe to a flat surface. For the Bing Tile System, we use the “Mercator Projection” which projects the globe onto a cylinder (then unrolls the cylinder to make a flat map). The Mercator projection is famous for distorting areas; places near the poles look much larger than they actually are. But the projection has some useful properties for creating our tile system:
(1) It is “cylindrical” meaning that North, South, East and West correspond to Up, Down, Right, and Left on the map. This is helpful for navigation.
(2) It is “conformal” meaning that it doesn’t distort the shape of things on the map, just their area. This is important for making street corners and buildings look square.
To make the map projection square, we reduce the latitude of the map between +- 85.05 degrees.
4. Using gridded spatial data
The primary use of gridded spatial data, as we mentioned above, is to serve imagery for mobile mapping applications. Imagery requires a relatively large amount of memory to store, so we would like to reduce the amount of imagery we need to send to individual map users. We can also reduce the memory needed to store other spatial data by referencing it to quadkeys. For example, if I use a GPS signal to record my location on earth, I could report my quadkey (at some resolution level) instead of my actual latitude and longitude coordinates. This makes it easy to combine and aggregate my location with the location of others.
Quadkeys have two other nice properties that we have not yet mentioned:
(1) Quadkeys record a tile’s lineage in the grid hierarchy.
(2) Quadkeys that are close to each other in X Y space are also numerically close together.
(1) For the first property: quadkeys easily identify the parents of a given tile. For example, a tile with quadkey 1320 has the parent 132 which has the parent 13. To move up a level, we simply need to drop an integer off of the quadkey.
(2) To show the second property, we can use the example of three quadkeys at level 8:
A: XY = (255, 255), Quadkey = 33333333 B: XY = (250, 250), Quadkey = 33333030 C: XY = (100, 100), Quadkey = 03300300
We convert the quadkeys to decimal and calculating their differences:
A - B = 65535 - 65484 = 51 A - C = 65535 - 15408 = 50127
As you can see, A and B are numerically much closer than A and C (the same goes for B and A compared to B and C). This is very helpful for optimising the storage and querying of quadkeys in a database. For example, if we know we are looking for the neighbors of a given quadkey, we can focus on a nearby numerical range when we design our query.
Hopefully you have learned a bit about the Bing Tile System, and gained a general intuition about gridded spatial datasets. The Bing Tile System is a very clever solution to a data management problem, but it is not the only gridded tile system out there. For other implementations, check out Google’s S2 geometry or Uber’s H3 system (this one uses hexagons). Readers are encouraged to try a few of the calculations on their own to build their intuition about the mathematical foundation of quadkeys and their numerical relationships to one another.
All images by the author.