Author of Mapbox's Vector Tile specification here, and a contributor to some of the code used by PostGIS. I wanted to add some clarity on topics associated with Vector Tiles and the dynamic serving of them, which seems to be a new trend.
The Vector Tile specification was designed for map visualization, though it has expanded into other uses as well. In general, the purpose is to quickly provide a complete subset of data for a specific area in a form that is highly cacheable. Most of that speed and cacheability comes specifically from preprocessing all the data you will use in your map into tiles.
The general steps for turning raw data into Vector Tiles are:
1. Determine a hierarchy for your data. For example, with roads, at some zoom levels you will want to see only highways or major roads, while at other zoom levels you will want all of your data.
2. For each tile at each zoom level: select your data following your hierarchy rules, simplify it based on the zoom level (for example, you might need fewer points to display a road), then clip it to the tile and encode it into your Vector Tile. (A sketch of this as a dynamic PostGIS query follows below.)
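To make step 2 concrete, here's a minimal sketch of the dynamic version as a single PostGIS query, run from Python. It assumes PostGIS 3.0+ (for ST_TileEnvelope) and a hypothetical roads table with a class column and Web Mercator (3857) geometry; adjust names to your schema:

    import psycopg2

    TILE_SQL = """
    WITH bounds AS (
        SELECT ST_TileEnvelope(%(z)s, %(x)s, %(y)s) AS geom
    ),
    mvtgeom AS (
        SELECT
            -- clip and quantize the geometry into tile coordinates
            ST_AsMVTGeom(t.geom, bounds.geom, 4096, 256, true) AS geom,
            t.name,
            t.class
        FROM roads t, bounds
        WHERE ST_Intersects(t.geom, bounds.geom)
          -- step 1's hierarchy rule: only major roads at low zooms
          AND (%(z)s >= 10 OR t.class IN ('motorway', 'trunk'))
    )
    SELECT ST_AsMVT(mvtgeom.*, 'roads') FROM mvtgeom;
    """

    def render_tile(conn, z, x, y):
        with conn.cursor() as cur:
            cur.execute(TILE_SQL, {"z": z, "x": x, "y": y})
            return cur.fetchone()[0]  # the encoded vector tile as bytes

Note that the quantization to the 4096-unit grid gives you some simplification for free; for heavy geometries you would add an explicit ST_Simplify step keyed to the zoom level, which is exactly the work that gets expensive on demand.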
The problem is that these steps are often very complex and require thought about the cartography of your final map, and they can also drastically affect performance. If you are dynamically serving tiles from PostGIS, it can be very hard to reduce large quantities of data quickly in some cases. For example, take a very detailed, precise coastline of a large lake that you want to serve dynamically. If you serve this data on demand, each time you need a tile you have to simplify and clip a potentially massive polygon. While this might work for single requests, at scale it quickly adds a lot of load to a PostGIS server. The only solutions are to cache the resulting tiles for a longer period to limit load on your database, or to preprocess all your data before serving.
Preprocessing all the tiles is something other tiling tools, such as tippecanoe, are already really good at, and it comes with the benefit of helping you determine a hierarchy for your data. Preprocessing might seem excessive when it means making potentially millions of tiles, but in general it makes your application faster because it simply serves an already-created tile.
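For example, a minimal preprocessing run, driven from Python for consistency with the sketch above (the tippecanoe flags are real; the file and layer names are made up):

    import subprocess

    subprocess.run([
        "tippecanoe",
        "-o", "roads.mbtiles",         # write every tile into one mbtiles archive
        "-zg",                         # guess a sensible max zoom from the data
        "--drop-densest-as-needed",    # thin features wherever a tile would be too big
        "-l", "roads",                 # layer name inside the tiles
        "roads.geojson",
    ], check=True)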
Therefore, if your data does not change very quickly, I would almost always suggest preprocessing over dynamic rendering of tiles. If you use PostGIS to create tiles on demand instead of existing tiling tools, you might spend more effort maintaining it than you expect.
Very good comment and thanks for your work on MVT. I use PostGIS's MVT tools on a daily basis.
I take an intermediate approach: my queries are sometimes too expensive to run dynamically, and my data changes semi-frequently (on a daily/weekly basis), but when it does change I have a clear idea of which tiles are affected. So any time my data needs updating, I mark the affected tiles as stale, and a sidekiq job processes them and uploads them to S3. The tile server itself pulls from S3.
This is probably not quite as fast as a dedicated tile server, but it's far more reliable/responsive than dynamic rendering and reduces load spikes on the database.
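The sidekiq specifics aside, the pattern is roughly this, sketched in Python with a hypothetical stale_tiles queue table and bucket name; render_tile is any function returning encoded tile bytes, such as the ST_AsMVT sketch above:

    import boto3

    s3 = boto3.client("s3")

    def refresh_stale_tiles(conn, render_tile, bucket="my-tile-bucket"):
        with conn.cursor() as cur:
            # claim and clear the stale queue in one statement
            cur.execute("DELETE FROM stale_tiles RETURNING z, x, y")
            rows = cur.fetchall()
        for z, x, y in rows:
            s3.put_object(
                Bucket=bucket,
                Key=f"tiles/{z}/{x}/{y}.mvt",
                Body=bytes(render_tile(conn, z, x, y)),
                ContentType="application/vnd.mapbox-vector-tile",
            )
        conn.commit()  # only drop the queue entries once uploads succeed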
So I saw this post earlier today and tried it on a dataset we have (fixed boundaries w/ some properties that change 4x/hr). We use the values of the properties for styling the vector tiles. Currently the tiles are re-rendered every 4hrs (even though the data is updated every 15 min) using tippecanoe, served by tileserver-gl, and cached in CloudFront. So I wanted a way to get new data to users faster. But as you have noted, this dynamic process Crunchy posted IS SLOW: it takes about 3 minutes to paint the world on my brand new MacBook Pro (about 3 seconds w/ pre-rendered tiles). Given that the country boundaries do not change very often, is there a way to change just the properties that actually need updating in the already-rendered vector tiles? Our pipeline takes about 45 min to run completely to regenerate the tiles with updated properties. Or is there a better way to present this data? We started out w/ GeoJSON directly from the DB, but the files were huge; the vector tiles are 30% the size of the GeoJSON. We were in the MTS private beta, but they didn't have the 'update' process worked out yet, so it was a full refresh each time.
I haven't done this, but I imagine you could add a service worker with a fetch event listener that puts you in front of the raw tile data being cached. https://github.com/mapbox/mapbox-gl-js/issues/4326
From there you can serialize/deserialize the whole tile and map a new field (annoying), or, if you're clever, map your variable-value fields low in the values index array of the vector tile PBF. That way, assuming you have a small number of unique style-by values, you could get away with simply replacing the single byte representing that style value with another byte dictating a different style, for each feature in the vector tile.
That might be a little too abstract, so the tl;dr version: put a listener in front of fetch. One byte represents the target dynamic field in each feature in the tile (if you have a small number of unique values). Replace that single byte with your desired target byte.
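If the byte-level framing is confusing, here's a toy model in plain Python (an illustration of the layout, not real protobuf handling) of why one shared value can restyle many features at once:

    # In an MVT layer, feature properties are (key index, value index) pairs
    # pointing into the layer's shared `keys` and `values` arrays.
    layer = {
        "keys":   ["status"],
        "values": ["ok", "alert"],      # shared value table for the whole layer
        "features": [
            {"tags": [0, 0]},           # status = values[0] = "ok"
            {"tags": [0, 0]},
        ],
    }

    def restyle(layer, old, new):
        # swapping the shared value rewrites it for every referencing feature
        layer["values"] = [new if v == old else v for v in layer["values"]]

    restyle(layer, "ok", "alert")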
I wish I could learn to build my own tiles, vector or PNG. I don't really understand where the data comes from, or how the data is gathered and assembled.
I'm also really curious about the choices involving zoom levels: how do you decide what to render depending on the zoom level, and when is data discarded, to trade off good detail against better performance and lighter tiles? I would really be willing to try building lighter maps so I can have my own mapping software on a desktop machine.
The data sizes and hardware requirements involved are generally pretty big. It could be interesting to see how much detail one could achieve to make a "portable" map browser when limiting the data size to 2GB, 5GB or 10GB.
I would really like to ask why, on some mapping software, you can't see names of large enough cities/places/regions that are obviously there. It often makes it difficult to browse maps properly.
The data comes from places like the Census Bureau (roads, place names), and then a lot of it has to be collected by the likes of OpenStreetMap/Google/other providers. (GIS data is big business.)
For vector-based approaches (see Mapbox), these data are stored in purpose-built databases, and usually simplified geometries are served to the browser. The benefit is continuous zoom, but the pitfall is more server-side computation and hence cost.
Because of the cost/compute, raster tiles (PNG, JPG, any pixel format) have been much more popular. These start the same way: you collect all these data and put them in a database. The difference is the added step of rendering tiles; this one-off computation saves you work from then on. See maps.stamen.com for an example of tiles made from OSM data.
And you're right about place names sometimes not being apparent. This is a trade-off when using open data and auto-generated tiles. With something like Mapbox's vector tiles, you have fine-grained control of things like labels at individual (even fractional) zoom levels. And zoom level is another computational trade-off: you start at 0 and define an arbitrary end. The higher the number, the more the computation/data increases, four-fold with each n, i.e. O(4^n).
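To put numbers on that four-fold growth (standard XYZ scheme, 2^z by 2^z tiles per zoom level):

    total = 0
    for z in range(9):
        n = 4 ** z                      # 2^z columns times 2^z rows
        total += n
        print(f"zoom {z}: {n} tiles (cumulative {total})")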
And as far as why the size requirements are so big: geospatial data is big. For vectors you have to record information on every point, which depending on quality can be a ton. And for rasters we're talking trillions of pixels, really. That's why all of this is server-side.
And lastly, to your point about lightweight desktop software: tiles don't really have a place in the data process. They're only really useful for the visualization aspect. And frankly, I think we're reaching the capacity of the technique; we just might have some headroom in server efficiency.
> And lastly, to your point about lightweight desktop software: tiles don't really have a place in the data process. They're only really useful for the visualization aspect. And frankly, I think we're reaching the capacity of the technique; we just might have some headroom in server efficiency.
Not totally sure what you mean on your last point... data can be feature-centric (e.g. stored by feature id) or area-centric (stored by location), etc. Storing data by location is important far beyond visualisation and is abstracted away in databases such as PostGIS/Postgres (effectively a branded data structure). That said, I acknowledge that ArcGIS Pro, QGIS, etc. have limited support for tiled data, but of course that is changing. Safe funded much (all?) of the OGR MVT development afaik.
> The benefit is continuous zoom, but the pitfall is more server-side computation and hence cost.
I haven't tested this with dynamic tiles served from PostGIS, but with static tiles served from S3 it's quite the opposite! There's an initial cost to generating tiles, but once they're generated, you can host them on S3 with zero server cost.
I think many systems are backed by a PostGIS database (an extension for Postgres) with features and their coordinates. Map zoom levels define which features should be visible as layers on the map. A rendering frontend then grabs relevant data for the layer being viewed and builds the tiles.
> I wish I could learn to build my own tiles, vector or PNG. I don't really understand where the data comes from, or how the data is gathered and assembled.
The tools are rapidly evolving. There's no great single entry point and the best advice I can give is pretty generic: find a small-scale thing you want to do and do research toward accomplishing it.
The post you're commenting on is about how PostGIS databases mostly do this work for vector tiles on their own now, so to "build your own tiles", you'd set up a PostGIS database and re-read this post. A year or two ago the advice would have been quite different. A year or two from now, the advice will be _completely_ different.
That said, from zero, http://geojson.io is a dead-straightforward way to do basic operations with GeoJSON data. You can paste in JSON and it renders on the map; you can draw on the map and it generates GeoJSON. (https://tilejson.io does the same for raster tile sets.)
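For instance, a made-up minimal Feature you can paste in to see it render (shown as a Python dict here; json.dumps() gives the paste-able string, and coordinates are [longitude, latitude]):

    import json

    # a single made-up point feature; geojson.io will drop a marker on it
    feature = {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-122.4194, 37.7749]},
        "properties": {"name": "San Francisco"},
    }
    print(json.dumps(feature, indent=2))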
Real-world data is massive and overwhelming to work with — just drawing your own fake maps in geojson.io and working with that might make some of its concepts easier to digest.
Maperitive[1] is a free and relatively straightforward app focused on taking geo data as input and outputting maps. Work with its rendering rules and you'll understand some of the challenges of rendering at different zoom levels or in different contexts.
Then this post from 2018[2] on Tippecanoe (tile and data converter), TileServer GL (tile server), and Leaflet (JavaScript front end for viewing served tiles) covers how to round-trip a package of vector tiles to GeoJSON data and back. It's straightforward, works with a relatively small area of data, doesn't require GIS experience, and though outdated, it's still relevant for understanding by practice how a data-to-tiles pipeline can work.
Raster tiles are a little difficult to recommend learning, as tooling has mostly moved on from them in favor of vector tiles, which pack more information and flexibility into less data. I honestly don't know what tools still reliably do that work; once upon a time I used TileMill, but it was already abandoned by then and has been only lightly maintained since.
Re: optimization, here's another more advanced post[3] using real-world data that illustrates some of the challenges.
The end-game is to get to a point where you can open something like QGIS[4], a heavyweight tool that can do all of the above and way too much more, or Maputnik[5], a vector tile styling tool using a CSS-ish language, and not get immediately lost.
> I would really like to ask why, on some mapping software, you can't see names of large enough cities/places/regions that are obviously there.
You won't get a great answer to the "why" of that, I'm afraid. It's dependent on, and configured in, whatever the front end is; it's generally done algorithmically, and in some cases manually edited. An art as much as a science, and as fallible as both combined. (See Justin O'Beirne's incredible reviews of Apple Maps updates[6] for an example.)
No single labeling strategy will make anyone (much less everyone) happy and most end-user tools don't expose customizability.
> I wish I could learn to build my own tiles, vector or PNG. I don't really understand where the data comes from, or how the data is gathered and assembled.
There are many data providers out there. You might be interested in OpenMapTiles, which is a pipeline from OpenStreetMap (OSM) data.
https://github.com/openmaptiles/openmaptiles
> I'm also really curious about the choices involving zoom levels: how do you decide what to render depending on the zoom level, and when is data discarded, to trade off good detail against better performance and lighter tiles? I would really be willing to try building lighter maps so I can have my own mapping software on a desktop machine.
Lots of different considerations - is a human going to look at the map? If so, then a cartographer will determine what is going to be shown at a given scale. There are other constraints too, such as limited space to show data, and also hidden constraints, such as a maximum amount of data for a region (e.g. ~500kb per tile in the case of Mapbox vector tiles).
> The data sizes and hardware requirements involved are generally pretty big. It could be interesting to see how much detail one could achieve to make a "portable" map browser when limiting the data size to 2GB, 5GB or 10GB.
Lots of projects out there are doing impressive things here. Quadtree tiles get you so far; k-d trees might yield other useful properties. Skobbler have some pretty impressive data-compression technology (~12GiB for global coverage, routable and searchable... with some limitations - skobbler.com/apps). Of course, the trick is to discard everything you don't need.
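For reference, the standard slippy-map quadtree addressing is only a few lines; this is the usual OSM formula:

    import math

    def lonlat_to_tile(lon, lat, z):
        # Web Mercator quadtree: which cell of the 2^z x 2^z grid a point falls in
        n = 2 ** z
        x = int((lon + 180.0) / 360.0 * n)
        y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
        return x, y

    print(lonlat_to_tile(-0.1276, 51.5072, 12))  # a zoom-12 tile over London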
> I would really like to ask why, on some mapping software, you can't see names of large enough cities/places/regions that are obviously there. It often makes it difficult to browse maps properly.
If there is limited budget then the effort to create appropriate labels is limited. Data sources can be limited / incomplete...there can be nuances between jurisdictions etc. and of course label prioritisation has been a longstanding problem. What happens when you rotate a map and the text labels collide with one another...which ones do you keep...which do you discard etc. These things are also context dependent...why not include continent names? Or region names? Or province names? What about the difference between physical and political geography? A cartographer can help ensure that the right information is available at the right time...whilst acknowledging that they have to tell little white lies in every map they make.
I have a pal who does GIS work in the oil and gas industry; I think it's crazy how much influence ESRI has on that market. Would love to learn more about interacting with map data like this.
For a non-gis person this was a fun read. So thanks for the post!
I have the same sentiment on ESRI. It’s basically all I got taught in my university courses, but it’s not what I ever want to use.
It’s crazy that a privately held company holds like a third of the market share.
And personally, I don't think their software is that good. I find their documentation lacking and their solutions rigid.
Case in point: the geodatabase (gdb) standard is purposefully meant to obfuscate the data within. No one has ever been able to explain to me why this is, even though the standard has been open-sourced by now.
Not to mention, the number of times I’ve had ArcMap crash without any helpful information as to why it crashed...
That said, ArcMap is the Excel of GIS. It captured market share (especially government contracts) two or three decades ago and no one has disrupted the desktop GIS platform. On the web front, however, I see companies like MapBox far outpacing anything ESRI is capable of yet.
And to anyone looking to learn GIS: PostGIS, GDAL, and any scripting language will make you more powerful than most of the people I know within the field.
> And to anyone looking to learn GIS: PostGIS, GDAL, and any scripting language will make you more powerful than most of the people I know within the field.
Funny this article was posted today, because yesterday I was looking into rendering a custom map for a ~100x100 km area from OpenStreetMap data for a particular application. I've got basically no experience making maps but I've dabbled with GDAL and Rasterio. I was thinking of using Mapnik with a dump of (part of) the OpenStreetMap database into a local PostGIS instance. Ideally the rendered tiles should be vector format. Do you think this approach seems reasonable or am I missing a potentially simpler way?
Static; thanks for the info. I would ideally like to dump a bunch of SVG tiles for various zoom levels so I can store them in a static directory on my server rather than serve them dynamically. I take it that Mapnik is capable of dumps like this? And, I would like to use the Python bindings but they look relatively badly documented. Would you suggest a newbie like me uses the C or XML interfaces instead, if they are better documented?
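For what it's worth, the most basic single-tile render I've pieced together from the Python binding examples looks roughly like this (an untested sketch assuming a prepared style.xml that defines the data sources, styling, and a Web Mercator projection). It renders a PNG; presumably the Cairo path would be the way to get SVG out, but I haven't gotten that far:

    import mapnik

    m = mapnik.Map(256, 256)
    mapnik.load_map(m, "style.xml")   # datasources + styling live in the XML
    # the full zoom-0 Web Mercator extent; swap in the bbox of the tile you want
    m.zoom_to_box(mapnik.Box2d(-20037508.34, -20037508.34, 20037508.34, 20037508.34))
    im = mapnik.Image(256, 256)
    mapnik.render(m, im)
    im.save("tile_0_0_0.png", "png")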
Ah, I haven't really gotten into that territory with Mapnik. But to the second point, yes, you should just generate a bunch of raster tiles. And before doing this, ask yourself if you really need to.
If this isn't a huge project, Mapbox is an easy packaged solution. Otherwise, there are dozens of really good tile providers already.
oblig. mention for others outside of the ESRI sphere of influence that QGIS exists, is still FOSS, and still actively developed https://qgis.org/en/site/