MS RFC 22a: Feature cache for long running processes and query processing¶
- Date:
2007/06/24
- Author:
Tamas Szekeres
- Contact:
szekerest at gmail.com
- Last Edited:
2007/06/24
- Status:
Discussion Draft
- Version:
MapServer 5.0
1. Overview¶
Currently the various query operations involve multiple accesses to the data providers which may cause a significant performance impact depending on the providers. In the first phase all of the features in the given search area are retrieved and the index of the relevant shapes are stored in the result cache. In the second phase the features in the result cache are retrieved form the provider one by one. Retaining the shapes in the memory we could eliminate the need of the subsequent accesses to the providers and increase the overall performance of the query. Implementing the cache requires a transformation of the data between the data provider and the client. From this aspect it is desirable to provide a framework to implement this transformation in a higher level of abstraction.
2. Purpose¶
The main purpose of this RFC is to implement a feature cache and retain the shapes in the memory for the further retrievals during a query operation. However I would also want to create a mechanism so that a layer could use another layer as a data source. This outer layer could apply transformations on the shapes coming from the base layer or even retain the shapes in the memory for the subsequent fetches. Here are some other examples where this framework would provide a solution:
Constructing geometries based on feature attributes
Modifying the geometries or the feature attribute values
Geometry transformations (like GEOS operations)
Feature cache
Providing default layer style based on external style information, or attribute based styling
Providing custom data filters
Setting up a proper layer hierarchy one can solve pretty complex issues without the need of creating additional (eg. mapscript) code. Later on - as a live example - I'll show up the solution of the following scenario:
The user will select one or more shapes in one layer (by attribute selection in this case).
Cascaded transformations will be applied on the selected shapes (I'll use the GEOS convexhull and buffer operations)
In another layer the features will be selected based on the transformed shapes as the selection shapes.
In this proposal to ensure the better readability I'll avoid embedding much of the code inline. However most of the implementation patches are available at the tracking ticket of this RFC (see later).
3. General principles of the solution¶
This proposal involves writing additional data providers. These providers will use another layer as the source of the features. To set up this hierarchy of the layers, the CONNECTION parameter of these layers will contain the name of the source layer. In some cases the source layer doesn't participate in the renderings and should be kept internal to the original layer. Therefore we will establish the support for nesting the layers into each other. In this regard the CONNECTION parameter will contain the full path of the layer it refers to. The path contains the layer names in the hierarchy using the slash '/' as the separator between the names. We can specify the pathnames relative to the actual layer or relative to the map itself.
Specify a reference relative to the map:
LAYER
NAME "roads"
CONNECTIONTYPE OGR
CONNECTION "roads.tab"
...
END
LAYER
NAME "cache"
CONNECTIONTYPE CACHE
CONNECTION "/roads"
...
END
Specify a reference relative to the outer layer
LAYER
NAME "cache"
CONNECTIONTYPE CACHE
CONNECTION "roads"
...
LAYER
NAME "roads"
CONNECTIONTYPE OGR
CONNECTION "roads.tab"
...
END
END
The main difference between these 2 options is that in the first case the referred layer is contained by the layers collection of the map, so the layer will participate in the drawings. In the second case the referred layer is internal to the outer layer. It is supported that 2 or more layers connect to the same base layer so that the features will be served from the same cache repository. The base layer can reside in any place of the layer hierarchy.
In any case, the extension layer can also be implemented as a pluggable external layer. (CONNECTIONTYPE PLUGIN)
To achieve the desired functionality the following 3 providers will be implemented:
3.1 Feature caching provider¶
The purpose of this provider is to retain the features from the source layer in the memory so the subsequent fetches on the feature caching provider will be served from the internal cache. With the current implementation I'm planning to retain the shapes of last extent. When the subsequent shape retrieval refers to the same extent (or within that extent) the features will be served from the cache. At the moment I will not address caching features from multiple non overlapping extents and implement more sophisticated cache management (like size limit/expiration handling) but it might be the object of an enhancement in the future. All of the provider specific data will be placed in the layerinfo structure of the provider. The shapes in the cache will be stored in a hashtable. This provider will be implemented in a separate file (mapcache.c).
3.1.1 Shape retrieval options¶
This provider will be capable to populate the cache with all of the shapes in the given extent or only with the shapes in the resultcache of the source layer. The first one is the default option and the latter can be selected by the following PROCESSING directive:
PROCESSING "fetch_mode=selection"
The cache will be populated upon the WhichShapes call of the caching provider when the given rect falls outside of the previous one. We can also specify that the provider will retieve all the shapes of the current map extent regardless to the search area using the following setting:
PROCESSING "spatial_selection_mode=mapextent"
3.1.2 Items selection¶
The feature caching provider will store all of the items available regardless of which items have been selected externally (by using the WhichItems call). However the iteminfo will contain the indexes only of the requested items. The GetShape and the NextShape operations will copy only the requested subset of the items to the caller. This solution will provide the compatibility with any other existing provider so there's no need to alter the common mapserver code from this aspect.
3.1.3 Support for the STYLEITEM "AUTO" option¶
By using the STYLEITEM "AUTO" option the provider can retrieve the classObj of every shapeObj from the data source. So as to keep on supporting this option the caching provider will be capable to cache these classObj-s as well as the shapeObj-s in a separate hashtable. When the STYLEITEM "AUTO" option is set on the feature caching provider the style cache will also be populated by calling the GetAutoStyle function to the source layer. The subsequent GetAutoStyle calls on the feature caching provider will retrieve the classObj-s from the style cache and provide a copy to the caller.
3.1.4 Support for the attribute filter¶
The feature caching provider will be compatible with the existing query operations (implemented in the msQueryBy[] functions in mapquery.c). For supporting the msQueryByAttributes the NextShape will use msEvalExpression before returning the shapes to the caller.
3.2 Geometry transformation provider¶
The geometry transformation provider (implemented in mapgeomtrans.c) will support transparent access to the source layer. Every vtable operations will be dispatched to corresponding vtable operation of the source layer. In addition some of the other layer properties might require to copy between the connected layers.
3.2.1 Items selection¶
This provider will serve the same subset of the items as the source layer provides. Therefore the InitIteminfo will copy the items and the numitems of the external layer to the source layer, like:
int msGeomTransLayerInitItemInfo( layerObj *layer )
{
/* copying the item array an call the subsequent InitItemInfo*/
return msLayerSetItems(((GeomTransLayerInfo*)layer->layerinfo)->srclayer, layer->items, layer->numitems);
}
And upon the GetItems call these values will be copied back to the external layer, like:
int msGeomTransLayerGetItems(layerObj *layer)
{
/* copying the item array back */
int result;
GeomTransLayerInfo* layerinfo = (GeomTransLayerInfo*)layer->layerinfo;
result = msLayerGetItems(layerinfo->srclayer);
msLayerSetItems(layer, layerinfo->srclayer->items, layerinfo->srclayer->numitems);
return result;
}
3.2.2 Applying the transformations¶
The proposed implementation will support most of the GEOS transformations supported by mapserver (buffer, convexhull, boundary, union, intersection, difference, symdifference). The transformations will be applied right before retrieving a shape to the caller. The desired transformation can be selected using a PROCESSING option, like:
PROCESSING "transformation=buffer"
Some of the transformations will accept further parameters:
PROCESSING "buffer_width=0.2"
Some of these operations will use 2 shapes to create the transformed shape. The reference shape of these transformations can be set externally using the WKT representation of a processing option:
PROCESSING "ref_shape=[WKT of the shape]"
I'm also planning to support retrieving this shape from the features collection of the layer and from an external layer as well.
3.3 Layer filter provider¶
The layer filter provider (implemented in mapfilter.c) will provide the shape retrieved from the source layer and will not apply any transformation on that. However some of the shapes will be skipped in the NextShape operations depending on the spatial and the attribute selection options. This provider ensures transparent access to the source layer (just as the previous one) but uses a cache for storing the selection shapes. The selection shapes will be retrieved from another layer which can be specified in the following processing option:
PROCESSING "selection_layer=[path to the layer]"
When populating the selection cache I'll support to retrieve all of the shapes or only the shapes in the result cache as for the caching provider mentioned before:
PROCESSING "fetch_mode=selection"
4. Putting the things together (example)¶
In this chapter I'll describe the solution of the scenario have been mentioned earlier. I'll start with a simple map file definition with 2 layers the counties and the citypoints of the country:
MAP
NAME "Hungary"
STATUS ON
EXTENT 421543.362603 47885.103526 933973.563202 367180.479761
SIZE 800 600
IMAGETYPE PNG
IMAGECOLOR 255 255 255
PROJECTION
"proj=somerc"
"lat_0=47.14439372222222"
"lon_0=19.04857177777778"
"x_0=650000"
"y_0=200000"
"ellps=GRS67"
"units=m"
"no_defs"
END
SYMBOL
Name 'point'
Type ELLIPSE
POINTS
1 1
END
FILLED TRUE
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_Counties"
CONNECTIONTYPE OGR
CONNECTION "Hun_Megye.TAB"
STATUS default
TYPE POLYGON
LABELITEM "name"
CLASS
TEMPLATE "query.html"
LABEL
SIZE medium
COLOR 64 128 64
END
STYLE
COLOR 208 255 208
OUTLINECOLOR 64 64 64
END
END
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_CityPoints"
CONNECTIONTYPE OGR
CONNECTION "Hun_CityPoints.TAB"
STATUS default
TYPE POINT
CLASS
STYLE
COLOR 255 0 0
SYMBOL "point"
SIZE 2
END
END
END
END
This map will look like this:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample.png
4.1 Adding the feature cache for the layers¶
Any of the layers might be cached by using the CACHE layer provider. The original provider might be nested inside the cache. The parameters related to the drawing should be specified for the outer layer, like:
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_Counties_cache"
CONNECTIONTYPE CACHE
CONNECTION "Hun_Counties"
STATUS default
TYPE POLYGON
LABELITEM "name"
CLASS
TEMPLATE "query.html"
LABEL
SIZE medium
COLOR 64 128 64
END
STYLE
COLOR 208 255 208
OUTLINECOLOR 64 64 64
END
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_Counties"
CONNECTIONTYPE OGR
CONNECTION "Hun_Megye.TAB"
TYPE POLYGON
END
END
The Hun_Counties_cache layer is queryable and all of the existing methods are available to populate the resultcache. In my example I'll use the mapscript C# drawquery example (implemented in drawquery.cs) to execute an attribute query and use the drawquery afterwards. The following command will be used along with this sample:
drawquery sample.map "('[Name]'='Tolna')" sample.png
Which passes the ('[Name]'='Tolna') query string to the queryByAttributes of the first layer.
The result of the rendering will look like:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample1.png
The corresponding mapfile can be found here:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample1.map
4.2 Applying transformations on the counties¶
In the next step I'll apply a cascaded GEOS convexhull and buffer operations on the first layer. The results will be rendered in a separate layer using the following definition:
LAYER
NAME "selectionshape"
CONNECTIONTYPE GEOMTRANS
CONNECTION "simplify"
STATUS default
TYPE POLYGON
TRANSPARENCY 50
CLASS
STYLE
COLOR 64 255 64
OUTLINECOLOR 64 64 64
END
END
PROCESSING "transformation=buffer"
PROCESSING "buffer_width=0.2"
LAYER
NAME "simplify"
TYPE POLYGON
CONNECTIONTYPE GEOMTRANS
CONNECTION "/Hun_Counties_cache"
PROCESSING "transformation=convexhull"
END
END
The result of the rendering will look like:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample2.png
The corresponding mapfile:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample2.map
Which is not the desired result since all of the polygons were transformed. So as to transform only the selected shape an additional cache should be specified by using the "fetch_mode=selection" processing option:
LAYER
NAME "selectionshape"
CONNECTIONTYPE GEOMTRANS
CONNECTION "simplify"
STATUS default
TYPE POLYGON
TRANSPARENCY 50
CLASS
STYLE
COLOR 64 255 64
OUTLINECOLOR 64 64 64
END
END
PROCESSING "transformation=buffer"
PROCESSING "buffer_width=0.2"
LAYER
NAME "simplify"
TYPE POLYGON
CONNECTIONTYPE GEOMTRANS
CONNECTION "selectioncache"
PROCESSING "transformation=convexhull"
LAYER
NAME "selectioncache"
TYPE POLYGON
CONNECTIONTYPE CACHE
CONNECTION "/Hun_Counties_cache"
PROCESSING "fetch_mode=selection"
END
END
END
And here is the result of the drawing:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample3.png
The corresponding mapfile:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample3.map
4.3 Using the transformed shape as the selection shape¶
My final goal is to select the features of the point layer using the transformed shape as the selection shape. Therefore I'll have to use the layer filter provider on the point layer and setting the selection_layer to the transformed layer:
LAYER
TYPE POINT
CONNECTIONTYPE LAYERFILTER
CONNECTION "Hun_CityPoints"
NAME "selectedpoints"
STATUS default
PROCESSING "selection_layer=/selectionshape"
CLASS
STYLE
COLOR 255 0 0
SYMBOL "point"
SIZE 2
END
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_CityPoints"
CONNECTIONTYPE OGR
CONNECTION "Hun_CityPoints.TAB"
TYPE POINT
END
END
Here is the result:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4.png
The corresponding mapfile:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4.map
Note1: Without altering the map configuration I can modify the selection externally and the rendered image will reflect the changes automatically. For example I can select 2 counties using the query string: ('[Name]'='Tolna' or '[Name]'='Baranya')
The resulting image can be found here:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample4a.png
Note2: If there's no need to render the selection layer I can specify that configuration inline:
LAYER
TYPE POINT
CONNECTIONTYPE LAYERFILTER
CONNECTION "Hun_CityPoints"
NAME "selectedpoints"
STATUS default
PROCESSING "selection_layer=selectionshape"
CLASS
STYLE
COLOR 255 0 0
SYMBOL "point"
SIZE 2
END
END
LAYER
PROJECTION
"AUTO"
END
NAME "Hun_CityPoints"
CONNECTIONTYPE OGR
CONNECTION "Hun_CityPoints.TAB"
TYPE POINT
END
LAYER
NAME "selectionshape"
CONNECTIONTYPE GEOMTRANS
CONNECTION "simplify"
STATUS default
TYPE POLYGON
TRANSPARENCY 50
CLASS
STYLE
COLOR 64 255 64
OUTLINECOLOR 64 64 64
END
END
PROCESSING "transformation=buffer"
PROCESSING "buffer_width=0.2"
LAYER
NAME "simplify"
TYPE POLYGON
CONNECTIONTYPE GEOMTRANS
CONNECTION "selectioncache"
PROCESSING "transformation=convexhull"
LAYER
NAME "selectioncache"
TYPE POLYGON
CONNECTIONTYPE CACHE
CONNECTION "/Hun_Counties_cache"
PROCESSING "fetch_mode=selection"
END
END
END
END
Here is the result:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample5.png
And the corresponding mapfile:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/sample5.map
And recall that because I've used a cache on the counties layer all of these renderings require to retrieve the shapes only once from the original provider. That was the primary objective of this RFC.
5. Modifying the mapserver core¶
To achieve the desired fuctionality the following major steps should be done in the mapserver core. The details of the proposed changes can be found here:
http://trac.osgeo.org/mapserver/attachment/ticket/2128/common.patch
5.1 Hashtable implementation¶
The current hashtable implementation (in maphash.c) will be generalized to be capable to store objects as well as strings. Currently the hashtable can store only string values. In addition the we will provide support for specifying the hashsize upon the construction of the hashtable. The objects will be stored in the hashtable by reference and the destroy function of the objects can also be specified externally. The following functions will be added to maphash.c
int initHashTableEx( hashTableObj *table, int hashSize );
void msClearHashItems( hashTableObj *table );
struct hashObj *msInsertHashTablePtr(hashTableObj *table, const char *key, const char *value);
struct hashObj *msFirstItemFromHashTable( hashTableObj *table);
struct hashObj *msNextItemFromHashTable( hashTableObj *table, struct hashObj *lastItem );
int msGetHashTableItemCount(hashTableObj *table);
initHashTableEx can be called to specify the hash size externally. Actually the original initHashTable will be implemented as calling initHashTableEx with MS_HASHSIZE. msClearHashItems will clear all of the elements of the hashtable but does not clear the items array. msInsertHashTablePtr provides the support for adding object by reference to the hashtable. msFirstItemFromHashTable and msNextItemFromHashTable will provide the iteration of the elements efficiently and will be called by the NextShape of the feature caching provider. msGetHashTableItemCount will retrieve the actual number of the hashtable items.
5.2 Extending the layerObj structure to support nesting the layers (map.h)¶
The sublayers in the layerObj structure will be stored as an array of layers.
typedef struct layer_obj {
...
#ifndef SWIG
struct layer_obj **layers;
int numlayers; /* number of sublayers in layer */
#endif /* SWIG */
...
} layerObj;
The parser will take care of loading the nested layers:
int loadLayer(layerObj *layer, mapObj *map)
{
...
case (LAYER):
if(layer->numlayers == 0)
layer->layers = (layerObj**) malloc(sizeof(layerObj*));
else
layer->layers = (layerObj**) realloc(layer->layers, sizeof(layerObj*) * (layer->numlayers));
layer->layers[layer->numlayers]=(layerObj*)malloc(sizeof(layerObj));
if (layer->layers[layer->numlayers] == NULL) {
msSetError(MS_MEMERR, "Malloc of a new layer failed.", "loadLayer()");
return MS_FAILURE;
}
if(loadLayer(layer->layers[layer->numlayers], map) == -1) return MS_FAILURE;
layer->layers[layer->numlayers]->index = layer->numlayers; /* save the index */
++layer->numlayers;
break;
...
}
5.3 Adding a new built in data connection types (map.h)¶
A new data connection type is established for the new providers.
enum MS_CONNECTION_TYPE {MS_INLINE, MS_SHAPEFILE, MS_TILED_SHAPEFILE, MS_SDE, MS_OGR, MS_UNUSED_1, MS_POSTGIS, MS_WMS, MS_ORACLESPATIAL, MS_WFS, MS_GRATICULE, MS_MYGIS, MS_RASTER, MS_PLUGIN, MS_GEOMTRANS, MS_LAYERFILTER, MS_CACHE };
The lexer will be modified to interpret the new connection types.
5.4 Support for destroying the persistent data of the providers (map.h, maplayer.c)¶
The providers would keep persistent data between the various Connection/Close operations on that layer so we should establish a mechanism to destroy this provider specific data by adding a new method to the layervtable:
void (*LayerDestroy)(layerObj *layer);
5.5 Vtable initialization for the new data providers (maplayer.c)¶
int
msInitializeVirtualTable(layerObj *layer)
{
if (layer->baselayer)
msInitializeVirtualTable(layer->baselayer);
...
switch(layer->connectiontype) {
...
case(MS_GEOMTRANS):
return(msGeomTransLayerInitializeVirtualTable(layer));
break;
case(MS_LAYERFILTER):
return(msFilterLayerInitializeVirtualTable(layer));
break;
case(MS_CACHE):
return(msCacheLayerInitializeVirtualTable(layer));
break;
...
}
..
}
6. Files affected¶
The following files will be affected by this RFC:
maphash.c
maphash.h
maplayer.c
maplexer.l
mapfile.c
map.h
Makefile.vc
Makefile.in
mapcache.c (new)
mapgeomtrans.c (new)
mapfilter.c (new)
7. Backwards compatibility issues¶
These changes will retain mapfile and mapscript backward compatibility.
8. Bug ID¶
The ticket for RFC-22a can be found here.
Bug 2128
9. Voting history¶
None