
Caching

Introduction

RESTHeart speeds up the execution of GET requests to collections, i.e. GET /coll, via caching.

This applies when several documents need to be read from a collection and can moderate the effects of the MongoDB cursor.skip() method, which slows down linearly.

RESTHeart allows reading documents via GET requests on collections:

GET /test/coll?count&page=3&pagesize=10 HTTP/1.1

HTTP/1.1 200 OK

[
    { <DOC30> }, { <DOC31> }, ... { <DOC39> }
]

Documents are returned as paginated result sets, i.e. each request returns a limited number of documents.

Pagination is controlled via the following query parameters:

  • page: the page of the result set to return (the first page is 1)

  • pagesize: the number of documents to return (default value is 100, maximum is 1000).

Behind the scenes, this is implemented via the MongoDB cursor.skip() method.
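For example, a request for a late page such as:

GET /coll?page=10000&pagesize=10 HTTP/1.1

requires MongoDB to walk past all the documents of the preceding 9,999 pages (nearly 100,000 documents) before it can return the ten requested ones.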

The issue is that MongoDB queries with a large "skip" slow down linearly. As the MongoDB manual says:

The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to get the offset or skip position before beginning to return results. As the offset (e.g. pageNumber above) increases, cursor.skip() will become slower and more CPU intensive. With larger collections, cursor.skip() may become IO bound.

That is why the MongoDB documentation section about skips suggests:

Range queries can use indexes to avoid scanning unwanted documents, typically yielding better performance as the offset grows compared to using skip() for pagination.
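As a sketch of that suggestion, range-based pagination can be expressed with RESTHeart's filter and sort query parameters, using the last _id returned by the previous page as the lower bound of the next one (the placeholder below stands for that value):

GET /coll?filter={"_id":{"$gt":<last _id of the previous page>}}&sort={"_id":1}&pagesize=10 HTTP/1.1

This avoids the skip entirely, but it requires a sort on an indexed field and only supports stepping to the next page; the caching described below keeps the plain page/pagesize interface instead.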

How it works

If the request contains the query parameter ?cache, then the query cursor used to retrieve the pagesize documents is cached. This allows up to batchSize documents to be cached.

A subsequent request on the same collection, with the very same filter, sort, hint, etc. parameters, can reuse the cached cursor, avoiding a new query to MongoDB and speeding up the request.
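For example, assuming the test/coll collection used in the introduction, the first request below populates the cursor cache and the second one, which differs only in the page, can be served from it:

GET /test/coll?cache&page=3&pagesize=10 HTTP/1.1

GET /test/coll?cache&page=4&pagesize=10 HTTP/1.1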

Note
As with any caching system, the performance gain comes at a price: the cached pages can be stale, since cached documents could have been modified or deleted and new documents could have been created in the meantime.

Configuration

The following configuration options apply to caching:

mongo:
    # the get collection cache speeds up GET /coll?cache requests
    get-collection-cache-size: 100
    get-collection-cache-ttl: 10_000 # Time To Live, default 10 seconds
    get-collection-cache-docs: 1000 # number of documents to cache for each request
The parameters are:

  • get-collection-cache-size: the maximum number of cached cursors (default: 100)

  • get-collection-cache-ttl: the time in milliseconds that a cursor is cached before it is deleted (default: 10000, i.e. 10 seconds)

  • get-collection-cache-docs: the number of documents to cache for each query; must be >= /mongo/max-pagesize (default: 1000)
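These values can also be tuned at startup. As a sketch, assuming the standard RHO environment-variable override mechanism and the plain java -jar launcher, the cache TTL could be raised to 30 seconds without editing the configuration file:

RHO='/mongo/get-collection-cache-ttl->30000' java -jar restheart.jar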

Results

Issue 442 on GitHub reports the following performance results on a collection with more than 6 million documents.

Without caching (total time = ~12 seconds):

[screenshot: requests without caching]

With caching (total time = ~2 seconds):

[screenshot: requests with caching]

Cache consistency with transactions

To make sure that requests using caching return consistent data, transactions can be used, since the isolation property of transactions guarantees consistency.

create a session:

POST /_sessions HTTP/1.1

...
Location: http://localhost:8009/_sessions/11c3ceb6-7b97-4f34-ba3f-689ea22ce6e0

start a transaction:

POST /_sessions/11c3ceb6-7b97-4f34-ba3f-689ea22ce6e0/_txns HTTP/1.1

HTTP/1.1 201 Created
Location: http://localhost:8009/_sessions/11c3ceb6-7b97-4f34-ba3f-689ea22ce6e0/_txns/1

get data in the transaction with caching:

GET /coll?sid=11c3ceb6-7b97-4f34-ba3f-689ea22ce6e0&txn=1&cache&page=3&pagesize=10 HTTP/1.1

HTTP/1.1 200 OK

[
    { <DOC30> }, { <DOC31> }, ... { <DOC39> }
]

GET /coll?sid=11c3ceb6-7b97-4f34-ba3f-689ea22ce6e0&txn=1&cache&page=4&pagesize=10 HTTP/1.1

HTTP/1.1 200 OK

[
    { <DOC40> }, { <DOC41> }, ... { <DOC49> }
]

abort the transaction (since no data has been modified):

DELETE /_sessions/11c3ceb6-7b97-4f34-ba3f-689ea22ce6e0/_txns/1 HTTP/1.1

HTTP/1.1 204 No Content