Aggregations

Introduction

“Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.”

RESTHeart manages aggregation operations: both aggregation pipelines and map reduce functions are supported.

In both cases only inline output type is supported, i.e. no result is directly written to the DB server unless the Materialized Views is used.

The aggrs collection metadata

In RESTHeart, not only documents but also dbs and collections have properties. Some properties are metadata, i.e. they have a special meaning for RESTheart that influences its behavior.

Use the collection metadata aggrs to define aggregations. aggrs is an array of pipeline or mapReduce objects:

GET /coll/_meta HTTP/1.1


{
    "aggrs": [
        { <aggregation_1> },
        { <aggregation_2> },
        ...,
        { <aggregation_n> }
    ]
}

Aggregation pipeline

{
    "type":"pipeline",
    "uri": "<uri>",
    "stages": [
        "<stage_1>",
        "<stage_2>",
        "..."
    ],
    "allowDiskUse": true
}
Property Description Mandatory
type for aggregation pipeline operations is "pipeline" yes
uri specifies the URI when the operation is bound under the path /<db>/<collection>/_aggrs yes
stages

the MongoDB aggregation pipeline stages.

For more information refer to https://docs.mongodb.org/manual/core/aggregation-pipeline/

yes

To store stages with operators and using the dot notation, RESTHeart automatically escapes the properties keys because of MongoDB’s Restrictions on Field Names:

  • the $ prefix is “underscore escaped”, e.g. $exists is stored as _$exists
  • dots are escaped as :: e.g. SD.prop is stored as SD::prop

Map-Reduce

{
    "type": "mapReduce",
    "uri": "<uri>",
    "map": "<map_function>",
    "reduce": "<reduce_function>",
    "query": "<query>"
}
Property Description Mandatory
type for aggregation pipeline operations is "mapReduce" yes
uri specifies the URI when the operation is bound under /<db>/<collection>/_aggrs path. yes
map

the map function

For more information refer to https://docs.mongodb.org/manual/core/map-reduce/

yes
reduce the reduce function yes
query the filter query no

Examples

The following requests upsert a collection defining two aggregation operations:

  • aggregation operation test_ap bound at /coll/_aggrs/example-pipeline
  • map reduce operation test_mr bound at /coll/_aggrs/example-mapreduce
PUT /coll HTTP/1.1

{ "aggrs" : [ 
      { "stages" : [ { "$match" : { "name" : { "$var" : "n" } } },
            { "$group" : { "_id" : "$name",
                  "avg_age" : { "$avg" : "$age" }
                } }
          ],
        "type" : "pipeline",
        "uri" : "example-pipeline"
      },
      { "map" : "function() { emit(this.name, this.age) }",
        "query" : { "name" : { "$var" : "n" } },
        "reduce" : "function(key, values) { return Array.avg(values) }",
        "type" : "mapReduce",
        "uri" : "example-mapreduce"
      }
    ] }

Materialized Views

The $merge stage for the pipelines delivers the ability to create collections based on an aggregation and update those created collections efficiently, i.e. it just updates the generated results collection rather than rebuild it completely (like it would with the $out stage).

It’s as simple as adding $merge as the last stage of the pipeline.

The following example defines the aggregation /coll/_aggrs/age-by-gender that computes average ages grouping data by gender. $merge is used as the last stage of the pipeline to write computed data to the avgAgeByGender collection.

Materialized Views are available from MongoDB 4.2.

PUT /coll HTTP/1.1

{ "aggrs" : [
    { "stages" : [
        { "$group" : { "_id" : "$gender",  "avg_age" : { "$avg" : "$age" } } },
        { "$merge": { "into": "avgAgeByGender" } }
        ],
            "type" : "pipeline",
            "uri" : "age-by-gender"
        }
    ]
}

Executing the aggregation request returns no data, but thanks to the $merge stage, the new collection avgAgeByGender gets created.

GET /coll/_aggrs/avg-by-city HTTP/1.1

HTTP/1.1 200 OK
[]
GET /avgAgeByGender HTTP/1.1

HTTP/1.1 200 OK
[
    { "_id": "male", "avg_age": 34.5 }
    { "_id": "female", "avg_age": 35.6 }
]

Passing variables to aggregations

The query parameter avars allows passing variables to the aggregations.

The value of a variable can be any valid JSON. The following query parameter passes two variables, a number and an object: ?avars={ "number": 1, "object": {"a": {"json": "object" }} }

For example, the previous example aggregations both use a variable named “n”. *If the variable is not passed via the avars qparam, the request fails.

GET /coll/_aggrs/example-pipeline HTTP/1.1

HTTP/1.1 400 Bad Request
...
{
    "http status code": 400,
    "http status description": "Bad Request",
    "message": "error executing aggreation pipeline: variable n not bound"
}

Passing the variable n, the request succeeds:

GET /coll/_aggrs/example-pipeline?avars={"n":1} HTTP/1.1

HTTP/1.1 200 OK
...

Variables in stages or query

Variables can be used in aggregation pipeline stages and map reduce query as follows:

{ "$var": "<var_name>" }

In case of map reduce operation previous example, the variable was used to filter the documents to have the name property matching the variable n:

{
  "query": { "name": { "$var": "n" } },
  ...
}

Variables in map reduce functions

Variables are passed also to map and reduce javascript functions where the variable $vars can be used. For instance:

PATCH /coll HTTP/1.1

{ "aggrs" : [
     {  "map" : "function() { var minage = JSON.parse($vars).minage; if (this.age > minage ) { emit(this.name, this.age); }; }",
        "reduce" : "function(key, values) { return Array.avg(values) } }",
        "type" : "mapReduce",
        "uri" : "example-mapreduce"
      }
] }

HTTP/1.1 200 Ok
...

Note the map function; JSON.parse($vars) allows to access the variables passed with the query parameter avars

function() {
 var minage = JSON.parse($vars).minage;// <-- here we get minage from avars qparam
 if (this.age > minage ) { emit(this.name, this.age); }
};

Handling paging in aggregations

Starting from RESTHeart v4.1.8 paging must be handled explicitly by the aggregation (until this version paging was handled automatically). This allows more flexibility and better performances.

Starting from RESTHeart v4.2.0 the following aggregation variables can be used to allow handling paging in the aggregation via default page and pagesize query parameters:

  • @page the value of the page query parameter
  • @pagesize the value of the pagesize query parameter
  • @skip to be used in $skip stage, equals to (page-1)*pagesize
  • @limit to be used in $limit stage, equals to the value of the pagesize query parameter

For example, the following defines the aggregation /aggrs/paging that uses the @skip and @limit variables. As a result, the request GET /coll/_aggrs/paging?page=3&pagesize=25 skips 50 documents, returning the following 25 documents.

{
    "aggrs": [
        {
            "uri": "paging",
            "type": "pipeline",
            "stages": [
                {
                    "$skip": {
                        "$var": "@skip"
                    }
                },
                {
                    "$limit": {
                        "$var": "@limit"
                    }
                }
            ]
        }
    ]
}

Security Informations

By default RESTHeart makes sure that the aggregation variables passed as query parameters don’t include MongoDB operators.

This behavior is required to protect data from undesirable malicious query injection.

Even though is highly discouraged, is possible to disable this check by editing the following property in the restheart.yml configuration file.

### Security

# Check if aggregation variables use operators. allowing operators in aggregation variables
# is risky. requester can inject operators modifying the query

aggregation-check-operators: true