Aggregations

Introduction

“Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.”

RESTHeart manages aggregation operations: both aggregation pipelines and map reduce functions are supported.

In both cases only inline output type is supported, i.e. no result is directly written to the DB server unless the Materialized Views is used.

The aggrs collection metadata

In RESTHeart, not only documents but also dbs and collections have properties. Some properties are metadata, i.e. they have a special meaning for RESTheart that influences its behavior.

Use the collection metadata aggrs to define aggregations. aggrs is an array of pipeline or mapReduce objects:

GET /coll/_meta HTTP/1.1

{
    "aggrs": [
        { <aggregation_1> },
        { <aggregation_2> },
        ...,
        { <aggregation_n> }
    ]
}

Aggregation pipeline

{
    "type":"pipeline",
    "uri": "<uri>",
    "stages": [
        "<stage_1>",
        "<stage_2>",
        "..."
    ],
    "allowDiskUse": true
}
Property Description Mandatory
type for aggregation pipeline operations is "pipeline" yes
uri specifies the URI when the operation is bound under the path /<db>/<collection>/_aggrs yes
stages

the MongoDB aggregation pipeline stages.

For more information refer to https://docs.mongodb.org/manual/core/aggregation-pipeline/

yes

To store stages with operators and using the dot notation, RESTHeart automatically escapes the properties keys because of MongoDB’s Restrictions on Field Names:

  • the $ prefix is “underscore escaped”, e.g. $exists is stored as _$exists
  • dots are escaped as :: e.g. SD.prop is stored as SD::prop

Map-Reduce

{
    "type": "mapReduce",
    "uri": "<uri>",
    "map": "<map_function>",
    "reduce": "<reduce_function>",
    "query": "<query>"
}
Property Description Mandatory
type for aggregation pipeline operations is "mapReduce" yes
uri specifies the URI when the operation is bound under /<db>/<collection>/_aggrs path. yes
map

the map function

For more information refer to https://docs.mongodb.org/manual/core/map-reduce/

yes
reduce the reduce function yes
query the filter query no

Examples

The following requests update the collection metadata defining two aggregation operations:

  • an aggregation operation bound at /coll/_aggrs/example-pipeline
  • a map reduce operation bound at /coll/_aggrs/example-mapreduce
Request
PUT /coll HTTP/1.1

{
  "aggrs": [
    {
      "stages": [
        { "$match": { "name": { "$var": "n" } } },
        { "$group": { "_id": "$name", "avg_age": { "$avg": "$age" } } }
      ],
      "type": "pipeline",
      "uri": "example-pipeline"
    },
    {
      "map": "function() { emit(this.name, this.age) }",
      "query": { "name": { "$var": "n" } },
      "reduce": "function(key, values) { return Array.avg(values) }",
      "type": "mapReduce",
      "uri": "example-mapreduce"
    }
  ]
}

Materialized Views

The $merge stage for the pipelines delivers the ability to create collections based on an aggregation and update those created collections efficiently, i.e. it just updates the generated results collection rather than rebuild it completely (like it would with the $out stage).

It’s as simple as adding $merge as the last stage of the pipeline.

The following example defines the aggregation /coll/_aggrs/age-by-gender that computes average ages grouping data by gender. $merge is used as the last stage of the pipeline to write computed data to the avgAgeByGender collection.

Materialized Views are available from MongoDB 4.2.

Request
PUT /coll HTTP/1.1

{ "aggrs" : [
    { "stages" : [
        { "$group" : { "_id" : "$gender", "avg_age" : { "$avg" : "$age" } } },
        { "$merge": { "into": "avgAgeByGender" } }
      ],
      "type" : "pipeline",
      "uri" : "age-by-gender"
    }
  ]
}

Executing the aggregation request returns no data, but thanks to the $merge stage, the new collection avgAgeByGender gets created.

Request
GET /coll/_aggrs/avg-by-city HTTP/1.1

HTTP/1.1 200 OK
[]
Request
GET /avgAgeByGender HTTP/1.1

HTTP/1.1 200 OK
[
    { "_id": "male", "avg_age": 34.5 }
    { "_id": "female", "avg_age": 35.6 }
]

Passing variables to aggregations

The query parameter avars allows passing variables to the aggregations.

The value of a variable can be any valid JSON. The following query parameter passes two variables, a number and an object: ?avars={ "number": 1, "object": {"a": {"json": "object" }} }

For example, the previous example aggregations both use a variable named n. If the variable is not passed via the avars qparam, the request fails.

Request
GET /coll/_aggrs/example-pipeline HTTP/1.1

HTTP/1.1 400 Bad Request

{
    "http status code": 400,
    "http status description": "Bad Request",
    "message": "error executing aggreation pipeline: variable n not bound"
}

Passing the variable n, the request succeeds:

Request
GET /coll/_aggrs/example-pipeline?avars={"n":1} HTTP/1.1

HTTP/1.1 200 OK

Predefined variables

The following predefined variables can be used in the aggregation definition:

variable                  description
@user the user object (excluding the password), e.g. @user.userid (for users defined in acl.yml by FileRealmAuthenticator) or @user._id (for users defined in MongoDB by MongoRealmAuthenticator)
@mongoPermissions the MongoPermissions object, e.g. @mongoPermissions.readFilter
@page the value of the page query parameter
@pagesize the value of the pagesize query parameter
@skip to be used in $skip stage, equals to (page-1)*pagesize
@limit to be used in $limit stage, equals to the value of the pagesize query parameter

Handling paging in aggregations

Paging must be handled explicitly by the aggregation-

For example, the following defines the aggregation /aggrs/paging that uses the @skip and @limit variables. As a result, the request GET /coll/_aggrs/paging?page=3&pagesize=25 skips 50 documents, returning the following 25 documents.

{
  "aggrs": [
    {
      "uri": "paging",
      "type": "pipeline",
      "stages": [
        { "$skip": { "$var": "@skip" } },
        { "$limit": { "$var": "@limit" } }
      ]
    }
  ]
}

Variables in stages or query

Variables can be used in aggregation pipeline stages and map reduce query as follows:

{ "$var": "<var_name>" }

In case of map reduce operation previous example, the variable was used to filter the documents to have the name property matching the variable n:

{
  "query": { "name": { "$var": "n" } },
  ...
}

Variables in map reduce functions

Variables are passed also to map and reduce javascript functions where the variable $vars can be used. For instance:

Request
PATCH /coll HTTP/1.1

{ "aggrs" : [
    {
      "map" : "function() { var minage = JSON.parse($vars).minage; if (this.age > minage ) { emit(this.name, this.age); }; }",
      "reduce" : "function(key, values) { return Array.avg(values) } }",
      "type" : "mapReduce",
      "uri" : "example-mapreduce"
    }
  ]
}

HTTP/1.1 200 Ok

Note the map function; JSON.parse($vars) allows to access the variables passed with the query parameter avars

function() {
 var minage = JSON.parse($vars).minage;// <-- here we get minage from avars qparam
 if (this.age > minage ) { emit(this.name, this.age); }
};

Security considerations

By default RESTHeart makes sure that the aggregation variables passed as query parameters don’t include MongoDB operators.

This behavior is required to protect data from undesirable malicious query injection.

Even though is highly discouraged, is possible to disable this check by editing the following property in the restheart.yml configuration file.

### Security

# Check if aggregation variables use operators. allowing operators in aggregation variables
# is risky. requester can inject operators modifying the query

aggregation-check-operators: true

Transaction Support

Starting from RESTHeart v6.0.6, aggregations are executed in the transaction scope if specified via the sid and txn query parameters.

For more information on how to create a transaction scope refer to transactions doc page.