RESTHeart - Aggregations

Introduction
The aggrs collection metadata
- aggregation pipeline
- Map-Reduce
Examples
Materialized Views
Passing variables to aggregations
Security information

Aggregations

Introduction

“Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.”

RESTHeart manages aggregation operations: both aggregation pipelines and map reduce functions are supported.

In both cases only inline output type is supported, i.e. no result is directly written to the DB server unless the Materialized Views is used.

The aggrs collection metadata

In RESTHeart, not only documents but also dbs and collections have properties. Some properties are metadata, i.e. they have a special meaning for RESTheart that influences its behavior.

Use the collection metadata aggrs to define aggregations. aggrs is an array of pipeline or mapReduce objects:

GET /coll/_meta HTTP/1.1

{
    "aggrs": [
        { <aggregation_1> },
        { <aggregation_2> },
        ...,
        { <aggregation_n> }
    ]
}

Aggregation pipeline

{
    "type":"pipeline",
    "uri": "<uri>",
    "stages": [
        "<stage_1>",
        "<stage_2>",
        "..."
    ],
    "allowDiskUse": true
}

Property Description Mandatory

type for aggregation pipeline operations is "pipeline" yes

uri specifies the URI when the operation is bound under the path /<db>/<collection>/_aggrs yes

Property	Description	Mandatory
type	for aggregation pipeline operations is "pipeline"	yes
uri	specifies the URI when the operation is bound under the path `/<db>/<collection>/_aggrs`	yes
stages	the MongoDB aggregation pipeline stages. For more information refer to https://docs.mongodb.org/manual/core/aggregation-pipeline/	yes

stages

the MongoDB aggregation pipeline stages.

For more information refer to https://docs.mongodb.org/manual/core/aggregation-pipeline/

yes

To store stages with operators and using the dot notation, RESTHeart automatically escapes the properties keys because of MongoDB’s Restrictions on Field Names:

the $ prefix is “underscore escaped”, e.g. $exists is stored as _$exists
dots are escaped as :: e.g. SD.prop is stored as SD::prop

Map-Reduce

{
    "type": "mapReduce",
    "uri": "<uri>",
    "map": "<map_function>",
    "reduce": "<reduce_function>",
    "query": "<query>"
}

Property	Description	Mandatory
type	for aggregation pipeline operations is "mapReduce"	yes
uri	specifies the URI when the operation is bound under /<db>/<collection>/_aggrs path.	yes
map	the map function For more information refer to https://docs.mongodb.org/manual/core/map-reduce/	yes
reduce	the reduce function	yes
query	the filter query	no

Examples

The following requests update the collection metadata defining two aggregation operations:

an aggregation operation bound at /coll/_aggrs/example-pipeline
a map reduce operation bound at /coll/_aggrs/example-mapreduce

Request

PUT /coll HTTP/1.1

{
  "aggrs": [
    {
      "stages": [
        { "$match": { "name": { "$var": "n" } } },
        { "$group": { "_id": "$name", "avg_age": { "$avg": "$age" } } }
      ],
      "type": "pipeline",
      "uri": "example-pipeline"
    },
    {
      "map": "function() { emit(this.name, this.age) }",
      "query": { "name": { "$var": "n" } },
      "reduce": "function(key, values) { return Array.avg(values) }",
      "type": "mapReduce",
      "uri": "example-mapreduce"
    }
  ]
}

Materialized Views

The $merge stage for the pipelines delivers the ability to create collections based on an aggregation and update those created collections efficiently, i.e. it just updates the generated results collection rather than rebuild it completely (like it would with the $out stage).

It’s as simple as adding $merge as the last stage of the pipeline.

The following example defines the aggregation /coll/_aggrs/age-by-gender that computes average ages grouping data by gender. $merge is used as the last stage of the pipeline to write computed data to the avgAgeByGender collection.

Materialized Views are available from MongoDB 4.2.

Request

PUT /coll HTTP/1.1

{ "aggrs" : [
    { "stages" : [
        { "$group" : { "_id" : "$gender", "avg_age" : { "$avg" : "$age" } } },
        { "$merge": { "into": "avgAgeByGender" } }
      ],
      "type" : "pipeline",
      "uri" : "age-by-gender"
    }
  ]
}

Executing the aggregation request returns no data, but thanks to the $merge stage, the new collection avgAgeByGender gets created.

Request

GET /coll/_aggrs/avg-by-city HTTP/1.1

HTTP/1.1 200 OK
[]

Request

GET /avgAgeByGender HTTP/1.1

HTTP/1.1 200 OK
[
    { "_id": "male", "avg_age": 34.5 }
    { "_id": "female", "avg_age": 35.6 }
]

Passing variables to aggregations

The query parameter avars allows passing variables to the aggregations.

The value of a variable can be any valid JSON. The following query parameter passes two variables, a number and an object: ?avars={ "number": 1, "object": {"a": {"json": "object" }} }

For example, the previous example aggregations both use a variable named n. If the variable is not passed via the avars qparam, the request fails.

Request

GET /coll/_aggrs/example-pipeline HTTP/1.1

HTTP/1.1 400 Bad Request

{
    "http status code": 400,
    "http status description": "Bad Request",
    "message": "error executing aggreation pipeline: variable n not bound"
}

Passing the variable n, the request succeeds:

Request

GET /coll/_aggrs/example-pipeline?avars={"n":1} HTTP/1.1

HTTP/1.1 200 OK

Predefined variables

The following predefined variables can be used in the aggregation definition:

variable	description
`@user`	the user object (excluding the password), e.g. `@user.userid` (for users defined in acl.yml by `FileRealmAuthenticator`) or `@user._id` (for users defined in MongoDB by `MongoRealmAuthenticator`)
`@mongoPermissions`	the `MongoPermissions` object, e.g. `@mongoPermissions.readFilter`
`@page`	the value of the `page` query parameter
`@pagesize`	the value of the `pagesize` query parameter
`@skip`	to be used in `$skip` stage, equals to `(page-1)*pagesize`
`@limit`	to be used in `$limit` stage, equals to the value of the `pagesize` query parameter

Handling paging in aggregations

Paging must be handled explicitly by the aggregation-

For example, the following defines the aggregation /aggrs/paging that uses the @skip and @limit variables. As a result, the request GET /coll/_aggrs/paging?page=3&pagesize=25 skips 50 documents, returning the following 25 documents.

{
  "aggrs": [
    {
      "uri": "paging",
      "type": "pipeline",
      "stages": [
        { "$skip": { "$var": "@skip" } },
        { "$limit": { "$var": "@limit" } }
      ]
    }
  ]
}

Variables in stages or query

Variables can be used in aggregation pipeline stages and map reduce query as follows:

{ "$var": "<var_name>" }

In case of map reduce operation previous example, the variable was used to filter the documents to have the name property matching the variable n:

{
  "query": { "name": { "$var": "n" } },
  ...
}

Variables in map reduce functions

Variables are passed also to map and reduce javascript functions where the variable $vars can be used. For instance:

Request

PATCH /coll HTTP/1.1

{ "aggrs" : [
    {
      "map" : "function() { var minage = JSON.parse($vars).minage; if (this.age > minage ) { emit(this.name, this.age); }; }",
      "reduce" : "function(key, values) { return Array.avg(values) } }",
      "type" : "mapReduce",
      "uri" : "example-mapreduce"
    }
  ]
}

HTTP/1.1 200 Ok

Note the map function; JSON.parse($vars) allows to access the variables passed with the query parameter avars

function() {
 var minage = JSON.parse($vars).minage;// <-- here we get minage from avars qparam
 if (this.age > minage ) { emit(this.name, this.age); }
};

Security considerations

By default RESTHeart makes sure that the aggregation variables passed as query parameters don’t include MongoDB operators.

This behavior is required to protect data from undesirable malicious query injection.

Even though is highly discouraged, is possible to disable this check by editing the following property in the restheart.yml configuration file.

### Security

# Check if aggregation variables use operators. allowing operators in aggregation variables
# is risky. requester can inject operators modifying the query

aggregation-check-operators: true

Transaction Support

Starting from RESTHeart v6.0.6, aggregations are executed in the transaction scope if specified via the sid and txn query parameters.

For more information on how to create a transaction scope refer to transactions doc page.

Get Started

Installation and Configuration

Core Framework

Authentication and Authorization

Data API on MongoDB

REST API

GraphQL API

Websocket API

Aggregations

Introduction

The aggrs collection metadata

Aggregation pipeline

Map-Reduce

Examples

Materialized Views

Passing variables to aggregations

Predefined variables

Handling paging in aggregations

Variables in stages or query

Variables in map reduce functions

Security considerations

Transaction Support