Edit Page

Aggregations

Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.

The RESTHeart API provides powerful aggregation capabilities to developers, including the ability to run both aggregation pipelines and map reduce functions using just a GET request.

To execute an aggregation query, simply send a GET request to /coll/_aggrs/<aggregation-uri> with the appropriate parameters!

Note
Only inline output type is supported - meaning that no results are written directly to the database. However, if you wish to have results written to the database, then Materialized Views is the solution for you.

Aggregation definition

Note
In RESTHeart, not only documents but also dbs and collections have properties. Some properties are metadata, i.e. they have a special meaning for RESTheart that influences its behavior.

You can define aggregation by setting the aggrs collection metadata property via a PATCH /coll request. Here is an example of how to structure the request body:

PATCH /coll HTTP/1.1

{
    "aggrs": [
        { <aggregation_1> },
        { <aggregation_2> },
        ...,
        { <aggregation_n> }
    ]
}

The aggregation definition objects have the following format:

{
    "uri": "<uri>",
    "stages": [
        "<stage_1>",
        "<stage_2>",
        "..."
    ],
    "allowDiskUse": true
}
Warning
For RESTHeart versions < 7.3, you also need to specify "type": "pipeline"
Property Description Mandatory

type

for aggregation pipeline operations is pipeline

No, default is pipeline (prior to v7.3, type was required)

uri

specifies the URI when the operation is bound under the path /<collection>/_aggrs

Yes

stages

the MongoDB aggregation pipeline stages. For more information refer to Aggregation Pipeline on MongoDb documentation.

Yes

allowDiskUse

Allow pipeline stages that require more than 100 megabytes of memory to write temporary files to disk

No, default is false

Example

The following request sets an aggregation pipeline bound at /coll/_aggrs/example-pipeline

PATCH /coll HTTP/1.1

{
  "aggrs": [
    {
      "stages": [
        { "$match": { "name": { "$var": "name" } } },
        { "$group": { "_id": "$name", "avg_age": { "$avg": "$age" } } }
      ],
      "uri": "example-pipeline"
    }
  ]
}

Variables

Using variables to define parameters in an aggregation allows you to include dynamic queries and create a more customized output. This can help to make the query much more specific and efficient when retrieving data from the database.

The $var Operator

In aggregation definitions, you have the flexibility to create parametric aggregations using the special operator {"$var": "name" }. This operator enables you to dynamically customize aggregations based on variable values.

Passing Variables to Aggregations

To leverage this feature, you can utilize the ?avars query parameter when making requests. By including ?avars={"name":"Bob"}, you can pass variables to aggregations. This allows you to adapt the aggregation behavior on the fly.

Variable Value Format

It’s important to note that the value of a variable can be any valid JSON. This means you can pass various data types, including strings and complex objects. For instance:

?avars={ "name": "Bob", "obj": {"a": { "json": "object" }} }

In this example, we are passing two variables: a string ("name") and an object ("obj").

Handling Missing Variables

A crucial point to remember is that if an aggregation relies on a variable, such as name, and that variable is not passed via the ?avars query parameter, the request will fail unless you define a variable with a default value. Therefore, ensure that all required variables are provided to avoid issues during aggregation execution.

GET /coll/_aggrs/example-pipeline HTTP/1.1

HTTP/1.1 400 Bad Request

{
    "http status code": 400,
    "http status description": "Bad Request",
    "message": "error executing aggreation pipeline: variable name not bound"
}

Dot Notation Support

Starting from RESTHeart version 7.6, the $var operator also supports dot notation. This means you can navigate through nested JSON structures within variables. For instance:

?avars={ "foo": {"bar": 1 }

In this case, the value of {"$var": "foo.bar" } will be resolved as 1. This enhances the flexibility and versatility of parametric aggregations, allowing for more complex and dynamic data processing.

Variables with default values

In RESTHeart version 7.3 and onwards, you can enhance the flexibility of your aggregation requests by specifying default values for variables. This feature ensures that your aggregations gracefully handle cases where a variable is not explicitly provided in the request using the ?avar query parameter.

Defining Variables with Default Values

To define a variable with a default value, utilize the following syntax: {"$var": [ "name", "default-value" ] }. This structure allows you to set a fallback value that will be used when a specific variable is not included in the request.

Default Values Format

It’s essential to note that default values can take any valid JSON format. For instance, you can set a default value like this:

{"$var": [ "s", {"name":1} ]}

In this example, the default value for variable s is {"name":1}.

Example: Using a Variable with Default Value in $sort Stage

To illustrate this concept, consider an example aggregation that incorporates a variable s with a default value in the $sort stage:

{
  "aggrs": [
    {
        "uri": "sort-with-default-example",
        "stages": [
            { "$sort": { "$var": [ "s", { "name": 1 } ] } }
        ]
    }
  ]
}

In this aggregation, if the s variable is not provided in the request, it will default to {"name": 1}. This powerful feature simplifies aggregation requests and ensures consistent behavior even when specific variables are not explicitly set.

Predefined variables

The following predefined variables can be used in the aggregation definition:

variable description

@user

the user object (excluding the password), e.g. @user._id (for users defined in MongoDB by MongoRealmAuthenticator) or @user.userid (for users defined in acl.yml by FileRealmAuthenticator)

@mongoPermissions

the MongoPermissions object, e.g. @mongoPermissions.readFilter

@page

the value of the page query parameter

@pagesize

the value of the pagesize query parameter

@skip

to be used in $skip stage, equals to (page-1)*pagesize

@limit

to be used in $limit stage, equals to the value of the pagesize query parameter

Handling paging in aggregations

Paging must be handled explicitly by the aggregation-

For example, the following defines the aggregation /aggrs/paging that uses the @skip and @limit variables. As a result, the request GET /coll/_aggrs/paging?page=3&pagesize=25 skips 50 documents, returning the following 25 documents.

{
  "aggrs": [
    {
      "uri": "paging",
      "stages": [
        { "$skip": { "$var": "@skip" } },
        { "$limit": { "$var": "@limit" } }
      ]
    }
  ]
}

Optional Stages

Note
Optional stages are available from RESTHeart 7.3.

A pipeline in RESTHeart can include optional stages, which are included only if certain variables are set using the ?avar query parameter.

To define an optional stage, you can use the $ifvar operator.

With One Required Variable

For one required variable, use the following format:

{
    "uri": "by-name",
    "stages": [
        { "$match": { "name": "foo" } },
        { "$ifvar": [ "required-variable", <stage> ] }
    ]
}

Here’s an example:

{
    "uri": "by-name",
    "stages": [
        { "$match": { "name": "foo" } },
        { "$ifvar": [ "s", { "$sort": { "$var": "s" } } ] }
    ]
}

With Multiple Required Variables

If you need to specify more than one required variable, you can use the following format:

{
    "uri": "by-name",
    "stages": [
        { "$match": { "name": "foo" } },
        { "$ifvar": [ [ <required-variables> ], <stage> ] }
    ]
}
Tip
Starting from RESTHeart v7.6, the $ifvar operator supports the dot notation for specifying variables, so it is possible to define an optional stage as { "$ifvar": [ "a.nested.var", <stage> ] }
{
    "uri": "by-name",
    "stages": [
        { "$match": { "name": "foo" } },
        { "$ifvar": [ ["a", "b" ] , { "$match": { "foo": { "$var": "a" }, "bar": { "$var": "b" } } } ] }
    ]
}

Specifying an Else Stage

It is also possible to specify an else stage, i.e., an alternative stage that is included in the aggregation if the required variables are not passed via the ?avar query parameter.

To specify an else stage, use the following format:

{
    "uri": "by-name",
    "stages": [
        { "$match": { "name": "foo" } },
        { "$ifvar": [ ["a", "b" ],
            { "$match": { "foo": { "$var": "a" }, "bar": { "$var": "b" } } },
            { "$match": { "foo": 1, "bar": 2 } } ] }
    ]
}

Materialized Views

The $merge stage for the pipelines delivers the ability to create collections based on an aggregation and update those created collections efficiently, i.e. it just updates the generated results collection rather than rebuild it completely (like it would with the $out stage).

It’s as simple as adding $merge as the last stage of the pipeline.

The following example defines the aggregation /coll/_aggrs/age-by-gender that computes average ages grouping data by gender. $merge is used as the last stage of the pipeline to write computed data to the avgAgeByGender collection.

PUT /coll HTTP/1.1

{ "aggrs" : [
    { "stages" : [
        { "$group" : { "_id" : "$gender", "avg_age" : { "$avg" : "$age" } } },
        { "$merge": { "into": "avgAgeByGender" } }
      ],
      "uri" : "age-by-gender"
    }
  ]
}

Executing the aggregation request returns no data, but thanks to the $merge stage, the new collection avgAgeByGender gets created.

GET /coll/_aggrs/avg-by-city HTTP/1.1

HTTP/1.1 200 OK
[]
GET /avgAgeByGender HTTP/1.1

HTTP/1.1 200 OK
[
    { "_id": "male", "avg_age": 34.5 }
    { "_id": "female", "avg_age": 35.6 }
]

Security considerations

By default RESTHeart makes sure that the aggregation variables passed as query parameters don’t include MongoDB operators.

This behavior is required to protect data from undesirable malicious query injection.

Even though is highly discouraged, is possible to disable this check by editing the following property in the restheart.yml configuration file.

# Check if aggregation variables use operators. https://restheart.org/docs/mongodb-rest/aggregations/#security-considerations
mongo:
    aggregation-check-operators: true

Transaction Support

Aggregations are executed in the transaction scope if specified via the sid and txn query parameters.

For more information on how to create a transaction scope refer to Transactions doc page.

Map-Reduce

Warning
map reduce are deprecated. Use aggregation pipeline instead. See Map-Reduce in MongoDb documentation.
{
    "type": "mapReduce",
    "uri": "<uri>",
    "map": "<map_function>",
    "reduce": "<reduce_function>",
    "query": "<query>"
}
Property Description Mandatory

type

for aggregation pipeline operations is "mapReduce"

yes

uri

specifies the map reduce URI under /<db>/<collection>/_aggrs path.

yes

map

the map function

yes

reduce

the reduce function

yes

query

the query

yes

Example

The following request update the collection metadata defining a map reduce operation bound at /coll/_aggrs/example-mapreduce

PUT /coll HTTP/1.1

{
  "aggrs": [
    {
      "map": "function() { emit(this.name, this.age) }",
      "query": { "name": { "$var": "n" } },
      "reduce": "function(key, values) { return Array.avg(values) }",
      "type": "mapReduce",
      "uri": "example-mapreduce"
    }
  ]
}

Variables

in query

You can use the variable in queries using the $var operator.

in map reduce functions

Variables are passed also to map and reduce javascript functions where the variable $vars can be used. For instance:

PATCH /coll HTTP/1.1

{ "aggrs" : [
    {
      "map" : "function() { var minage = JSON.parse($vars).minage; if (this.age > minage ) { emit(this.name, this.age); }; }",
      "reduce" : "function(key, values) { return Array.avg(values) } }",
      "type" : "mapReduce",
      "uri" : "example-mapreduce"
    }
  ]
}

HTTP/1.1 200 Ok

Note the map function; JSON.parse($vars) allows to access the variables passed with the query parameter avars

function() {
 var minage = JSON.parse($vars).minage;// <-- here we get minage from avars qparam
 if (this.age > minage ) { emit(this.name, this.age); }
};