Aggregations
Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.
The RESTHeart API provides powerful aggregation capabilities to developers, including the ability to run both aggregation pipelines and map reduce functions using just a GET request.
To execute an aggregation query, simply send a GET request to /coll/_aggrs/<aggregation-uri>
with the appropriate parameters!
Note
|
Only inline output type is supported - meaning that no results are written directly to the database. However, if you wish to have results written to the database, then Materialized Views is the solution for you. |
Aggregation definition
Note
|
In RESTHeart, not only documents but also dbs and collections have properties. Some properties are metadata, i.e. they have a special meaning for RESTheart that influences its behavior. |
You can define aggregation by setting the aggrs collection metadata property via a PATCH /coll
request. Here is an example of how to structure the request body:
PATCH /coll HTTP/1.1
{
"aggrs": [
{ <aggregation_1> },
{ <aggregation_2> },
...,
{ <aggregation_n> }
]
}
The aggregation definition objects have the following format:
{
"uri": "<uri>",
"stages": [
"<stage_1>",
"<stage_2>",
"..."
],
"allowDiskUse": true
}
Warning
|
For RESTHeart versions < 7.3, you also need to specify "type": "pipeline"
|
Property | Description | Mandatory |
---|---|---|
type |
for aggregation pipeline operations is |
No, default is |
uri |
specifies the URI when the operation is bound under the path |
Yes |
stages |
the MongoDB aggregation pipeline stages. For more information refer to Aggregation Pipeline on MongoDb documentation. |
Yes |
allowDiskUse |
Allow pipeline stages that require more than 100 megabytes of memory to write temporary files to disk |
No, default is |
Example
The following request sets an aggregation pipeline bound at /coll/_aggrs/example-pipeline
PATCH /coll HTTP/1.1
{
"aggrs": [
{
"stages": [
{ "$match": { "name": { "$var": "name" } } },
{ "$group": { "_id": "$name", "avg_age": { "$avg": "$age" } } }
],
"uri": "example-pipeline"
}
]
}
Variables
Using variables to define parameters in an aggregation allows you to include dynamic queries and create a more customized output. This can help to make the query much more specific and efficient when retrieving data from the database.
The $var
Operator
In aggregation definitions, you have the flexibility to create parametric aggregations using the special operator {"$var": "name" }
. This operator enables you to dynamically customize aggregations based on variable values.
Passing Variables to Aggregations
To leverage this feature, you can utilize the ?avars
query parameter when making requests. By including ?avars={"name":"Bob"}
, you can pass variables to aggregations. This allows you to adapt the aggregation behavior on the fly.
Variable Value Format
It’s important to note that the value of a variable can be any valid JSON. This means you can pass various data types, including strings and complex objects. For instance:
?avars={ "name": "Bob", "obj": {"a": { "json": "object" }} }
In this example, we are passing two variables: a string ("name"
) and an object ("obj"
).
Handling Missing Variables
A crucial point to remember is that if an aggregation relies on a variable, such as name
, and that variable is not passed via the ?avars
query parameter, the request will fail unless you define a variable with a default value. Therefore, ensure that all required variables are provided to avoid issues during aggregation execution.
GET /coll/_aggrs/example-pipeline HTTP/1.1
HTTP/1.1 400 Bad Request
{
"http status code": 400,
"http status description": "Bad Request",
"message": "error executing aggreation pipeline: variable name not bound"
}
Dot Notation Support
Starting from RESTHeart version 7.6, the $var
operator also supports dot notation. This means you can navigate through nested JSON structures within variables. For instance:
?avars={ "foo": {"bar": 1 }
In this case, the value of {"$var": "foo.bar" }
will be resolved as 1
. This enhances the flexibility and versatility of parametric aggregations, allowing for more complex and dynamic data processing.
Variables with default values
In RESTHeart version 7.3 and onwards, you can enhance the flexibility of your aggregation requests by specifying default values for variables. This feature ensures that your aggregations gracefully handle cases where a variable is not explicitly provided in the request using the ?avar
query parameter.
Defining Variables with Default Values
To define a variable with a default value, utilize the following syntax: {"$var": [ "name", "default-value" ] }
. This structure allows you to set a fallback value that will be used when a specific variable is not included in the request.
Default Values Format
It’s essential to note that default values can take any valid JSON format. For instance, you can set a default value like this:
{"$var": [ "s", {"name":1} ]}
In this example, the default value for variable s
is {"name":1}
.
Example: Using a Variable with Default Value in $sort
Stage
To illustrate this concept, consider an example aggregation that incorporates a variable s
with a default value in the $sort
stage:
{
"aggrs": [
{
"uri": "sort-with-default-example",
"stages": [
{ "$sort": { "$var": [ "s", { "name": 1 } ] } }
]
}
]
}
In this aggregation, if the s
variable is not provided in the request, it will default to {"name": 1}
. This powerful feature simplifies aggregation requests and ensures consistent behavior even when specific variables are not explicitly set.
Predefined variables
The following predefined variables can be used in the aggregation definition:
variable | description |
---|---|
|
the user object (excluding the password), e.g. |
|
the |
|
the value of the |
|
the value of the |
|
to be used in |
|
to be used in |
Handling paging in aggregations
Paging must be handled explicitly by the aggregation-
For example, the following defines the aggregation /aggrs/paging
that uses the @skip
and @limit
variables. As a result, the request GET /coll/_aggrs/paging?page=3&pagesize=25
skips 50 documents, returning the following 25 documents.
{
"aggrs": [
{
"uri": "paging",
"stages": [
{ "$skip": { "$var": "@skip" } },
{ "$limit": { "$var": "@limit" } }
]
}
]
}
Optional Stages
Note
|
Optional stages are available from RESTHeart 7.3. |
A pipeline in RESTHeart can include optional stages, which are included only if certain variables are set using the ?avar
query parameter.
To define an optional stage, you can use the $ifvar
operator.
With One Required Variable
For one required variable, use the following format:
{
"uri": "by-name",
"stages": [
{ "$match": { "name": "foo" } },
{ "$ifvar": [ "required-variable", <stage> ] }
]
}
Here’s an example:
{
"uri": "by-name",
"stages": [
{ "$match": { "name": "foo" } },
{ "$ifvar": [ "s", { "$sort": { "$var": "s" } } ] }
]
}
With Multiple Required Variables
If you need to specify more than one required variable, you can use the following format:
{
"uri": "by-name",
"stages": [
{ "$match": { "name": "foo" } },
{ "$ifvar": [ [ <required-variables> ], <stage> ] }
]
}
Tip
|
The $ifvar operator supports the dot notation for specifying variables, so it is possible to define an optional stage as { "$ifvar": [ "a.nested.var", <stage> ] }
|
{
"uri": "by-name",
"stages": [
{ "$match": { "name": "foo" } },
{ "$ifvar": [ ["a", "b" ] , { "$match": { "foo": { "$var": "a" }, "bar": { "$var": "b" } } } ] }
]
}
Specifying an Else Stage
It is also possible to specify an else stage, i.e., an alternative stage that is included in the aggregation if the required variables are not passed via the ?avar
query parameter.
To specify an else stage, use the following format:
{
"uri": "by-name",
"stages": [
{ "$match": { "name": "foo" } },
{ "$ifvar": [ ["a", "b" ],
{ "$match": { "foo": { "$var": "a" }, "bar": { "$var": "b" } } },
{ "$match": { "foo": 1, "bar": 2 } } ] }
]
}
Materialized Views
The $merge
stage for the pipelines delivers the ability to create collections based on an aggregation and update those created collections efficiently, i.e. it just updates the generated results collection rather than rebuild it completely (like it would with the $out
stage).
It’s as simple as adding $merge
as the last stage of the pipeline.
The following example defines the aggregation /coll/_aggrs/age-by-gender
that computes average ages grouping data by gender. $merge
is used as the last stage of the pipeline to write computed data to the avgAgeByGender
collection.
PUT /coll HTTP/1.1
{ "aggrs" : [
{ "stages" : [
{ "$group" : { "_id" : "$gender", "avg_age" : { "$avg" : "$age" } } },
{ "$merge": { "into": "avgAgeByGender" } }
],
"uri" : "age-by-gender"
}
]
}
Executing the aggregation request returns no data, but thanks to the $merge
stage, the new collection avgAgeByGender
gets created.
GET /coll/_aggrs/avg-by-city HTTP/1.1
HTTP/1.1 200 OK
[]
GET /avgAgeByGender HTTP/1.1
HTTP/1.1 200 OK
[
{ "_id": "male", "avg_age": 34.5 }
{ "_id": "female", "avg_age": 35.6 }
]
Security considerations
By default RESTHeart makes sure that the aggregation variables passed as query parameters don’t include MongoDB operators.
This behavior is required to protect data from undesirable malicious query injection.
Even though is highly discouraged, is possible to disable this check by editing the following property in the restheart.yml
configuration file.
# Check if aggregation variables use operators. https://restheart.org/docs/mongodb-rest/aggregations/#security-considerations
mongo:
aggregation-check-operators: true
Transaction Support
Aggregations are executed in the transaction scope if specified via the sid
and txn
query parameters.
For more information on how to create a transaction scope refer to Transactions doc page.
Map-Reduce
Warning
|
map reduce are deprecated. Use aggregation pipeline instead. See Map-Reduce in MongoDb documentation. |
{
"type": "mapReduce",
"uri": "<uri>",
"map": "<map_function>",
"reduce": "<reduce_function>",
"query": "<query>"
}
Property | Description | Mandatory |
---|---|---|
type |
for aggregation pipeline operations is "mapReduce" |
yes |
uri |
specifies the map reduce URI under |
yes |
map |
the map function |
yes |
reduce |
the reduce function |
yes |
query |
the query |
yes |
Example
The following request update the collection metadata defining a map reduce operation bound at /coll/_aggrs/example-mapreduce
PUT /coll HTTP/1.1
{
"aggrs": [
{
"map": "function() { emit(this.name, this.age) }",
"query": { "name": { "$var": "n" } },
"reduce": "function(key, values) { return Array.avg(values) }",
"type": "mapReduce",
"uri": "example-mapreduce"
}
]
}
Variables
in query
You can use the variable in queries using the $var
operator.
in map reduce functions
Variables are passed also to map and reduce javascript functions
where the variable $vars
can be used. For instance:
PATCH /coll HTTP/1.1
{ "aggrs" : [
{
"map" : "function() { var minage = JSON.parse($vars).minage; if (this.age > minage ) { emit(this.name, this.age); }; }",
"reduce" : "function(key, values) { return Array.avg(values) } }",
"type" : "mapReduce",
"uri" : "example-mapreduce"
}
]
}
HTTP/1.1 200 Ok
Note the map function; JSON.parse($vars)
allows to access the
variables passed with the query parameter avars
function() {
var minage = JSON.parse($vars).minage;// <-- here we get minage from avars qparam
if (this.age > minage ) { emit(this.name, this.age); }
};