Aggregations
Introduction
“Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.”
RESTHeart manages aggregation operations: both aggregation pipelines and map reduce functions are supported.
In both cases only inline output type is supported, i.e. no result is directly written to the DB server unless the Materialized Views is used.
The aggrs collection metadata
In RESTHeart, not only documents but also dbs and collections have properties. Some properties are metadata, i.e. they have a special meaning for RESTheart that influences its behavior.
Use the collection metadata aggrs
to define aggregations. aggrs
is an array of pipeline or mapReduce objects:
GET /coll/_meta HTTP/1.1
{
"aggrs": [
{ <aggregation_1> },
{ <aggregation_2> },
...,
{ <aggregation_n> }
]
}
aggregation pipeline
{
"type": "pipeline",
"uri": "<uri>",
"stages": ["<stage_1>", "<stage_2>", "..."],
"allowDiskUse": true
}
Property | Description | Mandatory |
---|---|---|
type | for aggregation pipeline operations is "pipeline" | yes |
uri | specifies the URI when the operation is bound under the path /<db>/<collection>/_aggrs |
yes |
stages | the MongoDB aggregation pipeline stages. For more information refer to https://docs.mongodb.org/manual/core/aggregation-pipeline/ |
yes |
To store stages with operators and using the dot notation, RESTHeart automatically escapes the properties keys because of MongoDB’s Restrictions on Field Names:
- the $ prefix is “underscore escaped”, e.g.
$exists
is stored as_$exists
- dots are escaped as :: e.g.
SD.prop
is stored asSD::prop
Map-Reduce
{
"type": "mapReduce",
"uri": "<uri>",
"map": "<map_function>",
"reduce": "<reduce_function>",
"query": "<query>"
}
Property | Description | Mandatory |
---|---|---|
type | for aggregation pipeline operations is "mapReduce" | yes |
uri | specifies the URI when the operation is bound under /<db>/<collection>/_aggrs path. | yes |
map | the map function For more information refer to https://docs.mongodb.org/manual/core/map-reduce/ |
yes |
reduce | the reduce function | yes |
query | the filter query | no |
Examples
The following requests upsert a collection defining two aggregation operations:
- aggregation operation test_ap bound at
/coll/_aggrs/example-pipeline
- map reduce operation test_mr bound at
/coll/_aggrs/example-mapreduce
PUT /coll HTTP/1.1
{ "aggrs" : [
{ "stages" : [ { "$match" : { "name" : { "$var" : "n" } } },
{ "$group" : { "_id" : "$name",
"avg_age" : { "$avg" : "$age" }
} }
],
"type" : "pipeline",
"uri" : "example-pipeline"
},
{ "map" : "function() { emit(this.name, this.age) }",
"query" : { "name" : { "$var" : "n" } },
"reduce" : "function(key, values) { return Array.avg(values) }",
"type" : "mapReduce",
"uri" : "example-mapreduce"
}
] }
Materialized Views
The $merge
stage for the pipelines delivers the ability to create collections based on an aggregation and update those created collections efficiently, i.e. it just updates the generated results collection rather than rebuild it completely (like it would with the $out
stage).
It’s as simple as adding $merge
as the last stage of the pipeline.
The following example defines the aggregation /coll/_aggrs/age-by-gender
that computes average ages grouping data by gender. $merge
is used as the last stage of the pipeline to write computed data to the avgAgeByGender
collection.
Materialized Views are available from MongoDB 4.2.
PUT /coll HTTP/1.1
{ "aggrs" : [
{ "stages" : [
{ "$group" : { "_id" : "$gender", "avg_age" : { "$avg" : "$age" } } },
{ "$merge": { "into": "avgAgeByGender" } }
],
"type" : "pipeline",
"uri" : "age-by-gender"
}
]
}
Executing the aggregation request returns no data, but thanks to the $merge
stage, the new collection avgAgeByGender
gets created.
GET /coll/_aggrs/avg-by-city HTTP/1.1
HTTP/1.1 200 OK
[]
GET /avgAgeByGender HTTP/1.1
HTTP/1.1 200 OK
[
{ "_id": "male", "avg_age": 34.5 }
{ "_id": "female", "avg_age": 35.6 }
]
Passing variables to aggregations
The query parameter avars
allows passing variables to the aggregations.
The value of a variable can be any valid JSON.
The following query parameter passes two variables, a number and an object: ?avars={ "number": 1, "object": {"a": {"json": "object" }} }
For example, the previous example aggregations both use a variable named
“n”. *If the variable is not passed via the avars
qparam, the request
fails.
GET /coll/_aggrs/example-pipeline HTTP/1.1
HTTP/1.1 400 Bad Request
...
{
"http status code": 400,
"http status description": "Bad Request",
"message": "error executing aggreation pipeline: variable n not bound"
}
Passing the variable n, the request succeeds:
GET /coll/_aggrs/example-pipeline?avars={"n":1} HTTP/1.1
HTTP/1.1 200 OK
...
Variables in stages or query
Variables can be used in aggregation pipeline stages and map reduce query as follows:
{ "$var": "<var_name>" }
In case of map reduce operation previous example, the variable was used to filter the documents to have the name property matching the variable n:
{
"query": { "name": { "$var": "n" } }
}
Variables in map reduce functions
Variables are passed also to map and reduce javascript functions
where the variable $vars
can be used. For instance:
PATCH /coll HTTP/1.1
{ "aggrs" : [
{ "map" : "function() { var minage = JSON.parse($vars).minage; if (this.age > minage ) { emit(this.name, this.age); }; }",
"reduce" : "function(key, values) { return Array.avg(values) } }",
"type" : "mapReduce",
"uri" : "example-mapreduce"
}
] }
HTTP/1.1 200 Ok
...
Note the map function; JSON.parse($vars)
allows to access the
variables passed with the query parameter avars
function() {
var minage = JSON.parse($vars).minage; // <-- here we get minage from avars qparam
if (this.age > minage ) { emit(this.name, this.age); }
};
Handling paging in aggregations
Starting RESTHeart Platform v4.1.12 (and RESTHeart OSS v4.1.8) paging must be handled explicitly by the aggregation (until this version paging was handled automatically). This allows more flexibility and better performances.
Starting RESTHeart Platform v4.2.0 and (and RESTHeart OSS v4.2.0) the following aggregation variables can be used to allow handling paging in the aggregation via default page
and pagesize
query parameters:
@page
the value of thepage
query parameter@pagesize
the value of thepagesize
query parameter@skip
to be used in$skip
stage, equals to(page-1)*pagesize
@limit
to be used in$limit
stage, equals to the value of thepagesize
query parameter
For example, the following defines the aggregation /aggrs/paging
that uses the @skip
and @limit
variables. As a result, the request GET /coll/_aggrs/paging?page=3&pagesize=25
skips 50 documents, returning the following 25 documents.
{
"aggrs": [
{
"uri": "paging",
"type": "pipeline",
"stages": [
{
"$skip": {
"$var": "@skip"
}
},
{
"$limit": {
"$var": "@limit"
}
}
]
}
]
}
Security Informations
By default RESTHeart makes sure that the aggregation variables passed as query parameters don’t include MongoDB operators.
This behavior is required to protect data from undesirable malicious query injection.
Even though is highly discouraged, is possible to disable this check by editing the following property in the restheart-platform-core.yml
configuration file.
### Security
# Check if aggregation variables use operators. allowing operators in aggregation variables
# is risky. requester can inject operators modifying the query
aggregation-check-operators: true