Aggregations
Introduction
“Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result.”
RESTHeart manages aggregation operations: both aggregation pipelines and map reduce functions are supported.
In both cases only inline output type is supported, i.e. no result is directly written to the DB server unless the Materialized Views is used.
The aggrs collection metadata
In RESTHeart, not only documents but also dbs and collections have properties. Some properties are metadata, i.e. they have a special meaning for RESTheart that influences its behavior.
Use the collection metadata aggrs
to define aggregations. aggrs
is an array of pipeline or mapReduce objects:
GET /coll/_meta HTTP/1.1
{
"aggrs": [
{ <aggregation_1> },
{ <aggregation_2> },
...,
{ <aggregation_n> }
]
}
Aggregation pipeline
{
"type":"pipeline",
"uri": "<uri>",
"stages": [
"<stage_1>",
"<stage_2>",
"..."
],
"allowDiskUse": true
}
Property | Description | Mandatory |
---|---|---|
type | for aggregation pipeline operations is "pipeline" | yes |
uri | specifies the URI when the operation is bound under the path /<db>/<collection>/_aggrs |
yes |
stages | the MongoDB aggregation pipeline stages. For more information refer to https://docs.mongodb.org/manual/core/aggregation-pipeline/ |
yes |
To store stages with operators and using the dot notation, RESTHeart automatically escapes the properties keys because of MongoDB’s Restrictions on Field Names:
- the $ prefix is “underscore escaped”, e.g.
$exists
is stored as_$exists
- dots are escaped as :: e.g.
SD.prop
is stored asSD::prop
Map-Reduce
{
"type": "mapReduce",
"uri": "<uri>",
"map": "<map_function>",
"reduce": "<reduce_function>",
"query": "<query>"
}
Property | Description | Mandatory |
---|---|---|
type | for aggregation pipeline operations is "mapReduce" | yes |
uri | specifies the URI when the operation is bound under /<db>/<collection>/_aggrs path. | yes |
map | the map function For more information refer to https://docs.mongodb.org/manual/core/map-reduce/ |
yes |
reduce | the reduce function | yes |
query | the filter query | no |
Examples
The following requests update the collection metadata defining two aggregation operations:
- an aggregation operation bound at
/coll/_aggrs/example-pipeline
- a map reduce operation bound at
/coll/_aggrs/example-mapreduce
PUT /coll HTTP/1.1
{
"aggrs": [
{
"stages": [
{ "$match": { "name": { "$var": "n" } } },
{ "$group": { "_id": "$name", "avg_age": { "$avg": "$age" } } }
],
"type": "pipeline",
"uri": "example-pipeline"
},
{
"map": "function() { emit(this.name, this.age) }",
"query": { "name": { "$var": "n" } },
"reduce": "function(key, values) { return Array.avg(values) }",
"type": "mapReduce",
"uri": "example-mapreduce"
}
]
}
Materialized Views
The $merge
stage for the pipelines delivers the ability to create collections based on an aggregation and update those created collections efficiently, i.e. it just updates the generated results collection rather than rebuild it completely (like it would with the $out
stage).
It’s as simple as adding $merge
as the last stage of the pipeline.
The following example defines the aggregation /coll/_aggrs/age-by-gender
that computes average ages grouping data by gender. $merge
is used as the last stage of the pipeline to write computed data to the avgAgeByGender
collection.
Materialized Views are available from MongoDB 4.2.
PUT /coll HTTP/1.1
{ "aggrs" : [
{ "stages" : [
{ "$group" : { "_id" : "$gender", "avg_age" : { "$avg" : "$age" } } },
{ "$merge": { "into": "avgAgeByGender" } }
],
"type" : "pipeline",
"uri" : "age-by-gender"
}
]
}
Executing the aggregation request returns no data, but thanks to the $merge
stage, the new collection avgAgeByGender
gets created.
GET /coll/_aggrs/avg-by-city HTTP/1.1
HTTP/1.1 200 OK
[]
GET /avgAgeByGender HTTP/1.1
HTTP/1.1 200 OK
[
{ "_id": "male", "avg_age": 34.5 }
{ "_id": "female", "avg_age": 35.6 }
]
Passing variables to aggregations
The query parameter avars
allows passing variables to the aggregations.
The value of a variable can be any valid JSON.
The following query parameter passes two variables, a number and an object: ?avars={ "number": 1, "object": {"a": {"json": "object" }} }
For example, the previous example aggregations both use a variable named
n
. If the variable is not passed via the avars
qparam, the request
fails.
GET /coll/_aggrs/example-pipeline HTTP/1.1
HTTP/1.1 400 Bad Request
{
"http status code": 400,
"http status description": "Bad Request",
"message": "error executing aggreation pipeline: variable n not bound"
}
Passing the variable n, the request succeeds:
GET /coll/_aggrs/example-pipeline?avars={"n":1} HTTP/1.1
HTTP/1.1 200 OK
Predefined variables
The following predefined variables can be used in the aggregation definition:
variable | description |
---|---|
@user |
the user object (excluding the password), e.g. @user.userid (for users defined in acl.yml by FileRealmAuthenticator ) or @user._id (for users defined in MongoDB by MongoRealmAuthenticator ) |
@mongoPermissions |
the MongoPermissions object, e.g. @mongoPermissions.readFilter |
@page |
the value of the page query parameter |
@pagesize |
the value of the pagesize query parameter |
@skip |
to be used in $skip stage, equals to (page-1)*pagesize |
@limit |
to be used in $limit stage, equals to the value of the pagesize query parameter |
Handling paging in aggregations
Paging must be handled explicitly by the aggregation-
For example, the following defines the aggregation /aggrs/paging
that uses the @skip
and @limit
variables. As a result, the request GET /coll/_aggrs/paging?page=3&pagesize=25
skips 50 documents, returning the following 25 documents.
{
"aggrs": [
{
"uri": "paging",
"type": "pipeline",
"stages": [
{ "$skip": { "$var": "@skip" } },
{ "$limit": { "$var": "@limit" } }
]
}
]
}
Variables in stages or query
Variables can be used in aggregation pipeline stages and map reduce query as follows:
{ "$var": "<var_name>" }
In case of map reduce operation previous example, the variable was used to filter the documents to have the name property matching the variable n:
{
"query": { "name": { "$var": "n" } },
...
}
Variables in map reduce functions
Variables are passed also to map and reduce javascript functions
where the variable $vars
can be used. For instance:
PATCH /coll HTTP/1.1
{ "aggrs" : [
{
"map" : "function() { var minage = JSON.parse($vars).minage; if (this.age > minage ) { emit(this.name, this.age); }; }",
"reduce" : "function(key, values) { return Array.avg(values) } }",
"type" : "mapReduce",
"uri" : "example-mapreduce"
}
]
}
HTTP/1.1 200 Ok
Note the map function; JSON.parse($vars)
allows to access the
variables passed with the query parameter avars
function() {
var minage = JSON.parse($vars).minage;// <-- here we get minage from avars qparam
if (this.age > minage ) { emit(this.name, this.age); }
};
Security considerations
By default RESTHeart makes sure that the aggregation variables passed as query parameters don’t include MongoDB operators.
This behavior is required to protect data from undesirable malicious query injection.
Even though is highly discouraged, is possible to disable this check by editing the following property in the restheart.yml
configuration file.
### Security
# Check if aggregation variables use operators. allowing operators in aggregation variables
# is risky. requester can inject operators modifying the query
aggregation-check-operators: true
Transaction Support
Starting from RESTHeart v6.0.6, aggregations are executed in the transaction scope if specified via the sid
and txn
query parameters.
For more information on how to create a transaction scope refer to transactions doc page.