GROUP BY ($group) in MongoDB

Mongodb is not relational database, but we sometimes want to use functions similar from such databases. I wrote about JOINs in Mongo few weeks ago, but it isn’t only one option. Second and very interesting is grouping – something like GROUP BY in for example MySQL database. Do we really need grouping in Mongo? Maybe yes, maybe no, but I must implement such option for one of application modules. It isn’t difficult and we can do this very simply. Again, I will use examples from PHP and mongodb extension for this language – code should be very similar in other languages.

In contrast to JOINs and $lookup command, grouping doesn’t require MongoDB 3.2 or never version. It’s still not separate  command, but one of options for $aggregation, so we must remember about order of commands, limits  sorting caveats etc. Syntax of $group is very simple. We just set all commands to grouping as „new” _id for results. See and example:
aggregate = [];
$aggregate[] = [
    '$group' => [
        '_id' => [
            'name' => '$name',
        ],
    ],
];

$results = $this->getCollection()->aggregate($aggregate);

This code will group our results by name field. Let’s try with many fields, it’s still very simple:

$aggregate[] = [
    '$group' => [
        '_id' => [
            'name' => '$name',
            'user' => '$user',
        ],
    ],
];

Rest will will be the same as on standard find command but… as you can see, results will be different, because with grouping we don’t fetch all fields, but only grouping result (new _id). What can we do with this? Solution is simple, but may be a bit onerous – we must specify fields to fetch as other arguments:

aggregate = [];
$aggregate[] = [
    '$group' => [
        '_id' => [
            'name' => '$name',
            'user' => '$user',
        ],
    'name' => ['$first' => '$name'],
    'user' => ['$first' => '$user'],
    'birthdate' => ['$first' => '$birthdate'],
    'age' => ['$first' => '$age'],
    ],
];

This example will group records by name and user and fetch name, user, birthdate and age of them. 

Aggregation caveats

We can use grouping in aggregation, but must still remember about order of commands – it’s matter for our results! See this code:
aggregate = [];
$aggregate[] = ['$limit' => $limit];
$aggregate[] = [
    '$group' => [
        '_id' => [
            'name' => '$name',
        ],
    'name' => ['$first' => '$name'],
    'user' => ['$first' => '$user'],
    ],
];

and similar example:

aggregate = [];
$aggregate[] = [
    '$group' => [
        '_id' => [
            'name' => '$name',
        ],
    'name' => ['$first' => '$name'],
    'user' => ['$first' => '$user'],
    ],
];
$aggregate[] = ['$limit' => $limit];

Results will be completely different, because on first example we first limit records and then grouping – results will be „incorrect”, because we use only limited portion of data! Also, we must remember to sort our results BEFORE and after grouping if order of one field is import for us:

aggregate = [];
$aggregate[] = ['$match' => $filters]; // Filter records
$aggregate[] = ['$sort' => ['_sort_date' => -1]]; // Our sort date
$aggregate[] = [
    '$group' => [
        '_id' => [
            'name' => '$name',
        ],
    'name' => ['$first' => '$name'],
    'user' => ['$first' => '$user'],
    ],
];
$aggregate[] = ['$sort' => ['_sort_date' => -1]]; // Second sort - after grouping
$aggregate[] = ['$limit' => $limit]; // In the end, limit results