MongoDB aggregation – why stages order really matters

MongoDB aggregation framework is very powerful tool. It can do much more than many relation databases features, but if we do not know or do not understand some simple rules, effects can be… a bit strange. Especially if we want to use it in PHP and build aggregation step by step, as array of commands. Great example is to use sorting or limiting, one small mistake and it will return false documents. Why? Because we can forget about commands order. 

When we use standard Eloquent, it’s not a problem to first set some conditions, then add sorting and then additional conditions, everything will be fine, because it will use simple find on Mongo: 

$query = MyModel::query()->where('foo', 'bar');
$query->limit(10);
$query->orderBy('name', 'desc');
$query->where('param1', 'val1');
$results = $query->get();

 
In that example, everything will be ok, because query builder will do all required things. But what if we will try to do the same thing using aggregation framework? We can do that in raw Mongo, but use another PHP and Laravel example, it will the same except small syntax changes: 

$aggregate = [];
$aggregate = [
	'$match' => [
		'foo' => 'bar'
	]
];
$aggregate = [
	'$limit' => 10
];
$aggregate = [
	'$sort' => ['name' => -1]
];
$aggregate = [
	'$match' => [
		'param1' => 'val1'
	]
];
        
$results = MyModel::raw(function (Builder $builder) use ($aggregate)) {
	return $builder->$aggregate;
};

In that case, results will be completely different. Why? Because when we use aggregation, command order matters. It will first filter documents and receive only these with field “foo” with “bar” value, then it will limit them to just 10, then sort and in the end, use additional filter by “param1” field. As you can see, limiter inside will cause, that a lot of documents will be skipped and not included in query. Its why command order is important – it allows us to make very complex queries, modify data, join collections in different way (using $lookup) but we should now, what we want to achieve.  

Ok, so, how should be prepared aggregation to work like simple query from first example? We need to change order. Also, we can simplify that by merge two $match stages into one: 

$aggregate = [];
$aggregate = [
	'$match' => [
		'foo' => 'bar',
		'param1' => 'val1'
	]
];
$aggregate = [
	'$sort' => ['name' => -1]
];
$aggregate = [
	'$limit' => 10
];
        
$results = MyModel::raw(function (Builder $builder) use ($aggregate)) {
	return $builder->$aggregate;
};

In that case, aggregation will first filter documents, then sort them and finally use limiter – no risk, that some documents meet our requirements will be skipped.