4.1. Grouping

Grouping (GROUP BY).

4.1.1. Extract Expressions

Gather up all the aggregate columns in the query and make their expressions unique, for example:

SELECT value, sum(a), sum(a + 2), sum(a) * 2
FROM some_table

The unique aggregates would be (in no particular order):

sum(a)
sum(a + 2)

It is important they are unique because we don’t want to calculate them twice. Even if we didn’t care about the performance hit it is still critical they are unique because these expressions become the key as we aggregate and so multiple expressions would cause all values to be run twice through all the aggregate functions.

Each unique expression becomes a key and is rendered so sum(x + 1) is the same as sum(x+1). However, sum(x + 1) and sum(1 + x) are not the same thing.

4.1.2. Grouping

As we iterate the records we maintain a Redis hash called group which contains keys that represent JSON strings and a value of 1.

Since a query does not need to have a GROUP BY clause this effectively means that all the rows in the set belong to the same group. So we give this master group a true value to group on.

In any case the value is encoded to JSON - this ensures the value is both a string and unique across values and equivalent values in different data types.

The following records (grouped by a):

{"a": 123}
{"a": true}
{"a": "123"}
{"a": 123}

Would be:

"true"
"123"
"\"123\""

A query could contain multiple aggregate expressions and we want to keep them independent. Our two possible solutions is to maintain a different hash for each expression, or use a single hash but have a prefix/suffix to the keys. I chose to use the latter separated by a colon. The hash in Redis will look like this:

Key Value
count(*):"true" 1
count(*):"123" 2
count(*):"\"123\"" 1

The value is an integer that is incremented with each unique key. The number 123 appears twice.

4.1.3. Lua Processing

There are two Lua functions required to produce the final result. Lets take AVG() as an example. The two Lua functions would be:

function_avg(group, value)
function_avg_post(unique_group, group)

function_avg is run with each value as it’s encountered. This is an opportunity to track values that may be needed for post processing:

function_avg('count(*)', 123)
function_avg('count(*)', true)
function_avg('count(*)', "123")
function_avg('count(*)', 123)

Once the grouping is complete we use a post processing Lua function to calculate the final result:

function_avg_post('agg_key', 'count(*)')

The first argument is the Redis key that contains the original grouping hash so you can lookup the original count if you need to.

4.1.4. Ensure Single Row

If there is no GROUP BY clause we must return one row, even if the original set did not have any rows. At the moment the default value will always be 0, except in the case of MIN() and MAX() which is hard-coded to return null.