Aggregations

An aggregation in Cozo can be thought of as a function that acts on a stream of values and produces a single value (the aggregate).

There are two kinds of aggregations in Cozo: ordinary aggregations and semi-lattice aggregations. They are implemented differently, with semi-lattice aggregations generally faster and more powerful (only semi-lattice aggregations can be used recursively).

The power of semi-lattice aggregations derives from the additional properties they satisfy, which are those of a semilattice:

idempotency
the aggregate of a single value a is a itself, so aggregating a together with a again still yields a,

commutativity
the aggregate of a then b is equal to the aggregate of b then a,

associativity
it is immaterial where we put the parentheses in an aggregate application.
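
The three properties can be checked directly on small inputs. The following Python sketch is a model of the idea only, not Cozo's implementation; the helper names `is_idempotent`, `is_commutative`, `is_associative`, and `add` are made up for illustration. It treats an aggregation as a binary combining function and shows that `min` passes all three checks while addition fails idempotency, which is consistent with `sum` not appearing among the semi-lattice aggregations.

```python
from itertools import product


def is_idempotent(op, values):
    # Combining a value with itself must give the value back.
    return all(op(a, a) == a for a in values)


def is_commutative(op, values):
    # The order of the two inputs must not matter.
    return all(op(a, b) == op(b, a) for a, b in product(values, repeat=2))


def is_associative(op, values):
    # The placement of parentheses must not matter.
    return all(
        op(op(a, b), c) == op(a, op(b, c))
        for a, b, c in product(values, repeat=3)
    )


def add(a, b):
    return a + b


sample = [1, 2, 3, 5]
checks = (is_idempotent, is_commutative, is_associative)

min_is_semilattice = all(check(min, sample) for check in checks)
add_is_idempotent = is_idempotent(add, sample)
```

Here `min_is_semilattice` is `True` and `add_is_idempotent` is `False`: addition is commutative and associative, but adding a value to itself changes it.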

In auto-recursive semi-lattice aggregations, there are soundness constraints on what can be done with the bindings coming from the auto-recursive parts within the body of the rule. In practice you rarely need to worry about this, since the obvious ways of using this functionality are all sound. However, as with non-termination due to fresh variables introduced by function applications, Cozo does not (and cannot) check for unsoundness in this case.
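
To see why the semilattice properties matter for recursion, the sketch below models a recursive `min` aggregation as a fixed-point iteration in Python. It is an illustration only: the edge data and the function name `shortest_distances` are hypothetical, and Cozo's actual evaluation strategy may differ. Because `min` is idempotent, commutative, and associative, deriving the same distance twice or deriving distances in a different order cannot change the result, so the loop can stop as soon as a full pass improves nothing.

```python
def shortest_distances(edges, start):
    """Apply a min-aggregation over derived distances until nothing changes.

    edges maps (source, target) pairs to non-negative costs.
    """
    best = {start: 0}
    changed = True
    while changed:
        changed = False
        for (src, dst), cost in edges.items():
            if src not in best:
                continue  # nothing derived for src yet
            candidate = best[src] + cost
            # min-aggregation: keep the smaller of the old and the new value
            if candidate < best.get(dst, float("inf")):
                best[dst] = candidate
                changed = True
    return best


# Hypothetical graph, used only for this example.
edges = {("a", "b"): 4, ("a", "c"): 1, ("c", "b"): 2, ("b", "d"): 5}
distances = shortest_distances(edges, "a")
```

With these edges the route to `b` through `c` (cost 3) replaces the direct edge (cost 4), and the distance to `d` follows from the improved value.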

Semi-lattice aggregations

min(x): Aggregate the minimum value of all x.

max(x): Aggregate the maximum value of all x.

and(var): Aggregate the logical conjunction of the variable passed in.

or(var): Aggregate the logical disjunction of the variable passed in.

union(var): Aggregate the unions of var, which must be a list.

intersection(var): Aggregate the intersections of var, which must be a list.

choice(var): Returns one of the non-null values of var. If all values are null, returns null. Which value is returned is deterministic but implementation-dependent, and may change from version to version.

min_cost([data, cost]): The argument should be a list of two elements, and this aggregation chooses the list with the minimum cost.
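
As a rough model of this behaviour, the Python sketch below picks the whole `[data, cost]` pair whose cost is smallest. The function here is a plain Python stand-in rather than Cozo's implementation, and the sample pairs are invented. How ties are resolved is not stated above; this sketch simply keeps the first minimal pair it sees, which is an assumption.

```python
def min_cost(pairs):
    """Return the [data, cost] pair with the smallest cost."""
    # pair[1] is the cost; the whole pair is returned, not just the data.
    return min(pairs, key=lambda pair: pair[1])


# Hypothetical input values.
cheapest = min_cost([["x", 3], ["y", 1], ["z", 2]])
```

The result keeps both elements, so `cheapest` is `["y", 1]` rather than just `"y"`.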

shortest(var): var must be a list. Returns the shortest list among all values. Ties will be broken non-deterministically.

bit_and(var): var must be bytes. Returns the bitwise ‘and’ of the values.

bit_or(var): var must be bytes. Returns the bitwise ‘or’ of the values.

Ordinary aggregations

count(var): Count how many values are generated for var (using bag instead of set semantics).

count_unique(var): Count how many unique values there are for var.

collect(var): Collect all values for var into a list.

unique(var): Collect var into a list, keeping each unique value only once.

group_count(var): Count the occurrences of each unique value of var, putting the result into a list of lists, e.g. when applied to 'a', 'b', 'c', 'c', 'a', 'c', the result is [['a', 2], ['b', 1], ['c', 3]].
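
The difference between bag and set semantics is easiest to see on the sample values from the group_count entry. The Python sketch below models the five counting and collecting aggregations with standard-library tools; it is not how Cozo computes them, and the sorted ordering of `unique` and `group_count` is an assumption made so the output is predictable.

```python
from collections import Counter

values = ["a", "b", "c", "c", "a", "c"]

count = len(values)               # bag semantics: duplicates are counted
count_unique = len(set(values))   # set semantics: duplicates collapse
collect = list(values)            # every value, duplicates included
unique = sorted(set(values))      # each value once (sorted order is an assumption)
group_count = [[v, n] for v, n in sorted(Counter(values).items())]
```

For these values `count` is 6 while `count_unique` is 3, and `group_count` reproduces the list of lists given in the entry above.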

bit_xor(var): var must be bytes. Returns the bitwise ‘xor’ of the values.
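
One way to see why bit_xor is an ordinary aggregation while bit_and and bit_or are semi-lattice aggregations is to test idempotency. The Python sketch below models the three operations on byte strings; the helper `fold_bytes` is hypothetical, and the sketch assumes all inputs have the same length (how Cozo treats inputs of different lengths is not covered here).

```python
from functools import reduce


def fold_bytes(op, values):
    """Combine equal-length byte strings position by position with op."""
    return reduce(lambda a, b: bytes(op(x, y) for x, y in zip(a, b)), values)


def bit_and(values):
    return fold_bytes(lambda x, y: x & y, values)


def bit_or(values):
    return fold_bytes(lambda x, y: x | y, values)


def bit_xor(values):
    return fold_bytes(lambda x, y: x ^ y, values)


a = b"\x0f\xff"
b = b"\x3c\xf0"

and_is_idempotent = bit_and([a, a]) == a   # a AND a gives a back
or_is_idempotent = bit_or([a, a]) == a     # a OR a gives a back
xor_is_idempotent = bit_xor([a, a]) == a   # a XOR a is all zero bits
```

Only `xor_is_idempotent` is `False`: seeing the same value twice cancels it out, so the result depends on how many times each value occurs.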

latest_by([data, time])

The argument should be a list of two elements, and this aggregation returns the data with the maximum time. This is very similar to min_cost, with three differences: the maximum is used instead of the minimum, non-numerical costs are allowed, and only data is returned. The aggregation is deliberately not a semi-lattice aggregation.

Note

This aggregation is intended to be used in timestamped audit trails. As an example:

?[id, latest_by(status_ts)] := *data[id, status, ts], status_ts = [status, ts]

returns the latest status for each id. If you do this regularly, consider using the time travelling facility.
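
For readers who want to check their understanding of the query, the following Python sketch models what it computes: for each id, the status attached to the largest timestamp. The function name `latest_status_by_id` and the sample rows are hypothetical, and the loop is a plain model of the semantics rather than Cozo's implementation.

```python
def latest_status_by_id(rows):
    """rows are (id, status, ts) tuples; return {id: status with the largest ts}."""
    latest = {}
    for row_id, status, ts in rows:
        # Keep the pair with the maximum time seen so far for this id.
        if row_id not in latest or ts > latest[row_id][1]:
            latest[row_id] = (status, ts)
    # Only the data (the status) is returned, not the time.
    return {row_id: status for row_id, (status, _ts) in latest.items()}


# Hypothetical audit-trail rows, deliberately not in timestamp order.
rows = [
    (1, "created", 10),
    (1, "shipped", 30),
    (1, "paid", 20),
    (2, "created", 15),
]
result = latest_status_by_id(rows)
```

Even though the "paid" row arrives after the "shipped" row, `result` maps id 1 to "shipped", because only the timestamp decides.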

smallest_by([data, cost]): The argument should be a list of two elements, and this aggregation returns the data with the minimum cost. Unlike min_cost, non-numerical costs are allowed. Null values for data are ignored when comparing.

choice_rand(var): Non-deterministically chooses one of the values of var as the aggregate. Each value the aggregation encounters has the same probability of being chosen.

Note

This version of choice is not a semi-lattice aggregation since it is impossible to satisfy the uniform sampling requirement while maintaining no state, which is an implementation restriction unlikely to be lifted.
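
The need for state can be made concrete with reservoir sampling of size one, a standard technique for uniform sampling from a stream. Whether Cozo uses this exact method is an assumption; the class `ChoiceRand` below is a hypothetical Python model. The point is that the aggregate must remember how many values it has seen so far, and that counter is exactly the kind of state a semi-lattice aggregation cannot keep.

```python
import random


class ChoiceRand:
    """Uniformly pick one value from a stream, keeping a counter as state."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self.seen = 0       # state: number of values encountered so far
        self.chosen = None

    def update(self, value):
        self.seen += 1
        # Replace the current pick with probability 1 / seen, which leaves
        # every value seen so far equally likely to be the final choice.
        if self.rng.randrange(self.seen) == 0:
            self.chosen = value

    def result(self):
        return self.chosen


agg = ChoiceRand(random.Random(0))
for value in ["a", "b", "c", "d"]:
    agg.update(value)
picked = agg.result()
```

Without `seen`, the update step could not tell whether a new value should replace the current pick with probability 1/2 or 1/1000, so uniformity would be lost.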

Statistical aggregations

mean(x): The mean value of x.

sum(x): The sum of x.

product(x): The product of x.

variance(x): The sample variance of x.

std_dev(x): The sample standard deviation of x.
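
The word "sample" matters for the last two entries. Assuming it has its usual meaning (dividing by n - 1 rather than n), the Python sketch below shows the difference on a small data set using the standard-library `statistics` module. The data values are invented, and the code is a model of the definitions rather than of Cozo's implementation.

```python
import math
import statistics

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

total = sum(xs)
prod = math.prod(xs)
mean = statistics.mean(xs)

# Sample statistics divide the squared deviations by n - 1.
sample_variance = statistics.variance(xs)
sample_std_dev = statistics.stdev(xs)

# Population variance divides by n and is shown only for contrast.
population_variance = statistics.pvariance(xs)
```

The squared deviations from the mean add up to 32, so the sample variance is 32/7 while the population variance is 32/8 = 4.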