Function - (Aggregate | Aggregation)


Aggregate functions return a single value computed from values that are in an aggregation relationship (i.e. a set).

The returned values are also known as summaries because they summarize / describe a set of data.



They are generally computed over additive numeric data grouped by a class attribute.
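As an illustration, the sketch below groups hypothetical sales records by a class attribute (`region`) and sums an additive numeric measure (`amount`) per group, giving one aggregate value per class. The `Sale` record and its fields are assumptions made for the example. The grouping itself uses the standard `Collectors.groupingBy` and `Collectors.summingInt` collectors (records need Java 16 or later).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupedSum {
    // Hypothetical record: a class attribute (region) and an additive measure (amount)
    record Sale(String region, int amount) {}

    // One SUM(amount) value per region
    static Map<String, Integer> totalByRegion(List<Sale> sales) {
        return sales.stream()
                .collect(Collectors.groupingBy(
                        Sale::region,                          // class attribute used for grouping
                        TreeMap::new,                          // sorted keys for a stable printout
                        Collectors.summingInt(Sale::amount))); // aggregate applied inside each group
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
                new Sale("EU", 100), new Sale("US", 70), new Sale("EU", 50));
        System.out.println(totalByRegion(sales)); // {EU=150, US=70}
    }
}
```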


Data Processing - Selection

  • FIRST() - Returns the first value
  • LAST() - Returns the last value
  • MAX() - Returns the largest value
  • MIN() - Returns the smallest value
  • Quantile - (Median|Middle) - Returns the median
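A minimal sketch of these selection aggregates over a list of integers, assuming a non-empty list (every method throws on an empty one). FIRST and LAST depend on the order of the data, while MAX, MIN and the median do not. For an even count, the median here averages the two middle values, which is one common convention among several quantile definitions.

```java
import java.util.List;

class SelectionAggregates {
    // FIRST(): the first value in the order of the data
    static int first(List<Integer> xs) { return xs.get(0); }

    // LAST(): the last value in the order of the data
    static int last(List<Integer> xs) { return xs.get(xs.size() - 1); }

    // MAX() / MIN(): the largest / smallest value, independent of order
    static int max(List<Integer> xs) { return xs.stream().mapToInt(Integer::intValue).max().getAsInt(); }
    static int min(List<Integer> xs) { return xs.stream().mapToInt(Integer::intValue).min().getAsInt(); }

    // Median: middle value after sorting; mean of the two middle values for an even count
    static double median(List<Integer> xs) {
        int[] sorted = xs.stream().mapToInt(Integer::intValue).sorted().toArray();
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(7, 3, 9, 1, 5);
        System.out.println(first(xs) + " " + last(xs) + " " + max(xs) + " " + min(xs) + " " + median(xs));
        // prints: 7 5 9 1 5.0
    }
}
```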


Mutative operation

Mutative accumulation for a sum

// numbers: the int values to aggregate (e.g. an int[])
int sum = 0;
for (int x : numbers) {
   sum += x;   // updates the shared variable in place
}

Reduction operation

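Aggregate functions are implemented as reduction operations: a reduction takes a sequence of input elements and combines them into a single summary result by repeated application of a combining operation.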
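For comparison with the mutative loop, the same sum can be sketched as a reduction with the standard `java.util.stream` API. `reduce(0, Integer::sum)` takes an identity value and an associative combining function, and that associativity is what lets the stream split the data and process the chunks in parallel. The sample array is illustrative.

```java
import java.util.stream.IntStream;

class SumReduction {
    // Reduction: 0 is the identity, Integer::sum is the associative combining function
    static int sum(int[] numbers) {
        return IntStream.of(numbers).reduce(0, Integer::sum);
    }

    // Same reduction; associativity allows the runtime to split the array into chunks
    static int parallelSum(int[] numbers) {
        return IntStream.of(numbers).parallel().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4, 5};
        System.out.println(sum(numbers) + " " + parallelSum(numbers)); // 15 15
    }
}
```

No variable is mutated by the caller: the combining function returns a new value at each step, so the runtime is free to choose the grouping.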


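Some operations, like AVG, are not partitionable: the result cannot be obtained by applying the same function to each partition and then to the partial results, so the computation cannot simply be split up and run in parallel.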

It is not valid to compute them directly on partitioned data because they are not both commutative and associative: an average of partition averages is, in general, not the average of the whole set.
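A small worked example with two partitions of unequal size (the values are illustrative):

$$
\begin{aligned}
\mathrm{AVG}(\{1, 2, 3\}) &= 2, \qquad \mathrm{AVG}(\{4\}) = 4 \\
\mathrm{AVG}(\{2, 4\}) &= 3 \;\neq\; \mathrm{AVG}(\{1, 2, 3, 4\}) = \tfrac{10}{4} = 2.5
\end{aligned}
$$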

You can still compute partial aggregates by rewriting the function in terms of functions that are commutative and associative, computing those on each partition, and then rolling the partial results up.

Example: to compute AVG(x) on partitioned data:

  • expand AVG(x) to SUM(x) / COUNT(x)
  • compute SUM(x) and COUNT(x) on each partition
  • roll up the partial sums and the partial counts with SUM
  • divide the total sum by the total count
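The steps above can be sketched in Java as follows. Each partition is an `int[]` and the `Partial` record holding SUM(x) and COUNT(x) is a hypothetical helper introduced for the example. Merging two partials is component-wise addition, which is commutative and associative, so the partials can be rolled up with a parallel `reduce`. Note that an input with no values yields a count of 0 and therefore `NaN`.

```java
import java.util.List;

class PartitionedAverage {
    // Hypothetical partial aggregate for one partition: SUM(x) and COUNT(x)
    record Partial(long sum, long count) {
        // Component-wise addition: commutative and associative
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Compute SUM(x) and COUNT(x) on a single partition
    static Partial partial(int[] partition) {
        long s = 0;
        for (int x : partition) s += x;
        return new Partial(s, partition.length);
    }

    // Roll the partials up with SUM, then divide the total sum by the total count
    static double average(List<int[]> partitions) {
        Partial total = partitions.parallelStream()
                .map(PartitionedAverage::partial)
                .reduce(new Partial(0, 0), Partial::merge); // (0, 0) is the identity
        return (double) total.sum() / total.count();
    }

    public static void main(String[] args) {
        List<int[]> partitions = List.of(new int[] {1, 2, 3}, new int[] {4});
        System.out.println(average(partitions)); // 2.5
    }
}
```

Averaging the per-partition averages (2 and 4) would give 3 instead of the correct 2.5, which is why the sum and the count are carried separately until the final division.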

See Parallel Programming - (Function|Operation)
