Function - (Aggregate | Aggregation)

1 - About

Aggregate functions return a single value computed from values that are in an aggregation relationship (i.e. a set).

These values are also known as summaries because they summarize (describe) a set of data.

3 - List

3.1 - Computed

You generally compute these over additive numeric data grouped by a class (categorical) attribute:
  • AVG() - mean - Returns the average / the mean (sum/count)
  • COUNT() - Returns the number of rows
  • SUM() - Returns the sum
  • mode - Returns the mode (the most frequent value)
  • Cryptography - Hash - Returns a probabilistically unique string representation
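
As a minimal sketch of the computed aggregates listed above, the following Java example derives COUNT, SUM and AVG with the standard `IntSummaryStatistics` class and the mode with a frequency map. The class name `ComputedAggregates`, its methods and the sample data are illustrative assumptions, not part of the original page.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IntSummaryStatistics;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class ComputedAggregates {

    // COUNT, SUM and AVG collected in a single pass over the values.
    static IntSummaryStatistics summarize(int[] values) {
        return Arrays.stream(values).summaryStatistics();
    }

    // Mode: the most frequent value. On a tie, the value that first
    // reaches the highest frequency wins (an arbitrary choice).
    static int mode(int[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("mode of an empty set is undefined");
        }
        Map<Integer, Integer> frequencies = new HashMap<>();
        int best = values[0];
        int bestCount = 0;
        for (int v : values) {
            int count = frequencies.merge(v, 1, Integer::sum);
            if (count > bestCount) {
                best = v;
                bestCount = count;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] values = {2, 4, 4, 6};
        IntSummaryStatistics stats = summarize(values);
        System.out.println("COUNT = " + stats.getCount());
        System.out.println("SUM   = " + stats.getSum());
        System.out.println("AVG   = " + stats.getAverage());
        System.out.println("mode  = " + mode(values));
    }
}
```

Collecting COUNT and SUM together is convenient because AVG is simply their ratio.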

3.2 - Selection

Data Processing - Selection

  • FIRST() - Returns the first value
  • LAST() - Returns the last value
  • MAX() - Returns the largest value
  • MIN() - Returns the smallest value
  • Quantile - (Median|Middle) - Returns the median
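
The selection aggregates above can be sketched in Java as follows. The class `SelectionAggregates` and its method names are hypothetical, the input is assumed to be non-empty, and the median of an even number of values is taken as the mean of the two middle values, which is one common convention among several.

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only. All methods assume a non-empty array.
class SelectionAggregates {

    // FIRST and LAST depend on the order of the input.
    static int first(int[] values) {
        return values[0];
    }

    static int last(int[] values) {
        return values[values.length - 1];
    }

    // MAX and MIN do not depend on the order of the input.
    static int max(int[] values) {
        return Arrays.stream(values).max().getAsInt();
    }

    static int min(int[] values) {
        return Arrays.stream(values).min().getAsInt();
    }

    // Median: the middle value of the sorted data. For an even count,
    // this sketch averages the two middle values (an assumption).
    static double median(int[] values) {
        int[] sorted = values.clone(); // do not mutate the caller's array
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        if (sorted.length % 2 == 1) {
            return sorted[mid];
        }
        return (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        int[] values = {7, 3, 9, 1};
        System.out.println("FIRST  = " + first(values));
        System.out.println("LAST   = " + last(values));
        System.out.println("MAX    = " + max(values));
        System.out.println("MIN    = " + min(values));
        System.out.println("median = " + median(values));
    }
}
```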

4 - Implementation

4.1 - Mutative operation

Mutative accumulation for a sum (the accumulator variable is updated in place):

int[] numbers = {1, 2, 3, 4}; // sample input
int sum = 0;
for (int x : numbers) {
   sum += x; // update the accumulator in place
}

4.2 - Reduction operation

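Aggregate functions are generally implemented as reduction operations: a combining function is applied repeatedly until a single value remains.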
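
As a minimal sketch of a sum written as a reduction, the example below uses the standard `Stream.reduce(identity, accumulator)` method of the Java Stream API. The class `ReductionSum` and its method names are illustrative assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class ReductionSum {

    // 0 is the identity value and Integer::sum is the associative combining function.
    static int sum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    // Because addition is associative, the same reduction can run on a parallel stream.
    static int parallelSum(List<Integer> numbers) {
        return numbers.parallelStream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4);
        System.out.println("sum          = " + sum(numbers));
        System.out.println("parallel sum = " + parallelSum(numbers));
    }
}
```

Unlike the mutative loop, the reduction keeps no shared mutable accumulator, which is what makes the parallel variant safe.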

4.3 - Partition

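Some operations, such as AVG, are not directly partitionable: applying them to each partition and then to the partial results generally gives a wrong answer. The computation therefore can't be parallelized naively.

It is not valid to compute them on partitioned data because they are not both commutative and associative. AVG, for example, is not associative: the average of partial averages can differ from the overall average when the partitions have different sizes.

You can still compute partial aggregates by rewriting the function in terms of commutative and associative functions and then rolling the partial results up.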

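Example: to compute AVG(x) on partitioned data:

  • you want to compute AVG(x)
  • you expand it to SUM(x) / COUNT(x)
  • you compute SUM(x) and COUNT(x) on each partition
  • you roll up the partial sums and the partial counts using SUM, then divide the total sum by the total count.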
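
The steps above can be sketched in Java as follows. The class `PartitionedAverage`, the nested `Partial` type and the sample partitions are hypothetical names introduced for illustration. At least one non-empty partition is assumed, since the average of no values is undefined.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
class PartitionedAverage {

    // Partial aggregate of one partition: SUM(x) and COUNT(x).
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Roll-up step: sums and counts are both combined with addition,
        // which is commutative and associative.
        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    // Computed independently on each partition (could run in parallel).
    static Partial partial(int[] partition) {
        long sum = 0;
        for (int x : partition) {
            sum += x;
        }
        return new Partial(sum, partition.length);
    }

    // Correct: roll up the partial sums and counts, then divide once.
    static double average(List<int[]> partitions) {
        Partial total = new Partial(0, 0);
        for (int[] partition : partitions) {
            total = total.merge(partial(partition));
        }
        return (double) total.sum / total.count;
    }

    // Incorrect in general: averaging the per-partition averages.
    static double averageOfAverages(List<int[]> partitions) {
        double sumOfAverages = 0;
        for (int[] partition : partitions) {
            Partial p = partial(partition);
            sumOfAverages += (double) p.sum / p.count;
        }
        return sumOfAverages / partitions.size();
    }

    public static void main(String[] args) {
        List<int[]> partitions = Arrays.asList(new int[]{1, 2, 3}, new int[]{10});
        System.out.println("rolled-up AVG       = " + average(partitions));
        System.out.println("average of averages = " + averageOfAverages(partitions));
    }
}
```

With the sample partitions {1, 2, 3} and {10}, the rolled-up result is 16 / 4 = 4.0, whereas the average of the partial averages is (2.0 + 10.0) / 2 = 6.0, which shows why AVG cannot be applied to the partial results directly.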

See Parallel Programming - (Function|Operation)
