Relational Algebra - Expression and Operators
Table of Contents
1 - About
Relational algebra defines the relational database through a set of data operators (select, filter, join, sort, union, etc.) that defines an intermediate format for query planning/optimization.
Information requests may be expressed using set notions and set operations.
Relational algebra is about Data Flow focusing on transformations (relational operator)
Because Relational expressions (operator) process data, their names are typically verbs:
- Sort,
- Join,
- Project,
- Filter,
- Scan,
- Sample.
Merge with Sql Engine - Relational Operator (Data operations|Execution Plan Steps) ?
Relational algebra is an extension to mathematical set theory (set operation), it was devised by E.F. Code (IBM) in 1970.
where:
Programs that manipulate tabular data exhibit an algebraic structure, therefore:
- It allows a reasoning and manipulation independently of physical data representation.
- All operations on a table return a table.
- The operation can then be chained together.
Relational algebra expressions dictate how to achieve an answer by giving what operations to do and in what order to do them. A declarative language only expresses conditions that must be met in order for a result to be an answer, not how to get that answer.
For instance, SQL is the what not the how. It's clear what we want unclear how to get it
Every query returns a table. The language is algebraically closed A view is stored as an algebraic closure so it can be optimized when used with queries.
2 - Articles Related
3 - Order
By design, query results in relational algebra are unordered. Repeating the same query multiple times is not guaranteed to return results in the same order. Similarly, database-backed relational data also do not guarantee row order.
4 - Column
The column names of an operator are guaranteed to be unique.
In case of name conflict, a suffix is generally applied. For example, joining EMP to DEPT in a SCOTT schema, will result in
- a column called DEPTNO
- and another called DEPTNO_1.
5 - Relational Operator Algebra
All relational algebra operator are set-operator.
operator in Algebra.
Type | Name | Symbol | Sql Operator | Description |
---|---|---|---|---|
Normal | select, Selection | <math>\sigma</math> of s | where | select the rows |
Normal | project | <math>\pi</math> | select | select the columns |
Normal | join | <math>\Join</math> | join | join |
Normal | union | <math>\cup</math> | union (without duplicates) of union all (with duplicates) | union of tables - Tuples either in Rel 1 or in Rel 2 |
Normal | intersection | <math>\cap</math> | minus or join | same row in two tables |
Normal | Set-difference | - | except, minus | Tuples in Rel 1, but not Rel 2 |
Normal | Rename | <math>\rho</math> | as | Rename an attribute or a relation |
Extended | Grouping and aggregating | g | ||
Extended | Duplicate elimination | d | ||
Extended | Sorting | t | ||
Extended | Division | / | find values that have all properties |
6 - Bag vs Set
Relational Algebra has two semantics:
- Set semantics = standard Relational Algebra
- Bag semantics = extended Relational Algebra
Rule of thumb:
- Every paper will assume set semantics
- Every implementation will assume bag semantics
7 - Flow
Input | Query Language | Output |
---|---|---|
Sets of Tuples | Set Relational Algebra | Set of Tuples |
Bags of Tuples | Bag Relational Algebra | Bag of Tuples |
Lists of Tuples | Extended Relational Algebra | List of Tuples |
8 - Algebraic Optimization
9 - Equivalent relational expressions with different costs
10 - Library
11 - Documentation / Reference
- Bill Howe, PhD (Director of Research, Scalable Data Analytics) University of Washington - eScience Institute