# Relational Algebra - Expression and Operators

Relational algebra is based upon the fact that you can pass tabular data through a set of data operators (select, filter, join, sort, union, etc.) in a algebraic structure.

It means that:

• the output of a tabular operation is in the form of tabular data
• and can be passed as input to the next tabular operation
• and so on.

It's the same than with mathematics algebra where the output of a mathematical operation a number can be passed to the next one. $$1 + 2 - 4 = -1$$ The above expression can also be expressed in relational algebra if we see the tabular data as number. $$\text{ table_1 } join \text{ table_2 } filter = \text{ table_result }$$

Relational algebra is an extension to mathematical set theory (set operation), it was devised by E.F. Code (IBM) in 1970.

### Pipeline

Every query returns a table. The language is algebraically closed A view is stored as an algebraic closure so it can be optimized when used with queries.

Programs that manipulate tabular data exhibit an algebraic structure, therefore:

• All operations on a table return a table.
• The operation can then be chained (pipeline) together.

Relational algebra is about Data Flow focusing on transformations (relational operator)

### Query Optimization

Translating a SQL into relational algebra defines an intermediate format for query planning/optimization.

where:

Information requests may be expressed using set notions and set operations.

### Easy Verbs

Because Relational expressions (operator) process data, their names are typically verbs:

• Sort,
• Join,
• Project,
• Filter,
• Scan,
• Sample.

### The what not the how

The SQL declarative language expresses only the conditions that must be met in order for a result to be an answer, not how to get that answer.

SQL is the what not the how. It's clear what we want unclear how to get it.

Relational algebra expressions dictate how to achieve an answer by giving what operations to do and in what order to do them.

It allows a reasoning and manipulation independently of physical data representation.

## Order

By design, query results in relational algebra are unordered. Repeating the same query multiple times is not guaranteed to return results in the same order. Similarly, database-backed relational data also do not guarantee row order.

## Column

The column names of an operator are guaranteed to be unique.

In case of name conflict, a suffix is generally applied. For example, joining EMP to DEPT in a SCOTT schema, will result in

• a column called DEPTNO
• and another called DEPTNO_1.

## Relational Operator Algebra

All relational algebra operator are set-operator.

operator in Algebra.

Type Name Symbol Sql Operator Description
Normal select, Selection $\sigma$ of s where select the rows
Normal project $\pi$ select select the columns
Normal join $\Join$ join join
Normal union $\cup$ union (without duplicates)
of union all (with duplicates)
union of tables - Tuples either in Rel 1 or in Rel 2
Normal intersection $\cap$ minus or join same row in two tables
Normal Set-difference - except, minus Tuples in Rel 1, but not Rel 2
Normal Rename $\rho$ as Rename an attribute or a relation
Extended Grouping and aggregating g
Extended Duplicate elimination d
Extended Sorting t
Extended Division / find values that have all properties

## Bag vs Set

Relational Algebra has two semantics:

• Set semantics = standard Relational Algebra
• Bag semantics = extended Relational Algebra

Rule of thumb:

• Every paper will assume set semantics
• Every implementation will assume bag semantics

## Flow

Input Query Language Output
Sets of Tuples Set Relational Algebra Set of Tuples
Bags of Tuples Bag Relational Algebra Bag of Tuples
Lists of Tuples Extended Relational Algebra List of Tuples

## Documentation / Reference

• Bill Howe, PhD (Director of Research, Scalable Data Analytics) University of Washington - eScience Institute