Calcite - Relational Expression (RelNode, Algebra)

About

Relational Algebra in Calcite

A relational expression is represented by a tree of RelNode objects.

Conceptually, a RelNode is comparable to a Spark DataFrame: both describe a relational computation.

List

  • TableScan
  • Project
  • Filter
  • Aggregate
  • Join
  • Union
  • Intersect
  • Sort
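
To make the "tree of operators" idea concrete, here is a minimal, self-contained sketch. It does not use Calcite: ToyRel and ToyRelDemo are hypothetical names invented for this illustration, and the attribute strings are copied from the sample plan on this page. It only shows how operators nest as inputs of one another and how an explain-like listing can be produced by walking the tree.

```java
import java.util.List;

/** Toy stand-in for a relational expression node. NOT Calcite's RelNode. */
final class ToyRel {
    final String op;           // operator name, e.g. "Filter"
    final String attrs;        // printable attributes, e.g. "condition=[=($0, 100)]"
    final List<ToyRel> inputs; // child expressions, stored in an unmodifiable list

    ToyRel(String op, String attrs, ToyRel... inputs) {
        this.op = op;
        this.attrs = attrs;
        this.inputs = List.of(inputs);
    }

    /** Renders the tree top-down, indenting each level by two spaces. */
    String explain() {
        StringBuilder sb = new StringBuilder();
        explain(sb, 0);
        return sb.toString();
    }

    private void explain(StringBuilder sb, int depth) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(op).append('(').append(attrs).append(")\n");
        for (ToyRel input : inputs) {
            input.explain(sb, depth + 1);
        }
    }
}

final class ToyRelDemo {
    /** Builds a Project over a Filter over a Join of two TableScans. */
    static ToyRel samplePlan() {
        ToyRel emps = new ToyRel("TableScan", "table=[[HR, emps]]");
        ToyRel depts = new ToyRel("TableScan", "table=[[HR, depts]]");
        ToyRel join = new ToyRel("Join", "condition=[=($1, $5)], joinType=[inner]", emps, depts);
        ToyRel filter = new ToyRel("Filter", "condition=[=($0, 100)]", join);
        return new ToyRel("Project", "name=[$2]", filter);
    }

    public static void main(String[] args) {
        System.out.print(samplePlan().explain());
    }
}
```

The root of the tree is the last operator applied (Project), and the leaves are the table scans.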

Type

Logical

Logical algebra has no implementation of the relational operator and therefore can't run.

Logical algebra is also known as the logical plan.

All logical operators start with the prefix Logical (for example, LogicalFilter).

Example of a relational expression printed with explain:

4:LogicalProject(name=[$2])
  3:LogicalFilter(condition=[=($0, 100)])
    2:LogicalJoin(condition=[=($1, $5)], joinType=[inner])
      0:LogicalTableScan(table=[[HR, emps]])
      1:LogicalTableScan(table=[[HR, depts]])

This is roughly equivalent to the following SQL:

select emps.name
from emps inner join depts on emps.deptno = depts.deptno
where emps.empid = 100

Physical

Every logical operator must be converted into a physical operator before it can be executed.

Because the Volcano planner (optimizer) is cost-based, it relies on these physical operators to compute a cost.

Example of output for a physical plan (i.e. no Logical node). It is the same query as above, but the plan says how to perform the logical steps:

1628:EnumerableProject(name=[$2])
  1627:EnumerableHashJoin(condition=[=($1, $5)], joinType=[inner])
    1626:EnumerableFilter(condition=[=($0, 100)])
      94:EnumerableTableScan(table=[[HR, emps]])
    92:EnumerableTableScan(table=[[HR, depts]])

Characteristic

  • A relational expression is immutable.

Management

Create

RelBuilder

To build a relational expression, use the algebra builder (RelBuilder).

The builder uses a stack to store the relational expression produced by one step and to pass it as an input to the next step.

build() pops the last relational expression off the stack, namely the root of the expression tree.
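
The stack mechanics described above can be sketched with a small, self-contained toy. This is not Calcite's RelBuilder: ToyPlanBuilder is a hypothetical class written for this illustration, and plans are plain strings instead of RelNode trees. It only shows how a leaf step pushes, a unary step pops one input, a binary step pops two, and build() returns the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stack-based plan builder. NOT Calcite's RelBuilder; plans are plain strings. */
final class ToyPlanBuilder {
    private final Deque<String> stack = new ArrayDeque<>();

    /** Leaf step: pushes a new scan onto the stack. */
    ToyPlanBuilder scan(String table) {
        stack.push("Scan(" + table + ")");
        return this;
    }

    /** Unary step: pops its single input, wraps it, and pushes the result. */
    ToyPlanBuilder filter(String condition) {
        String input = stack.pop();
        stack.push("Filter(" + condition + ", " + input + ")");
        return this;
    }

    /** Binary step: pops the right input first, then the left one. */
    ToyPlanBuilder join(String condition) {
        String right = stack.pop();
        String left = stack.pop();
        stack.push("Join(" + condition + ", " + left + ", " + right + ")");
        return this;
    }

    /** Pops and returns the expression on top of the stack (the root of the tree). */
    String build() {
        return stack.pop();
    }

    public static void main(String[] args) {
        String plan = new ToyPlanBuilder()
            .scan("emps")
            .scan("depts")
            .join("deptno")
            .filter("empid = 100")
            .build();
        System.out.println(plan);
    }
}
```

Because each step consumes what the previous steps left on the stack, the calls read bottom-up: the scans come first and the root operator comes last.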

The builder methods perform various optimizations, including:

  • project returns its input if asked to project all columns in order
  • filter flattens the condition (so an AND or an OR may have more than 2 children) and simplifies it (converting, say, x = 1 AND TRUE to x = 1)
  • If you apply sort then limit, the effect is as if you had called sortLimit

SqlNode

From a SqlNode, you get a logical plan. See Logical plan creation

Optimize

See Calcite - Optimizer (RelOptCluster)

Visit

  • RelShuttle - Visitor that has methods for the common logical relational expressions.
final RelShuttle shuttle = new RelHomogeneousShuttle() {
  @Override public RelNode visit(TableScan scan) {
    final RelOptTable table = scan.getTable();
    if (scan instanceof LogicalTableScan
        && Bindables.BindableTableScan.canHandle(table)) {
      // Always replace the LogicalTableScan with a BindableTableScan
      // because its implementation does not require a "schema" as context.
      return Bindables.BindableTableScan.create(scan.getCluster(), table);
    }
    return super.visit(scan);
  }
};
// accept returns the (possibly rewritten) tree; the original is not modified in place.
relNode = relNode.accept(shuttle);
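
Since a relational expression is immutable, a visitor of this kind cannot change nodes in place: it returns a replacement node, and every parent above a replaced node has to be rebuilt. The following self-contained sketch illustrates that pattern without Calcite. ToyNode, ToyShuttle and replaceOp are hypothetical names made up for this example, and operators are reduced to a name plus inputs.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy immutable plan node. NOT Calcite's RelNode. */
final class ToyNode {
    final String op;
    final List<ToyNode> inputs;

    ToyNode(String op, List<ToyNode> inputs) {
        this.op = op;
        this.inputs = List.copyOf(inputs); // defensive, unmodifiable copy
    }
}

/** Toy rewriting visitor. NOT Calcite's RelShuttle. */
final class ToyShuttle {
    /** Returns a tree in which every node named {@code from} is replaced by one named {@code to}. */
    static ToyNode replaceOp(ToyNode node, String from, String to) {
        List<ToyNode> newInputs = new ArrayList<>();
        boolean changed = false;
        for (ToyNode input : node.inputs) {
            ToyNode rewritten = replaceOp(input, from, to);
            changed |= rewritten != input;
            newInputs.add(rewritten);
        }
        if (node.op.equals(from)) {
            return new ToyNode(to, newInputs);
        }
        // Reuse the original node when nothing below it changed.
        return changed ? new ToyNode(node.op, newInputs) : node;
    }

    /** A LogicalFilter over a LogicalTableScan, used as sample input. */
    static ToyNode sample() {
        ToyNode scan = new ToyNode("LogicalTableScan", List.of());
        return new ToyNode("LogicalFilter", List.of(scan));
    }

    public static void main(String[] args) {
        ToyNode rewritten = replaceOp(sample(), "LogicalTableScan", "BindableTableScan");
        System.out.println(rewritten.op + " -> " + rewritten.inputs.get(0).op);
    }
}
```

The original tree stays usable after the rewrite, which is why the result of the visit has to be assigned to a variable.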

Validation

See Trait

Run

The RelRunner implementation runs a relational expression.

PreparedStatement run = RelRunners.run(relNode);
ResultSet resultSet = run.executeQuery();

More … see Calcite - Getting Started (from Sql to Resultset)

Print

See Calcite - Getting Started (from Sql to Resultset) for a full example.

  • with the RelOptUtil
RelOptUtil.toString(relNode)
  • with the RelWriter and the explain function
RelWriter rw = new RelWriterImpl(new PrintWriter(System.out, true));
relNode.explain(rw);

Example of output for a logical plan (i.e. only Logical nodes):

4:LogicalProject(name=[$2])
  3:LogicalFilter(condition=[=($0, 100)])
    2:LogicalJoin(condition=[=($1, $5)], joinType=[inner])
      0:LogicalTableScan(table=[[HR, emps]])
      1:LogicalTableScan(table=[[HR, depts]])

Example of output for a physical plan (i.e. no Logical node). It is the same query as above, but the plan says how to perform the logical steps:

1628:EnumerableProject(name=[$2])
  1627:EnumerableHashJoin(condition=[=($1, $5)], joinType=[inner])
    1626:EnumerableFilter(condition=[=($0, 100)])
      94:EnumerableTableScan(table=[[HR, emps]])
    92:EnumerableTableScan(table=[[HR, depts]])

Explain

See Print

Read / Write

Json (Example)
