About
Relational Algebra in Calcite
A relational expression is represented by a tree of RelNode.
A RelNode follows the same logic as a Spark DataFrame: each node describes one relational operation applied to the output of its inputs.
List
- TableScan
- Project
- Filter
- Aggregate
- Join
- Union
- Intersect
- Sort
Type
Logical
Logical algebra has no implementation of the relational operator and therefore can't run.
Logical algebra is also known as the logical plan.
All logical operators start with the prefix Logical.
Example of a relational expression printed (explain):
4:LogicalProject(name=[$2])
  3:LogicalFilter(condition=[=($0, 100)])
    2:LogicalJoin(condition=[=($1, $5)], joinType=[inner])
      0:LogicalTableScan(table=[[HR, emps]])
      1:LogicalTableScan(table=[[HR, depts]])
This is equivalent to the SQL below:
select emps.name
from emps inner join depts on emps.deptno = depts.deptno
where emps.empid = 100
Physical
Every logical operator needs to be transformed into a physical operator to be executed.
Because the Volcano planner (optimizer) is cost-based, it relies on these physical operators to compute a cost.
Example of output for a physical plan (i.e. no logical node). It is the same plan as above, but it says how to perform the logical steps:
1628:EnumerableProject(name=[$2])
  1627:EnumerableHashJoin(condition=[=($1, $5)], joinType=[inner])
    1626:EnumerableFilter(condition=[=($0, 100)])
      94:EnumerableTableScan(table=[[HR, emps]])
    92:EnumerableTableScan(table=[[HR, depts]])
Characteristic
- A relational expression is immutable.
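Immutability can be illustrated with a small self-contained sketch. ToyRel and ToyRelDemo below are hypothetical stand-ins written for this page, not Calcite classes: a node never changes after construction, so "modifying" an operator means creating a new node that reuses the existing inputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy stand-in for a relational expression node; NOT Calcite's RelNode. */
final class ToyRel {
  private final String op;            // operator label, e.g. "LogicalFilter(empid = 100)"
  private final List<ToyRel> inputs;  // child expressions, fixed at construction

  ToyRel(String op, ToyRel... inputs) {
    this.op = op;
    this.inputs =
        Collections.unmodifiableList(new ArrayList<>(Arrays.asList(inputs)));
  }

  String op() { return op; }

  List<ToyRel> inputs() { return inputs; }

  /** Returns a new node with another operator label and the same inputs. */
  ToyRel withOp(String newOp) {
    return new ToyRel(newOp, inputs.toArray(new ToyRel[0]));
  }

  /** Renders one operator per line, indenting each level by two spaces. */
  String explain() {
    StringBuilder sb = new StringBuilder();
    explain(sb, 0);
    return sb.toString();
  }

  private void explain(StringBuilder sb, int depth) {
    for (int i = 0; i < depth; i++) {
      sb.append("  ");
    }
    sb.append(op).append('\n');
    for (ToyRel input : inputs) {
      input.explain(sb, depth + 1);
    }
  }
}

final class ToyRelDemo {
  public static void main(String[] args) {
    ToyRel scan = new ToyRel("LogicalTableScan(emps)");
    ToyRel filter = new ToyRel("LogicalFilter(empid = 100)", scan);
    // The original filter is left untouched; the new node shares the scan.
    ToyRel physical = filter.withOp("EnumerableFilter(empid = 100)");
    System.out.print(filter.explain());
    System.out.print(physical.explain());
  }
}
```

Because nodes are never modified in place, subtrees can be shared safely between the original and the rewritten expression.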
Management
Create
RelBuilder
To build a relational expression, use the algebra builder (RelBuilder).
The builder uses a stack to store the relational expression produced by one step and passes it as input to the next step.
build() pops the last relational expression off the stack, namely the root of the expression tree.
The builder methods perform various optimizations, including:
- project returns its input if asked to project all columns in order
- filter flattens the condition (so an AND or an OR may have more than 2 children) and simplifies it (for example, converting x = 1 AND TRUE to x = 1)
- If you apply sort then limit, the effect is as if you had called sortLimit
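The stack mechanism described above can be sketched with a toy builder. ToyBuilder is a hypothetical class written for this page, not Calcite's RelBuilder, and it represents expressions as plain strings: each step pops its inputs from the stack and pushes the expression it produces, and build() pops the root.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stack-based builder; a stand-in, NOT Calcite's RelBuilder. */
final class ToyBuilder {
  // Expressions produced so far; the most recent one is on top.
  private final Deque<String> stack = new ArrayDeque<>();

  /** Leaf step: pushes a scan without consuming any input. */
  ToyBuilder scan(String table) {
    stack.push("Scan(" + table + ")");
    return this;
  }

  /** Unary step: pops one input and pushes the filtered expression. */
  ToyBuilder filter(String condition) {
    String input = stack.pop();
    stack.push("Filter(" + condition + ", " + input + ")");
    return this;
  }

  /** Binary step: pops the right input first, then the left one. */
  ToyBuilder join(String condition) {
    String right = stack.pop();
    String left = stack.pop();
    stack.push("Join(" + condition + ", " + left + ", " + right + ")");
    return this;
  }

  /** Unary step: pops one input and pushes the projection. */
  ToyBuilder project(String... columns) {
    String input = stack.pop();
    stack.push("Project([" + String.join(", ", columns) + "], " + input + ")");
    return this;
  }

  /** Pops and returns the last expression, the root of the tree. */
  String build() {
    return stack.pop();
  }
}

final class ToyBuilderDemo {
  public static void main(String[] args) {
    String root = new ToyBuilder()
        .scan("emps")
        .scan("depts")
        .join("deptno")
        .filter("empid = 100")
        .project("name")
        .build();
    System.out.println(root);
  }
}
```

The two scan calls leave two expressions on the stack, which is why join can take both as inputs without any explicit argument passing.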
SqlNode
From a SqlNode, you get a logical plan. See Logical plan creation
Optimize
Visit
- RelShuttle - Visitor that has methods for the common logical relational expressions.
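import org.apache.calcite.interpreter.Bindables;
import org.apache.calcite.plan.RelOptTable;
import org.apache.calcite.rel.RelHomogeneousShuttle;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.RelShuttle;
import org.apache.calcite.rel.core.TableScan;
import org.apache.calcite.rel.logical.LogicalTableScan;

final RelShuttle shuttle = new RelHomogeneousShuttle() {
  @Override public RelNode visit(TableScan scan) {
    final RelOptTable table = scan.getTable();
    if (scan instanceof LogicalTableScan
        && Bindables.BindableTableScan.canHandle(table)) {
      // Always replace the LogicalTableScan with a BindableTableScan
      // because its implementation does not require a "schema" as context.
      return Bindables.BindableTableScan.create(scan.getCluster(), table);
    }
    return super.visit(scan);
  }
};
// Relational expressions are immutable, so accept returns the rewritten tree.
relNode = relNode.accept(shuttle);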
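The reason the result of accept has to be reassigned can be shown with a toy model. PlanNode and ShuttleSketch below are hypothetical classes written for this page, not Calcite's API, and the traversal order is an assumption of the sketch: a rewrite visits the inputs first and only rebuilds the nodes on the path to a change, leaving the original tree untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy immutable plan node; a stand-in, NOT Calcite's RelNode. */
final class PlanNode {
  final String name;
  final List<PlanNode> inputs;

  PlanNode(String name, List<PlanNode> inputs) {
    this.name = name;
    this.inputs = Collections.unmodifiableList(new ArrayList<>(inputs));
  }

  /** Rewrites the inputs first, then offers the resulting node to the rule. */
  PlanNode accept(UnaryOperator<PlanNode> rule) {
    List<PlanNode> newInputs = new ArrayList<>();
    boolean changed = false;
    for (PlanNode input : inputs) {
      PlanNode rewritten = input.accept(rule);
      changed |= rewritten != input;
      newInputs.add(rewritten);
    }
    // Rebuild this node only when at least one input was replaced.
    PlanNode current = changed ? new PlanNode(name, newInputs) : this;
    return rule.apply(current);
  }
}

final class ShuttleSketch {
  /** Hypothetical rule: swaps every logical table scan for a bindable one. */
  static UnaryOperator<PlanNode> toBindable() {
    return node -> node.name.startsWith("LogicalTableScan")
        ? new PlanNode(node.name.replace("Logical", "Bindable"), node.inputs)
        : node;
  }

  public static void main(String[] args) {
    PlanNode scan =
        new PlanNode("LogicalTableScan(emps)", Collections.<PlanNode>emptyList());
    PlanNode filter =
        new PlanNode("LogicalFilter(empid = 100)", Collections.singletonList(scan));
    PlanNode rewritten = filter.accept(toBindable());
    System.out.println(rewritten.inputs.get(0).name);
    System.out.println(filter.inputs.get(0).name); // original tree is unchanged
  }
}
```

When the rule changes nothing, accept returns the very same instance, so an unchanged plan costs no new allocations.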
Validation
See Trait
Run
The RelRunner implementation runs a relational expression.
PreparedStatement run = RelRunners.run(relNode);
ResultSet resultSet = run.executeQuery();
See Calcite - Getting Started (from Sql to Resultset) for a full example.
Print
- with the RelOptUtil
RelOptUtil.toString(relNode)
- with the RelWriter and the explain function
RelWriter rw = new RelWriterImpl(new PrintWriter(System.out, true));
relNode.explain(rw);
Example of output for a logical plan (i.e. only logical nodes):
4:LogicalProject(name=[$2])
  3:LogicalFilter(condition=[=($0, 100)])
    2:LogicalJoin(condition=[=($1, $5)], joinType=[inner])
      0:LogicalTableScan(table=[[HR, emps]])
      1:LogicalTableScan(table=[[HR, depts]])
Example of output for a physical plan (i.e. no logical node). It is the same plan as above, but it says how to perform the logical steps:
1628:EnumerableProject(name=[$2])
  1627:EnumerableHashJoin(condition=[=($1, $5)], joinType=[inner])
    1626:EnumerableFilter(condition=[=($0, 100)])
      94:EnumerableTableScan(table=[[HR, emps]])
    92:EnumerableTableScan(table=[[HR, depts]])
Explain
See print
Read / Write
Json (Example)