Calcite - Query Cost

1 - About

SQL Engine - (Query Plan) Cost in calcite.

Calcite applies a Cost based optimizer by default that is called the Volcano planner.

The cost is provided by the relational expression (relNode). See calculation

  • Cost is represented by RelOptCost
  • Cost typically includes
    • rowcount
    • IO
    • CPU cost
  • Cost estimates are relative
  • Statistics are used to improve accuracy of cost estimations

3 - Management

3.1 - Print

The cost can be seen when you print the relational expression (ie explain) with all attributes.

Example from the getting started guide.


RelNode relNode = relRoot.project();
final RelWriter relWriter = new RelWriterImpl(new PrintWriter(System.out), SqlExplainLevel.ALL_ATTRIBUTES, false);
relNode.explain(relWriter);


LogicalSort(sort0=[$0], dir0=[ASC]): rowcount = 150.0, cumulative cost = {3668.75 rows, 11014.762352915506 cpu, 0.0 io}, id = 18
  LogicalProject(NAME=[$1], EXPR$1=[$2]): rowcount = 150.0, cumulative cost = {3518.75 rows, 5002.0 cpu, 0.0 io}, id = 17
    LogicalAggregate(group=[{0, 1}], EXPR$1=[COUNT()]): rowcount = 150.0, cumulative cost = {3368.75 rows, 4702.0 cpu, 0.0 io}, id = 15
      LogicalProject(deptno0=[$5], NAME=[$6], empid=[$0]): rowcount = 1500.0, cumulative cost = {3200.0 rows, 4702.0 cpu, 0.0 io}, id = 14
        LogicalJoin(condition=[=($1, $5)], joinType=[inner]): rowcount = 1500.0, cumulative cost = {1700.0 rows, 202.0 cpu, 0.0 io}, id = 12
          LogicalTableScan(table=[[HR, emps]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 10
          LogicalTableScan(table=[[HR, depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 11

3.2 - Calculation

The cost is provided by the relational expression (relNode). Example with EnumerableHashJoin


@Override public RelOptCost computeSelfCost(RelOptPlanner planner,
      RelMetadataQuery mq) {
    double rowCount = mq.getRowCount(this);

    // Joins can be flipped, and for many algorithms, both versions are viable
    // and have the same cost. To make the results stable between versions of
    // the planner, make one of the versions slightly more expensive.
    switch (joinType) {
    case SEMI:
    case ANTI:
      // SEMI and ANTI join cannot be flipped
      break;
    case RIGHT:
      rowCount = RelMdUtil.addEpsilon(rowCount);
      break;
    default:
      if (RelNodes.COMPARATOR.compare(left, right) > 0) {
        rowCount = RelMdUtil.addEpsilon(rowCount);
      }
    }

    // Cheaper if the smaller number of rows is coming from the LHS.
    // Model this by adding L log L to the cost.
    final double rightRowCount = right.estimateRowCount(mq);
    final double leftRowCount = left.estimateRowCount(mq);
    if (Double.isInfinite(leftRowCount)) {
      rowCount = leftRowCount;
    } else {
      rowCount += Util.nLogN(leftRowCount);
    }
    if (Double.isInfinite(rightRowCount)) {
      rowCount = rightRowCount;
    } else {
      rowCount += rightRowCount;
    }
    if (isSemiJoin()) {
      return planner.getCostFactory().makeCost(rowCount, 0, 0).multiplyBy(.01d);
    } else {
      return planner.getCostFactory().makeCost(rowCount, 0, 0);
    }
  }

4 - Documentation / Reference


Data Science
Data Analysis
Statistics
Data Science
Linear Algebra Mathematics
Trigonometry

Powered by ComboStrap