Calcite (Farrago, Optiq)

About

Calcite is a Java SQL processing engine in which data storage is supplied by plugins (adapters).

Calcite is an open-source, cost-based query optimizer and query execution framework.

Getting Started

Component

  • Catalog: metadata and namespaces
  • SQL parser: parses the SQL string into a SqlNode (abstract syntax tree)
  • SQL validator: validates the SqlNode tree against the catalog
  • SQL-to-rel converter: transforms the validated SqlNode tree into a relational expression (RelNode)
  • Query optimizer: optimizes and rewrites the logical plan (relational expression); the output is called a physical plan
  • SQL generator: converts a relational expression back to SQL

Key Concept

Relational Algebra

Row Expression

  • Row Expression - RexNode (comparable to Spark's Column), used for:
    • Projection Fields
    • Filter Condition
    • Join Condition
    • Sort fields

Kinds of row expression:

  • Input column reference - RexInputRef
  • Literal - RexLiteral
  • Struct field access - RexFieldAccess
  • Function call - RexCall
  • Window expression - RexOver
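
To make the node kinds above concrete, here is a toy model in plain Java. It is not Calcite's implementation: InputRef, Literal and Call are hypothetical stand-ins for RexInputRef, RexLiteral and RexCall, and the eval method is an assumption added for illustration only. The sketch builds the filter condition "column 1 is greater than 10" and evaluates it against a row.

```java
class RowExpressionSketch {
    // Toy stand-in for a row expression; NOT Calcite's RexNode.
    interface Expr {
        Object eval(Object[] row);
    }

    // Reference to an input column by ordinal (the role of RexInputRef).
    static final class InputRef implements Expr {
        final int index;
        InputRef(int index) { this.index = index; }
        @Override public Object eval(Object[] row) { return row[index]; }
        @Override public String toString() { return "$" + index; }
    }

    // Constant value (the role of RexLiteral).
    static final class Literal implements Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
        @Override public Object eval(Object[] row) { return value; }
        @Override public String toString() { return String.valueOf(value); }
    }

    // Operator applied to two integer operands (the role of RexCall).
    static final class Call implements Expr {
        final String op;
        final Expr left;
        final Expr right;
        Call(String op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        @Override public Object eval(Object[] row) {
            int l = (Integer) left.eval(row);
            int r = (Integer) right.eval(row);
            switch (op) {
                case ">": return l > r;
                case "+": return l + r;
                default: throw new IllegalArgumentException("unsupported operator: " + op);
            }
        }
        @Override public String toString() { return op + "(" + left + ", " + right + ")"; }
    }

    public static void main(String[] args) {
        // Filter condition: column 1 > 10
        Expr condition = new Call(">", new InputRef(1), new Literal(10));
        Object[] row = {"widget", 42};
        System.out.println(condition + " -> " + condition.eval(row));
    }
}
```

The same tree shape serves as a projection field, a filter condition, a join condition or a sort field; only the place where the enclosing operator uses it differs.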

Rules

Rules - RelOptRule (abstract class), used to modify the query plan

  • Planners - RelOptPlanner
  • Programs - Program
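
The idea of a rule that rewrites a plan can be sketched without Calcite. Everything below is a toy written for illustration: Node, Scan, Filter, Rule and optimize are hypothetical names, not Calcite's RelNode, RelOptRule or RelOptPlanner. The rule merges two stacked filters into one, and the tiny "planner" reapplies it at the root until it no longer matches.

```java
import java.util.Optional;

class RuleSketch {
    // Toy plan nodes; NOT Calcite's RelNode classes.
    static abstract class Node {}

    static final class Scan extends Node {
        final String table;
        Scan(String table) { this.table = table; }
        @Override public String toString() { return "Scan(" + table + ")"; }
    }

    static final class Filter extends Node {
        final String condition;
        final Node input;
        Filter(String condition, Node input) {
            this.condition = condition;
            this.input = input;
        }
        @Override public String toString() { return "Filter(" + condition + ", " + input + ")"; }
    }

    // Toy rule: returns a rewritten node when its pattern matches, empty otherwise.
    interface Rule {
        Optional<Node> apply(Node node);
    }

    // Filter(a, Filter(b, x))  =>  Filter(a AND b, x)
    static final Rule MERGE_FILTERS = node -> {
        if (node instanceof Filter && ((Filter) node).input instanceof Filter) {
            Filter top = (Filter) node;
            Filter bottom = (Filter) top.input;
            return Optional.<Node>of(
                new Filter(top.condition + " AND " + bottom.condition, bottom.input));
        }
        return Optional.empty();
    };

    // Toy planner: applies the rule at the root until it stops matching.
    static Node optimize(Node root, Rule rule) {
        Node current = root;
        Optional<Node> next = rule.apply(current);
        while (next.isPresent()) {
            current = next.get();
            next = rule.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        Node plan = new Filter("a > 1", new Filter("b < 5", new Scan("t")));
        System.out.println(optimize(plan, MERGE_FILTERS));
    }
}
```

A real planner also matches below the root and, when cost-based, compares alternative plans; this sketch leaves both out to keep the match-and-rewrite step visible.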

Documentation / Reference

https://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite

Query to relational Operator

Every query is represented as a tree of relational operators.

You can:

  • translate from SQL to relational algebra,
  • or build the tree directly.
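
As a rough illustration of "building the tree directly", the sketch below assembles a scan, filter and project tree by hand in plain Java. It is a toy under stated assumptions: Op, scan, filter and project are hypothetical names and the table is an in-memory list. In Calcite itself the tree is made of RelNode objects, typically assembled with RelBuilder, which this sketch does not use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class OperatorTreeSketch {
    // Toy relational operator that produces rows; NOT Calcite's RelNode.
    interface Op {
        List<Object[]> rows();
    }

    // Leaf operator: returns the rows of an in-memory table.
    static Op scan(List<Object[]> table) {
        return () -> table;
    }

    // Keeps only the rows that satisfy the condition (SQL WHERE).
    static Op filter(Op input, Predicate<Object[]> condition) {
        return () -> {
            List<Object[]> out = new ArrayList<>();
            for (Object[] row : input.rows()) {
                if (condition.test(row)) {
                    out.add(row);
                }
            }
            return out;
        };
    }

    // Maps every input row to an output row (SQL SELECT list).
    static Op project(Op input, Function<Object[], Object[]> mapping) {
        return () -> {
            List<Object[]> out = new ArrayList<>();
            for (Object[] row : input.rows()) {
                out.add(mapping.apply(row));
            }
            return out;
        };
    }

    // Tree for: SELECT name FROM emps WHERE deptno = 10
    static Op example() {
        List<Object[]> emps = Arrays.asList(
            new Object[]{"Fred", 10},
            new Object[]{"Eric", 20},
            new Object[]{"Wilma", 10});
        return project(
            filter(scan(emps), row -> (Integer) row[1] == 10),
            row -> new Object[]{row[0]});
    }

    public static void main(String[] args) {
        for (Object[] row : example().rows()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Reading the tree from the leaf upward gives the order of evaluation: scan, then filter, then project.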

Schema

Schemas are defined as a list of tables, each containing at minimum a table name and a URL.

  • HTML page and file adapter: if a page has more than one table, you can include selector and index fields in the table definition to specify the desired table. If no table is specified, the file adapter chooses the largest table on the page.

Jdbc

jdbc:calcite:model=target/test-classes/model.json
// or
jdbc:calcite:schemaFactory=org.apache.calcite.adapter.druid.DruidSchemaFactory;schema.url=http://localhost:8082;schema.coordinatorUrl=http://localhost:8081
A JSON model of a simple Calcite schema:
{
  "version": "1.0",
  "defaultSchema": "SALES",
  "schemas": [
    {
      "name": "SALES",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.csv.CsvSchemaFactory",
      "operand": {
        "directory": "sales"
      }
    }
  ]
}

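where:

  • version: the version of the model format
  • defaultSchema: the schema used to resolve unqualified table names
  • type: "custom" means the schema is created by the class named in factory
  • factory: the schema factory class, here the CSV adapter's CsvSchemaFactory
  • operand: options passed to the factory, here the directory that holds the CSV files

Adapters can also be built programmatically using the Schema SPI. See Calcite Schema SPI.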
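
A connection using the first URL form above can be sketched with the standard java.sql API. This is a minimal sketch under assumptions: the Calcite JDBC driver must be on the classpath and the model file must exist at the given path. The class name CalciteJdbcSketch and the helper modelUrl are hypothetical. The code lists the tables of the SALES schema through standard JDBC metadata calls rather than querying a specific table, because the table names depend on the files in the directory.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

class CalciteJdbcSketch {
    // Builds the model-based URL form: jdbc:calcite:model=<path to model file>
    static String modelUrl(String modelPath) {
        return "jdbc:calcite:model=" + modelPath;
    }

    public static void main(String[] args) {
        String url = modelUrl("target/test-classes/model.json");
        // Assumption: the Calcite driver is on the classpath and the model file exists.
        try (Connection conn = DriverManager.getConnection(url);
             ResultSet tables = conn.getMetaData().getTables(null, "SALES", "%", null)) {
            while (tables.next()) {
                System.out.println(tables.getString("TABLE_NAME"));
            }
        } catch (SQLException e) {
            // Reached, for example, when no suitable driver is registered.
            System.err.println("Could not connect or list tables: " + e.getMessage());
        }
    }
}
```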

DDL

SELECT and DML are standardized, but DDL tends to be database-specific, so Calcite's policy is that DDL extensions are made outside of Calcite. See CALCITE-609 for an example.

You could reuse work already done in Drill and Phoenix, which extend Calcite's core parser for DDL.

Test

VM:

Dataset: HyperSQL Database (HSQLDB)

Planner

Build

Stream
