Calcite (Farrago, Optiq)

About

Calcite is a Java SQL processing engine in which data storage is provided by plugins (adapters).

Calcite is an open-source, cost-based query optimizer and query execution framework.

Getting Started

Component

Catalog: metadata and namespaces
SQL parser: parses the SQL string into a SqlNode (abstract syntax tree)
SQL validator: validates the SqlNode tree against the catalog
SQL-to-rel converter: transforms the SqlNode tree into a relational expression (RelNode)
Query optimizer: optimizes (rewrites) the logical plan (relational expression); the output is called a physical plan
SQL generator: converts a relational expression back to SQL
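To make the last stage concrete, here is a toy sketch of a SQL generator. The types Rel, Scan, Filter, Project and ToySqlGenerator are hypothetical stand-ins, not Calcite classes (Calcite's own generator is RelToSqlConverter). The sketch only handles a Project over a Filter over a Scan, and keeps conditions as plain strings.

```java
import java.util.List;

// Hypothetical toy relational operators (NOT Calcite's RelNode classes).
interface Rel {}

record Scan(String table) implements Rel {}

// The condition is kept as a plain SQL string to keep the sketch small.
record Filter(Rel input, String condition) implements Rel {}

record Project(Rel input, List<String> columns) implements Rel {}

class ToySqlGenerator {
    // Walks an optional Project, then an optional Filter, down to a Scan,
    // and emits a single SELECT statement. Any other plan shape is rejected.
    static String toSql(Rel rel) {
        String select = "*";
        String where = null;
        if (rel instanceof Project p) {
            select = String.join(", ", p.columns());
            rel = p.input();
        }
        if (rel instanceof Filter f) {
            where = f.condition();
            rel = f.input();
        }
        if (!(rel instanceof Scan s)) {
            throw new IllegalArgumentException("unsupported plan shape: " + rel);
        }
        String sql = "SELECT " + select + " FROM " + s.table();
        return where == null ? sql : sql + " WHERE " + where;
    }

    public static void main(String[] args) {
        Rel plan = new Project(
            new Filter(new Scan("EMPS"), "DEPTNO = 10"),
            List.of("NAME", "SALARY"));
        System.out.println(toSql(plan));
    }
}
```

Running main prints SELECT NAME, SALARY FROM EMPS WHERE DEPTNO = 10. A real generator must also handle joins, aggregates, nested sub-queries and dialect differences, which is why Calcite's converter is far larger.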

Key Concept

Relational Algebra

Row Expression

Row expressions (RexNode, roughly equivalent to Spark's Column) appear in:
- Projection fields
- Filter conditions
- Join conditions
- Sort fields

Main RexNode subclasses:

Input column reference - RexInputRef
Literal - RexLiteral
Struct field access - RexFieldAccess
Function call - RexCall
Window expression - RexOver
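A minimal sketch of how row expressions combine and evaluate against one input row, assuming a toy model: InputRef, Literal, Call and RexDemo are hypothetical classes that only mirror the roles of RexInputRef, RexLiteral and RexCall (field access and window expressions are not modeled). In Calcite the real nodes are created through RexBuilder; this sketch only illustrates the idea. Calcite prints a reference to input field i as $i, which the comments reuse.

```java
import java.util.List;

// Hypothetical mini model of row expressions (NOT Calcite's API).
interface Rex {
    Object eval(List<Object> row);
}

// Reference to the i-th field of the input row (cf. RexInputRef, printed "$i").
record InputRef(int index) implements Rex {
    public Object eval(List<Object> row) { return row.get(index); }
}

// Constant value (cf. RexLiteral).
record Literal(Object value) implements Rex {
    public Object eval(List<Object> row) { return value; }
}

// Binary operator call over two operands (cf. RexCall); only a few
// integer/boolean operators are supported in this sketch.
record Call(String op, List<Rex> operands) implements Rex {
    public Object eval(List<Object> row) {
        Object a = operands.get(0).eval(row);
        Object b = operands.get(1).eval(row);
        switch (op) {
            case "=": return a.equals(b);
            case ">": return ((Integer) a) > ((Integer) b);
            case "+": return ((Integer) a) + ((Integer) b);
            case "AND": return ((Boolean) a) && ((Boolean) b);
            default: throw new IllegalArgumentException("unknown operator " + op);
        }
    }
}

class RexDemo {
    public static void main(String[] args) {
        // One row of (EMPNO, DEPTNO, SALARY).
        List<Object> row = List.of(100, 10, 5000);
        // Filter condition: $1 = 10 AND $2 > 4000
        Rex condition = new Call("AND", List.of(
            new Call("=", List.of(new InputRef(1), new Literal(10))),
            new Call(">", List.of(new InputRef(2), new Literal(4000)))));
        System.out.println(condition.eval(row));
    }
}
```

The same expression tree can serve as a filter condition, a join condition or a projected field; only the operator that holds it differs.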

Rules

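Rules (RelOptRule, an abstract class) transform a query plan into an equivalent plan.

Planners (RelOptPlanner) apply rules to a plan; Calcite ships HepPlanner (heuristic) and VolcanoPlanner (cost-based).
Programs (Program) transform a relational expression into another one, for example by running a planner with a set of rules.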
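How rules and a planner interact can be sketched with a toy heuristic planner, loosely in the spirit of Calcite's HepPlanner. Everything below (TableScan, LogicalFilter, Rule, FilterMerge, ToyHepPlanner) is hypothetical: FilterMerge only imitates the idea behind Calcite's FilterMergeRule, and there is no cost model as in VolcanoPlanner.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical toy plan nodes (NOT Calcite classes).
interface Node {}
record TableScan(String table) implements Node {}
record LogicalFilter(Node input, String condition) implements Node {}

// A rule inspects one node and optionally returns an equivalent replacement,
// loosely mirroring a rule's "match, then transform" contract.
interface Rule {
    Optional<Node> apply(Node node);
}

// Rewrites Filter(Filter(x, inner), outer) into Filter(x, (outer) AND (inner)).
class FilterMerge implements Rule {
    public Optional<Node> apply(Node node) {
        if (node instanceof LogicalFilter top && top.input() instanceof LogicalFilter bottom) {
            String merged = "(" + top.condition() + ") AND (" + bottom.condition() + ")";
            return Optional.of(new LogicalFilter(bottom.input(), merged));
        }
        return Optional.empty();
    }
}

// Heuristic planner: optimizes children first, then fires rules on the
// current node until no rule matches (a fixed point). No costs involved.
class ToyHepPlanner {
    private final List<Rule> rules;

    ToyHepPlanner(List<Rule> rules) { this.rules = rules; }

    Node optimize(Node node) {
        if (node instanceof LogicalFilter f) {
            node = new LogicalFilter(optimize(f.input()), f.condition());
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                Optional<Node> result = rule.apply(node);
                if (result.isPresent()) {
                    node = result.get();
                    changed = true;
                }
            }
        }
        return node;
    }

    public static void main(String[] args) {
        Node plan = new LogicalFilter(
            new LogicalFilter(new TableScan("EMPS"), "DEPTNO = 10"), "SALARY > 4000");
        Node optimized = new ToyHepPlanner(List.of(new FilterMerge())).optimize(plan);
        System.out.println(optimized);
    }
}
```

The fixed-point loop terminates here because each FilterMerge firing removes one filter node; a real planner needs safeguards against rules that undo each other.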

Documentation / Reference

https://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite

Query to relational operators

Every query is represented as a tree of relational operators.

You can:

- translate SQL to relational algebra,
- or build the tree directly (in Calcite, with RelBuilder).
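Building the tree directly can be sketched with a few toy operator classes and an EXPLAIN-like printer. RelOp, Scan, Filter, Project and PlanPrinter are hypothetical; in Calcite the equivalent tree would be built with RelBuilder (scan, filter, project) and printed with RelOptUtil.toString.

```java
import java.util.List;

// Hypothetical toy operators; real Calcite plans are RelNode trees.
interface RelOp {
    String describe();      // one-line description of this operator
    List<RelOp> inputs();   // child operators
}

record Scan(String table) implements RelOp {
    public String describe() { return "Scan(table=" + table + ")"; }
    public List<RelOp> inputs() { return List.of(); }
}

record Filter(RelOp input, String condition) implements RelOp {
    public String describe() { return "Filter(condition=" + condition + ")"; }
    public List<RelOp> inputs() { return List.of(input); }
}

record Project(RelOp input, List<String> fields) implements RelOp {
    public String describe() { return "Project(fields=" + fields + ")"; }
    public List<RelOp> inputs() { return List.of(input); }
}

class PlanPrinter {
    // Prints the tree top-down, indenting each level by two spaces.
    static String explain(RelOp op) {
        StringBuilder sb = new StringBuilder();
        explain(op, 0, sb);
        return sb.toString();
    }

    private static void explain(RelOp op, int depth, StringBuilder sb) {
        sb.append("  ".repeat(depth)).append(op.describe()).append('\n');
        for (RelOp child : op.inputs()) {
            explain(child, depth + 1, sb);
        }
    }

    public static void main(String[] args) {
        // Equivalent to: SELECT NAME, SALARY FROM EMPS WHERE DEPTNO = 10
        RelOp plan = new Project(
            new Filter(new Scan("EMPS"), "DEPTNO = 10"),
            List.of("NAME", "SALARY"));
        System.out.print(explain(plan));
    }
}
```

The root of the tree is the last operator applied (the projection), and leaves are table scans, which is why plans are read bottom-up.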

Schema

With the file adapter, a schema is defined as a list of tables, each containing at least a table name and a URL.

JDBC schema
HTML page and file adapter: if a page has more than one table, you can add selector and index fields to the table definition to specify the desired table. Without a table specification, the file adapter chooses the largest table on the page.

JDBC

jdbc:calcite:model=target/test-classes/model.json
// or
jdbc:calcite:schemaFactory=org.apache.calcite.adapter.druid.DruidSchemaFactory;schema.url=http://localhost:8082;schema.coordinatorUrl=http://localhost:8081
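A sketch of opening such a connection from Java through the standard java.sql API. The calciteUrl helper is hypothetical: it only joins key=value pairs with ";" after the jdbc:calcite: prefix and does no quoting of special characters. Running main assumes the Calcite JDBC driver is on the classpath and that the model defines a table named EMPS.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CalciteJdbcExample {
    // Builds "jdbc:calcite:key1=value1;key2=value2" in the map's iteration order.
    // Assumption: values contain no ';' or quotes (nothing is escaped).
    static String calciteUrl(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(";", "jdbc:calcite:", "");
        props.forEach((key, value) -> joiner.add(key + "=" + value));
        return joiner.toString();
    }

    public static void main(String[] args) throws SQLException {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("model", "target/test-classes/model.json");
        String url = calciteUrl(props);

        // Requires the Calcite JDBC driver and the adapter used by the model
        // on the classpath; EMPS is an assumed table name.
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SELECT * FROM EMPS")) {
            while (resultSet.next()) {
                System.out.println(resultSet.getString(1));
            }
        }
    }
}
```

A LinkedHashMap keeps the properties in insertion order, so the generated URL matches the hand-written strings above.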

A JSON model of a simple Calcite schema.

{
  "version": "1.0",
  "defaultSchema": "SALES",
  "schemas": [
    {
      "name": "SALES",
      "type": "custom",
      "factory": "org.apache.calcite.adapter.csv.CsvSchemaFactory",
      "operand": {
        "directory": "sales"
      }
    }
  ]
}

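where:

version: the model format version ("1.0")
defaultSchema: the default schema for connections that use this model (here SALES)
schemas: the list of schemas, each with a name and a type
type "custom": the schema is created by the class named in factory, which must implement SchemaFactory
operand: extra parameters passed to the factory; for the CSV adapter, directory is the directory containing the CSV files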

Adapters can also be built programmatically using the Schema SPI (see Calcite Schema SPI).

DDL

SELECT and DML are standardized, but DDL tends to be database-specific, so Calcite's policy is that DDL extensions are made outside of Calcite. See CALCITE-609 for an example.

To add DDL, you can reuse the work already done in Drill and Phoenix to extend Calcite's core parser.

Test

VM:

https://github.com/vlsi/calcite-test-dataset

Dataset: Database - HyperSQL DataBase (HSQLDB)

Planner

Eigenbase: the project where Calcite’s initial IP came from
optimization planners available in Farrago
HowToWriteNewOptimizerRules

Build

https://builds.apache.org/blue/organizations/jenkins/Calcite-Snapshots/activity

Stream

Information about whether a table allows streaming