Calcite (Farrago, Optiq)

1 - About

Calcite is a SQL processing engine written in Java in which data storage is provided by plugins (adapters).

Calcite is an open-source, cost-based query optimizer and query execution framework.

2 - Getting Started

3 - Component

  • Catalog: metadata and namespaces
  • SQL parser: parses the SQL string into a SqlNode (abstract syntax tree)
  • SQL validator: validates the SqlNode tree against the catalog
  • SQL-to-Rel converter: transforms the validated SQL tree into a relational expression (the logical plan)
  • Query optimizer: optimizes/rewrites the logical plan (relational expression). The output is called a physical plan.
  • SQL generator: converts a relational expression back to SQL

4 - Key Concept

4.1 - Row Expression

  • Row expression - RexNode (equivalent to Spark's Column). It is used for:
    • Projection fields
    • Filter conditions
    • Join conditions
    • Sort fields

The main kinds of row expression are:

  • Input column reference - RexInputRef
  • Literal - RexLiteral
  • Struct field access - RexFieldAccess
  • Function call - RexCall
  • Window expression - RexOver
4.2 - Rules

Rules - RelOptRule (abstract class): a rule is used to modify the query plan. Related types:

  • Planners - RelOptPlanner
  • Programs - Program
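
The idea of a rule that rewrites a plan can be sketched in a few lines. The example below is a simplified illustration, not Calcite's implementation: Node, Rule, MERGE_FILTERS and optimize are hypothetical names, and the "planner" is a plain recursive rewrite rather than a cost-based search. The rule merges two stacked filters into one.

```java
import java.util.Optional;

// Toy rule-based rewriting. Every name here is hypothetical and only
// illustrates the roles played by a rule and a planner.
class RuleSketch {
    // Minimal plan node: operator name, optional detail, at most one input.
    static final class Node {
        final String op;
        final String detail;
        final Node input;
        Node(String op, String detail, Node input) {
            this.op = op;
            this.detail = detail;
            this.input = input;
        }
        public String toString() {
            String self = op + "[" + detail + "]";
            return input == null ? self : self + " <- " + input;
        }
    }

    // A rule either returns an equivalent rewritten node or declines.
    interface Rule {
        Optional<Node> apply(Node node);
    }

    // Matches Filter on top of Filter and merges them with AND.
    static final Rule MERGE_FILTERS = node -> {
        if (node.op.equals("Filter") && node.input != null && node.input.op.equals("Filter")) {
            String merged = node.input.detail + " AND " + node.detail;
            return Optional.of(new Node("Filter", merged, node.input.input));
        }
        return Optional.empty();
    };

    // Toy planner: apply the rule at a node until it stops matching,
    // then continue with the node's input.
    static Node optimize(Node node, Rule rule) {
        if (node == null) {
            return null;
        }
        Optional<Node> rewritten = rule.apply(node);
        if (rewritten.isPresent()) {
            return optimize(rewritten.get(), rule);
        }
        return new Node(node.op, node.detail, optimize(node.input, rule));
    }

    public static void main(String[] args) {
        Node plan = new Node("Filter", "sal > 1000",
            new Node("Filter", "deptno = 10",
                new Node("Scan", "EMP", null)));
        System.out.println("before: " + plan);
        System.out.println("after:  " + optimize(plan, MERGE_FILTERS));
    }
}
```

In this sketch the rule decides what to rewrite and the planner decides where and how often to apply it, which is the division of labour the list above hints at.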

5 - Documentation / Reference

6 - Query to relational Operator

Every query is represented as a tree of relational operators.

You can:

  • translate from SQL to relational algebra,
  • or build the tree directly.
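
As an illustration of building the tree directly, the sketch below assembles a Scan, Filter and Project tree by hand and runs it over in-memory rows. It is a toy model under stated assumptions: Rel, scan, filter and project are hypothetical helpers written for this example and are not Calcite's classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Toy relational operator tree. All names are hypothetical.
class OperatorTreeSketch {
    interface Rel {
        List<Object[]> execute();
        String explain(String indent);
    }

    // Leaf operator: returns the rows of a named in-memory table.
    static Rel scan(final String table, final List<Object[]> rows) {
        return new Rel() {
            public List<Object[]> execute() { return rows; }
            public String explain(String indent) { return indent + "Scan(" + table + ")\n"; }
        };
    }

    // Keeps the input rows that satisfy the condition.
    static Rel filter(final Rel input, final String label, final Predicate<Object[]> condition) {
        return new Rel() {
            public List<Object[]> execute() {
                List<Object[]> out = new ArrayList<>();
                for (Object[] row : input.execute()) {
                    if (condition.test(row)) {
                        out.add(row);
                    }
                }
                return out;
            }
            public String explain(String indent) {
                return indent + "Filter(" + label + ")\n" + input.explain(indent + "  ");
            }
        };
    }

    // Keeps only the listed column ordinals of each input row.
    static Rel project(final Rel input, final int... columns) {
        return new Rel() {
            public List<Object[]> execute() {
                List<Object[]> out = new ArrayList<>();
                for (Object[] row : input.execute()) {
                    Object[] projected = new Object[columns.length];
                    for (int i = 0; i < columns.length; i++) {
                        projected[i] = row[columns[i]];
                    }
                    out.add(projected);
                }
                return out;
            }
            public String explain(String indent) {
                return indent + "Project(" + Arrays.toString(columns) + ")\n"
                    + input.explain(indent + "  ");
            }
        };
    }

    public static void main(String[] args) {
        List<Object[]> emp = Arrays.asList(
            new Object[] {"Ann", 1500},
            new Object[] {"Bob", 800});
        // Tree for: SELECT name FROM EMP WHERE sal > 1000
        Rel plan = project(
            filter(scan("EMP", emp), "$1 > 1000", row -> (Integer) row[1] > 1000),
            0);
        System.out.print(plan.explain(""));
        for (Object[] row : plan.execute()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Translating from SQL would produce the same kind of tree; the parser, validator and converter simply build it for you.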

6.1 - Schema

Schemas are defined as a list of tables, each containing at least a table name and a URL.

  • HTML page and file adapter: if a page has more than one table, you can include selector and index fields in the table definition to specify the desired table. If there is no table specification, the file adapter chooses the largest table on the page.

7 - Jdbc

  • A JSON model of a simple Calcite schema:

  {
    "version": "1.0",
    "defaultSchema": "SALES",
    "schemas": [
      {
        "name": "SALES",
        "type": "custom",
        "factory": "org.apache.calcite.adapter.csv.CsvSchemaFactory",
        "operand": {
          "directory": "sales"
        }
      }
    ]
  }

Adapters can also be built programmatically using the Schema SPI. See Calcite Schema SPI.
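
The JSON model above is typically handed to Calcite's JDBC driver through the model property of the connection URL. The sketch below shows that pattern using only standard JDBC calls. The file name model.json is a hypothetical path, and the class and method names are made up for this example. Connecting only works when Calcite and the CSV adapter are on the classpath; otherwise the code reports the failure instead of crashing.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch of connecting through a JSON model file. The model path is a
// hypothetical example; only standard JDBC classes are used.
class ModelConnectionSketch {
    // Builds a Calcite JDBC URL that points at a model file.
    static String jdbcUrl(String modelPath) {
        return "jdbc:calcite:model=" + modelPath;
    }

    public static void main(String[] args) {
        // Assumption: the model shown above was saved as model.json.
        String modelPath = args.length > 0 ? args[0] : "model.json";
        String url = jdbcUrl(modelPath);
        System.out.println("connecting to " + url);

        try (Connection connection = DriverManager.getConnection(url);
             ResultSet tables = connection.getMetaData()
                 .getTables(null, "SALES", "%", null)) {
            // List the tables that the SALES schema exposes.
            while (tables.next()) {
                System.out.println(tables.getString("TABLE_NAME"));
            }
        } catch (SQLException e) {
            // Raised, for example, when no Calcite driver is on the classpath
            // or the model file cannot be read.
            System.out.println("could not connect: " + e.getMessage());
        }
    }
}
```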

8 - DDL

SELECT and DML are standardized, but DDL tends to be database-specific, so Calcite's policy is that DDL extensions are made outside of Calcite. See CALCITE-609 for an example.

You could copy work that has already been done in Drill and Phoenix in extending Calcite’s core parser for DDL.

9 - Test

10 - Planner

11 - Build

12 - Stream

13 - Documentation / Reference
