SQL Engine - SQL Parser

About

The parser in an SQL engine parse a Sql statement.

It's the first stage of SQL processing. This stage involves separating the pieces of a SQL statement into a SQL Tree where each node is a SQL token (such as Select,from, …)

The database parses a statement when instructed by the application, which means that only the application, and not the database itself, can reduce the number of parses.

When an application issues a SQL statement, the application makes a parse call to the database to prepare the statement for execution. The parse call opens or creates a cursor, which is a handle for the session-specific private SQL area that holds a parsed SQL statement and other processing information.

During the parse call, the database performs the following checks:

Syntax Check
Semantic Check
Shared Pool Check

The preceding checks identify the errors that can be found before statement execution. Some errors cannot be caught by parsing. For example, the database can encounter deadlocks or errors in data conversion only during statement execution.

Articles Related

Check

Syntax

Lexical Analysis - Parser (Syntax analysis|Linter)

The engine must check each SQL statement for syntactic validity. A statement that breaks a rule for well-formed SQL syntax fails the check. For example, the following statement fails because the keyword FROM is misspelled as FORM:

SQL> SELECT * FORM employees;
SELECT * FORM employees
         *
ERROR at line 1:
ORA-00923: FROM keyword not found where expected

Semantic

Compiler - Semantics Analysis.

The semantics of a statement are its meaning. Thus, a semantic check determines whether a statement is meaningful, for example, whether the objects and columns in the statement exist. A syntactically correct statement can fail a semantic check, as shown in the following example of a query of a nonexistent table:

SQL> SELECT * FROM nonexistent_table;
SELECT * FROM nonexistent_table
              *
ERROR at line 1:
ORA-00942: table or view does not exist

Shared Pool

During the parse, the database performs a shared pool check to determine whether it can skip resource-intensive steps of statement processing. To this end, the database uses a hashing algorithm to generate a hash value for every SQL statement. The statement hash value is the SQL ID shown in VSQL.SQL_ID.

When a user submits a SQL statement, the database searches the shared SQL area to see if an existing parsed statement has the same hash value. The hash value of a SQL statement is distinct from the following values:

Memory address for the statement

Oracle Database uses the SQL ID to perform a keyed read in a lookup table. In this way, the database obtains possible memory addresses of the statement.

Hash value of an execution plan for the statement

A SQL statement can have multiple plans in the shared pool. Each plan has a different hash value. If the same SQL ID has multiple plan hash values, then the database knows that multiple plans exist for this SQL ID.

The figure below is a simplified representation of a shared pool check of an UPDATE statement in a dedicated server architecture. Comparison of hash value between the PGA and the SGA.

Categories

Parse operations fall into the following categories, depending on the type of statement submitted and the result of the hash check:

Hard

If Oracle Database cannot reuse existing code, then it must build a new executable version of the application code. This operation is known as a hard parse, or a library cache miss. The database always perform a hard parse of DDL. During the hard parse, the database accesses the library cache and data dictionary cache numerous times to check the data dictionary. When the database accesses these areas, it uses a serialization device called a latch on required objects so that their definition does not change (see “Latches”). Latch contention increases statement execution time and decreases concurrency.

Soft

A soft parse is any parse that is not a hard parse. If the submitted statement is the same as a reusable SQL statement in memory, then the database reuses the existing code. This reuse of code is also called a library cache hit.

Soft parses can vary in the amount of work they perform. For example, configuring the session shared SQL area can sometimes reduce the amount of latching in the soft parses, making them “softer.”

In general, a soft parse is preferable to a hard parse because the database skips the optimization and row source generation steps, proceeding straight to execution.

Example

Identical syntax is not sufficient

If a check determines that a statement in the shared pool has the same hash value, then the database performs semantic and environment checks to determine whether the statements mean the same. Identical syntax is not sufficient. For example, suppose two different users log in to the database and issue the following SQL statements:

CREATE TABLE my_table ( some_col INTEGER );
SELECT * FROM my_table;

The SELECT statements for the two users are syntactically identical, but two separate schema objects are named my_table. This semantic difference means that the second statement cannot reuse the code for the first statement.

Environmental difference can force a hard parse

Even if two statements are semantically identical, an environmental difference can force a hard parse. In this case, the environment is the totality of session settings that can affect execution plan generation, such as the work area size or optimizer settings. Consider the following series of SQL statements executed by a single user:

ALTER SYSTEM FLUSH SHARED_POOL;
SELECT * FROM my_table;

ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS;
SELECT * FROM my_table;

ALTER SESSION SET SQL_TRACE=TRUE;
SELECT * FROM my_table;

In the preceding example, the same SELECT statement is executed in three different optimizer environments. Consequently, the database creates three separate shared SQL areas for these statements and forces a hard parse of each statement.

Library

Specialized parser library for SQL

OptiQ Parser Used by Drill
JSqlParser

http://www.sqlparser.com/ - Professional

Calcite

Calcite - Sql Parser - Only for the calcite syntax

JOOQ

JOOQ (Implementation Parse Query function) SQL parser is a hand-written, recursive descent parser. There are great advantages of this approach over formal grammar-based, generated parsers (e.g. by using ANTLR). These advantages are, among others:

They can be tuned easily for performance
They are very simple and easy to maintain
It's easy to implement corner cases of the grammar, which might require context (that's a big plus with SQL)
It's the easiest way to bind a grammar to an existing backing expression tree implementation (which is what jOOQ really is)