SQL - Select

Data System Architecture


SELECT is a keyword of the SQL Language (specifically dml) that permits to read data from the database.

A SELECT Statement is called a query.

The select statement has three capabilities:

  • projection (choose a subset of the columns in the table)
  • selection
  • and joining

Multi-threaded Dump

A select statement through a client (such as JDBC, ODBC) works only in one thread (one connection). Therefore if you want to dump a table with several threads, you needs a criterion by which you can split the select * for table into multiple ones.


The Sqoop utility for instance, uses a splitting column to split the workload.

By default, it identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range.

For example, with the following parameters

  • primary key column minimum value was 0
  • primary key column maximum value was 1000
  • 4 threads

You would get the following threads:

Threads Number Select
1 SELECT * FROM sometable WHERE id >= 0 AND id < 250
2 SELECT * FROM sometable WHERE id >= 250 AND id < 500
3 SELECT * FROM sometable WHERE id >= 500 AND id < 750
4 SELECT * FROM sometable WHERE id >= 750 AND id < 1001

The primary key values must be uniformly distributed across its range, otherwise this can result in unbalanced threads.

Documentation / Reference

Discover More
Card Puncher Data Processing
IO - CRUD (Create/Read/Update/Delete) - basic functions of persistent storage

In computer programming, create, read, update and delete (as an acronym CRUD) are the four basic functions of persistent storage. Operation SQL HTTP File SystemmutatoCreatINSERPUPOSwritSELECGEread...
Data System Architecture
SQL - Data Manipulation Language (DML)

Data Manipulation Language (DML) is a category of SQL statement that modify the data of a database. The principal commands are: SELECT UPDATE, INSERT, and DELETE but you can also find: ...
Data System Architecture
SQL - Query (SELECT)

A query is a SQL statement that begins with the SELECT word. A stored query is called a view. The result of a query is stored in a result table, called the result-set. A query is itself a Relations...
Data System Architecture
Shared Lock

A shared lock is a lock acquired in the shared mode over a data structure. The data structure can be any: from a file to any in-memory structure (ie variable with a complex type, generally a collection)...
Data System Architecture
Sql - Insert

insert is a data manipulation language statement. In an INSERT statement, you define the values with: the VALUES expression or a select statement Derby
Data System Architecture
Sql - LIMIT clause

The Limit clause set the number of rows returned by a select. This is a sqlite example
Data System Architecture
Sql Chaining explained

All operations on a table return a table. A select query returns a table A view returns a table A common table expression returns a table The operations on tables can therefore be chained together...
Sqlite Banner
Sqlite - Upsert

This page is the upsert statement in Sqlite The Sqlite upsert syntax can be found at Upsert. This page shows you some This example shows statement based...
Data System Architecture
Structured Query Language (SQL)

SQL is the standard language used to operate on table and its auxiliary data Structure (such as view, aggregate,...). SQL permits application designers to manipulate sets of rows with a non-procedural...

Share this page:
Follow us:
Task Runner