SQL - Select

About

SELECT is a keyword of the SQL Language (specifically dml) that permits to read data from the database.

A SELECT Statement is called a query.

The select statement has three capabilities:

  • projection (choose a subset of the columns in the table)
  • selection
  • and joining

Multi-threaded Dump

A select statement through a client (such as JDBC, ODBC) works only in one thread (one connection). Therefore if you want to dump a table with several threads, you needs a criterion by which you can split the select * for table into multiple ones.

Example

The Sqoop utility for instance, uses a splitting column to split the workload.

By default, it identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range.

For example, with the following parameters

  • primary key column minimum value was 0
  • primary key column maximum value was 1000
  • 4 threads

You would get the following threads:

Threads Number Select
1 SELECT * FROM sometable WHERE id >= 0 AND id < 250
2 SELECT * FROM sometable WHERE id >= 250 AND id < 500
3 SELECT * FROM sometable WHERE id >= 500 AND id < 750
4 SELECT * FROM sometable WHERE id >= 750 AND id < 1001

The primary key values must be uniformly distributed across its range, otherwise this can result in unbalanced threads.

Documentation / Reference


Powered by ComboStrap