About
SELECT is a keyword of the SQL Language (specifically dml) that permits to read data from the database.
A SELECT Statement is called a query.
The select statement has three capabilities:
- projection (choose a subset of the columns in the table)
- selection
- and joining
Articles Related
Multi-threaded Dump
A select statement through a client (such as JDBC, ODBC) works only in one thread (one connection). Therefore if you want to dump a table with several threads, you needs a criterion by which you can split the select * for table into multiple ones.
Example
The Sqoop utility for instance, uses a splitting column to split the workload.
By default, it identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range.
For example, with the following parameters
- primary key column minimum value was 0
- primary key column maximum value was 1000
- 4 threads
You would get the following threads:
Threads Number | Select |
---|---|
1 | SELECT * FROM sometable WHERE id >= 0 AND id < 250 |
2 | SELECT * FROM sometable WHERE id >= 250 AND id < 500 |
3 | SELECT * FROM sometable WHERE id >= 500 AND id < 750 |
4 | SELECT * FROM sometable WHERE id >= 750 AND id < 1001 |
The primary key values must be uniformly distributed across its range, otherwise this can result in unbalanced threads.