The CSV format is a physical representation of a relation (table).
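Tabular formats such as CSV are usually more space-efficient than JSON because the column names are stored once in the header instead of being repeated in every record, which can improve loading times for large datasets.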
The text qualifier, better known as the double quote character, is used to enclose text values in the exported data.
This is required when a cell value includes:
- a delimiter (such as a comma), as found in some European address formats for instance
- or the double quote character itself
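As a minimal sketch of when the text qualifier kicks in, the snippet below uses Python's standard csv module with its minimal quoting mode. The helper name to_csv_line and the sample values are illustrative assumptions, not part of any specification.

```python
import csv
import io


def to_csv_line(row):
    """Serialize one row, quoting only the fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=",",
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only when the value requires it
    )
    writer.writerow(row)
    return buffer.getvalue()


# No special character: the fields are written as-is.
plain = to_csv_line(["Paris", "France"])

# The first field contains the delimiter, so it gets enclosed in double quotes.
with_delimiter = to_csv_line(["12, rue de la Paix", "Paris"])

print(plain, end="")
print(with_delimiter, end="")
```

Only the field that contains the comma is enclosed; the other fields stay unquoted, which keeps the file compact.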
Within the header and each record, there may be one or more fields, separated by a delimiter character (generally a comma).
The escape character for the text qualifier is, by default, the double quote itself ("): a double quote inside a quoted value is written twice. For example, the value:
You are "top"
in CSV becomes:
"You are ""top"""
The newline in CSV follows these rules:
- Each record is located on a separate line, delimited by a line break
- The last record in the file may or may not have an ending line break.
- A cell that has a line break in its content should be quoted
- Newline as record separator
"You are ""top""" CRLF
- A newline in the value of a cell needs to be quoted
"You are CRLF
Extended CSV formats add metadata to the data (such as data type,…)
In order of preference:
- http://csvy.org/ - CSV + YAML
- csvkit. A suite of utilities for converting to and working with CSV, the king of tabular file formats.
Row to column storage
- A simple way to turn a CSV file into a column-oriented (columnar) format is to save each column to a separate file. To load the data back in, read a single line from each file (column), and 'stitch' the data back together into a row.
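The idea above can be sketched as follows with Python's standard library. The function names split_into_columns and stitch_rows, the column_N.csv file naming, and the sample data are assumptions made for the illustration. Each column file is itself written with the csv module, so "one line per value" becomes "one record per value" and cells with quoted line breaks survive.

```python
import csv
import tempfile
from pathlib import Path


def split_into_columns(csv_path, out_dir):
    """Write each column of a CSV file to its own one-column file."""
    with open(csv_path, newline="") as source:
        rows = list(csv.reader(source))
    header, records = rows[0], rows[1:]
    paths = []
    for index in range(len(header)):
        column_path = Path(out_dir) / f"column_{index}.csv"  # hypothetical naming
        with open(column_path, "w", newline="") as target:
            csv.writer(target).writerows([record[index]] for record in records)
        paths.append(column_path)
    return header, paths


def stitch_rows(paths):
    """Read one value from each column file and yield them as a row."""
    handles = [open(path, newline="") for path in paths]
    try:
        readers = [csv.reader(handle) for handle in handles]
        for cells in zip(*readers):
            yield [cell[0] for cell in cells]
    finally:
        for handle in handles:
            handle.close()


with tempfile.TemporaryDirectory() as tmp:
    source_path = Path(tmp) / "people.csv"
    with open(source_path, "w", newline="") as f:
        csv.writer(f).writerows(
            [["name", "city"], ["Ada", "London"], ["Linus", "Helsinki"]]
        )
    header, paths = split_into_columns(source_path, tmp)
    restored = list(stitch_rows(paths))

print(header)
print(restored)
```

Reading a single column then only touches one file, which is the main benefit of a column-oriented layout. Rebuilding full rows requires opening every column file in parallel.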