About
The CSV format is a physical representation of a relation (table).
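Tabular formats such as CSV are usually more space-efficient than JSON because the field names are written once in the header instead of being repeated in every record. This can improve loading times for large datasets.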
Syntax
Although several specifications and implementations of the CSV format exist, none of them is a formal, binding standard (RFC 4180, for instance, documents a common format but is informational only). As a result, CSV files are interpreted in a wide variety of ways.
Summary:
- Each record is located on a separate line, delimited by a line break
- The last record in the file may or may not have an ending line break.
- There may be an optional header line appearing as the first line of the file, with the same format as normal record lines.
- Within the header and each record, there may be one or more fields, separated by a delimiter character (commas).
- Each field may or may not be enclosed in double quotes (the double quote is also known as the text qualifier).
- Fields containing line breaks (CRLF), double quotes, or commas should be enclosed in double quotes.
- If double quotes are used to enclose fields, then a double quote appearing inside a field must be escaped by preceding it with another double quote.
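The rules above can be tried out with a minimal sketch based on Python's standard csv module. The sample data and the function name parse_records are illustrative assumptions; other parsers may interpret edge cases differently.

```python
import csv
import io

# Illustrative sample: a header line, a quoted field containing a comma,
# and a last record without an ending line break.
SAMPLE = 'name,address\r\nAlice,"12 Main St, Springfield"\r\nBob,Nowhere'


def parse_records(text):
    """Return the records as dictionaries keyed by the header line."""
    # newline="" leaves the handling of line breaks to the csv module.
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


records = parse_records(SAMPLE)
for record in records:
    print(record["name"], "|", record["address"])
```

The quoted comma stays inside the address field, and the missing final line break does not prevent the last record from being read.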
Text Qualifier
The text qualifier, better known as the double quote character, is used to enclose text values in the exported data.
This is required when a cell value includes:
- a delimiter (such as a comma), as found in some European address formats for instance
- a line break
- or the double quote character itself
Delimiter
Within the header and each record, there may be one or more fields, separated by a delimiter character (generally commas).
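Since the comma is only the usual choice, a reader generally has to be told which delimiter a file uses. The following is a minimal sketch with Python's standard csv module; the semicolon-delimited sample and the function name read_rows are assumptions made for illustration.

```python
import csv
import io

# Illustrative data using a semicolon as the delimiter, so that a comma
# can appear inside a field (here as a decimal separator) without quoting.
SAMPLE = "product;price\r\nbread;2,50\r\nmilk;0,99\r\n"


def read_rows(text, delimiter=","):
    """Parse text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


rows = read_rows(SAMPLE, delimiter=";")
for row in rows:
    print(row)
```

Reading the same text with the default comma delimiter would split each price in two, which is why the delimiter must match the one used when the file was written.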
Escape character
By default, the escape character for the text qualifier is the double quote itself ("): a double quote inside a quoted field is written twice.
Example:
You are "top"
in CSV becomes:
"You are ""top"""
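The same escaping can be reproduced with a short sketch using Python's standard csv module. The helper name to_csv_line and the extra sample fields are illustrative assumptions.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record, quoting and escaping only where needed."""
    buffer = io.StringIO(newline="")
    # doublequote=True (the default) escapes a double quote by doubling it;
    # QUOTE_MINIMAL (also the default) quotes only the fields that need it.
    writer = csv.writer(
        buffer,
        quoting=csv.QUOTE_MINIMAL,
        doublequote=True,
        lineterminator="\r\n",
    )
    writer.writerow(fields)
    return buffer.getvalue()


line = to_csv_line(['You are "top"', "plain", "a,b"])
print(line)

# Reading the line back restores the original values.
print(next(csv.reader([line])))
```

Only the first and third fields are enclosed in double quotes, because the second one contains neither a delimiter nor a double quote.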
New Line
Newlines in CSV follow these rules:
- Each record is located on a separate line, delimited by a line break
- The last record in the file may or may not have an ending line break.
- A cell that has a line break in its content should be quoted
Example:
- Newline as record separator
"You are ""top""" CRLF
- Newline inside the value of a cell (the value needs to be quoted)
"You are CRLF
top" CRLF
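A minimal sketch of how a parser can tell the two kinds of line break apart, using Python's standard csv module. The sample data and the function name read_records are assumptions made for illustration.

```python
import csv
import io

# Two records; the first one has a quoted field that spans two physical lines.
SAMPLE = '"You are\r\ntop",1\r\nplain,2\r\n'


def read_records(text):
    """Parse text into records, keeping line breaks inside quoted fields."""
    # newline="" passes line breaks to the csv module untranslated, so it can
    # distinguish a break inside quotes from a record separator.
    return list(csv.reader(io.StringIO(text, newline="")))


records = read_records(SAMPLE)
print(len(records))
print(records[0][0].splitlines())
```

The file has three physical lines but only two records, because the first line break sits inside a quoted field.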
Extended
Extended CSV formats add metadata to the data (such as data types, …)
In order of preference:
- http://csvy.org/ - CSV + YAML
Parsing
Tools:
- csvkit: a suite of utilities for converting to and working with CSV, the king of tabular file formats.
Row to column storage
- A simple way to turn a CSV file into a column-oriented (columnar) format is to save each column to a separate file. To load the data back in, read a single line from each file (column), and 'stitch' the data back together into a row.
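The split-and-stitch idea can be sketched with Python's standard library as follows. The function names split_columns and stitch_rows, the col_<n>.csv naming scheme, and the sample data are all hypothetical. For simplicity the sketch loads the whole source file into memory, and it writes each column as a one-field CSV so that quoted values survive the round trip.

```python
import csv
import tempfile
from pathlib import Path


def split_columns(csv_path, out_dir):
    """Write each column of csv_path to its own one-column CSV file."""
    with open(csv_path, newline="") as src:
        rows = list(csv.reader(src))
    paths = []
    for index in range(len(rows[0])):
        # Hypothetical naming scheme: col_<position>.csv
        path = Path(out_dir) / f"col_{index}.csv"
        with open(path, "w", newline="") as dst:
            csv.writer(dst).writerows([row[index]] for row in rows)
        paths.append(path)
    return paths


def stitch_rows(paths):
    """Read one record at a time from each column file and rebuild the rows."""
    files = [open(path, newline="") for path in paths]
    try:
        readers = [csv.reader(f) for f in files]
        # zip pulls one record from every column file per step.
        return [[cell[0] for cell in cells] for cells in zip(*readers)]
    finally:
        for f in files:
            f.close()


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.csv"
    original = [["id", "city"], ["1", "Paris"], ["2", "Lyon, FR"]]
    with open(source, "w", newline="") as f:
        csv.writer(f).writerows(original)
    column_files = split_columns(source, tmp)
    rebuilt = stitch_rows(column_files)

print(rebuilt == original)
```

A reader that needs only one column can open a single file and skip the others, which is the main benefit of the columnar layout.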