Avro is a data serialization system.
- Rich data structures
- Schemas stored alongside the data in the file
- A compact, fast, binary data format that supports compression
- A container file to store persistent data
- Simple integration with dynamic languages
Avro requires a schema both when data is serialized and when it is deserialized. Because the schema is available at decoding time, metadata such as field names does not have to be encoded in the data itself, which makes Avro's binary encoding very compact.
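To see why the encoding is compact, here is a minimal sketch of how Avro's binary format lays out a record, following the rules in the Avro specification: ints and longs are zig-zag base-128 varints, strings are a length prefix followed by UTF-8 bytes, and field values appear in schema order with no names or tags in the payload. `encode_record` is a hypothetical helper for illustration, not the official library's API, and it handles only the two types used here.

```python
import json

def zigzag_varint(n: int) -> bytes:
    """Avro int/long encoding: zig-zag map, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes -> few bytes
    out = bytearray()
    while True:
        if z > 0x7F:
            out.append((z & 0x7F) | 0x80)  # set continuation bit
            z >>= 7
        else:
            out.append(z)
            return bytes(out)

SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "id",   "type": "long"},
            {"name": "name", "type": "string"}]}
""")

def encode_record(schema: dict, record: dict) -> bytes:
    """Hypothetical helper: writes field values in schema order only."""
    out = b""
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] in ("int", "long"):
            out += zigzag_varint(value)
        elif field["type"] == "string":
            data = value.encode("utf-8")
            out += zigzag_varint(len(data)) + data  # length prefix, then bytes
    return out

payload = encode_record(SCHEMA, {"id": 1, "name": "Ada"})
# 5 bytes total: b'\x02' (id=1) + b'\x06' (length 3) + b'Ada'
```

Note that the payload carries no trace of the names `id` or `name`; a reader recovers them from the schema.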
The Avro binary format loads well whether the data is compressed or uncompressed. Avro data is fast to load because files can be read in parallel, even when the data blocks are compressed.
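The parallelism comes from Avro's container-file layout: data is written in blocks, each terminated by the file's 16-byte sync marker, so a reader can seek to an arbitrary offset, scan forward to the next marker, and start decoding independently. The sketch below is a simplification (real blocks also carry object counts, byte sizes, and per-block compression) that shows only how a sync marker makes a byte stream splittable:

```python
import os

SYNC = os.urandom(16)  # each Avro file stores its own random 16-byte marker

# Three independently decodable "blocks", each terminated by the sync marker.
# Placeholder bytes stand in for real (possibly compressed) block contents.
stream = b"".join(block + SYNC for block in [b"block-1", b"block-2", b"block-3"])

def split_blocks(stream: bytes, sync: bytes) -> list:
    """Simplified: recover block boundaries by scanning for the sync marker."""
    return [b for b in stream.split(sync) if b]

blocks = split_blocks(stream, SYNC)
# -> [b'block-1', b'block-2', b'block-3']
```

Because each block is self-contained, different workers can decompress and decode different blocks of the same file concurrently.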
Avro supports code generation: the schema can be used to automatically generate classes that read and write Avro data. Code generation is not required to read or write data files, nor to use or implement RPC protocols. It is an optional optimization, only worth implementing for statically typed languages.
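In a dynamic language, decoded records can simply be plain dicts, with the schema consulted at read time and no generated classes involved. Below is a minimal hand-rolled sketch (hypothetical helpers, not the official `avro` package's API) that decodes the Avro binary encoding of `{"id": 1, "name": "Ada"}`:

```python
import json

SCHEMA = json.loads("""
{"type": "record", "name": "User",
 "fields": [{"name": "id",   "type": "long"},
            {"name": "name", "type": "string"}]}
""")

def read_long(buf: bytes, pos: int):
    """Read one zig-zag base-128 varint, as Avro encodes ints and longs."""
    shift, acc = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        acc |= (b & 0x7F) << shift
        if not (b & 0x80):  # continuation bit clear: last byte
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos  # undo the zig-zag map

def decode_record(schema: dict, buf: bytes) -> dict:
    """Hypothetical helper: the schema drives decoding, result is a dict."""
    record, pos = {}, 0
    for field in schema["fields"]:
        if field["type"] in ("int", "long"):
            record[field["name"]], pos = read_long(buf, pos)
        elif field["type"] == "string":
            length, pos = read_long(buf, pos)
            record[field["name"]] = buf[pos:pos + length].decode("utf-8")
            pos += length
    return record

# b'\x02\x06Ada' is the 5-byte Avro binary encoding of the sample record
record = decode_record(SCHEMA, b"\x02\x06Ada")
# -> {'id': 1, 'name': 'Ada'}
```

The field names come entirely from the schema; nothing about the payload requires a generated `User` class.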
Schemas can also be produced or rendered by external tools:
- spotify/dbeam: exports data from a SQL database into Avro files
- https://github.com/ept/avrodoc: renders Avro's JSON schema format (*.avsc files) as a documentation web page