Data Serialization - Avro

About

Avro provides:

Rich data structures.
A schema stored with the data in the file.
A compact, fast binary data format that can also be compressed.
A container file to store persistent data.
Remote procedure call (RPC).
Simple integration with dynamic languages.

Avro requires the schema both when data is serialized and when it is deserialized. Because the schema is available at decoding time, metadata such as field names does not have to be encoded alongside each value. This makes the binary encoding of Avro data very compact.
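To make the compactness argument concrete, here is a minimal hand-written sketch of the Avro binary encoding for a flat record, using only the Python standard library. It covers only the `int`, `long` and `string` types; the `User` schema and the helper names (`encode_long`, `encode_string`, `encode_record`) are assumptions made for this illustration, not part of any Avro library.

```python
import json

def encode_long(n: int) -> bytes:
    """Encode an Avro int/long: zigzag, then variable-length base-128."""
    z = (n << 1) ^ (n >> 63)          # zigzag: small magnitudes become small codes
    out = bytearray()
    while z & ~0x7F:                  # more than 7 bits left to write
        out.append((z & 0x7F) | 0x80) # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)

def encode_string(s: str) -> bytes:
    """Encode an Avro string: byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

def encode_record(schema: dict, record: dict) -> bytes:
    """Concatenate the field values in schema order; no field names are written."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] in ("int", "long"):
            out += encode_long(value)
        elif field["type"] == "string":
            out += encode_string(value)
        else:
            raise NotImplementedError(field["type"])
    return bytes(out)

# Hypothetical schema and record, for illustration only.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
user = {"name": "Ada", "age": 36}

avro_bytes = encode_record(USER_SCHEMA, user)
json_bytes = json.dumps(user).encode("utf-8")
print(len(avro_bytes), len(json_bytes))
```

The encoded record holds only the values; the reader needs the same schema to know that the first value is `name` and the second is `age`.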

The Avro binary format works well for loading both compressed and uncompressed data. Avro data is fast to load because a container file is divided into blocks that can be read in parallel, even when the blocks are compressed.
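The idea of independently compressed blocks can be sketched as follows. This is a simplified model, not the actual container file layout: the blocks live in a Python list instead of a byte stream separated by sync markers, and the function names (`write_blocks`, `read_blocks_parallel`) are assumptions for this example. The compression is raw DEFLATE, the format used by Avro's `deflate` codec.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def deflate(data: bytes) -> bytes:
    """Compress with raw DEFLATE (negative wbits: no zlib header or checksum)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()

def inflate(data: bytes) -> bytes:
    """Decompress a raw DEFLATE payload."""
    return zlib.decompress(data, wbits=-15)

def write_blocks(records, per_block):
    """Group encoded records into blocks and compress each block on its own."""
    blocks = []
    for start in range(0, len(records), per_block):
        chunk = records[start:start + per_block]
        blocks.append((len(chunk), deflate(b"".join(chunk))))
    return blocks

def read_blocks_parallel(blocks, workers=4):
    """Decompress every block independently; results keep the block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(inflate, (payload for _, payload in blocks)))

# Hypothetical pre-encoded records, for illustration only.
records = [f"record-{i}".encode("utf-8") for i in range(10)]
blocks = write_blocks(records, per_block=4)
payloads = read_blocks_parallel(blocks)
```

Because no block depends on the compression state of another, a loader can hand different blocks to different workers without decompressing the file from the start.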

Client

Avro supports code generation: classes that read and write Avro data can be generated automatically from the schema. Code generation is not required to read or write data files, nor to use or implement RPC protocols. It is an optional optimization, only worth implementing for statically typed languages.

The schema can be extracted from a data file, since it is stored in the file header.
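As a rough illustration of schema extraction, the sketch below reads the header of an Avro object container file with the standard library only: the 4-byte magic `Obj\x01`, followed by a metadata map whose `avro.schema` entry holds the schema as JSON. The function names (`read_long`, `read_bytes`, `read_header_metadata`, `extract_schema`) are assumptions for this example; in practice an Avro library or tool would normally do this.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file

def read_long(stream) -> int:
    """Decode a variable-length, zigzag-encoded Avro long."""
    acc, shift = 0, 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of data while reading a long")
        acc |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:        # continuation bit clear: last byte
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)    # undo zigzag

def read_bytes(stream) -> bytes:
    """Decode Avro bytes/string: a long length, then that many bytes."""
    return stream.read(read_long(stream))

def read_header_metadata(stream) -> dict:
    """Read the magic and the metadata map (string keys, bytes values)."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:                # a zero count ends the map
            break
        if count < 0:                 # negative count: a byte size follows
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta

def extract_schema(stream):
    """Return the writer schema stored in the file header, parsed from JSON."""
    return json.loads(read_header_metadata(stream)["avro.schema"])

# Usage with a hypothetical file name:
# with open("users.avro", "rb") as f:
#     print(json.dumps(extract_schema(f), indent=2))
```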

Library

spotify/dbeam - SQL to Avro.

Doc

https://github.com/ept/avrodoc - generates a web page from Avro's JSON schema format (*.avsc files).
