Data Serialization - AVRO



Avro is a data serialization system.

Avro provides:

  • Rich data structures
  • A schema stored in the file alongside the data
  • A compact, fast, binary data format that can be compressed
  • A container file to store persistent data
  • Simple integration with dynamic languages

Avro requires a schema both when serializing and when deserializing data. Because the schema is provided at decoding time, metadata such as the field names don’t have to be explicitly encoded in the data. This makes the binary encoding of Avro data very compact.
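To illustrate why the schema makes the encoding compact, here is a minimal sketch in plain Python of how a record with a string field and a long field is laid out in the Avro binary encoding: longs are written as zig-zag variable-length integers, strings as a length followed by UTF-8 bytes, and a record is simply its field values concatenated in schema order. The `User` schema and the helper names (`encode_long`, `encode_record`, ...) are assumptions made for this example. It handles only two types and is not a replacement for an Avro library.

```python
import io
import json

# Hypothetical schema used only for this illustration.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age",  "type": "long"}
  ]
}
""")


def encode_long(n):
    """Encode an integer as a zig-zag, variable-length integer."""
    z = (n << 1) ^ (n >> 63)          # zig-zag: small magnitudes give small codes
    out = bytearray()
    while z & ~0x7F:                  # more than 7 bits left
        out.append((z & 0x7F) | 0x80) # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def decode_long(buf):
    """Read one zig-zag, variable-length integer from a binary stream."""
    shift = 0
    z = 0
    while True:
        b = buf.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:              # continuation bit clear: last byte
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)        # undo zig-zag


def encode_record(schema, record):
    """Concatenate the field values in schema order; no field names are written."""
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            raw = value.encode("utf-8")
            out += encode_long(len(raw)) + raw
        elif field["type"] == "long":
            out += encode_long(value)
        else:
            raise ValueError("type not covered by this sketch: %r" % field["type"])
    return bytes(out)


def decode_record(schema, data):
    """Decoding needs the schema: it supplies the names, types and order."""
    buf = io.BytesIO(data)
    record = {}
    for field in schema["fields"]:
        if field["type"] == "string":
            length = decode_long(buf)
            record[field["name"]] = buf.read(length).decode("utf-8")
        elif field["type"] == "long":
            record[field["name"]] = decode_long(buf)
        else:
            raise ValueError("type not covered by this sketch: %r" % field["type"])
    return record


user = {"name": "Ann", "age": 30}
encoded = encode_record(SCHEMA, user)
decoded = decode_record(SCHEMA, encoded)
```

In this sketch `encoded` holds only the values, so it is shorter than the same record written as JSON text, where the field names are repeated in every record. Without the schema the bytes cannot be interpreted, which is why the schema is required at decoding time.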

The Avro binary format works well for loading both compressed and uncompressed data. Avro data is fast to load because the data is organized in blocks that can be read in parallel, even when the blocks are compressed.
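The idea behind parallel loading can be sketched as follows: when each block is compressed on its own and followed by a marker, a reader can locate the blocks cheaply and decompress them independently. The layout below (a record count, a payload size, a zlib-compressed payload, a 16-byte marker) is a simplified toy format written for this example. It is not the actual Avro container file layout, and every function name in it is an assumption.

```python
import os
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

SYNC = os.urandom(16)  # toy marker written after every block


def write_container(records, per_block):
    """Group records into blocks and compress each block independently."""
    out = bytearray()
    for start in range(0, len(records), per_block):
        chunk = records[start:start + per_block]
        # Length-prefix each record so the block can be split again later.
        body = b"".join(struct.pack(">I", len(r)) + r for r in chunk)
        payload = zlib.compress(body)
        out += struct.pack(">II", len(chunk), len(payload)) + payload + SYNC
    return bytes(out)


def split_blocks(data):
    """Walk the block headers without decompressing anything."""
    blocks = []
    pos = 0
    while pos < len(data):
        count, size = struct.unpack_from(">II", data, pos)
        pos += 8
        payload = data[pos:pos + size]
        pos += size
        if data[pos:pos + 16] != SYNC:
            raise ValueError("missing sync marker, data is corrupt")
        pos += 16
        blocks.append((count, payload))
    return blocks


def decode_block(block):
    """Decompress one block; needs no state from any other block."""
    count, payload = block
    raw = zlib.decompress(payload)
    records = []
    pos = 0
    for _ in range(count):
        (length,) = struct.unpack_from(">I", raw, pos)
        pos += 4
        records.append(raw[pos:pos + length])
        pos += length
    return records


def read_parallel(data):
    """Decode all blocks with a thread pool, keeping the original order."""
    with ThreadPoolExecutor() as pool:
        decoded = list(pool.map(decode_block, split_blocks(data)))
    return [record for block in decoded for record in block]


rows = [("row-%d" % i).encode("utf-8") for i in range(10)]
container = write_container(rows, per_block=4)
restored = read_parallel(container)
```

Because compression is applied per block rather than to the whole file, each worker only needs its own block to produce records. This is the property that allows compressed data to be split among parallel readers.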


Avro supports code generation: the schema can be used to automatically generate classes that read and write Avro data. Code generation is not required to read or write data files, nor to use or implement RPC protocols. It is an optional optimization, only worth implementing for statically typed languages.

Because the schema is stored in the file, it can be extracted from an Avro data file.


