Parquet CLI (formerly parquet-tools)


About

The Parquet CLI is a command-line tool for managing Parquet files.

The Parquet CLI was previously known as the Parquet tool (parquet-tools).

Management

Build

The tool is built from the parquet-cli module of the parquet-mr repository (apache/parquet-mr/blob/master/parquet-cli):

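# Clone the repository and build the parquet-cli module
git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr/parquet-cli/
mvn clean install -DskipTests
# Define an alias that runs the runtime jar through Hadoop
# (adjust the jar path and version to match your build)
alias parquet="hadoop jar /path/to/parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main --dollar-zero parquet"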
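
Bash does not expand aliases in non-interactive shells unless the expand_aliases option is set, so the alias above is mostly useful at an interactive prompt. For scripts, a shell function is one possible substitute. The following is a minimal sketch, not part of the parquet-cli documentation: the function name parquet_cli and the variable PARQUET_CLI_JAR are hypothetical, and the jar path is the same placeholder used in the alias.

```shell
# Hypothetical variable: location of the runtime jar produced by the build.
# The default is the placeholder path from the alias; override it as needed.
PARQUET_CLI_JAR="${PARQUET_CLI_JAR:-/path/to/parquet-cli-1.12.3-runtime.jar}"

# Function equivalent of the alias (the name parquet_cli is illustrative).
# All arguments are forwarded unchanged to the CLI main class.
parquet_cli() {
  hadoop jar "$PARQUET_CLI_JAR" org.apache.parquet.cli.Main --dollar-zero parquet "$@"
}

# Example (not run here): parquet_cli schema sample.parquet
```

A separate name is used so that the function definition does not collide with an existing parquet alias in an interactive shell.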

Management

Get schema

parquet schema sample.parquet 

Read

  • The whole file
parquet cat sample.parquet 
  • The first few rows (here, 5)
parquet head -n 5 sample.parquet 

Get metadata

parquet meta sample.parquet
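
The schema, head and meta commands above can be chained to get a quick overview of several files at once. This is a minimal sketch under stated assumptions: the helper name parquet_inspect is hypothetical, and it assumes that parquet resolves to the CLI in the shell where the function is defined (the alias at an interactive prompt, or a function or executable of that name in a script).

```shell
# Hypothetical helper: for each file, print its schema, its first rows,
# and its metadata, stopping at the first command that fails.
# Usage: parquet_inspect <rows> <file>...
parquet_inspect() {
  pi_rows="$1"   # number of rows to print per file
  shift
  for pi_file in "$@"; do
    printf '== %s ==\n' "$pi_file"
    parquet schema "$pi_file" || return 1
    parquet head -n "$pi_rows" "$pi_file" || return 1
    parquet meta "$pi_file" || return 1
  done
}

# Example (not run here): parquet_inspect 5 sample.parquet
```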




