About

The Parquet CLI is a command-line tool for inspecting and managing Parquet files.

The Parquet CLI was previously known as parquet-tools.

Management

Build

The source is in the parquet-cli module of the parquet-mr repository: apache/parquet-mr/blob/master/parquet-cli

git clone https://github.com/apache/parquet-mr.git
cd parquet-mr/parquet-cli/
mvn clean install -DskipTests
# alias to run the CLI through Hadoop (adjust the jar path and version to match your build)
alias parquet="hadoop jar /path/to/parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main --dollar-zero parquet"
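
Bash does not expand aliases in non-interactive scripts by default, so a shell function can be more convenient there. The following is a minimal sketch, not part of parquet-cli itself: the variable name PARQUET_CLI_JAR is an assumption introduced here, and the jar path is the same placeholder used in the alias.

```shell
# Assumed variable (not read by parquet-cli itself): location of the runtime jar.
PARQUET_CLI_JAR="${PARQUET_CLI_JAR:-/path/to/parquet-cli-1.12.3-runtime.jar}"

# Wrapper with the same effect as the alias, usable from scripts.
parquet() {
  if [ ! -f "$PARQUET_CLI_JAR" ]; then
    echo "parquet-cli jar not found: $PARQUET_CLI_JAR" >&2
    return 1
  fi
  hadoop jar "$PARQUET_CLI_JAR" org.apache.parquet.cli.Main --dollar-zero parquet "$@"
}
```

Checking for the jar first gives a clearer error than the one Hadoop reports when the path is wrong.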

Get schema

parquet schema sample.parquet 
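
To compare the schemas of several files, the same command can be run in a loop. This is a minimal sketch, assuming parquet resolves to the CLI in the current shell; the function name print_schemas is hypothetical.

```shell
# Hypothetical helper: print the schema of every .parquet file in a directory.
# The body runs in a subshell (parentheses) so its variables do not leak out.
print_schemas() (
  dir="${1:-.}"
  for f in "$dir"/*.parquet; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    echo "== $f"
    parquet schema "$f"
  done
)
```

Called without an argument, it scans the current directory.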

Read

  • The whole file
parquet cat sample.parquet
  • The first few records (here, 5)
parquet head -n 5 sample.parquet
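
For quick inspection it can help to wrap head with a default record count. The sketch below assumes parquet is available in the current shell; the name parquet_preview and its default of 10 records are illustrative choices, not part of the CLI.

```shell
# Hypothetical helper: show the first N records of a file (default 10).
parquet_preview() (
  file="$1"
  count="${2:-10}"
  [ -n "$file" ] || { echo "usage: parquet_preview FILE [COUNT]" >&2; exit 2; }
  # Reject anything that is not a positive integer before calling the CLI.
  case "$count" in
    ''|*[!0-9]*|0) echo "record count must be a positive integer" >&2; exit 2 ;;
  esac
  parquet head -n "$count" "$file"
)
```

For example, parquet_preview sample.parquet 5 runs the same command as the head example.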

Get metadata

parquet meta sample.parquet
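
The schema, meta, and head commands can be chained into one report per file. This is a minimal sketch, assuming parquet is available in the current shell; the function name parquet_inspect is hypothetical and the report layout is arbitrary.

```shell
# Hypothetical helper: print schema, metadata, and a short sample for one file.
# Stops at the first command that fails.
parquet_inspect() (
  file="$1"
  echo "## schema";   parquet schema "$file" || exit 1
  echo "## metadata"; parquet meta "$file"   || exit 1
  echo "## sample";   parquet head -n 5 "$file"
)
```

The output can be redirected to keep a record, for example parquet_inspect sample.parquet > sample-report.txt.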