About
The Parquet CLI (parquet-cli) is a command-line tool for inspecting and managing Parquet files.
It is the successor to the older parquet-tools.
Management
Build
The source is located at apache/parquet-mr/blob/master/parquet-cli. To build it:
git clone https://github.com/apache/parquet-mr.git
cd parquet-mr/parquet-cli/
mvn clean install -DskipTests
# Create an alias so the CLI can be run as `parquet`
alias parquet="hadoop jar /path/to/parquet-cli-1.12.3-runtime.jar org.apache.parquet.cli.Main --dollar-zero parquet"
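Bash does not expand aliases in non-interactive scripts by default, so a shell function can play the same role there. The following is a minimal sketch, assuming the same jar path as the alias above; the `PARQUET_CLI_JAR` variable is a hypothetical name introduced here for illustration, not something the CLI itself reads.

```shell
# Hypothetical variable: path to the runtime jar built above.
# Override it in your environment if the jar lives elsewhere.
PARQUET_CLI_JAR="${PARQUET_CLI_JAR:-/path/to/parquet-cli-1.12.3-runtime.jar}"

# Wrapper function equivalent to the alias; "$@" forwards every
# argument (sub-command, options, file names) unchanged.
parquet() {
    hadoop jar "$PARQUET_CLI_JAR" org.apache.parquet.cli.Main --dollar-zero parquet "$@"
}
```

Once the function is defined, `parquet schema sample.parquet` behaves as it does with the alias, and it also works inside scripts.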
Management
Get schema
parquet schema sample.parquet
Read
- The whole file
parquet cat sample.parquet
- The first few records
parquet head -n 5 sample.parquet
Get metadata
parquet meta sample.parquet
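When several files need checking, the schema and meta commands above can be combined in a small loop. This is a sketch only: `inspect_parquet` is a hypothetical helper name, and it assumes that a `parquet` command is resolvable in the current shell and that the files are on the local file system.

```shell
# Hypothetical helper: print the schema and metadata of each file given.
inspect_parquet() {
    for file in "$@"; do
        # Assumption: local paths only; anything else is skipped with a warning.
        if [ ! -f "$file" ]; then
            echo "skip: $file (not a regular file)" >&2
            continue
        fi
        echo "== $file =="
        parquet schema "$file"
        parquet meta "$file"
    done
}
```

For example, `inspect_parquet sample.parquet other.parquet` prints a header line for each file, followed by its schema and metadata.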