Parquet Tools

About

parquet-tools is a command-line tool for inspecting Parquet files: it prints their schema, data, and metadata.

Management

Build

apache/parquet-mr/tree/master/parquet-tools

  • Prerequisites: Maven 3, Git, JDK 7 or 8
git clone https://github.com/Parquet/parquet-mr.git 
cd parquet-mr/parquet-tools/ 
mvn clean package -Plocal 

Management

With parquet-tools

To run it on Hadoop, use hadoop jar instead of java -jar.
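
As an illustration of the two ways to launch the tool, the sketch below builds the command line for either mode. The function name parquet_tools_cmd and the PARQUET_TOOLS_JAR variable are hypothetical helpers introduced here, not part of parquet-tools; the function only prints the command so it can be checked before running it.

```shell
# Hypothetical helper (not part of parquet-tools): prints the command line
# for a local run (java -jar) or a Hadoop run (hadoop jar).
# Assumption: PARQUET_TOOLS_JAR may point at the built jar; otherwise the
# placeholder name used on this page is kept.
parquet_tools_cmd() {
  mode="$1"
  shift
  jar="${PARQUET_TOOLS_JAR:-parquet-tools-X.X.X.jar}"
  case "$mode" in
    local)  echo "java -jar $jar $*" ;;
    hadoop) echo "hadoop jar $jar $*" ;;
    *)      echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, parquet_tools_cmd hadoop schema sample.parquet prints the hadoop jar form of the schema command shown in the next section.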

Get schema

java -jar parquet-tools-X.X.X.jar schema sample.parquet 

Read

  • The whole file
java -jar parquet-tools-X.X.X.jar cat sample.parquet 
  • The first few lines
java -jar parquet-tools-X.X.X.jar head -n5 sample.parquet 

Get metadata

java -jar parquet-tools-X.X.X.jar meta sample.parquet
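
The commands above can be combined into one small script. This is a minimal sketch: the function name inspect_parquet and the JAR and RUNNER variables are assumptions made for illustration, and the script is written for sh or bash, where the unquoted subcommand is split into separate arguments.

```shell
#!/bin/sh
# Sketch: print the schema, the first rows, and the metadata of one Parquet file.
# Assumptions: JAR names the built jar; RUNNER is a hypothetical override
# (RUNNER=echo gives a dry run that only prints the commands).
JAR="${JAR:-parquet-tools-X.X.X.jar}"
RUNNER="${RUNNER:-}"

inspect_parquet() {
  file="$1"
  for sub in "schema" "head -n5" "meta"; do
    echo "== $sub =="
    # $sub and $RUNNER are intentionally unquoted: "head -n5" must split
    # into two arguments, and an empty RUNNER must expand to nothing.
    $RUNNER java -jar "$JAR" $sub "$file"
  done
}
```

Calling inspect_parquet sample.parquet with RUNNER=echo shows the three commands without executing them.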
