Parquet Tool
About
parquet-tools is a command-line tool for inspecting Parquet files: it prints their schema, content, and metadata.
Management
Build
The source is in the parquet-tools module of parquet-mr: apache/parquet-mr/tree/master/parquet-tools
- Prerequisites: Maven 3, Git, JDK 7 or 8
git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr/parquet-tools/
mvn clean package -Plocal
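Before building, it can help to confirm that the prerequisites are on the PATH. The sketch below is not part of parquet-tools: missing_tools is a hypothetical helper name, and it only checks that each command exists, not which version is installed.

```shell
# Hypothetical helper (not part of parquet-tools): print each command
# from the argument list that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Check the build prerequisites; an empty result means all were found.
missing=$(missing_tools mvn git java)
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
fi
```

If nothing is printed, the three commands were found and the build steps above should be able to start.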
Management
With parquet-tools
The commands below run locally with java -jar. To run them on Hadoop, use hadoop jar instead of java -jar.
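The choice between the two launchers can be sketched as a small shell wrapper. The function names parquet_tools_launcher and parquet_tools, and the variables PARQUET_TOOLS_JAR and USE_HADOOP, are assumptions made for this illustration; none of them is defined by parquet-tools itself.

```shell
# Hypothetical wrapper; every name below is illustrative only.
PARQUET_TOOLS_JAR="${PARQUET_TOOLS_JAR:-parquet-tools-X.X.X.jar}"

# Print the launcher to use: "hadoop jar" when USE_HADOOP=1, else "java -jar".
parquet_tools_launcher() {
  if [ "${USE_HADOOP:-0}" = "1" ]; then
    echo "hadoop jar"
  else
    echo "java -jar"
  fi
}

# Run a parquet-tools command (schema, cat, head, meta) with that launcher.
parquet_tools() {
  # Left unquoted on purpose: the launcher is two words.
  $(parquet_tools_launcher) "$PARQUET_TOOLS_JAR" "$@"
}
```

With this in place, parquet_tools schema sample.parquet runs the jar locally, while USE_HADOOP=1 parquet_tools schema sample.parquet goes through hadoop jar.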
Get schema
java -jar parquet-tools-X.X.X.jar schema sample.parquet
Read
- The whole file
java -jar parquet-tools-X.X.X.jar cat sample.parquet
- The first few records (here, 5)
java -jar parquet-tools-X.X.X.jar head -n5 sample.parquet
Get metadata
java -jar parquet-tools-X.X.X.jar meta sample.parquet