Parquet Tool

1 - About

parquet-tools is a command-line tool to inspect Parquet files: it prints their schema, their records and their metadata.

3 - Installation

3.1 - Build

Source code: https://github.com/apache/parquet-mr/tree/master/parquet-tools

  • Prerequisites: Maven 3, Git, JDK 7 or 8
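Before cloning, you can confirm that the prerequisites are installed by checking that each command is on the PATH. This is a minimal sketch for a POSIX shell. check_prereqs is a hypothetical helper name, and it only checks that the commands exist, not their versions (mvn -v and java -version print those):

```shell
# Hypothetical helper: reports which build prerequisites are missing from PATH.
# It checks presence only; the versions (Maven 3, JDK 7/8) must be checked by hand.
check_prereqs() {
  missing=0
  for tool in git mvn java; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # 0 when every tool was found, 1 otherwise
  return "$missing"
}
```

Run check_prereqs in the shell where you will build; it prints one "missing: ..." line per absent tool and returns a non-zero status if any is missing.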

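Clone the repository, then package the tool with the local profile (-Plocal) so that the Hadoop client dependency is included in the jar and it can run standalone with java -jar:

git clone https://github.com/Parquet/parquet-mr.git
cd parquet-mr/parquet-tools/
mvn clean package -Plocal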

4 - Management

The examples below run parquet-tools with java -jar; replace X.X.X with the version of the jar you built.

To run it on Hadoop, use hadoop jar instead of java -jar.
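A minimal sketch of switching between the two launch modes in a POSIX shell. The function name parquet_tools and the variables PARQUET_TOOLS_JAR and USE_HADOOP are hypothetical, chosen for illustration; they are not part of parquet-tools:

```shell
# Hypothetical wrapper: PARQUET_TOOLS_JAR and USE_HADOOP are illustrative names.
parquet_tools() {
  jar="${PARQUET_TOOLS_JAR:-parquet-tools-X.X.X.jar}"
  if [ "${USE_HADOOP:-0}" = "1" ]; then
    # Launch through Hadoop, which adds the Hadoop classpath and configuration
    hadoop jar "$jar" "$@"
  else
    # Launch in a plain local JVM
    java -jar "$jar" "$@"
  fi
}
```

For example, parquet_tools schema sample.parquet runs java -jar, while USE_HADOOP=1 parquet_tools schema sample.parquet runs the same command through hadoop jar.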

4.1 - Get schema


java -jar parquet-tools-X.X.X.jar schema sample.parquet 

4.2 - Read

  • The whole file

java -jar parquet-tools-X.X.X.jar cat sample.parquet 

  • The first few records (here, the first 5)

java -jar parquet-tools-X.X.X.jar head -n5 sample.parquet 

4.3 - Get metadata


java -jar parquet-tools-X.X.X.jar meta sample.parquet
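The metadata that meta prints is stored in the file footer. Per the Parquet file format, a file starts and ends with the 4-byte magic "PAR1", and the 4 bytes just before the trailing magic hold the footer length as an unsigned 32-bit little-endian integer. The sketch below reads that length with standard shell tools; parquet_footer_length is a hypothetical helper name, and it does not decode the footer itself:

```shell
# Minimal sketch based on the Parquet file layout:
#   "PAR1" | column data ... | footer (FileMetaData) | 4-byte footer length | "PAR1"
# parquet_footer_length is a hypothetical helper name.
parquet_footer_length() {
  f="$1"
  # Smallest possible layout: 4 (magic) + 4 (length) + 4 (magic) bytes
  size=$(( $(wc -c < "$f") ))
  [ "$size" -ge 12 ] || return 1
  # Both the leading and the trailing magic must be "PAR1"
  [ "$(head -c 4 "$f")" = "PAR1" ] || return 1
  [ "$(tail -c 4 "$f")" = "PAR1" ] || return 1
  # Read the 4 length bytes one by one and combine them little-endian,
  # so the result does not depend on the host byte order
  set -- $(tail -c 8 "$f" | head -c 4 | od -An -tu1)
  echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}
```

For example, parquet_footer_length sample.parquet prints the size in bytes of the footer that the meta command decodes, and returns a non-zero status if the file does not have the Parquet magic bytes.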

5 - Documentation / Reference
