Tool
GridMix
GridMix is a benchmark for Hadoop clusters that runs a MapReduce workload. It works from a MapReduce job trace describing that workload. Such traces are typically generated by Rumen, which mines JobHistory logs to extract meaningful data and stores it in an easily parsed, condensed format (a digest).
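The idea of mining history logs into a condensed digest can be illustrated with a minimal Python sketch. It is only an illustration of the concept: the event records, field names, and the `build_digest` helper are hypothetical and do not reflect Rumen's actual trace schema or the real JobHistory log format.

```python
import json

# Hypothetical, simplified job-history events. Real JobHistory logs and
# Rumen's actual trace schema are richer and use different field names.
HISTORY_EVENTS = [
    {"job": "job_001", "event": "SUBMITTED", "time": 1000},
    {"job": "job_001", "event": "MAP_FINISHED", "time": 1400},
    {"job": "job_001", "event": "MAP_FINISHED", "time": 1500},
    {"job": "job_001", "event": "REDUCE_FINISHED", "time": 1900},
    {"job": "job_001", "event": "FINISHED", "time": 2000},
    {"job": "job_002", "event": "SUBMITTED", "time": 1200},
    {"job": "job_002", "event": "MAP_FINISHED", "time": 1600},
    {"job": "job_002", "event": "FINISHED", "time": 1700},
]


def build_digest(events):
    """Condense per-event history into one summary record per job."""
    jobs = {}
    for e in events:
        job = jobs.setdefault(
            e["job"],
            {"job": e["job"], "submit": None, "finish": None,
             "maps": 0, "reduces": 0},
        )
        if e["event"] == "SUBMITTED":
            job["submit"] = e["time"]
        elif e["event"] == "FINISHED":
            job["finish"] = e["time"]
        elif e["event"] == "MAP_FINISHED":
            job["maps"] += 1
        elif e["event"] == "REDUCE_FINISHED":
            job["reduces"] += 1

    # Add a runtime only when both endpoints were seen in the log.
    for job in jobs.values():
        if job["submit"] is not None and job["finish"] is not None:
            job["runtime"] = job["finish"] - job["submit"]

    # Order jobs by submission time; jobs without one sort last.
    return sorted(
        jobs.values(),
        key=lambda j: (j["submit"] is None, j["submit"] or 0),
    )


if __name__ == "__main__":
    print(json.dumps(build_digest(HISTORY_EVENTS), indent=2))
```

A benchmark driven by such a digest only needs the per-job summary (submission order, task counts, runtime) rather than the full event log, which is the motivation for a condensed trace.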
YARN Scheduler Load Simulator (SLS)
Distributed System Testing
- Jepsen, an open source software library for systems testing: https://jepsen.io/
Hive
See Hive - Benchmark