MapReduce workload.
GridMix is a benchmark for Hadoop clusters. It takes as input a MapReduce job trace that describes the workload and submits a mix of synthetic jobs modeled on that trace. Such traces are typically generated by Rumen, which mines JobHistory logs, extracts the meaningful data, and stores it in an easily parsed, condensed format (a digest).
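To get a feel for what such a digest holds, the sketch below reads a trace and prints one summary row per job. It is an illustration under assumptions, not a description of Rumen's exact schema: it assumes the trace is a stream of concatenated JSON objects, one per job, and that each job carries the fields `jobID`, `submitTime`, `finishTime`, `mapTasks` and `reduceTasks`. Check these names against a real trace before relying on them. The helper names `iter_trace_jobs` and `summarize` are hypothetical.

```python
import json


def iter_trace_jobs(text):
    """Yield each top-level JSON object from a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while pos < end:
        # Skip whitespace between objects; raw_decode does not do this itself.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        job, pos = decoder.raw_decode(text, pos)
        yield job


def summarize(jobs):
    """Return one summary dict per job (field names are assumptions)."""
    rows = []
    for job in jobs:
        submit = job.get("submitTime", 0)
        finish = job.get("finishTime", 0)
        rows.append({
            "job": job.get("jobID"),
            "maps": len(job.get("mapTasks", [])),
            "reduces": len(job.get("reduceTasks", [])),
            "duration_ms": finish - submit,
        })
    return rows


if __name__ == "__main__":
    # Tiny hand-written sample standing in for a real trace file.
    sample = (
        '{"jobID": "job_1", "submitTime": 1000, "finishTime": 4000, '
        '"mapTasks": [{}, {}], "reduceTasks": [{}]}\n'
        '{"jobID": "job_2", "submitTime": 2000, "finishTime": 2500, '
        '"mapTasks": [], "reduceTasks": []}\n'
    )
    for row in summarize(iter_trace_jobs(sample)):
        print(row)
```

For a real trace, read the file contents into a string and pass it to `iter_trace_jobs`. A very large trace would call for a streaming parser instead of loading everything into memory.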
See Yarn - Scheduler Load Simulator (SLS)
See Hive - Benchmark