db:spark:cluster:cache

Table of Contents

About

Data - Cache in Spark.

Each executor has a cache.

Articles Related

from the Spark - Web UI (Driver UI)

Example

lines is recomputed

lines = sc.textFile("...", 4)
comments = lines.filter(isComment) 
print lines.count()
print comments.count() # lines is recomputed

lines is NOT recomputed but get from the cache

lines = sc.textFile("...", 4)
lines.cache() # save, lines is NOT recomputed when comments.count() is called
comments = lines.filter(isComment) 
print lines.count()
print comments.count()

Documentation / Reference

http://spark.rstudio.com/guides/caching/