HTTP - Range Request


A range request lets an HTTP client ask the server for a portion of a file instead of the entire file. Because a file can be read in several portions, those portions can also be requested in parallel.


Example with a GET method

Range: bytes=0-10000
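
As a minimal sketch, the header above can be attached to a GET request with Python's standard library. The unit in the header is `bytes` (plural), and both ends of the range are inclusive, so `0-10000` asks for 10001 bytes. The URL and the helper name `build_range_request` are placeholders for illustration, and the request is only built here, not sent.

```python
import urllib.request


def build_range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a GET request for bytes start..end (both ends inclusive)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return urllib.request.Request(
        url,
        headers={"Range": f"bytes={start}-{end}"},
        method="GET",
    )


# Hypothetical URL, used only for illustration.
request = build_range_request("https://example.com/data.bin", 0, 10000)
print(request.get_method())         # GET
print(request.get_header("Range"))  # bytes=0-10000
```

Passing the request to `urllib.request.urlopen` would send it. A server that honors the range answers with status 206 (Partial Content) and a `Content-Range` header; a server that ignores it answers with 200 and the full body, so the status is worth checking before treating the body as a slice.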


Amazon EMR uses this technique to read data from Amazon S3. For example, if a single data file on Amazon S3 is about 1 GB and the Amazon S3 split size is 64 MB, Hadoop reads the file by issuing about 16 HTTP requests in parallel (1 GB / 64 MB = 16).
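
The split arithmetic can be sketched as follows. This is an illustration of how a file size and a split size map to byte ranges, not the code EMR or Hadoop actually runs; the function name `split_ranges` is hypothetical, and the sizes assume binary units (1 GiB and 64 MiB).

```python
def split_ranges(file_size: int, split_size: int) -> list[tuple[int, int]]:
    """Return inclusive (start, end) byte ranges that cover the whole file."""
    if file_size < 0 or split_size <= 0:
        raise ValueError("file_size must be >= 0 and split_size must be > 0")
    ranges = []
    for start in range(0, file_size, split_size):
        # The last range may be shorter than split_size.
        end = min(start + split_size, file_size) - 1
        ranges.append((start, end))
    return ranges


MIB = 1024 * 1024
ranges = split_ranges(1024 * MIB, 64 * MIB)
headers = [f"bytes={start}-{end}" for start, end in ranges]

print(len(headers))  # 16
print(headers[0])    # bytes=0-67108863
print(headers[-1])   # bytes=1006632960-1073741823
```

Each header can then go on its own GET request, and the requests can be issued concurrently, for example from a thread pool, with the parts written back at their `start` offsets.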

