Consumer Analytics - Event Collector
1 - About
A collector collects the events sent by a tracker.
The events and the data sent are described in a measurement protocol.
2 - Articles Related
3 - Aggregation
Data aggregation refers to techniques for gathering individual data records (for example log records) and combining them into a large bundle of data files.
Why aggregation? Before processing your data, Hadoop splits the files into multiple chunks, and a single map task processes each chunk. If HDFS is the underlying data storage, the HDFS framework has already separated the data files into multiple blocks, and Hadoop assigns a single map task to each HDFS block. Many small files therefore mean many short-lived map tasks, each with its own startup overhead, so aggregating records into larger files makes processing more efficient.
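As a rough illustration of the idea, the sketch below concatenates small record files into larger bundles before they are handed to Hadoop. The function name `aggregate_files`, the bundle naming scheme and the 128 MB default (a common HDFS block size) are assumptions made for this example, not something prescribed by the source.

```python
from pathlib import Path
from typing import Iterable, List


def aggregate_files(sources: Iterable[Path], out_dir: Path,
                    target_bytes: int = 128 * 1024 * 1024) -> List[Path]:
    """Concatenate small record files into bundles of roughly target_bytes."""
    out_dir.mkdir(parents=True, exist_ok=True)
    bundles: List[Path] = []
    current = None   # file handle of the bundle being written
    written = 0      # bytes written to the current bundle so far

    for src in sorted(sources):
        data = src.read_bytes()
        if data and not data.endswith(b"\n"):
            data += b"\n"   # keep one record per line across file boundaries
        # Start a new bundle when adding this file would exceed the target size.
        if current is None or (written > 0 and written + len(data) > target_bytes):
            if current is not None:
                current.close()
            path = out_dir / f"bundle-{len(bundles):05d}.log"   # hypothetical naming
            bundles.append(path)
            current = path.open("wb")
            written = 0
        current.write(data)
        written += len(data)

    if current is not None:
        current.close()
    return bundles
```

A source file is never split across bundles here, so a bundle can exceed the target only when a single source file is already larger than it.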
4 - Protocol
A collector would implement these two HTTP methods.
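On the collector side, handling both methods could look like the minimal sketch below, built on Python's standard `http.server`. The names `parse_payload`, `CollectHandler` and `serve`, the 204 reply and the `print` call standing in for storage are all assumptions made for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit


def parse_payload(method: str, target: str, body: bytes = b"") -> dict:
    """Return the event parameters carried by a GET or POST hit."""
    if method == "GET":
        raw = urlsplit(target).query    # payload travels in the query string
    elif method == "POST":
        raw = body.decode("utf-8")      # payload travels in the request body
    else:
        raise ValueError(f"unsupported method: {method}")
    # parse_qs returns a list per key; keep the first value of each parameter.
    return {key: values[0] for key, values in parse_qs(raw).items()}


class CollectHandler(BaseHTTPRequestHandler):
    def _store(self, event: dict) -> None:
        # Placeholder: a real collector would write to a log or a queue.
        print(self.headers.get("User-Agent"), event)

    def _reply(self, status: int) -> None:
        self.send_response(status)
        self.end_headers()

    def do_GET(self) -> None:
        if urlsplit(self.path).path != "/collect":
            return self._reply(404)
        self._store(parse_payload("GET", self.path))
        self._reply(204)

    def do_POST(self) -> None:
        if urlsplit(self.path).path != "/collect":
            return self._reply(404)
        length = int(self.headers.get("Content-Length", 0))
        self._store(parse_payload("POST", self.path, self.rfile.read(length)))
        self._reply(204)


def serve(port: int = 8080) -> None:
    """Run the sketch collector on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CollectHandler).serve_forever()
```

Calling `serve()` starts the collector locally. Keeping the parsing in a plain function makes it testable without opening a socket.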
4.1 - Post
POST /collect HTTP/1.1
Host: www.example.com
User-Agent: user_agent_string

payload_data
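A tracker could build such a POST request as sketched below with Python's standard `urllib`. The helper name `build_post_request` is hypothetical, and the form-encoded body is an assumption about how `payload_data` is serialized.

```python
from urllib.parse import urlencode
from urllib.request import Request

COLLECT_URL = "https://www.example.com/collect"   # placeholder endpoint from the text


def build_post_request(payload: dict, user_agent: str) -> Request:
    """Build (but do not send) a POST hit with a form-encoded body."""
    body = urlencode(payload).encode("utf-8")   # assumed payload encoding
    return Request(
        COLLECT_URL,
        data=body,
        headers={"User-Agent": user_agent},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(build_post_request(...))`. Building it separately from sending keeps the payload easy to inspect.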
4.2 - Get
GET /collect?payload_data HTTP/1.1
Host: www.example.com
User-Agent: user_agent_string
A Get request can be triggered:
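Example from Google Analytics with the z parameter, a random number used as a cache buster so that browsers and proxies do not serve the GET request from cache: https://www.example.com/collect?payload_data&z=123456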
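Building such a GET URL could be sketched as follows. The helper name `build_get_url` and the range of the random number are assumptions for this example. Placing `z` last follows the Google Analytics convention.

```python
import random
from urllib.parse import urlencode


def build_get_url(payload: dict,
                  base_url: str = "https://www.example.com/collect") -> str:
    """Return a GET hit URL with a random z parameter appended last."""
    params = dict(payload)
    params.pop("z", None)   # drop any caller-supplied z so the new one ends up last
    # A different number on every hit makes each URL unique, which defeats caches.
    params["z"] = str(random.randrange(10**9))
    return f"{base_url}?{urlencode(params)}"
```

For example, `build_get_url({"v": "1", "t": "pageview"})` yields a URL of the form `https://www.example.com/collect?v=1&t=pageview&z=<random digits>`.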
4.3 - Error handling
If you do not get a 2xx status code, you should NOT retry the request. Instead, you should stop and correct any errors in your HTTP request.
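The rule above can be sketched on the tracker side as a send-once helper that raises on any non-2xx answer. The names `HitRejected`, `check_status` and `send_once` are hypothetical. The sketch relies on `urllib.request.urlopen` raising `HTTPError` for 4xx and 5xx answers.

```python
import urllib.error
import urllib.request


class HitRejected(Exception):
    """Raised when the collector answers with a non-2xx status."""


def check_status(status: int) -> None:
    """Raise HitRejected unless status is in the 2xx range."""
    if not 200 <= status < 300:
        raise HitRejected(
            f"collector returned {status}; fix the request, do not retry"
        )


def send_once(request: urllib.request.Request, timeout: float = 5.0) -> None:
    """Send a hit exactly once; there is deliberately no retry loop."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            check_status(response.status)
    except urllib.error.HTTPError as err:   # raised by urlopen for 4xx and 5xx answers
        check_status(err.code)
```

The caller is expected to log the exception and correct the request rather than resend it.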
5 - List
6 - Documentation / Reference
- AWS_Amazon_EMR_Best_Practices.pdf