Consumer Analytics - Event Collector

About

A collector receives the events sent by a tracker.

The events and the data they carry are described in a measurement protocol.

Aggregation

Data aggregation refers to techniques for gathering individual data records (for example log records) and combining them into a large bundle of data files.

Why aggregate? Before processing your data, Hadoop splits the input files into multiple chunks, and a single map task processes each chunk. If you use HDFS as the underlying data storage, HDFS has already divided the data files into blocks, and Hadoop assigns one map task to each HDFS block. When the data is fragmented into many small files, each file occupies at least one block and therefore gets its own map task, so the per-task overhead dominates the job. Aggregating records into larger files keeps the number of map tasks proportional to the data volume.
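As an illustration of the aggregation step, here is a minimal Python sketch that groups individual records into larger bundles before they are written to storage. The function name bundle_records and the 128 MB target are assumptions made for this example. In practice the target would be chosen to match the block size of the underlying storage.

```python
from typing import Iterable, Iterator, List

# Hypothetical target size, assumed here to be close to a typical HDFS block size.
TARGET_BUNDLE_BYTES = 128 * 1024 * 1024


def bundle_records(
    records: Iterable[bytes], target_bytes: int = TARGET_BUNDLE_BYTES
) -> Iterator[bytes]:
    """Group records into newline-delimited bundles of at most roughly target_bytes."""
    buffer: List[bytes] = []
    size = 0
    for record in records:
        line = record.rstrip(b"\n") + b"\n"
        # Flush the current bundle before it would exceed the target size.
        if buffer and size + len(line) > target_bytes:
            yield b"".join(buffer)
            buffer, size = [], 0
        buffer.append(line)
        size += len(line)
    # Emit whatever is left as the final (possibly smaller) bundle.
    if buffer:
        yield b"".join(buffer)
```

Each yielded bundle can then be written as one file, so that a few large files replace many small ones.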

Protocol

A collector typically implements these two HTTP methods:

Post

POST /collect HTTP/1.1
Host: www.example.com
User-Agent: user_agent_string

payload_data

Get

GET /collect?payload_data HTTP/1.1
Host: www.example.com
User-Agent: user_agent_string
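The two request forms above could be served by a collector along the following lines. This is only a sketch built on Python's standard http.server module. It assumes that the payload is a URL-encoded list of key=value pairs, and the names CollectHandler, start_collector and the in-memory EVENTS list are hypothetical. A real collector would write the events to durable storage.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qsl, urlsplit

EVENTS = []  # in-memory store, for illustration only


class CollectHandler(BaseHTTPRequestHandler):
    def _store(self, payload: str) -> None:
        # Assumption: the payload is URL-encoded key=value pairs.
        EVENTS.append({
            "user_agent": self.headers.get("User-Agent", ""),
            "params": dict(parse_qsl(payload, keep_blank_values=True)),
        })
        self.send_response(204)  # accepted, no response body
        self.end_headers()

    def do_GET(self):
        url = urlsplit(self.path)
        if url.path != "/collect":
            self.send_error(404)
            return
        self._store(url.query)  # the payload travels in the query string

    def do_POST(self):
        if urlsplit(self.path).path != "/collect":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        self._store(self.rfile.read(length).decode("utf-8"))  # payload in the body

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_collector(port: int = 0) -> HTTPServer:
    """Start the collector on localhost in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), CollectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_collector() returns the server object. Its server_address attribute gives the port actually bound, and shutdown() stops it.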

A Get request can be triggered:

To avoid being served a cached response to an HTTP GET request, a collector should provide a cache buster (i.e. a special parameter that is set to a random number, so that every request is unique and subsequent requests are not answered from a cache).

Example from Google Analytics with the z parameter: https://www.example.com/collect?payload_data&z=123456
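On the tracker side, a cache buster can be appended when the request URL is built. The sketch below is one possible way to do it in Python. The function name build_collect_url is hypothetical, and the parameter name defaults to z only to mirror the example above.

```python
import secrets
from urllib.parse import urlencode


def build_collect_url(base_url: str, payload: dict, buster_param: str = "z") -> str:
    """Return a GET URL whose last query parameter is a random cache buster."""
    params = dict(payload)
    # Drop any existing value so the fresh cache buster ends up last.
    params.pop(buster_param, None)
    params[buster_param] = str(secrets.randbelow(2**31))
    return f"{base_url}?{urlencode(params)}"
```

For example, build_collect_url("https://www.example.com/collect", {"t": "pageview"}) yields a URL of the form https://www.example.com/collect?t=pageview&z=<random number>.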

Error handling

If you do not get a 2xx status code, you should NOT retry the request. Instead, you should stop and correct any errors in your HTTP request.
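The no-retry rule can be expressed in a tracker as follows. This is a minimal sketch: the names send_hit and HitRejected are hypothetical, and the transport is passed in as a function (an assumption made to keep the example independent of any particular HTTP library).

```python
from typing import Callable


class HitRejected(Exception):
    """Raised when the collector answers with a non-2xx status code."""

    def __init__(self, status: int):
        super().__init__(f"collector returned HTTP {status}; fix the request instead of retrying")
        self.status = status


def send_hit(post: Callable[[str, bytes], int], url: str, payload: bytes) -> int:
    """Send one hit through post(url, payload), which must return the HTTP status code."""
    status = post(url, payload)
    # Deliberately no retry loop: a non-2xx answer means the request itself is wrong.
    if not 200 <= status < 300:
        raise HitRejected(status)
    return status
```

The caller is expected to log the exception and correct the payload or URL rather than resend the same request.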

Documentation / Reference

  • AWS_Amazon_EMR_Best_Practices.pdf




