About
A periodic process will process the raw data event to load them into a SQL database.
When data is loaded:
- the data is inserted and updated
- the schema is adjusted
Data are broken down into tables and views where:
- Tables contain duplicate data,
- Views do not.
Example:
- A table = type of event
- each tracked event has its own table (pageview, click button,…)
- each property maps to a column
- Partition = one day of data
Platform
Segment
Google Analytics
Google Analytics works with a one table format exported each day where:
- each row within a table corresponds to a session
- each hit data is saved in a record format (array)
See the full definition at BigQuery Export schema (Column Definition)
Extract of the Example that can be found in the guide
- Average number of transactions per user that made a purchase in July 2017
SELECT
(SUM (total_transactions_per_user) / COUNT(fullVisitorId) ) AS avg_total_transactions_per_user
FROM (
SELECT
fullVisitorId,
SUM (totals.transactions) AS total_transactions_per_user
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701'
AND '20170731'
AND totals.transactions IS NOT NULL
GROUP BY
fullVisitorId )
- Total transactions per device browser in July 2017
SELECT
device.browser,
SUM ( totals.transactions ) AS total_transactions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY
device.browser
ORDER BY
total_transactions DESC