Online Analytical Processing (OLAP) refers to:
- a class of application
- an approach
to quickly answer multi-dimensional analytical queries.
The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).
In the core of any OLAP system is a concept of an cube (also called a multidimensional cube or a hypercube). It consists of numeric facts called measures which are categorized by dimensions.
It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time for the same query on OLTP relational data. The most important mechanism in OLAP which allows it to achieve such performance is the use of aggregations (aggregate table). Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions. The number of possible aggregations is determined by every possible combination of dimension granularities.
Because usually there are many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is Algorithm - NP Complete (Nondeterministic Polynomial Time Complete). Many approaches to the problem have been explored, including greedy algorithms, randomized search, genetic algorithms and A* search algorithm.
A very effective way to support aggregation and other common OLAP operations is the use of bitmap indexes.
OLAP systems have been traditionally categorized using the following taxonomy.
MOLAP of Multidimensionnal (OLAP|Database)
MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP uses database structures that are generally optimal for attributes such as time period, location, product or account code. The way that each dimension will be aggregated is defined in advance by one or more hierarchies.
ROLAP of Relational OLAP
ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables and new tables are created to hold the aggregated information. Depends on a specialized schema design. A typical implementation : the cube is mapped to a set of base tables in a relational DBMS modeled as star or snowflake schema.
HOLAP of Hybrid OLAP
There is no clear agreement across the industry as to what constitutes “Hybrid OLAP”, except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.
|MOLAP||generally delivers better performance||Some MOLAP implementations are prone to database explosion|
|ROLAP||generally more scalable||query performance can suffer|
|HOLAP||mix the best of ROLAP and MOLAP (pre-process quickly, scale well and offer good function support)|
MOLAP Database Explosion
Database explosion is a phenomenon causing vast amounts of storage space to be used by MOLAP databases when certain common conditions are met: high number of dimensions, pre-calculated results and sparse multidimensional data. The typical mitigation technique for database explosion is not to materialize all the possible aggregation, but only the optimal subset of aggregations based on the desired performance vs. storage trade off.
Molap Beter Performance
MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes compression techniques.
Rolap more scalable
ROLAP is generally more scalable. However, large volume pre-processing is difficult to implement efficiently so it is frequently skipped. ROLAP query performance can therefore suffer. Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use.
Requirements for the MOLAP component
Requirements for the MOLAP component of a data warehouse or data mart strategy include:
- The ability to scale to large volumes of data and large numbers of concurrent users
- Consistent, fast query response times that allow for iterative speed-of-thought analysis
- Integrated metadata that seamlessly links the OLAP server and the data warehouse relational database
- The ability to automatically drill from summary and calculated data, which is managed by the OLAP server, to detail data stored in the data warehouse relational database
- A calculation engine that includes robust mathematical functions for computing derived data (aggregations, matrix calculations, cross-dimensional calculations, OLAP-aware formulas and procedural calculations)
- Seamless integration of historical, projected and derived data
- A multi-user read/write environment to support users what-if analysis, modeling and planning requirements
- The ability to be deployed quickly, adopted easily and maintained cost-effectively
- Robust data-access security and user management
- Availability of a wide variety of viewing and analysis tools to support different user communities
Neither personal-productivity tools nor query and reporting tools are designed to meet these requirements by themselves. ROLAP (relational OLAP) tools meet only one or two of these requirements.