Parallel Programming - Architecture (Shared nothing, Shared disk, Shared Memory)

Data System Architecture

About

Traditionally, two approaches have been used for the implementation of parallel execution in database systems.

The main differentiation is whether or not the physical data layout is used as a base – and static pre-requisite – for dividing, thus parallelizing, the work.

These fundamental approaches are known as:

  • shared everything architecture

Shared Everything

Shared Nothing

Taxonomy Of Parallel Architecture

Shared Nothing

Shared Nothing the machine are only connected by the network. The “shared nothing” architecture shares network bandwidth because it must transfer data from machines doing the Map tasks to machine doing the Reduce tasks over the network.

In a shared nothing system CPU cores are solely responsible for individual data sets and the only way to access a specific piece of data you have to use the CPU core that owns this subset of data, such systems are are also commonly known as MPP (Massively Parallel Processing) systems. In order to achieve a good workload distribution MPP systems have to use a hash algorithm to distribute (partition) data evenly across available CPU cores.

As a result MPP systems introduce mandatory, fixed parallelism in their systems in order to perform operations that involve table scans, the fixed parallelism completely relies on a fixed static data partitioning at database or object creation time. Most non-Oracle data warehouse systems are MPP systems.

A “shared nothing” architecture means that each computer system has its own private memory and private disk.

This architecture is followed by essentially all high performance, scalable, DBMSs, including Teradata, Netezza, Greenplum, Paraccel, DB2 and Vertica. It is also used by most of the high-end e-commerce platforms, inclusing Amazon, Akamai, Yahoo, Google, and Facebook.

Shared everything

Shared everything is also known as shared memory or shared everything.

Oracle RAC does not run on a shared nothing system. It was built many years ago to run on a “shared disk” architecture.

In this world, a computer system has private memory but shares a disk system with other computer systems. Such a “disk cluster” was popularized in the 1990’s by Sun and HP, among others In the 2000’s this architecture has been replaced by “grid computing”, which uses shared nothing.

Shared disk has well-known scalability problems, when applied to DBMSs. These result from the requirement that each system have its own lock table and buffer pool, which must be synchronized with their peers. This synchronization is painful and has serious performance problems, which limit the scalability of shared disk implementations. In fact, the largest Oracle RAC configuration I am aware if is 6 systems. In contrast shared nothing DBMSs, such as Vertica, are routinely run on 50-100 nodes.

But thanks to the shared everything architecture the Database does not require any pre-defined data partitioning to enable parallelism. The database can parallelize almost every operation, independent of the underlying data distribution. If, however, the data has been pre-partitioned (using Partitioning), the database can use the same optimizations and algorithms shared nothing vendors claim.

Oracle Database uses a shared everything architecture, which from a storage perspective means that any CPU core in a configuration can access any piece of data; this is the most fundamental architectural difference between Oracle and all other major database products on the market.

Summary

??? In summary, shared nothing scales much better than shared disk. Hence, it is the superior architecture ???

Reference





Discover More
Data System Architecture
Concurrency - Parallel Execution

Where a sequential execution can be imagined as a single worker on an assembly line moving between tasks, a parallel process is more like a series of workers, each doing a specific task. Parallelism...
Data System Architecture
Data Warehouse

A data warehouse is a large central data repository of current, history and summarised data coming from operational and external sources used primarily for analysis. s is large historical databases for...
Oracle Database Hash Redistribution
Oracle Database - Data Redistribution (Parallel)

Data redistribution is not unique to the Oracle Database. In fact, this is one of the most fundamental principles of parallel processing, being used by every product that provides parallel capabilities.Parallel...
Oracle Database Block Based Granule
Oracle Database - Granule (Parallel Data Access)

A granule is the smallest unit of work when accessing data. Because Oracle database uses a shared everything architecture (unlike the shared nothing architecture), Oracle can – and will - choose this...
Card Puncher Data Processing
Oracle Database - Parallel execution with Oracle Partitioning (parallel partition-wise join)

There are specific optimizations between SQL parallel execution and Oracle Partitioning that you should bear in mind when you plan to use these functionalities together. For example, two large partitioned...
Card Puncher Data Processing
Oracle Partition - Partition-Wise Join (PWJ)

The most fundamental parallel execution optimization is a partition-wise join. If two rather large tables are often joined together in SQL statements, consider the potential benefits of partition-wise...
Card Puncher Data Processing
Software Design - Scalability (Scale Up|Out)

Increasing or decreasing the capacity of a system by making effective use of resources is known as scalability. A scalable system can handle increasing numbers of requests without adversely affecting response...



Share this page:
Follow us:
Task Runner