What is a NoSQL Database?

About

NoSQL databases are distributed databases that store and query data in a single table.

A NoSQL database may have the following meanings:

Main Characteristics

No Join for sharding

NoSQL means no join: no query joins data between two tables.

If a join happens, it happens in the application code, not in the database.
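
As an illustration, an application-side join can be sketched as a loop of key lookups. This is a minimal sketch, not any particular product's API: the `users` and `orders` collections and the `orders_with_user_names` helper are hypothetical, and plain Python dictionaries stand in for the data store.

```python
# Hypothetical data: two collections that a relational database would join.
users = {
    1: {"name": "Ada"},
    2: {"name": "Linus"},
}
orders = [
    {"order_id": 100, "user_id": 1, "total": 25.0},
    {"order_id": 101, "user_id": 2, "total": 40.0},
    {"order_id": 102, "user_id": 1, "total": 15.5},
]


def orders_with_user_names(orders, users):
    """Join orders to users in application code, one key lookup per order."""
    joined = []
    for order in orders:
        # A simple key lookup: the store itself never performs a join.
        user = users.get(order["user_id"])
        joined.append({**order, "user_name": user["name"] if user else None})
    return joined


result = orders_with_user_names(orders, users)
```

Each lookup touches a single key, which is what keeps the individual queries simple enough to route to one node.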

This approach makes it easy to shard a table on one or more keys and therefore to route a simple query to a single node (a single computer).
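
The routing idea can be sketched with a stable hash of the shard key. This is a simplified sketch under stated assumptions: the node names, `node_for_key` and `ShardedStore` are hypothetical, each node is simulated by an in-memory dictionary, and simple modulo placement is used (real systems often prefer schemes such as consistent hashing so that fewer keys move when nodes are added or removed).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for_key(key, nodes=NODES):
    """Map a shard key to exactly one node using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


class ShardedStore:
    """Toy sharded store: every key lives on a single simulated node."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.shards = {node: {} for node in self.nodes}

    def put(self, key, value):
        self.shards[node_for_key(key, self.nodes)][key] = value

    def get(self, key):
        # The query is redirected to one node only; no other node is contacted.
        return self.shards[node_for_key(key, self.nodes)].get(key)


store = ShardedStore()
store.put("user:42", {"name": "Ada"})
value = store.get("user:42")
```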

Availability above consistency

In terms of the CAP theorem (Consistency, Availability, Partition tolerance), NoSQL databases prioritize availability (A) above consistency (C).

In an always-on world where downtime is unacceptable, a NoSQL application chooses to sacrifice consistency instead of availability.

They choose AP from CAP.

Developers then need to enforce consistency in their own code.
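
One way application code might do this is to read from several replicas and keep the newest copy. The sketch below assumes each write carries a version number (the `Versioned` type, the `read_latest` helper and the replica dictionaries are all hypothetical). It resolves conflicts with a "highest version wins" rule, which is only one of several possible strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    version: int  # hypothetical, monotonically increasing write counter


def read_latest(replicas, key):
    """Read a key from every replica and return the highest-versioned copy."""
    copies = [replica[key] for replica in replicas if key in replica]
    if not copies:
        return None
    latest = max(copies, key=lambda copy: copy.version)
    # Read repair: push the newest copy back to stale or empty replicas.
    for replica in replicas:
        if key not in replica or replica[key].version < latest.version:
            replica[key] = latest
    return latest


replica_1 = {"cart:7": Versioned("2 items", 3)}
replica_2 = {"cart:7": Versioned("1 item", 2)}  # stale copy
replica_3 = {}                                   # has not seen the key yet
replicas = [replica_1, replica_2, replica_3]

latest = read_latest(replicas, "cart:7")
```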

NoSQL vs Consistent system

  • Consistent system:

Everyone MUST see the same thing, either old or new, no matter how long it takes.

  • NoSQL:

For large applications, we can’t afford to wait that long, and maybe it doesn’t matter anyway.

Database Type

The NoSQL field actually has four different types of databases:

Features

Pro

  • Scalability: “It's hard to scale out my RDBMS in a distributed environment”
  • Ability to scale the throughput of “simple operations” (key lookups, reads/writes of one or a few records) horizontally over many servers
  • high-throughput reads and writes
  • Flexibility: “My data doesn’t conform to a rigid schema”
  • The ability to replicate and partition data over many servers (sharding)
  • Efficient use of distributed indexes and RAM for data storage

Cons

  • No ACID guarantees: a poor fit for mission-critical data where consistency is key.
  • No high-level query language: a simple API, or a limited SQL (without joins)
  • A concurrency model, BASE (Basically Available, Soft state, Eventually consistent), that is weaker than ACID

I guess what I'm saying is that my decision to use NoSQL, and I'm guessing others' decisions to do so, has less to do with the fact that we can't squeeze a few thousand writes a second out of MySQL and more to do with management and cost overhead. NoSQL solutions allow us to serve absurd amounts of data for a really, really low price.

Major Impact Systems (Rick Cattell)

  • memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
  • Dynamo pioneered the idea of [using] eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.
  • BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.
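
The eventual-consistency behavior described for Dynamo above can be illustrated with a toy simulation. This is not Dynamo's actual protocol: the `EventuallyConsistentStore` class and its methods are hypothetical, nodes are in-memory dictionaries, and conflict resolution between concurrent writes is ignored. It only shows that a read may return stale data until pending updates have been propagated.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: a write lands on one node and reaches the others later."""

    def __init__(self, node_count=3):
        self.nodes = [{} for _ in range(node_count)]
        self.pending = deque()  # queued (target_node_index, key, value) updates

    def write(self, node_index, key, value):
        # The write is acknowledged after reaching a single node...
        self.nodes[node_index][key] = value
        # ...and is queued for asynchronous delivery to every other node.
        for i in range(len(self.nodes)):
            if i != node_index:
                self.pending.append((i, key, value))

    def read(self, node_index, key):
        return self.nodes[node_index].get(key)

    def propagate(self):
        """Deliver all queued updates, in the order they were written."""
        while self.pending:
            i, key, value = self.pending.popleft()
            self.nodes[i][key] = value


store = EventuallyConsistentStore()
store.write(0, "k", "v1")
stale = store.read(1, "k")   # node 1 has not received the update yet
store.propagate()
fresh = store.read(1, "k")   # after propagation, node 1 sees the write
```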

List

Terminology

  • Document = nested values, extensible records (think XML or JSON)
  • Extensible record = families of attributes have a schema, but new attributes may be added
  • Key - Value object = a set of key - value pairs. No schema, no exposed nesting
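
To make the three terms concrete, here is a rough sketch of how each shape might look as plain Python data. The field names and values are invented for illustration only, and real stores differ in how they represent and enforce these shapes.

```python
# Key-value object: a flat set of pairs, no schema, no nesting exposed to the store.
key_value_object = {
    "user:1:name": "Ada",
    "user:1:city": "London",
}

# Extensible record: the attribute families ("profile", "contact") form the schema,
# but new attributes may be added inside a family at any time.
extensible_record = {
    "profile": {"name": "Ada"},
    "contact": {"email": "ada@example.com"},
}
extensible_record["contact"]["phone"] = "555-0100"  # new attribute, same family

# Document: arbitrarily nested values, comparable to XML or JSON.
document = {
    "name": "Ada",
    "addresses": [
        {"city": "London", "tags": ["home"]},
    ],
}
```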
