About
NoSQL databases are distributed databases that store and query data one table at a time: a query never combines data from several tables.
A NoSQL database may have the following meanings:
- NoJoin (no joins between tables)
- NoSchema (i.e. schema on read)
- NoTransactions (i.e. no ACID transactions)
- NoLanguage (no SQL query language by default)
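Here is a minimal Python sketch of "schema on read". The raw rows, the field names and the `read_user` helper are hypothetical. The point is only that the writer stores whatever it has, and the reader decides at query time which fields exist and what their defaults are.

```python
import json
from typing import Any, Dict

# Hypothetical raw store: documents are saved as-is, with no schema check on write.
raw_rows = [
    '{"id": 1, "name": "Ada", "email": "ada@example.com"}',
    '{"id": 2, "name": "Linus"}',               # no email field at all
    '{"id": 3, "name": "Grace", "age": "85"}',  # extra field the reader ignores
]

def read_user(raw: str) -> Dict[str, Any]:
    """Apply the schema at read time: pick the needed fields, default missing ones."""
    doc = json.loads(raw)
    return {
        "id": int(doc["id"]),
        "name": str(doc.get("name", "")),
        "email": doc.get("email"),  # None when the writer never set it
    }

users = [read_user(r) for r in raw_rows]
for u in users:
    print(u)
```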
Main Characteristics
No Join for sharding
NoSQL means no join: no query joins data from two tables.
If a join is needed, it happens in the application code, not in the database.
This approach makes it easy to shard a table by a key. A simple query can therefore be routed to a single node (a single computer).
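The following minimal sketch shows both ideas. It assumes a toy cluster of four in-memory "nodes" and simple modulo hashing on the key. Real systems often use consistent hashing or range partitioning instead. `node_for`, `put` and `get` are hypothetical helpers, not any product's API.

```python
import zlib
from typing import Dict, List, Optional, Tuple

NUM_NODES = 4  # hypothetical cluster size

# One dict per node stands in for one computer; entries are keyed by (table, key).
nodes: List[Dict[Tuple[str, str], dict]] = [{} for _ in range(NUM_NODES)]

def node_for(key: str) -> int:
    # crc32 gives the same result in every process (Python's built-in hash() is salted).
    return zlib.crc32(key.encode("utf-8")) % NUM_NODES

def put(table: str, key: str, value: dict) -> None:
    nodes[node_for(key)][(table, key)] = value

def get(table: str, key: str) -> Optional[dict]:
    # A single-key lookup is routed to exactly one node.
    return nodes[node_for(key)].get((table, key))

put("customers", "c42", {"name": "Ada"})
put("orders", "o1", {"customer_id": "c42", "total": 30})

# Application-side "join": two single-key lookups instead of a database JOIN.
order = get("orders", "o1")
customer = get("customers", order["customer_id"])
print(order["total"], customer["name"])  # 30 Ada
```

The last two lines show the cost of the design. The application, not the database, decides which records to fetch and how to combine them.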
Availability above consistency
In terms of the CAP theorem (consistency, availability, partition tolerance), NoSQL databases prioritize availability (A) over consistency (C).
In an always-on world where downtime is unacceptable, a NoSQL application chooses to sacrifice consistency rather than availability.
Developers then need to enforce consistency in their code.
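One common way to enforce consistency in application code is optimistic concurrency. The application reads a record together with its version, then writes only if the version has not changed, and retries otherwise. The sketch below assumes a hypothetical in-memory `VersionedStore`. Real stores expose similar conditional writes under their own APIs (for example, DynamoDB condition expressions), but this code does not call any of them.

```python
from typing import Any, Dict, Tuple

class VersionedStore:
    """Hypothetical key-value store with a conditional put on a version number."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}

    def get(self, key: str) -> Tuple[int, Any]:
        # Missing keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def put_if_version(self, key: str, expected: int, value: Any) -> bool:
        current, _ = self._data.get(key, (0, None))
        if current != expected:
            return False  # someone else wrote first; the caller must retry
        self._data[key] = (current + 1, value)
        return True

def add_to_balance(store: VersionedStore, key: str, amount: int) -> None:
    # Read-modify-write loop: retry until our conditional write wins.
    while True:
        version, balance = store.get(key)
        if store.put_if_version(key, version, (balance or 0) + amount):
            return

store = VersionedStore()
add_to_balance(store, "acct:1", 100)
add_to_balance(store, "acct:1", -30)
print(store.get("acct:1"))  # (2, 70)

# A writer holding a stale version is rejected instead of silently overwriting.
print(store.put_if_version("acct:1", 1, 999))  # False
```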
NoSQL vs Consistent system
- Consistent system: everyone MUST see the same thing, either old or new, no matter how long it takes.
- NoSQL: for large applications, we can't afford to wait that long, and maybe it doesn't matter anyway.
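The trade-off can be simulated in a few lines of Python. This is a toy model, not how any particular database replicates. A write is acknowledged as soon as the primary has it, and replicas catch up only when a hypothetical `propagate()` step runs. A read from a replica can therefore return stale data in between.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple

class Replica:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

class EventuallyConsistentCluster:
    """Writes hit the primary immediately and reach replicas only when propagate() runs."""

    def __init__(self, n_replicas: int) -> None:
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending: Deque[Tuple[str, Any]] = deque()  # replication log not yet applied

    def write(self, key: str, value: Any) -> None:
        self.primary.data[key] = value
        self.pending.append((key, value))  # acknowledged before replicas have it

    def read_from_replica(self, i: int, key: str) -> Optional[Any]:
        return self.replicas[i].data.get(key)

    def propagate(self) -> None:
        # Apply the log in order to every replica.
        while self.pending:
            key, value = self.pending.popleft()
            for r in self.replicas:
                r.data[key] = value

cluster = EventuallyConsistentCluster(n_replicas=2)
cluster.write("profile:7", "v1")
print(cluster.read_from_replica(0, "profile:7"))  # None (stale: not replicated yet)
cluster.propagate()
print(cluster.read_from_replica(0, "profile:7"))  # v1
```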
Database Type
The NoSQL field actually has four different types of databases:
- key-value stores (Redis, Oracle NoSQL, Amazon DynamoDB, …)
- column-family stores (HBase, BigTable, Cassandra, …)
- document databases (MongoDB, Cloud Firestore, …)
- graph databases (Neo4j, …)
Features
Pro
- Scalability: “It's hard to scale out my RDBMS in a distributed environment”
- Ability to horizontally scale the throughput of “simple operations” (key lookups, reads/writes of one or a few records) over many servers
- high-throughput reads and writes
- Flexibility: “My data doesn’t conform to a rigid schema”
- The ability to replicate and partition data over many servers (sharding)
- Efficient use of distributed indexes and RAM for data storage
Cons
- No ACID transactions: not suited to mission-critical data where consistency is key.
- No high-level query language: only a simple API or a limited SQL (without joins).
I guess what I'm saying is that my decision to use NoSQL, and I'm guessing others' decisions to do so, has less to do with the fact that we can't squeeze a few thousand writes a second out of MySQL and more to do with management and cost overhead. NoSQL solutions allow us to serve absurd amounts of data for a really, really low price.
Major Impact Systems (Rick Cattell)
- memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
- Dynamo pioneered the idea of [using] eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.
- BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.
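The guarantee that updates are "propagated to all nodes eventually" can be illustrated with a small anti-entropy sketch in which nodes exchange their state and merge it. Note that this is a swapped-in technique. The sketch uses last-writer-wins timestamps, which is simpler than Dynamo's own vector clocks (Dynamo can return conflicting versions for the application to reconcile). The logical timestamps and node contents are hypothetical.

```python
from typing import Dict, List, Tuple

# Each node maps key -> (logical_timestamp, value); timestamps are hypothetical.
State = Dict[str, Tuple[int, str]]

def merge(a: State, b: State) -> State:
    """Last-writer-wins merge: keep the entry with the higher timestamp per key.

    Tuples compare timestamp first, then value, so ties break deterministically.
    """
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry > out[key]:
            out[key] = entry
    return out

def anti_entropy_round(nodes: List[State]) -> List[State]:
    # Every node pulls state from every other node. With this sequential schedule,
    # node 0 ends up with everything and every later node pulls from it, so one round converges.
    nodes = list(nodes)
    for i in range(len(nodes)):
        for j in range(len(nodes)):
            if i != j:
                nodes[i] = merge(nodes[i], nodes[j])
    return nodes

# Three nodes that saw different writes (or none) before syncing.
before: List[State] = [
    {"cart:1": (1, "book")},
    {"cart:1": (2, "book,pen")},
    {},
]
after = anti_entropy_round(before)
print(after[2])                           # {'cart:1': (2, 'book,pen')}
print(all(n == after[0] for n in after))  # True
```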
List
- AWS DynamoDB (default settings)
- Oracle NoSQL Database (key-value store based on Berkeley DB)
Terminology
- Document = nested values, extensible records (think XML or JSON)
- Extensible record = families of attributes have a schema, but new attributes may be added
- Key-value object = a set of key-value pairs. No schema, no exposed nesting
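The three shapes can be pictured as plain Python values. This only illustrates the terminology, using hypothetical keys and a hypothetical `add_column` helper. It does not model any particular product's storage format.

```python
from typing import Any, Dict

# Key-value object: the store sees only an opaque value (here, bytes) behind a key.
kv_store: Dict[str, bytes] = {"session:abc": b"\x93\x01\x02"}

# Document: nested, self-describing values the store can look inside.
doc_store: Dict[str, Dict[str, Any]] = {
    "user:1": {"name": "Ada", "address": {"city": "London"}, "tags": ["admin"]},
}

# Extensible record: column families are fixed up front; columns inside them are not.
COLUMN_FAMILIES = {"profile", "stats"}
wide_row: Dict[str, Dict[str, Any]] = {"profile": {"name": "Ada"}, "stats": {}}

def add_column(row: Dict[str, Dict[str, Any]], family: str, column: str, value: Any) -> None:
    if family not in COLUMN_FAMILIES:
        raise KeyError(f"unknown column family: {family}")
    row[family][column] = value  # a new column needs no schema change

add_column(wide_row, "stats", "logins", 3)
print(doc_store["user:1"]["address"]["city"])  # London
```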