NoSQL databases are distributed databases in which each query stores or reads data in a single table.
NoSQL means no join: no query joins data between two tables. The join happens in the application code, not in the database.
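A join done in the application rather than in the database might look like the following minimal sketch. The `users` and `orders` collections and the function name are hypothetical stand-ins for two single-table lookups, not the API of any particular store.

```python
# Two "tables" held as plain Python collections; in a real system each
# access would be a separate single-table request (all names hypothetical).
users = {
    "u1": {"name": "Ada"},
    "u2": {"name": "Linus"},
}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u2", "total": 12},
    {"order_id": "o3", "user_id": "u1", "total": 7},
]


def join_orders_with_users(orders, users):
    """Join in application code: one key lookup per order instead of a SQL JOIN."""
    joined = []
    for order in orders:
        user = users.get(order["user_id"])  # simple key lookup
        if user is None:
            continue  # inner-join semantics: skip orders without a matching user
        joined.append({**order, "user_name": user["name"]})
    return joined


rows = join_orders_with_users(orders, users)
```

The cost of the join (one lookup per order here) moves to the application, which is why data is often denormalized so that a single lookup returns everything a request needs.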
This approach makes it easy to shard a table by a couple of keys and therefore to route a simple query to a single node (a single computer).
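As an illustration of routing by shard key, here is a minimal sketch that assumes three nodes and plain hash-modulo placement. The node names and the `ShardedTable` class are hypothetical, and real systems usually use consistent hashing or range partitioning instead.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical node names


def node_for_key(key, nodes=NODES):
    """Map a shard key to one node with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


class ShardedTable:
    """One logical table whose rows live on the node chosen by the shard key."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.storage = {node: {} for node in self.nodes}  # per-node local data

    def put(self, key, row):
        self.storage[node_for_key(key, self.nodes)][key] = row

    def get(self, key):
        # A key lookup touches exactly one node; no cross-node coordination.
        return self.storage[node_for_key(key, self.nodes)].get(key)


table = ShardedTable()
table.put("user:42", {"name": "Ada"})
owner = node_for_key("user:42")
```

With modulo placement, changing the number of nodes remaps most keys, which is the main reason production systems prefer consistent hashing.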
Many NoSQL databases prioritize availability over consistency. In an always-on world where downtime is unacceptable, a NoSQL application chooses to sacrifice consistency rather than availability, i.e. AP in the CAP theorem.
Developers then need to enforce consistency in their code.
NoSQL can have the following meanings:
- NoSchema (i.e. schema on read)
- NoLanguage (no SQL language)
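Schema on read means the store accepts records of any shape and the reader applies the structure it expects. A minimal sketch, assuming JSON records and a hypothetical `read_user` function:

```python
import json

# Records were written without any schema check; their shapes differ.
raw_records = [
    '{"id": 1, "name": "Ada", "email": "ada@example.com"}',
    '{"id": 2, "name": "Linus"}',
    '{"id": 3, "name": "Grace", "tags": ["navy"]}',
]


def read_user(raw):
    """Apply the schema at read time: coerce types and default missing fields."""
    doc = json.loads(raw)
    return {
        "id": int(doc["id"]),
        "name": str(doc["name"]),
        "email": doc.get("email"),        # optional field, None when absent
        "tags": list(doc.get("tags", [])),  # optional field, empty when absent
    }


parsed_users = [read_user(raw) for raw in raw_records]
```

The price is that every reader must cope with every shape ever written, which a schema-on-write database would have rejected at insert time.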
NoSql vs Consistent system
In a consistent system, everyone MUST see the same thing, either old or new, no matter how long it takes.
For large applications, we can’t afford to wait that long, and maybe it doesn’t matter anyway.
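The alternative to waiting is to answer immediately from whatever a replica currently holds and to propagate updates later. The following toy simulation sketches this under stated assumptions: two in-memory replicas, integer versions, and a last-writer-wins `sync` pass. The class and function names are hypothetical and do not model any specific product.

```python
class Replica:
    """An in-memory replica that keeps the highest version seen per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(replicas):
    """Propagation pass: every replica receives the newest version of every key."""
    newest = {}
    for replica in replicas:
        for key, (version, value) in replica.data.items():
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, value)
    for replica in replicas:
        for key, (version, value) in newest.items():
            replica.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], version=1)
sync([a, b])

a.write("cart", ["book", "pen"], version=2)  # accepted at once by replica a
stale = b.read("cart")                       # replica b still serves the old value
sync([a, b])
fresh = b.read("cart")                       # after propagation both replicas agree
```

Between the write and the next `sync`, readers of replica `b` see old data. Whether that matters depends on the application, which is the trade-off described above.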
The NoSQL field actually has four different types of databases:
- key-value stores (Redis, Oracle NoSQL, Amazon DynamoDB, …)
- column-family stores (HBase, BigTable, Cassandra, …)
- document databases (MongoDB, Cloud Firestore (Firebase), …)
- graph databases (Neo4j, …)
NoSQL databases are typically chosen for:
- Scalability: “It's hard to scale out my RDBMS in a distributed environment”
- The ability to horizontally scale the throughput of “simple operations” (key lookups, reads and writes of one or a few records) over many servers
- High-throughput reads and writes
- Flexibility: “My data doesn’t conform to a rigid schema”
- The ability to replicate and partition data over many servers (sharding)
- Efficient use of distributed indexes and RAM for data storage
The trade-offs are:
- No ACID guarantees, so not suited to mission-critical data where consistency is key
- No high-level query language: a simple API or a limited SQL (without joins)
I guess what I'm saying is that my decision to use NoSQL, and I'm guessing others' decisions to do so, has less to do with the fact that we can't squeeze a few thousand writes a second out of MySQL and more to do with management and cost overhead. NoSQL solutions allow us to serve absurd amounts of data for a really, really low price.
Major Impact Systems (Rick Cattell)
- memcached demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes.
- Dynamo pioneered the idea of [using] eventual consistency as a way to achieve higher availability and scalability: data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.
- BigTable demonstrated that persistent record storage could be scaled to thousands of nodes, a feat that most of the other systems aspire to.
- AWS DynamoDB (default settings)
- Oracle NoSQL Database (column/key-value store based on BerkeleyDB)
- Document = nested values, extensible records (think XML or JSON)
- Extensible record = families of attributes have a schema, but new attributes may be added
- Key-value object = a set of key-value pairs. No schema, no exposed nesting
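The three data models defined above can be contrasted with plain Python structures. This is only an illustrative sketch: the field names, the `FAMILIES` set, and the `put_attribute` helper are hypothetical.

```python
# Key-value object: the store sees only an opaque value behind each key.
key_value = {"session:42": b'{"user": "ada", "ttl": 3600}'}

# Document: nested values that the store can look inside.
document = {
    "_id": "order-1",
    "customer": {"name": "Ada", "city": "London"},
    "items": [{"sku": "book", "qty": 1}, {"sku": "pen", "qty": 2}],
}

# Extensible record: the attribute families form the schema,
# but attributes inside a family can be added row by row.
FAMILIES = {"profile", "activity"}


def put_attribute(row, family, attribute, value):
    """Add an attribute to a row, rejecting families outside the schema."""
    if family not in FAMILIES:
        raise KeyError(f"unknown attribute family: {family}")
    row.setdefault(family, {})[attribute] = value
    return row


row = {}
put_attribute(row, "profile", "name", "Ada")
put_attribute(row, "profile", "nickname", "Countess")  # new attribute, same family
```

Adding `nickname` needs no schema change, whereas writing to an undeclared family such as `billing` is rejected.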