Introduction to Apache Cassandra
Cassandra has a peer-to-peer distributed architecture that is elegant and easy to set up and maintain. In Cassandra, all nodes are equal: there is no concept of a master node, and all nodes communicate with each other via a gossip protocol.
There is nothing a developer or administrator needs to code or configure to distribute data across a cluster. Data is transparently partitioned across all nodes in either a randomized or ordered fashion, with randomized partitioning being the default.
When creating a new Cassandra database (also called a keyspace), a user simply indicates via a single command which data centers and/or cloud providers will hold copies of the new database; everything from that point forward is automatically handled and maintained by Cassandra.
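As a sketch of that single command (the keyspace and data-center names here are hypothetical), creating a keyspace replicated across two data centers looks like this in CQL:

```cql
-- Hypothetical keyspace with three replicas in each of two data centers.
-- NetworkTopologyStrategy lets the replica count be set per data center.
CREATE KEYSPACE northwind
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_east': 3,
    'dc_west': 3
  };
```

From this point on, Cassandra places and maintains the replicas automatically; no further placement logic is written by the developer.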
If one or more nodes responsible for a particular set of data are down, the data is simply written to another node, which holds it temporarily (a mechanism known as hinted handoff). Once the nodes come back online, they automatically bring themselves up to date from the nodes holding the data they maintain.
A user can request data from any node (which becomes that user's coordinator node), with the results of the query assembled from one or more nodes holding the necessary data. If a particular node holding the required data is down, Cassandra simply requests the data from another node holding a replicated copy.
While Cassandra is not a transactional database in the way that legacy RDBMSs offer ACID transactions, it does offer the "AID" portion of ACID, in that data written is atomic, isolated, and durable. The "C" of ACID does not apply to Cassandra, as there is no concept of referential integrity or foreign keys.
Because NoSQL databases like Cassandra do not support operations like SQL joins, data tends to be highly denormalized. While the resulting wide rows would normally be a problem for an RDBMS, Cassandra provides exceptional performance for rows with many thousands of columns.
Migrate a Relational Database into Cassandra (Part II – Northwind Planning)
In a relational database setting I can often simply normalize away and worry about which table I need to focus my indexing efforts on later when I’m working in the application. However, in NoSQL, non-relational database design, we often need to decide up front which entity most queries will be interested in and build everything else around that entity.
So…will it be “order” or “product”? Today I’ll decide that the key entity in this database is “order” – customers will be hitting this on a daily, per transaction basis whereas I can probably run my product reports offline.
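With "order" as the key entity, one possible physical table (all table and column names here are illustrative, not from the original article) keys orders so that a customer's activity for a given day lands in a single partition:

```cql
-- Orders partitioned by (customer, day): a customer's daily
-- transactions are a single-partition read.
CREATE TABLE orders_by_customer (
    customer_id uuid,
    order_date  date,
    order_id    timeuuid,
    product_id  uuid,
    quantity    int,
    unit_price  decimal,
    PRIMARY KEY ((customer_id, order_date), order_id)
) WITH CLUSTERING ORDER BY (order_id DESC);
```

Product-centric reporting would then be served offline or from separate, denormalized tables built for those queries.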
Cassandra From a Relational World
When you are designing the schema for your relational database, the primary thought on your mind is “What’s the best way to store this data?”. But with Cassandra, your dominant concern should be “How am I going to query this data?”
Basic Rules of Cassandra Data Modeling
Developers coming from a relational background usually carry over rules about relational modeling and try to apply them to Cassandra. To avoid wasting time on rules that don't really matter with Cassandra, keep the following in mind.
Writes in Cassandra aren’t free, but they’re awfully cheap. Cassandra is optimized for high write throughput, and almost all writes are equally efficient [1]. If you can perform extra writes to improve the efficiency of your read queries, it’s almost always a good tradeoff.
Denormalization and duplication of data is a fact of life with Cassandra. Don’t be afraid of it. Disk space is generally the cheapest resource (compared to CPU, memory, disk IOPs, or network), and Cassandra is architected around that fact.
Cassandra doesn't have JOINs, and you wouldn't really want to perform one across a distributed cluster anyway.
Rows are spread around the cluster based on a hash of the partition key, which is the first element of the PRIMARY KEY. So, the key to spreading data evenly is this: pick a good primary key.
Partitions are groups of rows that share the same partition key. When you issue a read query, you want to read rows from as few partitions as possible.
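A sketch of the distinction (schema is illustrative): the first element of the PRIMARY KEY is the partition key, and any remaining elements are clustering columns that order rows within a partition.

```cql
-- sensor_id is the partition key: its hash determines which nodes own the rows.
-- reading_time is a clustering column: it orders rows inside each partition.
CREATE TABLE readings_by_sensor (
    sensor_id    uuid,
    reading_time timestamp,
    value        double,
    PRIMARY KEY (sensor_id, reading_time)
);

-- An efficient query: restricted to a single partition.
SELECT reading_time, value
  FROM readings_by_sensor
 WHERE sensor_id = 123e4567-e89b-12d3-a456-426614174000
   AND reading_time > '2016-01-01';
```

Because all of a sensor's readings share one partition key, this query touches only that partition's replicas rather than scanning the cluster.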
The way to minimize partition reads is to model your data to fit your queries. Don’t model around relations. Don’t model around objects.
If you need different types of answers, you usually need different tables. This is how you optimize for reads. Remember, data duplication is okay. Many of your tables may repeat the same data.
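For example (a common sketch, not a schema from the source), looking up users by username and by email calls for two tables that repeat the same data, one per query:

```cql
-- Same user data, stored twice, each copy keyed for a different query.
CREATE TABLE users_by_username (
    username text PRIMARY KEY,
    email    text,
    age      int
);

CREATE TABLE users_by_email (
    email    text PRIMARY KEY,
    username text,
    age      int
);
```

The application writes to both tables on every user insert or update; the extra write is the price of two cheap single-partition reads.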
The DevOps of Cassandra Data Modeling
In Cassandra, it is best to start the data modeling process by defining your query patterns first.
An Advanced Cassandra Data Modeling Guide
Denormalize ALL THE THINGS: increase the number of writes to reduce and simplify reads. Most importantly, we did this without querying multiple tables and then merging and reconciling the results, by using a data model that duplicates data and stores our host information in a denormalized manner, contrary to the original, highly relational MySQL data model.
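A hedged sketch of what such a denormalized host table might look like (the source does not give its schema; names and columns here are invented for illustration):

```cql
-- Each row carries the full host record for a service, so a service
-- lookup is one partition read instead of a relational join.
CREATE TABLE hosts_by_service (
    service_name text,
    host_id      uuid,
    hostname     text,
    ip_address   inet,
    rack         text,
    PRIMARY KEY (service_name, host_id)
);
```

Every host is written once per service it belongs to, trading extra writes and disk for join-free reads.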
Advanced Data Modeling with Apache Cassandra
Data Modeling Steps:
- Conceptual Data Model (e.g., an ER diagram)
- Application Query Workflow
- Logical Data Model (combine steps 1 & 2)
- Physical Data Model (step 3 with CQL data types)
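As an illustration of the final step (the table is hypothetical, in keeping with the Northwind theme), the physical model is the logical model with concrete CQL data types assigned:

```cql
-- Logical model "products by category" made physical:
-- each attribute now has a CQL type, and the access pattern
-- (list products within a category) fixes the primary key.
CREATE TABLE products_by_category (
    category   text,     -- partition key, from the query workflow
    product_id uuid,     -- clustering column
    name       text,
    unit_price decimal,
    PRIMARY KEY (category, product_id)
);
```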
Eventually Consistent - Revisited
The CAP theorem states that of the three properties of shared-data systems (data consistency, system availability, and tolerance to network partitions), only two can be achieved at any given time.
An important observation is that in larger distributed-scale systems, network partitions are a given; therefore, consistency and availability cannot be achieved at the same time. This means that there are two choices on what to drop: relaxing consistency will allow the system to remain highly available under the partitionable conditions, whereas making consistency a priority means that under certain conditions the system will not be available.
[Hey Relational Developer, Let’s Go Crazy (Patrick McFadin, DataStax) | Cassandra Summit 2016](https://www.youtube.com/watch?v=KFCmxrmnkt8) - Slides