Big Data/Cassandra
Apache Cassandra is a distributed, scalable NoSQL database management system of the wide-column type. By 2015 it had become one of the world's most popular database management systems (DBMS)[1].
Installation
The Java sources are available at https://github.com/apache/cassandra, and a release tarball can be downloaded from http://cassandra.apache.org/download/.
- On macOS (with Homebrew):
brew install cassandra && brew services start cassandra
See also http://cassandra.apache.org/doc/latest/getting_started/installing.html for more information.
To launch the server:
- On Linux: /cassandra/bin/cassandra
- On Windows: \cassandra\bin\cassandra.bat
Graphical user interface
There are several GUIs for managing Cassandra. One example is Helenos: its Java sources are available at https://github.com/tomekkup/helenos, and a compiled version at http://sourceforge.net/projects/helenos-gui/.
It bundles an Apache Tomcat server, which is started with \helenos\bin\startup.bat. The web interface should then be available at http://localhost:8080 (login: admin / password: admin).
NB: Helenos can create column families, but it cannot display those that were created with CQL.
Data manipulation
In 2011 Cassandra introduced the Cassandra Query Language (CQL)[2][3]. You can run CQL statements with the cqlsh client, which lets you create keyspaces and tables, insert rows and query tables, among other operations.
The CQL 3.0 syntax looks like this[4]:
CREATE KEYSPACE MyBase1 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
USE MyBase1;
CREATE TABLE MyTable1 (
id text,
FirstName text,
LastName text,
PRIMARY KEY(id));
INSERT INTO MyTable1 (id, LastName) VALUES ('1', 'Test');
SELECT * FROM MyTable1;
DROP TABLE MyTable1;
Additional Notes:
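- There is no auto-increment option; unique keys have to be supplied by the client, for example as UUIDs.
- Unquoted column names are not case-sensitive: CQL folds them to lowercase unless they are enclosed in double quotes.
- Inserting a record with an existing primary key overwrites the old values without any warning.
- When inserting more than 1,000 records, cqlsh may ignore the rest. For bulk loading, the sstableloader tool is recommended.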
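The overwrite behaviour noted above can be pictured with a toy in-memory model. The Python sketch below is not Cassandra code: the ToyTable class and the new_id helper are invented for this illustration. It mimics a plain INSERT, which silently overwrites the supplied columns, the intent of CQL's INSERT ... IF NOT EXISTS clause, which refuses to overwrite, and a client-generated UUID used in place of an auto-increment key.

```python
import uuid


class ToyTable:
    """In-memory stand-in for a table with a single-column primary key."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, **columns):
        # Like a plain CQL INSERT: no error if the key already exists,
        # the supplied columns simply overwrite the stored values.
        self.rows.setdefault(key, {}).update(columns)

    def insert_if_not_exists(self, key, **columns):
        # Mirrors the intent of INSERT ... IF NOT EXISTS:
        # the write is applied only when the key is new.
        if key in self.rows:
            return False
        self.rows[key] = dict(columns)
        return True


def new_id():
    # Hypothetical helper: a client-generated unique key,
    # used because there is no auto-increment option.
    return str(uuid.uuid4())


if __name__ == "__main__":
    table = ToyTable()
    table.insert("1", lastname="Test")
    table.insert("1", lastname="Other")  # silently replaces "Test"
    print(table.rows)
    print(table.insert_if_not_exists("1", lastname="Ignored"))
    print(table.insert_if_not_exists(new_id(), lastname="New"))
```

In a real cluster the conditional form is slower than a plain INSERT, so it is usually reserved for cases where an accidental overwrite would be harmful.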
Cassandra port usage
How to use several nodes
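To communicate with each other and with clients, Cassandra nodes need the following ports to be open[9]: 7000 (inter-node communication), 7001 (inter-node communication over SSL), 7199 (JMX), 9042 (CQL native transport) and 9160 (Thrift).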
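As a rough way to verify that these ports are reachable from another machine, the following Python sketch tries a TCP connection to each of them. The function names (is_port_open, check_node) and the host 127.0.0.1 are placeholders chosen for this illustration, and the role labels are a summary based on the Cassandra FAQ cited above. A firewall that silently drops packets will show the port as closed once the timeout expires.

```python
import socket

# Ports listed in the text; the labels summarise their usual role.
CASSANDRA_PORTS = {
    7000: "inter-node communication",
    7001: "inter-node communication over SSL",
    7199: "JMX monitoring",
    9042: "CQL native transport",
    9160: "Thrift client API",
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


def check_node(host, ports=CASSANDRA_PORTS, timeout=1.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: is_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # 127.0.0.1 is a placeholder; use the address of the node to test.
    for port, reachable in check_node("127.0.0.1").items():
        state = "open" if reachable else "closed"
        print(port, CASSANDRA_PORTS[port], state)
```

Running the script from each node against the others gives a quick picture of which ports a firewall still blocks.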
There is no master node, so fail-over is automatic. Each node must have at least one "seed node" in its configuration so that it can join the distributed cluster. The node descriptions are stored in \cassandra\conf\cassandra-rackdc.properties.
To let the nodes communicate, set the endpoint_snitch parameter in cassandra.yaml to RackInferringSnitch (the default is SimpleSnitch).
Then, the nodes list is visible with:
- On Linux: /cassandra/bin/nodetool status
- On Windows: \cassandra\bin\nodetool.bat status
NB: when a keyspace is created with a replication_factor greater than one, each row is stored on several nodes, so the data becomes redundant (mirroring).
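To illustrate what a replication factor greater than one means, here is a simplified Python model of how SimpleStrategy places the copies of a row: the first replica goes to the node that owns the partition key's token, and further replicas go to the next nodes around the ring. This is only a sketch under assumptions: it gives each node a single token, uses MD5 as a stand-in for Cassandra's default Murmur3 partitioner, and the node names are invented.

```python
import bisect
import hashlib


def token(value):
    # Stand-in hash: Cassandra's default partitioner is Murmur3, not MD5.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def replicas(partition_key, nodes, replication_factor):
    """Return the nodes that would hold a copy of the row."""
    # Build the ring: one token per node, sorted by token value.
    ring = sorted((token(name), name) for name in nodes)
    tokens = [t for t, _ in ring]
    # The owner is the first node whose token is >= the key's token,
    # wrapping around to the start of the ring if there is none.
    start = bisect.bisect_left(tokens, token(partition_key)) % len(ring)
    # There cannot be more replicas than nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # invented names
    for rf in (1, 2, 3):
        print("replication_factor", rf, "->", replicas("1", nodes, rf))
```

With a replication factor of 2 on three nodes, the loss of any single node still leaves one copy of every row available.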
Related Technologies
References
1. http://db-engines.com/en/ranking
2. https://grokbase.com/t/cassandra/user/1162fkpwx2/release-0-8-0
3. https://docs.datastax.com/en/cql/3.3/cql/cqlIntro.html
4. https://cassandra.apache.org/doc/cql3/CQL.html
5. http://cassandra.apache.org/doc/latest/faq/index.html#what-ports
6. http://cassandra.apache.org/doc/latest/faq/index.html#what-ports
7. https://stackoverflow.com/questions/2359159/cassandra-port-usage-how-are-the-ports-used
8. https://stackoverflow.com/questions/2359159/cassandra-port-usage-how-are-the-ports-used
9. http://docs.datastax.com/en/cassandra/2.0/cassandra/initialize/initializeSingleDS.html
10. https://en.wikipedia.org/wiki/Amazon_DynamoDB
11. https://en.wikipedia.org/wiki/Redis
- Apache Cassandra - home page
- A. Lakshman and P. Malik "Cassandra: a decentralized structured storage system" ACM SIGOPS Operating Systems Review, Volume 44 Issue 2, April 2010, Pages 35-40, ACM New York, NY, USA