Monday, August 31, 2015

Introduction to Apache Spark Part 1

The reason people are so interested in Apache Spark is that it puts the power of Hadoop in the hands of developers. It is easier to set up an Apache Spark cluster than a Hadoop cluster. It runs faster. And it is a lot easier to program. It puts the promise and power of Big Data and real-time analysis in the hands of the masses. With that in mind, let's introduce Apache Spark in this quick tutorial.

Google search interest in Apache Spark has skyrocketed recently, indicating a wide range of interest (108,000 searches in July according to Google AdWords tools, about ten times more than for microservices).

Apache Spark, an open source cluster computing system, is growing fast. It has a growing ecosystem of libraries and frameworks that enable advanced data analytics. Apache Spark's rapid success is due to its power and ease of use. It is more productive and has a faster runtime than typical MapReduce-based analysis. Apache Spark provides in-memory, distributed computing. It has APIs in Java, Scala, Python, and R. The Spark ecosystem is shown below.
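To give a feel for the programming model described above, here is a minimal word-count sketch using the Python API. It is only an illustration, not code from the full tutorial: the helper names `split_words` and `word_count`, the sample sentences, and the `local[*]` master setting are all assumptions made for this example, and it presumes PySpark is installed when run as a script.

```python
from operator import add


def split_words(line):
    """Lower-case a line and split it on whitespace (hypothetical helper)."""
    return line.lower().split()


def word_count(sc, lines):
    """Count words with Spark's RDD API; `sc` is an existing SparkContext."""
    return dict(
        sc.parallelize(lines)        # distribute the input lines
          .flatMap(split_words)      # one record per word
          .map(lambda w: (w, 1))     # pair each word with a count of 1
          .reduceByKey(add)          # sum the counts for each word
          .collect()                 # bring the results back to the driver
    )


if __name__ == "__main__":
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark import SparkContext

    # "local[*]" runs Spark in-process on all cores; an assumption for this demo.
    sc = SparkContext("local[*]", "WordCountSketch")
    try:
        print(word_count(sc, ["Spark is fast", "Spark is easy to program"]))
    finally:
        sc.stop()
```

The whole job is one chain of transformations (`flatMap`, `map`, `reduceByKey`) on an in-memory distributed dataset, which is the kind of brevity the comparison with MapReduce refers to.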

See the full Apache Spark Tutorial.
