Spark Streaming and Kafka

If you are using Spark Streaming with Kafka, Michael G. Noll has written a comprehensive guide, Integrating Kafka and Spark Streaming, in which he goes into detail on parallelism, performance tuning and scalability. You can also find a complete code example on GitHub.…

Tuning Spark Streaming

Jeroen van Wilgenburg has an excellent blog post, Understanding Spark parameters – A step by step guide to tune your Spark job, which covers best practices for an optimal Spark Streaming setup (receiver, batch size).…

Spark Packages

The folks from Databricks have launched Spark Packages, a community site hosting modules that are not (directly) part of the Apache Spark project. At the time of writing, the site contains 16 packages, including launch scripts (GCE, Azure, etc.), integrations (for Kafka, Avro, etc.), utils (testing, RDDs, etc.) and…

Spark 1.2 released

Today, Apache Spark 1.2 has been released with the following highlights: Spark Core has been improved; Spark Streaming is now more or less fully available via Python as well; MLlib has been improved; GraphX is now stable. Big congrats and thanks to the team!…

The Essential Apache Spark Cheat Sheet

DZone now provides an Apache Spark Cheat Sheet: This Refcard introduces Spark, explains its place in the big data ecosystem, walks through setup and creation of a basic Spark application, and explains commonly used actions and operations.…

Spark Tutorial University of Maryland

This is a two-and-a-half day tutorial on the distributed programming framework Apache Spark. The class will include introductions to the many Spark features, case studies from current users, best practices for deployment and tuning, future development plans, and hands-on exercises. http://lintool.github.io/SparkTutorial/…

Databricks Spark Reference Applications

Reference Applications demonstrating Apache Spark - brought to you by Databricks: http://databricks.gitbooks.io/databricks-spark-reference-applications/…

Spark Panel Discussion with Cloudera, MapR & Pivotal

The Los Angeles Spark Users Group recently hosted a panel discussion on Spark, featuring representatives of three Big Data vendors: http://inside-bigdata.com/2014/10/08/spark-panel-discussion-cloudera-mapr-pivotal/…

Spark Summit East 03/2015

The Spark Summit is coming to the East Coast: on March 18-19, 2015, Spark practitioners will meet in New York. Keep an eye on that!…
