New community site: SparkHub

Databricks, the commercial shepherd of Spark, has launched a new community site called SparkHub. It covers news and events and offers an up-to-date list of videos and other resources for learning more about Spark. SparkHub is a valuable addition to sparkbigdata.com (maintained by Slim…

Monitoring Spark with Graphite and Grafana

The folks at Hammer Lab have shared the results of their work on Monitoring Spark with Graphite and Grafana. The post explains how to use Spark's MetricsSystem to route relevant stats from executors, drivers and the JVM to various sinks; in this case, this lets them build Graphite/Grafana dashboards.…
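
As a rough illustration of what such a setup involves, the Python sketch below renders a metrics.properties file that sends every instance's metrics to Graphite and enables the JVM source on the driver and executors. The sink and source class names and the `[instance].sink.[name].[option]` key format follow Spark's conf/metrics.properties.template. The helper name `graphite_metrics_properties`, the example host and the 10-second period are placeholder assumptions, not values taken from the Hammer Lab post.

```python
def graphite_metrics_properties(host: str, port: int = 2003,
                                period_seconds: int = 10,
                                prefix: str = "spark") -> str:
    """Hypothetical helper: build metrics.properties lines for a Graphite sink."""
    lines = [
        # "*" attaches the sink to every instance (master, worker, driver, executor)
        "*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink",
        f"*.sink.graphite.host={host}",
        f"*.sink.graphite.port={port}",  # 2003 is Graphite's usual plaintext port
        f"*.sink.graphite.period={period_seconds}",
        "*.sink.graphite.unit=seconds",
        f"*.sink.graphite.prefix={prefix}",
        # Additionally report JVM stats (heap, GC, ...) from driver and executors
        "driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource",
        "executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder host; point this at your own Graphite/Carbon endpoint
    print(graphite_metrics_properties("graphite.example.com"))
```

By default Spark looks for this file at $SPARK_HOME/conf/metrics.properties, and a different path can be given via the spark.metrics.conf property; check the monitoring documentation for your Spark version before relying on the exact option names.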

Spark Streaming and Kafka

If you are using Spark Streaming with Kafka, Michael G. Noll has written a very comprehensive guide, Integrating Kafka and Spark Streaming, in which he goes into detail on parallelism, performance tuning and scalability. A complete code example is also available on GitHub.…
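
A point that often comes up when sizing Kafka consumers is that Kafka, not Spark, decides how a topic's partitions are spread over the consumer threads of a consumer group, so threads beyond the partition count sit idle. The Python sketch below models this with range-style assignment in the spirit of Kafka's range assignor. It is a simplified, hypothetical helper (`range_assign` is not a Kafka or Spark API) for reasoning about parallelism, not code from Noll's guide.

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Split partitions 0..num_partitions-1 of one topic across consumers.

    Simplified model of range-style assignment: consumers are sorted, each
    gets a contiguous block, and the first (num_partitions % len(consumers))
    consumers receive one extra partition.
    """
    ordered = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(ordered))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    # 5 partitions but 7 consumer threads: two threads get nothing to read
    threads = [f"thread-{i}" for i in range(7)]
    for name, parts in range_assign(5, threads).items():
        print(name, parts)
```

With more partitions than threads the work is merely uneven (10 partitions over 3 threads gives blocks of 4, 3 and 3), whereas extra threads beyond the partition count add no read parallelism at all.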

Tuning Spark Streaming

Jeroen van Wilgenburg has an excellent blog post, Understanding Spark parameters – A step by step guide to tune your Spark job, which provides best practices for arriving at an optimal Spark Streaming setup (receivers, batch size).…
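
Two rules of thumb from the Spark Streaming programming guide help when sizing receivers and batches: with receiver-based input, each receiver produces roughly one block, and therefore one task, per block interval (spark.streaming.blockInterval, 200 ms by default); and a job is only stable if each batch is processed faster than the batch interval. The Python sketch below encodes both as hypothetical helper functions for back-of-the-envelope checks. It does not call Spark and is not taken from van Wilgenburg's post.

```python
def tasks_per_batch(batch_interval_ms: int,
                    block_interval_ms: int = 200,
                    num_receivers: int = 1) -> int:
    """Approximate number of tasks per batch for receiver-based input.

    Each receiver cuts its data into one block per block interval, and each
    block becomes one partition (one task) of the batch's RDD.
    """
    if block_interval_ms <= 0:
        raise ValueError("block interval must be positive")
    return num_receivers * (batch_interval_ms // block_interval_ms)


def is_stable(processing_time_ms: float, batch_interval_ms: int) -> bool:
    """A streaming job keeps up only if each batch finishes within the batch interval."""
    return processing_time_ms < batch_interval_ms


if __name__ == "__main__":
    # A 2 s batch with the default 200 ms block interval and 2 receivers
    print(tasks_per_batch(2000, num_receivers=2))
    # Batches taking 2.5 s to process against a 2 s interval will queue up
    print(is_stable(2500, 2000))
```

If the task count comes out below the number of cores available to the job, some cores sit idle; lowering the block interval raises the count. If the stability check fails, the usual levers are a longer batch interval or less work per batch.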

Spark Packages

The folks at Databricks have launched Spark Packages, a community site hosting modules that are not (directly) part of the Apache Spark project. At the time of writing, the site contains 16 packages, including launch scripts (GCE, Azure, etc.), integrations (for Kafka, Avro, etc.), utilities (testing, RDDs, etc.) and…

Spark 1.2 released

Today, Apache Spark 1.2 has been released with the following highlights:

- Improvements to Spark Core
- Spark Streaming is now more or less fully available from Python as well
- Improvements to MLlib
- GraphX is now stable

Big congrats and thanks to the team!…

The Essential Apache Spark Cheat Sheet

DZone now provides an Apache Spark cheat sheet: This Refcard introduces Spark, explains its place in the big data ecosystem, walks through setup and creation of a basic Spark application, and explains commonly used actions and operations.…
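
The distinction between transformations and actions that such cheat sheets cover can be illustrated without a cluster. The Python sketch below is a toy, single-machine model (the `ToyRDD` class is hypothetical and not part of PySpark): transformations such as map and filter only record work, and nothing runs until an action such as collect or count is called.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

Op = Tuple[str, Callable[[Any], Any]]


class ToyRDD:
    """Toy model of lazy RDD semantics (illustrative only, not PySpark)."""

    def __init__(self, source: Iterable[Any], ops: Optional[List[Op]] = None):
        self._source = source
        self._ops: List[Op] = ops or []  # recorded transformations

    # Transformations: return a new ToyRDD and perform no computation.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self._source, self._ops + [("map", f)])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self._source, self._ops + [("filter", pred)])

    # Actions: replay the recorded transformations and return a result.
    def collect(self) -> List[Any]:
        data = iter(self._source)
        for kind, f in self._ops:
            # Builtin map/filter here, not the methods above
            data = map(f, data) if kind == "map" else filter(f, data)
        return list(data)

    def count(self) -> int:
        return len(self.collect())


if __name__ == "__main__":
    words = ToyRDD(["spark", "kafka", "graphx", "mllib"])
    long_upper = words.filter(lambda w: len(w) > 5).map(str.upper)  # nothing computed yet
    print(long_upper.collect())
    print(long_upper.count())
```

In this toy, each action replays the whole chain of transformations, roughly as Spark does for an RDD that has not been cached; in Spark itself, persist() or cache() avoids that recomputation.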

Spark Tutorial at the University of Maryland

This is a two-and-a-half-day tutorial on the distributed programming framework Apache Spark. The class will include introductions to Spark's many features, case studies from current users, best practices for deployment and tuning, future development plans, and hands-on exercises. http://lintool.github.io/SparkTutorial/…

Databricks Spark Reference Applications

Reference applications demonstrating Apache Spark, brought to you by Databricks: http://databricks.gitbooks.io/databricks-spark-reference-applications/…
