Optimizing Spark Job Performance With Apache Ignite (Part 1)

Portions of this article were taken from my book, High-Performance In-Memory Computing With Apache Ignite. If this post got you interested, check out the rest of the book for more helpful information.

Apache Ignite offers two ways to improve a Spark job's performance: Ignite RDD, which represents an Ignite cache as a Spark RDD abstraction, and Ignite IGFS, an in-memory file system that can be transparently plugged into Spark deployments. Ignite RDD makes it easy to share state in memory between different Spark jobs or applications. With Ignite's in-memory shared RDDs, any Spark job can put data into an Ignite cache that other Spark jobs can access later. Ignite RDD is implemented as a view over the Ignite distributed cache, which can be deployed either within the Spark job execution process or on a Spark worker.
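To make the idea of sharing state between jobs concrete, here is a minimal sketch of two separate Spark applications that exchange data through an Ignite cache. It assumes the `ignite-spark` module is on the classpath and that Ignite server nodes are already running and reachable with a default `IgniteConfiguration`. The cache name `sharedRDD`, the object names, and the sample data are illustrative choices, not something prescribed by Ignite.

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.{IgniteContext, IgniteRDD}
import org.apache.spark.{SparkConf, SparkContext}

// First Spark application: writes key-value pairs into an Ignite cache.
object SharedRddWriter {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to be supplied by spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("SharedRddWriter"))

    // Assumption: default configuration is enough to discover the running Ignite nodes.
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())

    // "sharedRDD" is a hypothetical cache name; IgniteRDD is a view over that cache.
    val sharedRdd: IgniteRDD[Int, Int] = ic.fromCache[Int, Int]("sharedRDD")

    // Store (i, i * i) pairs in the cache so that other jobs can read them later.
    sharedRdd.savePairs(sc.parallelize(1 to 1000).map(i => (i, i * i)))

    ic.close() // closes the context without shutting down Ignite on the workers
    sc.stop()
  }
}

// Second Spark application: reads the data that the first one stored.
object SharedRddReader {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SharedRddReader"))
    val ic = new IgniteContext(sc, () => new IgniteConfiguration())

    // Same cache name as in the writer; the data lives in Ignite, not in this job.
    val sharedRdd: IgniteRDD[Int, Int] = ic.fromCache[Int, Int]("sharedRDD")

    // IgniteRDD extends Spark's RDD, so ordinary transformations and actions apply.
    val matching = sharedRdd.filter { case (_, value) => value > 500 }.count()
    println(s"Pairs with value > 500: $matching")

    ic.close()
    sc.stop()
  }
}
```

Because the pairs are held in the Ignite cache rather than inside either Spark application, the reader can be submitted after the writer has finished and still see the data, provided the Ignite nodes stay up in between.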