Monday 15 June 2015

java - Apache Spark: what's the designed behavior if the master fails?

We are running our calculations in a standalone Spark cluster, version 1.0.2 (the previous major release). We do not have any HA or recovery logic configured. A piece of functionality on the driver side consumes incoming JMS messages and submits the respective jobs to Spark.
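For context, a minimal sketch of what that driver-side piece might look like, assuming a plain JMS MessageListener and an illustrative word-count-style job (the queue, payload format and job logic are hypothetical, not taken from the question):

import java.util.Arrays;

import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical listener: every JMS message becomes one Spark job.
public class JobSubmittingListener implements MessageListener {

    private final JavaSparkContext sc;

    public JobSubmittingListener(JavaSparkContext sc) {
        this.sc = sc;
    }

    @Override
    public void onMessage(Message message) {
        try {
            String payload = ((TextMessage) message).getText();
            // Blocking action: if the master is gone, this call does not return,
            // which is exactly the behavior described below.
            long tokens = sc.parallelize(Arrays.asList(payload.split("\\s+"))).count();
            System.out.println("Submitted job for message, token count = " + tokens);
        } catch (JMSException e) {
            throw new RuntimeException("Failed to read JMS message", e);
        }
    }
}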

When I bring the single Spark master down (for tests), the driver program seems unable to figure out that the cluster is no longer usable. This results in two major problems:

1. The driver tries to reconnect to the master endlessly, or at least I couldn't wait long enough for it to give up.
2. Because of the previous point, submission of new jobs blocks (in org.apache.spark.scheduler.JobWaiter#awaitResult). I presume this is because the cluster is not reported as unreachable/down, so the submission logic simply waits until the cluster comes back. For us this means we run out of JMS listener threads fast, since they all end up blocked.
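One hedged workaround sketch for the thread-exhaustion part: run the blocking Spark action on a separate pool and bound the wait, so a listener thread gives up instead of hanging forever. The class name, pool size and timeout handling are assumptions, not something Spark provides for this out of the box:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper to keep JMS listener threads from blocking indefinitely.
public class BoundedSubmitter {

    // Separate pool so the blocking Spark call happens off the JMS listener thread.
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    public <T> T submitWithTimeout(Callable<T> sparkAction, long timeout, TimeUnit unit)
            throws Exception {
        Future<T> future = pool.submit(sparkAction);
        try {
            // Give up after the timeout instead of waiting forever in awaitResult.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort; the job may still be queued on the Spark side
            throw new IllegalStateException("Spark job did not finish in time; master may be down", e);
        }
    }
}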

There are a couple of Akka failure-detection-related properties that can be configured on Spark, but:

The official documentation doesn't recommend enabling Akka's built-in failure detection; I mainly want to understand how this is supposed to work by default.
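For reference, these are the Akka-related settings listed on the Spark 1.x configuration page; a sketch of setting them on the driver's SparkConf follows. The master URL and the values are placeholders only, since the documentation advises against tuning them in most deployments:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class AkkaFailureDetectionConfig {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("akka-failure-detection-demo")
                .setMaster("spark://master-host:7077")            // placeholder master URL
                // Heartbeat interval and acceptable pause (seconds) used by Akka's failure detector.
                .set("spark.akka.heartbeat.interval", "100")
                .set("spark.akka.heartbeat.pauses", "600")
                // Sensitivity of the detector; the default is deliberately high to keep it effectively disabled.
                .set("spark.akka.failure-detector.threshold", "300.0")
                // General Akka communication timeout, in seconds.
                .set("spark.akka.timeout", "100");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... submit jobs as usual ...
        sc.stop();
    }
}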

So, can someone please explain what the designed behavior is when the single Spark master in standalone deployment mode fails/stops/shuts down? I wasn't able to find any proper documentation about this on the net.

By default, Spark can handle worker failures, but not a master (driver) failure. If the master crashes, no new applications can be created. Therefore, Spark provides two high-availability schemes, described here: https://spark.apache.org/docs/1.4.0/spark-standalone.html#high-availability
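A rough sketch of the ZooKeeper-based standby-master scheme from that page, with placeholder host names: each master is started with recovery mode set to ZOOKEEPER via SPARK_DAEMON_JAVA_OPTS, and the driver lists all masters so it can register with whichever one is the current leader.

// On each master, spark-env.sh would contain something along these lines
// (placeholder ZooKeeper quorum):
//
//   export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
//     -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181"

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class HaAwareDriver {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("ha-aware-driver")
                // List every master; the driver registers with the current leader and
                // can fail over to a standby if that leader dies.
                .setMaster("spark://master1:7077,master2:7077");

        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... jobs keep running across a master failover once a standby takes over ...
        sc.stop();
    }
}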

Hope this helps,

Le Quoc Do

java cluster-computing akka apache-spark
