java - Designate a specific machine with Apache Spark -

I'm totally new to Spark, and to Hadoop-type stuff in general, so forgive me if this is a painfully basic question. I'm trying to design a scheme that makes use of a cluster of some number of machines for the first tasks in a series of tasks. The follow-up tasks, which run on the RDDs the first tasks generate, must all be done on the same machine. That can be any machine in the cluster, as long as it stays the same machine for the duration of the program's run.

How can I make sure this happens? Can I reserve a single machine in the cluster and run all the follow-up tasks on that machine? If so, how do I do that in Java? If not, is there another way to accomplish this?

In general, no. Spark, like Hadoop, is designed to distribute tasks more or less arbitrarily over the available nodes, and it assumes the available nodes are all equivalent for its purposes. None of them is treated specially.
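That said, the one thing that does stay on a single, fixed machine for the lifetime of a Spark application is the driver program itself. If the follow-up work is small enough to run on one node, one option is to collect() the first phase's results back to the driver and finish there. A minimal sketch, where the input path and the computations are made up for illustration:

    import java.util.List;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class CollectToDriver {
        public static void main(String[] args) {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("CollectToDriver"));

            // Phase 1 is distributed across the cluster as usual.
            JavaRDD<Integer> lengths =
                sc.textFile("/shared/input")   // made-up input path
                  .map(s -> s.length());       // stand-in transformation

            // collect() ships every element back to the driver JVM; from this
            // point on, the work runs on this one machine only.
            List<Integer> onDriver = lengths.collect();
            long total = 0;
            for (int n : onDriver) total += n;
            System.out.println("total characters: " + total);

            sc.stop();
        }
    }

The caveat is that collect() materializes the entire RDD in the driver's memory, so this only works when the first phase's output is modest in size.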

If you don't want the second half of the process to run in a (more or less) massively parallel fashion, then you don't want to use a parallel processing framework for that half of the job. Maybe you should write the data from the parallel calculations to disk somewhere, and then run the second half of the job on that data, not as Spark RDD transformations but as normal Java or Scala code that reads the files and processes them. It's hard to say.
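For example, here is a rough sketch of that two-phase shape in Java. The paths are made up (and assume storage every node can reach, such as NFS or HDFS), and the transformations are stand-ins for the real work:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    public class TwoPhaseJob {
        public static void main(String[] args) throws IOException {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("TwoPhaseJob"));

            // Phase 1: distributed work across whatever nodes Spark picks.
            JavaRDD<String> results =
                sc.textFile("/shared/input")            // made-up input path
                  .map(line -> line.toUpperCase());     // stand-in transformation

            // Write the distributed results to shared storage, then shut
            // Spark down -- the parallel part of the job is over.
            results.saveAsTextFile("/shared/phase1-output");  // made-up output path
            sc.stop();

            // Phase 2: plain single-JVM Java, pinned to whichever one machine
            // this program runs on, reading the part files Spark wrote.
            List<Path> parts;
            try (Stream<Path> s = Files.list(Paths.get("/shared/phase1-output"))) {
                parts = s.filter(p -> p.getFileName().toString().startsWith("part-"))
                         .collect(Collectors.toList());
            }
            for (Path p : parts) {
                for (String line : Files.readAllLines(p)) {
                    System.out.println(line);           // stand-in for the follow-up work
                }
            }
        }
    }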

why "follow-up tasks" need run in 1 particular place? if can explain more need, maybe can create suggestions you.

java apache-spark rdd
