Hi All,
This is more of a general question: how are tasks synchronized in batch execution? Suppose, for example, that we run an iterative pipeline (map1 -> reduce1 -> reduce2 -> map2) in which the first two operators (map1 -> reduce1) are chained. How is reduce2 notified that map1 -> reduce1 have finished executing, so that it can start reading its input data?
I noticed that the driver classes (MapDriver, ChainedReduceDriver, etc.) have input and output counters (numRecordsOut, numRecordsIn). Are these used to check whether an operator has consumed all of its data?
Thank you in advance.
Best Wishes,
Mary