Hello Flinkers,
I am experimenting a bit with the DataSet API and I have written a simple
program that joins two (key, value) datasets by key. The server I am running
my experiments on has 12 cores with 4 threads each, so I have set the number
of slots for the TaskManager to 12x4=48 to leverage the full parallelism.
I am running the same join with different levels of parallelism: each time
I do the join and count() the result. The running time of the experiment
executed with parallelism 48 is EQUAL (?!?!?) to the running time of the
experiment with parallelism 1 or 10 or 20. How is this possible?
It does not make sense. I expected to see at least some difference. If you
have any ideas, please share!
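
To make the setup concrete, here is a minimal sketch of the kind of job described above. The class name, the tiny in-memory datasets and the hard-coded parallelism are placeholders for illustration, not my actual program or data:

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

// Illustrative sketch only: names and data are hypothetical.
public class JoinParallelismSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // The value changed between runs (1, 10, 20, 48).
        env.setParallelism(48);

        // Two small (key, value) datasets standing in for the real inputs.
        DataSet<Tuple2<Integer, String>> left = env.fromElements(
                Tuple2.of(1, "a"), Tuple2.of(2, "b"), Tuple2.of(3, "c"));
        DataSet<Tuple2<Integer, String>> right = env.fromElements(
                Tuple2.of(1, "x"), Tuple2.of(2, "y"), Tuple2.of(4, "z"));

        // Join on the first tuple field of both sides; count() triggers execution itself.
        long joined = left.join(right).where(0).equalTo(0).count();
        System.out.println("joined records: " + joined);
    }
}
```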
P.S. Also, is there a DummySink for the DataSet API like the one in the
DataStream API? I only care about enumerating the result for now. count()
does not let me call env.execute(), and I would like to get getNetRuntime()
from env after the job finishes.