state and Id of job in flatMap


state and Id of job in flatMap

anissa moussaoui

Hello,

I built an anomaly-detection process using a flatMap. I need to detect the end of each job inside the flatMap so that I can flush a buffer into the output collector.

I saw that it is possible to get the status of a job via the ExecutionEnvironment, but I don't know how to implement that.

Is it possible to detect the end of each job inside the flatMap of any source (for example, via the job's last iteration), or to know in advance the size of the partition that each job must process?

Thanks in advance!

Anissa




      


Re: state and Id of job in flatMap

anissa moussaoui

Hello,
Thank you for your reply. I will use MapPartition for the anomaly detection in my batch job. But I saw that Flink plans to unify stream and batch processing, according to the following link: https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html . In that case, how can I use MapPartition with a stream?

Thanks in advance!

Anissa
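Regarding the streaming question: a DataStream has no end-of-input to wait for, so the usual substitute (an assumption here, not something confirmed in this thread) is to window the stream and flush each window as it closes; in Flink that role is played by a window plus a ProcessWindowFunction. A minimal plain-Java sketch of the per-window buffer-and-flush idea (names and the count-based window are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Per-window buffer-and-flush: the streaming analogue of "flush when the
// partition ends". Each completed window is handed to the callback as one
// batch, the way a closed Flink window is handed to a ProcessWindowFunction.
public class WindowedFlush {

    public static void processInWindows(List<Double> stream, int windowSize,
                                        BiConsumer<Integer, List<Double>> onWindowClose) {
        List<Double> buffer = new ArrayList<>();
        int windowIndex = 0;
        for (Double v : stream) {
            buffer.add(v);
            if (buffer.size() == windowSize) {
                // The window is complete: flush the buffer now, just as the
                // end of a partition would allow in the batch case.
                onWindowClose.accept(windowIndex++, new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        // Note: a trailing partial window is not flushed here; in a real
        // streaming job it stays open until more elements (or a timer) close it.
    }
}
```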



On Wed, Mar 20, 2019 at 6:53 PM, Ken Krugler <[hidden email]> wrote:
Hi Anissa,

I assume you’re running a batch job, since there is no “end of job” with streaming.

If so, then calling .mapPartition(new YourMapPartitionFunction) is one option that I’ve used for cases like this.

See MapPartitionFunction in the JavaDocs.
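As an illustration, here is a minimal sketch of that pattern. It is a sketch only: the class name, types, and toy anomaly rule are illustrative, and it uses plain-Java types so it compiles without Flink on the classpath; in a real job the same body would implement org.apache.flink.api.common.functions.MapPartitionFunction<Double, String>, with Flink's Collector in place of the Consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the MapPartitionFunction pattern: buffer the whole partition,
// then emit once the input is exhausted - the "end of data" signal that a
// flatMap never receives. The anomaly rule below is a toy placeholder.
public class BufferingDetector {

    public void mapPartition(Iterable<Double> values, Consumer<String> out) {
        List<Double> buffer = new ArrayList<>();
        for (Double v : values) {
            buffer.add(v);                    // buffer every record
        }
        // The iterator is exhausted here: the whole partition has been seen,
        // so it is now safe to flush the buffer.
        double mean = buffer.stream()
                            .mapToDouble(Double::doubleValue)
                            .average().orElse(0.0);
        for (Double v : buffer) {
            if (v > mean * 2) {               // toy rule: value > twice the mean
                out.accept("anomaly: " + v);
            }
        }
    }
}
```

In the real batch job this would be wired in with something like dataSet.mapPartition(new YourMapPartitionFunction()).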

— Ken




--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra



      
