Source & job parallelism

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Source & job parallelism

LINZ, Arnaud

Hi,

 

I have a streaming source that extends RichParallelSourceFunction, but for some reason I don’t want parallelism at the source level, so I use :

Env.setSource(mySource).setParrellelism(1).map(mymapper)

 

I do want parallelism at the mapper level, because it’s a long task, and I would like the source to dispatch data to several mappers.

 

It seems that I don’t get parallelism on the mapper, it seems that the setParallelism() does not apply only to the source.

Is that right? If yes, how can I mix my parallelism levels ?

 

Best regards,

Arnaud




L'intégrité de ce message n'étant pas assurée sur internet, la société expéditrice ne peut être tenue responsable de son contenu ni de ses pièces jointes. Toute utilisation ou diffusion non autorisée est interdite. Si vous n'êtes pas destinataire de ce message, merci de le détruire et d'avertir l'expéditeur.

The integrity of this message cannot be guaranteed on the Internet. The company that sent this message cannot therefore be held liable for its content nor attachments. Any unauthorized use or dissemination is prohibited. If you are not the intended recipient of this message, then please delete it and notify the sender.
Reply | Threaded
Open this post in threaded view
|

Re: Source & job parallelism

Matthias J. Sax
Hi Arnaud,

did you try:

> Env.setSource(mySource).setParrellelism(1).map(mymapper).setParallelism(10)

If this does not work, it might be that Flink chains the mapper to the
source which implies to use the same parallelism (and the producer
dictates this dop value).

Using a rebalance() in between should break the chaining:

> Env.setSource(mySource).setParrellelism(1).rebalance().map(mymapper).setParallelism(10)



-Matthias

On 08/25/2015 07:08 PM, LINZ, Arnaud wrote:

> Hi,
>
>  
>
> I have a streaming source that extends RichParallelSourceFunction, but
> for some reason I don’t want parallelism at the source level, so I use :
>
> Env.setSource(mySource).setParrellelism(1).map(mymapper)
>
>  
>
> I do want parallelism at the mapper level, because it’s a long task, and
> I would like the source to dispatch data to several mappers.
>
>  
>
> It seems that I don’t get parallelism on the mapper, it seems that the
> setParallelism() does not apply only to the source.
>
> Is that right? If yes, how can I mix my parallelism levels ?
>
>  
>
> Best regards,
>
> Arnaud
>
>
> ------------------------------------------------------------------------
>
> L'intégrité de ce message n'étant pas assurée sur internet, la société
> expéditrice ne peut être tenue responsable de son contenu ni de ses
> pièces jointes. Toute utilisation ou diffusion non autorisée est
> interdite. Si vous n'êtes pas destinataire de ce message, merci de le
> détruire et d'avertir l'expéditeur.
>
> The integrity of this message cannot be guaranteed on the Internet. The
> company that sent this message cannot therefore be held liable for its
> content nor attachments. Any unauthorized use or dissemination is
> prohibited. If you are not the intended recipient of this message, then
> please delete it and notify the sender.


signature.asc (836 bytes) Download Attachment