problem with kafka on multiple partitions and large size message

problem with kafka on multiple partitions and large size message

Youzha
Hi, I'm using the Kafka consumer and producer in Flink.

I have successfully consumed, transformed, and produced to another topic in my development environment (1 partition and a few dummy messages).

But when I submit the job in the production environment (20 partitions, about 100Gb of data), no messages are produced to the Kafka topic. I tried setting the Flink parallelism to 20, but it still does not work. I then tried the rebalance function: the messages are consumed successfully, but still nothing is produced to the Kafka topic.

Please advise.
Re: problem with kafka on multiple partitions and large size message

Arvid Heise-3
Hi Youzha,

Do you use exactly-once mode in the Kafka producer? Make sure that you have enabled checkpointing and set the interval appropriately.
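As a rough illustration of why the checkpoint interval matters here: in exactly-once mode, Flink's Kafka producer writes inside Kafka transactions that are committed when a checkpoint completes, so a consumer reading with `isolation.level=read_committed` should only see output after a successful checkpoint. The sketch below uses plain `java.util.Properties` and the standard Kafka keys `transaction.timeout.ms` and `isolation.level`. The class name, helper names, the broker address, and the assumed broker limit of 15 minutes are illustrative assumptions and should be checked against the actual cluster.

```java
import java.util.Properties;

final class ExactlyOnceKafkaSettings {

    // Assumption: the broker keeps Kafka's default transaction.max.timeout.ms
    // of 15 minutes. Verify this on the real cluster.
    static final long ASSUMED_BROKER_MAX_TXN_TIMEOUT_MS = 15 * 60 * 1000L;

    // Properties that would be handed to the Kafka sink (hypothetical helper).
    static Properties producerProps(String bootstrapServers, long transactionTimeoutMs) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // A transaction has to stay open for at least one checkpoint interval.
        props.setProperty("transaction.timeout.ms", Long.toString(transactionTimeoutMs));
        return props;
    }

    // Properties for a separate consumer used to inspect the output topic.
    static Properties checkerConsumerProps(String bootstrapServers, boolean committedOnly) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // read_committed hides records of transactions that are not yet committed.
        props.setProperty("isolation.level", committedOnly ? "read_committed" : "read_uncommitted");
        return props;
    }

    // Sanity check: the timeout must exceed the checkpoint interval and stay
    // within the (assumed) broker limit.
    static boolean isPlausible(long checkpointIntervalMs, long transactionTimeoutMs) {
        return checkpointIntervalMs < transactionTimeoutMs
                && transactionTimeoutMs <= ASSUMED_BROKER_MAX_TXN_TIMEOUT_MS;
    }

    public static void main(String[] args) {
        long checkpointIntervalMs = 20_000L;   // example interval
        long transactionTimeoutMs = 600_000L;  // 10 minutes, an example value
        System.out.println(producerProps("broker:9092", transactionTimeoutMs));
        System.out.println(checkerConsumerProps("broker:9092", true));
        System.out.println("plausible: " + isPlausible(checkpointIntervalMs, transactionTimeoutMs));
    }
}
```

If the output topic looks empty only to a `read_committed` consumer, that would point at checkpoints not completing rather than at the sink itself.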

You can also see the message flow in the web UI. Check if something is reaching the sink. Aside from that, if you use window operators make sure that they fire.

In general, the information that you provided is very sparse. I can give more detailed pointers with more information.

(In reply to Youzha's message of Mon, Nov 30, 2020 at 9:25 AM.)

--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   
Re: problem with kafka on multiple partitions and large size message

Youzha
Hi Arvid,

thanks for your reply.

Yes, I'm using exactly-once mode and I've enabled checkpointing:

env.enableCheckpointing(20000, CheckpointingMode.EXACTLY_ONCE);


This is my flow in the web UI. I'm trying to do something simple: read from Kafka and then sink to another topic.

No messages are produced to my new topic.

Any suggestions? 🙏🏻


(In reply to Arvid Heise's message of Mon, 30 Nov 2020 at 17.10.)
Re: problem with kafka on multiple partitions and large size message

Youzha
Hi,

Sorry about the picture attached to my last post. There is still a filter transformation before producing to Kafka. When I checked again, if I use the rebalance function, the job gets stuck at the first operator after consuming the messages.
In my last post, the job was stuck at the filter step: after being consumed from the topic, the messages are sent to the filter transformation but are never sent on to the next operator.
I also tried a simpler job that reads from a Kafka topic and produces to another topic. With the rebalance function, the messages are produced successfully.
So my question is: how can the messages get stuck at the first operator after being consumed from the topic, and why are they never sent to the next operator?


image.png

(In reply to Youzha's message of Mon, Nov 30, 2020 at 6:01 PM.)
Re: problem with kafka on multiple partitions and large size message

Arvid Heise-3
Hi Youzha,

Could you double-check whether your filter function is actually just filtering out all messages? For example, could you replace the filter with (record -> true)? A filter that drops every record would best explain the behavior you describe.
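The check suggested above can be sketched outside Flink with a plain `java.util.function.Predicate`, which has the same shape as a filter lambda: count how many sample records survive the suspect filter and compare with the pass-through `record -> true`. The class name, the sample records, and the "suspect" predicate below are hypothetical and are not taken from the actual job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class FilterPassCount {

    // Counts how many records the given filter lets through.
    static <T> long countPassing(List<T> records, Predicate<T> filter) {
        return records.stream().filter(filter).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample of production-like records.
        List<String> sample = Arrays.asList(
                "{\"type\":\"order\"}",
                "{\"type\":\"click\"}",
                "{\"type\":\"order\"}");

        // Hypothetical filter with a case mismatch against the real data.
        Predicate<String> suspect = record -> record.contains("\"type\":\"ORDER\"");
        // Pass-through filter: keeps every record.
        Predicate<String> passThrough = record -> true;

        System.out.println("suspect filter passes: " + countPassing(sample, suspect));
        System.out.println("pass-through passes:   " + countPassing(sample, passThrough));
    }
}
```

If the pass-through version lets records reach the sink while the real filter does not, the filter condition does not match the production data, for example because of a format or casing difference from the local test data.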

If that doesn't work, we need to go back to the basics: which Flink version are you using? Which data are you using locally to test?

Best,

Arvid

(In reply to Youzha's message of Mon, Nov 30, 2020 at 5:33 PM.)

