(DEPRECATED) Apache Flink User Mailing List archive.

Flink 1.3 -> 1.4 Kafka state migration issue

Classic

List

Threaded

4 messages Options

Gyula Fóra

Flink 1.3 -> 1.4 Kafka state migration issue

Hi,

Is it possible that the Kafka partition assignment logic has changed between Flink 1.3 and 1.4? I am trying to migrate some jobs using Kafka 0.8 sources and about half my jobs lost offset state for some partitions (but not all partitions). Jobs with parallelism 1 dont seem to be affected...

Any ideas?

Gyula

Gyula Fóra

Re: Flink 1.3 -> 1.4 Kafka state migration issue

Migrating the jobs by setting the sources to parallelism = 1 and then scale back up after migration seems to be a good workaround, but I am wondering if something I do made this happen or this is a bug.

Gyula Fóra <[hidden email]> ezt írta (időpont: 2018. jan. 8., H, 14:46):

Hi,

Is it possible that the Kafka partition assignment logic has changed between Flink 1.3 and 1.4? I am trying to migrate some jobs using Kafka 0.8 sources and about half my jobs lost offset state for some partitions (but not all partitions). Jobs with parallelism 1 dont seem to be affected...

Any ideas?

Gyula

Tzu-Li (Gordon) Tai

Re: Flink 1.3 -> 1.4 Kafka state migration issue

Hi Gyula,

Is your 1.3 savepoint from Flink 1.3.1 or 1.3.0?

In those versions, we had a critical bug that caused duplicate partition assignments in corner cases, so the assignment logic was altered from 1.3.1 to 1.3.2 (and therefore also 1.4.0).

If you indeed was using 1.3.1 or 1.3.0, and you are certain that the savepoint does not contain duplicate partition assignments caused by the bug, then yes restoring with DOP 1 and then rescaling again is a good workaround.

Please see the 1.3.2 release announcement [1] for details.

Best,

Gordon

[1] http://flink.apache.org/news/2017/08/05/release-1.3.2.html

On Jan 8, 2018 6:57 AM, "Gyula Fóra" <[hidden email]> wrote:

Migrating the jobs by setting the sources to parallelism = 1 and then scale back up after migration seems to be a good workaround, but I am wondering if something I do made this happen or this is a bug.

Gyula Fóra <[hidden email]> ezt írta (időpont: 2018. jan. 8., H, 14:46):
Hi,

Is it possible that the Kafka partition assignment logic has changed between Flink 1.3 and 1.4? I am trying to migrate some jobs using Kafka 0.8 sources and about half my jobs lost offset state for some partitions (but not all partitions). Jobs with parallelism 1 dont seem to be affected...

Any ideas?

Gyula

Gyula Fóra

Re: Flink 1.3 -> 1.4 Kafka state migration issue

Hi,

Thanks Gordon, should have read the announcement :)

This might indeed be the case here, I will just use the workaround. At least this is a known issue, almost got a heart attack today :D

Cheers,

Gyula

Tzu-Li (Gordon) Tai <[hidden email]> ezt írta (időpont: 2018. jan. 8., H, 17:56):

Hi Gyula,

Is your 1.3 savepoint from Flink 1.3.1 or 1.3.0?
In those versions, we had a critical bug that caused duplicate partition assignments in corner cases, so the assignment logic was altered from 1.3.1 to 1.3.2 (and therefore also 1.4.0).

If you indeed was using 1.3.1 or 1.3.0, and you are certain that the savepoint does not contain duplicate partition assignments caused by the bug, then yes restoring with DOP 1 and then rescaling again is a good workaround.

Please see the 1.3.2 release announcement [1] for details.

Best,
Gordon

[1] http://flink.apache.org/news/2017/08/05/release-1.3.2.html

On Jan 8, 2018 6:57 AM, "Gyula Fóra" <[hidden email]> wrote:
Migrating the jobs by setting the sources to parallelism = 1 and then scale back up after migration seems to be a good workaround, but I am wondering if something I do made this happen or this is a bug.

Gyula Fóra <[hidden email]> ezt írta (időpont: 2018. jan. 8., H, 14:46):
Hi,

Is it possible that the Kafka partition assignment logic has changed between Flink 1.3 and 1.4? I am trying to migrate some jobs using Kafka 0.8 sources and about half my jobs lost offset state for some partitions (but not all partitions). Jobs with parallelism 1 dont seem to be affected...

Any ideas?

Gyula