is Flink supporting pre-loading of a compacted (reference) topic for a join ?

is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Dominique De Vito
Hi,

I am looking for a "stream" join between:

* new data from a Kafka topic
* reference data from a compacted Kafka topic

DataStream.connect() works, but for a given key, the join may fire before the corresponding reference data has been read by Flink. And this is, of course, a problem.

Is there a way to load the compacted topic's content before the join actually starts?

According to some previously archived emails, and also FLIP-17, this feature is not implemented. Still, I am not sure about the status in the latest version, hence my question.

If it is not implemented, is there a way to work around the limitation and achieve the expected result (the join)?

Thanks.

Regards,
Dominique

Re: is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Tzu-Li Tai
Hi Dominique,

FLIP-17 (Side Inputs) is not yet implemented, AFAIK.

One possible way to overcome this right now, if your reference data is static and not continuously changing, is to use the State Processor API to bootstrap a savepoint with the reference data.
Have you looked into whether that would work for you?

Cheers,
Gordon



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Dominique De Vito
Hi Gordon,

Thanks for your reply / help.

Yes, going down the savepoint road would certainly do the job, even if it complicates the picture.

We might go that way in the future, but so far we have taken an easier route, based on eventual consistency:

* if some reference data has not (yet) been loaded when the join should happen, then do not join ;-)

(a) instead of performing the regular join (as expected), just set "not found" for the missing property values and push (through the collector) the produced data to the next tasks,
(b) store the incomplete data in some state, and when the reference data has arrived, perform the regular join and push the updated data to the next tasks (through the collector).
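The (a)/(b) pattern above can be sketched as plain logic. This is a hypothetical minimal simulation in Python, not the actual job: in a real Flink pipeline this logic would live in a KeyedCoProcessFunction with the two dicts replaced by Flink keyed state, and all names here (EventualJoin, on_event, on_reference, the "ref" field) are illustrative.

```python
NOT_FOUND = "not found"

class EventualJoin:
    """Hypothetical sketch of the eventual-consistency join described above.
    Plain dicts stand in for Flink keyed state so the pattern is self-contained."""

    def __init__(self):
        self.reference = {}   # key -> reference value (from the compacted topic)
        self.pending = {}     # key -> records already emitted with a placeholder

    def on_event(self, key, record):
        """Event-topic side: join if the reference is already loaded;
        otherwise emit with "not found" and remember the record (step a)."""
        ref_value = self.reference.get(key)
        if ref_value is not None:
            return [{**record, "ref": ref_value}]        # regular join
        self.pending.setdefault(key, []).append(record)  # step (b): buffer it
        return [{**record, "ref": NOT_FOUND}]            # step (a): placeholder

    def on_reference(self, key, ref_value):
        """Reference-topic side: store the value, then re-emit any record
        previously pushed with a placeholder, now properly joined (step b)."""
        self.reference[key] = ref_value
        return [{**record, "ref": ref_value}
                for record in self.pending.pop(key, [])]
```

The price of this scheme is that downstream tasks must tolerate a placeholder record followed later by an updated one for the same key, which is exactly the eventual-consistency trade-off described above.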

So, it's all about eventual consistency.

Thanks.

Regards,
Dominique





On Tue, Jan 28, 2020 at 03:43, Tzu-Li Tai <[hidden email]> wrote:
Hi Dominique,

FLIP-17 (Side Inputs) is not yet implemented, AFAIK.

One possible way to overcome this right now if your reference data is static
and not continuously changing, is to use the State Processor API to
bootstrap a savepoint with the reference data.
Have you looked into that and see if it would work for you?

Cheers,
Gordon