is Flink supporting pre-loading of a compacted (reference) topic for a join ?

is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Dominique De Vito
Hi,

I am looking for a "stream" join between:

* new data from a Kafka topic
* reference data from a compacted Kafka topic

DataStream.connect() works, but for a given key, the join may fire before the corresponding reference data has been read by Flink. And this is, of course, a problem.

Is there a way to load the compacted topic's content before the join actually starts?

According to some previously archived emails, and also FLIP-17, this feature is not implemented. Still, I am not sure about the status in the latest version, hence my question.

If it is not implemented, is there a way to work around the limitation and achieve the expected result (the join)?

Thanks.

Regards,
Dominique

Re: is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Tzu-Li Tai
Hi Dominique,

FLIP-17 (Side Inputs) is not yet implemented, AFAIK.

One possible way to overcome this right now, if your reference data is static and not continuously changing, is to use the State Processor API to bootstrap a savepoint with the reference data.
Have you looked into whether that would work for you?

Cheers,
Gordon



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: is Flink supporting pre-loading of a compacted (reference) topic for a join ?

Dominique De Vito
Hi Gordon,

Thanks for your reply / help.

Yes, going down the savepoint road would certainly do the job, even if it complicates the picture.

We might go that way in the future, but so far we have taken an easier route, based on eventual consistency:

* if some reference data has not (yet) been loaded when the join should happen, then do not join ;-)

(a) instead of performing the regular join (as expected), just set "not found" for the missing property values and push (through the collector) the produced data to the next tasks,
(b) store the incomplete data in some state, and when the reference data has arrived, perform the regular join and push the updated data to the next tasks (through the collector).
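The (a)/(b) pattern above can be sketched as plain logic. This is a hypothetical minimal simulation in Python, not the actual job: in a real Flink pipeline this logic would live in a KeyedCoProcessFunction with the two dicts replaced by Flink keyed state, and all names here (EventualJoin, on_event, on_reference, the "ref" field) are illustrative.

```python
NOT_FOUND = "not found"

class EventualJoin:
    """Hypothetical sketch of the eventual-consistency join described above.
    Plain dicts stand in for Flink keyed state so the pattern is self-contained."""

    def __init__(self):
        self.reference = {}   # key -> reference value (from the compacted topic)
        self.pending = {}     # key -> records already emitted with a placeholder

    def on_event(self, key, record):
        """Event-topic side: join if the reference is already loaded;
        otherwise emit with "not found" and remember the record (step a)."""
        ref_value = self.reference.get(key)
        if ref_value is not None:
            return [{**record, "ref": ref_value}]        # regular join
        self.pending.setdefault(key, []).append(record)  # step (b): buffer it
        return [{**record, "ref": NOT_FOUND}]            # step (a): placeholder

    def on_reference(self, key, ref_value):
        """Reference-topic side: store the value, then re-emit any record
        previously pushed with a placeholder, now properly joined (step b)."""
        self.reference[key] = ref_value
        return [{**record, "ref": ref_value}
                for record in self.pending.pop(key, [])]
```

The price of this scheme is that downstream tasks must tolerate a placeholder record followed later by an updated one for the same key, which is exactly the eventual-consistency trade-off described above.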

So, it's all about eventual consistency.

Thanks.

Regards,
Dominique





On Tue, Jan 28, 2020 at 03:43, Tzu-Li Tai <[hidden email]> wrote:
Hi Dominique,

FLIP-17 (Side Inputs) is not yet implemented, AFAIK.

One possible way to overcome this right now if your reference data is static
and not continuously changing, is to use the State Processor API to
bootstrap a savepoint with the reference data.
Have you looked into that and see if it would work for you?

Cheers,
Gordon