Hi, can anyone give me a clue about how to read MongoDB data as a batch or streaming data source in Flink? I can't find a MongoDB connector in the recent release versions. Many thanks.
|
I'm not aware of an official source/sink. If you want, you could try to use the Mongo HadoopInputFormat as in [1]. The linked example uses a pretty old version of Flink, but it should not be a big problem to update the Maven dependencies and the code to a newer version. Best,
On Mon, Apr 29, 2019 at 6:15 AM Hai <[hidden email]> wrote:
|
For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's written in Scala and uses Json4s [2] to parse the data into a case class. It uses a Mongo observer to iterate over a collection and emit each document into a Flink context. Cheers,
On Mon, Apr 29, 2019 at 13:57, Flavio Pompermaier <[hidden email]> wrote:
|
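As a rough illustration of the source described above (parse each document into a typed record, then emit it through a collector), here is a minimal sketch in plain Python. It is not the linked Scala implementation: the `User` record, the `run_source` function, and the list of JSON strings standing in for a Mongo collection are all hypothetical, and no Flink or MongoDB API is used.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class User:
    # Hypothetical record type, standing in for the Scala case class.
    name: str
    age: int


def run_source(raw_documents: Iterable[str], collect: Callable[[User], None]) -> int:
    """Parse each JSON document and pass it to `collect`; return the number emitted."""
    emitted = 0
    for raw in raw_documents:
        fields = json.loads(raw)
        collect(User(name=fields["name"], age=fields["age"]))
        emitted += 1
    return emitted


# Stand-in for a Mongo collection: a list of JSON strings (assumption).
documents = ['{"name": "ada", "age": 36}', '{"name": "alan", "age": 41}']
collected: List[User] = []
count = run_source(documents, collected.append)
```

Note that the loop is strictly sequential: one reader walks the whole collection, so raising the operator's parallelism would not speed it up.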
In reply to this post by hai
Hi Flavio, that's good, thank you. I will try it later. Regards
Original Message
Sender: Flavio Pompermaier <[hidden email]>
Recipient: Hai <[hidden email]>
Cc: user <[hidden email]>
Date: Monday, Apr 29, 2019 19:56
Subject: Re: Read mongo datasource in Flink
|
In reply to this post by hai
Thanks for sharing, that's great!
Original Message
Sender: Wouter Zorgdrager <[hidden email]>
Recipient: Hai <[hidden email]>
Cc: user <[hidden email]>
Date: Monday, Apr 29, 2019 20:05
Subject: Re: Read mongo datasource in Flink
|
In reply to this post by Wouter Zorgdrager-2
But what about parallelism with this implementation? From what I see, there's only a single thread querying Mongo and fetching all the data. Am I wrong?
On Mon, Apr 29, 2019 at 2:05 PM Wouter Zorgdrager <[hidden email]> wrote:
|
Yes, that is correct. This is a really basic implementation that doesn't take parallelism into account. I think you need something like this [1] to get that working.
On Mon, Apr 29, 2019 at 14:37, Flavio Pompermaier <[hidden email]> wrote:
|
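On the parallelism point raised above: one common way to read a collection in parallel is to cut the key space into disjoint ranges ("splits") and let each parallel subtask query only its own ranges. The sketch below shows that idea in plain Python. It is not necessarily what the linked project does, it assumes numeric `_id` values with known bounds, and the function names are hypothetical. Only the `$gte`/`$lt` filter shape corresponds to real MongoDB query operators.

```python
from typing import Dict, List, Tuple

Split = Tuple[int, int]  # half-open key range [lower, upper)


def make_splits(min_key: int, max_key: int, num_splits: int) -> List[Split]:
    """Cut [min_key, max_key) into contiguous, non-overlapping splits."""
    if num_splits < 1 or max_key <= min_key:
        raise ValueError("need num_splits >= 1 and max_key > min_key")
    total = max_key - min_key
    num_splits = min(num_splits, total)  # never create empty splits
    bounds = [min_key + (total * i) // num_splits for i in range(num_splits + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(num_splits)]


def splits_for_subtask(splits: List[Split], subtask_index: int, parallelism: int) -> List[Split]:
    """Round-robin assignment: subtask i takes splits i, i + p, i + 2p, ..."""
    return splits[subtask_index::parallelism]


def range_filter(split: Split) -> Dict[str, Dict[str, int]]:
    """Query filter a subtask would send for one split (assumes numeric _id)."""
    lower, upper = split
    return {"_id": {"$gte": lower, "$lt": upper}}


splits = make_splits(0, 100, 4)
subtask_1 = splits_for_subtask(splits, subtask_index=1, parallelism=2)
first_filter = range_filter(splits[0])
```

Because the ranges are half-open and share boundaries, every key falls into exactly one split, so no document is read twice or skipped regardless of how many subtasks run.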
Just a thought: a robust, high-performance way to potentially achieve your goals is:
Debezium -> Kafka -> Flink
It offers robust handling of various topologies, reasonably good scaling properties, good restartability, and so on. Thanks,
Kenny Gorman, Co-Founder and CEO
|
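To make the change-data-capture suggestion above more concrete, here is a minimal sketch, in plain Python, of what the consuming side has to do with a stream of change events: fold them into the latest state per document. The event shape is a simplified assumption loosely modelled on Debezium's envelope (`op` codes `c`, `r`, `u`, `d`, and an `after` field holding the document as a JSON string). The flat `key` field and the assumption that every update carries the full document are simplifications; real events depend on the connector version and configuration.

```python
import json
from typing import Dict, Iterable


def apply_change_events(events: Iterable[dict]) -> Dict[str, dict]:
    """Fold simplified change events into a key -> latest-document view."""
    state: Dict[str, dict] = {}
    for event in events:
        key = event["key"]
        op = event["op"]
        if op in ("c", "r", "u"):
            # create, snapshot read, update: keep the newest full document
            state[key] = json.loads(event["after"])
        elif op == "d":
            # delete: drop the document if we have it
            state.pop(key, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return state


# Hypothetical events in the order they would arrive on one Kafka partition.
events = [
    {"key": "1", "op": "r", "after": '{"name": "ada", "age": 36}'},
    {"key": "2", "op": "c", "after": '{"name": "alan", "age": 41}'},
    {"key": "1", "op": "u", "after": '{"name": "ada", "age": 37}'},
    {"key": "2", "op": "d", "after": None},
]
latest = apply_change_events(events)
```

The fold is only correct if events for the same key arrive in order, which is why such pipelines usually key the Kafka topic by document id.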