Read mongo datasource in Flink

8 messages

hai

Hi,


Can anyone give me a clue about how to read MongoDB data as a batch/streaming data source in Flink? I can't find a MongoDB connector in the recent release versions.


Many thanks


Re: Read mongo datasource in Flink

Flavio Pompermaier
I'm not aware of an official source/sink. If you want, you could try to exploit the Mongo HadoopInputFormat as in [1]. The provided link uses a pretty old version of Flink, but it should not be a big problem to update the Maven dependencies and the code to a newer version.

Best,
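
For what it's worth, a rough sketch of that approach (a sketch only, assuming the mongo-hadoop connector plus Flink's flink-hadoop-compatibility bridge; the URI is a placeholder and the BSONWritable key/value classes may differ by connector version):

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
import org.apache.hadoop.mapred.JobConf
import com.mongodb.hadoop.mapred.MongoInputFormat
import com.mongodb.hadoop.io.BSONWritable

object MongoBatchReadSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Point the mongo-hadoop connector at the collection to read (placeholder URI).
    val conf = new JobConf()
    conf.set("mongo.input.uri", "mongodb://localhost:27017/mydb.mycollection")

    // Wrap the Hadoop MongoInputFormat in Flink's Hadoop-compatibility input format.
    val mongoIf = new HadoopInputFormat[BSONWritable, BSONWritable](
      new MongoInputFormat(), classOf[BSONWritable], classOf[BSONWritable], conf)

    // Each record is a (key, document) pair of BSONWritable values.
    val docs: DataSet[(BSONWritable, BSONWritable)] = env.createInput(mongoIf)

    docs.map(_._2.getDoc.toString).first(10).print()
  }
}

This runs as a bounded batch job; the same wrapped input format should also work with a streaming environment if you need the data as a bounded stream.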

Re: Read mongo datasource in Flink

Wouter Zorgdrager-2
For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's written in Scala and uses Json4s [2] to parse the data into a case class. It uses a Mongo observer to iterate over a collection and emit each document into the Flink source context.

Cheers,
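
Roughly, such a source could look like the sketch below. This is not the framework's actual code: it assumes the official MongoDB Scala driver, emits plain JSON strings instead of a case class, and simply blocks on the driver's Observable via toFuture() rather than subscribing an observer.

import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.mongodb.scala._

import scala.concurrent.Await
import scala.concurrent.duration._

// Basic, non-parallel source: reads one collection and emits every document as a JSON string.
class MongoCollectionSource(uri: String, db: String, collection: String)
    extends SourceFunction[String] {

  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    val client = MongoClient(uri)
    val coll = client.getDatabase(db).getCollection(collection)

    // Simplification: wait until the whole collection is fetched, then emit the documents.
    val docs = Await.result(coll.find().toFuture(), 10.minutes)
    docs.iterator.takeWhile(_ => running).foreach(doc => ctx.collect(doc.toJson()))

    client.close()
  }

  override def cancel(): Unit = running = false
}

It would be used as env.addSource(new MongoCollectionSource("mongodb://localhost:27017", "mydb", "mycollection")) on a StreamExecutionEnvironment.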

Re: Read mongo datasource in Flink

hai
In reply to this post by Flavio Pompermaier

Hi, Flavio:


That's good, thank you. I will try it later.


Regards


Re: Read mongo datasource in Flink

hai
In reply to this post by Wouter Zorgdrager-2

Thanks for your sharing. That's great!



Re: Read mongo datasource in Flink

Flavio Pompermaier
In reply to this post by Wouter Zorgdrager-2
But what about parallelism with this implementation? From what I see, there's only a single thread querying Mongo and fetching all the data. Am I wrong?

Re: Read mongo datasource in Flink

Wouter Zorgdrager-2
Yes, that is correct. This is a really basic implementation that doesn't take parallelism into account. I think you need something like this [1] to get that working.
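
In the meantime, one naive way to get some parallelism (purely illustrative, not the approach in [1]) is a RichParallelSourceFunction in which each subtask reads its own skip/limit slice of the collection. Same assumptions as before (official MongoDB Scala driver, JSON string output):

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
import org.mongodb.scala._

import scala.concurrent.Await
import scala.concurrent.duration._

// Each parallel subtask reads a disjoint skip/limit slice of the collection.
class ParallelMongoSource(uri: String, db: String, collection: String)
    extends RichParallelSourceFunction[String] {

  @volatile private var running = true
  @transient private var client: MongoClient = _

  override def open(parameters: Configuration): Unit = {
    client = MongoClient(uri)
  }

  override def run(ctx: SourceContext[String]): Unit = {
    val coll = client.getDatabase(db).getCollection(collection)
    val subtask = getRuntimeContext.getIndexOfThisSubtask
    val parallelism = getRuntimeContext.getNumberOfParallelSubtasks

    // Split the collection into roughly equal ranges, one per subtask.
    val total = Await.result(coll.countDocuments().head(), 10.minutes)
    val sliceSize = math.ceil(total.toDouble / parallelism).toInt

    val docs = Await.result(
      coll.find().skip(subtask * sliceSize).limit(sliceSize).toFuture(), 10.minutes)
    docs.iterator.takeWhile(_ => running).foreach(doc => ctx.collect(doc.toJson()))
  }

  override def cancel(): Unit = running = false

  override def close(): Unit = if (client != null) client.close()
}

skip/limit is not great for large collections; a real implementation would rather split on an indexed key range (or reuse the split logic of the Hadoop input format mentioned earlier).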


Re: Read mongo datasource in Flink

Kenny Gorman
Just a thought: a robust and high-performance way to potentially achieve your goals is:

Debezium->Kafka->Flink


Good robust handling of various topologies, reasonably good scaling properties, good restartability, and such.

Thanks
Kenny Gorman
Co-Founder and CEO
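
On the Flink side, that pipeline ends in a regular Kafka source. A minimal sketch, assuming Debezium's MongoDB connector is already publishing change events to Kafka (topic name, bootstrap servers, and group id below are placeholders):

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object MongoCdcViaKafka {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "mongo-cdc")

    // Debezium typically writes one topic per collection, named "<server-name>.<db>.<collection>".
    val consumer = new FlinkKafkaConsumer[String](
      "dbserver1.mydb.mycollection", new SimpleStringSchema(), props)

    // Parse the Debezium JSON envelope here (e.g. with Json4s) before further processing.
    env.addSource(consumer).print()

    env.execute("MongoDB CDC via Debezium and Kafka")
  }
}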


