Read mongo datasource in Flink

8 messages

hai

Hi,


Can anyone give me a clue about how to read MongoDB data as a batch/streaming data source in Flink? I can't find a MongoDB connector in the recent release versions.


Many thanks


Re: Read mongo datasource in Flink

Flavio Pompermaier
I'm not aware of an official source/sink. If you want, you could try to exploit the Mongo HadoopInputFormat as in [1]. The provided link uses a pretty old version of Flink, but it should not be a big problem to update the Maven dependencies and the code to a newer version.

Best,
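
For what it's worth, a rough sketch of that approach (a sketch only, assuming the mongo-hadoop connector plus Flink's flink-hadoop-compatibility bridge; the URI is a placeholder and the BSONWritable key/value classes may differ by connector version):

import org.apache.flink.api.scala._
import org.apache.flink.api.scala.hadoop.mapred.HadoopInputFormat
import org.apache.hadoop.mapred.JobConf
import com.mongodb.hadoop.mapred.MongoInputFormat
import com.mongodb.hadoop.io.BSONWritable

object MongoBatchReadSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Point the mongo-hadoop connector at the collection to read (placeholder URI).
    val conf = new JobConf()
    conf.set("mongo.input.uri", "mongodb://localhost:27017/mydb.mycollection")

    // Wrap the Hadoop MongoInputFormat in Flink's Hadoop-compatibility input format.
    val mongoIf = new HadoopInputFormat[BSONWritable, BSONWritable](
      new MongoInputFormat(), classOf[BSONWritable], classOf[BSONWritable], conf)

    // Each record is a (key, document) pair of BSONWritable values.
    val docs: DataSet[(BSONWritable, BSONWritable)] = env.createInput(mongoIf)

    docs.map(_._2.getDoc.toString).first(10).print()
  }
}

This runs as a bounded batch job; the same wrapped input format should also work with a streaming environment if you need the data as a bounded stream.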

Re: Read mongo datasource in Flink

Wouter Zorgdrager-2
For a framework I'm working on, we actually implemented a (basic) Mongo source [1]. It's written in Scala and uses Json4s [2] to parse the data into a case class. It uses a Mongo observer to iterate over a collection and emit each document into the Flink source context.

Cheers,
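
Roughly, such a source could look like the sketch below. This is not the framework's actual code: it assumes the official MongoDB Scala driver, emits plain JSON strings instead of a case class, and simply blocks on the driver's Observable via toFuture() rather than subscribing an observer.

import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.mongodb.scala._

import scala.concurrent.Await
import scala.concurrent.duration._

// Basic, non-parallel source: reads one collection and emits every document as a JSON string.
class MongoCollectionSource(uri: String, db: String, collection: String)
    extends SourceFunction[String] {

  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[String]): Unit = {
    val client = MongoClient(uri)
    val coll = client.getDatabase(db).getCollection(collection)

    // Simplification: wait until the whole collection is fetched, then emit the documents.
    val docs = Await.result(coll.find().toFuture(), 10.minutes)
    docs.iterator.takeWhile(_ => running).foreach(doc => ctx.collect(doc.toJson()))

    client.close()
  }

  override def cancel(): Unit = running = false
}

It would be used as env.addSource(new MongoCollectionSource("mongodb://localhost:27017", "mydb", "mycollection")) on a StreamExecutionEnvironment.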

Re: Read mongo datasource in Flink

hai
In reply to this post by Flavio Pompermaier

Hi, Flavio:


That's good, thank you. I will try it later.


Regards


Re: Read mongo datasource in Flink

hai
In reply to this post by Wouter Zorgdrager-2

Thanks for your sharing. That's great!



Re: Read mongo datasource in Flink

Flavio Pompermaier
In reply to this post by Wouter Zorgdrager-2
But what about parallelism with this implementation? From what I see, there's only a single thread querying Mongo and fetching all the data. Am I wrong?

Re: Read mongo datasource in Flink

Wouter Zorgdrager-2
Yes, that is correct. This is a really basic implementation that doesn't take parallelism into account. I think you need something like this [1] to get that working.
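
In the meantime, one naive way to get some parallelism (purely illustrative, not the approach in [1]) is a RichParallelSourceFunction in which each subtask reads its own skip/limit slice of the collection. Same assumptions as before (official MongoDB Scala driver, JSON string output):

import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext
import org.mongodb.scala._

import scala.concurrent.Await
import scala.concurrent.duration._

// Each parallel subtask reads a disjoint skip/limit slice of the collection.
class ParallelMongoSource(uri: String, db: String, collection: String)
    extends RichParallelSourceFunction[String] {

  @volatile private var running = true
  @transient private var client: MongoClient = _

  override def open(parameters: Configuration): Unit = {
    client = MongoClient(uri)
  }

  override def run(ctx: SourceContext[String]): Unit = {
    val coll = client.getDatabase(db).getCollection(collection)
    val subtask = getRuntimeContext.getIndexOfThisSubtask
    val parallelism = getRuntimeContext.getNumberOfParallelSubtasks

    // Split the collection into roughly equal ranges, one per subtask.
    val total = Await.result(coll.countDocuments().head(), 10.minutes)
    val sliceSize = math.ceil(total.toDouble / parallelism).toInt

    val docs = Await.result(
      coll.find().skip(subtask * sliceSize).limit(sliceSize).toFuture(), 10.minutes)
    docs.iterator.takeWhile(_ => running).foreach(doc => ctx.collect(doc.toJson()))
  }

  override def cancel(): Unit = running = false

  override def close(): Unit = if (client != null) client.close()
}

skip/limit is not great for large collections; a real implementation would rather split on an indexed key range (or reuse the split logic of the Hadoop input format mentioned earlier).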


Re: Read mongo datasource in Flink

Kenny Gorman
Just a thought: a robust and high-performance way to potentially achieve your goals is:

Debezium->Kafka->Flink


Good robust handling of various topologies, reasonably good scaling properties, good restartability, and such.

Thanks
Kenny Gorman
Co-Founder and CEO
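
On the Flink side, that pipeline ends in a regular Kafka source. A minimal sketch, assuming Debezium's MongoDB connector is already publishing change events to Kafka (topic name, bootstrap servers, and group id below are placeholders):

import java.util.Properties

import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

object MongoCdcViaKafka {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("group.id", "mongo-cdc")

    // Debezium typically writes one topic per collection, named "<server-name>.<db>.<collection>".
    val consumer = new FlinkKafkaConsumer[String](
      "dbserver1.mydb.mycollection", new SimpleStringSchema(), props)

    // Parse the Debezium JSON envelope here (e.g. with Json4s) before further processing.
    env.addSource(consumer).print()

    env.execute("MongoDB CDC via Debezium and Kafka")
  }
}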


