Batch reading from Cassandra


Batch reading from Cassandra

Lasse Nedergaard-2
Hi.

We would like to do some batch analytics on our data set stored in Cassandra and are looking for an efficient way to load data from a single table: not by key, but a random 15%, 50%, or 100% of the rows.
Databricks has created an efficient way to load Cassandra data into Apache Spark; they do it by reading from the underlying SSTables in parallel.
Do we have something similar in Flink? If not, what is the most efficient way to load all of the data, or a random subset of it, from a single Cassandra table into Flink?
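
To make the idea concrete, here is a rough sketch of the kind of thing we are after, using the DataStax Java driver (the contact point, keyspace, table, and partition-key names are made up): split the ring into its natural token ranges, keep a random fraction of them, and scan each kept range with a token()-bounded query.

import com.datastax.driver.core.*;
import java.util.*;

public class TokenRangeSample {
    public static void main(String[] args) {
        double fraction = 0.15; // sample roughly 15% of the table
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try (Session session = cluster.connect()) {
            // Collect the ring's token ranges, splitting any range that wraps around.
            List<TokenRange> ranges = new ArrayList<>();
            for (TokenRange r : cluster.getMetadata().getTokenRanges()) {
                ranges.addAll(r.unwrap());
            }
            // Keep a random fraction of the ranges.
            Collections.shuffle(ranges);
            int keep = (int) Math.ceil(ranges.size() * fraction);
            for (TokenRange r : ranges.subList(0, keep)) {
                // token(pk) bounds restrict the scan to one slice of the ring;
                // "pk" stands in for the table's partition key column.
                ResultSet rs = session.execute(
                        "SELECT * FROM ks.my_table WHERE token(pk) > ? AND token(pk) <= ?",
                        r.getStart().getValue(), r.getEnd().getValue());
                for (Row row : rs) {
                    // hand each row to the analytics job here
                }
            }
        } finally {
            cluster.close();
        }
    }
}

With a hash partitioner such as Murmur3, rows should spread roughly evenly across the ring, so sampling token ranges is a reasonable approximation of sampling rows.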

Any suggestions and/or recommendations are highly appreciated.

Thanks in advance

Lasse Nedergaard

Re: Batch reading from Cassandra

Piotr Nowojski-3
Hi,

I’m afraid we don’t have any native support for reading from Cassandra at the moment. The only things I could find are streaming sinks [1][2].
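
In the meantime, a possible workaround (only a sketch, not a supported connector; the contact point, keyspace, table, and partition-key names are placeholders) would be to wrap the DataStax Java driver in a custom InputFormat, with one Flink input split per Cassandra token range, so the DataSet API can scan the table in parallel:

import com.datastax.driver.core.*;
import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.GenericInputSplit;
import org.apache.flink.core.io.InputSplitAssigner;
import java.util.*;

public class CassandraTokenRangeInputFormat extends RichInputFormat<Row, GenericInputSplit> {

    private final String host;
    private transient Cluster cluster;
    private transient Iterator<Row> rows;

    public CassandraTokenRangeInputFormat(String host) { this.host = host; }

    @Override public void configure(Configuration parameters) {}
    @Override public BaseStatistics getStatistics(BaseStatistics cached) { return cached; }

    @Override
    public GenericInputSplit[] createInputSplits(int minNumSplits) {
        // One split per (unwrapped) token range. GenericInputSplit only carries
        // an index, so open() re-derives the same deterministically ordered list.
        Cluster c = Cluster.builder().addContactPoint(host).build();
        try {
            int n = unwrappedRanges(c).size();
            GenericInputSplit[] splits = new GenericInputSplit[n];
            for (int i = 0; i < n; i++) splits[i] = new GenericInputSplit(i, n);
            return splits;
        } finally { c.close(); }
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(GenericInputSplit[] splits) {
        return new DefaultInputSplitAssigner(splits);
    }

    @Override
    public void open(GenericInputSplit split) {
        cluster = Cluster.builder().addContactPoint(host).build();
        Session session = cluster.connect();
        TokenRange range = unwrappedRanges(cluster).get(split.getSplitNumber());
        // "pk" stands in for the table's partition key column.
        rows = session.execute(
                "SELECT * FROM ks.my_table WHERE token(pk) > ? AND token(pk) <= ?",
                range.getStart().getValue(), range.getEnd().getValue()).iterator();
    }

    private static List<TokenRange> unwrappedRanges(Cluster c) {
        List<TokenRange> ranges = new ArrayList<>();
        for (TokenRange r : c.getMetadata().getTokenRanges()) {
            ranges.addAll(r.unwrap()); // split ranges that wrap around the ring
        }
        ranges.sort(Comparator.comparing(TokenRange::getStart)); // deterministic order
        return ranges;
    }

    @Override public boolean reachedEnd() { return !rows.hasNext(); }
    @Override public Row nextRecord(Row reuse) { return rows.next(); }
    @Override public void close() { if (cluster != null) cluster.close(); }
}

You could then read it via ExecutionEnvironment#createInput. In practice you would convert the driver's Row into a POJO or Tuple inside nextRecord() so that Flink can serialize it downstream, and you could keep only a random fraction of the ranges in createInputSplits() to cover the 15%/50% case.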

Piotrek

