Batch reading from Cassandra


Batch reading from Cassandra

Lasse Nedergaard-2
Hi.

We would like to do some batch analytics on our data set stored in Cassandra and are looking for an efficient way to load data from a single table: not by key, but a random 15%, 50%, or 100% of the rows.
Databricks has created an efficient way to load Cassandra data into Apache Spark; they do it by reading from the underlying SSTables in parallel.
Do we have something similar in Flink? If not, what is the most efficient way to load all of the data, or a random subset of it, from a single Cassandra table into Flink?
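
To make the idea concrete, here is a rough sketch of the kind of thing we are after, using the DataStax Java driver (the contact point, keyspace, table, and partition-key names are made up): split the ring into its natural token ranges, keep a random fraction of them, and scan each kept range with a token()-bounded query.

import com.datastax.driver.core.*;
import java.util.*;

public class TokenRangeSample {
    public static void main(String[] args) {
        double fraction = 0.15; // sample roughly 15% of the table
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try (Session session = cluster.connect()) {
            // Collect the ring's token ranges, splitting any range that wraps around.
            List<TokenRange> ranges = new ArrayList<>();
            for (TokenRange r : cluster.getMetadata().getTokenRanges()) {
                ranges.addAll(r.unwrap());
            }
            // Keep a random fraction of the ranges.
            Collections.shuffle(ranges);
            int keep = (int) Math.ceil(ranges.size() * fraction);
            for (TokenRange r : ranges.subList(0, keep)) {
                // token(pk) bounds restrict the scan to one slice of the ring;
                // "pk" stands in for the table's partition key column.
                ResultSet rs = session.execute(
                        "SELECT * FROM ks.my_table WHERE token(pk) > ? AND token(pk) <= ?",
                        r.getStart().getValue(), r.getEnd().getValue());
                for (Row row : rs) {
                    // hand each row to the analytics job here
                }
            }
        } finally {
            cluster.close();
        }
    }
}

With a hash partitioner such as Murmur3, rows should spread roughly evenly across the ring, so sampling token ranges is a reasonable approximation of sampling rows.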

Any suggestions and/or recommendations are highly appreciated.

Thanks in advance

Lasse Nedergaard

Re: Batch reading from Cassandra

Piotr Nowojski-3
Hi,

I’m afraid we don’t have any native support for reading from Cassandra at the moment. The only things I could find are streaming sinks [1][2].
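
In the meantime, a possible workaround (only a sketch, not a supported connector; the contact point, keyspace, table, and partition-key names are placeholders) would be to wrap the DataStax Java driver in a custom InputFormat, with one Flink input split per Cassandra token range, so the DataSet API can scan the table in parallel:

import com.datastax.driver.core.*;
import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.GenericInputSplit;
import org.apache.flink.core.io.InputSplitAssigner;
import java.util.*;

public class CassandraTokenRangeInputFormat extends RichInputFormat<Row, GenericInputSplit> {

    private final String host;
    private transient Cluster cluster;
    private transient Iterator<Row> rows;

    public CassandraTokenRangeInputFormat(String host) { this.host = host; }

    @Override public void configure(Configuration parameters) {}
    @Override public BaseStatistics getStatistics(BaseStatistics cached) { return cached; }

    @Override
    public GenericInputSplit[] createInputSplits(int minNumSplits) {
        // One split per (unwrapped) token range. GenericInputSplit only carries
        // an index, so open() re-derives the same deterministically ordered list.
        Cluster c = Cluster.builder().addContactPoint(host).build();
        try {
            int n = unwrappedRanges(c).size();
            GenericInputSplit[] splits = new GenericInputSplit[n];
            for (int i = 0; i < n; i++) splits[i] = new GenericInputSplit(i, n);
            return splits;
        } finally { c.close(); }
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(GenericInputSplit[] splits) {
        return new DefaultInputSplitAssigner(splits);
    }

    @Override
    public void open(GenericInputSplit split) {
        cluster = Cluster.builder().addContactPoint(host).build();
        Session session = cluster.connect();
        TokenRange range = unwrappedRanges(cluster).get(split.getSplitNumber());
        // "pk" stands in for the table's partition key column.
        rows = session.execute(
                "SELECT * FROM ks.my_table WHERE token(pk) > ? AND token(pk) <= ?",
                range.getStart().getValue(), range.getEnd().getValue()).iterator();
    }

    private static List<TokenRange> unwrappedRanges(Cluster c) {
        List<TokenRange> ranges = new ArrayList<>();
        for (TokenRange r : c.getMetadata().getTokenRanges()) {
            ranges.addAll(r.unwrap()); // split ranges that wrap around the ring
        }
        ranges.sort(Comparator.comparing(TokenRange::getStart)); // deterministic order
        return ranges;
    }

    @Override public boolean reachedEnd() { return !rows.hasNext(); }
    @Override public Row nextRecord(Row reuse) { return rows.next(); }
    @Override public void close() { if (cluster != null) cluster.close(); }
}

You could then read it via ExecutionEnvironment#createInput. In practice you would convert the driver's Row into a POJO or Tuple inside nextRecord() so that Flink can serialize it downstream, and you could keep only a random fraction of the ranges in createInputSplits() to cover the 15%/50% case.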

Piotrek

