Batch reading from Cassandra. How to?

Batch reading from Cassandra. How to?

Lasse Nedergaard
Hi.

We would like to do some batch analytics on our data set stored in Cassandra and are looking for an efficient way to load data from a single table. Not by key, but a random 15%, 50%, or 100% of the rows.
Databricks has created an efficient way to load Cassandra data into Apache Spark; they do it by reading from the underlying SSTables so the load can run in parallel.
Do we have something similar in Flink, or what is the most efficient way to load all, or a large random subset of, the data from a single Cassandra table into Flink?
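
To illustrate what we mean by parallel loading, here is a minimal sketch of the token-range idea, assuming the Murmur3 partitioner, a hypothetical table my_ks.my_table with partition key id, and the DataStax Java driver. The split count and sampling fraction are made-up parameters, not anything Flink or Spark provides out of the box:

import java.math.BigInteger;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class TokenRangeScanSketch {

    public static void main(String[] args) {
        int splits = 8;                 // how many parallel slices to read (made up)
        double sampleFraction = 0.15;   // keep roughly 15% of the rows (made up)

        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            // The Murmur3 token space is [Long.MIN_VALUE, Long.MAX_VALUE].
            // Split it into equal slices; each slice is an independent range scan
            // and could be handed to a separate task for parallel reading.
            BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
            BigInteger max = BigInteger.valueOf(Long.MAX_VALUE);
            BigInteger width = max.subtract(min).divide(BigInteger.valueOf(splits));

            for (int i = 0; i < splits; i++) {
                long start = min.add(width.multiply(BigInteger.valueOf(i))).longValue();
                long end = (i == splits - 1)
                        ? Long.MAX_VALUE
                        : min.add(width.multiply(BigInteger.valueOf(i + 1))).longValue();

                // Boundaries are approximate in this sketch (the single minimum
                // token is skipped by the first slice).
                ResultSet rs = session.execute(
                        "SELECT id, payload FROM my_ks.my_table "
                                + "WHERE token(id) > ? AND token(id) <= ?",
                        start, end);

                for (Row row : rs) {
                    // Client-side random sampling; good enough for a rough 15%/50% cut.
                    if (Math.random() <= sampleFraction) {
                        System.out.println(row.getString("id"));
                    }
                }
            }
        }
    }
}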

Any suggestions and/or recommendations are highly appreciated.

Thanks in advance

Lasse Nedergaard

Re: Batch reading from Cassandra. How to?

Lasse Nedergaard
Any good suggestions?

Lasse


Re: Batch reading from Cassandra. How to?

Till Rohrmann
Hi Lasse,

as far as I know, the best way to read from Cassandra is to use the CassandraInputFormat [1]. Unfortunately, at the moment there is no optimized way to read large amounts of data comparable to what Spark offers. But if you want to contribute this feature to Flink, the community would highly appreciate it.
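
In case it helps, a minimal sketch of wiring the CassandraInputFormat into a DataSet job, assuming the flink-connector-cassandra dependency is on the classpath; the contact point, keyspace, table, and column types below are placeholders, not anything from the original thread:

import com.datastax.driver.core.Cluster;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.batch.connectors.cassandra.CassandraInputFormat;
import org.apache.flink.streaming.connectors.cassandra.ClusterBuilder;

public class CassandraBatchReadJob {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // How to reach the cluster; the contact point is a placeholder.
        ClusterBuilder clusterBuilder = new ClusterBuilder() {
            @Override
            protected Cluster buildCluster(Cluster.Builder builder) {
                return builder.addContactPoint("127.0.0.1").build();
            }
        };

        // One CQL query per input format; hypothetical keyspace, table, and columns.
        CassandraInputFormat<Tuple2<String, Long>> inputFormat =
                new CassandraInputFormat<>(
                        "SELECT id, value FROM my_ks.my_table;", clusterBuilder);

        // The whole query result is read through this single source, which
        // reflects the limitation described above (no token-range parallelism).
        DataSet<Tuple2<String, Long>> rows = env.createInput(
                inputFormat,
                TupleTypeInfo.getBasicTupleTypeInfo(String.class, Long.class));

        rows.first(10).print();
    }
}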


Cheers,
Till
