Insert data into Cassandra without Flink Cassandra connection

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Insert data into Cassandra without Flink Cassandra connection

Soheil Pourbafrani
I want to use Cassandra native connection (Not Flink Cassandra connection) to insert some data into Cassandra. According to the design of the code, the connection to Cassandra will open once at the start and all taskmanager use it to write data.  It's ok running in local mode.

The problem is when I submit the code on YARN cluster, as each taskmanager has it's own JVM, the connection to the Cassandra will not share and I should open and close it for each taskmanager. Is there any way to have a connection for all taskmanagers?
Reply | Threaded
Open this post in threaded view
|

Re: Insert data into Cassandra without Flink Cassandra connection

Soheil Pourbafrani
Here is my code 

stream.flatMap(new FlatMapFunction<byte[], Void>() {

@Override
public void flatMap(byte[] value, Collector<Void> out) throws Exception {
Parser.setInsert(true);
CassandraConnection.connect();
Parser.setInsert(true);
System.out.println("\n*********** New Message ***********\n");
System.out.println("Row Number : " + i ++ );
System.out.println("Message : " + HexUtiles.bytesToHex(value));
Parser.parse(ByteBuffer.wrap(value), ConfigHashMap);
}
});


On Thu, Apr 26, 2018 at 5:22 PM, Soheil Pourbafrani <[hidden email]> wrote:
I want to use Cassandra native connection (Not Flink Cassandra connection) to insert some data into Cassandra. According to the design of the code, the connection to Cassandra will open once at the start and all taskmanager use it to write data.  It's ok running in local mode.

The problem is when I submit the code on YARN cluster, as each taskmanager has it's own JVM, the connection to the Cassandra will not share and I should open and close it for each taskmanager. Is there any way to have a connection for all taskmanagers?

Reply | Threaded
Open this post in threaded view
|

Re: Insert data into Cassandra without Flink Cassandra connection

Piotr Nowojski
Hi,

The only way that I can think of is if you keep your flatMap operator with parallelism 1, but that might defeat the purpose. Otherwise there is no way to open one single connection and share it across multiple TaskManagers (which can be running on different physical machines). Please rethink your solution/approach with respect to distributed nature of Flink.

However there are some valid use cases where one would like to have some part of his job graph distributed and some part(s) non distributed - like issuing one single commit after a distributed write, or processing a data in parallel but writing them to a relational database like MySQL via one single Sink operator.. 

Piotrek

On 26 Apr 2018, at 15:23, Soheil Pourbafrani <[hidden email]> wrote:

Here is my code 

stream.flatMap(new FlatMapFunction<byte[], Void>() {

@Override
public void flatMap(byte[] value, Collector<Void> out) throws Exception {
Parser.setInsert(true);
CassandraConnection.connect();
Parser.setInsert(true);
System.out.println("\n*********** New Message ***********\n");
System.out.println("Row Number : " + i ++ );
System.out.println("Message : " + HexUtiles.bytesToHex(value));
Parser.parse(ByteBuffer.wrap(value), ConfigHashMap);
}
});


On Thu, Apr 26, 2018 at 5:22 PM, Soheil Pourbafrani <[hidden email]> wrote:
I want to use Cassandra native connection (Not Flink Cassandra connection) to insert some data into Cassandra. According to the design of the code, the connection to Cassandra will open once at the start and all taskmanager use it to write data.  It's ok running in local mode.

The problem is when I submit the code on YARN cluster, as each taskmanager has it's own JVM, the connection to the Cassandra will not share and I should open and close it for each taskmanager. Is there any way to have a connection for all taskmanagers?


Reply | Threaded
Open this post in threaded view
|

Re: Insert data into Cassandra without Flink Cassandra connection

Shuyi Chen
Maybe you can share a bit more about why you need only one connection to Cassandra across all TaskManagers, so we can better help?

On Wed, May 2, 2018 at 4:08 AM, Piotr Nowojski <[hidden email]> wrote:
Hi,

The only way that I can think of is if you keep your flatMap operator with parallelism 1, but that might defeat the purpose. Otherwise there is no way to open one single connection and share it across multiple TaskManagers (which can be running on different physical machines). Please rethink your solution/approach with respect to distributed nature of Flink.

However there are some valid use cases where one would like to have some part of his job graph distributed and some part(s) non distributed - like issuing one single commit after a distributed write, or processing a data in parallel but writing them to a relational database like MySQL via one single Sink operator.. 

Piotrek


On 26 Apr 2018, at 15:23, Soheil Pourbafrani <[hidden email]> wrote:

Here is my code 

stream.flatMap(new FlatMapFunction<byte[], Void>() {

@Override
public void flatMap(byte[] value, Collector<Void> out) throws Exception {
Parser.setInsert(true);
CassandraConnection.connect();
Parser.setInsert(true);
System.out.println("\n*********** New Message ***********\n");
System.out.println("Row Number : " + i ++ );
System.out.println("Message : " + HexUtiles.bytesToHex(value));
Parser.parse(ByteBuffer.wrap(value), ConfigHashMap);
}
});


On Thu, Apr 26, 2018 at 5:22 PM, Soheil Pourbafrani <[hidden email]> wrote:
I want to use Cassandra native connection (Not Flink Cassandra connection) to insert some data into Cassandra. According to the design of the code, the connection to Cassandra will open once at the start and all taskmanager use it to write data.  It's ok running in local mode.

The problem is when I submit the code on YARN cluster, as each taskmanager has it's own JVM, the connection to the Cassandra will not share and I should open and close it for each taskmanager. Is there any way to have a connection for all taskmanagers?





--
"So you have to trust that the dots will somehow connect in your future."