I want to use a native Cassandra connection (not the Flink Cassandra connector) to insert some data into Cassandra. By design, the code opens the connection to Cassandra once at startup, and all TaskManagers use it to write data. This works fine in local mode.
The problem appears when I submit the job to a YARN cluster: since each TaskManager runs in its own JVM, the connection to Cassandra is not shared, and I have to open and close it for each TaskManager. Is there any way to have a single connection for all TaskManagers?
Here is my code:
stream.flatMap(new FlatMapFunction<byte[], Void>() {
On Thu, Apr 26, 2018 at 5:22 PM, Soheil Pourbafrani <[hidden email]> wrote:
Hi,
The only way I can think of is to keep your flatMap operator at parallelism 1, but that might defeat the purpose. Otherwise, there is no way to open a single connection and share it across multiple TaskManagers, which can be running on different physical machines. Please rethink your approach with respect to the distributed nature of Flink.
However, there are some valid use cases where one would like part of the job graph to be distributed and other parts to be non-distributed, such as issuing a single commit after a distributed write, or processing data in parallel but writing it to a relational database like MySQL via a single sink operator.
Piotrek
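To illustrate the point about JVM boundaries, here is a minimal sketch in plain Java of the closest thing that is possible: one connection per JVM, shared by the parallel instances that happen to run in that same JVM. It cannot cross TaskManagers. Everything in it is hypothetical: `FakeSession` stands in for a real Cassandra session, and `SessionHolder` and `WriterSubtask` are names made up for this example. In an actual Flink job, the `open()` and `close()` calls would presumably live in the corresponding lifecycle methods of a `RichFlatMapFunction`.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedConnectionSketch {

    /** Hypothetical stand-in for a Cassandra session; not a real driver class. */
    static final class FakeSession {
        final List<String> written = new ArrayList<>();
        boolean closed = false;

        synchronized void execute(String statement) {
            if (closed) {
                throw new IllegalStateException("session is closed");
            }
            written.add(statement);
        }

        synchronized void close() {
            closed = true;
        }
    }

    /** Holds at most one session per JVM and counts how many users share it. */
    static final class SessionHolder {
        private static FakeSession session;
        private static int users = 0;
        static int opened = 0; // number of sessions created so far, for demonstration only

        static synchronized FakeSession acquire() {
            if (users == 0) {
                session = new FakeSession(); // first user in this JVM opens the session
                opened++;
            }
            users++;
            return session;
        }

        static synchronized void release() {
            if (users == 0) {
                throw new IllegalStateException("release() without acquire()");
            }
            users--;
            if (users == 0) {
                session.close(); // last user in this JVM closes the session
                session = null;
            }
        }
    }

    /** Mimics an operator instance: open once, process many records, close once. */
    static final class WriterSubtask {
        private FakeSession session;

        void open() {
            session = SessionHolder.acquire();
        }

        void flatMap(byte[] record) {
            session.execute("INSERT " + record.length + " bytes");
        }

        void close() {
            SessionHolder.release();
            session = null;
        }
    }

    public static void main(String[] args) {
        // Two parallel instances in the same JVM end up sharing one session.
        WriterSubtask a = new WriterSubtask();
        WriterSubtask b = new WriterSubtask();
        a.open();
        b.open();
        a.flatMap(new byte[] {1, 2, 3});
        b.flatMap(new byte[] {4});
        System.out.println("sessions opened: " + SessionHolder.opened);
        a.close();
        b.close();
    }
}
```

With N TaskManagers this still yields N connections, one per JVM, which is the distributed behaviour described above. Whether sharing inside a JVM is worth the extra bookkeeping depends on how expensive the connection is to set up.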
Maybe you can share a bit more about why you need only one connection to Cassandra across all TaskManagers, so we can better help?
On Wed, May 2, 2018 at 4:08 AM, Piotr Nowojski <[hidden email]> wrote:
"So you have to trust that the dots will somehow connect in your future."