(DEPRECATED) Apache Flink User Mailing List archive.

Global Hashmap & global static variable.

Annemarie Burger

Global Hashmap & global static variable.

Hi,

I have two questions:

1. In the first part of my pipeline, which uses Flink DataStreams to process
graph edges, I'm filling up a HashMap. Each entry maps a vertex id to the
partition this vertex is assigned to. Later in my pipeline I want to query
this HashMap again, to see in which partition I can find a specific edge,
based on the partitions its two vertices are assigned to. What is the best
way to do this? I keep getting NullPointerExceptions.

2. Is there a good way to retrieve a single value at some point in my
pipeline, and then make it globally available? I was using a static variable,
but found that this led to a NullPointerException when executing Flink
standalone. Weirdly enough it worked fine in my IntelliJ.

All help is very much appreciated!

Best,
Annemarie

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Piotr Nowojski-4

Re: Global Hashmap & global static variable.

Hi Annemarie,

You are missing some basic Flink concepts; please take a look at [1].

> Weirdly enough it worked fine in my IntelliJ.

That's completely normal. When you run your Flink application in a local test environment (IntelliJ), a single JVM runs all of the code, so any code, operator, or function that accesses a static variable sees the same one. When you execute the same code on a cluster with multiple machines (Flink is a distributed system!), the operators run in different JVM processes, and there is no way for those processes to communicate via static variables.
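
A minimal sketch of why the static variable ends up null, using only plain Java and no Flink APIs. It assumes the function object reaches the worker through Java serialization, and it imitates a fresh worker JVM by clearing the static field before deserializing. The class name PartitionLookup and its fields are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a user function that is serialized and shipped to a worker.
class PartitionLookup implements Serializable {
    private static final long serialVersionUID = 1L;

    // Static field: belongs to the JVM that set it and is never written to the byte stream.
    static Map<Long, Integer> sharedPartitions;

    // Instance field: part of the serialized object, so it travels with the function.
    final HashMap<Long, Integer> shippedPartitions = new HashMap<>();

    // Imitates "one JVM fills the map, another JVM receives the function".
    static PartitionLookup shipToFreshJvm() {
        sharedPartitions = new HashMap<>();
        sharedPartitions.put(1L, 0);

        PartitionLookup fn = new PartitionLookup();
        fn.shippedPartitions.put(1L, 0);

        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            // Assumption: a fresh JVM starts with the static field unset; clearing it imitates that.
            sharedPartitions = null;
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PartitionLookup) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        PartitionLookup received = shipToFreshJvm();
        System.out.println("static map on worker:   " + sharedPartitions);
        System.out.println("instance map on worker: " + received.shippedPartitions);
    }
}
```

In a single local JVM nobody clears the static field, which is why the same code appears to work in the IDE.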

Probably the same problem applies to your first issue with the HashMap.

Please read [1] carefully to see how Flink executes and distributes your code; that will help you understand this problem.

Can you describe what you are trying to do/achieve/solve?

There is currently no way for different operators/functions to share access to the same state. Note that different parallel instances of the same operator/function can share the same state; this is called broadcast state [2]. However, it doesn't allow for the pattern you are looking for (aggregating results in one stage and then using this aggregated state in a later stage/operator). To do that, you would have to store the state in some external system (an external key/value store, a database, Kafka, a file on a distributed file system, ...) to make it visible and accessible across your whole cluster.
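
One way to picture the external-store approach is to put the vertex-to-partition lookup behind a small interface, so the writing stage and the reading stage both talk to the same external system. The sketch below is plain Java with hypothetical names (VertexPartitionStore, InMemoryVertexPartitionStore, EdgeLocator); none of them are Flink APIs. The in-memory implementation is only a stand-in for local testing: it lives in one JVM and does not solve the cluster problem. A real implementation would call whichever key/value store or database you choose.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over an external key/value store reachable from every worker.
interface VertexPartitionStore {
    void assign(long vertexId, int partition);

    Optional<Integer> partitionOf(long vertexId);
}

// Local stand-in for tests only: it lives in a single JVM and is NOT shared across a cluster.
class InMemoryVertexPartitionStore implements VertexPartitionStore {
    private final Map<Long, Integer> partitions = new ConcurrentHashMap<>();

    @Override
    public void assign(long vertexId, int partition) {
        partitions.put(vertexId, partition);
    }

    @Override
    public Optional<Integer> partitionOf(long vertexId) {
        return Optional.ofNullable(partitions.get(vertexId));
    }
}

// Later stage: looks up the partitions of both endpoints of an edge.
class EdgeLocator {
    private final VertexPartitionStore store;

    EdgeLocator(VertexPartitionStore store) {
        this.store = store;
    }

    // Returns {sourcePartition, targetPartition}, or empty if either vertex has no
    // assignment yet, instead of failing with a NullPointerException.
    Optional<int[]> locate(long source, long target) {
        Optional<Integer> src = store.partitionOf(source);
        Optional<Integer> dst = store.partitionOf(target);
        if (!src.isPresent() || !dst.isPresent()) {
            return Optional.empty();
        }
        return Optional.of(new int[] {src.get(), dst.get()});
    }
}
```

Returning an empty Optional for an unassigned vertex makes the "not written yet" case explicit. That case matters in a streaming pipeline, where the lookup may run before the assignment has arrived.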

Piotrek

[1] https://ci.apache.org/projects/flink/flink-docs-stable/concepts/flink-architecture.html

[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html

On Fri, 17 Jul 2020 at 21:32, Annemarie Burger <[hidden email]> wrote: