Global Hashmap & global static variable.


Annemarie Burger
Hi,

I have two questions:

1. In the first part of my pipeline, which uses Flink DataStreams to process graph
edges, I'm filling up a HashMap. In it goes a vertex id and the partition this
vertex is assigned to. Later in my pipeline I want to query this HashMap
again, to see in which partition exactly I can find a specific edge, based on
which partitions its two vertices are assigned to. What is the best way to
do this? I keep getting NullPointerExceptions.

2. Is there a good way to retrieve a single value at some point in my
pipeline, and then make it globally available? I was using a static variable,
but found that this led to a NullPointerException when executing Flink
standalone. Weirdly enough it worked fine in my IntelliJ.

All help is very much appreciated!

Best,
Annemarie



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Re: Global Hashmap & global static variable.

Piotr Nowojski-4
Hi Annemarie,

You are missing some basic concepts in Flink. Please take a look at [1].

> Weirdly enough it worked fine in my Intellij.

That's completely normal. If you access a static variable in your code and run your Flink application in a local testing environment (IntelliJ), there is just a single JVM running all of the code, so any code/operator/function accessing that static variable sees the same one. But if you execute the same code on a cluster with multiple machines (Flink is a distributed system!), different JVM processes have no way to communicate via static variables.
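To make the static-variable point concrete, here is a minimal plain-Java sketch (no Flink involved). It assumes that user functions reach the workers as serialized objects, and it imitates "another JVM" by resetting the static field by hand before deserializing, which is only a simulation. The class `Lookup` and its fields `sharedValue` and `configuredValue` are hypothetical names made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticFieldDemo {

    /** Hypothetical user function; not a Flink class. */
    static class Lookup implements Serializable {
        private static final long serialVersionUID = 1L;

        // Static field: belongs to the class inside one JVM and is never serialized.
        static Integer sharedValue;

        // Instance field: serialized together with the object.
        final int configuredValue;

        Lookup(int configuredValue) {
            this.configuredValue = configuredValue;
        }
    }

    /** Round-trips the object through Java serialization, as if shipping it to a worker. */
    static Lookup roundTrip(Lookup original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        // Simulation only: in a freshly started JVM the static field would still
        // hold its initial value (null), because nobody assigned it there.
        Lookup.sharedValue = null;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Lookup) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Lookup.sharedValue = 42;                  // set on the "client" side
        Lookup copy = roundTrip(new Lookup(7));   // "shipped" to a worker
        System.out.println("instance field: " + copy.configuredValue); // prints 7
        System.out.println("static field:   " + Lookup.sharedValue);   // prints null
    }
}
```

The instance field survives the trip because it is part of the serialized object, while the static field does not travel at all. Unboxing such a null `Integer` on the worker would be one way to end up with a NullPointerException like the one described in the question.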

Probably the same problem applies to your first issue with the HashMap.

Please read [1] carefully to see how Flink executes and distributes your code; that should help you understand this problem.

Can you describe what you are trying to do/achieve/solve?

There is currently no way for different operators/functions to share access to the same state. Note that different parallel instances of the same operator/function can share the same state; this is called broadcast state [2]. However, it doesn't allow for the pattern you are looking for (aggregate results in one stage, and then use this aggregated state in a later stage/operator). To do this, you would have to store the state in some external system (an external key/value store, a DB, Kafka, a file on a distributed file system, ...) to make it visible and accessible across your whole cluster.
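As a rough sketch of the external-storage idea, the vertex-to-partition lookup could be hidden behind a small interface, so that the first stage writes assignments and the later stage reads them. Everything here is an assumption for illustration: `VertexPartitionStore`, `InMemoryStore` and `partitionsOfEdge` are hypothetical names, vertex ids are assumed to be `long`, and the in-memory implementation is only a stand-in for local testing. It lives in a single JVM, so on a cluster it would have to be replaced by a client for a real external system.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class VertexPartitionStoreSketch {

    /** Hypothetical abstraction over the external system (key/value store, DB, ...). */
    interface VertexPartitionStore {
        void assign(long vertexId, int partition);
        Optional<Integer> partitionOf(long vertexId);
    }

    /** In-memory stand-in for local testing only; visible inside one JVM, not across a cluster. */
    static class InMemoryStore implements VertexPartitionStore {
        private final Map<Long, Integer> assignments = new ConcurrentHashMap<>();

        @Override
        public void assign(long vertexId, int partition) {
            assignments.put(vertexId, partition);
        }

        @Override
        public Optional<Integer> partitionOf(long vertexId) {
            return Optional.ofNullable(assignments.get(vertexId));
        }
    }

    /**
     * Later stage: returns the partitions of both endpoints of an edge,
     * or empty if either vertex has no assignment yet (instead of failing on null).
     */
    static Optional<int[]> partitionsOfEdge(VertexPartitionStore store, long source, long target) {
        Optional<Integer> s = store.partitionOf(source);
        Optional<Integer> t = store.partitionOf(target);
        if (!s.isPresent() || !t.isPresent()) {
            return Optional.empty();
        }
        return Optional.of(new int[] {s.get(), t.get()});
    }

    public static void main(String[] args) {
        VertexPartitionStore store = new InMemoryStore();
        store.assign(1L, 0); // first stage: vertex 1 goes to partition 0
        store.assign(2L, 3); // first stage: vertex 2 goes to partition 3

        // Both endpoints known: prints [0, 3]
        System.out.println(Arrays.toString(partitionsOfEdge(store, 1L, 2L).get()));
        // Vertex 99 was never assigned: prints false
        System.out.println(partitionsOfEdge(store, 1L, 99L).isPresent());
    }
}
```

Returning an `Optional` makes the "vertex not assigned yet" case explicit. In a streaming job the later stage may well see an edge before the first stage has stored both of its vertices, so the caller has to decide whether to drop, buffer, or retry such edges.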

Piotrek


In reply to the message from Annemarie Burger <[hidden email]> of Fri, 17 Jul 2020, 21:32.