(DEPRECATED) Apache Flink User Mailing List archive.

Is it possible to use OperatorState, when NOT implementing a source or sink function?


Marco Villalobos-2

Is it possible to use OperatorState, when NOT implementing a source or sink function?

If yes, then how?

JING ZHANG

Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?

Hi,

Please use `CheckpointedFunction`. You can initialize your operator state in the `initializeState` method via `context.getOperatorStateStore()`.

Best regards,

JING ZHANG
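
To make the suggestion above concrete, here is a minimal sketch of a non-source, non-sink function that keeps operator state through `CheckpointedFunction`. The class name `CountingMapper`, the state name `"record-count"`, and the counting logic are hypothetical and only serve as illustration; the sketch assumes the Flink DataStream API is on the classpath and uses the store's list-style state (`getListState`).

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

/** Hypothetical non-keyed map function that counts the records seen by each parallel subtask. */
public class CountingMapper implements MapFunction<String, String>, CheckpointedFunction {

    // In-memory working copy; it is written to managed state only when a snapshot is taken.
    private long count;

    // Operator (non-keyed) list state handle, obtained in initializeState.
    private transient ListState<Long> checkpointedCount;

    @Override
    public String map(String value) {
        count++;
        return count + ": " + value;
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        ListStateDescriptor<Long> descriptor =
                new ListStateDescriptor<>("record-count", Long.class);
        checkpointedCount = context.getOperatorStateStore().getListState(descriptor);

        // After a restore, this subtask may receive zero, one, or several list entries,
        // so the partial counts are summed.
        if (context.isRestored()) {
            for (Long partial : checkpointedCount.get()) {
                count += partial;
            }
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // Replace the previous snapshot content with the current in-memory value.
        checkpointedCount.clear();
        checkpointedCount.add(count);
    }
}
```

Such a function could be applied with `stream.map(new CountingMapper())` on a non-keyed stream, so no `keyBy` is involved.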


Marco Villalobos-2

Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?

Does that work in the DataStream API in Batch Execution Mode?


Yun Gao

Re: Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?

Hi Marco,

I think yes, the operator state can be used in batch mode. Since there is no checkpoint in batch mode, the operator state would serve as a kind of ordinary in-memory storage.

Best,

Yun


Marco Villalobos-2

Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?


Oh... that won't work for me either. I needed to use MapState.

Perhaps I should describe my problem. I am using a keyed process function with keyed state, but its workload is not distributed well across the cluster. I have four task managers, but because of the way my data is keyed in this operator, all of it goes to a single task manager node.

I need state, but I don't really need it keyed.


Yun Gao

Re: Re: Is it possible to use OperatorState, when NOT implementing a source or sink function?

Hi Marco,

It seems to me that the imbalance problem and the state are independent issues here: the data distribution is decided only by the KeySelector used. The only limitation for state is that keyed state is bound to the KeySelector used across the tasks. If the imbalance is the root problem, have you checked how many keys the job has in total?

Best,

Yun
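
The point about the number of keys can be illustrated with a small, Flink-free sketch. It assumes a deliberately simplified routing rule (key hash modulo parallelism); Flink's actual assignment goes through key groups and is not reproduced here. The class `KeySkewSketch` and its methods are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class KeySkewSketch {

    // Simplified stand-in for key-to-subtask routing: hash the key, then reduce it modulo the parallelism.
    static int subtaskFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Returns the indices of the subtasks that would receive at least one of the given keys.
    static Set<Integer> busySubtasks(List<String> keys, int parallelism) {
        Set<Integer> busy = new TreeSet<>();
        for (String key : keys) {
            busy.add(subtaskFor(key, parallelism));
        }
        return busy;
    }

    public static void main(String[] args) {
        int parallelism = 4;

        // A single distinct key can only ever reach one subtask.
        List<String> oneKey = List.of("sensor-a");
        System.out.println("1 key   -> busy subtasks: " + busySubtasks(oneKey, parallelism));

        // Many distinct keys give the hash a chance to spread the load.
        List<String> manyKeys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            manyKeys.add("key-" + i);
        }
        System.out.println("10 keys -> busy subtasks: " + busySubtasks(manyKeys, parallelism));
    }
}
```

With only one distinct key, exactly one subtask is busy regardless of how many task managers exist, which matches the symptom described in this thread. Counting the distinct keys produced by the KeySelector is therefore a reasonable first check.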
