Hi together
I did implement a little pipeline, which has some statefull computation: Conntaing a function which extends RichFlatMapFunction and Checkpointed. The state is stored in the field: private var state_item: ValueState[Option[Pathsection]] = null override def open(conf: Configuration):Unit = { log.info("Open a new Checkpointed FlatMap function with configuration: {}", conf) state_item = getRuntimeContext.getState(new ValueStateDescriptor("State of Pathsection", (Option(Pathsection)).getClass.asInstanceOf[Class[Option[Pathsection]]], None)) } override def snapshotState(checkpointId: Long, checkpointTimestamp: Long): Option[Pathsection] = { log.debug("Snapshote State with checkpointId: {} at Timestamp {}", checkpointId, checkpointTimestamp) removeOldEntries(checkpointTimestamp) state_item.value() } override def restoreState(state: Option[Pathsection]):Unit = { if (state == null){ log.debug("Restore Snapshot: Null") } else if(state == None){ log.debug("Restore Snapshot: None") } else if (state_item == null){ log.debug("State Item not initialized") } else{ state_item.update(state) } } But when I do run this computation and get the program to fail, I get the following error: java.lang.Exception: Could not restore checkpointed state to operators and functions at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:457) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:209) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.Exception: Failed to restore state to function: No key available. at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:168) at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:449) ... 3 more Caused by: java.lang.RuntimeException: No key available. at org.apache.flink.runtime.state.memory.MemValueState.update(MemValueState.java:69) at ....Function which has the Checkpointed thingy (CheckpointedIncrAddrPositions.scala:68) What am I missing? Thanks Simon |
Hi Simon,
There are two types of state: 1) Keyed State The state you access via `getRuntimeContext().getState(..)` is scoped by key. If no key is in the scope, the key is null and update operations won't work. Use a `keyBy(..)` before your map function to partition the state by key. The state is automatically checkpointed by Flink. 2) Operator State This state is kept per parallel instance of the operator. You implement the Checkpointed interface and use the `snapshotState(..)` and `restoreState(..)` methods to checkpoint the state. I think you want to use one of the two. Although it is possible to use both, it looks like you're confusing the two in your example. Cheers, Max On Wed, Jun 1, 2016 at 3:06 PM, simon peyer <[hidden email]> wrote: > Hi together > > I did implement a little pipeline, which has some statefull computation: > > Conntaing a function which extends RichFlatMapFunction and Checkpointed. > > The state is stored in the field: > > private var state_item: ValueState[Option[Pathsection]] = null > > > override def open(conf: Configuration):Unit = { > log.info("Open a new Checkpointed FlatMap function with configuration: > {}", conf) > state_item = getRuntimeContext.getState(new ValueStateDescriptor("State > of Pathsection", > (Option(Pathsection)).getClass.asInstanceOf[Class[Option[Pathsection]]], > None)) > } > > > override def snapshotState(checkpointId: Long, checkpointTimestamp: Long): > Option[Pathsection] = { > log.debug("Snapshote State with checkpointId: {} at Timestamp {}", > checkpointId, checkpointTimestamp) > removeOldEntries(checkpointTimestamp) > state_item.value() > } > > override def restoreState(state: Option[Pathsection]):Unit = { > > > > if (state == null){ > log.debug("Restore Snapshot: Null") > } > else if(state == None){ > log.debug("Restore Snapshot: None") > } > else if (state_item == null){ > log.debug("State Item not initialized") > } > else{ > state_item.update(state) > } > } > > But when I do run this computation and get the program to fail, I get the > following error: > > > java.lang.Exception: Could not restore checkpointed state to operators and > functions > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:457) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:209) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Failed to restore state to function: No key > available. > at > org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:168) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:449) > ... 3 more > Caused by: java.lang.RuntimeException: No key available. > at > org.apache.flink.runtime.state.memory.MemValueState.update(MemValueState.java:69) > at ....Function which has the Checkpointed thingy > (CheckpointedIncrAddrPositions.scala:68) > > What am I missing? > > Thanks > Simon > |
Hi Max
I'm using a keyby but would like to store the state. Thus what's the way to go? How do I have to handle the state in option 2). Could you give an example? Thanks --Simon > On 01 Jun 2016, at 15:55, Maximilian Michels <[hidden email]> wrote: > > Hi Simon, > > There are two types of state: > > > 1) Keyed State > > The state you access via `getRuntimeContext().getState(..)` is scoped > by key. If no key is in the scope, the key is null and update > operations won't work. Use a `keyBy(..)` before your map function to > partition the state by key. The state is automatically checkpointed by > Flink. > > 2) Operator State > > This state is kept per parallel instance of the operator. You > implement the Checkpointed interface and use the `snapshotState(..)` > and `restoreState(..)` methods to checkpoint the state. > > > I think you want to use one of the two. Although it is possible to use > both, it looks like you're confusing the two in your example. > > Cheers, > Max > > On Wed, Jun 1, 2016 at 3:06 PM, simon peyer <[hidden email]> wrote: >> Hi together >> >> I did implement a little pipeline, which has some statefull computation: >> >> Conntaing a function which extends RichFlatMapFunction and Checkpointed. >> >> The state is stored in the field: >> >> private var state_item: ValueState[Option[Pathsection]] = null >> >> >> override def open(conf: Configuration):Unit = { >> log.info("Open a new Checkpointed FlatMap function with configuration: >> {}", conf) >> state_item = getRuntimeContext.getState(new ValueStateDescriptor("State >> of Pathsection", >> (Option(Pathsection)).getClass.asInstanceOf[Class[Option[Pathsection]]], >> None)) >> } >> >> >> override def snapshotState(checkpointId: Long, checkpointTimestamp: Long): >> Option[Pathsection] = { >> log.debug("Snapshote State with checkpointId: {} at Timestamp {}", >> checkpointId, checkpointTimestamp) >> removeOldEntries(checkpointTimestamp) >> state_item.value() >> } >> >> override def restoreState(state: Option[Pathsection]):Unit = { >> >> >> >> if (state == null){ >> log.debug("Restore Snapshot: Null") >> } >> else if(state == None){ >> log.debug("Restore Snapshot: None") >> } >> else if (state_item == null){ >> log.debug("State Item not initialized") >> } >> else{ >> state_item.update(state) >> } >> } >> >> But when I do run this computation and get the program to fail, I get the >> following error: >> >> >> java.lang.Exception: Could not restore checkpointed state to operators and >> functions >> at >> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:457) >> at >> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:209) >> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) >> at java.lang.Thread.run(Thread.java:745) >> Caused by: java.lang.Exception: Failed to restore state to function: No key >> available. >> at >> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:168) >> at >> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:449) >> ... 3 more >> Caused by: java.lang.RuntimeException: No key available. >> at >> org.apache.flink.runtime.state.memory.MemValueState.update(MemValueState.java:69) >> at ....Function which has the Checkpointed thingy >> (CheckpointedIncrAddrPositions.scala:68) >> >> What am I missing? >> >> Thanks >> Simon >> |
Hi Simon,
You don't need to write any code to checkpoint the Keyed State. It is done automatically by Flink. Just remove the `snapshoteState(..)` and `restoreState(..)` methods. Cheers, Max On Wed, Jun 1, 2016 at 4:00 PM, simon peyer <[hidden email]> wrote: > Hi Max > > I'm using a keyby but would like to store the state. > > Thus what's the way to go? > > How do I have to handle the state in option 2). > > Could you give an example? > > Thanks > --Simon > >> On 01 Jun 2016, at 15:55, Maximilian Michels <[hidden email]> wrote: >> >> Hi Simon, >> >> There are two types of state: >> >> >> 1) Keyed State >> >> The state you access via `getRuntimeContext().getState(..)` is scoped >> by key. If no key is in the scope, the key is null and update >> operations won't work. Use a `keyBy(..)` before your map function to >> partition the state by key. The state is automatically checkpointed by >> Flink. >> >> 2) Operator State >> >> This state is kept per parallel instance of the operator. You >> implement the Checkpointed interface and use the `snapshotState(..)` >> and `restoreState(..)` methods to checkpoint the state. >> >> >> I think you want to use one of the two. Although it is possible to use >> both, it looks like you're confusing the two in your example. >> >> Cheers, >> Max >> >> On Wed, Jun 1, 2016 at 3:06 PM, simon peyer <[hidden email]> wrote: >>> Hi together >>> >>> I did implement a little pipeline, which has some statefull computation: >>> >>> Conntaing a function which extends RichFlatMapFunction and Checkpointed. >>> >>> The state is stored in the field: >>> >>> private var state_item: ValueState[Option[Pathsection]] = null >>> >>> >>> override def open(conf: Configuration):Unit = { >>> log.info("Open a new Checkpointed FlatMap function with configuration: >>> {}", conf) >>> state_item = getRuntimeContext.getState(new ValueStateDescriptor("State >>> of Pathsection", >>> (Option(Pathsection)).getClass.asInstanceOf[Class[Option[Pathsection]]], >>> None)) >>> } >>> >>> >>> override def snapshotState(checkpointId: Long, checkpointTimestamp: Long): >>> Option[Pathsection] = { >>> log.debug("Snapshote State with checkpointId: {} at Timestamp {}", >>> checkpointId, checkpointTimestamp) >>> removeOldEntries(checkpointTimestamp) >>> state_item.value() >>> } >>> >>> override def restoreState(state: Option[Pathsection]):Unit = { >>> >>> >>> >>> if (state == null){ >>> log.debug("Restore Snapshot: Null") >>> } >>> else if(state == None){ >>> log.debug("Restore Snapshot: None") >>> } >>> else if (state_item == null){ >>> log.debug("State Item not initialized") >>> } >>> else{ >>> state_item.update(state) >>> } >>> } >>> >>> But when I do run this computation and get the program to fail, I get the >>> following error: >>> >>> >>> java.lang.Exception: Could not restore checkpointed state to operators and >>> functions >>> at >>> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:457) >>> at >>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:209) >>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) >>> at java.lang.Thread.run(Thread.java:745) >>> Caused by: java.lang.Exception: Failed to restore state to function: No key >>> available. >>> at >>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:168) >>> at >>> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:449) >>> ... 3 more >>> Caused by: java.lang.RuntimeException: No key available. >>> at >>> org.apache.flink.runtime.state.memory.MemValueState.update(MemValueState.java:69) >>> at ....Function which has the Checkpointed thingy >>> (CheckpointedIncrAddrPositions.scala:68) >>> >>> What am I missing? >>> >>> Thanks >>> Simon >>> > |
Hi Max
Thanks for your answer. We have some states, on some keys, which we would like to delete after a certain time. And since there is no option at the moment to put an "expiriece" date on it, I just use the snapshot function to test and verify if the current key is still in some threshold. So I would like to have an option to perodically check the timestamp, and remove old entries from the state. Therefore I used the KeyedStream to key by a id. Can I use the internal function to do the snapshoting, but use the snapshot function to do some cleanup on the states? Thanks Simon > On 01 Jun 2016, at 16:29, Maximilian Michels <[hidden email]> wrote: > > Hi Simon, > > You don't need to write any code to checkpoint the Keyed State. It is > done automatically by Flink. Just remove the `snapshoteState(..)` and > `restoreState(..)` methods. > > Cheers, > Max > > On Wed, Jun 1, 2016 at 4:00 PM, simon peyer <[hidden email]> wrote: >> Hi Max >> >> I'm using a keyby but would like to store the state. >> >> Thus what's the way to go? >> >> How do I have to handle the state in option 2). >> >> Could you give an example? >> >> Thanks >> --Simon >> >>> On 01 Jun 2016, at 15:55, Maximilian Michels <[hidden email]> wrote: >>> >>> Hi Simon, >>> >>> There are two types of state: >>> >>> >>> 1) Keyed State >>> >>> The state you access via `getRuntimeContext().getState(..)` is scoped >>> by key. If no key is in the scope, the key is null and update >>> operations won't work. Use a `keyBy(..)` before your map function to >>> partition the state by key. The state is automatically checkpointed by >>> Flink. >>> >>> 2) Operator State >>> >>> This state is kept per parallel instance of the operator. You >>> implement the Checkpointed interface and use the `snapshotState(..)` >>> and `restoreState(..)` methods to checkpoint the state. >>> >>> >>> I think you want to use one of the two. Although it is possible to use >>> both, it looks like you're confusing the two in your example. >>> >>> Cheers, >>> Max >>> >>> On Wed, Jun 1, 2016 at 3:06 PM, simon peyer <[hidden email]> wrote: >>>> Hi together >>>> >>>> I did implement a little pipeline, which has some statefull computation: >>>> >>>> Conntaing a function which extends RichFlatMapFunction and Checkpointed. >>>> >>>> The state is stored in the field: >>>> >>>> private var state_item: ValueState[Option[Pathsection]] = null >>>> >>>> >>>> override def open(conf: Configuration):Unit = { >>>> log.info("Open a new Checkpointed FlatMap function with configuration: >>>> {}", conf) >>>> state_item = getRuntimeContext.getState(new ValueStateDescriptor("State >>>> of Pathsection", >>>> (Option(Pathsection)).getClass.asInstanceOf[Class[Option[Pathsection]]], >>>> None)) >>>> } >>>> >>>> >>>> override def snapshotState(checkpointId: Long, checkpointTimestamp: Long): >>>> Option[Pathsection] = { >>>> log.debug("Snapshote State with checkpointId: {} at Timestamp {}", >>>> checkpointId, checkpointTimestamp) >>>> removeOldEntries(checkpointTimestamp) >>>> state_item.value() >>>> } >>>> >>>> override def restoreState(state: Option[Pathsection]):Unit = { >>>> >>>> >>>> >>>> if (state == null){ >>>> log.debug("Restore Snapshot: Null") >>>> } >>>> else if(state == None){ >>>> log.debug("Restore Snapshot: None") >>>> } >>>> else if (state_item == null){ >>>> log.debug("State Item not initialized") >>>> } >>>> else{ >>>> state_item.update(state) >>>> } >>>> } >>>> >>>> But when I do run this computation and get the program to fail, I get the >>>> following error: >>>> >>>> >>>> java.lang.Exception: Could not restore checkpointed state to operators and >>>> functions >>>> at >>>> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:457) >>>> at >>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:209) >>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) >>>> at java.lang.Thread.run(Thread.java:745) >>>> Caused by: java.lang.Exception: Failed to restore state to function: No key >>>> available. >>>> at >>>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:168) >>>> at >>>> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:449) >>>> ... 3 more >>>> Caused by: java.lang.RuntimeException: No key available. >>>> at >>>> org.apache.flink.runtime.state.memory.MemValueState.update(MemValueState.java:69) >>>> at ....Function which has the Checkpointed thingy >>>> (CheckpointedIncrAddrPositions.scala:68) >>>> >>>> What am I missing? >>>> >>>> Thanks >>>> Simon >>>> >> |
Hi
In other words, what's the easiest way to clean up states in flink, if this key may never arrive again? --Thanks Simon > On 02 Jun 2016, at 10:16, simon peyer <[hidden email]> wrote: > > Hi Max > > Thanks for your answer. > We have some states, on some keys, which we would like to delete after a certain time. > And since there is no option at the moment to put an "expiriece" date on it, I just use the snapshot function to test and verify if the current key is still in some threshold. > > So I would like to have an option to perodically check the timestamp, and remove old entries from the state. > > Therefore I used the KeyedStream to key by a id. > Can I use the internal function to do the snapshoting, but use the snapshot function to do some cleanup on the states? > > > Thanks > Simon > > >> On 01 Jun 2016, at 16:29, Maximilian Michels <[hidden email]> wrote: >> >> Hi Simon, >> >> You don't need to write any code to checkpoint the Keyed State. It is >> done automatically by Flink. Just remove the `snapshoteState(..)` and >> `restoreState(..)` methods. >> >> Cheers, >> Max >> >> On Wed, Jun 1, 2016 at 4:00 PM, simon peyer <[hidden email]> wrote: >>> Hi Max >>> >>> I'm using a keyby but would like to store the state. >>> >>> Thus what's the way to go? >>> >>> How do I have to handle the state in option 2). >>> >>> Could you give an example? >>> >>> Thanks >>> --Simon >>> >>>> On 01 Jun 2016, at 15:55, Maximilian Michels <[hidden email]> wrote: >>>> >>>> Hi Simon, >>>> >>>> There are two types of state: >>>> >>>> >>>> 1) Keyed State >>>> >>>> The state you access via `getRuntimeContext().getState(..)` is scoped >>>> by key. If no key is in the scope, the key is null and update >>>> operations won't work. Use a `keyBy(..)` before your map function to >>>> partition the state by key. The state is automatically checkpointed by >>>> Flink. >>>> >>>> 2) Operator State >>>> >>>> This state is kept per parallel instance of the operator. You >>>> implement the Checkpointed interface and use the `snapshotState(..)` >>>> and `restoreState(..)` methods to checkpoint the state. >>>> >>>> >>>> I think you want to use one of the two. Although it is possible to use >>>> both, it looks like you're confusing the two in your example. >>>> >>>> Cheers, >>>> Max >>>> >>>> On Wed, Jun 1, 2016 at 3:06 PM, simon peyer <[hidden email]> wrote: >>>>> Hi together >>>>> >>>>> I did implement a little pipeline, which has some statefull computation: >>>>> >>>>> Conntaing a function which extends RichFlatMapFunction and Checkpointed. >>>>> >>>>> The state is stored in the field: >>>>> >>>>> private var state_item: ValueState[Option[Pathsection]] = null >>>>> >>>>> >>>>> override def open(conf: Configuration):Unit = { >>>>> log.info("Open a new Checkpointed FlatMap function with configuration: >>>>> {}", conf) >>>>> state_item = getRuntimeContext.getState(new ValueStateDescriptor("State >>>>> of Pathsection", >>>>> (Option(Pathsection)).getClass.asInstanceOf[Class[Option[Pathsection]]], >>>>> None)) >>>>> } >>>>> >>>>> >>>>> override def snapshotState(checkpointId: Long, checkpointTimestamp: Long): >>>>> Option[Pathsection] = { >>>>> log.debug("Snapshote State with checkpointId: {} at Timestamp {}", >>>>> checkpointId, checkpointTimestamp) >>>>> removeOldEntries(checkpointTimestamp) >>>>> state_item.value() >>>>> } >>>>> >>>>> override def restoreState(state: Option[Pathsection]):Unit = { >>>>> >>>>> >>>>> >>>>> if (state == null){ >>>>> log.debug("Restore Snapshot: Null") >>>>> } >>>>> else if(state == None){ >>>>> log.debug("Restore Snapshot: None") >>>>> } >>>>> else if (state_item == null){ >>>>> log.debug("State Item not initialized") >>>>> } >>>>> else{ >>>>> state_item.update(state) >>>>> } >>>>> } >>>>> >>>>> But when I do run this computation and get the program to fail, I get the >>>>> following error: >>>>> >>>>> >>>>> java.lang.Exception: Could not restore checkpointed state to operators and >>>>> functions >>>>> at >>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:457) >>>>> at >>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:209) >>>>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559) >>>>> at java.lang.Thread.run(Thread.java:745) >>>>> Caused by: java.lang.Exception: Failed to restore state to function: No key >>>>> available. >>>>> at >>>>> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreState(AbstractUdfStreamOperator.java:168) >>>>> at >>>>> org.apache.flink.streaming.runtime.tasks.StreamTask.restoreState(StreamTask.java:449) >>>>> ... 3 more >>>>> Caused by: java.lang.RuntimeException: No key available. >>>>> at >>>>> org.apache.flink.runtime.state.memory.MemValueState.update(MemValueState.java:69) >>>>> at ....Function which has the Checkpointed thingy >>>>> (CheckpointedIncrAddrPositions.scala:68) >>>>> >>>>> What am I missing? >>>>> >>>>> Thanks >>>>> Simon >>>>> >>> > |
Hi, right now, the way to do it is by using a custom operator, i.e. a OneInputStreamOperator. There you have the low-level control and can set timers based on watermarks or processing time. You can, for example look at StreamMap for a very simple operator or WindowOperator for an operator that does timers both on event time and processing time. In the future, we will also provide an API on states that allows to specify a time-to-live. That should solve your case. Cheers, Aljoscha On Thu, 2 Jun 2016 at 11:40 simon peyer <[hidden email]> wrote: Hi |
Free forum by Nabble | Edit this page |