Problem with Windowing

Rico Bergmann
Hi!

I have a problem that I cannot really track down. I'll try to describe
the issue.

My streaming Flink program computes something. At the end I'm doing the
following on my DataStream ds:

    ds.window(2, TimeUnit.SECONDS)
      .groupBy(/* custom KeySelector converting input to a String representation */)
      .mapWindow(/* TypeConversion */)
      .flatten()

Then the result is written to a Kafka topic.

The purpose of this is output deduplication within a 2-second window...

Without the above the program works fine. But with the above I don't get
any output and no error appears in the log. The program keeps running.
Am I doing something wrong?

I would be grateful for any help!

Cheers, Rico.

Re: Problem with Windowing

Matthias J. Sax
Can you post your whole program (both versions if possible)?

Otherwise I have only a wild guess: A common mistake is not to assign
the stream variable properly:

DataStream ds = ...

ds = ds.APPLY_FUNCTIONS

ds.APPLY_MORE_FUNCTIONS

In your code example, the assignment is missing -- but maybe it is just
missing in your email.
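
To illustrate the kind of mistake meant here, a minimal sketch in plain Java. The Pipeline class and its apply method are hypothetical stand-ins invented for this example, not Flink's DataStream API; the only assumption carried over is that each transformation returns a new handle and leaves the receiver unchanged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AssignmentDemo {
    /** Hypothetical stand-in for a stream handle: every call returns a NEW object. */
    static final class Pipeline {
        private final List<String> steps;

        Pipeline(List<String> steps) {
            this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
        }

        Pipeline apply(String step) {
            List<String> extended = new ArrayList<>(steps);
            extended.add(step);
            return new Pipeline(extended); // the receiver is left unchanged
        }

        List<String> steps() {
            return steps;
        }
    }

    public static void main(String[] args) {
        Pipeline ds = new Pipeline(new ArrayList<>()).apply("compute");

        ds.apply("window");                    // result discarded: ds still ends at "compute"
        System.out.println(ds.steps());        // prints [compute]

        Pipeline output = ds.apply("window");  // result kept in a new variable
        System.out.println(output.steps());    // prints [compute, window]
    }
}
```

Anything attached to ds after the discarded call (a sink, for example) would see only the "compute" step in this toy model, which is why the assignment matters.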

-Matthias


On 08/31/2015 04:38 PM, Dipl.-Inf. Rico Bergmann wrote:

> Hi!
>
> I have a problem that I cannot really track down. I'll try to describe
> the issue.
>
> My streaming Flink program computes something. At the end I'm doing the
> following on my DataStream ds:
> ds.window(2, TimeUnit.SECONDS)
> .groupBy(/* custom KeySelector converting input to a String representation */)
> .mapWindow(/* TypeConversion */)
> .flatten()
>
> Then the result is written to a Kafka topic.
>
> The purpose of this is output deduplication within a 2-second window...
>
> Without the above the program works fine. But with the above I don't get
> any output and no error appears in the log. The program keeps running.
> Am I doing something wrong?
>
> I would be happy for help!
>
> Cheers, Rico.


Re: Problem with Windowing

Rico Bergmann
That part is exactly as I wrote it. ds is assigned a data flow that computes some stuff. Then the deduplication code from my first mail is assigned to a new variable called output, and output.addSink(.) is called on it.


> On 31.08.2015 at 17:45, Matthias J. Sax <[hidden email]> wrote:
>
> Can you post your whole program (both versions if possible)?
>
> Otherwise I have only a wild guess: A common mistake is not to assign
> the stream variable properly:
>
> DataStream ds = ...
>
> ds = ds.APPLY_FUNCTIONS
>
> ds.APPLY_MORE_FUNCTIONS
>
> In your code example, the assignment is missing -- but maybe it is just
> missing in your email.
>
> -Matthias

Re: Problem with Windowing

Matthias J. Sax
Maybe you could include some log statements in your user code to see
which parts of the program receive data and which do not. That should
help narrow down the problematic part.
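
One way to do this, sketched in plain Java rather than against the Flink API: wrap each user function in a pass-through that counts and logs what it receives. The names StageProbe and logged are made up for this example, and in a real job the wrapper would have to be adapted to whatever function interface the stage uses.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.logging.Logger;

public class StageProbe {
    private static final Logger LOG = Logger.getLogger(StageProbe.class.getName());

    /** Wraps a stage so that every element it receives is counted and logged. */
    static <I, O> Function<I, O> logged(String stageName, AtomicLong counter, Function<I, O> stage) {
        return input -> {
            long seen = counter.incrementAndGet();
            LOG.info(stageName + " received element #" + seen + ": " + input);
            return stage.apply(input); // behaviour of the wrapped stage is unchanged
        };
    }

    public static void main(String[] args) {
        AtomicLong beforeWindow = new AtomicLong();
        Function<String, String> probe = logged("before-window", beforeWindow, Function.identity());

        probe.apply("a");
        probe.apply("b");
        System.out.println("before-window saw " + beforeWindow.get() + " elements");
    }
}
```

Placing one probe before the windowing step and one after it would show whether elements reach the window at all or get stuck inside it.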

On 08/31/2015 06:03 PM, Rico Bergmann wrote:

> That part is exactly as I wrote it. ds is assigned a data flow that computes some stuff. Then the deduplication code from my first mail is assigned to a new variable called output, and output.addSink(.) is called on it.

Re: Problem with Windowing

Stephan Ewen
Hey Rico!

Parts of the "global windows" are still not super stable, and we are heavily reworking them for the 0.10 release.

What you can try is reversing the order of the "window" and "groupBy" statements. If you group before windowing, you get local windows; if you window before grouping, you get global windows. Local windows work better.
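
As a toy model of the group-first order, here is a sketch in plain Java, not the Flink API. The Event class, the deduplicate method, and the choice of the payload as the grouping key are all assumptions made for illustration. It only shows the logical result of keeping one element per key and 2-second window; it says nothing about how Flink executes local or global windows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LocalWindowDedup {
    /** Hypothetical event: a payload plus a non-negative arrival time in milliseconds. */
    static final class Event {
        final String payload;
        final long timestampMillis;

        Event(String payload, long timestampMillis) {
            this.payload = payload;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Groups by key first, then keeps one element per (key, tumbling window) pair. */
    static List<String> deduplicate(List<Event> events, long windowMillis) {
        Map<String, Set<Long>> windowsPerKey = new LinkedHashMap<>();
        List<String> output = new ArrayList<>();
        for (Event e : events) {
            // Step 1: group. The payload itself serves as the key here.
            Set<Long> seenWindows =
                    windowsPerKey.computeIfAbsent(e.payload, k -> new LinkedHashSet<>());
            // Step 2: window per key. Index of the tumbling window the event falls into.
            long windowIndex = e.timestampMillis / windowMillis;
            if (seenWindows.add(windowIndex)) {
                output.add(e.payload + "@window" + windowIndex);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event("a", 100));
        events.add(new Event("a", 900));   // duplicate of "a" within window 0
        events.add(new Event("b", 1500));
        events.add(new Event("a", 2100));  // same payload, but in the next window
        System.out.println(deduplicate(events, 2000)); // prints [a@window0, b@window0, a@window1]
    }
}
```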

Greetings,
Stephan


On Mon, Aug 31, 2015 at 6:40 PM, Matthias J. Sax <[hidden email]> wrote:
> Maybe you could include some log statements in your user code to see
> which parts of the program receive data and which do not. That should
> help narrow down the problematic part.


Re: Problem with Windowing

Rico Bergmann
Hi Stephan,

Thanks for the advice. It works ...

Cheers, Rico.



On 31.08.2015 at 20:14, Stephan Ewen <[hidden email]> wrote:

> Hey Rico!
>
> Parts of the "global windows" are still not super stable, and we are heavily reworking them for the 0.10 release.
>
> What you can try is reversing the order of the "window" and "groupBy" statements. If you group before windowing, you get local windows; if you window before grouping, you get global windows. Local windows work better.
>
> Greetings,
> Stephan

