Lot of data generated in out file


Lot of data generated in out file

Ashish Attarde
Hi Flink Team,

I am seeing that one of the out files on my task manager is dumping a lot of data.
I am not sure why this is happening. The data being dumped in the out file is exactly what the parsedInput stream should be receiving.



Here is the Flink program that is executing:

env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.setParallelism(1);

DataStream<String> rawInput = env.addSource(new FlinkKafkaConsumer010<>(
        "event-ft",
        new SimpleStringSchema(),
        kafkaProps).setStartFromLatest());

DataStream<String> input2 = rawInput
        .map(new KafkaMsgReads());

DataStream<EventRec> parsedInput = input2
        .flatMap(new Splitter())
        .assignTimestampsAndWatermarks(
                new BoundedOutOfOrdernessTimestampExtractor<EventRec>(Time.seconds(2)) {
                    @Override
                    public long extractTimestamp(EventRec record) {
                        return record.getmTimeStamp() / TT_SCALE_FACTOR;
                    }
                })
        .rebalance()
        .map(new RawInputCounter());

parsedInput
        .keyBy("mflowHashLSB", "mflowHashMSB")
        .window(SlidingEventTimeWindows.of(Time.milliseconds(1000), Time.milliseconds(950)))
        .allowedLateness(Time.seconds(1))
        .apply(new CRWindow());

parsedInput.writeUsingOutputFormat(new DiscardingOutputFormat<>());

env.execute();

Here is the definition of the CRWindow class:

public static class CRWindow implements WindowFunction<EventRec, FTFlow, Tuple, TimeWindow> {

    @Override
    public void apply(Tuple key, TimeWindow window, Iterable<EventRec> records, Collector<FTFlow> collector) {
        // Intentionally a no-op for now: the window contents are discarded.
        return;
    }
}

Also, is there any detailed documentation of the windowing mechanism available? I am interested in using windowing with the ability to push events from one window to a future window. Similar functionality exists in Storm for pushing an event to a subsequent window.

Thanks
-Ashish

Re: Lot of data generated in out file

Tzu-Li (Gordon) Tai
Hi Ashish,

I don't really see why there would be output in the out file for the program you
provided. Perhaps others can chime in here.

As for your second question regarding window outputs:
Yes, subsequent window operators should definitely be doable in Flink.
This is just a matter of multiple transformations in your pipeline.
The only restriction right now is that after a window operation the stream
is no longer a KeyedStream, so you would need to "re-key" the stream before
applying the second windowed transformation.
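
For example, here is a rough sketch of what I mean (the second key "someFlowKeyField", the window size, and SecondWindowFn are placeholders, not names from your program):

DataStream<FTFlow> flows = parsedInput
        .keyBy("mflowHashLSB", "mflowHashMSB")
        .window(SlidingEventTimeWindows.of(Time.milliseconds(1000), Time.milliseconds(950)))
        .apply(new CRWindow());          // the result is a plain DataStream, no longer keyed

flows
        .keyBy("someFlowKeyField")       // re-key before the next windowed transformation
        .window(TumblingEventTimeWindows.of(Time.seconds(5)))
        .apply(new SecondWindowFn());    // a second WindowFunction over the re-keyed stream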

Cheers,
Gordon




Re: Lot of data generated in out file

Ashish Attarde
Thanks, Gordon, for your reply.

The out file mystery got resolved: someone accidentally modified the POJO code on the server that I work on and had put in a println.
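
(Anything written to standard out by the job, even a single println in the POJO, ends up in the TaskManager's .out file, since that is where the process's stdout is redirected. The snippet below is only illustrative, not the actual line that was added:)

public class EventRec {
    private final String raw;

    public EventRec(String raw) {
        // An accidental debug line like this is written to the TaskManager's
        // .out file for every record that is parsed.
        System.out.println("parsed record: " + raw);
        this.raw = raw;
    }
}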

Thank you for the information. I am experimenting with windowing to understand it better and fit it to my use case.

Thanks
-Ashish


--

Thanks
-Ashish Attarde