Flink CEP with files and no streams?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink CEP with files and no streams?

Esa Heikkinen

Hello

 

I am trying to use CEP of Flink for log files (as batch job), but not for streams (as realtime).

Is that possible ? If yes, do you know examples Scala codes about that ?

 

Or should I convert the log files (with time stamps) into streams ?

But how to handle time stamps in Flink ?

 

If I can not use Flink at all for this purpose, do you have any recommendations of other tools ?

 

I would want CEP type analysis for log files.

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Flink CEP with files and no streams?

Fabian Hueske-2
Hi Esa,

you can also read files as a stream.
However, you have to be careful in which order you read the files and how you generate watermarks.
The easiest approach is to implement a non-parallel source function that reads the files in the right order and generates watermarks.
Things become more tricky when you try to read the files in parallel.

Best, Fabian

2018-02-07 9:40 GMT+01:00 Esa Heikkinen <[hidden email]>:

Hello

 

I am trying to use CEP of Flink for log files (as batch job), but not for streams (as realtime).

Is that possible ? If yes, do you know examples Scala codes about that ?

 

Or should I convert the log files (with time stamps) into streams ?

But how to handle time stamps in Flink ?

 

If I can not use Flink at all for this purpose, do you have any recommendations of other tools ?

 

I would want CEP type analysis for log files.

 

 


Reply | Threaded
Open this post in threaded view
|

RE: Flink CEP with files and no streams?

Esa Heikkinen

Hi

 

Thanks for the reply, but because I am a newbie with Flink, do you have any good Scala code examples about this ?

 

Esa

 

From: Fabian Hueske [mailto:[hidden email]]
Sent: Wednesday, February 7, 2018 11:21 AM
To: Esa Heikkinen <[hidden email]>
Cc: [hidden email]
Subject: Re: Flink CEP with files and no streams?

 

Hi Esa,

you can also read files as a stream.
However, you have to be careful in which order you read the files and how you generate watermarks.

The easiest approach is to implement a non-parallel source function that reads the files in the right order and generates watermarks.

Things become more tricky when you try to read the files in parallel.

Best, Fabian

 

2018-02-07 9:40 GMT+01:00 Esa Heikkinen <[hidden email]>:

Hello

 

I am trying to use CEP of Flink for log files (as batch job), but not for streams (as realtime).

Is that possible ? If yes, do you know examples Scala codes about that ?

 

Or should I convert the log files (with time stamps) into streams ?

But how to handle time stamps in Flink ?

 

If I can not use Flink at all for this purpose, do you have any recommendations of other tools ?

 

I would want CEP type analysis for log files.

 

 

 

Reply | Threaded
Open this post in threaded view
|

Re: Flink CEP with files and no streams?

Fabian Hueske-2
Hi,

I'm not aware of a good example but I can give you some pointers.

- Implement the SourceFunction interface. This function will not be executed in parallel, so you don't have to worry about parallelism.
- Since you said, you want to run it as a batch job, you might not need to implement checkpointing functionality
- In the run method, you open the file that you need to read. Start parsing the file, when you have a record, extract the timestamp and emit both by passing them to the SourceContext.
- Every n-th record, you can emit a watermark. The watermark timestamp must be smaller than all record that will be emitted in the future.

I'd start processing a single file and extending the source from there.

Hope this helps,
Fabian

2018-02-07 13:59 GMT+01:00 Esa Heikkinen <[hidden email]>:

Hi

 

Thanks for the reply, but because I am a newbie with Flink, do you have any good Scala code examples about this ?

 

Esa

 

From: Fabian Hueske [mailto:[hidden email]]
Sent: Wednesday, February 7, 2018 11:21 AM
To: Esa Heikkinen <[hidden email]>
Cc: [hidden email]
Subject: Re: Flink CEP with files and no streams?

 

Hi Esa,

you can also read files as a stream.
However, you have to be careful in which order you read the files and how you generate watermarks.

The easiest approach is to implement a non-parallel source function that reads the files in the right order and generates watermarks.

Things become more tricky when you try to read the files in parallel.

Best, Fabian

 

2018-02-07 9:40 GMT+01:00 Esa Heikkinen <[hidden email]>:

Hello

 

I am trying to use CEP of Flink for log files (as batch job), but not for streams (as realtime).

Is that possible ? If yes, do you know examples Scala codes about that ?

 

Or should I convert the log files (with time stamps) into streams ?

But how to handle time stamps in Flink ?

 

If I can not use Flink at all for this purpose, do you have any recommendations of other tools ?

 

I would want CEP type analysis for log files.