Call batch job in streaming context?

Call batch job in streaming context?

eric hoffmann
Hi
Is it possible to call a batch job from a streaming context?
What I want to do is:
for a given input event, fetch Cassandra elements based on the event data, apply a transformation to them, and apply a ranking once all the elements fetched from Cassandra are processed.
If I do this in batch mode I would have to submit a job for each event, and I can have an event every 45 seconds.
Is there any alternative? Can I start a batch job that receives an external request, processes it, and waits for the next one?
thx
Eric

Re: Call batch job in streaming context?

Piotr Nowojski
Hi,

I’m not sure that I understand your problem and your context, but spawning a batch job every 45 seconds doesn’t sound like that bad an idea (as long as the job is short).

Another idea would be to incorporate this batch job inside your streaming job, for example by reading from Cassandra using an AsyncIO operator:
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/asyncio.html

A quick Google search revealed, for example, this:

https://stackoverflow.com/questions/43067681/read-data-from-cassandra-for-processing-in-flink
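
To make the AsyncIO idea concrete, here is a minimal sketch of such an operator, assuming the DataStax Java driver and events keyed by a String; the contact point, keyspace, table and query are placeholders, not something from this thread:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import java.util.Collections;
import java.util.List;

// For each event key, fetch the matching rows from Cassandra without
// blocking the task thread, then transform and rank them in one shot.
public class CassandraRankingFunction extends RichAsyncFunction<String, List<Row>> {

    private transient Cluster cluster;
    private transient Session session;

    @Override
    public void open(Configuration parameters) {
        cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        session = cluster.connect("my_keyspace"); // placeholder keyspace
    }

    @Override
    public void asyncInvoke(String eventKey, ResultFuture<List<Row>> resultFuture) {
        // executeAsync returns a Guava ListenableFuture, so nothing blocks here.
        ResultSetFuture future = session.executeAsync(
                "SELECT * FROM elements WHERE key = ?", eventKey);

        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet rs) {
                // All rows for this event are available: transform + rank them.
                List<Row> ranked = rankAndTransform(rs.all());
                resultFuture.complete(Collections.singleton(ranked));
            }

            @Override
            public void onFailure(Throwable t) {
                resultFuture.completeExceptionally(t);
            }
        }, MoreExecutors.directExecutor());
    }

    @Override
    public void close() {
        if (cluster != null) {
            cluster.close();
        }
    }

    private List<Row> rankAndTransform(List<Row> rows) {
        return rows; // placeholder for the actual transformation/ranking
    }
}

It could then be wired into the pipeline like this, where eventKeys is the DataStream<String> of incoming event keys (the timeout and capacity values are arbitrary):

DataStream<List<Row>> ranked = AsyncDataStream.unorderedWait(
        eventKeys, new CassandraRankingFunction(), 30, TimeUnit.SECONDS, 100);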

Piotrek



Re: Call batch job in streaming context?

bastien dine
Hi Eric,

You can run a job from another one using the REST API.
This is the only way we have found to launch a batch job from a streaming job.
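
For illustration, a minimal sketch of that approach with plain JDK HTTP, assuming the JobManager's REST endpoint is reachable on port 8081 and the batch job jar was already uploaded via POST /jars/upload (the host and jar id below are placeholders):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class BatchJobTrigger {

    // Submits a run of an already-uploaded jar through the Flink REST API:
    // POST /jars/:jarid/run, passing program arguments as a JSON body.
    public static int triggerBatchJob(String jarId, String programArgs) throws Exception {
        URL url = new URL("http://jobmanager-host:8081/jars/" + jarId + "/run");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        String body = "{\"programArgs\": \"" + programArgs + "\"}";
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode(); // 200 means the job was accepted
    }
}

The streaming job can call triggerBatchJob from e.g. a sink or a ProcessFunction whenever an event should kick off a batch run.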

------------------

Bastien DINE
Data Architect / Software Engineer / Sysadmin
bastiendine.io

