What is the difference between map and process on a DataStream?

4 messages

What is the difference between map and process on a DataStream?

kant kodali
What is the difference between map and process on a DataStream? They look very similar.

Thanks!


Re: What is the difference between map and process on a DataStream?

David Anderson
Map applies a MapFunction (or a RichMapFunction) to a DataStream and does a one-to-one transformation of the stream elements.
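
For illustration only, here is a dependency-free Java model of that one-to-one contract. It is not Flink's API. MapModel and applyMap are hypothetical names, and in a real job the function would be a MapFunction passed to DataStream#map. The point it shows is that the function is called once per element and its return value is the single output for that element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, dependency-free model of map semantics (not Flink's API):
// the function is called once per element, and its return value is the
// single output for that element.
public class MapModel {
    static <I, O> List<O> applyMap(List<I> input, Function<I, O> mapper) {
        List<O> out = new ArrayList<>(input.size());
        for (I element : input) {
            out.add(mapper.apply(element)); // exactly one output per input
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = List.of("flink", "map", "process");
        // Comparable in spirit to a MapFunction<String, Integer> returning value.length()
        List<Integer> lengths = applyMap(words, String::length);
        System.out.println(lengths); // [5, 3, 7]
    }
}
```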

Process applies a ProcessFunction, which can produce zero, one, or many events in response to each event. When used on a keyed stream, a KeyedProcessFunction can also use timers to defer actions until later, based either on watermarks (event time) or on the system's wall-clock time (processing time). A ProcessFunction can also emit records to side outputs.
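
The same kind of plain-Java model, again not Flink's classes, can show the zero/one/many contract and side outputs together. SimpleCollector and SimpleContext are hypothetical stand-ins for Flink's Collector and the ProcessFunction context. In Flink itself, a side output is addressed by an OutputTag rather than by the plain string name assumed here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of ProcessFunction-style semantics
// (not Flink's API): each element may produce zero, one, or many main
// outputs, plus records routed to named side outputs.
public class ProcessModel {
    // Stand-in for Flink's Collector: emit() may be called any number of times.
    public interface SimpleCollector<O> { void emit(O value); }

    // Stand-in for the per-element context; here it only offers side outputs.
    public interface SimpleContext { void output(String sideOutputName, Object value); }

    public interface ProcessFn<I, O> {
        void processElement(I value, SimpleContext ctx, SimpleCollector<O> out);
    }

    // Main-stream results plus side-output results, keyed by side-output name.
    public static final class Result<O> {
        public final List<O> main = new ArrayList<>();
        public final Map<String, List<Object>> side = new HashMap<>();
    }

    public static <I, O> Result<O> run(List<I> input, ProcessFn<I, O> fn) {
        Result<O> result = new Result<>();
        SimpleCollector<O> out = result.main::add;
        SimpleContext ctx = (name, value) ->
                result.side.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        for (I element : input) {
            fn.processElement(element, ctx, out);
        }
        return result;
    }

    public static void main(String[] args) {
        ProcessFn<String, String> splitter = (line, ctx, out) -> {
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) {
                    continue;                       // blank input: no output at all
                }
                if (word.length() > 6) {
                    ctx.output("long-words", word); // routed to a side output
                } else {
                    out.emit(word);                 // main output
                }
            }
        };
        Result<String> r = run(List.of("to be or", "", "streaming"), splitter);
        System.out.println(r.main); // [to, be, or]
        System.out.println(r.side); // {long-words=[streaming]}
    }
}
```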

Both RichMapFunctions and KeyedProcessFunctions can use keyed state (a RichMapFunction only when it is applied to a keyed stream).
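
Keyed state and event-time timers can be made concrete with a single-threaded plain-Java model of what a KeyedProcessFunction might do. It counts events per key, registers a timer on a key's first event, and emits the count once the watermark reaches that timer. All names and the fixed delay are hypothetical. In Flink the count would live in something like a ValueState, and the timer would be registered through the context's timer service and delivered to onTimer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-threaded model of a keyed process function
// (not Flink's API). State is scoped per key, and an event-time timer
// fires once the watermark reaches the timer's timestamp.
public class KeyedTimerModel {
    // Per-key "value state": each key only ever sees its own counter.
    private final Map<String, Integer> countState = new HashMap<>();
    // Pending event-time timers: timestamp -> keys that registered one there.
    private final TreeMap<Long, List<String>> timers = new TreeMap<>();
    private final List<String> output = new ArrayList<>();
    private final long delay;

    KeyedTimerModel(long delay) {
        this.delay = delay;
    }

    // Like processElement: update this key's state and defer the emit to a timer.
    void processElement(String key, long eventTime) {
        Integer previous = countState.get(key);
        countState.put(key, previous == null ? 1 : previous + 1);
        if (previous == null) {
            // First event for this key since its last flush: schedule one.
            timers.computeIfAbsent(eventTime + delay, t -> new ArrayList<>()).add(key);
        }
    }

    // Like watermark progress: fire every timer whose timestamp <= watermark.
    void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.firstKey() <= watermark) {
            Map.Entry<Long, List<String>> due = timers.pollFirstEntry();
            for (String key : due.getValue()) {
                onTimer(due.getKey(), key);
            }
        }
    }

    // Like onTimer: emit the key's result, then clear that key's state.
    private void onTimer(long timestamp, String key) {
        output.add(key + "=" + countState.remove(key) + "@" + timestamp);
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        KeyedTimerModel fn = new KeyedTimerModel(10);
        fn.processElement("a", 1);  // timer for "a" at 11
        fn.processElement("b", 2);  // timer for "b" at 12
        fn.processElement("a", 5);  // state only, no new timer
        fn.advanceWatermark(11);    // fires "a"
        fn.processElement("b", 12); // state only, timer already pending
        fn.advanceWatermark(20);    // fires "b"
        System.out.println(fn.output()); // [a=2@11, b=2@12]
    }
}
```

Deferring the emit to a timer, rather than emitting on every element, is the kind of behavior map cannot express, because map has to return its single result immediately.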

Process is strictly more powerful: there's nothing you can do with map that you couldn't do with process instead. The same is true of flatMap, which is similar to map but is given a Collector that it can use to emit zero, one, or many events in response to each event, just like a process function.
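
The "strictly more powerful" point can be shown with the same kind of plain-Java model. The names are hypothetical and this is not Flink's API. Once a function is handed a collector, map is just the special case that emits exactly once per element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical, dependency-free model (not Flink's API) of why flatMap and
// process subsume map: a function that pushes results into a collector can
// emulate a one-to-one function by emitting exactly once per element.
public class FlatMapModel {
    // flatMap contract: fn may call its collector zero, one, or many times.
    static <I, O> List<O> applyFlatMap(List<I> input, BiConsumer<I, Consumer<O>> fn) {
        List<O> results = new ArrayList<>();
        for (I element : input) {
            fn.accept(element, results::add);
        }
        return results;
    }

    // map expressed through flatMap: emit the mapped value exactly once.
    static <I, O> List<O> mapViaFlatMap(List<I> input, Function<I, O> mapper) {
        return applyFlatMap(input, (element, collector) -> collector.accept(mapper.apply(element)));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a b", "", "c");
        // Two outputs for "a b", none for the empty line, one for "c".
        List<String> words = applyFlatMap(lines, (line, collector) -> {
            for (String w : line.split(" ")) {
                if (!w.isEmpty()) {
                    collector.accept(w);
                }
            }
        });
        System.out.println(words);                                // [a, b, c]
        System.out.println(mapViaFlatMap(lines, String::length)); // [3, 0, 1]
    }
}
```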

David


On Tue, Mar 17, 2020 at 11:50 AM kant kodali <[hidden email]> wrote:
what is the difference between map vs process on a datastream? they look very similar.

Thanks!


Re: What is the difference between map and process on a DataStream?

kant kodali
Got it, thanks a lot for that! So there is no difference between flatMap and process, then?

In reply to David Anderson, Tue, Mar 17, 2020 at 5:29 AM.

Re: What is the difference between map and process on a DataStream?

Tzu-Li (Gordon) Tai
Hi,

As David already explained, they are similar in that both process and flatMap
functions may output zero, one, or multiple records per input record.

However, ProcessFunctions also expose much more powerful functionality to the
user, such as registering timers and emitting records to side outputs, among
other things.

Cheers,
Gordon


--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/