Updating external service and then processing response

Updating external service and then processing response

wazza
Hi all,

I need to send a request to an external web service and then store the response in a DB table, and I am wondering how people have approached this or similar problems in the past.

The flow is: Kafka source (messages arrive only every few seconds) => filter/map operators => result sent to web service (which updates state in that system) => response stored in DB.

Initially I was thinking of just creating a custom sink which basically: sends a request to the web service => gets a response containing an external key => saves the key into the DB.
This feels to me like smashing two separate sinks into one, and I am not sure whether that is a good design.
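To make the first option concrete, here is a minimal plain-Java sketch of that combined sink. Everything in it (the class name, `invoke`, the in-memory service and DB stubs) is hypothetical illustration, not Flink API; in a real job this logic would live in a Flink sink implementation with a real HTTP client and JDBC connection.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "combined" sink idea: one invoke() that
// calls the external service and then persists the returned key.
// The web service and the DB table are stubbed with in-memory state
// so the sketch runs standalone.
public class CombinedSinkSketch {

    // Stub for the external web service: returns an external key per request.
    static String callWebService(String payload) {
        return "ext-" + payload.hashCode();
    }

    // Stub for the DB table mapping our record id -> external key.
    static final Map<String, String> db = new HashMap<>();

    // The sink body: two responsibilities in one method, which is
    // exactly the single-responsibility concern raised above.
    static void invoke(String recordId, String payload) {
        String externalKey = callWebService(payload); // step 1: update external system
        db.put(recordId, externalKey);                // step 2: persist the response
    }

    public static void main(String[] args) {
        invoke("r1", "hello");
        System.out.println(db.get("r1"));
    }
}
```

Note that a failure between the two steps leaves the external system updated but the DB row missing, which is the recovery problem discussed below.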

Another option would be to create a RichMapFunction (possibly an async function) which does the web service call. My map function can then just return the response, which I can feed into a standard DB sink.
However, with this approach it feels strange to update an external system in a map() function, but maybe that's ok? Also, I presume that to make my map function idempotent I would need to store some state (I can key the messages and use a ValueState) so I don't make duplicate web service calls if there is a failure?
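The idempotent-map idea above can be sketched in plain Java like this. A `HashMap` stands in for the keyed ValueState so the sketch runs standalone; the class and method names are hypothetical, not Flink API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the idempotent-map idea: per-key state remembers
// the response, so a replayed message after a failure does not trigger a
// second web service call. In Flink this state would live in a keyed
// ValueState; here a plain HashMap stands in.
public class IdempotentMapSketch {
    private final Map<String, String> responseByKey = new HashMap<>();
    private int serviceCalls = 0; // counts real calls, to observe the dedup

    // Stub for the external call.
    private String callWebService(String key) {
        serviceCalls++;
        return "resp-" + key;
    }

    // map(): return the cached response on replay instead of calling again.
    public String map(String key) {
        String cached = responseByKey.get(key);
        if (cached != null) {
            return cached; // duplicate delivery: no second side effect
        }
        String resp = callWebService(key);
        responseByKey.put(key, resp);
        return resp;
    }

    public int getServiceCalls() { return serviceCalls; }

    public static void main(String[] args) {
        IdempotentMapSketch m = new IdempotentMapSketch();
        m.map("a");
        m.map("a"); // replayed message
        System.out.println(m.getServiceCalls()); // prints 1
    }
}
```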

Thoughts?

Thanks in advance.


Re: Updating external service and then processing response

Michael Latta
If the external web service call does not modify the state of that external system, all the approaches you list are probably fine. If there is external state modification, then you want to ensure that on restart the Flink job does not resend requests to that service, or that the service can handle duplicate requests.

In that sense, a sink that sends the request is the cleanest, as it represents an export of data to an external system; the response back just allows the sink to avoid repeating messages. If the request affects the external system's state and the response carries values that need to be recorded as results, not just control values, then that could be a separate flow, or you could use the map approach, since from Flink's point of view it is a transform. This latter case, however, smacks to me of an undesirable side effect, as side effects make the error recovery cases harder.
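One common way to get the "service can handle duplicate requests" property Michael describes is a deterministic idempotency key: derive the key from the record contents, so the same record replayed after a restart produces the same key and the service can ignore the repeat. A hedged plain-Java sketch (all names here are illustrative, not Flink or any real service's API):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a deterministic idempotency key derived from the
// record, plus service-side dedup keyed on it. Replaying the same record
// after a restart yields the same key, so the repeat is a no-op.
public class IdempotencyKeySketch {

    // Same record contents -> same key, on every replay.
    static String idempotencyKey(String record) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(record.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Stand-in for the service's state: first call applies the change,
    // repeats return the already-recorded result.
    static final Map<String, String> handled = new HashMap<>();

    static String handleRequest(String record) {
        String key = idempotencyKey(record);
        return handled.computeIfAbsent(key, k -> "applied:" + record);
    }

    public static void main(String[] args) {
        handleRequest("x");
        handleRequest("x"); // duplicate after a "restart"
        System.out.println(handled.size()); // prints 1
    }
}
```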

Michael

> On Apr 28, 2018, at 8:21 PM, wazza <[hidden email]> wrote:


Re: Updating external service and then processing response

wazza
Hi Michael,

Yeah, I feel that updating state in an external system is a job for a sink. However, I don't really like the idea of combining both a web service call and a DB write in the one sink class, because it breaks single responsibility; I feel there should be a nicer way to compose these types of flows. It also means I need to build smarts into, and manage state in, the sink, because there are several points of failure (the web service call and the DB update). NB: the response from the web service is not just for control, because there are values I need to record, including a correlation key for when the external system sends notifications to us in the following hours or days, at which point I need to perform an update in our system.

Which is what led me to thinking of the web service call as a map transform, with a DB sink connected to that operator. However, as you mention, this would be a map/transform function with side effects, so I would need to manage state to ensure duplicate web service calls aren't sent on failure restarts etc.

Maybe I am missing a better approach, but for now I am leaning towards doing the web service call in a (stateful) map function, with the result feeding into a DB sink.
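The end-to-end shape being leaned towards here (stateful map feeding a standard DB sink) can be sketched in plain Java. The Kafka source, service, and DB are all stubbed so it runs standalone; everything is a hypothetical stand-in for the corresponding Flink operators, not real API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical end-to-end shape of the approach described above:
// a stateful map (calls the service at most once per key) feeding a
// DB sink that just persists whatever the map emitted.
public class PipelineSketch {
    static final Map<String, String> seen = new HashMap<>(); // stand-in for keyed ValueState
    static final Map<String, String> db = new HashMap<>();   // stand-in for the DB sink

    // Stateful map: the external call happens only on first sight of a key.
    static String mapWithServiceCall(String key) {
        return seen.computeIfAbsent(key, k -> "resp-" + k);
    }

    // Standard sink: no smarts, just persists the map's output.
    static void dbSink(String key, String response) {
        db.put(key, response);
    }

    public static void main(String[] args) {
        // Simulated Kafka source, including a duplicate after a "restart".
        for (String key : List.of("a", "b", "a")) {
            dbSink(key, mapWithServiceCall(key));
        }
        System.out.println(db.size()); // prints 2
    }
}
```

The design point is that all the failure-handling state lives in one place (the map), leaving the sink dumb and reusable.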

Cheers


On Mon, Apr 30, 2018 at 2:34 AM, TechnoMage <[hidden email]> wrote: