Hi,
I am using the Flink 1.2 streaming API to process a clickstream and compute some results per cookie. The computed results are stored in Cassandra using the flink-cassandra connector. After a result is stored in Cassandra, I want to notify an external system (using their API or via Kafka) that the result is available (for one cookie).
Can this be done, with or without modifying the sink source code? What if I create a JointSink that internally uses the Cassandra sink and the Kafka sink and writes to both places? I am not worried about the same record being written multiple times, as both the computed result and the external system's consumption are idempotent.
Thank you,
Tarandeep
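For illustration, the JointSink idea from the question could look roughly like the sketch below. It is not taken from Flink: the SinkFunction interface here is a simplified local stand-in so the snippet compiles on its own, JointSink is a hypothetical name, and the two wrapped sinks are placeholders for the real Cassandra and Kafka sinks. It also assumes that invoke() on the first sink returns only once the write has completed; a sink that writes asynchronously would need extra care before notifying.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class JointSinkSketch {

    // Simplified stand-in for Flink's SinkFunction (assumption: same single method).
    interface SinkFunction<IN> extends Serializable {
        void invoke(IN value) throws Exception;
    }

    // Hypothetical sink that writes to a primary store, then notifies a second system.
    static class JointSink<IN> implements SinkFunction<IN> {
        private final SinkFunction<IN> primary;   // e.g. the Cassandra sink
        private final SinkFunction<IN> notifier;  // e.g. the Kafka sink

        JointSink(SinkFunction<IN> primary, SinkFunction<IN> notifier) {
            this.primary = primary;
            this.notifier = notifier;
        }

        @Override
        public void invoke(IN value) throws Exception {
            primary.invoke(value);   // if this throws, the notification is skipped
            notifier.invoke(value);  // may run again after a retry; consumers must be idempotent
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> log = new ArrayList<>();
        JointSink<String> sink = new JointSink<>(
                v -> log.add("store:" + v),
                v -> log.add("notify:" + v));
        sink.invoke("cookie-1");
        System.out.println(log);
    }
}
```

Because both writes happen in one invoke() call, a failure between them followed by a restart can repeat the store write, which is acceptable here only because both sides are idempotent.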
Hi,
this is the second time that something like this is being requested or proposed. This was the first time: [1].
+Seth, who might have an opinion on this.
I'm starting to think that we might need to generalise this pattern. Right now, the SinkFunction interface is this:
public interface SinkFunction<IN> extends Function, Serializable {
    /**
     * Function for standard sink behaviour. This function is called
     * for every record.
     */
    void invoke(IN value) throws Exception;
}
The interface for FlatMapFunction is this:
public interface FlatMapFunction<T, O> extends Function, Serializable {
    /**
     * The core method of the FlatMapFunction. Takes an element from the
     * input data set and transforms
     * it into zero, one, or more elements.
     */
    void flatMap(T value, Collector<O> out) throws Exception;
}
The only difference is naming and the fact that FlatMapFunction can emit elements. All SinkFunction implementations could be implemented as a FlatMapFunction<IN, Void>, so we might not even need a special SinkFunction in the end and all sinks can become "sinks that can also forward data".
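To make the "sinks that can also forward data" idea concrete, here is a minimal sketch of an adapter that turns a sink into a flat-map-style function. The three interfaces are simplified local stand-ins for the Flink ones quoted above (for instance, the stand-in Collector has only collect()), and ForwardingSink is a hypothetical name, not an existing Flink class. The sketch assumes invoke() returns only after the write has completed.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class ForwardingSinkSketch {

    // Simplified stand-ins for the Flink interfaces, so the sketch is self-contained.
    interface Collector<O> {
        void collect(O record);
    }

    interface SinkFunction<IN> extends Serializable {
        void invoke(IN value) throws Exception;
    }

    interface FlatMapFunction<T, O> extends Serializable {
        void flatMap(T value, Collector<O> out) throws Exception;
    }

    // Hypothetical adapter: runs the wrapped sink, then forwards the element downstream.
    static class ForwardingSink<T> implements FlatMapFunction<T, T> {
        private final SinkFunction<T> sink;

        ForwardingSink(SinkFunction<T> sink) {
            this.sink = sink;
        }

        @Override
        public void flatMap(T value, Collector<T> out) throws Exception {
            sink.invoke(value);  // write first; an exception here prevents forwarding
            out.collect(value);  // reached only after invoke() has returned
        }
    }

    // Pushes each input element through the adapter; returns what was forwarded.
    static List<String> run(List<String> input, List<String> store) throws Exception {
        List<String> forwarded = new ArrayList<>();
        ForwardingSink<String> op = new ForwardingSink<>(store::add);
        for (String element : input) {
            op.flatMap(element, forwarded::add);
        }
        return forwarded;
    }

    public static void main(String[] args) throws Exception {
        List<String> store = new ArrayList<>();
        List<String> forwarded = run(List.of("cookie-1", "cookie-2"), store);
        System.out.println("stored=" + store + " forwarded=" + forwarded);
    }
}
```

With this shape, a downstream operator (for example one that publishes a notification) sees an element only after the wrapped sink has accepted it, which is the ordering asked for in the original question.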
For an example of a system that does it like this, you can look at Apache Beam, where there are no special sink functions. Everything is just DoFns (basically a very powerful FlatMapFunction) strung together. For example, there is a file "sink" that consists of three DoFns: the first does some initialisation and sends write handles forward, the second does the writing and forwards handles to the written data, and the third finalises by renaming the written files to their final location.
Best,
Aljoscha
On Thu, Mar 9, 2017, at 16:56, Tarandeep Singh wrote: