Hi to all,
I have a (batch) job that writes to one or more sinks. Is there a way to retrieve, once the job has terminated, the number of records written to each sink? Is there any better way than using an accumulator for each sink? If that is the only way to do it, the Sink API could be enriched to automatically create an accumulator when required. E.g.

    dataset.output(JDBCOutputFormat.buildJDBCOutputFormat()
        .setDrivername(...)
        .setDBUrl(...)
        .setQuery(...)
        .addRecordsCountAccumulator("some-name")
        .finish())

Best,
Flavio
Do you want to know how many records the sink received, or how many the sink wrote to the DB?
If it's the former, you're in luck because we measure that already; check out the metrics documentation. If it's the latter, then this issue is essentially covered by FLINK-7286, which aims at allowing functions to modify the numRecordsIn/numRecordsOut counts.
On 14.02.2018 12:22, Flavio Pompermaier wrote:
Actually, I'd like to get this number from my Java class in order to update some external dataset "catalog", so I'm asking if there's some programmatic way to access this info (from JobExecutionResult, for example).
On Wed, Feb 14, 2018 at 12:25 PM, Chesnay Schepler <[hidden email]> wrote:
Flavio Pompermaier Development Department OKKAM S.r.l. Tel. +(39) 0461 041809
The only way to access this info from the client is the REST API or the Metrics REST API.
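For readers who want to try the REST route from a Java client, here is a minimal sketch. It assumes the JobManager REST endpoint is reachable at a base URL such as http://localhost:8081 (an assumption, not stated in this thread), that the job id comes from JobExecutionResult.getJobID(), and that the job-details resource is /jobs/<jobid>. The class and method names (JobDetailsClient, jobDetailsUri, fetchJobDetails) are made up for this illustration, and the JSON is returned unparsed because the thread does not spell out its fields.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of Flink: fetches the job-details JSON over REST.
public class JobDetailsClient {

    // Builds the job-details URI; baseUrl is an assumed JobManager REST address.
    static URI jobDetailsUri(String baseUrl, String jobId) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return URI.create(trimmed + "/jobs/" + jobId);
    }

    // Sends a GET request and returns the raw JSON body of the response.
    static String fetchJobDetails(String baseUrl, String jobId)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(jobDetailsUri(baseUrl, jobId))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: base URL of the REST endpoint, args[1]: job id
        System.out.println(fetchJobDetails(args[0], args[1]));
    }
}
```

The returned JSON still has to be parsed with a JSON library of your choice to pick out the vertex that contains the sink.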
On 14.02.2018 12:38, Flavio Pompermaier wrote:
The problem here is that I don't know the vertex id of the sink. Would it be possible to access the sink info by id? And couldn't all that info be attached to the JobExecutionResult (avoiding the need to set up the REST connection etc.)?
On Wed, Feb 14, 2018 at 12:44 PM, Chesnay Schepler <[hidden email]> wrote:
Technically yes, a subset of metrics is stored in the ExecutionGraph when the job finishes. (This is, for example, where the web UI gets its values for finished jobs.) However, these are on the task level and will not contain the number of incoming records if your sink is chained to another operator. Changing this would be a larger endeavor, and tbh I don't see this happening soon.
I'm afraid for now you're stuck with the REST API for finished jobs. (Correction to my previous mail: the metrics REST API cannot be used for finished jobs.)
Alternatively, if you would rather work with files/JSON, you can enable job archiving by configuring the jobmanager.archive.fs.dir directory. When a job finishes, this directory will contain a big JSON file for each job, containing all responses that the UI would return for finished jobs.
On 14.02.2018 12:50, Flavio Pompermaier wrote:
So, if I'm not wrong, the right way to do this is using accumulators. What do you think about my proposal to add an easy way to attach an accumulator to a sink for the written/output records?
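The per-sink accumulator idea amounts to wrapping the output in something that counts records. Below is a minimal plain-Java sketch of that pattern with no Flink dependency. RecordWriter and CountingWriter are hypothetical names invented for this illustration, not Flink interfaces. In an actual Flink job the counter would presumably be a LongCounter registered through getRuntimeContext().addAccumulator(...) in a rich output format and read back on the client with JobExecutionResult.getAccumulatorResult(...).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a sink; NOT a Flink interface.
interface RecordWriter<T> {
    void write(T record);
}

// Decorator that counts every record successfully handed to the wrapped writer.
final class CountingWriter<T> implements RecordWriter<T> {
    private final RecordWriter<T> delegate;
    private final AtomicLong written = new AtomicLong();

    CountingWriter(RecordWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(T record) {
        delegate.write(record);      // if this throws, the record is not counted
        written.incrementAndGet();
    }

    long getWrittenCount() {
        return written.get();
    }
}

public class CountingWriterDemo {
    public static void main(String[] args) {
        List<String> target = new ArrayList<>();   // plays the role of the database
        CountingWriter<String> sink = new CountingWriter<>(target::add);

        for (String record : List.of("a", "b", "c")) {
            sink.write(record);
        }
        System.out.println("records written: " + sink.getWrittenCount());
    }
}
```

Counting after the delegate call means the number reflects records the sink actually wrote rather than records it merely received, which is the distinction raised earlier in the thread.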
On Wed, Feb 14, 2018 at 1:08 PM, Chesnay Schepler <[hidden email]> wrote:
Hi Flavio,
Not sure if I would add this functionality to the sinks.
2018-02-14 14:11 GMT+01:00 Flavio Pompermaier <[hidden email]>:
Hi Fabian,
thanks for the feedback. Right now I'm doing exactly as you said. Since I saw this as a useful API extension, I proposed the addition and asked for feedback. Of course, it doesn't make much sense if I'm the only one asking for it :)
Best,
Flavio
On Mon, Feb 19, 2018 at 10:31 AM, Fabian Hueske <[hidden email]> wrote: