Job Statistics

Jean Bez
Hello,

Is it possible to view a job's statistics directly on the command line after it has finished executing? If so, could you please explain how? I could not find any mention of this in the docs. I also tried setting the logs to debug mode, but no additional information was shown.

Thank you!

Regards,
Jean

Re: Job Statistics

Matthias J. Sax
Hi,

the CLI cannot show any job statistics. However, you can use the
JobManager web interface that is accessible at port 8081 from a browser.

-Matthias


Re: Job Statistics

Fabian Hueske
Hi Jean,

what kind of job execution stats are you interested in?

Cheers, Fabian

Re: Job Statistics

Jean Bez

Hi Fabian,

I am trying to compare some examples on Hadoop, Spark and Flink. If possible, I would like to see job statistics like the report that Hadoop prints. Since I am running these examples on a large cluster, it would be much better if I could obtain such data directly from the console.

Thanks!
Jean

Re: Job Statistics

Maximilian Michels
Hi Jean,

I think it would be a nice-to-have feature to display some metrics on the command line after a job has completed. We already have the run time and the accumulator results available at the CLI, and printing those would be easy. Which metrics in particular are you looking for?

Best,
Max

Re: Job Statistics

Jean Bez
Hi Maximilian,

The metrics I am interested in are I/O, run time and communication. Could you please provide an example of how to obtain such results?

Thank you!!

Re: Job Statistics

Maximilian Michels
Hi Jean,

As I said, only the run time is currently available. You can print the run time and the accumulator results to stdout by retrieving the JobExecutionResult from the ExecutionEnvironment:

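import java.util.Map;
import org.apache.flink.api.common.JobExecutionResult;

JobExecutionResult result = env.execute();
// getNetRuntime() reports the job's net runtime in milliseconds
System.out.println("runtime: " + result.getNetRuntime());
// Print every accumulator that the job registered
for (Map.Entry<String, Object> entry : result.getAllAccumulatorResults().entrySet()) {
    System.out.println(entry.getKey() + ": " + entry.getValue());
}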

You would do that in your Flink program. You could also store metrics in the accumulators. However, since you're trying to compare different systems I'd advise you to use some external tools for monitoring resource usage like Ganglia or collectd.

Best,
Max
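
For a Hadoop-style report on the console, the run time and accumulator results mentioned above can be formatted together. The following is only a minimal sketch: `JobReport` and `format` are hypothetical names that are not part of Flink, and the sample values in `main` are made up. It assumes you pass in the values returned by `result.getNetRuntime()` and `result.getAllAccumulatorResults()` from your own program.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper (not part of Flink): formats job results as a plain-text report. */
final class JobReport {

    /**
     * @param netRuntimeMillis assumed to come from JobExecutionResult#getNetRuntime()
     * @param accumulators     assumed to come from JobExecutionResult#getAllAccumulatorResults()
     */
    static String format(long netRuntimeMillis, Map<String, Object> accumulators) {
        StringBuilder sb = new StringBuilder();
        sb.append("Job Counters\n");
        sb.append("    runtime: ").append(netRuntimeMillis).append(" ms\n");
        // Copy into a TreeMap so accumulator names are printed in sorted order.
        for (Map.Entry<String, Object> e : new TreeMap<>(accumulators).entrySet()) {
            sb.append("    ").append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values standing in for a real JobExecutionResult.
        Map<String, Object> sample = new TreeMap<>();
        sample.put("records-read", 1200L);
        sample.put("bytes-written", 48000L);
        System.out.print(format(3500L, sample));
    }
}
```

In a Flink program you would call `System.out.print(JobReport.format(result.getNetRuntime(), result.getAllAccumulatorResults()));` after `env.execute()`. Sorting the names keeps the line order stable across runs, which makes reports from different runs easier to compare.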

Re: Job Statistics

Jean Bez
Hello Max,

I will try that! Do you know if I could obtain data about I/O and communication as well? From what I understand, I can get only the run time and the accumulator results. Is that right?

Re: Job Statistics

Jean Bez
Hi,

I tried viewing the statistics directly in the web interface, but I could not find any further information about the completed jobs. I can see the list of jobs, but when I open one, no further details are shown. Is this expected?

Re: Job Statistics

Stephan Ewen
Hi!

No I/O or record statistics are collected at the moment; that is work in progress. A new web frontend that visualizes them is also in the works, so this is going to improve soon, but for now there is no easy way to grab those numbers.

If you are interested in contributing, I could pull you into some of the discussions about collecting and reporting metrics.

Greetings,
Stephan

