Python vs Scala - Performance

Python vs Scala - Performance

Maximilian Alber
Hi Flinksters,

we recently had a discussion in our working group about which language we should use with Flink. To get to the point: most people would like to use Python because they are familiar with it and there is a nice scientific stack to, e.g., print and analyse the results. But our concern is that Python is far less efficient than Scala.

Is that true? If so, is there an estimate of the penalty?
For a better understanding, it would be nice if someone could describe how Python fits into the Java/Scala environment of Flink!

Thank you!
Cheers,
Max

Re: Python vs Scala - Performance

hawin
Hi Max,

I think you will have to learn Java or Scala if you want to use Flink in your project.

The picture below is from Marton's slides. Hopefully it is a good reference for you.


Inline image 1


Best regards
Hawin

Re: Python vs Scala - Performance

Maximilian Alber
Thank you very much! Good to know that it is built on the Java API.

I would still be interested to know whether using Python instead of Java has a "serious" impact, or whether in the long run the runtime is amortized to the same level?
I'm aware that there is no definitive answer, but I would still appreciate your opinions!

Cheers,
Max

Re: Python vs Scala - Performance

Fabian Hueske-2
Hi Max,

The Python API is still in an early beta state and, as mentioned before, builds on the Java API. All framework processing (sorting, joining, data shipping, etc.) is done in Java. Whenever a Python user function needs to be called, the data is handed to an external Python process and the results are later received back. The Python API tries to reduce the number of switches between Java and Python, but the cost is still quite high.

You can expect a slowdown of at least 2x, probably more. Of course, that depends on your job.
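
To picture why the number of switches matters, here is a minimal sketch in plain Python. It is not Flink's actual implementation: the worker script, the JSON-over-pipes protocol, and the name `run_with_batches` are all assumptions made for illustration. A host process hands records to an external Python process and reads the results back, and the only thing that changes is how many records travel per round trip.

```python
import json
import subprocess
import sys

# Hypothetical worker: reads one JSON list per line, doubles each value,
# and writes one JSON list back. It stands in for a Python user function.
WORKER = (
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    values = json.loads(line)\n"
    "    sys.stdout.write(json.dumps([v * 2 for v in values]) + '\\n')\n"
    "    sys.stdout.flush()\n"
)


def run_with_batches(records, batch_size):
    """Send records to an external Python process in batches.

    Returns the results and the number of round trips (switches)
    between the host and the external process.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    results, round_trips = [], 0
    try:
        for start in range(0, len(records), batch_size):
            batch = records[start:start + batch_size]
            # One write plus one read is one switch to the worker and back.
            proc.stdin.write(json.dumps(batch) + "\n")
            proc.stdin.flush()
            results.extend(json.loads(proc.stdout.readline()))
            round_trips += 1
    finally:
        proc.stdin.close()  # end of input lets the worker loop finish
        proc.wait()
        proc.stdout.close()
    return results, round_trips


if __name__ == "__main__":
    data = list(range(1000))
    _, per_record = run_with_batches(data, batch_size=1)
    _, batched = run_with_batches(data, batch_size=100)
    print(per_record, batched)  # 1000 10
```

Both calls do the same work, but the batched one crosses the process boundary 10 times instead of 1000. Each crossing still pays for serialization and pipe I/O, which is the kind of overhead that remains even when switches are reduced.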

Cheers, Fabian

Re: Python vs Scala - Performance

Maximilian Alber
OK, thanks. That is what I wanted (not) to hear :-)

Cheers,
Max
