Thanks a lot for the feedback for this survey. I will close it now since 6 days have passed without new activity.
To me it seems that we currently don't have many users who use flink-python or flink-streaming-python because of their limitations (mentioned in the survey by Xianda). This information might be useful when discussing Flink's future Python strategy and whether to continue supporting flink-python and flink-streaming-python in the future.
Cheers,
Till

You are right. Let's refocus this on the Python user survey and spin out
another thread.
On Thu, Dec 13, 2018 at 9:56 AM Xianda Ke <[hidden email]> wrote:
> Hi Folks,
> To avoid polluting the survey thread with discussions, we started a separate
> thread and maybe we can continue the discussion over there.
>
> Regards,
> Xianda
>
> On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen <[hidden email]> wrote:
>
> > I like that we are having a general discussion about how to use Python and
> > Flink together in the future.
> > The current python support has some shortcomings that were mentioned
> > before, so we clearly need something better.
> >
> > Parts of the community have worked together with the Apache Beam project,
> > which is pretty far in adding a portability layer to support Python.
> > Before we dive deep into a design proposal for a new Python API in Flink, I
> > think we should figure out in which general direction Python support should
> > go.
> >
> > *Option (1): Language portability via Apache Beam*
> >
> > Pro:
> > - already exists to a large extent and already has users
> > - portability layer offers other languages in addition to python. Go is
> > in the making, NodeJS has been speculated, etc.
> > - collaboration with another project / community which means more
> > manpower and exposure. Beam currently has a strong focus on Flink as a
> > runner for Python.
> > - Python API is used for existing ML libraries from the TensorFlow
> > ecosystem
> >
> > Con:
> > - Not Flink's API. Python users need to learn the syntax of another API
> > (Python API is inherently different, but even more different here).
> >
> > *Option (2): Implement own Python API*
> >
> > Pro:
> > - Python API will be closer to Flink Java / Scala APIs
> >
> > Con:
> > - We will only have Python.
> > - Need to rebuild the Python language bridge (significant work to get
> > stable)
> > - might lose tight collaboration with Beam and the other parties in Beam
> > - not benefiting from Beam's ecosystem
> >
> > *Option (3): Implement own portability layer*
> >
> > Pro:
> > - Flexibility to align APIs across languages within Flink ecosystem
> >
> > Con:
> > - A lot of work (for context, to get this feature complete, Beam has
> > worked on that for a year now)
> > - Replicating work that already exists
> > - good chance to lose tight collaboration with Beam and parties in that
> > project
> > - not benefiting from Beam's ecosystem
> >
> > Best,
> > Stephan
> >
> >
> > On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise <[hidden email]> wrote:
> >
> > > Did you take a look at Apache Beam? It already provides a comprehensive
> > > Python SDK and can be used with Flink:
> > > https://beam.apache.org/roadmap/portability/#python-on-flink
> > >
> > > We are using it at Lyft for Python streaming pipelines.
> > >
> > > Thomas
> > >
> > > On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke <[hidden email]> wrote:
> > >
> > > > Hi Till,
> > > >
> > > > 1. As far as I know, most of the users at Alibaba are using SQL. Some of
> > > > the users at Alibaba want to integrate Python libraries with Flink for
> > > > stream processing, and Jython is unusable.
> > > >
> > > > 2. Python UDFs for SQL:
> > > > * declare the Python UDF based on Alibaba's internal DDL syntax.
> > > > * start a Python process in open()
> > > > * communicate with the JVM process via a socket.
> > > > * Yes, it supports Python libraries; users can upload a virtualenv/conda
> > > > Python runtime
> > > >
> > > > 3. We've drafted a design doc for the Python API
> > > > [DISCUSS] Flink Python API
> > > > <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
> > > >
> > > > Python UDFs for SQL are not discussed in this document; we'll create a
> > > > new proposal when the SQL DDL is ready.
> > > >
> > > > On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann <[hidden email]>
> > > > wrote:
> > > >
> > > > > Hi Xianda,
> > > > >
> > > > > thanks for sharing this detailed feedback. Do I understand you correctly
> > > > > that flink-python and flink-streaming-python are not usable for the use
> > > > > cases at Alibaba atm?
> > > > >
> > > > > Could you share a bit more details about the Python UDFs for SQL? How do
> > > > > you execute the Python code? Will it work with any Python library? If you
> > > > > are about to publish the design document then feel free to refer me to
> > > > > this document.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke <[hidden email]> wrote:
> > > > >
> > > > > > After communicating with some of the internal users at Alibaba, my
> > > > > > impression is that:
> > > > > >
> > > > > > - Most of them need C extension support. They want to integrate their
> > > > > > algorithms with stream processing, but Jython is unacceptable for them.
> > > > > > - For some users, who are only familiar with SQL/Python, developing a
> > > > > > Java API application/UDF is too complex. Writing a Python UDF and
> > > > > > declaring it in SQL is preferred.
> > > > > > - Machine learning users need richer Python APIs, such as Table API
> > > > > > Python support.
> > > > > >
> > > > > >
> > > > > > From my point of view, Python support in Flink currently has a few
> > > > > > caveats.
> > > > > >
> > > > > > - For batch, there is only the DataSet Python API.
> > > > > > - For streaming, where Flink really shines, only Jython is supported,
> > > > > > but Jython has lots of limitations.
> > > > > > - For most of the big data users, SQL/Table API is more friendly, but
> > > > > > Python users have no such APIs right now.
> > > > > > - An interactive Python shell is very user-friendly. It can be used to
> > > > > > test interactively and is a simple way to learn the API. However,
> > > > > > there is no such interactive Python shell in Flink now.
> > > > > >
> > > > > >
> > > > > > At Alibaba, Python UDFs for SQL have been developed and delivered to
> > > > > > internal users. We have now started developing the Python API, and
> > > > > > we've drafted a design document that we will publish to the community
> > > > > > soon for discussion.
> > > > > >
> > > > > > Regards,
> > > > > > Xianda
> > > > > >
> > > > > > On Fri, Dec 7, 2018 at 11:30 PM Till Rohrmann <[hidden email]>
> > > > > > wrote:
> > > > > >
> > > > > > > Dear Flink community,
> > > > > > >
> > > > > > > in order to better understand the needs of our users and to plan for the
> > > > > > > future, I wanted to reach out to you and ask how much you use Flink's
> > > > > > > Python API, namely flink-python and flink-streaming-python.
> > > > > > >
> > > > > > > In order to gather feedback, I would like to ask all Python users to
> > > > > > > respond to this thread and quickly outline how you use Python in
> > > > > > > combination with Flink. Thanks a lot for your help!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Ke, Xianda
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Ke, Xianda
> > > >
> > >
> >
>
>
> --
> Ke, Xianda
>