Hi all,
is there any integration between Presto and Flink? I'd like to use Presto for the UI part (preview and so on) while using Flink for the batch processing. Or would you suggest something else? Best, Flavio
Hi Flavio, Presto contributor and Starburst partner here. Presto and Flink solve completely different challenges. Flink is about processing data streams as they come in; Presto is about ad-hoc / periodic querying of data sources. A typical architecture would use Flink to process data streams and write data and aggregations to some data stores (Redis, MemSQL, SQL databases, Elasticsearch, etc.) and then use Presto to query those data stores (and possibly others, using Query Federation). What kind of integration are you looking for? On Mon, Jan 27, 2020 at 1:44 PM Flavio Pompermaier <[hidden email]> wrote:
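For illustration, a minimal sketch of a Flink job in that architecture might look like the following (Flink 1.10-era DataStream API; the event data is made up and print() stands in for the Elasticsearch/Redis/JDBC sink that Presto would later query):

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;

public class ClickAggregationJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical (userId, clicks) events; a real job would read
        // from Kafka or a similar streaming source.
        env.fromElements(
                Tuple2.of("alice", 1), Tuple2.of("bob", 1), Tuple2.of("alice", 1))
           .keyBy(t -> t.f0)
           // Aggregate clicks per user over 1-minute tumbling windows.
           .timeWindow(Time.minutes(1))
           .sum(1)
           // In the architecture described above this would be a sink to
           // a data store that Presto then queries.
           .print();

        env.execute("click-aggregation");
    }
}
```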
Both Presto and Flink make use of a Catalog in order to read/write data from a source/sink. I don't agree that "Flink is about processing data streams": Flink is competitive also for batch workloads (and this will be further improved in the next releases). I'd like to register my data sources/sinks in one single catalog (e.g. Presto) and then be able to reuse it also in Flink (with a simple translation). My idea of integration here is thus more at the catalog level: I would use Presto for exploring data from the UI and Flink to process it once the configuration part is finished (since I have many Flink jobs that I don't want to throw away or rewrite). On Mon, Jan 27, 2020 at 2:30 PM Itamar Syn-Hershko <[hidden email]> wrote:
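To show the kind of catalog registration I mean on the Flink side, here is a minimal sketch with the Flink 1.10-era Table API. The in-memory catalog just stands in for the Presto-backed catalog I have in mind, which does not exist today:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.GenericInMemoryCatalog;

public class CatalogRegistration {
    public static void main(String[] args) {
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner().inBatchMode().build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);

        // Flink puts external systems behind its Catalog interface; the
        // idea above would mean backing this registration with Presto's
        // catalog definitions instead of an in-memory one.
        tableEnv.registerCatalog("mycatalog", new GenericInMemoryCatalog("mycatalog"));
        tableEnv.useCatalog("mycatalog");
    }
}
```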
Yes, Flink does batch processing by "reevaluating a stream", so to speak. Presto doesn't have sources and sinks, only catalogs (which always allow reads, and sometimes also writes). Presto catalogs are configuration - they are managed as configuration files on the node filesystem and nowhere else. Flink sources/sinks are programmatically configured and are compiled into your Flink program. So that is not possible at the moment; all you can do is get that info from the APIs of both products and visualize it. Definitely not manage them from a single place. On Mon, Jan 27, 2020 at 3:54 PM Flavio Pompermaier <[hidden email]> wrote:
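To illustrate the contrast: unlike a Presto catalog (a .properties file on the coordinator's filesystem), a Flink table definition is part of the program itself. A sketch with Flink 1.10-era DDL follows; the connector property keys are version-dependent and the connection details are made up:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ProgrammaticSourceExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // This definition is compiled into the job, not managed as a
        // standalone configuration file the way a Presto catalog is.
        tableEnv.sqlUpdate(
            "CREATE TABLE users (" +
            "  id BIGINT," +
            "  name STRING" +
            ") WITH (" +
            "  'connector.type' = 'jdbc'," +
            "  'connector.url' = 'jdbc:mysql://localhost:3306/mydb'," +
            "  'connector.table' = 'users'," +
            "  'connector.username' = 'flink'," +
            "  'connector.password' = 'secret'" +
            ")");
    }
}
```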
Hi Flavio, If I understand correctly, your requirement is to use Blink batch to read the tables registered in Presto? I'm not familiar with Presto's catalog. Is it like the Hive Metastore? If so, what needs to be done is similar to the Hive connector: you would need to implement a Flink catalog for Presto, which translates Presto tables into Flink tables. You may need to deal with partitions, statistics, and so on. Best, Jingsong Lee On Mon, Jan 27, 2020 at 9:58 PM Itamar Syn-Hershko <[hidden email]> wrote:
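A rough sketch of that translation step, using Flink 1.10-era types. The schema and connector properties here are hard-coded assumptions; a real version would read them from Presto's metadata and live inside an implementation of Flink's org.apache.flink.table.catalog.Catalog interface, also mapping partitions and statistics:

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.catalog.CatalogTable;
import org.apache.flink.table.catalog.CatalogTableImpl;

import java.util.HashMap;
import java.util.Map;

public class PrestoTableTranslator {

    public static CatalogTable translate(String jdbcUrl, String tableName) {
        // The schema would be obtained from Presto's metadata;
        // hard-coded here for illustration.
        TableSchema schema = TableSchema.builder()
                .field("id", DataTypes.BIGINT())
                .field("name", DataTypes.STRING())
                .build();

        // Connector properties telling Flink how to actually read the data
        // (1.10-era property keys).
        Map<String, String> props = new HashMap<>();
        props.put("connector.type", "jdbc");
        props.put("connector.url", jdbcUrl);
        props.put("connector.table", tableName);

        return new CatalogTableImpl(schema, props, "translated from Presto");
    }
}
```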
Hi,
Yes, Presto (in the presto-hive connector) just uses the Hive Metastore to get the table definitions/metadata. If you connect Flink to the same Hive Metastore, both systems should be able to see the same tables. Piotrek
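For example, pointing Flink at the same Hive Metastore that Presto's hive connector uses (a minimal sketch; the conf dir path is just a placeholder):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class SharedMetastoreExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Use the same hive-site.xml that Presto's hive connector points at.
        HiveCatalog hive = new HiveCatalog("hive", "default", "/etc/hive/conf");
        tableEnv.registerCatalog("hive", hive);
        tableEnv.useCatalog("hive");

        // Tables defined through the shared metastore (e.g. via Presto's
        // hive connector) are now visible to Flink as well, e.g.:
        // tableEnv.sqlQuery("SELECT * FROM some_table");
    }
}
```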
The Hive Metastore is the de facto standard for Hadoop, but in my use case I have to query other databases as well (like MySQL, Oracle and SQL Server). So Presto would be a good choice (apart from the fact that you need to restart it when you add a new catalog...), and I'd like to have an easy translation of the catalogs. Another fear I have is that I could have different versions of the same database type (e.g. Oracle or SQL Server) and I'll probably hit an incompatibility when using the latest jar of a connector. From what I see this corner case doesn't have a clear solution, but I have some workarounds in mind that I need to verify (e.g. shading jars, or allocating source reader tasks to different Task Managers based on the deployed jar versions). On Tue, Jan 28, 2020 at 11:05 AM Piotr Nowojski <[hidden email]> wrote: