Re: Flink and Presto integration
Posted by Piotr Nowojski-3 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Flink-and-Presto-integration-tp32407p32441.html
Hi,
Yes, Presto (in the presto-hive connector) just uses the Hive Metastore to get the table definitions/metadata. If you connect Flink to the same Hive Metastore, both systems should be able to see the same tables.
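As a minimal sketch of that setup (the catalog name and hive-conf-dir path are placeholders; the YAML shape follows the Flink SQL Client configuration of that era, where catalogs could be declared in sql-client-defaults.yaml):

```yaml
# sql-client-defaults.yaml (excerpt) - register a Hive catalog so Flink
# reads table definitions from the same Hive Metastore Presto uses.
catalogs:
  - name: myhive          # placeholder catalog name
    type: hive
    hive-conf-dir: /opt/hive-conf   # placeholder: dir containing hive-site.xml
```

The hive-site.xml under that directory would point both systems at the same hive.metastore.uris, which is what makes the tables mutually visible.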
Piotrek
Hi Flavio,
Is your requirement to use Blink batch to read the tables in Presto?
I'm not familiar with Presto's catalog. Is it like the Hive Metastore?
If so, what needs to be done is similar to the Hive connector.
You would need to implement a Presto catalog for Flink, which translates Presto tables into Flink tables. You may also need to deal with partitions, statistics, and so on.
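The core of such a catalog is the table-definition translation Jingsong describes. A hedged illustration of that idea, using only invented placeholder types (not the real Flink Catalog API, whose `org.apache.flink.table.catalog.Catalog` interface a real implementation would have to satisfy in full):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CatalogTranslation {
    // Translate an external (Hive/Presto-style) column-name -> type-name map
    // into Flink SQL type names. A real catalog would do this inside
    // Catalog#getTable, and would also carry over partitions and statistics.
    static Map<String, String> translate(Map<String, String> externalColumns) {
        Map<String, String> flinkSchema = new LinkedHashMap<>();
        for (Map.Entry<String, String> col : externalColumns.entrySet()) {
            flinkSchema.put(col.getKey(), mapType(col.getValue()));
        }
        return flinkSchema;
    }

    // Minimal type mapping for illustration; a real implementation must
    // cover the full type system of both engines.
    static String mapType(String externalType) {
        switch (externalType.toLowerCase()) {
            case "string": return "STRING";
            case "bigint": return "BIGINT";
            case "double": return "DOUBLE";
            default:       return externalType.toUpperCase();
        }
    }
}
```

This only shows the schema-mapping step; the hard parts in practice are the points Jingsong lists (partitions, statistics, and the rest of the catalog contract).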
Best,
Jingsong Lee
On Mon, Jan 27, 2020 at 9:58 PM Itamar Syn-Hershko <[hidden email]> wrote:
Yes, Flink does batch processing by "reevaluating a stream", so to speak. Presto doesn't have sources and sinks, only catalogs (which always allow reads, and sometimes also writes).
Presto catalogs are configuration: they are managed as configuration files on each node's filesystem and nowhere else. Flink sources/sinks are configured programmatically and are compiled into your Flink program. So that is not possible at the moment; all you can do is fetch that info from the APIs of both products and visualize it. Definitely not manage them from a single place.
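To make the "catalogs are just files" point concrete, a minimal sketch of such a catalog file (the metastore host is a placeholder; in Presto this would typically live under etc/catalog/, e.g. etc/catalog/hive.properties):

```properties
# etc/catalog/hive.properties - defines a Presto catalog named "hive"
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore-host:9083
```

Each such file defines one catalog, and changing it requires touching the node filesystem and restarting/reloading the server, which is exactly why it can't be managed from the same place as programmatic Flink sources/sinks.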
On Mon, Jan 27, 2020 at 3:54 PM Flavio Pompermaier <[hidden email]> wrote:
Both Presto and Flink make use of a catalog in order to read/write data from a source/sink.
I don't agree that "Flink is about processing data streams", because Flink is also competitive for batch workloads (and this will be further improved in the next releases).
I'd like to register my data sources/sinks in one single catalog (e.g. Presto) and then be able to reuse them in Flink as well (with a simple translation).
My idea of integration is thus more at the catalog level: I would use Presto to explore data from the UI and Flink to process it once the configuration part is finished (I have many Flink jobs that I don't want to throw away or rewrite).
On Mon, Jan 27, 2020 at 2:30 PM Itamar Syn-Hershko <[hidden email]> wrote:
Hi Flavio,
Presto contributor and Starburst partner here.
Presto and Flink are solving completely different challenges. Flink is about processing data streams as they come in; Presto is about ad-hoc / periodic querying of data sources.
A typical architecture would use Flink to process data streams and write data and aggregations to some data stores (Redis, MemSQL, SQL databases, Elasticsearch, etc.), and then use Presto to query those data stores (and possibly also others, via query federation).
What kind of integration are you looking for?
On Mon, Jan 27, 2020 at 1:44 PM Flavio Pompermaier <[hidden email]> wrote:
Hi all,
is there any integration between Presto and Flink? I'd like to use Presto for the UI part (preview and so on) while using Flink for the batch processing. Or would you suggest something else?
Best,
Flavio