is Flink a database ?

Hanan Yehudai

This seems like a controversial subject... on purpose 😊

I have my data lake in Parquet files. Should I use Flink batch mode to run ad hoc queries over the historical data, or should I use a dedicated “database” such as Drill, Dremio, Hive, and the like?
What advantage would Flink give me for querying this type of batch data?




 

 


Re: is Flink a database ?

Piotr Nowojski-3
Hi :)

What do you mean by “a database”? A SQL-like query engine? Flink already is that [1]. A place where you store the data? Flink is, to some extent, that as well [2], and many users use Flink as the source of truth, not just as a data processing framework.

With the Flink Table API/SQL [1], you can easily query data from other systems (for example, read tables registered in the Hive Metastore). By extension, you could do the same with the DataStream API or the DataSet API.

Each of those APIs (Table API/SQL, DataStream API, DataSet API) comes with different advantages and trade-offs. The Table API/SQL, being fairly high level, gives you automatic optimisations and ease of use. The DataStream and DataSet APIs, being lower level, give you more fine-grained control over what is happening, at the expense of requiring more knowledge from you.
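To make that trade-off concrete, here is a minimal sketch of the same aggregation written in both styles. It is only an analogy and is not Flink code: it uses Python's built-in `sqlite3` module as a stand-in for a declarative SQL layer and a plain loop as a stand-in for a lower-level API. The `orders` table, the sample rows, and both function names are hypothetical.

```python
import sqlite3

# Hypothetical sample data: (customer, amount) rows standing in for a table.
ROWS = [("alice", 10), ("bob", 5), ("alice", 7), ("bob", 1)]


def totals_declarative(rows):
    """High-level style: state what you want; the engine decides how to run it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # Each result row is a (customer, total) pair, so dict() can consume the cursor.
    result = dict(conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
    conn.close()
    return result


def totals_imperative(rows):
    """Lower-level style: spell out how to aggregate, one record at a time."""
    totals = {}
    for customer, amount in rows:
        totals[customer] = totals.get(customer, 0) + amount
    return totals
```

Both functions return the same per-customer totals. The SQL version is shorter and leaves planning to the engine, while the loop exposes every step, which is roughly the control-versus-convenience distinction described above.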

As for how the Flink Table API/SQL compares to other systems, I guess it would be better if someone from the Table API/SQL team responded.

Piotrek

[2] https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html

On 4 Nov 2019, at 14:05, Hanan Yehudai <[hidden email]> wrote:

This seems like a controversial subject... on purpose 😊

I have my data lake in Parquet files. Should I use Flink batch mode to run ad hoc queries over the historical data, or should I use a dedicated “database” such as Drill, Dremio, Hive, and the like?
What advantage would Flink give me for querying this type of batch data?