UUIDs generated by Flink SQL

Gregory Fee
Hello, from what I understand of the documentation, there appears to be no way to assign UUIDs to the operators that Flink SQL adds to the DAG. Is my understanding correct?

I'd very much like to be able to assign UUIDs to those operators. I want to run a program using some Flink SQL, create a savepoint, and then run another program with a slightly different structure that picks up from that savepoint. The way the documentation suggests making something like that work is to assign UUIDs, but that doesn't seem possible if I'm using Flink SQL. Any advice?

On a related note, I'm wondering what happens if I have a stateful program using Flink SQL and I want to update my Flink binaries. If the query plan ends up changing as a result of that upgrade, does that mean loading the savepoint is going to fail?

Thanks! 

--
Gregory Fee
Engineer
425.830.4734
Lyft

Re: UUIDs generated by Flink SQL

Fabian Hueske-2
Hi Gregory,

Your understanding is correct. It is not possible to assign UUIDs to the operators generated by the SQL/Table planner.
To be honest, I am not sure whether the use case you are describing should be within the scope of the "officially" supported use cases of the API.
It would require in-depth knowledge of the SQL operators' internals, which is something we don't want to expose as public API because we want to keep the freedom to improve the execution code.

Having said that, we have thought about adding the possibility of adjusting the parallelism of operators.
Similar to assigning UUIDs, this would require an intermediate step between planning and submission because you usually don't know in advance which plan will be generated.
This could be done by generating a representation of a plan that can be modified before translating it into a DataStream program.

Right now, we don't aim to guarantee backwards compatibility for queries. Starting a query from a savepoint works if you change neither the query nor the flink-table version, but it might fail as soon as you change either one.
If you start the same query with a different flink-table version, different optimization rules or changes in the operators might result in different operator state.
If you start a different query, the data types of the state of operators will most likely have changed.
Coming up with an upgrade strategy for SQL queries is still a major TODO, and there are several ideas for how this can be achieved.
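
To illustrate why a changed plan can break a restore, here is a toy model in plain Java. It is only a sketch and not Flink's actual implementation: the class `SavepointIdSketch`, the `Op` type, and the path-based "auto" IDs are all made up for this example. It assumes that state in a savepoint is looked up by operator ID, and that an ID which is not assigned explicitly (as `uid(String)` allows in the DataStream API) is derived from the operator's position in the job structure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SavepointIdSketch {

    /** Hypothetical operator: a name plus an optional user-assigned ID (null = none). */
    static final class Op {
        final String name;
        final String explicitId;

        Op(String name, String explicitId) {
            this.name = name;
            this.explicitId = explicitId;
        }
    }

    /**
     * Resolves the ID under which each operator's state is stored.
     * Explicit IDs are used as-is. Otherwise the ID is derived from the chain of
     * operator names up to and including this one, so inserting an operator
     * upstream changes it. (A stand-in for a structure-based hash.)
     */
    static List<String> resolveIds(List<Op> pipeline) {
        List<String> ids = new ArrayList<>();
        String path = "";
        for (Op op : pipeline) {
            path = path + "/" + op.name;
            ids.add(op.explicitId != null ? op.explicitId : "auto:" + path);
        }
        return ids;
    }

    /** Takes a toy "savepoint": resolved operator ID mapped to that operator's state. */
    static Map<String, String> savepoint(List<Op> pipeline) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        List<String> ids = resolveIds(pipeline);
        for (int i = 0; i < pipeline.size(); i++) {
            snapshot.put(ids.get(i), "state-of-" + pipeline.get(i).name);
        }
        return snapshot;
    }

    /** Names of the operators in newPipeline whose ID is present in the savepoint. */
    static List<String> restored(Map<String, String> snapshot, List<Op> newPipeline) {
        List<String> found = new ArrayList<>();
        List<String> ids = resolveIds(newPipeline);
        for (int i = 0; i < newPipeline.size(); i++) {
            if (snapshot.containsKey(ids.get(i))) {
                found.add(newPipeline.get(i).name);
            }
        }
        return found;
    }

    /** Auto-generated IDs: a filter is inserted after the source in the new job. */
    static List<String> demoAuto() {
        List<Op> oldJob = List.of(
                new Op("source", null), new Op("aggregate", null), new Op("sink", null));
        List<Op> newJob = List.of(
                new Op("source", null), new Op("filter", null),
                new Op("aggregate", null), new Op("sink", null));
        return restored(savepoint(oldJob), newJob);
    }

    /** Same change, but the pre-existing operators carry explicit IDs. */
    static List<String> demoExplicit() {
        List<Op> oldJob = List.of(
                new Op("source", "src-1"), new Op("aggregate", "agg-1"), new Op("sink", "sink-1"));
        List<Op> newJob = List.of(
                new Op("source", "src-1"), new Op("filter", null),
                new Op("aggregate", "agg-1"), new Op("sink", "sink-1"));
        return restored(savepoint(oldJob), newJob);
    }

    public static void main(String[] args) {
        System.out.println("auto IDs:     " + demoAuto());     // [source]
        System.out.println("explicit IDs: " + demoExplicit()); // [source, aggregate, sink]
    }
}
```

In this model, everything downstream of the inserted filter loses its state when the IDs are derived from the structure, while explicit IDs keep the mapping stable. A SQL query offers no place to attach such IDs, so a plan change caused by a modified query or a different flink-table version corresponds to the first case.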

Best, Fabian


2018-03-09 0:47 GMT+01:00 Gregory Fee <[hidden email]>: