UUID in part files

Dan
Hi.

Context
I'm migrating my Flink SQL job to DataStream.  When switching to StreamingFileSink, I noticed that the part files no longer have a uuid in them: "part-0-0" instead of "part-{uuid string}-0-0".  This is easy to add with OutputFileConfig.

Question
Is there a reason why the base OutputFileConfig doesn't add the uuid automatically?  Is this just a legacy issue?  Or do most people not have the uuid in the file outputs?

Re: UUID in part files

Yun Gao
Hi Dan

The SQL sink adds the uuid by default to cover the case where users
execute multiple bounded SQL jobs that append to the same directory
(for example, a Hive table). The uuid is attached so that a later job
does not overwrite the output of a previous one.

The DataStream API can be viewed as the lower-level API, so it does
not add the uuid automatically. As you have pointed out, users can
implement the same behavior themselves with OutputFileConfig.
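
A minimal sketch of that approach, assuming the part files follow the "<prefix>-<subtask>-<counter>" layout implied by "part-0-0" above. PartFileNames is a hypothetical helper written for illustration and is not part of Flink; it only builds the strings, and the Flink wiring appears in the trailing comment so the block stays runnable without Flink on the classpath.

```java
import java.util.UUID;

// Hypothetical helper (not a Flink class): models the part-file names discussed in this thread.
final class PartFileNames {
    private PartFileNames() {}

    // Builds a prefix that embeds a uuid, e.g. "part-<uuid>".
    static String uuidPrefix(UUID runId) {
        return "part-" + runId;
    }

    // Assumed layout "<prefix>-<subtaskIndex>-<partCounter>", matching "part-0-0" in the question.
    static String partFileName(String prefix, int subtaskIndex, long partCounter) {
        return prefix + "-" + subtaskIndex + "-" + partCounter;
    }
}

// With Flink on the classpath, the prefix would be passed to the sink roughly like this
// (path and encoder are placeholders for your own values):
//   OutputFileConfig config = OutputFileConfig.builder()
//       .withPartPrefix(PartFileNames.uuidPrefix(UUID.randomUUID()))
//       .build();
//   StreamingFileSink.forRowFormat(path, encoder)
//       .withOutputFileConfig(config)
//       .build();
```

Generating the uuid once while the job graph is being built, rather than inside an operator, would give every subtask of one run the same prefix and each new run a different one, which is the property the SQL sink relies on to avoid overwriting earlier output.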

Best,
 Yun

------------------Original Mail ------------------
Sender:Dan Hill <[hidden email]>
Send Date:Mon Feb 8 07:40:36 2021
Recipients:user <[hidden email]>
Subject:UUID in part files