Hi, You can point a file-based input format to a directory, and the input format should read all files in that directory. That also works for TableSources that internally use file-based input formats. Is that what you are looking for? Best, Fabian On Mon, Jan 28, 2019 at 17:22, françois lacombe <[hidden email]> wrote:
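A minimal plain-Java sketch (not Flink's implementation) of what "pointing a file-based input format at a directory" amounts to: every regular file directly inside the directory becomes an input, and each file is read record by record rather than loaded whole. Non-recursive listing is assumed to match Flink's default (nested enumeration is opt-in), and skipping names that start with `.` or `_` is an assumed hidden/in-progress file convention. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical plain-Java illustration, not Flink code: reading every file in a directory. */
public class DirectoryInputSketch {

    /** Regular files directly inside dir (no recursion), skipping names that
     *  start with "." or "_" (an assumed hidden/in-progress file convention). */
    public static List<Path> listInputFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return !name.startsWith(".") && !name.startsWith("_");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Streams every line of every input file to emit, holding one line at a time. */
    public static void readAll(Path dir, Consumer<String> emit) throws IOException {
        for (Path file : listInputFiles(dir)) {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    emit.accept(line); // a real source would forward this record downstream
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("input");
        Files.write(dir.resolve("a.txt"), List.of("a1", "a2"), StandardCharsets.UTF_8);
        Files.write(dir.resolve("b.txt"), List.of("b1"), StandardCharsets.UTF_8);
        Files.write(dir.resolve("_part.tmp"), List.of("skipped"), StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        readAll(dir, records::add);
        System.out.println(records); // [a1, a2, b1]
    }
}
```

The callback keeps the reader from collecting records itself; only the demo in `main` gathers them into a list for printing.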
|
Hi Fabian, Thank you for this input. This is interesting. With such an input format, will all the files be loaded into memory before being processed, or will they be streamed? All the best, François On Tue, Jan 29, 2019 at 22:20, Fabian Hueske <[hidden email]> wrote:
Think of the planet: print this message only if necessary.
|
Hi, The files will be read in a streaming fashion. Typically, files are broken down into input splits that are distributed to tasks for reading. How a task reads a file split depends on the implementation, but usually the format reads the split as a stream and does not read the whole split before emitting records. Best, Fabian On Mon, Feb 4, 2019 at 12:06, françois lacombe <[hidden email]> wrote:
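To make the split idea concrete, here is a plain-Java sketch (not Flink's code) of one common convention for line-oriented formats, similar in spirit to Hadoop-style line readers: the file is cut into fixed-size byte ranges, and each range owns every line whose first byte falls inside it. A reader skips a partial line at its start and reads past its end to finish its last line, so every line is emitted exactly once and only one line is held at a time. The byte array stands in for a file (a real format would seek within a stream), and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of split-based, streaming line reading. */
public class SplitReadingSketch {

    /** A byte range [start, end) of one file, handed to one reading task. */
    public static final class Split {
        final int start;
        final int end;
        Split(int start, int end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + ", " + end + ")"; }
    }

    /** Cuts a file of the given length into fixed-size byte ranges. */
    public static List<Split> computeSplits(int fileLength, int splitSize) {
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(start, Math.min(start + splitSize, fileLength)));
        }
        return splits;
    }

    /**
     * Emits every line whose first byte lies inside the split, one line at a time.
     * A partial line at split.start belongs to the previous split and is skipped;
     * a line crossing split.end is finished by reading past the boundary.
     */
    public static void readSplit(byte[] file, Split split, Consumer<String> emit) {
        int pos = split.start;
        if (pos > 0 && file[pos - 1] != '\n') {
            pos = indexOfNewline(file, pos) + 1; // jump to the first full line
        }
        while (pos < split.end && pos < file.length) {
            int lineEnd = indexOfNewline(file, pos);
            emit.accept(new String(file, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
    }

    /** Position of the next '\n' at or after from, or file.length if there is none. */
    private static int indexOfNewline(byte[] file, int from) {
        for (int i = from; i < file.length; i++) {
            if (file[i] == '\n') return i;
        }
        return file.length;
    }

    public static void main(String[] args) {
        byte[] file = "alpha\nbeta\ngamma\ndelta\n".getBytes(StandardCharsets.UTF_8);
        for (Split split : computeSplits(file.length, 8)) {
            List<String> records = new ArrayList<>();
            readSplit(file, split, records::add);
            System.out.println(split + " -> " + records);
        }
        // [0, 8) -> [alpha, beta]
        // [8, 16) -> [gamma]
        // [16, 23) -> [delta]
    }
}
```

Because ownership is decided by where a line starts, the split size only affects how work is distributed, not which records come out.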
|
Thank you Fabian, That's good, I'll go for a custom file input format. All the best, François On Mon, Feb 4, 2019 at 12:10, Fabian Hueske <[hidden email]> wrote:
|
Hi Fabian, I'm having trouble combining a custom InputFormat implementation with my existing code. Can it be used in combination with a custom BatchTableSource? As I understand your solution, I should move my source to implementations like:
right? I currently have a BatchTableSource class which produces a DataSet<Row> from a single GeoJSON file. This doesn't sound compatible with a custom InputFormat, does it? Thanks in advance for any additional hints. All the best, François On Mon, Feb 4, 2019 at 12:10, Fabian Hueske <[hidden email]> wrote:
|
Hi François, The TableEnvironment.connect() method can only be used if you provide (quite a bit) more code. It requires a TableSourceFactory and handling of all the properties that are defined in the other builder methods. See [1]. I would recommend either registering the BatchTableSource directly (tEnv.registerTableSource()) or creating a DataSet (via env.createInput()) and registering the DataSet as a Table (tEnv.registerDataSet()). Best, Fabian On Mon, Feb 11, 2019 at 21:09, françois lacombe <[hidden email]> wrote:
|
Hi Fabian, After reading a bit more documentation, I have a better understanding of how the InputFormat interface works. Indeed, I'd better wrap a custom InputFormat implementation in my source. This article helps a lot. connect() will be for a later sprint. All the best, François On Fri, Feb 15, 2019 at 09:37, Fabian Hueske <[hidden email]> wrote:
|