http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Best-pattern-for-achieving-stream-enrichment-side-input-from-a-large-static-source-tp25771p25780.html
Hey Ken,
Thank you for your quick response! That definitely sounds like something worth exploring.
Just a few more small questions, if that's ok.
1. You referred to the parquet source as a "stream", but what we have is a static data source that we will always want to "query" against.
What we thought about doing is to stream the entire parquet dataset and load it into our state.
Does that sound right, or is that "hacky"?
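To make the question concrete, here is a rough, Flink-free sketch of what we mean by "load it into our state". The class name EnrichmentSketch and the two-column row layout (key, attribute) are made up for illustration, and a plain HashMap stands in for whatever state Flink would actually hold:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only: a HashMap plays the role of the state,
// and each reference row is assumed to be a {key, attribute} pair.
final class EnrichmentSketch {
    private final Map<String, String> state = new HashMap<>();

    // "Stream" the whole static dataset into state, one row at a time.
    void load(Iterable<String[]> rows) {
        for (String[] row : rows) {
            state.put(row[0], row[1]);
        }
    }

    // Look up the attribute for an incoming event's key.
    // Returns an empty Optional when the key is not in the reference data.
    Optional<String> enrich(String eventKey) {
        return Optional.ofNullable(state.get(eventKey));
    }
}
```

So every incoming event would do one lookup by key against data that was loaded up front, instead of querying the parquet files directly.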
2. Can the ContinuousFileMonitoringFunction be used to track an entire directory of parquet files? Also, we'd like it to refresh its state (i.e. its internal data structures) every time the parquet folder is updated, but only after all new files have been written (meaning it should run once an update has been detected, but not right away).
Is that a reasonable use-case?
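In case it helps, this is roughly the "refresh only once the update is complete" behaviour we are after, written as plain Java rather than against any Flink API. It assumes the job that writes the parquet files drops a marker file named _SUCCESS when it finishes; that marker name and the ParquetDirPoller class are our own assumptions, not something Flink provides:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical poller: reports the parquet files to reload only when a
// completion marker newer than the last reload is present in the directory.
final class ParquetDirPoller {
    // Assumed convention: the writer creates this file after all data files.
    private static final String MARKER = "_SUCCESS";

    private final Path dir;
    private FileTime lastSeen = FileTime.fromMillis(0);

    ParquetDirPoller(Path dir) {
        this.dir = dir;
    }

    // Returns the files to reload, or an empty list if there is no new
    // completed update since the previous call.
    List<Path> poll() throws IOException {
        Path marker = dir.resolve(MARKER);
        if (!Files.exists(marker)) {
            return List.of(); // update still in progress, or nothing written yet
        }
        FileTime stamp = Files.getLastModifiedTime(marker);
        if (stamp.compareTo(lastSeen) <= 0) {
            return List.of(); // this batch has already been loaded
        }
        lastSeen = stamp;
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .filter(p -> p.getFileName().toString().endsWith(".parquet"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Calling poll() on a timer would return nothing while files are still being written, then the full file list once after the marker appears.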
And thank you once again.
Nimrod.