Flink 1.2 Proper Packaging of flink-table with SBT


Justin Yan
Hello!

We are attempting to use the Flink Table API, but are running into a few issues.

We initially started with our dependencies looking something like this:
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala" % "1.2.0" % "provided",
  "org.apache.flink" %% "flink-clients" % "1.2.0" % "provided",
  "org.apache.flink" %% "flink-table" % "1.2.0",
  Libraries.specs2,
  ...)
However, this is mildly annoying, since flink-table declares dependencies on the Flink core modules and thus pulls all of them in anyway. On top of that, we saw this JIRA issue: https://issues.apache.org/jira/browse/FLINK-5227, which we found concerning, so we decided to follow its advice: we downloaded Flink 1.2 and built it from source (Java 7, Maven 3.3) using the following commands (we're also using the Kinesis connector):
tools/change-scala-version.sh 2.11
mvn clean install -Pinclude-kinesis -DskipTests
cd flink-dist
mvn clean install -Pinclude-kinesis -DskipTests
Once this was done, we copied the JAR from "/flink-libraries/flink-table/target/" into the TaskManager's "/lib" directory. Finally, we marked our `flink-table` dependency as "provided". Everything compiles, but when I actually try to run a simple job, I get the following error at runtime:

java.lang.NoClassDefFoundError: org/codehaus/commons/compiler/CompileException

Indeed, when I look inside the `flink-table` JAR, I can't find that package (nor is it in the flink-dist JAR):

$ jar tf flink-table_2.11-1.2.0.jar | grep codehaus
$
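To check which of several JARs (for example everything in `flink/lib` plus the job's fat JAR) actually contains a given class, the `jar tf ... | grep` check above can be scripted. This is a minimal Scala sketch that uses only the JDK's `java.util.jar.JarFile`; the object name `FindClassInJars` and its command-line arguments are hypothetical and not part of Flink.

```scala
import java.io.File
import java.util.jar.JarFile
import scala.collection.JavaConverters._

// Hypothetical helper (not part of Flink): reports which JARs contain a matching entry,
// similar to running `jar tf <jar> | grep <needle>` on each file in a directory.
object FindClassInJars {

  /** All *.jar files directly inside `dir`, sorted by name (empty if `dir` is not a directory). */
  def jarsIn(dir: File): Seq[File] =
    Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
      .filter(_.getName.endsWith(".jar"))
      .sortBy(_.getName)

  /** The subset of `jars` having at least one entry whose path contains `needle`. */
  def jarsContaining(jars: Seq[File], needle: String): Seq[File] =
    jars.filter { f =>
      val jar = new JarFile(f)
      try jar.entries().asScala.exists(_.getName.contains(needle))
      finally jar.close()
    }

  def main(args: Array[String]): Unit = args match {
    case Array(dir, needle) =>
      val hits = jarsContaining(jarsIn(new File(dir)), needle)
      if (hits.isEmpty) println(s"No JAR in $dir contains '$needle'")
      else hits.foreach(f => println(s"${f.getName} contains '$needle'"))
    case _ =>
      System.err.println("usage: FindClassInJars <directory-with-jars> <entry-substring>")
  }
}
```

For example, passing a Flink `lib` directory and `codehaus` as arguments would list every JAR there that contains Janino's `org/codehaus/...` entries.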

I then attempted to include this library in my user code by adding:
"org.codehaus.janino" % "janino" % "3.0.6",
to my list of dependencies. When I run `jar tf myjar.jar | grep CompileException`, I see the class. However, when I run my Flink application from this fat JAR, I still get the same error, even though I am positive the class is included in it. I eventually worked around this by placing the Janino JAR in the `flink/lib` directory, but I am very confused as to how this class cannot be found when it is included in the fat JAR that I submit to a YARN cluster with the Flink CLI. I wanted to mention this partly in case it is a bug, but mostly to see whether anyone else has had trouble with the Table API and, if not, whether I have structured my project incorrectly.
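On the JVM, a class reference is resolved through the classloader that defined the referring class, so whether a class is visible can depend on which loader is asking, not only on whether the class is somewhere in the fat JAR. The following minimal Scala sketch (the object name `WhereIsClass` is hypothetical; it uses only standard `java.lang.Class` APIs) reports whether a given loader can see a class and, if so, where it was loaded from. Inside a job, one might compare the loader of a user class with the loader of a class from `flink-table`; this is a diagnostic aid, not a confirmed explanation of the error above.

```scala
// Hypothetical diagnostic helper (not part of Flink).
object WhereIsClass {

  /** Resolves `className` through `loader` without running its static initializers.
    * Right: where the class was loaded from; Left: why the loader could not provide it. */
  def locate(className: String, loader: ClassLoader): Either[String, String] =
    try {
      val cls = Class.forName(className, false, loader)
      val source = Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
        .getOrElse("<bootstrap or unknown source>")
      Right(s"$className loaded from $source by ${cls.getClassLoader}")
    } catch {
      case _: ClassNotFoundException => Left(s"$className is not visible to $loader")
      case e: LinkageError           => Left(s"$className was found but could not be linked: $e")
    }

  def main(args: Array[String]): Unit = {
    // Defaults to the class named in the NoClassDefFoundError from this thread.
    val name = args.headOption.getOrElse("org.codehaus.commons.compiler.CompileException")
    locate(name, getClass.getClassLoader) match {
      case Right(msg) => println(msg)
      case Left(msg)  => println(msg)
    }
  }
}
```

Passing `initialize = false` to `Class.forName` keeps the check side-effect free: it only asks whether the loader can find and link the class.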

Thanks!

Justin




Re: Flink 1.2 Proper Packaging of flink-table with SBT

Justin Yan
Of course, 15 minutes after I gave up and decided to email the mailing list, I figured it out: my Flink app was using the CollectionEnvironment instead of the proper RemoteEnvironment.

It is still the case, however, that the `flink-table` JAR produced by the standard build commands doesn't include the dependencies it requires (such as Janino), so I'd be curious to hear the proper procedure for linking against `flink-table` while avoiding the bug described in the aforementioned JIRA issue.

Thank you and sorry for the extra noise!

Justin

In reply to Justin Yan, Tue, Mar 7, 2017 at 7:21 PM.





Re: Flink 1.2 Proper Packaging of flink-table with SBT

Timo Walther
Hi Justin,

Thank you for reporting your issues. I have never tried the Table API with SBT, but `flink-table` should not declare dependencies on the core modules; this is only done in `test` scope. Maybe you have to specify the right scope manually? You are right that the mentioned JIRA issue should be fixed as soon as possible; I have added it to my personal TODO list. Regarding the missing Janino artifacts in the JAR file, I created a JIRA issue (https://issues.apache.org/jira/browse/FLINK-5994). This is very strange, as Janino is actually a dependency of flink-table.
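Specifying the scopes manually in SBT might look like the following build.sbt fragment. It is a minimal sketch that assumes, as elsewhere in this thread, that the `flink-table` JAR (and Janino) have been copied into `flink/lib` on the cluster, and that the fat JAR is built with the sbt-assembly plugin (an assumption; the thread does not name the plugin). `flinkVersion` is a hypothetical helper value.

```scala
// build.sbt (sketch). Assumes flink-table and Janino are already in flink/lib on every node,
// so none of the Flink modules below need to be bundled into the fat JAR.
val flinkVersion = "1.2.0" // hypothetical helper value

libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-scala"   % flinkVersion % "provided",
  "org.apache.flink" %% "flink-clients" % flinkVersion % "provided",
  // Marking flink-table "provided" should also keep its transitive dependencies
  // (e.g. flink-streaming-scala) out of the assembled JAR.
  "org.apache.flink" %% "flink-table"   % flinkVersion % "provided"
)
```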

Thanks again for the feedback. If you experience any further issues with the Table API, feel free to post them here.

Regards,
Timo


In reply to Justin Yan, Mar 8, 2017 at 04:50.






Re: Flink 1.2 Proper Packaging of flink-table with SBT

Justin Yan
Hi Timo,

Regarding the dependency issue: looking at flink-table's pom.xml, I believe the problem is the dependency on flink-streaming-scala, which in turn depends transitively on almost all of the core Flink modules. If it had not been for the aforementioned JIRA issue about OOM errors, I probably would have just declared my dependency in SBT like this and been done with it:

"org.apache.flink" %% "flink-table" % "1.2.0" exclude("org.apache.flink", "flink-streaming-scala_2.11"),

As for the missing Janino artifacts, I agree it is strange: the dependency is declared, and the JAR does contain the Calcite dependencies. I basically followed the standard "build from source" steps (as documented at https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/building.html) and copied the flink-table JAR from the target/ directory. If you have trouble reproducing this, I'm happy to provide a more detailed setup.
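Regarding the exclusion shown above: sbt matches an `exclude` rule against the full module name, and cross-built Flink modules are published with the Scala binary version as a suffix (for example `flink-streaming-scala_2.11`), so an exclusion written with the bare name would not match. A sketch that derives the suffix from the build's own `scalaBinaryVersion` setting instead of hard-coding it, assuming a standard sbt build; whether dropping flink-streaming-scala is safe for a given job is not established in this thread.

```scala
// build.sbt (sketch): exclude flink-table's transitive flink-streaming-scala dependency.
// The suffix (e.g. "2.11") is taken from this build's scalaBinaryVersion setting.
libraryDependencies += ("org.apache.flink" %% "flink-table" % "1.2.0")
  .exclude("org.apache.flink", s"flink-streaming-scala_${scalaBinaryVersion.value}")
```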

Thanks!

Justin





In reply to Timo Walther, Wed, Mar 8, 2017 at 1:53 AM.