Hi,
I have a problem: the frontend somehow seems to have the user jar on the classpath, and it leads to a netty conflict: So in the jobmanager logs I can see that my job started (running on YARN), but I can't access the frontend; it gives an internal server error with the previous exception. So I don't have the same jar problem on the actual running job. I haven't really seen this before; has this happened to somebody else as well? Thank you! Gyula |
Hi, Since Flink 1.2, "per job yarn applications" (when you do "-m yarn-cluster") include the job jar in the classpath as well. Does this change explain the behavior? On Thu, Feb 23, 2017 at 4:59 PM, Gyula Fóra <[hidden email]> wrote:
|
Hi Robert, It definitely explains the behaviour. If so, what is the rationale behind it, and how should I handle the dependency conflict? Thanks, Gyula On Thu, Feb 23, 2017 at 9:44 PM, Robert Metzger <[hidden email]> wrote:
|
Mh. The user jar is put into every classpath. So the jobmanager / taskmanagers are potentially affected by this as well. Probably the data transfer between the TMs doesn't call the same methods as the UI on the JobManager :) The simplest solution is to shade your netty in the user jar into a different location. On Thu, Feb 23, 2017 at 10:01 PM, Gyula Fóra <[hidden email]> wrote:
|
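Before shading anything, it can help to confirm which jar a suspect class is actually loaded from inside the affected process. The following is a minimal Java sketch, not part of Flink; the class name `ClassOrigin`, its `describe` helper, and the default class name `io.netty.channel.Channel` are illustrative assumptions.

```java
import java.security.CodeSource;

final class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, or a marker for JDK classes. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes defined by the bootstrap loader have no code source.
        if (source == null || source.getLocation() == null) {
            return "<bootstrap/JDK>";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Assumed default: pass any class name suspected of being on the classpath twice.
        String name = args.length > 0 ? args[0] : "io.netty.channel.Channel";
        Class<?> type = Class.forName(name);
        System.out.println(name + " <- " + describe(type));
    }
}
```

If the printed location points at the user jar rather than at a Flink jar, the user jar's copy of the class is the one that won on that classpath.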
Hi Robert,
I was not aware of this big change (I know it's my fault) but I am not sure if I agree with the rationale. I read through the JIRA and it seems that this is mostly a convenience change so that we don't need to copy jars and mess with the classloading that much. On the other hand, if user jars can conflict with frontend/backend classes, that can lead to very serious (and hard to fix) problems, especially in larger scale deployments. What do you think about this? Gyula On Thu, Feb 23, 2017 at 10:10 PM, Robert Metzger <[hidden email]> wrote:
|
On Fri, Feb 24, 2017 at 11:05 AM, Gyula Fóra <[hidden email]> wrote:
> I was not aware of this big change (I know it's my fault) but I am not sure
> if I agree with the rationale.
No comment on the actual issue from my side, but I strongly disagree that this is your fault. We should have covered this better in the release announcement in my opinion. Of course, this doesn't help now. ;-) – Ufuk |
I agree with you Gyula, this change is dangerous. I have seen another case from a user whose Hadoop dependencies caused a crash in Flink 1.2.0 but not in 1.1.x.
I wonder if we should introduce a config flag for Flink 1.2.1 to disable the behavior if needed. On Fri, Feb 24, 2017 at 2:27 PM, Ufuk Celebi <[hidden email]> wrote: On Fri, Feb 24, 2017 at 11:05 AM, Gyula Fóra <[hidden email]> wrote: |
Did any user have problems with the Flink 1.1 behaviour? If not, we could disable it again, by default, and add a flag for adding the user jar to all the classpaths. On Fri, 24 Feb 2017 at 14:50 Robert Metzger <[hidden email]> wrote: |
The JIRA (https://issues.apache.org/jira/browse/FLINK-4913) doesn't mention any particular user or use case. I honestly don't care so much whether we enable or disable it by default. But since it's the new default behavior of Flink 1.2, I'm against changing that in Flink 1.2.1. That's why I proposed to add a flag to disable it in 1.2.1, so that users upgrading from 1.2.0 to 1.2.1 don't notice it. On Fri, Feb 24, 2017 at 5:41 PM, Aljoscha Krettek <[hidden email]> wrote:
|
Hi, I can only see how it will break things in subtle ways. If you think there is any real benefit to the current approach, I don't mind having it as a default; otherwise I am in favor of reverting to the 1.1 default. (My logic is that the user will only observe a difference in behavior when the new setup actually causes problems.) Gyula On Fri, Feb 24, 2017, 17:53 Robert Metzger <[hidden email]> wrote:
|
I think the change reduces the chances of running into classloading issues in case there's a bug in Flink (= it is using the wrong CL). I've filed a JIRA for the problem: https://issues.apache.org/jira/browse/FLINK-6031 On Fri, Feb 24, 2017 at 9:29 PM, Gyula Fóra <[hidden email]> wrote:
|
I think we need to get away from the dynamic class loading as much as possible. It breaks way too soon and easily causes class leaks.
I would be in favor of understanding how to fix this on the Flink side, i.e., either:
- Having flags for disabling it optionally
- Having an option of "user code first" or "user code last" in the classpath
- Shading Netty in Flink.
I think Netty is a good candidate to be shaded, actually. On Mon, Mar 13, 2017 at 2:33 PM, Robert Metzger <[hidden email]> wrote: |
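To make the "user code first" versus "user code last" idea concrete, here is a generic Java sketch of a class loader that looks in the user jars before asking its parent. It only illustrates the lookup order and is not Flink's implementation or an existing Flink option; the class name `UserCodeFirstClassLoader` is hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class UserCodeFirstClassLoader extends URLClassLoader {

    UserCodeFirstClassLoader(URL[] userJars, ClassLoader parent) {
        super(userJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                if (name.startsWith("java.")) {
                    // Core JDK classes must always come from the parent chain.
                    loaded = super.loadClass(name, false);
                } else {
                    try {
                        // "User code first": search the user jars before the parent.
                        loaded = findClass(name);
                    } catch (ClassNotFoundException notInUserJars) {
                        // Fall back to the default parent-first lookup.
                        loaded = super.loadClass(name, false);
                    }
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

A plain `URLClassLoader` already gives the "user code last" order, because it asks the parent first. Putting the user jar directly on the system classpath, as described in this thread, removes the separate loader entirely, so whichever copy of a class comes first on the classpath wins.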