Hi to all,

I have a use case where I need to tell a Flink cluster to give me a sample of X records using parametrizable sampling functions. Is there any best practice or advice to do that?

Should I create a Remote ExecutionEnvironment or should I use the Flink client (I don't know if it uses REST services or RPC or whatever)? Is there any Java snippet for that?

Best,
Flavio
|
Hi Flavio,

Do you want to sample from a running batch job? That would be like Queryable State in streaming jobs but it is not supported in batch mode.

Cheers,
Max

On Mon, Sep 26, 2016 at 6:13 PM, Flavio Pompermaier <[hidden email]> wrote:
|
Hi Max,

Actually, I have a jar containing sampling jobs and I need to collect the results from a client. I've tried ExecutionEnvironment.createRemoteEnvironment, but I fear it's not the right approach, because I only need to tell the cluster the main class and the parameters to run the job (and where the jar file is on HDFS).

Best,
Flavio

On Tue, Sep 27, 2016 at 12:06 PM, Maximilian Michels <[hidden email]> wrote:
|
Hi Flavio,

This is not really possible at the moment. Though there is a workaround. You can create a dummy jar file (may be empty). Then you can use

    ./flink run -C hdfs:///path/to/cluster.jar -c org.package.SampleClass /path/to/dummy.jar

That way Flink will include your cluster jar and you can load all classes necessary.

Alternatively, using the Remote Environment, this looks like this:

    public static void main(String[] args) throws Exception {

        final RemoteEnvironment env = new RemoteEnvironment(
            "remoteHost",
            6123,
            new Configuration(),
            new String[0],
            new URL[]{
                new URL("file:///path/to/sample.jar"),
                new URL("file:///Users/max/Dev/flink/build-target/lib/flink-dist_2.10-1.2-SNAPSHOT.jar")});
        URLClassLoader classLoader = new URLClassLoader(env.globalClasspaths.toArray(new URL[0]));

        Class<?> clazz = classLoader.loadClass("org.package.sample.SampleClass");

        Method main = clazz.getDeclaredMethod("sampleMethod", ExecutionEnvironment.class);

        // pass environment as an argument to your sample method
        // the method should return the results of the execution
        Object sampleResult = main.invoke(null, env);
    }

Beware, this is extremely hacky. We should have a better way to invoke jar files remotely. Honestly, the best thing is if you keep a local copy of your sampling jars and work directly with them.

Cheers,
Max

On Tue, Sep 27, 2016 at 12:25 PM, Flavio Pompermaier <[hidden email]> wrote:
|
Hi Max,

that's exactly what I was looking for. What do you mean by 'the best thing is if you keep a local copy of your sampling jars and work directly with them'?

Best,
Flavio

On Tue, Sep 27, 2016 at 2:35 PM, Maximilian Michels <[hidden email]> wrote:
|
I meant that you simply keep the sampling jar on the machine where you want to sample. However, you mentioned that it is a requirement for it to be on the cluster.

Cheers,
Max

On Tue, Sep 27, 2016 at 3:18 PM, Flavio Pompermaier <[hidden email]> wrote:
|
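For readers who want to try the reflective part of Max's workaround without a cluster, here is a minimal sketch that uses only the JDK. It is not Flink code: `FakeEnvironment`, `SampleClass` and `runSample` are hypothetical stand-ins introduced for illustration, and the extra `n` parameter on `sampleMethod` is an assumption about how a sample size might be passed. The URL array is left empty, so the class is resolved through the parent class loader; with a real local sampling jar you would presumably pass its `file:` URL instead.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Loads a sampling class by name and calls its static sampleMethod reflectively. */
final class ReflectiveSampleRunner {

    // jarUrls: where to look for the class (empty here, so the parent loader is used).
    // className: fully qualified name of the class that declares sampleMethod.
    static Object runSample(URL[] jarUrls, String className, FakeEnvironment env, int n)
            throws Exception {
        ClassLoader parent = ReflectiveSampleRunner.class.getClassLoader();
        try (URLClassLoader loader = new URLClassLoader(jarUrls, parent)) {
            Class<?> clazz = loader.loadClass(className);
            Method method =
                clazz.getDeclaredMethod("sampleMethod", FakeEnvironment.class, int.class);
            // First argument is null because sampleMethod is static.
            return method.invoke(null, env, n);
        }
    }

    public static void main(String[] args) throws Exception {
        FakeEnvironment env = new FakeEnvironment(Arrays.asList(1, 2, 3, 4, 5));
        Object result = runSample(new URL[0], SampleClass.class.getName(), env, 3);
        System.out.println(result); // prints [1, 2, 3]
    }
}

/** Hypothetical stand-in for the environment object passed to the sampling code. */
final class FakeEnvironment {
    final List<Integer> records;

    FakeEnvironment(List<Integer> records) {
        this.records = records;
    }
}

/** Hypothetical stand-in for a sampling class that would normally live in the jar. */
final class SampleClass {
    // Returns the first n records (or all of them if fewer exist).
    // A real sampling job would build and execute a Flink plan here instead.
    public static List<Integer> sampleMethod(FakeEnvironment env, int n) {
        int end = Math.min(n, env.records.size());
        return new ArrayList<>(env.records.subList(0, end));
    }
}
```

The client only needs the class name and the parameters at run time, which is close to what Flavio asked for, but the jar must still be readable from the machine that runs this code.
|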
I think I'll probably end up submitting the job through YARN in order to have a more standard approach :)

Thanks,
Flavio

On Wed, Sep 28, 2016 at 5:19 PM, Maximilian Michels <[hidden email]> wrote:
|