Hello all, I'm trying to run a job using FlinkML and I'm confused about the source of an error.
Any idea what might be causing this? I'm running the job in local mode, 1 TM with 8 slots and ~32GB heap size. All the vectors created by the libSVM loader have the correct size.
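[For context, a minimal sketch of the local-mode setup described above; this is an editorial illustration, not code from the thread. The parallelism matches the 8 slots, while the ~32GB heap is a flink-conf.yaml setting rather than something set in user code.]

    // Local execution environment roughly matching the setup described above.
    // The heap size is configured via taskmanager.heap.mb in flink-conf.yaml,
    // not in user code.
    import org.apache.flink.api.scala._

    val env = ExecutionEnvironment.createLocalEnvironment(parallelism = 8)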
Hi! Does this error occur in 0.10 or in 1.0-SNAPSHOT? It is probably an incorrectly configured Kryo instance (not a problem of the sorter). What is strange is that it occurs in the "MapReferenceResolver" - there should be no reference resolution during serialization / deserialization. Can you try what happens when you explicitly register the type SparseVector at the ExecutionEnvironment?

Stephan

On Wed, Jan 20, 2016 at 11:24 AM, Theodore Vasiloudis <[hidden email]> wrote:
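[A minimal sketch of the registration Stephan suggests, assuming the Scala API on 0.10; the surrounding setup is illustrative.]

    // Explicitly register SparseVector so Flink's Kryo fallback knows the type
    // up front instead of registering it on the fly.
    import org.apache.flink.api.scala._
    import org.apache.flink.ml.math.SparseVector

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.registerType(classOf[SparseVector])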
It's on 0.10. I've tried explicitly registering SparseVector in my job (which is done anyway by registerFlinkMLTypes, called when the SVM predict or evaluate functions are invoked), but I still get the same error. I will try a couple of different datasets to see whether it's the number of features causing this or something else.

On Wed, Jan 20, 2016 at 11:39 AM, Stephan Ewen <[hidden email]> wrote:
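[For reference, a sketch of what the registration described above looks like in code; FlinkMLTools.registerFlinkMLTypes is the FlinkML helper mentioned, and everything around it is illustrative.]

    // registerFlinkMLTypes registers SparseVector, DenseVector and the other
    // FlinkML math types with the environment's Kryo configuration.
    import org.apache.flink.api.scala._
    import org.apache.flink.ml.common.FlinkMLTools
    import org.apache.flink.ml.math.SparseVector

    val env = ExecutionEnvironment.getExecutionEnvironment
    FlinkMLTools.registerFlinkMLTypes(env)
    // Redundant given the call above, but makes the registration explicit:
    env.registerType(classOf[SparseVector])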
I haven't been able to reproduce this with other datasets. However, taking a smaller sample from the large dataset I'm using (link to data) causes the same problem. I'm wondering if the implementation of readLibSVM is what's wrong here. I've tried the new version committed recently by Chiwan, but I still get the same error.

On Wed, Jan 20, 2016 at 1:43 PM, Theodore Vasiloudis <[hidden email]> wrote:
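[For context, this is roughly how the loader under suspicion is used; a sketch, with the path a placeholder for the dataset linked above.]

    // MLUtils.readLibSVM parses a libSVM/SVMLight file into LabeledVectors
    // whose features are SparseVectors, the type failing in Kryo above.
    import org.apache.flink.api.scala._
    import org.apache.flink.ml.MLUtils
    import org.apache.flink.ml.common.LabeledVector

    val env = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[LabeledVector] = MLUtils.readLibSVM(env, "/path/to/data.libsvm")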
The bug looks to be in the serialization via Kryo while spilling windows. Note that Kryo is used here as a fallback serializer, since SparseVector is not a type transparent to Flink. I think there are two possible reasons:

1) Kryo, or our Kryo setup, has an issue here
2) Kryo is inconsistently configured. There are multiple Kryo instances used across the serializers in the sorter. There may be a bug such that they are not initialized in sync.

To check this, can you build Flink with this pull request (https://github.com/apache/flink/pull/1528) or from this branch (https://github.com/StephanEwen/incubator-flink kryo) and see if that fixes it?

Thanks,
Stephan

On Wed, Jan 20, 2016 at 3:33 PM, Theodore Vasiloudis <[hidden email]> wrote:
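[One further isolation step, not proposed in the thread but sometimes useful with Kryo problems in Flink, is to pin a specific Kryo serializer for the failing type so the default FieldSerializer is taken out of the equation. A sketch; it relies on SparseVector being Serializable, which Kryo's built-in JavaSerializer requires.]

    // Route SparseVector through Kryo's JavaSerializer instead of the default
    // FieldSerializer, to see whether the failure is tied to the default setup.
    import com.esotericsoftware.kryo.serializers.JavaSerializer
    import org.apache.flink.api.scala._
    import org.apache.flink.ml.math.SparseVector

    val env = ExecutionEnvironment.getExecutionEnvironment
    env.getConfig.addDefaultKryoSerializer(classOf[SparseVector], classOf[JavaSerializer])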
OK, here's what I tried:

* Build Flink (mvn clean install) from the branch you linked (kryo)

Does it matter in this case, or is it enough that I'm sure the launched Flink instance comes from the branch you linked?

On Wed, Jan 20, 2016 at 4:30 PM, Stephan Ewen <[hidden email]> wrote:
You could change the version of Stephan’s branch via [...]. Alternatively, you could compile an example program with example input data which can reproduce the problem. Then I could also take a look at it.

Cheers,
Till

On Wed, Jan 20, 2016 at 5:58 PM, Theodore Vasiloudis <[hidden email]> wrote:
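[Along the lines Till suggests, a self-contained reproducer might look like the following. A sketch: that the job is the FlinkML SVM pipeline over the libSVM data is an assumption based on this thread, and the path is a placeholder.]

    // Hypothetical minimal reproducer: load the failing libSVM sample and run
    // the FlinkML SVM, which exercises the SparseVector serialization path.
    import org.apache.flink.api.scala._
    import org.apache.flink.ml.MLUtils
    import org.apache.flink.ml.classification.SVM

    object SparseVectorKryoRepro {
      def main(args: Array[String]): Unit = {
        val env = ExecutionEnvironment.getExecutionEnvironment
        val training = MLUtils.readLibSVM(env, "/path/to/sample.libsvm")
        val svm = SVM()
        svm.fit(training)
        // collect() triggers execution of the training job
        svm.weightsOption.foreach(w => println(w.collect().head))
      }
    }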
Alright, I will try to do that. I've tried running the job with a CSV file as input, using DenseVectors to represent the features, and I still get the same IndexOutOfBounds error.

On Wed, Jan 20, 2016 at 6:05 PM, Till Rohrmann <[hidden email]> wrote:
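[For completeness, a sketch of the CSV variant described above; the column layout, with the label first, and the path are assumptions.]

    // Parse rows of comma-separated doubles and wrap the features in
    // DenseVectors; the label is assumed to be the first column.
    import org.apache.flink.api.scala._
    import org.apache.flink.ml.common.LabeledVector
    import org.apache.flink.ml.math.DenseVector

    val env = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[LabeledVector] = env
      .readTextFile("/path/to/data.csv")
      .map { line =>
        val fields = line.split(',').map(_.toDouble)
        LabeledVector(fields.head, DenseVector(fields.tail))
      }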
Can you post the stack trace again? With the patched branch, the reference mapper should not be used any more (which is where the original exception occurred).

On Wed, Jan 20, 2016 at 7:38 PM, Theodore Vasiloudis <[hidden email]> wrote:
This is the stack trace from running with the patched branch:

The program finished with the following exception:

On Wed, Jan 20, 2016 at 9:45 PM, Stephan Ewen <[hidden email]> wrote:
And this is the one from running with a CSV input; this time I've verified that I'm using the correct version of Flink, according to Till's instructions:

The program finished with the following exception:

On Thu, Jan 21, 2016 at 10:51 AM, Theodore Vasiloudis <[hidden email]> wrote: