Lucene SPI class loading fails with shaded flink-connector-elasticsearch

Haddadi Manuel

Hello,
 
When upgrading from flink-1.3.2 to flink-1.4.2, I hit this error at runtime in a Flink job:
 
java.util.ServiceConfigurationError: An SPI class of type org.apache.lucene.codecs.PostingsFormat with classname org.apache.lucene.search.suggest.document.Completion50PostingsFormat does not exist, please fix the file 'META-INF/services/org.apache.lucene.codecs.PostingsFormat' in your classpath.
 
I added the lucene-suggest dependency and then encountered this:
java.lang.ClassCastException: class org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat
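
A note on the second error: a ClassCastException whose message is just "class <name>" has the shape that java.lang.Class.asSubclass produces when the class exists but is not a subtype of the requested type. The thread does not show Lucene's loading code, so treating this as the cause is an assumption; if it holds, the class was found but did not extend the PostingsFormat class the SPI loader was using (for example because of relocation or two class loaders). A minimal sketch of the JDK behaviour only, using unrelated JDK classes:

```java
public class AsSubclassDemo {

    /** Returns the ClassCastException message of a failed asSubclass call, or null if it succeeds. */
    public static String failureMessage(Class<?> candidate, Class<?> expected) {
        try {
            candidate.asSubclass(expected);
            return null;
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // String is not a subtype of Number, so asSubclass throws.
        System.out.println(failureMessage(String.class, Number.class));
        // Integer is a subtype of Number, so there is no exception and the result is null.
        System.out.println(failureMessage(Integer.class, Number.class));
    }
}
```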
 
The Flink job runs Lucene queries on a data stream which ends up in an Elasticsearch index.
 
It seems to me that this exception is a side effect of shading the flink-connector-elasticsearch5 dependencies. So far, the only solution I have found is to rebuild the flink-connector-elasticsearch5 jar while excluding META-INF/services/org.apache.lucene.codecs.*
 
I would highly appreciate any opinion on this workaround. Could it have side effects?

Thanks. And by the way, congrats to all Flink contributors, this is a pretty good piece of technology!
 
Regards,
 
Manuel Haddadi

Re: Lucene SPI class loading fails with shaded flink-connector-elasticsearch

Till Rohrmann-2
Hi Manuel,

thanks for reporting this issue. It sounds to me like a bug we should fix. I've pulled Gordon into the conversation since he will most likely know more about the Elasticsearch connector shading.

Cheers,
Till

On Thu, Mar 22, 2018 at 5:09 PM, Haddadi Manuel <[hidden email]> wrote:

[...]


Re: Lucene SPI class loading fails with shaded flink-connector-elasticsearch

Tzu-Li (Gordon) Tai
Hi Manuel,

Thanks a lot for reporting this!

Yes, this issue is most likely related to the recent changes to shading the Elasticsearch connector dependencies, though it is a bit curious why I didn’t bump into it before while testing it.

> The Flink job runs Lucene queries on a data stream which ends up in an Elasticsearch index.

Could you explain a bit more where the Lucene queries are executed? Were there other dependencies required for this?

> I would highly appreciate any opinion on this workaround. Could it have side effects?

I think your workaround wouldn’t be harmful. Could you explain how you came to the solution? That would help me in getting to the bottom of the problem (and maybe other potential similar issues).

Cheers,
Gordon

On 23 March 2018 at 12:43:31 AM, Till Rohrmann ([hidden email]) wrote:

[...]


RE: Lucene SPI class loading fails with shaded flink-connector-elasticsearch

Haddadi Manuel

Hi Gordon, hi Till,


Thanks for your feedback. I am happy to contribute by describing in more detail how the bug occurred, if that helps.


First, to describe in more detail what my Flink job does: part of its execution plan is a ProcessFunction that stores the events as Lucene documents in an in-memory Lucene index. When the number of documents reaches a threshold, the process function fires Lucene queries to filter the documents (and hence the events) according to user models.
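
The buffering pattern described above can be sketched in plain Java. This is a hypothetical stand-in, not the actual job: `MicroBatchFilter` is an invented name, a `List` stands in for the in-memory Lucene index, and a `Predicate` stands in for the Lucene queries. No Flink or Lucene API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in: buffers events and filters the whole batch once a threshold is reached. */
public class MicroBatchFilter<T> {
    private final int threshold;
    private final Predicate<T> query;                  // stands in for the Lucene queries
    private final List<T> buffer = new ArrayList<>();  // stands in for the in-memory index

    public MicroBatchFilter(int threshold, Predicate<T> query) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
        this.query = query;
    }

    /** Adds one event; returns the matching events when the batch is full, otherwise an empty list. */
    public List<T> add(T event) {
        buffer.add(event);
        if (buffer.size() < threshold) {
            return List.of();
        }
        List<T> matches = new ArrayList<>();
        for (T buffered : buffer) {
            if (query.test(buffered)) {
                matches.add(buffered);
            }
        }
        buffer.clear(); // start a new micro-batch
        return matches;
    }

    public static void main(String[] args) {
        MicroBatchFilter<String> filter = new MicroBatchFilter<>(3, s -> s.startsWith("a"));
        System.out.println(filter.add("apple"));
        System.out.println(filter.add("bob"));
        System.out.println(filter.add("avocado")); // third event fills the batch and triggers the filter
    }
}
```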


This process function therefore depends on the Lucene modules lucene-core, lucene-queryparser and lucene-analyzers-common in version 6.3.0 (as a precaution, we chose the same version as elasticsearch:5.1.2).


Later, the event stream is sent to an Elasticsearch index via the flink-connector-elasticsearch5 module.


I upgraded the Flink dependencies from version 1.3.2 to 1.4.2. When the job was deployed on a YARN cluster, it raised this error:

java.util.ServiceConfigurationError: An SPI class of type org.apache.lucene.codecs.PostingsFormat with classname org.apache.lucene.search.suggest.document.Completion50PostingsFormat does not exist, please fix the file 'META-INF/services/org.apache.lucene.codecs.PostingsFormat' in your classpath.


So I checked the META-INF/services/org.apache.lucene.codecs.PostingsFormat file in my job's fat jar. It listed several implementations of PostingsFormat to be loaded:

org.apache.lucene.search.suggest.document.Completion50PostingsFormat
org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat
org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat
org.apache.lucene.codecs.idversion.IDVersionPostingsFormat
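
The same inspection can be done at runtime. The following is a minimal sketch, assuming Java 9+ and that it runs with the job's class loader: it prints every copy of the services file visible on the classpath and flags the providers that cannot be loaded. `ServiceFileScanner` and its methods are hypothetical names; only JDK calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ServiceFileScanner {

    /** Extracts provider class names from a services file, ignoring '#' comments and blank lines. */
    public static List<String> parseProviders(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns true if the class can be found by the given loader (without initializing it). */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String resource = "META-INF/services/org.apache.lucene.codecs.PostingsFormat";
        ClassLoader loader = ServiceFileScanner.class.getClassLoader();
        Enumeration<URL> urls = loader.getResources(resource);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            System.out.println(url);
            String content;
            try (InputStream in = url.openStream()) {
                content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            for (String name : parseProviders(content)) {
                System.out.println("  " + name + (isLoadable(name, loader) ? "" : "  <-- not loadable"));
            }
        }
    }
}
```

A provider flagged here would be a candidate for the ServiceConfigurationError above; a provider that loads but has the wrong supertype would not be caught by this check.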

I don't know exactly how the maven-shade-plugin operates, but it seems to me that it merges configuration files that have the same name in different modules into a single file.
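
For context: the maven-shade-plugin provides a ServicesResourceTransformer that concatenates same-named META-INF/services files; whether the connector build uses it is not stated in the thread. The sketch below simulates such a merge in plain Java (it is not the plugin's code, and all names are hypothetical) to show how a merged file can list providers whose classes are not available under those names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ServiceFileMerge {

    /** Concatenates same-named services files, one after another. */
    public static String concatenate(List<String> fileContents) {
        StringBuilder merged = new StringBuilder();
        for (String content : fileContents) {
            merged.append(content);
            if (!content.endsWith("\n")) {
                merged.append('\n'); // keep the last entry of one file apart from the next file
            }
        }
        return merged.toString();
    }

    /** Extracts provider class names, ignoring '#' comments and blank lines. */
    public static List<String> providers(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    /** Returns the listed providers that are not among the classes actually available. */
    public static List<String> missing(List<String> listed, Set<String> availableClasses) {
        List<String> result = new ArrayList<>();
        for (String name : listed) {
            if (!availableClasses.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String fromElasticsearch =
                "org.apache.lucene.search.suggest.document.Completion50PostingsFormat\n"
                + "org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat\n";
        String fromLuceneCore =
                "# license header\n\norg.apache.lucene.codecs.lucene50.Lucene50PostingsFormat\n";
        String merged = concatenate(List.of(fromElasticsearch, fromLuceneCore));
        // Assumption for illustration: only the lucene-core class is available under its original name.
        Set<String> available = Set.of("org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat");
        System.out.println(missing(providers(merged), available));
    }
}
```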

For example, in elasticsearch-5.1.2.jar, the file org.apache.lucene.codecs.PostingsFormat contains:

org.apache.lucene.search.suggest.document.Completion50PostingsFormat
org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat

In flink-connector-elasticsearch5_2.11-1.4.2.jar, the file org.apache.lucene.codecs.PostingsFormat contains:

org.apache.lucene.search.suggest.document.Completion50PostingsFormat
org.elasticsearch.search.suggest.completion2x.Completion090PostingsFormat
#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat
#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.codecs.idversion.IDVersionPostingsFormat
#
#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.search.suggest.document.Completion50PostingsFormat

Since my job's fat jar inherits the configuration files in META-INF/services from its dependencies, I guess this is why, at runtime, the Lucene API tries to load some classes that are not on the classpath. This intuition was confirmed when I excluded the META-INF/services/org.apache.lucene.codecs.* files from flink-connector-elasticsearch5. With the following org.apache.lucene.codecs.PostingsFormat file in my jar, the runtime exception no longer occurs:

#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat
#  Licensed to the Apache Software Foundation (ASF) under one or more
#  contributor license agreements.  See the NOTICE file distributed with
#  this work for additional information regarding copyright ownership.
#  The ASF licenses this file to You under the Apache License, Version 2.0
#  (the "License"); you may not use this file except in compliance with
#  the License.  You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
#  Unless required by applicable law or agreed to in writing, software
#  distributed under the License is distributed on an "AS IS" BASIS,
#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#  See the License for the specific language governing permissions and
#  limitations under the License.

org.apache.lucene.codecs.idversion.IDVersionPostingsFormat
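
A rebuilt jar can be checked offline before deployment. The sketch below, assuming Java 9+ and a self-contained fat jar, reports the providers named in a services entry that have no matching .class entry in the same jar. `JarServiceCheck` is a hypothetical helper, and it cannot detect providers that exist but were relocated to a different package.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarServiceCheck {

    /** Returns the providers named in the services entry whose .class file is absent from the jar. */
    public static List<String> missingProviders(String jarPath, String serviceEntry) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(serviceEntry);
            if (entry == null) {
                return missing; // no such services file, so nothing can be missing
            }
            String content;
            try (InputStream in = jar.getInputStream(entry)) {
                content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
            for (String line : content.split("\\R")) {
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (name.isEmpty()) {
                    continue; // comment or blank line
                }
                String classEntry = name.replace('.', '/') + ".class";
                if (jar.getJarEntry(classEntry) == null) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the fat jar to inspect
        String service = "META-INF/services/org.apache.lucene.codecs.PostingsFormat";
        System.out.println(missingProviders(args[0], service));
    }
}
```

An empty list means every listed provider has a class file in the jar; it does not prove that each one will load correctly at runtime.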

I hope my explanation is clear enough. Don't hesitate to ask for more information if needed. I would also be glad if you pointed out any misunderstanding on my part, or even any misuse of the Flink framework (for instance, the fact that we use a Lucene index as a micro-batch inside a Flink transformation).

Cheers,
Manuel



From: Tzu-Li (Gordon) Tai <[hidden email]>
Sent: Friday, 23 March 2018 10:40:52
To: Till Rohrmann; Haddadi Manuel
Cc: [hidden email]
Subject: Re: Lucene SPI class loading fails with shaded flink-connector-elasticsearch
 
[...]