Hi All, i have Written a consumer that read from kafka topic and write the data in parquet format using StreamSink . But i am getting following error. Its runs for some hours than start failing with this excpetions. I tried to restart it but failing with same exceptions.After i restart with latest offset it started working fine for soem hours and than again fail. I am not able to find root cause for this issue.
code : DataStream<GenericRecord> sourceStream = env.addSource(kafkaConsumer010);
|
Hi Anuj, it looks to me that your input GenericRecords don't conform with your output schema schemaSubject. At least, the stack trace says that your output schema expects some String field but the field was actually some ArrayList. Consequently, I would suggest to verify that your input data has the right format and if not to filter those records out which are non-conformant. Cheers, Till On Sat, Feb 29, 2020 at 2:13 PM aj <[hidden email]> wrote:
|
Hi Till, Thanks for the reply . I have doubt that input has problem because : 1. if input has some problem than it should not come in the topic itself as schema validation fail at producer side only. 2. i am using the same schema that was used to writed the record in topic and i am able to parse the record with same schema as when i try to print the stream its not giving any error , only problem occurring when writing as parquet. This is the code that i am using to get the schema that i m passing to parquetwriter. public static Schema getSchema(String subjectName) { How input can pass through and inserted in topic if it has some issue. Even if its occusring how to find those record and skip that so that because of one record my whole processing should not fail. Thanks, Anuj On Sat, Feb 29, 2020 at 9:12 PM Till Rohrmann <[hidden email]> wrote:
|
Hi Anuj, if you use the exact same schema with which the data has been written for reading and if there is no bug in the parquet Avro support, then it should indeed not fail. Hence, I suspect that the producer of your data might produce slightly different Avro records compared to what Parquet is expecting. But this is just guessing here. The reason why you don't see the program fail when only printing the records is that you don't transform the GenericRecords into the Parquet format which expects a certain format given the schema. Maybe it could help to figure out which Avro version your writer uses and then to compare it to Parquet's AvroWriteSupport. Additionally, the used schema could be helpful as well. It could also be that you are running into this Parquet issue [1]. In this case, you could try to solve the problem by bumping the Parquet version. Cheers, Till On Sat, Feb 29, 2020 at 5:56 PM aj <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |