using ParquetProtoWriters, does anyone have this working with aws athena ingestion via aws glue crawls?
the parquet files generated by our flink job look fine at a binary level, but the aws glue crawler that crawls these files on s3 doesn't seem to be able to deserialize the row data properly. the schema is picked up correctly, but the actual unmarshalling of the rows seems to fail (with no helpful logs).
likewise, using parquet-tools or pqrs locally shows the same behavior: the metadata is read perfectly fine, but the actual data is not.
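for anyone wanting to reproduce the "fine at a binary level" part, here is a minimal sketch of a framing check in python, using only the standard library. it assumes the standard unencrypted parquet layout (a leading `PAR1` magic, then data, then the footer, a 4-byte little-endian footer length, and a trailing `PAR1` magic). the function name `check_parquet_framing` is made up for illustration and this is not necessarily the exact check we ran. note that it only validates the framing around the footer metadata, not the row data, so a file can pass it and still show the symptom above.

```python
import struct

MAGIC = b"PAR1"  # magic bytes at both ends of an unencrypted parquet file


def check_parquet_framing(data: bytes) -> int:
    """Return the footer (metadata) length if the framing looks valid.

    Raises ValueError otherwise. Hypothetical helper for illustration only.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a parquet file")
    if data[:4] != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    if data[-4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

to use it, read a file in binary mode (for example `open(path, "rb").read()`, where `path` is whatever local copy you pulled from s3) and pass the bytes in.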
i'd like to verify whether this is just a relatively atypical combination of formats (parquet and protos) that doesn't have widespread tooling support, or something i'm overlooking on my end. for example, must i define the table manually in athena using a create table statement (most examples of parquet/protobuf use this approach) rather than rely on the schema inferred by the aws glue crawler? i didn't go this route because it seemed counter to the spirit of parquet files carrying their schema embedded in them.
thanks!