Hi.
I'm using StreamingFileSink to write partitioned data to S3.
The code is below:
StreamingFileSink<GenericRecord> sink = StreamingFileSink
    .forBulkFormat(new Path("s3a://test-bucket/test"),
                   ParquetAvroFactory.getParquetWriter(schema, "GZIP"))
    .withBucketAssigner(new PartitionBucketAssigner(partitionColumns))
    .build();
How can I remove the partition columns from the data (or avoid populating them in the GenericRecord)?
My problem is that the AWS Glue crawler creates duplicate columns in the table.
Thanks,
Yitzchak.