Cassandra connector POJO - tombstone question

Tarandeep Singh
Hi,

I am using Flink 1.2 and the Cassandra connector to write to Cassandra tables. I am using POJOs with DataStax annotations, as described here:
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/cassandra.html

My question is: how are nulls handled by the Cassandra sink?

The DataStax documentation on the Mapper states that if we use POJOs to store data in a Cassandra table and a POJO has null fields, the write can create tombstones, so one should use saveNullFields(false) so that null fields are not persisted:
https://docs.datastax.com/en/developer/java-driver/3.1/manual/object_mapper/using/#mapper-options

The default behavior is to persist null fields.
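
To make the tombstone concern concrete, here is a minimal sketch in plain Java of what the saveNullFields switch changes. It is a toy model, not the DataStax mapper's actual implementation: the class NullFieldDemo, the method buildInsert and the users table are made up for illustration, and real mappers bind values to prepared statements instead of building literals. The point it shows is that an explicit null written to a Cassandra column is recorded as a tombstone, whereas a column left out of the INSERT has nothing written for it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of turning a POJO's fields into an INSERT; hypothetical, not driver code. */
final class NullFieldDemo {

    /** Builds an INSERT for the given column values, optionally skipping null ones. */
    static String buildInsert(String table, Map<String, Object> columns, boolean saveNullFields) {
        List<String> names = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, Object> e : columns.entrySet()) {
            if (e.getValue() == null && !saveNullFields) {
                continue; // column is left out, so nothing is written for it
            }
            names.add(e.getKey());
            values.add(literal(e.getValue()));
        }
        return "INSERT INTO " + table + " (" + String.join(", ", names) + ") VALUES ("
                + String.join(", ", values) + ")";
    }

    /** Renders a value as a CQL-style literal (strings quoted, null spelled out). */
    private static String literal(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof String) {
            return "'" + ((String) value).replace("'", "''") + "'";
        }
        return value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("id", 42);
        user.put("name", "Ada");
        user.put("email", null); // the field that would become a tombstone

        System.out.println(buildInsert("users", user, true));  // email written as null
        System.out.println(buildInsert("users", user, false)); // email left out
    }
}
```

With saveNullFields set to true, the statement carries email as an explicit null; with false, the email column does not appear in the statement at all.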

In the Cassandra POJO sink code, I don't see this option set on the Mapper:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/streaming/connectors/cassandra/CassandraPojoSink.java

So does this mean I can expect to see tombstones when writing data (assuming my POJOs have null fields)? If yes, can we expose an option to disable saving null fields?

Thanks,
Tarandeep

Re: Cassandra connector POJO - tombstone question

Chesnay Schepler
Hello,

What I can do is add a hook, like the one we have for the ClusterBuilder,
with which you can provide a set of options that will be used for every
call to the mapper. This would give you access to all the options listed
on the page you linked.
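
Such a hook could look roughly like the sketch below. It is not the code from the branch linked in this message: SaveOptionsProvider, RecordingMapper and ToyPojoSink are hypothetical names, and options are plain strings instead of the driver's Mapper.Option objects so that the example runs without Flink or Cassandra on the classpath. It only illustrates the pattern: user code supplies a serializable provider, and the sink passes the resulting options to every save call.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical hook: user code supplies the options; Serializable so it can ship with the job. */
interface SaveOptionsProvider extends Serializable {
    String[] getSaveOptions();
}

/** Stand-in for the mapper; it only records which options each save() call received. */
final class RecordingMapper {
    final List<String> calls = new ArrayList<>();

    void save(Object entity, String... options) {
        calls.add(entity + " " + Arrays.toString(options));
    }
}

/** Toy sink that asks the hook once and reuses the options for every record. */
final class ToyPojoSink {
    private final RecordingMapper mapper;
    private final String[] options;

    ToyPojoSink(RecordingMapper mapper, SaveOptionsProvider provider) {
        this.mapper = mapper;
        // No provider means no extra options, so the mapper's defaults apply.
        this.options = provider == null ? new String[0] : provider.getSaveOptions();
    }

    /** Called once per incoming record; always forwards the same options. */
    void invoke(Object record) {
        mapper.save(record, options);
    }

    public static void main(String[] args) {
        RecordingMapper mapper = new RecordingMapper();
        ToyPojoSink sink =
                new ToyPojoSink(mapper, () -> new String[] {"saveNullFields(false)"});
        sink.invoke("pojo-1");
        sink.invoke("pojo-2");
        mapper.calls.forEach(System.out::println);
    }
}
```

Passing null as the provider keeps today's behavior in this sketch, so a job that does not care about the options would not need to change.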

You can find an implementation of this here:
https://github.com/zentol/flink/tree/unknown_cass_options

Note that this branch is on 1.3-SNAPSHOT, but it should be possible for
you to cherry-pick it onto a 1.2 branch.

I will add a ticket for this soon (currently getting timeouts in JIRA).

Regards,
Chesnay

On 12.04.2017 02:27, Tarandeep Singh wrote:

> Hi,
>
> I am using Flink 1.2 and the Cassandra connector to write to Cassandra
> tables. I am using POJOs with DataStax annotations, as described here:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/cassandra.html
>
> My question is: how are nulls handled by the Cassandra sink?
>
> The DataStax documentation on the Mapper states that if we use POJOs to
> store data in a Cassandra table and a POJO has null fields, the write
> can create tombstones, so one should use saveNullFields(false) so that
> null fields are not persisted:
> https://docs.datastax.com/en/developer/java-driver/3.1/manual/object_mapper/using/#mapper-options
>
> The default behavior is to persist null fields.
>
> In the Cassandra POJO sink code, I don't see this option set on the Mapper:
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/streaming/connectors/cassandra/CassandraPojoSink.java
>
> So does this mean I can expect to see tombstones when writing data
> (assuming my POJOs have null fields)? If yes, can we expose an option
> to disable saving null fields?
>
> Thanks,
> Tarandeep

Re: Cassandra connector POJO - tombstone question

Tarandeep Singh
Thanks Chesnay, this will work.

Best,
Tarandeep

On Wed, Apr 12, 2017 at 2:42 AM, Chesnay Schepler wrote:
> Hello,
>
> What I can do is add a hook, like the one we have for the ClusterBuilder,
> with which you can provide a set of options that will be used for every
> call to the mapper. This would give you access to all the options listed
> on the page you linked.
>
> You can find an implementation of this here: https://github.com/zentol/flink/tree/unknown_cass_options
>
> Note that this branch is on 1.3-SNAPSHOT, but it should be possible for you to cherry-pick it onto a 1.2 branch.
>
> I will add a ticket for this soon (currently getting timeouts in JIRA).
>
> Regards,
> Chesnay


Re: Cassandra connector POJO - tombstone question

Tarandeep Singh
Hi Chesnay,

Did your code changes (exposing mapper options) make it into the 1.3 release?

Thank you,
Tarandeep

On Wed, Apr 12, 2017 at 2:34 PM, Tarandeep Singh wrote:
> Thanks Chesnay, this will work.
>
> Best,
> Tarandeep

Re: Cassandra connector POJO - tombstone question

Chesnay Schepler
No, unfortunately I forgot about them :/

On 01.06.2017 19:39, Tarandeep Singh wrote:
> Hi Chesnay,
>
> Did your code changes (exposing mapper options) make it into the 1.3 release?
>
> Thank you,
> Tarandeep

Re: Cassandra connector POJO - tombstone question

Tarandeep Singh
No problem :)
Thanks for letting me know.

Best,
Tarandeep

On Thu, Jun 1, 2017 at 11:18 AM, Chesnay Schepler wrote:
> No, unfortunately I forgot about them :/

