autoCommit for postgres jdbc streaming in Table/SQL API

autoCommit for postgres jdbc streaming in Table/SQL API

Dylan Forciea

Hi! I’ve just recently started evaluating Flink for our ETL needs, and I ran across an issue with streaming postgres data via the Table/SQL API.

 

I see that the API has the scan.fetch-size option, but not scan.auto-commit, per https://ci.apache.org/projects/flink/flink-docs-master/dev/table/connectors/jdbc.html. I had attempted to load a large table, but Flink read the entire table into memory before it started streaming. I modified the Flink source code to add a scan.auto-commit option, and I was then able to start streaming immediately and cut my memory usage way down.
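To make the setup concrete, here is a rough Java sketch that renders the kind of table definition described above. It is only an illustration: `connector`, `url`, `table-name` and `scan.fetch-size` are options from the linked JDBC connector page, while `scan.auto-commit` is the option added by the local patch and was not part of the documented connector at the time of writing. The class name, table name, columns and URL are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class JdbcScanDdlSketch {

    /** Renders a CREATE TABLE statement for a JDBC source with the given scan settings. */
    static String createTableDdl(int fetchSize, boolean autoCommit) {
        // LinkedHashMap keeps the options in insertion order for readable output.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "jdbc");
        options.put("url", "jdbc:postgresql://localhost:5432/exampledb"); // hypothetical URL
        options.put("table-name", "big_table");                           // hypothetical table
        options.put("scan.fetch-size", String.valueOf(fetchSize));
        // Assumption: option name taken from the patch described in this thread.
        options.put("scan.auto-commit", String.valueOf(autoCommit));

        String with = options.entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));

        // Hypothetical schema: two columns are enough to show the shape.
        return "CREATE TABLE big_table_src (\n"
                + "  id BIGINT,\n"
                + "  payload STRING\n"
                + ") WITH (\n" + with + "\n)";
    }
}
```

The resulting string could be passed to `TableEnvironment.executeSql`, for example `tEnv.executeSql(JdbcScanDdlSketch.createTableDdl(1000, false))`.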

 

I see that a similar issue was resolved for JDBCInputFormat in this thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-JDBC-Disable-auto-commit-mode-td27256.html, but I don’t see a way to use that from the Table/SQL API.

 

Am I missing something on how to pull this off?

 

Regards,

Dylan Forciea

Oseberg


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Dylan Forciea

I hadn’t heard a response on this, so I’m going to expand this to the dev email list.

 

If this is indeed an issue and not my misunderstanding, I have most of a patch already coded up. Please let me know, and I can create a JIRA issue and send out a PR.

 

Regards,

Dylan Forciea

Oseberg

 


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Dylan Forciea

I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.

 

Thanks,

Dylan

 

From: Shengkai Fang <[hidden email]>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <[hidden email]>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

 

Sorry for the late response. +1 to supporting this. I will open a Jira issue for it later.

 


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Till Rohrmann
Hi Dylan,

Thanks for reaching out to the Flink community, and please excuse our late response. I am not an expert on the Table API and its JDBC connector, but what you describe sounds like a missing feature. Given that FLINK-12198 enabled this feature for the JDBCInputFormat, we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard, who worked on the JdbcTableSource and might help you get this feature into Flink. Their response could take a week because they are currently on vacation, if I am not mistaken.

What you could already do is open an issue that links FLINK-12198 and describes the problem and your proposed solution.


Cheers,
Till


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Dylan Forciea

Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR.

 

Thanks,

Dylan

 


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Dylan Forciea

Actually, it looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated.

 

Dylan

 


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Till Rohrmann
This sounds good. Maybe there are others in the community who can help with the review before Jark and Leonard are back.

Cheers,
Till


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Dylan Forciea

I’ve updated the unit tests and documentation, and I ran the Azure test pipeline as described in the instructions. However, what appears to be an unrelated test for the JMX code failed. Is this a matter of me not setting things up correctly? I wanted to ensure everything looked good before I submitted the PR.

 

[ERROR] Failures:
[ERROR]   JMXReporterFactoryTest.testPortRangeArgument:46
Expected: (a value equal to or greater than <9000> and a value less than or equal to <9010>)
     but: a value less than or equal to <9010> <9040> was greater than <9010>
[ERROR]   JMXReporterFactoryTest.testWithoutArgument:60
[INFO]
[ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0

 

Thanks,

Dylan Forciea

 


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Jark Wu-3
Hi Dylan,

Sorry for the late reply. We've just come back from a long holiday. 

Thanks for reporting this problem. First, I think it is a bug that `autoCommit` is false by default (JdbcRowDataInputFormat.Builder).
We can change the default to true in the 1.11 series, and I think this would solve your problem in the short term?
Besides that, we should expose a connector option to set auto-commit; this can be a separate issue implemented in master.
I'm glad to review the code.

What do you think? 

Regarding the failed JMXReporterFactoryTest, I think this is a known issue; see FLINK-19539 [1].

Best,
Jark



Re: autoCommit for postgres jdbc streaming in Table/SQL API

Dylan Forciea

Jark,

 

Thank you! I had actually mistyped the JIRA issue; autoCommit needs to be set to false for streaming to work. The default on the driver is true when the option isn’t specified. I’ve updated the issue accordingly.

 

Setting this to false automatically on the read path would fix my issue. However, I’m only certain that this is proper for Postgres. I’m not sure if this should be done for other drivers, although my gut would say it should be fine if it’s only done for reading. As written, my patch leaves autoCommit unset on the builder when the option is not specified, which means the connection would then be left at the driver default of true. That would conflict with the 1.11 patch you suggested. Let me know if you think I should make the default true in the SQL API.
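
For readers following along, here is a minimal plain-JDBC sketch of the behavior described above. It is not the code from the patch. The class `StreamingScan` and its `prepare` method are hypothetical names used only for illustration. The sketch assumes the PostgreSQL driver's documented conditions for cursor-based fetching: autoCommit off, a positive fetch size, and a forward-only statement. Passing `null` for `autoCommit` mirrors the "option not specified" case, where the connection keeps the driver default.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper for illustration; not part of Flink or the PostgreSQL driver. */
final class StreamingScan {

    private StreamingScan() {}

    /**
     * Prepares a forward-only, read-only statement for an incremental scan.
     *
     * @param autoCommit null leaves the driver default untouched; false is what
     *                   PostgreSQL needs before it will fetch through a cursor
     * @param fetchSize  rows requested per round trip; PostgreSQL needs a value > 0
     */
    static PreparedStatement prepare(
            Connection conn, String query, Boolean autoCommit, int fetchSize)
            throws SQLException {
        if (autoCommit != null) {
            // Only touch the connection when the caller made an explicit choice.
            conn.setAutoCommit(autoCommit);
        }
        PreparedStatement stmt = conn.prepareStatement(
                query, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(fetchSize);
        return stmt;
    }
}
```

With `autoCommit` set to `false` and a positive fetch size, the driver can return rows in batches rather than materializing the whole table first, which matches the memory behavior reported at the start of this thread. Other drivers may treat these settings differently.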

 

https://github.com/apache/flink/pull/13570

 

Regards,

Dylan

 

From: Jark Wu <[hidden email]>
Date: Thursday, October 8, 2020 at 10:15 PM
To: Dylan Forciea <[hidden email]>
Cc: Till Rohrmann <[hidden email]>, dev <[hidden email]>, Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

 

Hi Dylan,

 

Sorry for the late reply. We've just come back from a long holiday. 

 

Thanks for reporting this problem. First, I think it is a bug that `autoCommit` is false by default (JdbcRowDataInputFormat.Builder).

We can change the default to true in the 1.11 series, which I think would solve your problem in the short term.

Besides, we should expose a connector option to set auto-commit; this can be a separate issue to be implemented in master.

I'm glad to review the code. 

 

What do you think? 

 

Regarding the failed JMXReporterFactoryTest, I think this is a known issue; see FLINK-19539 [1]

 

Best,

Jark

 

 

On Fri, 9 Oct 2020 at 01:29, Dylan Forciea <[hidden email]> wrote:

I’ve updated the unit tests and documentation, and I was running the azure test pipeline as described in the instructions. However, it appears that what seems to be an unrelated test for the JMX code failed. Is this a matter of me not setting things up correctly? I wanted to ensure everything looked good before I submitted the PR.

 

[ERROR] Failures:
[ERROR]   JMXReporterFactoryTest.testPortRangeArgument:46
Expected: (a value equal to or greater than <9000> and a value less than or equal to <9010>)
     but: a value less than or equal to <9010> <9040> was greater than <9010>
[ERROR]   JMXReporterFactoryTest.testWithoutArgument:60
[INFO]
[ERROR] Tests run: 10, Failures: 2, Errors: 0, Skipped: 0

 

Thanks,

Dylan Forciea

 

From: Till Rohrmann <[hidden email]>
Date: Thursday, October 8, 2020 at 2:29 AM
To: Dylan Forciea <[hidden email]>
Cc: dev <[hidden email]>, Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

 

This sounds good. Maybe there are others in the community who can help with the review before Jark and Leonard are back.

 

Cheers,

Till

 

On Wed, Oct 7, 2020 at 7:33 PM Dylan Forciea <[hidden email]> wrote:

Actually, it looks like what I did covers both cases. I’ll see about getting some unit tests and documentation updated.

 

Dylan

 

From: Dylan Forciea <[hidden email]>
Date: Wednesday, October 7, 2020 at 11:47 AM
To: Till Rohrmann <[hidden email]>, dev <[hidden email]>
Cc: Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

 

Ok, I have created FLINK-19522 describing the issue. I have the code I made so far checked in at https://github.com/apache/flink/compare/master...dforciea:FLINK-19522 but this only fixes the SQL API. It sounds like there may be another change needed for the Table API… I’ll look into that and see if I can figure it out on my own while they’re out. I will also need to add some unit tests and update some documentation to get this ready for a PR.

 

Thanks,

Dylan

 

From: Till Rohrmann <[hidden email]>
Date: Wednesday, October 7, 2020 at 10:55 AM
To: dev <[hidden email]>
Cc: Shengkai Fang <[hidden email]>, "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>, Leonard Xu <[hidden email]>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

 

Hi Dylan,

 

Thanks for reaching out to the Flink community, and please excuse our late response. I am not an expert on the Table API and its JDBC connector, but what you describe sounds like a missing feature. Also, the fact that FLINK-12198 enabled this feature for the JDBCInputFormat indicates that we might simply need to make it configurable from the JdbcTableSource. I am pulling in Jark and Leonard, who worked on the JdbcTableSource and might help you get this feature into Flink. Their response could take a week because they are currently on vacation, if I am not mistaken.

 

What you could already do is to open an issue linking FLINK-12198 and describing the problem and your solution proposal.

 

 

Cheers,

Till

 

On Wed, Oct 7, 2020 at 5:00 PM Dylan Forciea <[hidden email]> wrote:

I appreciate it! Let me know if you want me to submit a PR against the issue after it is created. It wasn’t a huge amount of code, so it’s probably not a big deal if you wanted to redo it.

Thanks,
Dylan

From: Shengkai Fang <[hidden email]>
Date: Wednesday, October 7, 2020 at 9:06 AM
To: Dylan Forciea <[hidden email]>
Subject: Re: autoCommit for postgres jdbc streaming in Table/SQL API

Sorry for the late response. +1 to supporting it. I will open a Jira issue for it later.


Re: autoCommit for postgres jdbc streaming in Table/SQL API

Jark Wu-3
Thanks for the explanation. I will review the pull request. Let's move the discussion to the PR.

Best,
Jark
