JDBC table source

Mohit Anchlia
We are looking to stream data from the database. Is there already a jdbc table source available for streaming?

Re: JDBC table source

Fabian Hueske-2
Hi Mohit,

No, a JdbcTableSource does not exist yet. However, since there is a JdbcInputFormat, it should not be hard to wrap that in a TableSource.
This would be more of a batch TableSource, though, in the sense that it would just return the data that the query returns and terminate once all data is read. You can of course wrap the JdbcInputFormat in a StreamTableSource, but as I said, it would still terminate once all data has been read.

If you are thinking of streaming a changelog stream from a database to the Table API / SQL, this is not possible at the moment due to limitations in the Table API / SQL (these will be removed in the future).
Moreover, only a few DBMSs (PostgreSQL is one of them) expose their changelog, and there is no common interface for it comparable to JDBC. Instead, they use custom formats. There is a tool called Bottled Water that ingests PostgreSQL changelog streams into Kafka.

So, to make a long story short: implementing a JDBC TableSource for batch queries should be fairly easy. A true streaming solution that hooks into the changelog stream of a table is not possible at the moment.

Cheers, Fabian

Re: JDBC table source

Mohit Anchlia
Thanks. The idea was to query for 'x' records in the last 'n' seconds using an indexed column. It looks like that is not possible?

Re: JDBC table source

Fabian Hueske-2
Yes, there's no built-in TableSource for that.
However, it is certainly possible to implement a custom TableSource for your use case. The code of the JdbcInputFormat should be a good starting point. You could then run a query every n seconds (assuming you can consume the last n seconds of data within n seconds). If you want to run the TableSource in parallel, you would need to partition the query (as for the JdbcInputFormat).
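
To make the polling idea a bit more concrete, here is a minimal sketch in plain Java of the bookkeeping such a source would need: each poll covers the half-open window from the previous upper bound to "now", and the window can be split into contiguous sub-windows so that parallel instances each query their own range. This is not Flink code and not how JdbcInputFormat is implemented. The class names (`Window`, `PollingQueryPlanner`), the table `events`, and the column `updated_at` are all hypothetical and only serve the illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Half-open time window [from, to) on the indexed timestamp column. */
final class Window {
    final Instant from;
    final Instant to;

    Window(Instant from, Instant to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("window end is before window start");
        }
        this.from = from;
        this.to = to;
    }
}

final class PollingQueryPlanner {
    // Hypothetical table and column names; the two '?' are the window bounds.
    static final String QUERY =
        "SELECT id, payload, updated_at FROM events "
            + "WHERE updated_at >= ? AND updated_at < ?";

    /**
     * The next window starts exactly where the previous one ended,
     * so consecutive polls neither overlap nor leave a gap.
     */
    static Window nextWindow(Instant lastUpperBound, Instant now) {
        return new Window(lastUpperBound, now);
    }

    /** Splits a window into `parallelism` contiguous sub-windows. */
    static List<Window> partition(Window w, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        long totalMillis = Duration.between(w.from, w.to).toMillis();
        List<Window> parts = new ArrayList<>();
        Instant start = w.from;
        for (int i = 1; i <= parallelism; i++) {
            // The last sub-window always ends at w.to so nothing is lost to rounding.
            Instant end = (i == parallelism)
                ? w.to
                : w.from.plusMillis(totalMillis * i / parallelism);
            parts.add(new Window(start, end));
            start = end;
        }
        return parts;
    }
}
```

Each parallel instance would bind its own sub-window to the query, for example with `PreparedStatement.setTimestamp(1, Timestamp.from(window.from))` and `setTimestamp(2, Timestamp.from(window.to))`, and emit the resulting rows. Note that rows committed late with a timestamp inside an already polled window would be missed by this scheme.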