Hi,

I was looking into the Flink streaming API and trying to implement a solution that reads data from a JDBC database and writes it back to a JDBC database. At the moment I can see that the data stream returns Row objects from the database, and dataStream.getType().getGenericParameters() returns an empty collection.

Right now I am manually creating a database connection, reading the ResultSetMetaData, and constructing the schema for the table, which is a fairly heavy operation. Is there any other way to get the schema for the table in order to create a new table and write those records to the database?

Please let me know.

Thanks,
Punit
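For reference, a minimal sketch of the manual approach described above (not from the original post): open a plain JDBC connection, read the ResultSetMetaData of the query, and turn it into a Flink RowTypeInfo. The class name, URL, and the tiny type mapping are illustrative assumptions.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSetMetaData;
    import java.sql.Types;

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;

    public class JdbcSchemaUtil {

        /** Derives a RowTypeInfo (field names + types) from the metadata of a query. */
        public static RowTypeInfo rowTypeInfoFor(String jdbcUrl, String query) throws Exception {
            try (Connection conn = DriverManager.getConnection(jdbcUrl);
                 PreparedStatement stmt = conn.prepareStatement(query)) {

                // Many drivers expose result metadata without executing the query;
                // if yours returns null here, run the query with "LIMIT 0" instead.
                ResultSetMetaData meta = stmt.getMetaData();
                int columns = meta.getColumnCount();

                String[] names = new String[columns];
                TypeInformation<?>[] types = new TypeInformation<?>[columns];

                for (int i = 1; i <= columns; i++) {
                    names[i - 1] = meta.getColumnLabel(i);
                    switch (meta.getColumnType(i)) {   // deliberately small, illustrative mapping
                        case Types.INTEGER: types[i - 1] = BasicTypeInfo.INT_TYPE_INFO; break;
                        case Types.BIGINT:  types[i - 1] = BasicTypeInfo.LONG_TYPE_INFO; break;
                        case Types.DOUBLE:  types[i - 1] = BasicTypeInfo.DOUBLE_TYPE_INFO; break;
                        default:            types[i - 1] = BasicTypeInfo.STRING_TYPE_INFO; break;
                    }
                }
                return new RowTypeInfo(types, names);
            }
        }
    }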
I'm not sure how well this works for the streaming API. Looping in Chesnay, who worked on this.

On Mon, Feb 6, 2017 at 11:09 AM, Punit Tandel <[hidden email]> wrote:
Currently, there is no streaming JDBC connector. Check out this thread from last year:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/JDBC-Streaming-Connector-td10508.html

On Mon, Feb 6, 2017 at 5:00 PM, Ufuk Celebi <[hidden email]> wrote:
Hi Robert,

Thanks for the response. Is this functionality going to be implemented in a near-future Flink release?

Thanks

On 02/07/2017 04:12 PM, Robert Metzger wrote:
Hello,
I don't understand why you explicitly need the schema, since the batch JDBCInput-/OutputFormats don't require it. That's kind of the nice thing about Rows. It would be cool if you could tell us what you're planning to do with the schema :)

In any case, to get the schema within the plan you will have to query the DB and build it yourself. Note that this is executed on the client.

Regards,
Chesnay

On 08.02.2017 00:39, Punit Tandel wrote:
Hi Chesnay,

Currently that is what I have done: reading the schema from the database in order to create a new table in the JDBC database and writing the rows coming from the JDBCInputFormat.

Overall I am trying to implement a solution that reads streaming data from one source, which could be Kafka, JDBC, Hive, or HDFS, and writes that data to an output source, which again could be any of those. For a simple use case I have taken one scenario, JDBC in and JDBC out. The JDBC input source returns a DataStream of Row, and to write the rows into a JDBC database we have to create a table, which requires a schema.

Thanks
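A minimal sketch of such a JDBC-in/JDBC-out pipeline, using the batch JDBCInput-/OutputFormats from flink-jdbc inside a streaming job. The driver, URLs, queries, and the two-column schema are made-up placeholders, and the target table is assumed to exist already.

    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
    import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
    import org.apache.flink.api.java.typeutils.RowTypeInfo;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.types.Row;

    public class JdbcToJdbcJob {

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Field names/types could also come from a helper like the
            // ResultSetMetaData sketch earlier in the thread.
            RowTypeInfo rowTypeInfo = new RowTypeInfo(
                    new TypeInformation<?>[] {BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO},
                    new String[] {"id", "name"});

            JDBCInputFormat source = JDBCInputFormat.buildJDBCInputFormat()
                    .setDrivername("org.h2.Driver")
                    .setDBUrl("jdbc:h2:mem:source")
                    .setQuery("SELECT id, name FROM source_table")
                    .setRowTypeInfo(rowTypeInfo)
                    .finish();

            JDBCOutputFormat sink = JDBCOutputFormat.buildJDBCOutputFormat()
                    .setDrivername("org.h2.Driver")
                    .setDBUrl("jdbc:h2:mem:target")
                    .setQuery("INSERT INTO target_table (id, name) VALUES (?, ?)")
                    .finish();

            DataStream<Row> rows = env.createInput(source, rowTypeInfo);
            rows.writeUsingOutputFormat(sink);   // no checkpointing guarantees, see later in the thread

            env.execute("jdbc-to-jdbc");
        }
    }

Note that JDBCInputFormat is a bounded source: it reads the table once and the stream then finishes, rather than continuously picking up new rows.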
On 02/08/2017 08:22 AM, Chesnay Schepler wrote:
Hello,
in the JDBC case I would suggest that you extract the schema from the first Row that your sink receives, create the table, and then start writing data. However, keep in mind that Rows can contain null fields, so you may not be able to extract the entire schema if the first row has a null somewhere.

Regards,
Chesnay
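A rough sketch of that idea, assuming a custom RichSinkFunction: the first Row seen is used to guess column types and issue a CREATE TABLE, after which every Row is inserted. The JDBC URL, table name, and type mapping are hypothetical, and as noted above a null in the first row would leave that column's type unknown.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.flink.types.Row;

    public class CreateTableOnFirstRowSink extends RichSinkFunction<Row> {

        private transient Connection conn;
        private transient boolean tableCreated;

        @Override
        public void open(Configuration parameters) throws Exception {
            conn = DriverManager.getConnection("jdbc:h2:mem:target");  // placeholder URL
        }

        @Override
        public void invoke(Row row) throws Exception {
            if (!tableCreated) {
                createTableFrom(row);
                tableCreated = true;
            }
            StringBuilder placeholders = new StringBuilder();
            for (int i = 0; i < row.getArity(); i++) {
                placeholders.append(i == 0 ? "?" : ", ?");
            }
            try (PreparedStatement ps =
                     conn.prepareStatement("INSERT INTO target_table VALUES (" + placeholders + ")")) {
                for (int i = 0; i < row.getArity(); i++) {
                    ps.setObject(i + 1, row.getField(i));
                }
                ps.executeUpdate();
            }
        }

        /** Guesses column types from the field classes of the first row; nulls fall back to VARCHAR. */
        private void createTableFrom(Row row) throws Exception {
            StringBuilder cols = new StringBuilder();
            for (int i = 0; i < row.getArity(); i++) {
                Object value = row.getField(i);
                String sqlType = (value instanceof Integer) ? "INT"
                        : (value instanceof Long) ? "BIGINT"
                        : (value instanceof Double) ? "DOUBLE"
                        : "VARCHAR(255)";
                cols.append(i == 0 ? "" : ", ").append("col").append(i).append(" ").append(sqlType);
            }
            try (Statement stmt = conn.createStatement()) {
                // IF NOT EXISTS also guards against parallel sink instances racing on the DDL.
                stmt.execute("CREATE TABLE IF NOT EXISTS target_table (" + cols + ")");
            }
        }

        @Override
        public void close() throws Exception {
            if (conn != null) {
                conn.close();
            }
        }
    }

Note the made-up column names (col0, col1, ...): a Row alone does not carry field names, which is exactly the problem raised in the next message.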
On 08.02.2017 10:48, Punit Tandel wrote:
Hi,

With this approach I will be able to get the data types but not the column names, because TypeInformation<?> typeInformation = dataStream.getType() returns the types but not the column names. Is there any other way to get the column names from a Row?

Thanks

On 02/08/2017 10:17 AM, Chesnay Schepler wrote:
I also thought about it, and my conclusion was to use a generic SQL parser (e.g. Calcite?) to extract the column names from the input query (because in the query you can rename or add fields). I'd like to hear opinions about this. Unfortunately I don't have the time to implement it right now :(
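For illustration only, a rough sketch of what that could look like with Calcite's SQL parser: parse the SELECT query and collect the output column names, taking the alias when a field is renamed with AS. This is an assumption about the approach, not something from the thread, and it ignores cases like SELECT * or expressions without an alias.

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.calcite.sql.SqlBasicCall;
    import org.apache.calcite.sql.SqlIdentifier;
    import org.apache.calcite.sql.SqlKind;
    import org.apache.calcite.sql.SqlNode;
    import org.apache.calcite.sql.SqlSelect;
    import org.apache.calcite.sql.parser.SqlParser;

    public class QueryColumnNames {

        /** Returns the output column names of a simple SELECT query. */
        public static List<String> columnNames(String query) throws Exception {
            SqlNode parsed = SqlParser.create(query).parseQuery();
            List<String> names = new ArrayList<>();

            if (parsed instanceof SqlSelect) {
                for (SqlNode item : ((SqlSelect) parsed).getSelectList()) {
                    if (item.getKind() == SqlKind.AS) {
                        // "expr AS alias" -> take the alias (second operand)
                        SqlNode alias = ((SqlBasicCall) item).operand(1);
                        names.add(((SqlIdentifier) alias).getSimple());
                    } else if (item instanceof SqlIdentifier) {
                        // plain column reference, possibly qualified (t.col)
                        SqlIdentifier id = (SqlIdentifier) item;
                        names.add(id.names.get(id.names.size() - 1));
                    }
                }
            }
            return names;
        }
    }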
On Wed, Feb 8, 2017 at 1:59 PM, Punit Tandel <[hidden email]> wrote:
Ok. I am right now simply taking a POJO to get the data types and schema, but I needed a generic approach to get this information.

Thanks
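For completeness, a small sketch of the POJO-based workaround mentioned above: Flink's type extraction exposes both field names and field types for a POJO, so a schema can be read from its TypeInformation. The Order class is a made-up example.

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.java.typeutils.PojoTypeInfo;

    public class PojoSchemaExample {

        /** A made-up POJO standing in for the actual record type. */
        public static class Order {
            public int id;
            public String product;
            public double price;
        }

        public static void main(String[] args) {
            PojoTypeInfo<Order> typeInfo =
                    (PojoTypeInfo<Order>) TypeInformation.of(Order.class);

            // Field names and types, e.g. for building a CREATE TABLE statement.
            for (int i = 0; i < typeInfo.getArity(); i++) {
                System.out.println(typeInfo.getFieldNames()[i] + " -> " + typeInfo.getTypeAt(i));
            }
        }
    }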
On 02/08/2017 01:37 PM, Flavio Pompermaier wrote:
Hi All,
Is there a preferred way to manage multiple JDBC connections from Flink? I am new to Flink and looking for some guidance on the right pattern and APIs for this. The use case needs to route a stream to a particular JDBC connection depending on a field value, so the records are written to multiple destination DBs.
Thanks
Sathi
Hi Sathi,

You can split, select, or filter your data stream based on the field's value. Then you obtain multiple data streams, each of which you can output using its own JDBCOutputFormat. Be aware, however, that the JDBCOutputFormat does not give you any processing guarantees, since it does not take part in Flink's checkpointing mechanism. Unfortunately, Flink does not have a streaming JDBC connector yet.

Cheers,
Till
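A minimal sketch of that routing, assuming a DataStream<Row> with two fields where field 0 is a region code: two filters produce two streams, and each stream is written with its own JDBCOutputFormat. The field values, drivers, URLs, and queries are invented for the example.

    import org.apache.flink.api.java.io.jdbc.JDBCOutputFormat;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.types.Row;

    public class RoutingExample {

        /** Routes rows to one of two databases based on the value of field 0. */
        public static void routeByRegion(DataStream<Row> rows) {
            DataStream<Row> europe = rows.filter(row -> "EU".equals(row.getField(0)));
            DataStream<Row> america = rows.filter(row -> "US".equals(row.getField(0)));

            // JDBCOutputFormat binds one statement parameter per Row field,
            // so the INSERT must match the row's arity.
            europe.writeUsingOutputFormat(JDBCOutputFormat.buildJDBCOutputFormat()
                    .setDrivername("org.postgresql.Driver")
                    .setDBUrl("jdbc:postgresql://eu-db:5432/orders")
                    .setQuery("INSERT INTO orders (region, amount) VALUES (?, ?)")
                    .finish());

            america.writeUsingOutputFormat(JDBCOutputFormat.buildJDBCOutputFormat()
                    .setDrivername("org.postgresql.Driver")
                    .setDBUrl("jdbc:postgresql://us-db:5432/orders")
                    .setQuery("INSERT INTO orders (region, amount) VALUES (?, ?)")
                    .finish());
        }
    }

As Till notes above, these writes are not covered by checkpointing, so records buffered at failure time can be lost or duplicated.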
On Thu, Mar 2, 2017 at 7:21 AM, Sathi Chowdhury <[hidden email]> wrote:
Hi Till,
Thanks for your reply. I guess I will have to write a custom sink function that uses JDBCOutputFormat. I have a question about checkpointing support, though: if I am reading a stream from Kinesis (streamA) and it is transformed to streamB, which is written to the DB, then when the program recovers, will it start from streamB's checkpointed offset? In that case checkpointing the JDBC side is maybe not so important.
Thanks
Sathi
Hi Sathi,

If you read data from Kinesis, then Flink can offer you exactly-once processing guarantees. However, what you see written out to your database depends a little bit on the implementation of your custom sink. If you have a synchronous JDBC client which does not lose data, and you fail your job whenever you see an error, then you should achieve at-least-once.

Cheers,
Till
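To make that concrete, a rough sketch of such a custom sink under those assumptions: the write is synchronous, and any SQLException propagates so the job fails and replays from the last checkpoint, giving at-least-once delivery (rows written since the last checkpoint may be inserted twice). The connection details and INSERT statement are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.flink.types.Row;

    public class FailFastJdbcSink extends RichSinkFunction<Row> {

        private transient Connection conn;
        private transient PreparedStatement statement;

        @Override
        public void open(Configuration parameters) throws Exception {
            conn = DriverManager.getConnection("jdbc:postgresql://db-host:5432/target");
            statement = conn.prepareStatement("INSERT INTO events (id, payload) VALUES (?, ?)");
        }

        @Override
        public void invoke(Row row) throws Exception {
            statement.setObject(1, row.getField(0));
            statement.setObject(2, row.getField(1));
            // Synchronous write: any SQLException propagates, fails the job,
            // and the job restarts from the last successful checkpoint.
            statement.executeUpdate();
        }

        @Override
        public void close() throws Exception {
            if (statement != null) {
                statement.close();
            }
            if (conn != null) {
                conn.close();
            }
        }
    }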
On Thu, Mar 2, 2017 at 4:49 PM, Sathi Chowdhury <[hidden email]> wrote: