Joining streamed data to reference data


Joining streamed data to reference data

Porritt, James

I was hoping to join a StreamTableSource to a BatchTableSource, but I find it’s not simple. A couple of questions:

1) Other than pushing the DataSet to a Kafka topic (either internally or externally to the application) and reading it back into a DataStream, is there any way of doing the conversion?

2) Are there any plans to make OrcTableSource both a StreamTableSource and a BatchTableSource, instead of just a BatchTableSource?

Thanks,

James.

######################################################################
The information contained in this communication is confidential and
intended only for the individual(s) named above. If you are not a named
addressee, please notify the sender immediately and delete this email
from your system and do not disclose the email or any part of it to any
person. The views expressed in this email are the views of the author
and do not necessarily represent the views of Millennium Capital Partners
LLP (MCP LLP) or any of its affiliates. Outgoing and incoming electronic
communications of MCP LLP and its affiliates, including telephone
communications, may be electronically archived and subject to review
and/or disclosure to someone other than the recipient. MCP LLP is
authorized and regulated by the Financial Conduct Authority. Millennium
Capital Partners LLP is a limited liability partnership registered in
England & Wales with number OC312897 and with its registered office at
50 Berkeley Street, London, W1J 8HD.
######################################################################


Re: Joining streamed data to reference data

vino yang
Hi Porritt,

Flink does not currently support joining a streaming source with a batch source; streaming jobs and batch jobs are independent of each other.

I guess your use case is joining a stream with a dimension table? Unfortunately, the Flink SQL API cannot currently join a stream with a regular data set.

1)
As a workaround, if the table is small, you can implement an inner or left outer join with a user-defined table function (UDTF):

https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/sql.html#joins
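A minimal, dependency-free sketch of what that workaround does: the small reference table is loaded into memory once (in a real Flink `TableFunction` this would typically happen in `open()`), then each streamed row is looked up by key and zero or more matches are emitted. It does not use the Flink API; the names `LookupJoinSketch`, `ReferenceLookup` and `join` are hypothetical and only mirror the open/eval/collect shape of a table function. The `leftOuter` flag shows how unmatched rows are kept, which corresponds to the left outer variant.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Dependency-free sketch (no Flink API is used): the names below are hypothetical and
// only mirror the open/eval/collect shape of a Flink TableFunction.
class LookupJoinSketch {

    // Stand-in for a table function that holds a small reference table in memory.
    static class ReferenceLookup {
        private final Map<String, String> reference = new HashMap<>();
        private final List<String> collected = new ArrayList<>();

        // Loads the reference rows once, before any streamed row is processed.
        void open(Map<String, String> referenceRows) {
            reference.clear();
            reference.putAll(referenceRows);
        }

        // Emits the matching reference value (if any) for one streamed key.
        void eval(String key) {
            String value = reference.get(key);
            if (value != null) {
                collect(value);
            }
        }

        private void collect(String value) {
            collected.add(value);
        }

        // Returns what eval() emitted for the current row and resets the buffer.
        List<String> drain() {
            List<String> out = new ArrayList<>(collected);
            collected.clear();
            return out;
        }
    }

    // Joins each streamed key against the reference table.
    // leftOuter = false: unmatched keys are dropped (inner join).
    // leftOuter = true:  unmatched keys are kept with a null reference value.
    static List<String> join(Iterable<String> stream, Map<String, String> referenceRows, boolean leftOuter) {
        ReferenceLookup lookup = new ReferenceLookup();
        lookup.open(referenceRows);
        List<String> result = new ArrayList<>();
        for (String key : stream) {
            lookup.eval(key);
            List<String> matches = lookup.drain();
            if (matches.isEmpty() && leftOuter) {
                result.add(key + ",null");
            }
            for (String match : matches) {
                result.add(key + "," + match);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> reference = new HashMap<>();
        reference.put("AAPL", "Apple");
        reference.put("MSFT", "Microsoft");
        List<String> stream = Arrays.asList("AAPL", "XYZ", "MSFT");

        System.out.println(join(stream, reference, false)); // [AAPL,Apple, MSFT,Microsoft]
        System.out.println(join(stream, reference, true));  // [AAPL,Apple, XYZ,null, MSFT,Microsoft]
    }
}
```

In Flink SQL itself, the linked documentation writes the inner form as a join with `LATERAL TABLE(...)` and the left outer form as `LEFT JOIN LATERAL TABLE(...) ON TRUE`. Every parallel instance holds the whole table in memory, which is why this only suits small tables; and in this sketch the table is loaded once, so later changes to it would not be picked up by a running job.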

2)
I am not aware of any plans for this.

Thanks, vino.

In reply to: Porritt, James, 2018-07-20 17:29 GMT+08:00



Re: Joining streamed data to reference data

Dawid Wysakowicz-2
Hi James,

1) Unfortunately, Flink does not support joining a DataSet with a DataStream as of now. If the "batch" table is small enough, you might try the solution suggested by Vino and load it in a UDTF. You can also try implementing a streaming version of this table source yourself, using org.apache.flink.table.sources.CsvTableSource and org.apache.flink.orc.OrcRowInputFormat as examples.

2) Providing better out-of-the-box support for multiple sources and formats is high on the roadmap for upcoming releases, so I would guess you can expect streaming support for ORC in the near future.

Best,
Dawid

In reply to: vino yang, Fri, 20 Jul 2018 at 11:59