(DEPRECATED) Apache Flink User Mailing List archive.

Joining streamed data to reference data

Porritt, James

Joining streamed data to reference data

I was hoping to join a StreamTableSource to a BatchTableSource, but I find it’s not simple. A couple of questions:

1) Other than just pushing the DataSet to a Kafka topic (either internally or externally to the application) and reading it into a DataStream are there any means of doing the conversion?

2) Are there any plans to get OrcTableSource to be both StreamTableSource and BatchTableSource instead of just a BatchTableSource?

Thanks,

James.

######################################################################
The information contained in this communication is confidential and
intended only for the individual(s) named above. If you are not a named
addressee, please notify the sender immediately and delete this email
from your system and do not disclose the email or any part of it to any
person. The views expressed in this email are the views of the author
and do not necessarily represent the views of Millennium Capital Partners
LLP (MCP LLP) or any of its affiliates. Outgoing and incoming electronic
communications of MCP LLP and its affiliates, including telephone
communications, may be electronically archived and subject to review
and/or disclosure to someone other than the recipient. MCP LLP is
authorized and regulated by the Financial Conduct Authority. Millennium
Capital Partners LLP is a limited liability partnership registered in
England & Wales with number OC312897 and with its registered office at
50 Berkeley Street, London, W1J 8HD.
######################################################################

vino yang

Re: Joining streamed data to reference data

Hi Porritt,

Flink does not currently support joining a stream with a batch data set; streaming and batch jobs are independent of each other.

I guess your use case is joining a stream with a dimension table? Unfortunately, the Flink SQL API cannot currently join a stream with a regular DataSet.

1) As a workaround, if the table is small, you can achieve an inner or left outer join with a user-defined table function:

https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/sql.html#joins
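
To make the idea concrete, here is a minimal, library-free sketch of the lookup logic such a function would carry: the small reference table is loaded into memory once, and each streamed record is joined against it by key. The class name ReferenceLookupJoin, the comma-separated row layout, and the sample keys are all made up for illustration, and no Flink classes are used. In an actual job this logic would presumably sit inside a user-defined table function as described on the page linked above, so treat this as an outline of the technique rather than a working Flink job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch of a key lookup join against a small in-memory reference table.
 * All names here are hypothetical; nothing in this class is Flink API.
 */
public class ReferenceLookupJoin {

    // Reference ("dimension") rows, keyed by join key. Assumed to fit in memory.
    private final Map<String, String> reference = new HashMap<>();

    public ReferenceLookupJoin(Map<String, String> referenceRows) {
        // Copy once up front; a table function would do this when it is opened.
        reference.putAll(referenceRows);
    }

    /** Inner join: emits a row only when the key exists in the reference table. */
    public List<String> innerJoin(String key, String payload) {
        List<String> out = new ArrayList<>();
        String ref = reference.get(key);
        if (ref != null) {
            out.add(key + "," + payload + "," + ref);
        }
        return out;
    }

    /** Left outer join: always emits one row; the reference column is "null" on a miss. */
    public List<String> leftOuterJoin(String key, String payload) {
        List<String> out = new ArrayList<>();
        String ref = reference.get(key); // may be null when the key is unknown
        out.add(key + "," + payload + "," + ref);
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> ref = new HashMap<>();
        ref.put("A", "Alpha");
        ReferenceLookupJoin join = new ReferenceLookupJoin(ref);

        // Simulated stream records: (key, payload)
        System.out.println(join.innerJoin("A", "1"));
        System.out.println(join.innerJoin("B", "2"));
        System.out.println(join.leftOuterJoin("B", "2"));
    }
}
```

The main limitation of this approach is the one already mentioned: the reference data must be small enough to hold in memory on every parallel instance, and it is not refreshed unless the function reloads it.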

2) I have not seen any plans for this.

Thanks, vino.

2018-07-20 17:29 GMT+08:00 Porritt, James <[hidden email]>:

I was hoping to join a StreamTableSource to a BatchTableSource, but I find it’s not simple. A couple of questions:

1) Other than just pushing the DataSet to a Kafka topic (either internally or externally to the application) and reading it into a DataStream are there any means of doing the conversion?

2) Are there any plans to get OrcTableSource to be both StreamTableSource and BatchTableSource instead of just a BatchTableSource?

Thanks,

James.

Dawid Wysakowicz-2

Re: Joining streamed data to reference data

Hi James,

1) Unfortunately, Flink does not support joining a DataSet with a DataStream as of now. If the "batch" table is small enough, you might try the solution suggested by Vino and load it in a UDTF. You can also try implementing a streaming version of this table source yourself, using org.apache.flink.table.sources.CsvTableSource and org.apache.flink.orc.OrcRowInputFormat as examples.

2) Providing better out-of-the-box support for multiple sources and formats is high on the roadmap for upcoming releases, so I would guess you can expect support for ORC in streaming in the near future.

Best,

Dawid

On Fri, 20 Jul 2018 at 11:59, vino yang <[hidden email]> wrote:

Hi Porritt,

Flink does not currently support joining a stream with a batch data set; streaming and batch jobs are independent of each other.

I guess your use case is joining a stream with a dimension table? Unfortunately, the Flink SQL API cannot currently join a stream with a regular DataSet.

1) As a workaround, if the table is small, you can achieve an inner or left outer join with a user-defined table function:

https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/table/sql.html#joins

2) I have not seen any plans for this.

Thanks, vino.
