Flink SQL client not able to read parquet format table


Flink SQL client not able to read parquet format table

wanglei2@geekplus.com.cn

Hive table stored as parquet.

Under hive client: 
hive> select robotid from robotparquet limit 2;
OK
1291097
1291044


But under the Flink sql-client the result is 0:
Flink SQL> select robotid from robotparquet limit 2;
                  robotid
                         0
                         0

Any insight on this?

Thanks,
Lei




Re: Flink SQL client not able to read parquet format table

Jark Wu
Hi Lei,

Are you using the newest 1.10 blink planner? 

I'm not familiar with Hive and parquet, but I know [hidden email] and [hidden email] are experts on this. Maybe they can help with this question.

Best,
Jark


Re: Re: Flink SQL client not able to read parquet format table

wanglei2@geekplus.com.cn

I am using the newest 1.10 blink planner.

Perhaps it is because of the method I used to write the parquet file.

I receive a Kafka message, transform each message into a Java class object, write the object to HDFS using StreamingFileSink, and add the HDFS path as a partition of the Hive table.

No matter what order the fields are declared in the Hive DDL statement, the Hive client works, as long as the field names are the same as the Java object field names.
But the Flink sql-client does not work.

DataStream<RobotUploadData0101> sourceRobot = source.map(x -> transform(x));
final StreamingFileSink<RobotUploadData0101> sink = StreamingFileSink
    .forBulkFormat(new Path("hdfs://172.19.78.38:8020/user/root/wanglei/robotdata/parquet"),
        ParquetAvroWriters.forReflectRecord(RobotUploadData0101.class))
    .build();
For example, RobotUploadData0101 has two fields: robotId int, robotTime long.

CREATE TABLE `robotparquet`(`robotid` int, `robottime` bigint) and
CREATE TABLE `robotparquet`(`robottime` bigint, `robotid` int)
behave the same for the Hive client, but differently for the Flink sql-client.

Is this expected behavior?

Thanks,
Lei
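
[Editor's note] The behavior described above is consistent with a reader that maps Hive DDL columns to parquet columns by position rather than by name. A minimal, self-contained sketch of the difference (plain Java, not Flink or Hive code; the class name, sample values, and helpers are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (NOT Flink/Hive code): why the column mapping
// strategy matters when the Hive DDL order differs from the parquet
// file's physical column order.
public class ColumnMappingSketch {

    // The file was written from RobotUploadData0101(robotId, robotTime),
    // so its physical order is (robotid, robottime); one made-up row.
    static final List<String> FILE_COLUMNS = Arrays.asList("robotid", "robottime");
    static final List<Object> ROW = Arrays.asList(1291097, 1586246400000L);

    // Name-based mapping (what the Hive client does): find the file
    // column by name, regardless of where the DDL declares it.
    static Object readByName(String ddlColumn) {
        int idx = FILE_COLUMNS.indexOf(ddlColumn);
        return idx >= 0 ? ROW.get(idx) : null;
    }

    // Positional mapping (roughly what a reader without name mapping
    // does): take the value at the column's *DDL* position.
    static Object readByPosition(List<String> ddlOrder, String ddlColumn) {
        return ROW.get(ddlOrder.indexOf(ddlColumn));
    }

    public static void main(String[] args) {
        // DDL declared as (robottime, robotid) -- swapped vs the file.
        List<String> swappedDdl = Arrays.asList("robottime", "robotid");
        System.out.println("by name:     " + readByName("robotid"));                 // 1291097 (correct)
        System.out.println("by position: " + readByPosition(swappedDdl, "robotid")); // robottime's value (wrong)
    }
}
```

With the DDL order swapped relative to the file's write order, the positional reader returns another column's value (or a default such as 0 when the mapping fails entirely), while the name-based lookup is unaffected.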


 

Re: Re: Flink SQL client not able to read parquet format table

Jingsong Li
Hi lei,

Which hive version did you use?
Can you share the complete hive DDL?

Best,
Jingsong Lee


Re: Re: Flink SQL client not able to read parquet format table

wanglei2@geekplus.com.cn

I am using Hive 3.1.1.
The table has many fields; each field corresponds to a field in the RobotUploadData0101 class.

CREATE TABLE `robotparquet`(
  `robotid` int, `framecount` int, `robottime` bigint, `robotpathmode` int,
  `movingmode` int, `submovingmode` int, `xlocation` int, `ylocation` int,
  `robotradangle` int, `velocity` int, `acceleration` int, `angularvelocity` int,
  `angularacceleration` int, `literangle` int, `shelfangle` int, `onloadshelfid` int,
  `rcvdinstr` int, `sensordist` int, `pathstate` int, `powerpresent` int,
  `neednewpath` int, `pathelenum` int, `taskstate` int, `receivedtaskid` int,
  `receivedcommcount` int, `receiveddispatchinstr` int, `receiveddispatchcount` int,
  `subtaskmode` int, `versiontype` int, `version` int, `liftheight` int,
  `codecheckstatus` int, `cameraworkmode` int, `backrimstate` int, `frontrimstate` int,
  `pathselectstate` int, `codemisscount` int, `groundcameraresult` int,
  `shelfcameraresult` int, `softwarerespondframe` int, `paramstate` int,
  `pilotlamp` int, `codecount` int, `dist2waitpoint` int, `targetdistance` int,
  `obstaclecount` int, `obstacleframe` int, `cellcodex` int, `cellcodey` int,
  `cellangle` int, `shelfqrcode` int, `shelfqrangle` int, `shelfqrx` int,
  `shelfqry` int, `trackthetaerror` int, `tracksideerror` int, `trackfuseerror` int,
  `lifterangleerror` int, `lifterheighterror` int, `linearcmdspeed` int,
  `angluarcmdspeed` int, `liftercmdspeed` int, `rotatorcmdspeed` int
) PARTITIONED BY (`hour` string) STORED AS parquet;


Thanks,
Lei

 

Re: Re: Flink SQL client not able to read parquet format table

Jingsong Li
Hi lei,

I think the reason is that our `HiveMapredSplitReader` does not support name-based column mapping when reading the parquet format.
Can you create a JIRA to track this?

Best,
Jingsong Lee
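
[Editor's note] Until name mapping is supported, one workaround is to keep the Hive DDL column order identical to the order the writer laid the fields out in. A hedged sketch of deriving the column list from the POJO (hypothetical helper, not part of Flink or Hive; note `Class.getDeclaredFields` order is not guaranteed by the JVM specification, though HotSpot preserves declaration order in practice):

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

// Hypothetical helper: build a Hive column list from a POJO's declared
// fields, so the DDL order matches the class declaration (and hence,
// typically, the order a reflect-based parquet writer used).
public class DdlFromPojo {

    // Minimal stand-in for RobotUploadData0101.
    static class Robot {
        int robotId;
        long robotTime;
    }

    static String hiveType(Class<?> t) {
        if (t == int.class) return "int";
        if (t == long.class) return "bigint";
        throw new IllegalArgumentException("unmapped type: " + t);
    }

    // Caveat: getDeclaredFields() order is unspecified by the JVM spec.
    static String columnList(Class<?> pojo) {
        StringJoiner cols = new StringJoiner(", ");
        for (Field f : pojo.getDeclaredFields()) {
            cols.add("`" + f.getName().toLowerCase() + "` " + hiveType(f.getType()));
        }
        return cols.toString();
    }

    public static void main(String[] args) {
        System.out.println("CREATE TABLE `robotparquet`(" + columnList(Robot.class) + ")");
    }
}
```

Generating the DDL this way at least removes the chance of hand-ordering the sixty-odd columns differently from the class.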


Re: Re: Flink SQL client not able to read parquet format table

wanglei2@geekplus.com.cn


This is my first time creating a Flink JIRA issue.
Please point out and correct anything I got wrong.

Thanks,
Lei


 

Re: Re: Flink SQL client not able to read parquet format table

Jingsong Li
Thanks, it looks good, nice job!

Best,
Jingsong Lee
