Giving useful names to the SQL steps/operators.

Niels Basjes
Hi,

I'm playing around with the streaming SQL engine in combination with the UDF I wrote (https://yauaa.basjes.nl/UDF-ApacheFlinkTable.html).
I generated an SQL statement to extract all possible fields of my UDF (i.e. many fields), and I found that the names of the steps in the logging and the UI become very, very large.

In fact, they become so large that it is hard to read what the step is actually doing.

As an example, I get log messages like this (this is a single log line; I added newlines for readability in this email).

2020-02-29 14:48:13,148 WARN org.apache.flink.metrics.MetricGroup - The operator name
select: (EventTime, useragent,
ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceClass') AS DeviceClass,
ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceName') AS DeviceName,
ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceBrand') AS DeviceBrand,
ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceCpu') AS DeviceCpu,
ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceCpuBits') AS DeviceCpuBits,
ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceVersion') AS DeviceVersion,
ITEM(ParseUserAgent(useragent), _UTF-16LE'OperatingSystemClass') AS OperatingSystemClass,
ITEM(ParseUserAgent(useragent), _UTF-16LE'OperatingSystemName') AS OperatingSystemName,
ITEM(ParseUserAgent(useragent), _UTF-16LE'OperatingSystemNameVersion') AS OperatingSystemNameVersion,
ITEM(ParseUserAgent(useragent), _UTF-16LE'LayoutEngineClass') AS LayoutEngineClass,
ITEM(ParseUserAgent(useragent), _UTF-16LE'LayoutEngineName') AS LayoutEngineName,
ITEM(ParseUserAgent(useragent), _UTF-16LE'LayoutEngineVersionMajor') AS LayoutEngineVersionMajor,
ITEM(ParseUserAgent(useragent), _UTF-16LE'LayoutEngineNameVersionMajor') AS LayoutEngineNameVersionMajor,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentClass') AS AgentClass,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentName') AS AgentName,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentVersionMajor') AS AgentVersionMajor,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentNameVersionMajor') AS AgentNameVersionMajor,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentLanguage') AS AgentLanguage,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentLanguageCode') AS AgentLanguageCode,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentInformationEmail') AS AgentInformationEmail,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentInformationUrl') AS AgentInformationUrl,
ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentSecurity') AS AgentSecurity,
ITEM(ParseUserAgent(useragent), _UTF-16LE'WebviewAppName') AS WebviewAppName,
ITEM(ParseUserAgent(useragent), _UTF-16LE'WebviewAppNameVersionMajor') AS WebviewAppNameVersionMajor,
ITEM(ParseUserAgent(useragent), _UTF-16LE'Anonymized') AS Anonymized,
ITEM(ParseUserAgent(useragent), _UTF-16LE'HackerAttackVector') AS HackerAttackVector,
ITEM(ParseUserAgent(useragent), _UTF-16LE'HackerToolkit') AS HackerToolkit,
ITEM(ParseUserAgent(useragent), _UTF-16LE'KoboAffiliate') AS KoboAffiliate,
ITEM(ParseUserAgent(useragent), _UTF-16LE'KoboPlatformId') AS KoboPlatformId,
ITEM(ParseUserAgent(useragent), _UTF-16LE'IECompatibilityNameVersionMajor') AS IECompatibilityNameVersionMajor,
ITEM(ParseUserAgent(useragent), _UTF-16LE'Carrier') AS Carrier,
ITEM(ParseUserAgent(useragent), _UTF-16LE'NetworkType') AS NetworkType, clicks, visitors)
exceeded the 80 characters length limit and was truncated.


As you can see this impacts not only the names of the steps but also the metrics. 
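
To make the effect on metrics concrete, here is a minimal Java sketch of what the warning describes. It is not Flink's actual code: the class `OperatorNameSketch`, the method `truncate`, and the assumption that the name is simply cut to its first 80 characters are all hypothetical, chosen only to illustrate the 80-character limit mentioned in the log message.

```java
// Illustrative sketch only: mimics the effect described in the warning above.
// This is NOT Flink's implementation; all names here are hypothetical.
final class OperatorNameSketch {

    // Limit taken from the log message ("80 characters length limit").
    static final int MAX_NAME_LENGTH = 80;

    // Assumption: an over-long name is cut to its first maxLength characters.
    static String truncate(String name, int maxLength) {
        return name.length() <= maxLength ? name : name.substring(0, maxLength);
    }

    public static void main(String[] args) {
        // Two shortened variants of the generated operator name; they differ
        // only in a field that appears after the first 80 characters.
        String first = "select: (EventTime, useragent, "
            + "ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceClass') AS DeviceClass, "
            + "ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceName') AS DeviceName, "
            + "clicks, visitors)";
        String second = "select: (EventTime, useragent, "
            + "ITEM(ParseUserAgent(useragent), _UTF-16LE'DeviceClass') AS DeviceClass, "
            + "ITEM(ParseUserAgent(useragent), _UTF-16LE'AgentName') AS AgentName, "
            + "clicks, visitors)";

        String a = truncate(first, MAX_NAME_LENGTH);
        String b = truncate(second, MAX_NAME_LENGTH);

        System.out.println(a);
        System.out.println("Same name after truncation: " + a.equals(b));
    }
}
```

Under that assumption, two different queries that share their first 80 characters end up with the same truncated name, so the name no longer tells the steps apart.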

My question is whether it is possible to specify a name for the step, similar to what I can do in the Java code.

--
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Giving useful names to the SQL steps/operators.

Yuval Itzchakov

Unfortunately, that isn't possible. You can't assign names to SQL steps the way you can in ordinary Java/Scala code.

Re: Giving useful names to the SQL steps/operators.

Niels Basjes
Thanks.

--
Best regards / Met vriendelijke groeten,

Niels Basjes