EnvironmentInformation class logs secrets passed as JVM/CLI arguments

jose-pvargas
Hi,

I am using Flink 1.13.1 and noticed that the EnvironmentInformation class (https://github.com/apache/flink/blob/release-1.13.1/flink-runtime/src/main/java/org/apache/flink/runtime/util/EnvironmentInformation.java#L444-L467) logs the values of secrets that are passed in as JVM and CLI arguments. For JVM arguments, both the secret's key and its value are logged. For CLI arguments, the key is obfuscated but the value is not. This also affects Flink 1.12.

For example, with CLI arguments like "--my-password VALUE_TO_HIDE", the jobmanager logs the following (assuming the cluster is in application mode):
jobmanager | ****** (sensitive information)
jobmanager | VALUE_TO_HIDE
The key is obfuscated but the value is not, so secret values can end up in central logging systems. Passing the CLI argument as "--my-password=VALUE_TO_HIDE" hides the entire string, but it makes the value unusable and differs from how the docs say job arguments should be passed [1].
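
To make the behavior described above concrete, here is a minimal sketch of per-token masking. It is not Flink's code: the class name, the keyword list, and the `looksSensitive` helper are assumptions made up for illustration. It only shows why checking each argument in isolation hides the key but lets a space-separated value through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: a simplified stand-in, not Flink's EnvironmentInformation.
final class PerTokenMaskingSketch {
    // Hypothetical keyword list; the list Flink actually uses may differ.
    private static final String[] SENSITIVE_KEYWORDS = {"password", "secret"};
    static final String HIDDEN = "****** (sensitive information)";

    // True if the token contains one of the keywords, ignoring case.
    static boolean looksSensitive(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        for (String keyword : SENSITIVE_KEYWORDS) {
            if (lower.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    // Checks every argument on its own, with no notion of key/value pairs.
    static List<String> maskPerToken(String[] args) {
        List<String> out = new ArrayList<>();
        for (String arg : args) {
            out.add(looksSensitive(arg) ? HIDDEN : arg);
        }
        return out;
    }

    public static void main(String[] args) {
        // The key token is hidden, but the value is a separate token and leaks.
        System.out.println(maskPerToken(new String[] {"--my-password", "VALUE_TO_HIDE"}));
        // With "key=value" the whole token matches, so everything is hidden.
        System.out.println(maskPerToken(new String[] {"--my-password=VALUE_TO_HIDE"}));
    }
}
```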

I saw that there was a ticket to obfuscate secrets [2], but that seems to apply only to the UI, not to the configuration logs. Disabling logs from the appropriate logger is one solution, but it seems to me that which logger a user would need to turn off depends on how the Flink cluster is running (standalone, k8s, yarn, mesos, etc.). Furthermore, it can be useful to see these configuration logs.


[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/dev/datastream/application_parameters/#from-the-command-line-arguments
[2] https://issues.apache.org/jira/browse/FLINK-14047

Thanks,
--

Jose Vargas

Software Engineer, Data Engineering

E: [hidden email]


fiscalnote.com  |  info.cq.com  | rollcall.com


Re: EnvironmentInformation class logs secrets passed as JVM/CLI arguments

Arvid Heise-4
Hi Jose,

Masking secrets is a recurring topic for which you ultimately won't find a good solution. Your secret might, for example, appear in a crash dump or in some process-monitoring application. To mask reliably, you'd either need specific application knowledge (every user supplies arguments differently) or have to disable logging of parameters completely.

Frankly speaking, I have never seen passing passwords over the CLI be really secure. The industry practice is either to use a sidecar approach or to fetch secrets from files (e.g., docker mounts). Even using ENV is discouraged.
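
As a rough illustration of the file-based approach, a job could read the secret from a mounted file at startup instead of receiving it as an argument. This is only a sketch: the class name, the `readSecret` helper, and the default path `/run/secrets/db-password` are assumptions, not something Flink provides or prescribes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: reads a secret from a file that the deployment mounts.
final class FileSecretSketch {
    // Returns the file content without surrounding whitespace or newlines.
    static String readSecret(Path secretFile) {
        try {
            byte[] raw = Files.readAllBytes(secretFile);
            return new String(raw, StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not read secret file " + secretFile, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical default location; pass a different path as the first argument.
        Path secretFile = Paths.get(args.length > 0 ? args[0] : "/run/secrets/db-password");
        if (!Files.isReadable(secretFile)) {
            System.out.println("No readable secret file at " + secretFile);
            return;
        }
        String secret = readSecret(secretFile);
        // Only the path would show up in the argument list, never the secret itself.
        System.out.println("Loaded a secret of length " + secret.length());
    }
}
```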

(In reply to Jose Vargas's message of Wed, Jun 16, 2021 at 11:28 PM.)


Re: EnvironmentInformation class logs secrets passed as JVM/CLI arguments

jose-pvargas
Hi Arvid,

I see what you mean; no solution in Flink can account for all the ways in which applications may want to pass in parameters, or for the external processes or events that introspect wherever the Flink process happens to run. I do think there is an opportunity to prevent logging secrets by focusing on a couple of areas. We should improve where we can because logs can end up in systems that a greater number of people have access to. For example, in a given environment, perhaps only automated systems can deploy to and introspect the servers, but engineers across teams may have access to all logs from that environment.

The areas where I think we can prevent logging secrets are:
1) Obfuscating JVM parameters, and
2) Applying the logic in ParameterTool's "fromArgs" method to parse out arguments in the EnvironmentInformation class.
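
Item 2 could look roughly like the sketch below. It is a simplified approximation of key/value pairing in the spirit of ParameterTool's "fromArgs", not a copy of it: the class name, the keyword list, and the helpers are assumptions, and details such as negative numbers used as values are ignored. Keeping the key visible while hiding the value is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: pairs each key with the value that follows it before masking.
final class PairAwareMaskingSketch {
    // Hypothetical keyword list; the list Flink actually uses may differ.
    private static final String[] SENSITIVE_KEYWORDS = {"password", "secret"};
    static final String HIDDEN = "******";

    static boolean looksSensitive(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        for (String keyword : SENSITIVE_KEYWORDS) {
            if (lower.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    // Simplification: any token starting with "-" is treated as a key.
    static boolean isKey(String token) {
        return token.startsWith("-");
    }

    static List<String> maskKeyValuePairs(String[] args) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            boolean hasValue = isKey(arg) && i + 1 < args.length && !isKey(args[i + 1]);
            if (hasValue) {
                // The decision is made on the key, but it is the value that gets hidden.
                String value = looksSensitive(arg) ? HIDDEN : args[i + 1];
                out.add(arg + " " + value);
                i += 2;
            } else {
                // Flags and stray tokens fall back to per-token masking.
                out.add(looksSensitive(arg) ? HIDDEN : arg);
                i += 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] example = {"--my-password", "VALUE_TO_HIDE", "--input", "/data"};
        for (String line : maskKeyValuePairs(example)) {
            System.out.println(line);
        }
    }
}
```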

For example, one of the documented ways of passing in AWS credentials is via JVM parameters: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html
By leveraging ParameterTool's logic in the EnvironmentInformation class, we can bridge the intent of the current code with how Flink's built-in argument parser works.
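
For the JVM side, a minimal sketch of the idea is to split each "-Dkey=value" option at the first "=" and hide the value when the key looks sensitive. The class name, the keyword list, and the helpers are assumptions for illustration; "aws.secretKey" is used as the example because it is one of the system properties the AWS SDK v1 documentation describes.

```java
import java.lang.management.ManagementFactory;
import java.util.Locale;

// Illustrative only: masks the value of -Dkey=value options whose key looks sensitive.
final class JvmOptionMaskingSketch {
    // Hypothetical keyword list; the list Flink actually uses may differ.
    private static final String[] SENSITIVE_KEYWORDS = {"password", "secret"};
    static final String HIDDEN = "******";

    static boolean looksSensitive(String key) {
        String lower = key.toLowerCase(Locale.ROOT);
        for (String keyword : SENSITIVE_KEYWORDS) {
            if (lower.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    static String maskJvmOption(String option) {
        int eq = option.indexOf('=');
        // Only system properties of the form -Dkey=value are considered.
        if (option.startsWith("-D") && eq > 2) {
            String key = option.substring(2, eq);
            if (looksSensitive(key)) {
                return "-D" + key + "=" + HIDDEN;
            }
        }
        return option;
    }

    public static void main(String[] args) {
        // Prints the options this JVM was started with, masked by the sketch above.
        for (String option : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            System.out.println(maskJvmOption(option));
        }
        System.out.println(maskJvmOption("-Daws.secretKey=VALUE_TO_HIDE"));
    }
}
```

Note that a key such as "aws.accessKeyId" would not match this particular keyword list, so a real implementation would need to decide which keys count as sensitive.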

(In reply to Arvid Heise's message of Thu, Jun 17, 2021 at 2:31 PM.)