Hello, I’m trying to set up my Flink native Kubernetes cluster with high availability. Here’s the relevant config:
Sorry, fat finger send before I finished writing… Hello, I’m trying to set up my Flink native Kubernetes cluster with high availability. Here’s the relevant config:

    kubernetes.service-account: flink-service-account
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://corvana-target-file-upload-k8s-usw2.dev.sugar.build/flink/recovery

I’m getting an error accessing the bucket:

    2021-06-08 14:33:42,189 DEBUG com.amazonaws.services.s3.AmazonS3Client [] - Bucket region cache doesn't have an entry for corvana-target-file-upload-k8s-usw2.dev.sugar.build. Trying to get bucket region from Amazon S3.
    2021-06-08 14:33:42,193 DEBUG com.amazonaws.util.json.Jackson [] - Failed to parse JSON string.
    com.fasterxml.jackson.databind.exc.MismatchedInputException: No content to map due to end-of-input
     at [Source: (String)""; line: 1, column: 0]
        at com.fasterxml.jackson.databind.exc.MismatchedInputException.from(MismatchedInputException.java:59) ~[flink-s3-fs-presto-1.13.0.jar:1.13.0]

Is there an additional config I need for specifying the region for the bucket? I’ve been searching the docs and haven’t found anything like that.
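If the region lookup itself is the blocker, one knob worth knowing about: Flink's S3 filesystem plugins let you pin the S3 endpoint in flink-conf.yaml, which sidesteps the auto-detection. A minimal sketch, assuming the bucket lives in us-west-2 (the "usw2" in the bucket name suggests it, but that is an assumption):

    # Assumption: bucket is in us-west-2; adjust to the bucket's actual region
    s3.endpoint: s3.us-west-2.amazonaws.com

Keys prefixed with s3. in flink-conf.yaml are forwarded to the underlying S3 client, so no code change is needed; whether this is the right fix here depends on the actual root cause, which the thread narrows down below.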
From: Yang Wang <[hidden email]>
Sent: Wednesday, June 9, 2021 11:29 AM
Subject: Re: Using s3 bucket for high availability

It seems to be an S3 issue, and I am not sure it is the root cause. Could you please share more details of the JobManager log? Or could you verify that the Flink cluster can access the S3 bucket successfully (e.g. store a checkpoint) when HA is disabled?

Best,
Yang
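A quick way to run Yang's check is to point checkpointing at the same bucket with HA disabled. A minimal sketch; the prefix under the bucket and the interval are assumptions:

    # Assumption: any writable prefix in the same bucket works for this test
    state.checkpoints.dir: s3://corvana-target-file-upload-k8s-usw2.dev.sugar.build/flink/checkpoints
    execution.checkpointing.interval: 10s

If checkpoint files show up under that prefix, credentials and connectivity to the bucket are fine and the problem is specific to the HA setup.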
From: Tamir Sagi <[hidden email]>

I'd try several things:

- Try accessing the bucket from the CLI locally first (see the sketch after this list).
- If that does not work, check your credentials under ~/.aws/credentials and ~/.aws/config, since the AWS clients read credentials from these files by default (unless other credentials are set).
- If everything works well, please provide a more complete code snippet.
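A minimal sketch of that first check from a local shell (the bucket name and prefix are taken from the config above; everything else is stock AWS CLI):

    # List the HA storage dir; this exercises s3:ListBucket
    aws s3 ls s3://corvana-target-file-upload-k8s-usw2.dev.sugar.build/flink/recovery/
    # Ask S3 for the bucket's region; this exercises s3:GetBucketLocation,
    # similar to the region lookup the DEBUG log above shows
    aws s3api get-bucket-location --bucket corvana-target-file-upload-k8s-usw2.dev.sugar.build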
I recently ran a Flink job on an Application cluster in EKS; the job also reads files from S3 (without HA).

Tamir
Thank you, I figured it out. My IAM policy was missing some actions. It seems I needed to give it “*” for it to work.