REST API randomly returns Not Found for an existing job

REST API randomly returns Not Found for an existing job

Tomasz Dudziak

Hi,

I have come across an issue with the GET /job/:jobId endpoint of the monitoring REST API in Flink 1.9.0. A few seconds after successfully starting a job and confirming its status as RUNNING, that endpoint would return 404 (Not Found). Interestingly, querying again immediately (literally a millisecond later) would return a valid result. I later noticed similar behaviour for a finished job as well: at certain points in time the endpoint would arbitrarily return 404, but querying again would succeed. I have seen this behaviour only recently; the endpoint used to work fine before.

Do you know what could be the root cause of this? At the moment, as a workaround, I query a job a couple of times in a row to determine whether it definitely does not exist or is just being misreported as non-existent, but this feels a bit like cottage industry…

Kind regards,

Tomasz

Tomasz Dudziak | Marshall Wace LLP, George House, 131 Sloane Street, London, SW1X 9AT | E-mail: [hidden email] | Tel: +44 207 024 7061

This e-mail and any attachments are confidential to the addressee(s) and may contain information that is legally privileged and/or confidential. Please refer to http://www.mwam.com/email-disclaimer-uk for important disclosures regarding this email. If we collect and use your personal data we will use it in accordance with our privacy policy, which can be reviewed at https://www.mwam.com/privacy-policy.

Marshall Wace LLP is authorised and regulated by the Financial Conduct Authority. Marshall Wace LLP is a limited liability partnership registered in England and Wales with registered number OC302228 and registered office at George House, 131 Sloane Street, London, SW1X 9AT. If you are receiving this e-mail as a client, or an investor in an investment vehicle, managed or advised by Marshall Wace North America L.P., the sender of this e-mail is communicating with you in the sender's capacity as an associated or related person of Marshall Wace North America L.P., which is registered with the US Securities and Exchange Commission as an investment adviser.
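
Editorial sketch, not part of the original message: the workaround described above (re-querying before concluding that a job is gone) could look roughly like this in Python. The names `http_status` and `job_exists`, the retry count, the delay and the base URL are assumptions made for illustration. The path `/jobs/<id>` follows the Flink REST API documentation, whereas the message writes it as `/job/:jobId`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is kept on the error.
        return err.code


def job_exists(
    base_url: str,
    job_id: str,
    attempts: int = 3,          # hypothetical default, not from the thread
    delay_s: float = 0.1,       # hypothetical default, not from the thread
    get_status: Callable[[str], int] = http_status,
) -> bool:
    """Report a job as missing only if every attempt returns 404."""
    url = f"{base_url}/jobs/{job_id}"
    for attempt in range(attempts):
        status = get_status(url)
        if status == 200:
            return True
        if status != 404:
            raise RuntimeError(f"unexpected HTTP status {status} for {url}")
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

Assuming the REST endpoint listens on `http://localhost:8081`, `job_exists("http://localhost:8081", job_id)` would report a job as missing only after three consecutive 404 responses. This hides the symptom; it does not explain it.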


Re: REST API randomly returns Not Found for an existing job

Chesnay Schepler
How reproducible is this problem / how often does it occur?
How is the cluster deployed?
Is anything else happening to the cluster around that time (like a JobMaster failure)?
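
Editorial sketch, not something proposed in the thread: one rough way to put a number on "how often does it occur" is to poll the job endpoint at a fixed interval and count the status codes returned. The function names, the sample count and the interval below are assumptions for illustration only.

```python
import time
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def sample_statuses(
    url: str,
    samples: int = 100,         # hypothetical default
    interval_s: float = 0.5,    # hypothetical default
    get_status: Callable[[str], int] = http_status,
) -> Counter:
    """Poll `url` `samples` times and count each status code seen."""
    counts: Counter = Counter()
    for i in range(samples):
        counts[get_status(url)] += 1
        if i < samples - 1:
            time.sleep(interval_s)
    return counts


def not_found_rate(counts: Counter) -> float:
    """Fraction of sampled responses that were 404."""
    total = sum(counts.values())
    return counts[404] / total if total else 0.0
```

Logging the timestamp of each 404 alongside the counts would also help correlate the failures with other cluster events.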

On 24/07/2020 13:28, Tomasz Dudziak wrote:

Hi,

 

I have come across an issue related to GET /job/:jobId endpoint from monitoring REST API in Flink 1.9.0. A few seconds after successfully starting a job and confirming its status as RUNNING, that endpoint would return 404 (Not Found). Interestingly, querying immediately again (literally a millisecond later) would return a valid result. I later noticed a similar behaviour in regard to a finished job as well. At certain points in time that endpoint would arbitrarily return 404, but similarly querying again would succeed. I saw this strange behaviour only recently and it used to work fine before.

 

Do you know what could be the root cause of this? At the moment, as a workaround I just query a job a couple of times in a row to ensure whether it definitely does not exist or it is just being misreported as non-existent, but this feels a bit like cottage industry…

 

Kind regards,

Tomasz

 

 



Re: REST API randomly returns Not Found for an existing job

Kostas Kloudas-2
In reply to this post by Tomasz Dudziak
Hi Tomasz,

Thanks a lot for reporting this issue. If you have verified that the
job is running AND that the REST server is also up and running (e.g.
check the overview page) then I think that this should not be
happening. I am cc'ing Chesnay who may have an additional opinion on
this.

Cheers,
Kostas
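
Editorial sketch, not part of the original reply: the check suggested above (job running AND REST server up) could be automated along these lines. It assumes the Flink REST API paths `/overview`, `/jobs` and `/jobs/<id>`, and that `GET /jobs` returns a JSON object with a `jobs` array whose entries carry an `id` field. The function names and the returned labels are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

JsonResult = Tuple[int, Optional[dict]]


def http_get_json(url: str, timeout: float = 5.0) -> JsonResult:
    """GET `url`; return (status, parsed JSON body or None on HTTP error)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        return err.code, None


def diagnose(
    base_url: str,
    job_id: str,
    get_json: Callable[[str], JsonResult] = http_get_json,
) -> str:
    """Classify a lookup: is the 404 contradicted by the job listing?"""
    overview_status, _ = get_json(f"{base_url}/overview")
    if overview_status != 200:
        return "rest-server-unhealthy"

    job_status, _ = get_json(f"{base_url}/jobs/{job_id}")
    if job_status == 200:
        return "ok"
    if job_status != 404:
        return f"unexpected-status-{job_status}"

    # The job endpoint said 404; see whether the listing still knows the job.
    list_status, listing = get_json(f"{base_url}/jobs")
    jobs = (listing or {}).get("jobs", []) if list_status == 200 else []
    listed = any(job.get("id") == job_id for job in jobs)
    return "spurious-404" if listed else "job-not-listed"
```

A result of `spurious-404` would match the behaviour reported in this thread: the server is up and lists the job, yet the per-job endpoint claims it does not exist.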

On Thu, Jul 23, 2020 at 12:59 PM Tomasz Dudziak <[hidden email]> wrote:

>
> Hi,
>
>
>
> I have come across an issue related to GET /job/:jobId endpoint from monitoring REST API in Flink 1.9.0. A few seconds after successfully starting a job and confirming its status as RUNNING, that endpoint would return 404 (Not Found). Interestingly, querying immediately again (literally a millisecond later) would return a valid result. I later noticed a similar behaviour in regard to a finished job as well. At certain points in time that endpoint would arbitrarily return 404, but similarly querying again would succeed. I saw this strange behaviour only recently and it used to work fine before.
>
>
>
> Do you know what could be the root cause of this? At the moment, as a workaround I just query a job a couple of times in a row to ensure whether it definitely does not exist or it is just being misreported as non-existent, but this feels a bit like cottage industry…
>
>
>
> Kind regards,
>
> Tomasz

RE: REST API randomly returns Not Found for an existing job

Tomasz Dudziak
Yes, the job was running and the REST server as well. No JobMaster failures noticed.
I used a test cluster deployed on a bunch of VMs and bare metal boxes.
I am afraid I can no longer reproduce this issue. It occurred a couple of days ago and lasted an entire day, with jobs quite often erratically reported as Not Found. As I said, I noticed that another query immediately after the one that returned Not Found consistently returned a correct result.
It had never occurred before, and I am afraid I can no longer observe it. I appreciate this does not give much information, so I will come back to this thread with more details if it happens again.

-----Original Message-----
From: Kostas Kloudas <[hidden email]>
Sent: 24 July 2020 15:46
To: Tomasz Dudziak <[hidden email]>
Cc: [hidden email]; Chesnay Schepler <[hidden email]>
Subject: Re: REST API randomly returns Not Found for an existing job

Hi Tomasz,

Thanks a lot for reporting this issue. If you have verified that the job is running AND that the REST server is also up and running (e.g.
check the overview page) then I think that this should not be happening. I am cc'ing Chesnay who may have an additional opinion on this.

Cheers,
Kostas


Re: REST API randomly returns Not Found for an existing job

Kostas Kloudas-2
Thanks a lot for the update, Tomasz, and keep us posted if it happens again.

Kostas

On Fri, Jul 24, 2020 at 6:37 PM Tomasz Dudziak <[hidden email]> wrote:

>
> Yes, the job was running and the REST server as well. No JobMaster failures noticed.
> I used a test cluster deployed on a bunch of VM's and bare metal boxes.
> I am afraid, I can no longer reproduce this issue. It occurred a couple days ago and lasted for an entire day with jobs being quite often erratically reported as Not Found. As I said, I noticed that another query immediately after the one that returned Not Found consistently returned a correct result.
> It had never occurred before and I am afraid now I could no longer observe it again. I appreciate it does not give too much information so I will come back with more info on this thread if it happens again.