All Systems Operational
Account & Registration - Operational (99.96% uptime over the past 90 days)
Product Catalog - Operational (99.97% uptime over the past 90 days)
Checkout - Operational (99.97% uptime over the past 90 days)
Landing & Product pages, Workspace, Support - Operational (99.96% uptime over the past 90 days)
Provisioning & Cluster Management - Operational (99.4% uptime over the past 90 days)
Subscription & Entitlements - Operational (99.97% uptime over the past 90 days)
Operational
Degraded Performance
Partial Outage
Major Outage
Maintenance
Past Incidents
Jun 1, 2020
Resolved - Incident has been resolved.

Incident Time: 1:06pm ET - 4:56pm ET; Duration: 3 hours 50 minutes
Impact: Services were available for large parts of the window, but while they were down users could not view products or act on them.
Initial Cause: Access to a backend database was slow, causing timeouts and instability across the platform. The database instance was eventually scaled up to fully resolve the issue. Postmortem work is underway to understand how to avoid this issue in the future.
Jun 1, 03:19 UTC
Monitoring - Services are back online. We will keep monitoring the application and checking for stability.
May 31, 20:56 UTC
Identified - Instability has returned. A platform-wide outage has been identified and teams are working to resolve it.
May 31, 20:34 UTC
Monitoring - The team scaled up one server and RHM is stable again. We are monitoring and checking overall application health.
May 31, 19:53 UTC
Investigating - We are again seeing instability across RHM services. The team is working to identify the root cause.
May 31, 19:12 UTC
Monitoring - Major service instability occurred during a 5-minute window and resolved itself. About 30 minutes later, another major instability lasting 20 minutes impacted overall availability and access to RHM.

What is/was our temporary workaround?
Services have been reestablished automatically and we are monitoring the application and checking overall stability.
May 31, 17:41 UTC
May 31, 2020
May 30, 2020

No incidents reported.

May 29, 2020

No incidents reported.

May 28, 2020
Resolved - The incident has been resolved. Monitoring shows that both RHM and Quay.io are stable.

Incident Time: 7:19am ET - 12:24pm ET; Duration: 5 hours 5 minutes
Impact: Users were unable to register clusters
Cause: Quay.io had a major service disruption. However, we rely only on read-only access to provide the cluster registration service, so our impact was limited to the window above, even though the overall impact to Quay.io services lasted longer. Further postmortem activity, including following up with Quay.io, will be performed.
May 28, 20:49 UTC
Update - Quay.io has been stabilized as well as our application. We will keep monitoring and will provide updates once completely resolved.
May 28, 18:10 UTC
Monitoring - Quay.io has been stabilized as well as our application. We will keep monitoring and will provide updates once completely resolved.
May 28, 18:09 UTC
Update - Quay.io is still investigating the issue and is working towards a solution. We are still monitoring and will keep providing updates.
May 28, 16:49 UTC
Identified - Quay.io, which hosts RHM Operator images, is currently experiencing an outage. Registration of new clusters will fail until the service recovers. We will monitor the situation and provide updates.
May 28, 13:25 UTC
May 27, 2020

No incidents reported.

May 26, 2020

No incidents reported.

May 25, 2020

No incidents reported.

May 24, 2020

No incidents reported.

May 23, 2020

No incidents reported.

May 22, 2020

No incidents reported.

May 21, 2020

No incidents reported.

May 20, 2020
Resolved - This incident has been resolved. An updated script has been deployed and we have verified that cluster registration is now working as expected.

Incident Time: 1:15pm ET - 2:19pm ET; Duration: 1 hour 4 minutes
Impact: Users were unable to register clusters
Cause: The deployment of a faulty script caused the issue. A code fix was deployed to resolve it. Postmortem work is underway to understand the code that impacted the service.
May 20, 18:19 UTC
Update - We have identified what we believe to be the erroneous script and have a temporary fix that we are deploying now.
May 20, 18:01 UTC
Investigating - We are currently investigating an issue around cluster registration. Customers are currently unable to register clusters.
May 20, 17:51 UTC
Resolved - This incident has been resolved.
May 20, 03:01 UTC
Update - Quay.io has now marked the issue entirely resolved so we are closing our incident as well. Please see the below information for a summary of the incident.

Incident Time: 3:24pm ET - 7:10pm ET; Duration: 3 hours 46 minutes
Impact: Users were unable to register clusters
Cause: Quay.io had a major service disruption. However, we rely only on read-only access to provide the cluster registration service, so our impact was limited to the window above, even though the overall impact to Quay.io services lasted longer. Further postmortem activity, including following up with Quay.io, will be performed.
May 20, 03:00 UTC
Monitoring - Quay.io, which hosts RHM Operator images, is currently experiencing an outage. Registration of new clusters may fail until the service fully recovers. However, the service has been running in read-only mode for the last 90 minutes, which allows registration of new clusters to complete successfully. For that reason, we are moving our incident to the monitoring state. We will continue to monitor the situation and provide updates.
May 20, 00:38 UTC
Update - Quay.io, which hosts RHM Operator images, is currently experiencing an outage. Registration of new clusters will fail until the service recovers. At this stage of its recovery, Quay.io is running in read-only mode, which allows registration of new clusters, but service is expected to be intermittent. We will monitor the situation and provide updates.
May 19, 23:39 UTC
Update - Quay.io is still investigating the issue and exploring possible workarounds. They have communicated that they are contacting their infrastructure provider for assistance. As a result, registration of new clusters is still impacted. We will continue to monitor the situation and provide updates.
May 19, 22:32 UTC
Update - Quay.io is still investigating the issue and exploring possible workarounds. As a result, RHM Operator image installations are still impacted. We will continue to monitor the situation and provide updates.
May 19, 21:31 UTC
Update - Quay.io is still investigating the issue and exploring possible workarounds. As a result, RHM Operator image installations are still impacted. We will continue to monitor the situation and provide updates.
May 19, 20:28 UTC
Identified - Quay.io, which hosts RHM Operator images, is currently experiencing an outage. Registration of new clusters will fail until the service recovers. We will monitor the situation and provide updates.
May 19, 19:41 UTC
May 19, 2020
May 18, 2020

No incidents reported.