  1. WORKTERRA
  2. WT-11814

Production down; Can log in but getting a blank screen or a server error

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Cancelled
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Platform
    • Labels:
      None
    • Environment:
      Production
    • Bug Severity:
      Medium
    • Module:
      Platform
    • Reported by:
      Client
    • Company:
      All Clients/Multiple Clients

      Description

      Concern: Production down; Can log in but getting a blank screen or a server error
      Date: 24th Oct
      Time: 11:35 AM PST to 04:22 PM PST
      Cause: The disk drives for Database 1 and 3 were not accessible, so the cluster was unable to move
      Correction: Verizon restarted the database and cluster

      A root cause ticket is open with Verizon. Below is the reply from the Verizon engineer:
      “Created Microsoft case for this as well, as I see no obvious reason for this cluster behaviour. Also we will go through the logs to get the bottom of this”

          Activity

          satyap Satya added a comment - - edited

          Error message: We observed that the error below was logged from 09/06/2017 until the cluster failover.
          Cluster network name resource 'Cluster Name' cannot be brought online. The computer object associated with the resource could not be
          updated in domain 'managed.cln' for the following reason:
          Unable to update password for computer account.
          The text for the associated error code is: Access is denied.
          The cluster identity 'DAC30415VIR001$' may lack permissions required to update the object. Please work with your domain administrator to
          ensure that the cluster identity can update computer objects in the domain.

          In the image below, the timestamps match the exact times production went down and came back up; this service is the cause of the production outage.

          Cluster resource 'Cluster Disk 1 - Q:\Quorum' in clustered service or application 'Cluster Group' failed.
          Our analysis: Both errors stopped being logged after the cluster restart, which means the first is the root cause and the second is its impact. Once the server was restarted, neither the first nor the second error was logged again.
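
For triage, the fields that matter in the error message quoted above (resource name, domain, cluster identity, and error text) can be pulled out programmatically. The following is only a minimal sketch: the helper name `parse_cluster_error` and the regular expressions are assumptions made for illustration, not part of any Verizon or Microsoft tooling, and the input is the error text exactly as quoted in this comment.

```python
import re

# Error text copied from the ticket comment above (joined into one string).
ERROR_TEXT = (
    "Cluster network name resource 'Cluster Name' cannot be brought online. "
    "The computer object associated with the resource could not be updated "
    "in domain 'managed.cln' for the following reason: "
    "Unable to update password for computer account. "
    "The text for the associated error code is: Access is denied. "
    "The cluster identity 'DAC30415VIR001$' may lack permissions required "
    "to update the object."
)


def parse_cluster_error(text):
    """Hypothetical helper: extract the key fields from the error text."""
    patterns = {
        "resource": r"Cluster network name resource '([^']+)'",
        "domain": r"in domain '([^']+)'",
        "identity": r"cluster identity '([^']+)'",
        "error_text": r"associated error code is: ([^.]+)\.",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        # A field is None when its pattern is not found in the text.
        fields[name] = match.group(1) if match else None
    return fields


fields = parse_cluster_error(ERROR_TEXT)
for name, value in fields.items():
    print(f"{name}: {value}")
```

The extracted identity and domain are the two values a domain administrator would need in order to check the permissions mentioned in the message.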

          jaideep.vinchurkar Jaideep Vinchurkar (Inactive) added a comment -

          Issue: Production was down for approximately 5 hours, from 11:35 AM PST to 04:22 PM PST on 10/24
          Root Cause Type: Platform, Third Party
          Root Cause: Information from Verizon – an error was reported on the failover cluster.

          Error Message:
          Cluster network name resource 'Cluster Name' cannot be brought online. The computer object associated with the resource could not be updated in domain 'managed.cln' for the following reason:
          Unable to update password for computer account.

          The text for the associated error code is: Access is denied.

          The cluster identity 'DAC30415VIR001$' may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.

          Solution: Verizon restarted the database and cluster. Additionally, they created a Microsoft case to investigate this cluster behavior. As a precaution, all database clusters were patched with the latest Windows updates, and all cluster hotfixes were installed on all 4 nodes.
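
The "approximately 5 hours" figure in this comment can be checked against the reported start and end times. The snippet below is a small sketch of that arithmetic; it assumes both times fall on the same day and in the same time zone, as the ticket states, and it ignores the year, which the ticket does not give.

```python
from datetime import datetime

# Start and end times as reported in the ticket (both PST, same day).
# Only the time of day matters here, so both values share strptime's
# default date.
start = datetime.strptime("11:35 AM", "%I:%M %p")
end = datetime.strptime("04:22 PM", "%I:%M %p")

outage = end - start
hours, remainder = divmod(int(outage.total_seconds()), 3600)
minutes = remainder // 60

print(f"Outage duration: {hours} h {minutes} min")
```

The result is 4 hours 47 minutes, which is consistent with the rounded figure reported above.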

          jaideep.vinchurkar Jaideep Vinchurkar (Inactive) added a comment -

          This issue was caused by changes made by Verizon. As of now, no action items are pending on our end, so we are closing this ticket.


            People

            Assignee:
            jaideep.vinchurkar Jaideep Vinchurkar (Inactive)
            Reporter:
            samir Samir
            Votes:
            0
            Watchers:
            3
