CISL staff would like to thank everyone for their patience this week during the power loss and subsequent network outage. Services have now returned to normal. If you experience unexpected issues, please contact the CISL Help Desk, ext. 2400 or firstname.lastname@example.org.
Scheduled service interruptions
Some brief scheduled service interruptions will be necessary to restore full redundancy to software services, but these will take place outside UCAR business hours and will be announced in advance.
The incident was triggered by a component failure in the standby generator system, which caused a power outage at approximately 10:45 p.m. on Sunday, November 11. Uninterruptible power supply (UPS) batteries drained quickly, eventually dropping all computer systems in ML-29. Because the power loss was abrupt, some systems experienced data corruption that required recovery from backup.
Although most services were restored by around 11:00 a.m. Monday, further disruptions occurred around noon, when an attempt to restore one of the remaining services triggered a flaw in third-party software supporting a critical computer cluster. This caused an outage of a number of central CISL services, which were restored by Tuesday morning.
On Tuesday afternoon, it was discovered that certain data was missing from the LDAP (Lightweight Directory Access Protocol) servers, which enable directory lookup on our websites, phones, etc. This was addressed Tuesday evening.
CISL will be restructuring the service cluster to remove dependency on the suspect software, and other mitigation efforts are ongoing.