Longleaf and Dogwood Clusters Outage
Incident Report for UNC Chapel Hill - ITS
Resolved
This incident has been resolved.
Posted Apr 09, 2019 - 00:03 EDT
Identified
The Research Computing Clusters Dogwood and Longleaf were impacted by the power outage experienced by ITS Manning.

The clusters have been closed while recovery is in progress. The nodes are being recovered now.

The storage that provides the /pine filesystem needs to be recovered because of errors caused when power fluctuations tripped the circuit breaker on its PDUs. Research Computing is working with the vendor to recover the storage.

Jobs that were running on compute nodes when power was lost will need to be resubmitted. Jobs that were queued will be released once all components of the clusters have been recovered and checked.
Posted Apr 08, 2019 - 21:11 EDT
Investigating
At 7:54:39 PM on 4/8/2019, ITS detected a problem with the Research Computing Longleaf and Dogwood clusters due to the campus power outage. Members of the ITS staff are working to resolve the issue and to identify the root cause of the service interruption. Updates will be communicated as they become available.
Posted Apr 08, 2019 - 20:07 EDT
This incident affected: Research Computing (Virtual Computing Lab (VCL)).