Rackspace's explanation is roughly:
• 13:51: utility breaker trip. Critical load on UPS. Gen cluster starting.
• 14:35: synchronization failure in gen cluster. Critical load back on UPS.
• 15:15: UPSes exhausted, load off.
They don't explicitly say that the UPSes operated from generator for a time, but they imply it, and 84 minutes (13:51 to 15:15) would be very long for battery run-time. Assuming the generators did parallel and power the UPSes from 13:51 to 14:35 (44 minutes), the batteries only had to carry the load from 14:35 to 15:15. That is 40 minutes of battery run-time, which is believable assuming 2N UPSes.
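To make the arithmetic explicit, here is a minimal sketch that computes the intervals from the three timestamps in the timeline above. The event names are hypothetical labels of mine, and the split into "on generator" and "on battery" rests on the assumption stated above (that the generators carried the UPSes until 14:35), not on anything Rackspace confirmed.

```python
from datetime import datetime

FMT = "%H:%M"

# Timestamps from the timeline above; the key names are illustrative labels.
events = {
    "utility_trip": "13:51",
    "gen_sync_failure": "14:35",
    "ups_exhausted": "15:15",
}

def minutes_between(start, end):
    """Whole minutes between two same-day HH:MM timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)

# Total time from utility loss to load drop.
total_outage = minutes_between(events["utility_trip"], events["ups_exhausted"])
# Assumed: generators powered the UPSes during this window.
on_generator = minutes_between(events["utility_trip"], events["gen_sync_failure"])
# Assumed: batteries alone carried the load during this window.
on_battery = minutes_between(events["gen_sync_failure"], events["ups_exhausted"])

print(total_outage, on_generator, on_battery)  # 84 44 40
```

So the batteries would have needed about 40 minutes of run-time, not 84.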
But if the generators were running in parallel, how did they lose synchronization? It's not as if a generator needs to "try" to stay synchronized. The current on the emergency bus, and in the windings of each generator, forces the generator to stay in sync. I have read that you could shut off the fuel supply to a generator and, if the parallel cluster is large enough, the energy on the emergency bus and in the windings will turn it from a generator into an electric motor: the rest of the cluster will drag it along, making it turn in sync with the other generators even while it is turned off. So I don't understand how a set of paralleled generators would lose sync.
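For reference, the pull toward synchronism described above is usually captured by the textbook power-angle relation for a synchronous machine tied to a stiff bus. This is a simplified model (round rotor, winding resistance neglected), and the symbols are labels chosen here, not anything from Rackspace's account: E is the internal EMF set by the field excitation, V is the bus voltage, X_s is the synchronous reactance, and delta is the angle by which the rotor leads the bus.

```latex
P = \frac{E\,V}{X_s}\,\sin\delta ,
\qquad
P_{\max} = \frac{E\,V}{X_s}
```

With delta positive, the machine delivers power to the bus. If the fuel is cut, the rotor falls slightly behind, delta goes negative, P goes negative, and the machine absorbs power from the bus, which is the motoring behavior described above. Note also that P_max scales with E, so in this model a weakened field reduces the torque that holds a machine in step.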
I did read one account that the generators lost sync with the UPSes. Would that imply that the generators were configured to parallel off the UPSes' critical bus? Maybe that would explain it, if the UPS output frequency was drifting beyond generator tolerance. But why would it drift? Was there no bypass input present when running from generator? If the bypass input was also fed from the generators, then the UPSes should have synced to the generator cluster.
“What we saw yesterday was a situation where the generators started fighting with one another on the bus,” Rackspace said of the generator challenges. “The generators were unable to get properly synchronized. Eventually, they failed in a cascading manner and we lost all of the generators. Each generator failed on a loss of excitation – an inability to maintain the magnetic field. But it was really the inability to get synchronized that created that fault.”
This implies that the generators were never quite paralleled, yet were still powering load for roughly 45 minutes (13:51 to 14:35). I don't understand how that could work. Once the paralleling bus closes, if they weren't in sync, they would yank each other into sync, violently if necessary.
Overload would do it. It could take a while for heat to build up due to total harmonic distortion (THD), and that would shut down the gen-sets in a cascading fashion. That kind of overload would imply that the facility had not been run from generator for longer than a half-hour at today's load. That's not inconceivable, especially if load has increased over time.
I guess this is a good reason to ask your data center whether they run their gen-sets under load at least twice a month, and whether they do extended run tests under load, simulating a real long-term outage, at least a few times a year.