Tuesday 26th March 2019

Service interruption (bugzilla, askbot) (update)

After one of our hypervisors went down, one of the gluster filesystems that store the VM disk images went into a bad state in which it would not heal files. As a result, some files got out of sync (split-brain), which gluster's self-heal daemon neither prevented nor fixed.

We're manually fixing the split-brain inconsistencies and bringing services back up. Before bringing back critical services like bugzilla and askbot, we of course want to make sure the images are in a consistent state, to avoid additional complications further down the road.

Update: bugzilla and askbot are up again. Unfortunately we had to restore bugzilla from backup, so we "lost" some entries made between the last backup (March 25th 22:56 UTC) and the beginning of the outage (March 26th 10:04 UTC). "Lost" is in quotes because the comments and assignments made in the meantime are not gone for good; they can be restored from the mail notifications/archive.