Warwick CompSoc


Campus missing, presumed, well, um, gone. Jul. 13th, 2005 @ 07:09 pm
shortcipher
As of 18:10ish, all of Warwick campus appears to have dropped off the net. This affects all CompSoc servers except Pat, our off-site mail handler, which will currently be queuing mail for all our hosted domains. Nothing we can do, but watch this space for updates.

Update: [19:25] some of campus has now returned, but not *.sunion.warwick.ac.uk, which includes Molotov. [19:33] It seems to all be back.

Molotov down, filesystem-related reasons Jun. 19th, 2005 @ 01:03 am
shortcipher
This looks a lot like that episode with /var in January all over again. There was a hardware error on /home...
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=12563, sector=12559
ide: failed opcode was: unknown
end_request: I/O error, dev hdc, sector 12559

...which caused the journal to abort. We suspect it may have failed due to heat. In any case, when we attempted to remount the disk read-only, the entire machine crashed, such that it now responds to ping and nothing else. Unfortunately we can't get to it to sort this all out until Monday morning.

Update: si1entdave brought Molotov back at 11:20.

Main site outage May. 29th, 2005 @ 10:43 am
shortcipher
The main CompSoc website returned 502 Bad Gateway from about 6am until just now, because of a mod_perl problem. No other sites should have been affected.

Bong down, disk crash May. 21st, 2005 @ 12:01 am
shortcipher
I'm told one of the disks holding /store (that's everything but the system disk) on bong (aka mirror.uwcs.co.uk) has crashed. I/O errors are being reported and we suspect the failure is at the hardware level, a head crash or similar. This means our mirrors of Linux distributions, game patches, etc. are unreachable until further notice (error page will be in place soon). It also means we have one month to sort out a replacement before the BFL. The tech team will do what we can. Needless to say, if anyone feels like donating a large disk about now, we'd be eternally grateful.

Update: the disk actually turned out to be ok, at least for the moment. We're fairly sure it was overheating, and have changed the case to one with better airflow.

Power cut in Coventry Apr. 15th, 2005 @ 08:32 am
shortcipher
At 5:40 this morning there was a power cut that affected parts of Coventry, but apparently not campus. No CompSoc servers lost power except mail3.warwickcompsoc.co.uk, aka polaris, the House of Geek router. That machine handles mail only when both Molotov and Insomnia drop off the map, and was brought back at 7:11, so as far as I know no CompSoc services were affected.
Other entries
» Molotov down, cause unknown
Molotov became unreachable at about 09:45 today. I was connected to various services and woke up to find they'd slowly disconnected, one after the other. This, together with the current behaviour (SSH accepts a connection, waits a while, and drops it), suggests that something caused, and may still be causing, extremely heavy load. More information as we have it.

There is currently nobody in the IT office; some of HoG will be heading onto campus as soon as possible.

Update: 11:23, and all is probably well (after Union IT answered the phone and gave us a reboot). And now we know what was wrong, too... someone, who shall remain nameless, had a script in their home directory called doxycvs.sh, to keep a particular tree updated from some CVS server somewhere. The last line was: at +1 hour; ./doxycvs.sh. For the uninitiated, what that does is create an empty at-job to run in one hour, and then immediately run this same script again. Round, and round, and round. Stacking up bash processes and at-jobs until the thing falls over. What they probably *meant* was echo $HOME/doxycvs.sh | at +1 hour, or even (gasp) a crontab entry. *shakes head*... you have to laugh, otherwise you cry.
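For anyone wanting to avoid the same trap, here is a rough sketch of the safer pattern. It is not the original script (we never published that); the names SCRIPT, job_text, cron_line and reschedule are made up for illustration, and it assumes at(1) accepts the POSIX "now + 1 hour" timespec. The point is that at reads the command to run from stdin, so the command has to be piped in rather than placed after a semicolon.

```shell
#!/bin/sh
# Hypothetical sketch: the real doxycvs.sh is not reproduced here.
# SCRIPT, job_text, cron_line and reschedule are illustrative names.
SCRIPT="${SCRIPT:-$HOME/doxycvs.sh}"

# Text that at(1) should read on stdin: the command to run later.
job_text() {
    printf '%s\n' "$SCRIPT"
}

# Equivalent crontab line: run at minute 0 of every hour.
cron_line() {
    printf '0 * * * * %s\n' "$SCRIPT"
}

# Queue exactly one future run. The pipe is what matters: at gets the
# command on stdin, and nothing re-runs the script in the foreground.
reschedule() {
    job_text | at now + 1 hour
}
```

A script using this would call reschedule once as its last line. Better still, put the output of cron_line into a crontab and drop the self-rescheduling altogether, so that a failed run cannot multiply itself.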
» Molotov down, filesystem failure
A couple of hours ago, we looked at tov, noticed /var was mounted read-only, and attempted to remount it read-write. The machine stopped responding. The console had "Journal has aborted" all over it and wouldn't respond either, SysRq was no good, so we took it out and ran some fscks at Drac's house (thanks Drac). These claim to have fixed all errors, but to be safe, we're about to swap in a new disk (thanks DrWatson). This was all on /var, which is on one of two 40GB disks, the other being /home, which should be fine. In any case we have a full backup as of 31/12/2004 and the system drive won't have changed much since then.

Update: Data is being copied. As long as the disk doesn't fall over before we've finished copying, it should be back before the end of the day. Should.

Update: It's back, as of 17:05.
» Welcome
We might not actually be using this after all (we might be using Warwick Blogs instead). In any case the idea is to have a status page hosted elsewhere in case all our servers die.

Update: we decided against Warwick Blogs because only current Warwick students can post there, and some of the tech team have graduated or will do so soon. Also LJ is off-site and thus works if campus dies, as has occasionally been known to happen.