NinerNet Communications™
System Status

Server and System Status

NC036: Post-mortem following mail server issue in early June, and explanation of late invoicing

7 July 2025 07:12:58 +0000

As predicted immediately after the June mail server issue (which started on 11 June UTC), problems continued and new problems cropped up, delaying this post-mortem. There were two primary results: first, a relatively simple issue on the mail server, which would normally have been addressed before anyone even noticed it, was not addressed when it should have been; and second, our invoices that should have been sent on 15 June have still not gone out in early July! (It’s not unusual for our invoices to go out a couple of days late, but over half a month late is extreme.)

The primary issue on the mail server was that the disk drive that stores our clients’ email was about to fill up. This is a relatively routine occurrence that is addressed with the data centre and on the server in literally minutes in a two-step process: We buy new disk space in our data centre control panel, and then we configure the mail server to use that disk space.
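For readers who run their own servers, here is a minimal sketch of a check that can flag a nearly full disk before it becomes a problem. The function names, the path and the 85% threshold are assumptions chosen for illustration; they are not NinerNet's actual monitoring configuration.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def needs_more_space(percent_used: float, threshold: float = 85.0) -> bool:
    """Return True once usage reaches the (hypothetical) warning threshold."""
    return percent_used >= threshold
```

Run periodically against the mail storage mount point, a check like this gives warning well before the disk is full, leaving time to buy and configure more space.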

Concurrent with the mail server issue, following fairly routine maintenance on my desktop machine, I could no longer log into it. This was traced to a configuration choice I had made during the maintenance that resulted in the main drive on my machine filling up: instead of the free space on the drive being overwritten with 1s and 0s or random data and then being classed as free space again, it was overwritten with data that looked like real data and that could not be overwritten, deleted or re-classed as free space. The result was that the installation drive was full and I could not log in. This denied me access to data on my computer, namely a key that I need to log into the mail server to complete the second step of the process described above. Very early on 13 June (UTC) I was able to access the encrypted drive on which the key was stored, log into the mail server and reconfigure it to use the additional space, and the problem on the mail server was fully resolved literally seconds later.

That addressed the problem on the mail server. It was, however, the last time I was able to access the encrypted drive.

While I had access to the encrypted drive, I stupidly saved files to it that I had previously been saving on flash drives. This wasn’t really “stupid”; it was a completely reasonable decision, as the hard drive has far more space than the growing collection of flash drives I was using temporarily, and I had access to the encrypted drive and didn’t expect to lose it. As it turns out, since I no longer have access to the encrypted hard drive, and the files that I saved to it were not included in the daily back-up that runs when I log into the machine, they now seem to be lost forever. Those files, while important at the time, do not include any vital business files.

A little earlier than planned, I started replacing the now-just-outdated operating system on my work machine. For reasons I still can’t explain, the new operating system was so incredibly slow that I could make a sandwich and a cup of tea between clicks. (That’s a lot of sandwiches in an eight-hour work day!) Several days were spent troubleshooting that issue before it suddenly, for no apparent reason and without any action on my part, started working properly. Since I could no longer access the encrypted drive, the next priority was recovering backed-up files from the most recent daily back-up. I started by recovering vital business files and was immediately able to contact delinquent clients who apparently don’t pay their invoices until they receive a reminder. Then I started restoring all of the remaining files, which would have let me move forward with June’s invoicing. However, the restoration failed part way through, so I have had to give up and start our billing before the middle of July comes!

Our invoices will be dated 30 June 2025, which I realise is a bit disingenuous, but it keeps them dated in June. The more important dates for invoices are the dates on which your services expire; you can pay your invoice as late as you want (keeping in mind what we have said often in the recent past about waiting until the last minute), but you just need to pay it before your service, domain or certificate expires.

As always, we do sincerely apologise for the disruption that has been caused. What we have learned from this is the following:

  • Keep back-up copies of certain data — i.e., server keys and invoicing records — in places that are more instantly accessible than where all of our other data is backed up en masse,
  • Implement alternative ways of logging into servers where they are available,
  • Implement a data-recovery process that is far quicker than the standard data-recovery process that our current back-up system employs, and
  • Figure out why and how the LUKS encryption with which our hard drive was encrypted failed, and ensure it never happens again.
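As an illustration of the first item, here is a minimal sketch of copying a critical file to a second, more quickly accessible location and confirming that the copy is intact. The function names and paths are hypothetical, and this is not a description of our actual back-up system; a real copy of a server key would also need to be protected appropriately wherever it is stored.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination_dir: Path) -> Path:
    """Copy `source` into `destination_dir` and check the copy by checksum."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / source.name
    shutil.copy2(source, target)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"Checksum mismatch copying {source} to {target}")
    return target
```

Verifying the checksum immediately after copying means a bad copy is discovered at back-up time rather than at recovery time.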

The third item is already in progress, as we make a second attempt to recover our backed-up data. The fourth will have to happen over an extended period in the future, with no goal date and no guarantee of success; in the meantime, the data we recover from our back-ups (which are intact and in place) will be saved in unencrypted form. (Technically this goes against the first point in the “data storage and transmission” section of our privacy policy, but if we cannot access our data, there’s no point in it being encrypted!) The first item will be implemented as part of getting our daily back-ups up and running again, and the second will be implemented where it can be at our earliest convenience.

Thank you again for noting this information and the steps that we are taking to ensure that we learn from our experiences where our existing systems have failed. Please advise if you have any questions or suggestions.

Systems at a Glance:

System  Location                            Role                  Status
NC023   London, United Kingdom              Relay server          Internal
NC028   Vancouver, Canada                   Monitoring server     Internal
NC031   New York, United States of America  Web server            Internal
NC033   Toronto, Canada                     Primary nameserver    Operational
NC034   Lusaka, Zambia                      Phone server          Internal
NC035   Sydney, Australia                   Secondary nameserver  Operational
NC036   Amsterdam, Netherlands              Mail server           Operational
NC040   Toronto, Canada                     Web server            Internal
NC041   New York, United States of America  Web server            Operational
NC042   Seattle, United States of America   Status website        Operational