NinerNet Communications™
System Status

Server and System Status

NC036: Post-mortem following mail server issue in early June, and explanation of late invoicing

7 July 2025 07:12:58 +0000

As predicted immediately after the June mail server issue (which started on 11 June UTC), problems continued and new problems cropped up, delaying this post-mortem. There were two primary results. First, a relatively simple issue on the mail server, which would normally have been addressed before anyone even noticed it, was not addressed when it should have been. Second, our invoices that should have been sent on 15 June have still not gone out in early July! (It’s not unusual for our invoices to go out a couple of days late, but over half a month late is extreme.)

The primary issue on the mail server was that the disk drive that stores our clients’ email was about to fill up. This is a relatively routine occurrence that is addressed with the data centre and on the server in literally minutes in a two-step process: We buy new disk space in our data centre control panel, and then we configure the mail server to use that disk space.
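To illustrate the kind of check that flags a drive before it fills up, here is a minimal sketch. It is not our actual monitoring: the 90% threshold and the names `usage_fraction`, `needs_more_space` and `check_path` are assumptions made up for this example, and the path in the last lines is only a placeholder.

```python
import shutil


def usage_fraction(used_bytes: int, total_bytes: int) -> float:
    """Return the fraction of the volume that is in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_more_space(used_bytes: int, total_bytes: int, threshold: float = 0.90) -> bool:
    """True once usage reaches the alert threshold (0.90 is an assumed value)."""
    return usage_fraction(used_bytes, total_bytes) >= threshold


def check_path(path: str, threshold: float = 0.90) -> bool:
    """Check the volume that holds `path`, using only the standard library."""
    usage = shutil.disk_usage(path)  # named tuple with total, used and free bytes
    return needs_more_space(usage.used, usage.total, threshold)


if __name__ == "__main__":
    # "/" is a placeholder; a real check would point at the mail store's mount point.
    print("buy more disk space" if check_path("/") else "disk space is fine")
```

Run on a schedule, a check like this gives enough warning to carry out the two steps above before anyone notices a problem.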

Concurrent with the mail server issue, following fairly routine maintenance on my desktop machine, I could no longer log into it. This was traced to a configuration choice I had made in the maintenance that resulted in the main drive on my machine filling up; instead of the free space on the drive being overwritten with 1’s and 0’s or random data and being classed as free space, it was overwritten with data that looked like real data that could not be overwritten, deleted or re-classed as empty free space. The result was that the installation drive was full and I could not log in. This denied me access to data on my computer, namely a key that I need to log into the mail server to complete the second step listed in the previous paragraph. Very early on 13 June (UTC) I was able to access the encrypted drive on which the key was stored, log into the mail server and reconfigure it to use the additional space, and the problem on the mail server was fully resolved literally seconds later.

That addressed the problem on the mail server; however, it was also the last time I was able to access the encrypted drive.

While I had access to the encrypted drive I stupidly saved files to it that I had been saving on flash drives. This wasn’t really “stupid”; it was a completely reasonable decision as the hard drive has far more space than the growing collection of flash drives I was using temporarily, and I had access to the encrypted drive and didn’t expect to lose access. As it turns out, since I no longer have access to the encrypted hard drive and the files that I saved to it were not included in a daily back-up that is run when I log into the machine, they now seem to be lost forever. Those files, while important at the time, do not include any vital business files.

A little earlier than planned I started the replacement of the now-just-outdated operating system on my work machine. For reasons I still can’t explain, the new operating system was so incredibly slow that I could make a sandwich and a cup of tea between clicks. (That’s a lot of sandwiches in an eight-hour work day!) Several days were spent troubleshooting that issue when it suddenly, for absolutely no reason and without any action on my part, started working properly. The next priority was, since I could no longer access the encrypted drive, recovering backed-up files from the most recent daily back-up. I started with recovering vital business files and was able to immediately contact delinquent clients who apparently don’t pay their invoices until they receive a reminder. Then I started restoring all of the remaining files, meaning I could move forward with June’s invoicing. However, the restoration failed part way through, so I have had to give up and start our billing before the middle of July comes!

Our invoices will be dated 30 June 2025, which I realise is a bit disingenuous, but it keeps them dated in June. The more important dates for invoices are the dates on which your services expire; you can pay your invoice as late as you want (keeping in mind what we have said often in the recent past about waiting until the last minute), but you just need to pay it before your service, domain or certificate expires.

As always, we do sincerely apologise for the disruption that has been caused. What we have learned from this is the following:

  • Keep back-up copies of certain data — i.e., server keys and invoicing records — in places that are more instantly accessible than where all of our other data is backed up en masse,
  • Implement alternative ways of logging into servers where they are available,
  • Implement a data-recovery process that is far quicker than the standard data-recovery process that our current back-up system employs, and
  • Figure out why and how the LUKS encryption with which our hard drive was encrypted failed, and ensure it never happens again.

The third item is already in progress, as we make a second attempt to recover our backed-up data; the fourth will have to happen over an extended period in the future with no goal date and no guarantee of success, but in the meantime the data we recover from our back-ups — that are intact and in place — will be saved in unencrypted form. (Technically this goes against the first point in the “data storage and transmission” section of our privacy policy, but if we cannot access our data, there’s no point in it being encrypted!) The first item will be implemented as part of getting our daily back-ups up and running again, and the second will be implemented where it can be at our earliest convenience.

Thank-you again for noting this information and the steps that we are taking to ensure that we learn from our experiences where our existing systems have failed. Please advise if you have any questions or suggestions.

NC036: Update 4

13 June 2025 01:23:56 +0000

The issue on our primary mail server has finally been resolved, and all messages in the queue have been delivered. As expected, once we had access it only took a few seconds.

We will post a post-mortem in the next couple of days … hopefully. I can’t exaggerate the extent to which numerous unrelated events have piled on top of one another — even in the last few minutes! — to prevent an earlier resolution of this problem, and at this point I can’t predict whether or not more issues will prevent the posting of the post-mortem. However, I’m finally taking a breath, as this issue (amongst other things) is finally resolved.

I do once again extend my heartfelt apology for this incident, and I will do everything in my power to review the cascading failures — all not even related to the mail server itself! — that led to this not being resolved much, much sooner.

NC036: Update 3

12 June 2025 13:45:11 +0000

Words cannot express my frustration at this point. 🙁

It will be another few hours before this situation can be resolved. It just cannot go beyond tonight, UTC. By that time my computer will be completely reset with a fully updated operating system installed.

Sorry.

NC036: Update 2

12 June 2025 09:53:21 +0000

Let me explain the situation we’re in. It’s an illustration of the fact that sometimes too much is, in fact, too much.

My primary workstation stopped working late Wednesday afternoon (UTC). It stopped working because I could not log in after performing a maintenance/security operation that I routinely run, but I ran it in a certain way that was slightly different to how I usually run it with no problems.

At about the same time I received a report from a client about a problem with the mail server. I received it by email (of course) which I read on my phone. I hadn’t seen anything similar before, so I asked him for screenshots. In the meantime I had an idea of what the cause of the problem could be based on monitoring I had done the day before, but without access to my workstation I could not log in and check and fix the problem … which would (and will) take all of about 60 seconds if I am correct. Reports and my experience since have almost confirmed my suspicions.

So, given the fact that it is the middle of the night where I am, I cannot do anything until business hours, which will be about 06:00 local, 13:00 UTC.

My local workstation is, of course, fully backed up, so it’s not a problem of a loss of data. The “problem” is with the additional security on logging into the server which we have purposely put into place in order to protect our infrastructure and your email. Because of that I cannot log into the mail server from the machine I am currently using, and will only have access to the resources I require in the morning, local time.

I cannot apologise enough for this situation that we have caused. We will calculate a credit that will be applied to all invoices of clients who host their email with us.

In the meantime, we apologise but this issue will continue until about 13:00 UTC. At that time I should have access to the server to fully and permanently address the problem. I will post an update here, on the status blog, when this issue is resolved. My humble and sincere apologies once again.

NC036: Update 1

12 June 2025 07:54:49 +0000

We continue to work on resolving this issue. The problem we’re having has nothing to do with the server itself, but with our access to it.

One thing we can tell you for now is that one of the issues you may encounter is that incoming messages to your accounts may be duplicated, which is something I’m certainly experiencing. It’s frustrating and annoying for me, so I assume it is for you as well. Again, we apologise.

NC036: Mail server issues

12 June 2025 05:39:50 +0000

The mail server is having an issue at the moment. You will see different symptoms, but they are all “unusual”.

This has apparently been happening for about 24 hours now. Under normal circumstances we would have started working on the problem close to 24 hours ago, but unfortunately our office network is not fully usable at the moment, so we can’t.

What we can tell you at this point is that outgoing messages are going out, even if you see errors that seem to indicate otherwise.

We are currently working on all three of the “perfect storms” to address this issue. I hope I’ll have an update for you within the next couple of hours.

Email failures from Apple email addresses

25 March 2025 04:33:20 +0000

Due to support requests recently, it has become obvious that Apple have mis-configured their mail servers, either intentionally or because they’re stupid (both are one and the same in this case, but I’m leaning towards the latter of course), so that they don’t play well with an anti-spam technology that NinerNet and millions around the world use called greylisting. (If you want to more thoroughly understand greylisting, we recommend you read the page at that link.)

It seems that some (many?) users of Apple-administered mail servers on domains like mac.com, me.com, icloud.com (maybe even apple.com itself) and possibly other domains that we’re not aware of receive bounce messages for email messages they send to NinerNet-hosted domains that contain this text:

<email-address@ninernet-hosted-domain.com>: host mx.niner.net[178.62.195.26] said:
451 4.7.1 <*****@*****************>: Recipient address rejected:
Intentional policy rejection, please try again later (in reply to RCPT TO command)

At this time we don’t know if this also applies to domains running on any of the servers that use an Apple operating system, but we haven’t seen evidence of this because even those servers generally run applications (e.g., Postfix [which we use], Qmail, etc.) that are not Apple products and comply with email standards. We also don’t know the frequency with which email sent to NinerNet-hosted domains fails.

Here’s the explanation we have sent to recent clients:

The four (hundred) codes (451 and 4.7.1) tell the sending server that the error is temporary, and that the sending server should (as it also says in plain English), “try again later”. A four-hundred error code is not a permanent error; those are five-hundred codes. Email messages should not be bounced by four-hundred error codes, so this is why the sending (Apple) server is behaving incorrectly according to email standards. Email doesn’t work if organisations (i.e., Apple) ignore standards.

Greylisting isn’t something we invented while I was bored last weekend; it’s an extremely widely used and also extremely successful anti-spam technique that we have been using for about as long as NinerNet has been in business, which is thirty years.

So the sender needs to bring this to the attention of the support department at Apple. NinerNet is not the cause of this problem.

As we’ve also explained, “standards” are not vague suggestions; they’re the “laws” on which servers and clients agree to operate so that they can work together.
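For anyone who prefers to see the rule in code, here is a minimal sketch of how a sending server is expected to read the first digit of an SMTP reply code. The function names `classify_smtp_reply` and `should_bounce` are made up for this illustration and do not come from any mail server; the sketch only covers the reaction to a single reply and ignores the overall time limit after which a sender gives up retrying.

```python
def classify_smtp_reply(code: int) -> str:
    """Classify a three-digit SMTP reply code by its first digit."""
    if not 200 <= code <= 599:
        raise ValueError(f"not a valid SMTP reply code: {code}")
    first_digit = code // 100
    if first_digit == 4:
        return "temporary"  # transient failure: keep the message queued and retry
    if first_digit == 5:
        return "permanent"  # permanent failure: return the message to the sender
    return "positive"       # 2yz and 3yz replies: the command was accepted


def should_bounce(code: int) -> bool:
    """Only a permanent (five-hundred) reply justifies bouncing the message."""
    return classify_smtp_reply(code) == "permanent"


if __name__ == "__main__":
    for code in (250, 451, 550):
        print(code, classify_smtp_reply(code), "bounce" if should_bounce(code) else "no bounce")
```

By this rule the 451 reply in the bounce message quoted above is a "try again later", not a reason to return the message to the sender.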

In order to compensate for the stupidity of the brain trust at Apple we have whitelisted the following domains server-wide, so that greylisting is not applied to email from these domains:

  • apple.com
  • icloud.com
  • me.com
  • mac.com

This, of course, means that spam from these domains will more likely reach our clients, and for this we humbly apologise. We hope that Apple are doing a better job of keeping spammers off of their systems than they are at actually running the mail servers themselves.

If your correspondents on other Apple-hosted domains report similar problems sending email to you, please let us know and we’ll add those domains to our mail servers’ whitelists too.
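For the curious, the general idea of greylisting with a whitelist can be sketched in a few lines. This is an illustration only, not our actual mail server configuration: the class name `Greylist`, the five-minute retry delay, the "250" reply text and matching the whitelist on the sender's domain are all assumptions made for the example. Only the four domains and the 451 wording are taken from this post.

```python
from datetime import datetime, timedelta

# Domains this post lists as exempt from greylisting.
WHITELISTED_DOMAINS = {"apple.com", "icloud.com", "me.com", "mac.com"}

# Assumed minimum wait before a retry is accepted; the real value is not stated.
MIN_RETRY_DELAY = timedelta(minutes=5)


class Greylist:
    def __init__(self):
        # Maps (client IP, sender, recipient) to the time the triplet was first seen.
        self.first_seen = {}

    def check(self, client_ip: str, sender: str, recipient: str, now: datetime) -> str:
        """Return the reply to give to the RCPT TO command."""
        sender_domain = sender.rsplit("@", 1)[-1].lower()
        if sender_domain in WHITELISTED_DOMAINS:
            return "250 2.1.5 Ok"  # whitelisted: greylisting is skipped
        triplet = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= MIN_RETRY_DELAY:
            return "250 2.1.5 Ok"  # the sender retried after the delay, so accept
        return "451 4.7.1 Intentional policy rejection, please try again later"


if __name__ == "__main__":
    greylist = Greylist()
    start = datetime(2025, 3, 25, 4, 0, 0)
    args = ("192.0.2.10", "someone@example.org", "client@example.net")
    print(greylist.check(*args, start))                          # first attempt is deferred
    print(greylist.check(*args, start + timedelta(minutes=10)))  # a later retry is accepted
```

A standards-compliant sender simply retries a few minutes later and gets through, while a lot of spam software never retries, which is why the technique works.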

If you have any questions about this, please feel free to contact NinerNet support. Thanks for your attention.

NC036: Mail server is back to normal

20 December 2024 13:51:23 +0000

Server NC036 (the primary mail server) is back to normal. All mail in the mail queue has been cleared and delivered to all local accounts and remote/foreign domains.

We do not currently have an explanation for this occurrence, but we are looking into it. Our apologies for the interruption.

If you have any specific questions or issues with this now-resolved event, please contact NinerNet support. Thank-you, and we apologise for this incident.

NC036: Mail server update

20 December 2024 13:28:50 +0000

Server NC036 is recovering. Delayed email is being delivered and the mail queue is decreasing.

NC036: Server is temporarily struggling

20 December 2024 13:07:52 +0000

Server NC036 (the primary mail server) is temporarily struggling under an unusually large mail load. We are working to determine the cause of this and bring everything back to normal.

In the meantime you may experience delays in sending or receiving messages.

We will update as and when there is new information.

Systems at a Glance:

Loc.                                                System  Status       Ping
London, United Kingdom (Relay server)               NC023   Internal     Up?
Vancouver, Canada (Monitoring server)               NC028   Internal     Up?
New York, United States of America (Web server)     NC031   Internal     Up?
Toronto, Canada (Primary nameserver)                NC033   Operational  Up?
Lusaka, Zambia (Phone server)                       NC034   Internal     Up?
Sydney, Australia (Secondary nameserver)            NC035   Operational  Up?
Amsterdam, Netherlands (Mail server)                NC036   Operational  Up?
Toronto, Canada (Web server)                        NC040   Internal     Up?
New York, United States of America (Web server)     NC041   Operational  Up?
Seattle, United States of America (Status website)  NC042   Operational  Up?
