Improving reliability on our infrastructure
Published: 2024-05-30
Errata: The outage is now over; thank you for bearing with us. We have decided to postpone both discontinuing the DE1 datacenter and transferring hudgens to a virtual instance (which might be cancelled); other planned maintenance is unchanged. We will discuss migration to GitHub and, if approved, migrate our source code over the next few weeks.
We failed. Our entire infrastructure is down – once again.
Every now and then, the router in our FR1 datacenter goes down for a few days. This has happened at least twice before, and we now need to mitigate this issue more than ever.
Because our Trusted Core Network server is hosted in FR1, we do not have access to any of our servers; we rely entirely on automated maintenance tasks to keep alive what is still up. Let's talk about the few issues we intend to fix:
Trusted Core Network will be moved to NL1: We picked FR1 as our preferred datacenter for TCN because it has the highest bandwidth. It has become obvious that, for such a critical piece of equipment, reliability should be prioritized over speed. We will therefore move TCN to the NL1 datacenter, managed by Scaleway, which can deliver only 100 Mbps but is extremely reliable.
Source code will be moved to GitHub: Our GitLab instance, which is becoming increasingly hard to manage, will be discontinued, which will also free up resources on our servers. Active projects will be moved to GitHub, and archives will be moved to a dedicated server running cgit. The package registry will most likely be discontinued for the time being.
Our new website (version 14) will be hosted on Vercel for maximum availability. Additionally, any software that is installed locally on user devices should continue to work normally.
We will keep making our FR1 datacenter as reliable as possible, and we will look for alternative network technologies to fall back on if the main router goes down. Maintenance downtime might occur and will be announced on the home page.
We would again like to express our sincerest apologies for the current incident, and we hope to retain your trust and support. We will keep everyone updated as the incident progresses, and we would like to confirm that no data leak, security issue, or privacy issue is involved in this outage.