Issue with the "next screen" button/hitting the end of the posts prematurely is fixed now.
Many thanks to the very responsive mbin team.
Ok, there are three problems going on. First, the site has been intermittently super slow. This is happening because of database locking conflicts and long-running queries. I don't have this fully mitigated yet, but I know what I need to do and just need to get it scripted so the site can recover automatically and gracefully.
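For anyone curious what "scripted recovery" might look like, here is a minimal sketch, not the actual script. It assumes the query list has already been read from Postgres (for example from the `pg_stat_activity` view) and only decides which backends are candidates for cancelling, for example with `pg_cancel_backend(pid)`. The `QueryInfo` type, the `pick_queries_to_cancel` name, and the five-minute threshold are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class QueryInfo:
    """Hypothetical snapshot of one database session."""
    pid: int
    state: str          # e.g. "active" or "idle"
    runtime: timedelta  # how long the current query has been running


def pick_queries_to_cancel(
    queries: List[QueryInfo],
    max_runtime: timedelta = timedelta(minutes=5),  # assumed threshold
) -> List[int]:
    """Return pids of active queries over max_runtime, longest-running first."""
    offenders = [
        q for q in queries
        if q.state == "active" and q.runtime > max_runtime
    ]
    offenders.sort(key=lambda q: q.runtime, reverse=True)
    return [q.pid for q in offenders]
```

Idle sessions are deliberately ignored here; a real script would likely also want to look at which sessions hold the locks other queries are waiting on.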
My attempts to solve the prior problem broke federation by causing the messenger services to think they didn't have a connection to the database. This should be fixed now, and now that I know it happens, I can mitigate it in the future.
The other problem is with only being able to go back one to three pages (sometimes the first page is all you get). This is a known issue in the new version of mbin, and the devs are going to revert the pagination scheme to the prior version until the bugs get worked out. In the meantime, if you click on "combined", you will be able to go back for miles (or kilometers).
Anubis works well for a few hours, then slows the site to a crawl. I will be removing it first thing in the morning. My apologies.
Hi all. I've implemented anubis in front of fedia.io in an attempt to mitigate the absolutely insane number of crawlers hammering the site. I have also reenabled anonymous access, hoping that anubis will be effective at keeping the bots out and letting legit people in.
For reference, I run 17 different instances*. All are using a common postgres database server, running on a latest gen AMD Epyc server with 48 physical/96 logical cores. When I enabled anonymous access previously, fedia.io would consume 6 times the amount of database resources as all the other 16 instances using the database server combined. It's off the hook absurd.
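To put that ratio in perspective, here is a quick back-of-the-envelope calculation. It assumes "6 times the other 16 combined" means fedia.io used 6 units of database resources for every 1 unit used by the rest; the function name is just for illustration.

```python
def share_of_total(multiple_of_others: float) -> float:
    """Fraction of the total used by one consumer that uses
    `multiple_of_others` times as much as everyone else combined."""
    return multiple_of_others / (multiple_of_others + 1)


print(f"{share_of_total(6):.1%}")  # prints 85.7%
```

In other words, one instance out of 17 would have accounted for roughly six sevenths of all database load.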
So far, it's going ok with anubis in place and anonymous access. I'll see how it performs over the next day or so.
I'm in the process of migrating images from local storage to an S3 bucket because the server is running out of disk space. Images will be broken for a while as they replicate up to the storage bucket. Apologies for the inconvenience.
My apologies for how long the site was down. Just about everything that could go wrong did, but I think we are working now...
In an effort to reduce costs due to the exchange rate and declining donations, I’m going to be consolidating Fedia.io onto another existing server so I can decommission and return the server that it is running on. I’ll be turning off unauthenticated access to content again because that drastically reduces the bandwidth/compute load that Fedia uses. I am not sure on timing yet - likely around 2PM ET/6PM UTC. I’ll plan for an hour downtime but I’m hoping it should only be a few minutes if I do things correctly.
I have some time to babysit the server now and so reenabled anonymous access. I've also removed the prior ASN blocks, but may add those back in as needed based on various AI datacenter crawling.
Hi all. Fedia.io has for a long time been subject to DDoS attacks, including many that are "accidental", caused by myriad scrapers constantly hammering the site. I gave up on trying to play whack-a-mole with blocking them based on IP address (they do not honor robots.txt and do not use a conspicuous user agent string) since I was inadvertently blocking some legitimate users. So, I've restricted access to the content of fedia.io to only those who are logged in. That will mean we don't show up in search engines and whatnot, which some will consider a good thing and which will likely cause others to leave.
There is a remaining problem related to the login form. Calls to the login page are breathtakingly expensive, computationally speaking, and so I also have a script that monitors unusual numbers of calls to that form and blocks at the firewall any offenders. I strongly suspect I'm catching some legitimate users with this too, and so I continue to try to tune it, but it's maddening, y'all.
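As a rough illustration of the kind of monitoring described above (not the actual script), here is a sliding-window counter that flags an IP once it hits the login form too often in a short period. The class name, the thresholds, and the idea of feeding it `(ip, timestamp)` pairs parsed from an access log are all assumptions; the firewall block itself is left out.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class LoginRateMonitor:
    """Hypothetical sliding-window counter of login-form hits per IP."""

    def __init__(self, max_hits: int = 20, window_seconds: float = 60.0):
        self.max_hits = max_hits              # assumed threshold
        self.window_seconds = window_seconds  # assumed window length
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one login request; return True if the IP should be blocked."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Drop hits that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_hits
```

Tuning comes down to picking `max_hits` and `window_seconds`: too strict and legitimate users who mistype a password a few times get caught, too loose and the expensive login page keeps getting hammered.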
These issues have been causing performance problems for everyone (despite the fedia.io app running on a dedicated 96 core, 256GB server with nvme disks), and the site became unavailable for certain people who accidentally tripped various thresholds. I'm hoping most of this is resolved now.
Thanks for the patience.
My apologies for the recent spate of problems. I think I’ve narrowed the problem down to the /m/fediverse and /m/random magazines. For some reason, mbin is generating an enormous number of outbound delivery messages for these two magazines. I first tried removing the hashtags from /m/fediverse, but that was only a quick fix, so I deleted the magazine. (Note: the notifications appear to be related to the “microblog” function and were originating from accounts on lots of mastodon instances, so I think there is a bug somewhere.)
I noticed /m/random doing something similar. I have removed all the subscribers from that magazine to try to reduce the number of notifications it is sending. I don’t know if that will help - I have a feeling the instance can’t keep up with that happening in both random and fediverse.
Anyhow, the queues are draining fast now. I purged about 600,000 queued delivery messages that (based on a random sample) all appeared to be associated with fediverse and random. That should let the rest of whatever is backed up get moving again, and hopefully stay moving.
The following instances will be offline briefly on Saturday, December 14 from 9am ET / 2pm UTC for approximately 10 minutes: infosec.exchange, infosec.town, infosec.pub, pixel.infosec.exchange, books.infosec.exchange, matrix/element.infosec.exchange, relay.infosec.exchange, meetup.infosec.exchange, video.infosec.exchange, infosec.press, infosec.place, fedia.io, fedia.social, elk.infosec.exchange, infosec.space, convo.casa
The servers supporting these instances require a reboot. The Dell servers these instances run on take a very long time to boot, so I am estimating 10 minutes of downtime. It could be more, could be less.
We use live patches to minimize the reboots needed for patching; however, Ubuntu only provides livepatch support for a year, which is how long most of these systems have been running.
It’s been a long day. I will fix it when I am back in front of a computer. It might be a few hours. My apologies.
I have sort of given up on fixing the problem, and will instead work on auto-detecting and auto-recovering when the problem happens.
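A minimal sketch of what auto-detect and auto-recover could look like, under the assumption that the problem can be spotted by a simple health probe (say, an HTTP check) and that recovery is triggered only after several failures in a row. The `watchdog_step` name, the probe, and the threshold of three are hypothetical; the recovery action itself (restarting a service, for instance) is left to the caller.

```python
from typing import Callable, Tuple


def watchdog_step(
    probe: Callable[[], bool],
    failures: int,
    threshold: int = 3,  # assumed number of consecutive failures
) -> Tuple[int, bool]:
    """Run one health probe.

    Returns (new_consecutive_failure_count, should_recover).
    The count resets to zero on success or once recovery is triggered.
    """
    if probe():
        return 0, False
    failures += 1
    if failures >= threshold:
        return 0, True
    return failures, False
```

Requiring consecutive failures keeps a single slow response from triggering a restart, at the cost of reacting a little later to a real outage.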
I just saw this: https://every.to/p/the-disappearance-of-an-internet-domain
I have no idea if it's real, but if it is, that will be most unfortunate
After I resolved the federation issue, I had to clean up a few things and so the site may have been unavailable for a bit. I'm done fussing with it and will keep an eye on it to make sure things are working.
IF YOU SEE PROBLEMS - please let me know. As far as I know, I've fixed all of the federation and error 500 issues we've had, so please don't assume it's just more of the same if you see them.
Thanks for your patience.
Fedia.io is sort of like the Ship of Theseus right now - I literally replaced nearly everything trying to get it back working.
The problem ended up being a silent out of memory error that php-fpm was running into. I had to increase the memory limit to about 10x what the docs require to get it to work, but once I did that, it works great.
I was only able to sort this out after @bentigorlich recommended I move the site to debug mode (which requires me to lock everyone else out). Once I did that, it started giving some useful errors.
My apologies for the amount of time it took to fix this. I learned a lot about php today.
Unfortunately, outbound federation (making posts on communities/magazines on other instances) is broken. I do not yet know the cause or when it will be fixed.