Forgejo access and AI scraper block | F-hub.org Portal Page

Due to excessive web scraping by AI companies training their code-completion LLMs, we were forced to employ several methods to block non-genuine users on the Forgejo web interface. The combination of these measures appears to be working for now, so we have been able to remove the login requirement again.

Anubis

Besides an extensive user-agent-based block, we also had to resort to Anubis, a proof-of-work-based internal proxy. It is currently configured to have the browser solve a computationally expensive math problem once per week and then store the result, so that subsequent visits get direct access. This requires a modern browser with JavaScript enabled, along with various other capabilities that current web-scraping software lacks. This solution is far from ideal, but the alternative was to lock down the instance completely, as the server was regularly overloaded by these bots.
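To illustrate the general idea behind such a proof-of-work check, here is a minimal Python sketch. It assumes a SHA-256 scheme in which the client must find a nonce whose hash begins with a given number of zero hex digits. Anubis uses a challenge of this general kind, but the function names (`solve_challenge`, `verify_solution`), the challenge format and the difficulty value below are hypothetical and are not taken from Anubis itself.

```python
import hashlib
import secrets


def _digest(challenge: str, nonce: int) -> str:
    """Hex SHA-256 digest of the challenge string concatenated with the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve_challenge(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side (hypothetical): brute-force a nonce whose digest
    starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = _digest(challenge, nonce)
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify_solution(challenge: str, difficulty: int, nonce: int) -> bool:
    """Server side (hypothetical): checking the work costs a single hash."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    # Hypothetical random per-visitor challenge issued by the proxy.
    challenge = secrets.token_hex(16)
    nonce, digest = solve_challenge(challenge, difficulty=4)
    print(f"nonce={nonce} digest={digest}")
    print("valid:", verify_solution(challenge, 4, nonce))
```

The key property is the asymmetry: with a difficulty of `d` hex digits the client needs about 16^d hash attempts on average, while the server verifies the answer with one hash. Solving once per week is negligible for a single visitor, whereas a scraper that discards the stored result and requests many pages has to pay that cost over and over.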

Other updates

GoToSocial has received various updates up to version 0.18.x, which we have successfully deployed to our instance. We still need to make a few customisations and finally set up a proper fediverse-enabled account for service updates, but it now seems stable enough for regular use.

We also updated our Weblate translation service and internal BookStack wiki to the latest releases.

The other previously mentioned services, such as our Woodpecker CI, remain a work in progress, and we hope to find the time to finalize our setup soon. In addition, there is ongoing work on automated monitoring of our services, which we might also open to others interested in external monitoring of their own servers. We will likely use Bezel for this.