I guess that's why I'm interested in a tooling-based solution. My self-hosting is small-fry junk, but a lot of others like me are hosting entire fedi communities or larger websites.
In the Hacker News comments for that Geraspora link, people discussed websites shutting down due to hosting costs, which may be attributed in part to the overly aggressive crawling. So maybe it's just a different form of DDoS than we're used to.
Thank you for the detailed response. It's disheartening to learn that the traffic is coming from 'real' browsers/IPs, but that actually makes a lot of sense.
I'm coming at this from the angle of AI bots ingesting a website over and over to obsessively look for new content.
My understanding is there are two reasons to try blocking this: to protect bandwidth from aggressive crawling, or to protect the page contents from AI ingestion. I think the former is doable, and the latter is an unwinnable task. My personal motivation is the former: as an AI curmudgeon, I'd rather spend CPU resources blocking bots than serve any content to them.
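For the bandwidth-protection side, here's a minimal sketch of the kind of per-client rate limiting I have in mind (a token bucket keyed by IP). This is illustrative, not production code; the `rate` and `capacity` numbers are arbitrary, and in practice you'd run something like this in front of the app server and handle shared IPs/NAT more carefully:

```python
import time
from collections import defaultdict

class TokenBucket:
    """Per-client token bucket: each request costs one token;
    tokens refill at `rate` per second up to `capacity`.
    Clients that burn through their bucket get denied cheaply,
    instead of being served full pages."""

    def __init__(self, rate: float = 1.0, capacity: float = 10.0):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # burst allowance
        self.tokens = defaultdict(lambda: capacity)
        self.last = defaultdict(time.monotonic)

    def allow(self, client_ip: str) -> bool:
        """Return True if this request should be served."""
        now = time.monotonic()
        elapsed = now - self.last[client_ip]
        self.last[client_ip] = now
        # Refill proportionally to time since the last request.
        self.tokens[client_ip] = min(
            self.capacity, self.tokens[client_ip] + elapsed * self.rate
        )
        if self.tokens[client_ip] >= 1:
            self.tokens[client_ip] -= 1
            return True
        return False
```

The appeal of this approach is that a denied request costs almost nothing (a dict lookup and a 429/403), while an aggressive crawler hammering the same pages exhausts its bucket immediately. Well-behaved visitors never notice.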
This time of year I take the computer running my home NAS, move it to my bedroom, and set up BOINC. It literally keeps the room 7-10 degrees (F) warmer.
After trying and failing to get used to Proton's UX, Fastmail has been great.