Roughly half of all traffic on the internet comes from automated bots. [1] It’s estimated that by 2026 more than 90% of the content on the internet will be synthetically generated by bots. [2]
The problems bot traffic causes
Bot traffic causes several problems for the small, independent web:
- It consumes resources: CPU, disk, memory (RAM) and bandwidth.
- Botnets can steal your data and inflate website clicks and traffic with fake visits.
So one solution is to block and ban crawlers.
Mediocre Content Paralysis
The Dead Internet Theory [3] speculates that most content has been bot-generated since around 2015. That was before AI agents and LLMs were even a thing.
Now that anyone can generate loads of mediocre or “crap” content and fake news, the value of text and information itself is eroded, and it will become even harder to find valuable content.
Proving the statements and facts
I can attest to and prove this with a simple experiment I’ve done multiple times. Each time I buy and configure a new Linux server (VPS), within just a few minutes, if I check the logs, bots are already trying to hack into the server! You can try this for yourself; I even have a step-by-step article and video describing how you can set up your own Linux Virtual Private Server.
Surely, if I just purchased the server, how can anyone know about it?
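The log check described above can be sketched in a few lines of Python. This is a minimal sketch, not the exact method used here: it assumes OpenSSH's usual "Failed password for ..." log lines, and the sample lines, IP addresses and the function name `count_failed_logins` are made up for illustration. On a real server you would read the SSH log instead (often `/var/log/auth.log` on Debian/Ubuntu or `/var/log/secure` on RHEL-like systems, which is an assumption about your distribution).

```python
import re
from collections import Counter

# Matches typical OpenSSH failure lines, for example:
# "Failed password for invalid user admin from 203.0.113.5 port 4242 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(?P<user>\S+) "
    r"from (?P<ip>\S+) port \d+"
)


def count_failed_logins(lines):
    """Return a Counter mapping source IP -> number of failed SSH logins."""
    attempts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            attempts[match.group("ip")] += 1
    return attempts


# Hypothetical sample lines (documentation-range IPs), standing in for a real log
SAMPLE_LOG = [
    "Oct 10 13:55:36 vps sshd[811]: Failed password for root from 203.0.113.5 port 4242 ssh2",
    "Oct 10 13:55:39 vps sshd[811]: Failed password for invalid user admin from 203.0.113.5 port 4250 ssh2",
    "Oct 10 13:56:02 vps sshd[840]: Failed password for invalid user test from 198.51.100.7 port 5111 ssh2",
    "Oct 10 13:58:45 vps sshd[901]: Accepted publickey for me from 192.0.2.9 port 6000 ssh2",
]

# Print the noisiest source addresses first
for ip, count in count_failed_logins(SAMPLE_LOG).most_common():
    print(ip, count)
```

Running something like this against the log of a freshly installed server a few minutes after boot is a quick way to see how many addresses are already knocking.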
New domains & existing domains
Whenever I buy a new domain and link it to a server via DNS, the first traffic I receive is from myself and a huge amount from automated bots crawling the net. Remember, no one else knows about this domain yet; I haven’t submitted the URL to any search engine.
Modern client-side JavaScript analytics systems have ways of “hiding” bot traffic from their reports. This might seem helpful, but it hides how much of your traffic really comes from users and how much comes from bots.
JavaScript-only traffic analysis usually fails to catch crawlers and bots that don’t have JavaScript enabled.
Remember, there can be plenty of people who use NoScript or uBlock, or who have JavaScript disabled, as a way to enhance their privacy and security.
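Server-side logs see every request, with or without JavaScript. As a rough sketch, the snippet below reads web server access log lines and splits them into probable bots and everything else. It assumes the common "combined" log format used by default nginx and Apache setups; the `BOT_HINTS` list, the sample lines and the function names are illustrative assumptions, not a complete detection method. User agents can be spoofed, so treat the bot count as a lower bound.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumed, incomplete list of substrings found in self-identifying bot user agents
BOT_HINTS = ("bot", "crawl", "spider", "curl", "python-requests")


def is_probable_bot(user_agent):
    """Guess from the User-Agent string alone; an empty agent counts as a bot."""
    ua = user_agent.lower()
    return ua in ("", "-") or any(hint in ua for hint in BOT_HINTS)


def summarize(lines):
    """Count requests per category: 'bot', 'other' or 'unparsed'."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            counts["unparsed"] += 1
        elif is_probable_bot(match.group("agent")):
            counts["bot"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical sample lines standing in for a real access log
SAMPLE_ACCESS_LOG = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [10/Oct/2024:13:56:01 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0"',
    '192.0.2.9 - - [10/Oct/2024:13:57:12 +0000] "GET /wp-login.php HTTP/1.1" 404 153 "-" "-"',
]

print(dict(summarize(SAMPLE_ACCESS_LOG)))
```

Comparing these numbers with what a JavaScript analytics dashboard reports gives a feel for how much traffic the dashboard never shows.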
What do these bots want?
Mostly they try to crawl your website and all of its pages to:
- Index them in search engines of various kinds.
- Steal your data to train large language models. So it’s copyright infringement all the way; the rabbit hole of AI devolution is real. Huge corporations want your data... for free.
- Search for known vulnerabilities to attack your website or server and make it part of a botnet. I wrote a cybersecurity speech a few years ago on how downloading illegal software can make you part of a botnet.
- Serve the unknown purposes of malefactors.
Less often, they proxy your content to various other apps. This generates no traffic for you, and I’d rather people come to my website via a browser without such automation. It also messes up your analytics and lets those sites capture your followers. Almost all big social media sites run such bots to provide a preview of your website, making it less likely that people will click through.
The solution
The problem and solution are outlined in my web independence article: you can and should block certain crawlers and bots.
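One possible way to block crawlers at the application level is sketched below. This is not necessarily the approach from the web independence article: it is a minimal sketch assuming a Python WSGI application, and the class `BlockCrawlers`, the demo app `hello_app` and the blocklist are hypothetical. The tokens are User-Agent names that some AI crawlers announce themselves with; the list is an example, not a complete or authoritative one. Bots that lie about their User-Agent will pass straight through, so this only stops the honest ones.

```python
# Assumed example blocklist of User-Agent tokens (lowercase); adjust to taste
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot", "claudebot", "bytespider")


def is_blocked(user_agent):
    """Return True if the User-Agent contains any blocklisted token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


class BlockCrawlers:
    """WSGI middleware that answers 403 to blocklisted User-Agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain; charset=utf-8")],
            )
            return [b"Forbidden\n"]
        # Everyone else is passed on to the wrapped application
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Hypothetical stand-in for the real website."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Hello, human visitor\n"]


# Wrap the site once; the result is itself a WSGI application
app = BlockCrawlers(hello_app)
```

For a local test, `app` can be served with the standard library's `wsgiref.simple_server.make_server`. The same User-Agent matching can also be done in the web server configuration, which avoids spending application CPU time on requests you intend to reject.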
1. Bad Bot Traffic: https://www.imperva.com/resources/resource-library/reports/2024-bad-bot-report/
2. AI generated content = more fake news: https://thelivinglib.org/experts-90-of-online-content-will-be-ai-generated-by-2026/
3. Is the internet dead since 2015? https://en.wikipedia.org/wiki/Dead_Internet_theory