Webmasters spend their time trying to get more people to visit and return to their website. However, it can be quite difficult for smaller, independent websites to figure out how many times they have really been visited by humans.
That is because roughly half of all visits to websites these days come from machines, not people. In fact, the smaller the website, the worse those figures may be: evidence suggests that websites with fewer than 10,000 visitors per day see an even higher proportion of bot traffic than bigger sites.
Even though those figures make it seem like there are a lot of bots on the internet (and there are), it is also important to remember the nature of the beast. Bots are designed to do the jobs that humans can’t: repetitive tasks performed very quickly. As such, even a relatively small number of bots can generate huge amounts of web traffic.
Who Owns the Bots?
Half of website visits aren’t made by people, and the vast majority of people aren’t running bots, so the number of bots (although growing) is relatively small. With that in mind, the important question is: who runs the bots and what are they doing?
A lot of the time, the bots visiting websites are doing it for benign, even helpful, reasons. For the websites that are being visited, however, this friendly bot activity can be quite costly. If, for example, their web host charges per hit on the server, then half of the charges accumulated for web hosting could be due to bots.
You would think, then, that people would be flocking to services like ZenCaptcha – a service for website owners that promises:
“Bots are stopped automatically. It’s self-learning, too, so it keeps getting better.”
On top of that, the service promises to deliver activity reports that show you how many bots have been blocked and how many humans were given access.
The Dreaded ‘Captcha’
The first thing that a webmaster will probably think of at this point is ‘captcha.’ A captcha is a picture of hard-to-make-out letters that users sometimes have to enter in order to verify that they are human. Although captchas do work, they are an annoyance that can put people off visiting websites.
Luckily things are changing and ZenCaptcha, for example, promises to figure out who the bots are without ever asking humans for any input:
“Users never need to prove they are human, so there are no accessibility issues for the blind, visually impaired, impatient or anyone else.”
Although this may sound like the holy grail of bot management, the positives come with a possible downside. The idea of software having the final say over who gets onto a website naturally raises a different kind of concern: what if the service accidentally stops humans from entering my website? That could be very bad for business. This, along with the fact that so many bots are helpful, is why combating them isn’t a walk in the park.
Most of the bots that crawl through websites are there for a reason. Search engines like Google, Yahoo and Bing (who share their index) use bots to follow links, jump from site to site, and index new and recently altered content. These bots gather the information that decides a website’s placement in search results (which is what good SEO is all about).
Googlebots are incredibly complex, and far more sophisticated than most other search engines’ software. No matter who they belong to, however, bots in this class follow rules that make them behave in specific ways. This is why, for example, adding the NoFollow attribute to a link renders it (effectively) invisible to Google’s bots.
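As a small illustration, the NoFollow signal is nothing more than an attribute on the link’s HTML (the URL and link text below are placeholders):

```html
<!-- A normal link: compliant crawlers will follow it and may pass ranking credit -->
<a href="https://example.com/partner">Partner site</a>

<!-- rel="nofollow" tells compliant crawlers not to follow the link
     or count it toward the target's search placement -->
<a href="https://example.com/partner" rel="nofollow">Partner site</a>
```

Note that this only works because well-behaved bots choose to honour the attribute; it is a request, not an enforcement mechanism.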
Unfortunately for webmasters, there are plenty of bad bots out there too. Bots that work for ‘dark web’ search engines are built to ignore a webmaster’s directives, seeking out and indexing content that the webmaster doesn’t want displayed to the public. One way to combat them is to block their IP addresses (if you can identify the bots).
One problem with bots that index content webmasters don’t want indexed is that the results can point hackers toward sensitive files. If cyber criminals find those files – and the site’s software is out of date – they could use them to compromise it.
Then there are spam bots that look for comment sections on sites and leave predetermined messages. This is an automated form of advertising used to bring traffic to another site.
Finally, there are the more dangerous botnet bots, such as the Mirai malware, which has been causing the recent surge of massive DDoS attacks. Those bots are a virus that use computers or Internet of Things products to launch attacks, decided on by the hacker remotely.
This Bot Yes, That Bot No
With swarms of bots on the net causing around half of website visits, it becomes very important to know which bots you want to stop and which bots you want to let through. If you stopped Google’s bots from reaching your website, it would be a massive disaster: you would be de-indexed and would lose your organic traffic.
With that in mind, many bots are actually a webmaster’s friend and any services designed to keep bad bots out need to work in a highly efficient manner if they are going to succeed.
The best way to handle bots on a website is with robots.txt directives, which tell well-behaved bots how you want them to crawl your site. Unfortunately, as mentioned before, not every bot follows those commands. Beyond that, you can block bots by adding their IP addresses to a server configuration file called .htaccess (on Apache servers). This denies those bots access, but it is fairly tricky and has a steep learning curve.
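To sketch the difference between the two approaches (the bot name, path, and IP address below are placeholders, not real entries): a robots.txt file politely asks, while an .htaccess rule enforces.

```
# robots.txt -- obeyed only by well-behaved bots
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
```

```apache
# .htaccess (Apache 2.4) -- denies access whether or not the bot cooperates
<RequireAll>
    Require all granted
    Require not ip 203.0.113.42
</RequireAll>
```

The robots.txt file simply sits at the root of the site; the .htaccess rule is why the method has a learning curve, since a syntax error there can lock everyone out.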
It is possible to get a hugely comprehensive list of bot IP addresses from the IAB. If you aren’t a member of the association, however, access to the list can cost around $14,000. In addition, it is a massive list, which would make it almost impossible to implement manually.
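To give a sense of why a list that size demands automation rather than manual entry, here is a minimal Python sketch that checks a visitor’s IP against a blocklist of network ranges (the networks below are placeholder documentation ranges, not real bot addresses):

```python
import ipaddress

# Placeholder blocklist entries (CIDR ranges) -- illustrative only.
# A real IAB-style list would contain thousands of entries loaded from a file.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # reserved documentation range
    ipaddress.ip_network("198.51.100.0/24"),  # reserved documentation range
]

def is_blocked(ip: str) -> bool:
    """Return True if the visitor IP falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

A server could run a check like this on every request, which is exactly the kind of repetitive, high-volume task that no webmaster could do by hand.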
With the bots here to stay, it seems highly likely that people are going to turn to services for combating them more and more. As such, this can be perceived as a market that is likely to grow. With so many good bots out there (pinging sites to see if there is new content available yet, doing website health monitoring, measuring site speed, and doing a host of other important tasks), differentiating between good bots and bad bots is the name of the game.