Msnbot: Ignorant Spider or Deliberate Rule Breaker?
Msnbot (user agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)), crawling from search.msn.com, has become a pest, not only for my websites but for many others as well.
First of all, Microsoft sees fit to send multiple bots at the same time; as many as 17 have been reported crawling at once, and I regularly have up to 12 on-site simultaneously. That many bots crawling at once is effectively a DDoS (Distributed Denial of Service) attack: it can lock the server up and keep normal traffic out.
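For anyone who wants to check their own logs, here is a rough sketch of how the "how many bots at once" figure could be measured. It assumes the access log has already been parsed into (timestamp, IP, user agent) tuples; the function name, the one-minute window, and the "bingbot" marker are my own illustrative choices, not anything Microsoft documents.

```python
from datetime import datetime, timedelta

def max_concurrent_bots(hits, window=timedelta(minutes=1), marker="bingbot"):
    """Largest number of distinct bot IPs seen within one time window.

    hits: iterable of (datetime, ip, user_agent) tuples (assumed format).
    A hit counts as a bot hit when `marker` appears in its user agent.
    """
    # Keep only bot hits, sorted by time.
    bot_hits = sorted((ts, ip) for ts, ip, ua in hits if marker in ua.lower())
    best = 0
    for i, (start, _) in enumerate(bot_hits):
        # Distinct IPs active in the window that opens at this hit.
        ips = {ip for ts, ip in bot_hits[i:] if ts - start <= window}
        best = max(best, len(ips))
    return best
```

Treating each distinct IP inside the window as one "bot" is a simplification; a stricter count would also verify that the addresses really belong to Microsoft.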
Secondly, Msnbot often ignores robots.txt Disallow rules and crawls prohibited folders and paths, e.g. /js/ folders and JavaScript files (.js). The numerous instances of msnbot don’t seem to talk to each other: one bot will GET robots.txt, while the rest don’t bother. A good bot, e.g. Googlebot, reads robots.txt regularly and obeys the directives. Good bots also share that information!
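To show what "ignoring robots.txt" looks like in practice, here is a small sketch that checks logged requests against a Disallow rule using Python's standard urllib.robotparser module. The robots.txt content, the sample paths, and the find_violations helper are hypothetical; only the /js/ folder comes from my own setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /js/ rule mirrors the folder mentioned above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
"""

def find_violations(requests, robots_txt=ROBOTS_TXT):
    """Return the (user_agent, path) requests that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # can_fetch() is True when the rules permit this agent to fetch the path.
    return [(ua, path) for ua, path in requests
            if not parser.can_fetch(ua, path)]
```

Feeding it the bot's requests from an access log would list every fetch that a well-behaved crawler should have skipped.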
Is Msnbot Stupid or Evil?
Maria Nikishyna, an accredited Search Engine Marketing specialist, says:
“In my experience with MSNBot I have gone from thinking that it’s stupid to believing that it is intentionally evil. And the lack of desire from Microsoft to fix this is really something they should be ashamed of. These issues are causing good, reputable websites to elect to block MSNBot, block it for good. Really, Microsoft, do something before we all give up on it.” (Source: “MSNbot: stupid or just plain evil”)
In my opinion the bot is simply poorly configured (like Windows 8 for a desktop PC 🙂 ). Nevertheless, it wastes resources. Looking at my server’s CPU usage, I often find it running at 100% for extended periods, often at the times msnbot is crawling.
With all this crawling, one would think the sites are well indexed in Bing; strangely enough, they are not. One site with 500 links indexed by Google has under 50 indexed by MSN / Bing, and traffic from Bing is rare and sporadic. Maybe the MSNbots spend too much time crawling JavaScript, CSS, plugin and module folders, and images, and not enough time indexing content!
Restricted Range
I don’t want to deny Bing access completely, but I have limited the IP addresses Microsoft can use for the crawler. Bots using the 65.* range are blocked by .htaccess rules; those using 157.* may continue to crawl the sites. For now. The 65.* range is not limited to msnbot but is used by various other Microsoft services. Fine, I don’t use any of them…
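The blocking itself lives in .htaccess, but the logic is simple enough to sketch in Python for testing addresses pulled from a log. I am assuming here that 65.* and 157.* mean the whole 65.0.0.0/8 and 157.0.0.0/8 blocks; the crawler_policy name and the sample addresses are illustrative only.

```python
import ipaddress

# Assumption: "65.*" and "157.*" are read as full /8 networks.
BLOCKED = ipaddress.ip_network("65.0.0.0/8")
ALLOWED = ipaddress.ip_network("157.0.0.0/8")

def crawler_policy(ip):
    """Classify an IPv4 address as 'block', 'allow', or 'unmatched'."""
    addr = ipaddress.ip_address(ip)
    if addr in BLOCKED:
        return "block"
    if addr in ALLOWED:
        return "allow"
    # Neither range applies, so the rule says nothing about this address.
    return "unmatched"
```

For example, crawler_policy("65.55.0.1") gives "block" and crawler_policy("157.55.0.1") gives "allow", while an address outside both ranges is left alone.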
Posted on December 31, 2012, in Internet and tagged Search Engines, Technology, Website.