Msnbot – Ignorant spider or Deliberate Rule Breaker

Msnbot (user agent: Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)), crawling from search.msn.com, has become a pest, not only for my websites, but for many others as well.

First of all, Microsoft sees fit to send multiple bots at the same time; as many as 17 have been reported crawling at once. I regularly have up to 12 on-site simultaneously. That many bots crawling at once is effectively a DDoS (Distributed Denial of Service) attack: it can lock the server up and shut out normal traffic.
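One rough way to check how many bots are on-site at once is to count the distinct bot IP addresses per minute in the access log. The sketch below is only an illustration: it assumes Apache's Combined Log Format, treats "distinct IPs in the same minute" as a stand-in for "simultaneous", and the function names (`bots_per_minute`, `peak_concurrency`) are made up for this example.

```python
import re
from collections import defaultdict

# Assumed format (Combined Log Format):
# ip - - [time] "request" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings that identify the Microsoft crawler in the user-agent string.
BOT_MARKERS = ("msnbot", "bingbot")


def bots_per_minute(lines):
    """Map each minute to the set of distinct bot IPs seen in that minute."""
    seen = defaultdict(set)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        agent = match.group("agent").lower()
        if any(marker in agent for marker in BOT_MARKERS):
            # "31/Dec/2012:10:15:42 +0000" -> "31/Dec/2012:10:15"
            minute = match.group("time")[:17]
            seen[minute].add(match.group("ip"))
    return seen


def peak_concurrency(lines):
    """Largest number of distinct bot IPs seen within any single minute."""
    per_minute = bots_per_minute(lines)
    return max((len(ips) for ips in per_minute.values()), default=0)
```

Feeding it the lines of an access log (for example `peak_concurrency(open("access.log"))`, where the file name is a placeholder) gives a single number to compare against the 12 to 17 bots mentioned above.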

Secondly, Msnbot often ignores robots.txt disallow rules and crawls prohibited folders and paths, e.g. /js/ folders and JavaScript files (.js). The numerous instances of msnbot don’t seem to talk to each other: one bot will GET robots.txt, while the rest don’t bother. A good bot, e.g. Googlebot, reads robots.txt regularly and obeys the directives. Good bots also share that information!
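To make "obeying the directives" concrete, here is a minimal sketch of the check a well-behaved crawler performs before each request, using Python's standard `urllib.robotparser`. The robots.txt content is hypothetical (the real rules for these sites are not listed in this post), and `is_allowed` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; they block the /js/ folder
# for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /js/
"""


def is_allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

A compliant bot would skip any path for which `is_allowed("msnbot", path)` is false, so a hit on `/js/site.js` in the access log means the rule was not applied.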

Is Msnbot Stupid or Evil?

Maria Nikishyna, an accredited Search Engine Marketing specialist, says:

“In my experience with MSNBot I have gone from thinking that it’s stupid to believing that it is intentionally evil. And the lack of desire from Microsoft to fix this is really something they should be ashamed of. These issues are causing good, reputable websites to elect to block MSNBot, block it for good. Really, Microsoft, do something before we all give up on it.” (Source: “MSNbot: stupid or just plain evil”)

In my opinion the bot is simply poorly configured (like Windows 8 for a desktop PC 🙂). Nevertheless, it wastes resources. Looking at my server CPU usage, I often find it running at 100% for extended periods, frequently at the very times msnbot is crawling.

With all this crawling, one would think the sites are well indexed in Bing. Strangely enough, they are not. One site with 500 links indexed by Google has under 50 indexed by MSN / Bing. And traffic from Bing is rare and sporadic. Maybe the MSNbots spend too much time crawling JavaScript, CSS, plugin and module folders, and images, and not enough time indexing content!

Restricted Range

I don’t want to deny access to Bing completely, but I have limited the IP ranges Microsoft can use for the crawler. Bots using the range 65.* are blocked by .htaccess rules; those using 157.* may continue to crawl the sites. For now. The 65.* range is not limited to msnbot, but is used by various other Microsoft services. Fine, I don’t use any of them…
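As a rough illustration of that policy (not the actual .htaccess rules, which are not shown in this post), the same decision can be written with Python's standard `ipaddress` module. It assumes that "65.*" means the whole 65.0.0.0/8 block; `BLOCKED_NETWORKS` and `is_blocked` are hypothetical names.

```python
import ipaddress

# Assumption: "65.*" is read as the entire 65.0.0.0/8 network.
BLOCKED_NETWORKS = [ipaddress.ip_network("65.0.0.0/8")]


def is_blocked(ip):
    """Return True if ip falls inside one of the blocked networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

Anything outside the blocked block, including the 157.* crawlers, passes through, which matches the "may continue to crawl" half of the rule.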

Mike Otgaar

About Mike

Web developer and techno-geek, saltwater fishing nut, blogger.

Posted on December 31, 2012, in Internet.
