Bing Banned


Bing and MSN Bots Are Banned

I have banned Bing, Yahoo and MSN search engine spiders from my sites! I’m tired of the constant rule breaking and over-crawling by Bing and MSN search bots.

Bing is a Rule Breaker

Microsoft claims Bing honours robots.txt rules. In my experience that is a blatant lie. Bingbot and msnbot simply ignore robots.txt rules and crawl whatever they want. The specific rules broken include:

  • crawling system folders
  • crawling image folders (msn-media bot). Image folders and the extensions jpg, png, gif and bmp are disallowed
  • crawling RSS feeds. All RSS feeds are disallowed: rss.xml, /feed/, etc.
  • crawling comment forms, e.g. DOMAIN/comment/184. The path /comment/ is disallowed in robots.txt

The last straw came today. Two days ago I added the Bing and MSN user agent strings to the disallowed bots in robots.txt across all my sites; this morning I saw that these bots read robots.txt, then ignored it totally and crawled the sites anyway.
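Before blaming a bot, it is worth confirming that the rules say what you think they say. The following is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content is a hypothetical reconstruction of the kind of rules described above, not a copy of my actual file. Note that this parser only does simple path-prefix matching, so extension rules such as jpg or png are left out of the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, modelled on the rules described in this post.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: *
Disallow: /comment/
Disallow: /feed/
Disallow: /rss.xml
"""


def build_parser(text):
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Pass the short product token ("bingbot"), not the full User-Agent header.
print(parser.can_fetch("bingbot", "/comment/184"))       # False
print(parser.can_fetch("SomeOtherBot", "/about"))        # True
print(parser.can_fetch("SomeOtherBot", "/comment/184"))  # False
```

If a check like this says a path is disallowed for a bot and the access log still shows that bot requesting it, the problem is not in the rules.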

Have You Seen Bad Activity by Bing?

Bad GET Requests

Bing / msnbot are often seen making bad GET requests, the type of requests commonly used by hackers. These bad requests generate errors that are logged, e.g.:

  • Trying to get property of non-object in aggregator_page_rss() (line 390 of /home/******i/public_html/modules/aggregator/aggregator.pages.inc)
  • Cannot change zlib.output_compression – headers already sent in drupal_serve_page_from_cache() (line 1353 of /home/*******/public_html/includes/bootstrap.inc).
    Cannot modify header information – headers already sent by (output started at /home/*******/public_html/includes/bootstrap.inc:1364) in drupal_send_headers() (line 1216 of /home/*******/public_html/includes/bootstrap.inc).

Bingbot Tried to Post Comments

Bingbot tried to “reply to post” on every page of the site, using requests such as:
/?replytocom=79/?ctf_form_num=2&ctf_show_captcha=1&ctf_sm_captcha=1

You can imagine that this generated a lot of errors in the log: several thousand in a few minutes. That is more like a DDoS attack than search bot behaviour, yet it came from a valid Bing / MSN IP address.
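One way to check whether an address really is a "valid Bing / MSN IP address" is a reverse DNS lookup followed by a forward lookup. The sketch below assumes that genuine Bing crawlers reverse-resolve to hostnames ending in search.msn.com, which is my understanding of Microsoft's published guidance; treat that suffix as an assumption to verify. The function name, the stub resolvers and the example hostname are hypothetical, and the IP is a documentation placeholder rather than a real Bing address.

```python
import socket

# Assumed hostname suffix for genuine Bing crawlers; verify before relying on it.
BING_SUFFIX = ".search.msn.com"


def is_verified_bingbot(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a Bing hostname that resolves back to ip."""
    try:
        hostname = reverse(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not hostname.lower().rstrip(".").endswith(BING_SUFFIX):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    return ip in addresses


# Demo with stub resolvers so the example runs without network access.
def stub_reverse(ip):
    return ("msnbot-192-0-2-10.search.msn.com", [], [ip])  # hypothetical hostname


def stub_forward(hostname):
    return (hostname, [], ["192.0.2.10"])  # placeholder documentation address


print(is_verified_bingbot("192.0.2.10", stub_reverse, stub_forward))  # True
print(is_verified_bingbot("192.0.2.11", stub_reverse, stub_forward))  # False
```

Called with only an IP address, the function uses the real resolvers from the socket module. The forward lookup matters because anyone who controls reverse DNS for their own address block can make it claim any hostname.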

Minimal Traffic from Bing and MSN

I get few referrals from Bing. If I get 10 referrals out of 1000 from Bing, Yahoo and MSN combined, it’s a lot. Yet these bots are constantly crawling my sites, and not just one bot at a time: as many as 15 have been seen on my sites at once, and other webmasters report even more simultaneous bots.

How to Ban Bingbot

The only way to ban these bad bots is with server rules. Webmasters with sites hosted on Apache servers can block the IP ranges used by Bing and MSN in the .htaccess file.
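As a rough illustration, here is a small Python sketch that turns a list of CIDR ranges into an .htaccess block. It assumes Apache 2.4 syntax (Require not ip inside RequireAll) and that the host allows these directives in .htaccess; on Apache 2.2 the older Order / Deny from directives would be needed instead. The ranges shown are documentation placeholders, not real Bing or MSN ranges, and the function name is my own.

```python
import ipaddress

# Placeholder ranges reserved for documentation (RFC 5737); NOT real Bing ranges.
PLACEHOLDER_RANGES = ["192.0.2.0/24", "198.51.100.0/24"]


def htaccess_block(cidrs):
    """Build an Apache 2.4 .htaccess fragment that denies the given networks."""
    lines = ["<RequireAll>", "    Require all granted"]
    for cidr in cidrs:
        # ip_network() raises ValueError on a malformed range or one with
        # host bits set, so typos are caught before they reach the server.
        network = ipaddress.ip_network(cidr, strict=True)
        lines.append(f"    Require not ip {network}")
    lines.append("</RequireAll>")
    return "\n".join(lines)


print(htaccess_block(PLACEHOLDER_RANGES))
```

Validating each range first is deliberate: a syntax error in .htaccess usually takes the whole site down with a 500 error, which is a worse outcome than an over-eager bot.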

There’s no point in disallowing these spiders in robots.txt. As we’ve seen, they ignore these rules completely, so it’s a waste of time trying.

Microsoft’s Bing China denied responsibility yesterday for a data leak… “Bing search has not violated the robots.txt agreement,” the company said in a statement. Read more here: Bing Denies Wrongdoing in Sogou Privacy Leak Mess (techinasia.com)

About Mike

Web developer and techno-geek, saltwater fishing nut, blogger

Posted on June 14, 2013, in General News, Websites.
