Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Get my site off bing?
bing msnbot

 5:53 am on Apr 4, 2010 (gmt 0)

Just wondering: what are the ways to prevent my website from appearing on Bing?

I have disallowed SandCrawler, msnbot, and MSRBot in robots.txt, and I am also blocking the SandCrawler, msnbot, and MSRBot user agents in my web server configuration.

Are there other methods I should use as well? Maybe blocking Bing bot IP addresses at the firewall?
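As a rough way to check the robots.txt part of this setup, here is a minimal Python sketch using the standard library's urllib.robotparser. The ROBOTS_TXT contents are an assumption about what such a file might look like (the original post does not show it), and is_blocked is a hypothetical helper name. It only tells you what a compliant robot should conclude, not what any crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents: one "Disallow: /" group per user agent
# named in the post. The real file was not shown in the thread.
ROBOTS_TXT = """\
User-agent: SandCrawler
Disallow: /

User-agent: msnbot
Disallow: /

User-agent: MSRBot
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """Return True if a compliant robot with this user agent may not fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("msnbot", "SandCrawler", "MSRBot", "Googlebot"):
        print(agent, "blocked" if is_blocked(ROBOTS_TXT, agent) else "allowed")
```

Agents with no matching group (Googlebot in this sketch) are treated as allowed, because the assumed file has no catch-all "User-agent: *" group.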




 11:31 am on Apr 6, 2010 (gmt 0)

That should do it.

For more information you may want to visit this old post [webmasterworld.com...]

Also remember that robots.txt is a voluntary protocol, and there are occasional glitches. If you truly want to block someone from accessing your website, you should use an .htaccess file.


 11:52 am on Apr 6, 2010 (gmt 0)

Not seeking to change your mind, just curious: why no Bing? Even if one sees only 10% from there, that is quite a bit of traffic. I'm interested in the reasoning for denying Bing.


 12:15 pm on Apr 6, 2010 (gmt 0)

Hey goodroi thanks for the link.

Regarding .htaccess, what do you really mean by that? I know you can deny access, but based on what sort of rules? IP address? User agent string?

I am not using Apache, but I have a similar setup in my web server that closes the connection for any client with the user agent strings above... so I am just wondering whether I am doing the same thing?

This is a personal web site, so I am not concerned about losing visitors from Bing; most visitors will probably arrive purely by word of mouth. I am just interested in learning how these things work, and Bing is a nice one to try this out on: it is a big enough search engine, yet I lose nothing by blocking it on a personal web site.


 12:30 pm on Apr 6, 2010 (gmt 0)

If it is an experiment, then choose Teoma (ask) or Yandex, Baidu, Twiceler, etc... these are significant, yet not nearly as large as Bing for Western (EU, USA) websites. Bing is driving traffic these days, and it will continue to grow.

Bans are done by IP and UA. Your .htaccess might grow for Bing, which seems to bring on new IPs and UAs every day. See: [webmasterworld.com...] for starters.

For understanding regarding .htaccess see: [webmasterworld.com...]


 7:40 pm on Apr 6, 2010 (gmt 0)

Yes, I was reading that thread after this. So there is no concrete way to ensure a web site doesn't end up on Bing?

Please note that I am not asking how .htaccess works, but what specific rules can be used to get Bing off a site in addition to what I described in the initial post... and I was not asking which search engine to pick for trying things out. Remember that driving traffic isn't always the goal, especially for non-commercial sites where bandwidth may be limited and one does not necessarily want as much traffic as possible.



 10:41 pm on Apr 6, 2010 (gmt 0)

I doubt you'll find Bing-specific bans here at WW. If bandwidth is a problem I'd ban Google. :)


 8:15 am on Apr 7, 2010 (gmt 0)

I am not trying to save as much bandwidth as I can either... neither visitors nor bandwidth is the focus...


 10:23 am on Apr 7, 2010 (gmt 0)

Ban IP ranges first, then UAs. The IP range will be more accurate, the UA will catch new IP addresses as they are introduced.
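The "IP ranges first, then UAs" order can be sketched in Python as below. This is only an illustration of the decision order, not a server configuration: the networks are placeholder documentation ranges (RFC 5737), not real Bing addresses, and should_block, BLOCKED_NETWORKS, and BLOCKED_UA_TOKENS are hypothetical names. The UA tokens are the ones named in the first post.

```python
import ipaddress

# Placeholder ranges reserved for documentation (RFC 5737).
# These are NOT Bing's address ranges; substitute ranges you have verified.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# User-agent substrings from the first post, compared case-insensitively.
BLOCKED_UA_TOKENS = ("sandcrawler", "msnbot", "msrbot")


def should_block(remote_ip: str, user_agent: str) -> bool:
    """Check the IP ranges first, then fall back to the user-agent string."""
    addr = ipaddress.ip_address(remote_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        return True
    # The UA check catches requests from addresses not yet in the list.
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_UA_TOKENS)


if __name__ == "__main__":
    print(should_block("192.0.2.10", "Mozilla/5.0"))   # listed IP
    print(should_block("203.0.113.5", "msnbot/2.0b"))  # unlisted IP, listed UA
    print(should_block("203.0.113.5", "Mozilla/5.0"))  # neither
```

The user-agent string is supplied by the client and can be spoofed, which is why the IP check is the more accurate of the two and the UA check is only a catch-all for newly introduced addresses.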


 12:45 pm on Apr 7, 2010 (gmt 0)

I'd like to point out that disallowing a robot in robots.txt, or blocking it using .htaccess or any other server-side code, will not prevent a site from appearing in most modern search engines. If they find links to that site anywhere on the Web, they may list the site by URL and link text, even if they are unable to fetch pages from that site. This is often called a "URL-only" listing, but major search engines now use the link text they find along with the URLs they discover to "build" a listing with more than just a URL in it.

The easiest way to prevent a site's URLs from appearing in search results is to NOT Disallow the robots in robots.txt, and to NOT block IP addresses or user-agents, but instead to allow full access and then use the on-page HTML <meta name="robots" content="noindex"> tag.

Note that robots.txt prevents fetches by compliant robots, while the on-page meta-tag prevents indexing... not at all the same thing. And note that if a robot cannot fetch the page due to robots.txt or server access restrictions, then it cannot see the on-page noindex tag.
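The interaction between fetching and indexing can be illustrated with a small Python sketch. It is a simplified model of the reasoning above, not a description of how any search engine is implemented: has_noindex and may_be_listed are hypothetical helper names, and the model assumes links to the URL exist somewhere on the Web.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a <meta name="robots"> tag carries a noindex directive."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        directives = [d.strip() for d in content.split(",")]
        if name == "robots" and "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


def may_be_listed(fetch_allowed: bool, html: str) -> bool:
    """Simplified model: could a compliant engine still list this URL?"""
    if not fetch_allowed:
        # The robot never sees the page, so it never sees the noindex tag;
        # a listing built from URLs and link text remains possible.
        return True
    return not has_noindex(html)


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex"></head></html>'
    print(may_be_listed(fetch_allowed=False, html=page))
    print(may_be_listed(fetch_allowed=True, html=page))
```

In this model the only combination that keeps the URL out of results is "fetch allowed and noindex present", which matches the advice to leave the robots unblocked and rely on the meta tag.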



 9:04 am on Apr 9, 2010 (gmt 0)

Hey jdMorgan, thanks for the informative post. That's very interesting.

Not all bots honor things like robots.txt and noindex, though... so I am thinking of making an empty web site with a robots.txt that allows robots to visit index.html, putting the noindex meta tag in there, and using the server configuration to detect bots and redirect them there.

What do you think of that approach?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved