
Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Block Search Engines From Indexing Member Profiles

 8:16 pm on Jul 29, 2011 (gmt 0)

I'm trying to block search engine bots from accessing the profiles of members who check a box to exclude their profiles from being indexed in search engines.

It's basically like how you can remove a Facebook profile from search engines (specifically Google) by checking a box in the privacy settings. I want to do exactly that, so when a member searches for their name or profile name, their profile listing from my website doesn't come up.

How is this accomplished? What do I need (software/hardware)? Please help me. If you can point me to instructions or products, that would be great!



 9:00 pm on Jul 29, 2011 (gmt 0)

Just add a robots <meta> tag with the value noindex to the head of the profile page.

See [robotstxt.org...]
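A minimal sketch of that tip applied per member, assuming the profile page is rendered server-side and the opt-out checkbox is stored as a boolean. The `Profile` class and `hide_from_search` field are hypothetical names, and Python is used only for brevity. As noted later in the thread, the tag is a request to well-behaved crawlers, not a block.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Profile:
    # Hypothetical profile record; hide_from_search mirrors the member's checkbox.
    display_name: str
    hide_from_search: bool


def render_head(profile: Profile) -> str:
    """Build the <head> section for a profile page."""
    parts = [f"<title>{escape(profile.display_name)}</title>"]
    if profile.hide_from_search:
        # Ask compliant crawlers not to index this page.
        parts.append('<meta name="robots" content="noindex">')
    return "<head>\n  " + "\n  ".join(parts) + "\n</head>"


print(render_head(Profile("Alice", hide_from_search=True)))
```

Members who leave the box unchecked get a normal head with no robots tag.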


 9:08 pm on Jul 29, 2011 (gmt 0)

Hi bhukkel,

Thanks for the tip. Just to note, I am looking for the most guaranteed method, the way Facebook, LinkedIn and the like do it. I've used meta tags and robots.txt, but neither guarantees that a page is blocked or left unindexed. However, I will definitely use these methods in conjunction with the most solid privacy method. Thanks.


 9:41 pm on Jul 29, 2011 (gmt 0)

You haven't said how these profiles are delivered.

Is it PHP, CGI, SMF, or some other software?
Are the profile URLs provided with a query string?
Are the profiles confined (within the URL) to a specific directory?


 11:22 pm on Jul 29, 2011 (gmt 0)

Hi wilderness,

Thanks for the reply.

- Right now the site is built using ASP/.NET, but that's changing, as we'll be moving to a Java platform.

- Yes, they are provided via a query string, i.e., [mysite.com...]

- No, it's not confined to a specific directory; it's managed and grouped by the db.

I hope this answers your question. Look forward to hearing back. Thanks.


 12:53 am on Jul 30, 2011 (gmt 0)

There are no guarantees, sorry. But in addition to the usual robots.txt file, sitemap(s), meta tags and such, you can restrict access to URLs containing the profile query to hits whose referrer is a page on your own server.

(In mod_rewrite, that involves HTTP_REFERER [sic] code, the nitty-gritty aspects of which are best suited for WW's Apache server forum. Applying mod_rewrite's wizardry to other platforms is discussed there, and possibly elsewhere.)

The downside to limiting access to your-server referrers is that you also lock out people whose user agents don't send referrers, or who choose to hide them.

Personally, when it comes to contact pages, any/all CGIs/Perl scripts, and any/all POST'able forms, I require ALL hits to those pages to have referrers from my own server. The number of not-allowed good bots, bad bots, exploiters and botnets that get tripped up by this simple access control measure vastly exceeds the number of real people inconvenienced by having to e-mail me (using the contact info they see on the 403 page).
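For a site that isn't on Apache, the referrer rule described above might look like this minimal sketch in application code. It assumes the site's host name is `www.example.com` (a placeholder) and that profile URLs carry a `profile` query parameter (also an assumption, since the real URL isn't shown). Exactly as described above, a request with no Referer header at all is refused too.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

SITE_HOST = "www.example.com"  # placeholder: your own host name


def is_profile_request(url: str) -> bool:
    # Assumes profile pages are identified by a "profile" query parameter.
    return "profile" in parse_qs(urlsplit(url).query)


def allow_request(url: str, referer: Optional[str]) -> bool:
    """Return False when a profile URL is requested without an on-site referrer."""
    if not is_profile_request(url):
        return True
    if not referer:
        # No referrer at all: refused, which also affects users who suppress it.
        return False
    return urlsplit(referer).hostname == SITE_HOST
```

When `allow_request` returns False, the server would answer with a 403. The Referer header is supplied by the client, so a determined bot can fake it; this only filters the ones that don't bother.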


 1:12 am on Jul 30, 2011 (gmt 0)

Thanks Pfui, useful information. I'm definitely gonna look more into the HTTP_REFERER option.


 2:03 am on Jul 30, 2011 (gmt 0)

I'm not a QSA kinda guy ;)

Your profile URLs would almost certainly contain "profile" in the query string, but you're the only one who knows; nobody else can see your server.

that involves HTTP_REFERER

I believe that would be unwise, although a referrer-based approach would work in a limited capacity.

I'm trying to block search engine bots from accessing member profiles

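# Forbid requests whose query string contains "profile" when they come
# from a listed crawler IP range. The first condition is ANDed with the
# [OR] group that follows. The address prefixes are placeholders
# (documentation ranges); substitute the bot ranges you want to block.
RewriteEngine On
RewriteCond %{QUERY_STRING} profile [NC]
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\. [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\. [OR]
RewriteCond %{REMOTE_ADDR} ^203\.0\.113\.
RewriteRule .* - [F]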

Use example.com to stop forum auto-linking.
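Since the site runs ASP/.NET (moving to Java) rather than Apache, here is a minimal sketch of the same "profile query plus IP range" check at the application level, written in Python for brevity. The network list uses placeholder documentation ranges, not real crawler ranges, and the plain substring test mirrors the `%{QUERY_STRING} profile` condition rather than parsing parameters.

```python
import ipaddress
from urllib.parse import urlsplit

# Placeholder ranges (RFC 5737 documentation blocks); substitute the
# crawler ranges you actually want to block.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def should_forbid(url: str, remote_addr: str) -> bool:
    """True when a profile URL is requested from a blocked IP range."""
    if "profile" not in urlsplit(url).query:
        return False
    addr = ipaddress.ip_address(remote_addr)
    # An IPv6 address is never "in" an IPv4 network, so it simply doesn't match.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

Using `ipaddress` networks instead of string prefixes avoids the classic mistake where `^192\.0\.2` without the trailing dot also matches 192.0.20.x.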


 3:11 am on Jul 30, 2011 (gmt 0)

nitty-gritty aspects of which are best suited for WW's Apache server forum.

FWIW, the Apache forum has been running in "crippled mode" since late October.
The forum is quite unique in that almost everything revolves around jdmorgan.

There's never been an official announcement, and perhaps Jim will return; however, nobody is capable of filling the void left by his absence.

g1smd has been quite dedicated; however, his and others' efforts are merely meant to honor Jim's wish for the Apache forum to remain a learning center, rather than a free-support service offering copy-and-paste solutions.



 7:48 pm on Jul 30, 2011 (gmt 0)

anthonyon - the first thing is to arrange for ALL of your site to reject ALL robots and non-browsers, except for the robots you want to allow (e.g. Bing, maybe facebookexternalhit, probably Google, etc.).

Having done that, add all no-index profiles to a file/database. Check that list when an allowed bot comes calling, and either serve the profile or return a 403 or 405 (work out which is most appropriate for your site).

I would not rely on noindex,noarchive meta tags (although by all means include them), and robots.txt is too clumsy for what could be a very long list of profiles. Also, a few bots do not always handle robots.txt correctly.
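The two steps described above could be sketched roughly like this in Python. Everything named here is an assumption: `ALLOWED_BOTS` is a hypothetical allow-list of User-Agent substrings, `NOINDEX_PROFILES` stands in for the file/database of opted-out profile IDs, and treating any `Mozilla/` agent as a browser is a deliberately crude placeholder heuristic. The sketch returns 403 for hidden profiles; the post leaves 403 vs 405 up to you.

```python
# Hypothetical allow-list of crawler User-Agent substrings (lowercase).
ALLOWED_BOTS = ("bingbot", "googlebot", "facebookexternalhit")

# Hypothetical stand-in for the file/database of opted-out profile IDs.
NOINDEX_PROFILES = {"1001", "1042"}

HIDDEN_STATUS = 403  # the post suggests choosing 403 or 405


def classify_agent(user_agent: str) -> str:
    ua = user_agent.lower()
    # Check bots first: Googlebot's UA also starts with "Mozilla/".
    if any(bot in ua for bot in ALLOWED_BOTS):
        return "allowed_bot"
    if ua.startswith("mozilla/"):
        return "browser"  # crude placeholder heuristic
    return "rejected"


def profile_status(user_agent: str, profile_id: str) -> int:
    """HTTP status to send for a profile request."""
    kind = classify_agent(user_agent)
    if kind == "rejected":
        return 403  # step 1: unknown robots and non-browsers
    if kind == "allowed_bot" and profile_id in NOINDEX_PROFILES:
        return HIDDEN_STATUS  # step 2: allowed bot, opted-out profile
    return 200
```

User-Agent strings can be faked, so a real deployment would also confirm that a claimed crawler is coming from that search engine's own addresses before trusting it.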


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved