
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


Block Search Engines From Indexing Member Profiles

     
8:16 pm on Jul 29, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 10, 2004
posts: 48
votes: 0


I'm trying to block search engine bots from accessing the profiles of members who check a box to exclude their profiles from being indexed in search engines.

It's basically how you can remove a Facebook profile from search engines (specifically Google) by checking the box in the privacy settings. I want to do exactly that, so when a member searches for their name or profile name, their profile listing on my website doesn't come up.

How is this accomplished? What do I need (software/hardware)? Please help me. If you can point me to instructions or products then that's great!
9:00 pm on July 29, 2011 (gmt 0)

Full Member

5+ Year Member

joined:Aug 16, 2010
posts:225
votes: 11


Just add a robots <meta> tag with a noindex value in the head of the profile page.

See [robotstxt.org...]
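
As a rough illustration of that suggestion, the sketch below emits the robots meta tag only for members who ticked the opt-out box. The names `robots_meta_tag`, `render_profile_head`, and the `exclude_from_search` flag are hypothetical, made up for this example, and Python is used only for brevity; the same idea carries over to ASP/.NET or Java.

```python
from html import escape


def robots_meta_tag(exclude_from_search: bool) -> str:
    """Return a noindex robots meta tag, or "" if the profile may be indexed."""
    if exclude_from_search:
        return '<meta name="robots" content="noindex">'
    return ""


def render_profile_head(title: str, exclude_from_search: bool) -> str:
    """Build a minimal <head> for a profile page (hypothetical helper)."""
    parts = [f"<title>{escape(title)}</title>"]
    tag = robots_meta_tag(exclude_from_search)
    if tag:
        # Only opted-out members get the tag; everyone else stays indexable.
        parts.append(tag)
    return "<head>" + "".join(parts) + "</head>"
```

A crawler has to be able to fetch the page in order to see the tag, so a profile that relies on noindex should not also be disallowed in robots.txt.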
9:08 pm on July 29, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 10, 2004
posts: 48
votes: 0


Hi bhukkel,

Thanks for the tip. Just to note, I am looking for the most reliable method, the way Facebook, LinkedIn, and the like do it. I've used meta tags and robots.txt, but neither guarantees that a page is blocked or kept out of the index. However, I will definitely use these methods in conjunction with the most solid method of privacy. Thanks.
9:41 pm on July 29, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


You haven't said how these profiles are delivered.

Is it PHP, CGI, SMF, or some other software?
Are the profile URLs provided with a query string?
Are the profiles confined (within the URL) to a specific directory?
11:22 pm on July 29, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 10, 2004
posts: 48
votes: 0


Hi wilderness,

Thanks for the reply.

- Right now the site is built on ASP/.NET, but that's changing as we'll be moving to a Java platform.

- Yes, they are provided via a query string, i.e., [mysite.com...]

- No, they're not confined to a specific directory; they're managed and grouped by the db.

I hope this answers your question. Look forward to hearing back. Thanks.
12:53 am on July 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


There are no guarantees, sorry. But in addition to the usual robots.txt file, sitemap(s), meta tags and such, you can restrict access to URLs containing the profile query so that only hits referred from pages on your own server get through.

(In mod_rewrite, that involves HTTP_REFERER [sic] code, the nitty-gritty aspects of which are best suited for WW's Apache server forum. Applying mod_rewrite's wizardry to other platforms is discussed there, and possibly elsewhere.)

The downside to limiting access to your-server referrers is that you prevent some people whose UAs don't show referrers, or who choose to cloak them, from seeing the profiles.

Personally, when it comes to contact pages, any/all CGIs/Perl scripts, and any/all POST'able forms, I require ALL hits to those pages to have refs on my server. The number of not-allowed good bots, bad bots, exploiters and botnets that get tripped up by this simple access control measure vastly exceeds the number of real people inconvenienced by having to e-mail me (via info they see in a 403).
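
A minimal sketch of that referrer check, written as application code rather than mod_rewrite. The host names in `SITE_HOSTS`, the function names, and the assumption that profile URLs carry "profile" somewhere in the query string are all placeholders for illustration, not details from this thread.

```python
from urllib.parse import urlsplit

# Assumption: replace with the host names your own site answers to.
SITE_HOSTS = {"example.com", "www.example.com"}


def referer_is_local(referer_header, site_hosts=SITE_HOSTS):
    """True only if the Referer header names a page on one of our own hosts."""
    if not referer_header:
        return False  # missing or cloaked referrer
    try:
        host = urlsplit(referer_header).hostname
    except ValueError:
        return False  # malformed header
    return host is not None and host.lower() in site_hosts


def status_for_profile_request(query_string, referer_header):
    """Return 403 for profile requests without an on-site referrer, else 200."""
    is_profile = "profile" in query_string.lower()
    if is_profile and not referer_is_local(referer_header):
        return 403
    return 200
```

The header is supplied by the client and is trivial to forge, so this only filters out visitors and bots that send no referrer or an off-site one; it is not access control in any strict sense.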
1:12 am on July 30, 2011 (gmt 0)

Junior Member

10+ Year Member

joined:Nov 10, 2004
posts: 48
votes: 0


Thanks Pfui, useful information. I'm definitely gonna look more into the HTTP_REFERER option.
2:03 am on July 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


I'm not a QSA kinda guy ;)

Your profile URLs would almost certainly contain "profile" in the query string. You're the only one who knows; nobody else is able to view your server.

that involves HTTP_REFERER


I believe that would be unwise, although a referrer-based check would work in a limited capacity.

I'm trying to block search engine bots from accessing member profiles


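# Query string contains "profile" AND the request comes from a listed bot IP range.
# bot1 ... bot_more are placeholders; substitute real IP-prefix patterns for each bot.
RewriteEngine On
RewriteCond %{QUERY_STRING} profile [NC]
RewriteCond %{REMOTE_ADDR} ^bot1 [OR]
RewriteCond %{REMOTE_ADDR} ^bot2 [OR]
RewriteCond %{REMOTE_ADDR} ^bot3 [OR]
RewriteCond %{REMOTE_ADDR} ^bot4 [OR]
# The last condition in the OR chain carries no [OR] flag
RewriteCond %{REMOTE_ADDR} ^bot_more
# Answer matching requests with 403 Forbidden
RewriteRule .* - [F]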
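
Since the site is moving off Apache-style tooling, the same test (profile query plus a listed source range) can be sketched in application code. The networks below are documentation-only ranges from RFC 5737 standing in for real crawler ranges, and `should_forbid` is a hypothetical name; treat this as an outline, not a drop-in rule set.

```python
import ipaddress

# Assumption: placeholder ranges (RFC 5737), not real search engine ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def should_forbid(query_string, remote_addr, networks=BLOCKED_NETWORKS):
    """True when a profile URL is requested from one of the listed ranges."""
    if "profile" not in query_string.lower():
        return False  # not a profile request, nothing to block
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # unparsable address: leave the decision to other checks
    return any(addr in net for net in networks)
```

A caller would answer with a 403 whenever `should_forbid(...)` returns True, mirroring the `[F]` flag in the rewrite rule.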


Use example.com to stop forum auto-linking.
3:11 am on July 30, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


nitty-gritty aspects of which are best suited for WW's Apache server forum.


Pfui,
FWIW the Apache forum has been running in "crippled mode" since late October.
The forum is quite unique in that most everything revolves around jdmorgan.

There's never been an official announcement, and perhaps Jim'll return, however nobody is capable of filling the void left by Jim's absence.

g1smd has been quite dedicated; however, his and others' efforts are merely to honor Jim's wishes for the Apache Forum to remain a learning center, rather than a free-support desk offering copy-and-paste solutions.

Don
7:48 pm on July 30, 2011 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3125
votes: 4


anthonyon - the first thing is to arrange that ALL robots and non-browsers are rejected by ALL of your site except for those robots you want to allow (eg bing, maybe facebookhit, probably google, etc).

Having done that, add all no-index profiles to a file/database. Check that list when an allowed bot comes calling and either serve the profile or a 403 or 405 (work out which is most appropriate for your site).

I would not rely on noindex,noarchive meta tags (although by all means include them), and robots.txt is too clumsy for what could be a very long list of profiles. Also, a few bots do not always handle robots.txt correctly.
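
The two steps above might look roughly like this. The bot tokens, the in-memory set standing in for the file/database of opted-out profiles, and the `is_browser` flag (how you tell browsers from non-browsers is a separate problem) are all assumptions made for this sketch.

```python
# Assumption: substrings of the user-agent strings you choose to allow.
ALLOWED_BOT_TOKENS = ("googlebot", "bingbot")

# Assumption: stand-in for the file/database of profiles marked no-index.
NOINDEX_PROFILE_IDS = {"1001", "1002"}


def is_allowed_bot(user_agent):
    """True if the user-agent contains one of the allowed bot tokens."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in ALLOWED_BOT_TOKENS)


def status_for_request(user_agent, profile_id, is_browser):
    """Decide the HTTP status for a profile request."""
    if is_browser:
        return 200  # ordinary visitors always see the profile
    if not is_allowed_bot(user_agent):
        return 403  # step 1: reject every robot that is not on the allow list
    if profile_id in NOINDEX_PROFILE_IDS:
        return 403  # step 2: allowed bot, but the member opted out
    return 200
```

User-agent strings are easy to spoof, so a token match alone does not prove a visitor is the real crawler; the major search engines document a reverse DNS check for verifying their bots, which could be layered on top.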
 
