Here's a really tricky one for you. ;) .htaccess and variables

Googlebot can see it but not you.

         

Mikroz

9:00 pm on May 31, 2010 (gmt 0)

10+ Year Member



With many thanks to JDMorgan I am now sitting with this wicked piece of redirect code...

# Root redirect exclusions
# Exclude hostnames "example.co.za" and "www.example.co.za" and blank
RewriteCond %{HTTP_HOST} !^((www\.)?example\.co\.za\.?(:[0-9]+)?)?$
# Exclude phpinfo.php, sitemap.xml and robots.txt (applicable to root website only)
RewriteCond $1 !^(phpinfo\.php|sitemap\.xml|robots\.txt)$
# Externally redirect all other requests to "sub.example.com"
RewriteRule ^(.*)$ http://sub.example.com/$1 [R=301,L]


With the exception of traffic heading for the addon domain and the three excluded files, everything else goes to a subdomain.

Now it gets interesting...

I want ONLY Googlebot to be able to see example.com/index.php (primary domain) - everything else should follow the other redirects.

I've put this into my .htaccess:

SetEnvIf User-Agent ^googlebot googlebot
Order Deny,Allow
Allow from env=googlebot


Now the question is: how do I modify my redirect code to get this right? :(

Is this along the right path? At all?

RewriteCond %{env:googlebot} ^$


Thanks, Nic

[edited by: jdMorgan at 10:01 pm (utc) on May 31, 2010]
[edit reason] Please use example.com only. [/edit]

jdMorgan

10:00 pm on May 31, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I suspect that the constructs

RewriteCond %{HTTP_USER_AGENT} googlebot [NC]

and

RewriteCond %{HTTP_USER_AGENT} !googlebot [NC]

would be more useful to you...
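As a rough sketch of how the first construct could slot into the existing ruleset (an assumption about the intended setup, not something tested on the poster's server): a pass-through rule placed ahead of the root redirect stops rewriting when a client identifying itself as Googlebot requests the root URL or index.php, so the 301 below it never fires for that request. The pattern is deliberately unanchored, because Googlebot's User-Agent string usually begins with "Mozilla/5.0 (compatible; Googlebot/2.1; ..." and an anchored ^googlebot would not match it.

```apache
# Sketch only: assumes mod_rewrite is enabled and that this .htaccess
# sits in the document root of the primary domain.
RewriteEngine On

# Let requests for "/" or "/index.php" through untouched when the
# User-Agent contains "googlebot" (case-insensitive, unanchored).
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
RewriteRule ^(index\.php)?$ - [L]

# Root redirect exclusions (unchanged from the original ruleset)
RewriteCond %{HTTP_HOST} !^((www\.)?example\.co\.za\.?(:[0-9]+)?)?$
RewriteCond $1 !^(phpinfo\.php|sitemap\.xml|robots\.txt)$
RewriteRule ^(.*)$ http://sub.example.com/$1 [R=301,L]
```

Because the pass-through rule carries no hostname condition, it also ends rewriting for the same two URLs on the addon domain. That is harmless as long as no later rules need to act on them.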

But take care:
1) This is cloaking.
2) Not all Googlebots will identify themselves as such (so see item #1 again).

If you wish to cloak in an almost-foolproof manner, then you will have to track and recognize all IP addresses used by Google for spidering and cloaking detection.

Be very sure of your need and reason to do this before heading down this path... Some domain/registrant bans are quite permanent.

I suggest that you do not serve anything to Googlebot that is significantly different to what you serve to a user, unless you have sufficient experienced staff to keep up with Google's IP addresses 24/7 or your domains are all "throw-away" domains for which loss of branding is of no consequence.

Jim

Mikroz

5:44 am on Jun 1, 2010 (gmt 0)

10+ Year Member



Hi Jim,

Thanks for the input.

I signed up with Google's Webmaster Tools and they request that webmasters put a specific meta tag into their index page in order to prove ownership.

That's all fine and dandy except that (as you know) I'm redirecting all traffic from my root.

I have added my subdomains and addon domain as individual sites but figured it might be a good idea to have the root tracked as well. It's not crucial though and is perhaps something I should look into when/if I stop redirecting.

Thanks, Nic

PS: I'm curious - what is the difference between your two lines of code? What does the ! do? Does it match patterns with 'googlebot' in it?

g1smd

7:32 am on Jun 1, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



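The ! is "NOT": it negates the pattern, so the second condition is true only when the User-Agent does not contain "googlebot".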
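To make the contrast concrete, here is a small sketch using a hypothetical file name, googlebot-only.html, which does not appear anywhere in this thread. The two conditions are identical apart from the leading "!", and each one guards its own rule. Jim's cloaking warning above applies to anything built this way.

```apache
# Hypothetical example; "googlebot-only.html" is an invented file name.
RewriteEngine On

# No "!": the condition is true when the User-Agent contains "googlebot"
# ([NC] makes the match case-insensitive). The "-" leaves the URL unchanged.
RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
RewriteRule ^googlebot-only\.html$ - [L]

# With "!": the condition is true when the User-Agent does NOT contain
# "googlebot", so every other client gets 403 Forbidden ([F]).
RewriteCond %{HTTP_USER_AGENT} !googlebot [NC]
RewriteRule ^googlebot-only\.html$ - [F]
```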

Mikroz

7:56 am on Jun 1, 2010 (gmt 0)

10+ Year Member



Aha. Thanks g1smd. :)

Much appreciated.