
Cloaking Forum

    
.htaccess doesn't work with googlebot any more
htaccess googlebot rewriterule rewritecond
albertb




msg:675110
 5:55 pm on Jun 13, 2006 (gmt 0)

Hi,

I have some Flash websites and, because they are Flash movies, spiders can't read their content well or follow their links.
I used .htaccess to redirect Googlebot (and also text browsers such as Links
and Lynx) to an alternative home page (index_text.php) that contained
the same text and links as the Flash animation, but was written in plain
HTML and was easily indexed by search engines.

This method worked on all my Flash sites (hosted with different
providers) until February-March 2006, when Googlebot stopped indexing this
text page and started indexing the regular homepage, as a normal user would see it.

Any ideas of what happened and how to solve the problem?

This is my .htaccess

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Links [OR]
RewriteCond %{HTTP_USER_AGENT} ^Lynx
RewriteRule ^$ index_text.php

Thanks in advance.

 

volatilegx




msg:675111
 5:57 pm on Jun 13, 2006 (gmt 0)

Sounds like they are spidering your site with a "stealthed" spider that doesn't identify itself as Googlebot. If you know its IP addresses, you could use .htaccess to perform the redirect; otherwise, you're S.O.L.
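
For illustration, an IP-based version of the original rule could be sketched as below. The address prefixes are placeholders taken from the reserved documentation ranges (192.0.2.0/24 and 198.51.100.0/24); they are not real Googlebot addresses, and a working setup would need a maintained list of the spider's actual IPs, which this thread does not provide.

```apache
RewriteEngine On
RewriteBase /
# PLACEHOLDER prefixes from reserved documentation ranges, NOT real spider IPs.
# Replace them with addresses you have verified yourself.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\. [OR]
RewriteCond %{REMOTE_ADDR} ^198\.51\.100\.
# Serve the plain-HTML page for requests to the site root from those addresses
RewriteRule ^$ index_text.php
```

Matching on REMOTE_ADDR instead of HTTP_USER_AGENT means the rule no longer depends on what the client claims to be, but it is only as good as the IP list behind it.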

the_nerd




msg:675112
 7:55 pm on Jun 29, 2006 (gmt 0)

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [OR]

I wouldn't touch cloaking with a 10-foot pole, but common sense tells me nobody would juggle lists of tens of thousands of spider IPs and keep them up to the second if you could fool 4,000 PhDs simply by using "... HTTP_USER_AGENT} ^Googlebot".

brizad




msg:675113
 10:44 pm on Jun 29, 2006 (gmt 0)

IP-based cloaking is the only way to go, in my opinion. It's too easy for the SEs to NOT label themselves as who they truly are, and it's too easy for real humans to disguise themselves as bots and see your cloaked pages.

I'd say you might need some better cloaking software that keeps up with the SE IPs automatically. PM me if you want my recommendations.

jdMorgan




msg:675114
 1:17 am on Jun 30, 2006 (gmt 0)

The reason that your code 'quit working' is that Googlebot changed its user-agent string some time ago to a "Mozilla compatible" format of "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Therefore, your start-anchored regular-expression pattern "^Googlebot" no longer matches its requests.

You could remove the start-anchor and use:

RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]

-or the more specific-

RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Googlebot/ [OR]

As others have stated, this won't fool a hand-check. But if index_text.php is indeed a plain-text equivalent of your Flash page, no more, no less, then I wouldn't worry about it; Google is against cloaking with intent to deceive the user, not against user-agent-dependent content negotiation per se.

You might also want to make sure you send a 'Vary' header to warn network caches that you are serving user-agent-dependent content:

# Tell caches that page content changes depending on client user-agent
<FilesMatch "\.(html|php)$">
Header set Vary: "User-Agent"
</FilesMatch>

Jim
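
Putting Jim's suggestions together with the original rules, a combined .htaccess might look like the sketch below. It assumes mod_rewrite and mod_headers are enabled and that the host allows both in .htaccess, neither of which is confirmed in this thread. Note that the unanchored Googlebot pattern matches any user-agent string containing that word, including spoofed ones.

```apache
RewriteEngine On
RewriteBase /
# Unanchored, so it also matches "Mozilla/5.0 (compatible; Googlebot/2.1; ...)"
RewriteCond %{HTTP_USER_AGENT} Googlebot [OR]
# The text browsers still put their own name first, so these stay anchored
RewriteCond %{HTTP_USER_AGENT} ^Links [OR]
RewriteCond %{HTTP_USER_AGENT} ^Lynx
# Serve the plain-HTML equivalent for requests to the site root
RewriteRule ^$ index_text.php

# Tell caches that page content changes depending on client user-agent
<FilesMatch "\.(html|php)$">
Header set Vary "User-Agent"
</FilesMatch>
```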

volatilegx




msg:675115
 2:20 pm on Jun 30, 2006 (gmt 0)

Good catch, Jim... I missed the caret :o


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved