redirect based on user-agent (all bots)

Regex Noob!

         

balusyam

1:31 pm on Mar 20, 2012 (gmt 0)

10+ Year Member



Current scenario: I have a website xyz.com. The default home page is currently configured as index.php. That is, when Apache gets an HTTP GET "/" request, it serves index.php.

Requirement:
Now, I want to serve index1.php whenever the user-agent string contains "google" as a substring. So I wrote the following code in .htaccess:

=======================
RewriteCond %{HTTP_USER_AGENT} Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot/2.1 (+http://www.google.com/bot.html) [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot/2.1 (+http://www.google.com/bot.html) [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot-Image/1.0 [OR]
RewriteCond %{HTTP_USER_AGENT} Googlebot-Video/1.0 [OR]
RewriteCond %{HTTP_USER_AGENT} Mediapartners-Google [OR]
RewriteCond %{HTTP_USER_AGENT} AdsBot-Google (+http://www.google.com/adsbot.html)

RewriteRule ^/$ /index1.php [R=301,L]
===============================

After apache restart, we could see 500 server error.

Then I used Regex to match "google" like this:

=============================

RewriteCond %{HTTP_USER_AGENT} (.*)(Google)(.*) [NC]

RewriteRule ^/$ /index1.php [R=301,L]
==============================

This time, the website came up fine. However, when I spoof the Googlebot user agent, I still see index.php and not index1.php.

I strongly suspect that there is something wrong with this part :

"(.*)(Google)(.*)"

Can someone help me with the correct regex for matching the substring "google"?

Thanks in advance.

phranque

1:45 pm on Mar 20, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



There are special/reserved characters in your patterns that must be escaped with a backslash: the spaces, the parentheses, the dots and the plus sign.

After apache restart, we could see 500 server error

Did you check your server error log for clues?

(.*)(Google)(.*)

Just "google" (without the quotes) is good enough for that pattern to match, provided you keep the [NC] flag.
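
To illustrate the escaping point, here is a rough sketch of one of the original conditions rewritten so that Apache can parse it, next to the simpler substring form. The unescaped spaces are a likely cause of the 500 error (a space ends the pattern argument, so the rest of the line is read as flags), but only the error log can confirm that. The choice of which UA string to show is just for illustration.

```apache
# Option A: literal match of one full Googlebot UA string.
# Spaces, dots, parentheses and the plus sign are escaped with a backslash.
RewriteCond %{HTTP_USER_AGENT} Mozilla/5\.0\ \(compatible;\ Googlebot/2\.1;\ \+http://www\.google\.com/bot\.html\)

# Option B (use instead of A, not together with it):
# unanchored, case-insensitive substring match on "google".
RewriteCond %{HTTP_USER_AGENT} google [NC]
```

Option B matches every user agent in the original list, since each of them contains "google" in some letter case.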

balusyam

1:51 pm on Mar 20, 2012 (gmt 0)

10+ Year Member



Thanks Phranque!

I was suspecting this special characters thing!

However, I will try just "google" (without quotes) and see if it works.

I just asked the sysadmin for the error_log. Will update here if I see any errors.

g1smd

9:09 pm on Mar 20, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Why are you redirecting to a different URL? You need an internal rewrite so that content is returned from a different file when "/" is requested.

Once you fix your code your browser will cache responses by URL, so you will see the "wrong" page when you change UA - your browser will just pull it from its cache. Clear the cache before each test.

balusyam

3:38 pm on Mar 24, 2012 (gmt 0)

10+ Year Member



@g1smd

Why are you redirecting to a different URL?


I have optimized the other URL for Google. I want to keep one version for users accessing from browsers and the other one for search engine bots (no black hat here; the optimized page just does not look good for normal users).

You need an internal rewrite so that content is returned from a different file


Sorry for being dumb here! Can you please explain a bit?

Thanks all for looking in.

g1smd

4:00 pm on Mar 24, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This code:

DirectoryIndex index.php


will allow content to be served from index.php when a URL ending with a slash is requested and the named folder exists and has an index.php file within it. This happens for all regular users.

However, adding a rewrite like this:

RewriteCond %{HTTP_USER_AGENT} user-agent-string
RewriteRule ^$ /index1.php [L]


will serve content from a different file when the same example.com/ URL is requested and the named user agent string is included in that request.

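For the ^$ part use ^/$ when the code is in httpd.conf. In .htaccess the leading slash is stripped before the pattern is tested, so ^/$ never matches there.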
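
As a sketch of the two contexts mentioned above: in .htaccess the rule sees the path without its leading slash, while in httpd.conf (server or virtual-host level) the slash is still there. The RewriteEngine On line and the "Google" pattern are assumptions added for completeness; substitute whichever UA pattern you settle on.

```apache
# --- In the document root's .htaccess (per-directory context) ---
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google [NC]
# A request for "/" is seen as an empty path here, so the pattern is ^$
RewriteRule ^$ /index1.php [L]

# --- In httpd.conf, at server or <VirtualHost> level ---
# (use one location or the other, not both)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Google [NC]
# Here the path keeps its leading slash, so the pattern is ^/$
RewriteRule ^/$ /index1.php [L]
```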

[edited by: g1smd at 4:25 pm (utc) on Mar 24, 2012]

balusyam

4:20 pm on Mar 24, 2012 (gmt 0)

10+ Year Member



@g1smd

Thanks for your quick reply!

I understand that I will need to put the code below in the httpd.conf file instead of the .htaccess in the root directory:


RewriteCond %{HTTP_USER_AGENT} (.*)(Google)(.*) [NC]

RewriteRule ^/$ /index1.php [R=301,L]


Please correct me if I am wrong.

Is there any specific reason why I should not be using .htaccess ?

g1smd

4:26 pm on Mar 24, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use httpd.conf if you have access to it. The code will run a whole lot faster.

There are several errors in your code. Use this:

RewriteCond %{HTTP_USER_AGENT} Google [NC]
RewriteRule ^/$ /index1.php [L]


(.*) is always redundant if you are not using the captured information. It's especially redundant on unanchored patterns.

The R=301 forces a redirect. As I explained above you need a rewrite, not a redirect. Use only the [L] flag here.
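
To make the difference concrete, here is a rough side-by-side of the two flag choices, written for httpd.conf like the code above. The comments describe what mod_rewrite normally does with each form; they are not observations from the poster's server. The two variants are alternatives, so only one of them would go in the config.

```apache
# Variant 1, external redirect: the server answers with a 301 and a
# Location header, the client then requests /index1.php itself, and
# that URL shows in the address bar.
RewriteCond %{HTTP_USER_AGENT} Google [NC]
RewriteRule ^/$ /index1.php [R=301,L]

# Variant 2, internal rewrite (what is wanted here): the client still
# sees "/", but the response body comes from index1.php.
RewriteCond %{HTTP_USER_AGENT} Google [NC]
RewriteRule ^/$ /index1.php [L]
```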

balusyam

1:31 pm on Mar 25, 2012 (gmt 0)

10+ Year Member



Thanks a ton! Will update you guys about the results.

balusyam

10:43 am on Mar 30, 2012 (gmt 0)

10+ Year Member



Hi,

I tried the code (without the 301 redirect). I still see index.php and not index1.php when I spoof Googlebot.

Any pointers about the glitch?

lucy24

8:33 pm on Mar 30, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you mean you see the content of index.php, or your browser's address bar says index.php? The essence of a Rewrite is that the two things do not have to be the same.

balusyam

8:48 pm on Mar 30, 2012 (gmt 0)

10+ Year Member



Do you mean you see the content of index.php


Exactly, I still see the "content" of index.php when I spoof Googlebot.

There is something that is missing here! Thanks for looking in.
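
One way to narrow this down is to have mod_rewrite log what it does with the request. The fragment below is only a sketch: the log path is a made-up example, the directives belong in httpd.conf rather than .htaccess, and which form applies depends on the Apache version, which the thread does not state.

```apache
# Apache 2.2: dedicated rewrite log (server or virtual-host context only).
# The path below is hypothetical; point it wherever your logs live.
RewriteLog "/var/log/apache2/rewrite.log"
RewriteLogLevel 3

# Apache 2.4: RewriteLog was removed. Raise the module's log level instead
# and read the output in the regular error log:
# LogLevel alert rewrite:trace3
```

If nothing is logged for the "/" request, the rule is probably not being evaluated at all, for example because RewriteEngine On is missing or the rule sits in a different virtual host than the one serving the site. If the log shows the pattern being tested but not matching, compare ^$ against ^/$ for the context the rule lives in. Turn the logging back down afterwards, since it slows the server.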