Forum Moderators: phranque


.htaccess 301 redirect + block googlebot from redirect

         

tdog553

6:29 pm on Dec 24, 2016 (gmt 0)

5+ Year Member



Hey all,

Trying to do two things.
1. A complete domain 301 redirect using my .htaccess file
2. Block googlebot from seeing the redirect(sounds dumb but I have a good reason)

Here’s the code I have so far,
RewriteCond %{HTTP_USER_AGENT} googlebot|yahoobot|microsoftbot [NC] 
RewriteRule ^.*$ - [R=403,L]
RedirectMatch 301 ^(.*)$ http://www.example.com/


This is successfully redirecting, but I'm not sure that it's blocking googlebot from seeing the redirect. I switched my browser's user-agent to googlebot, and I am still being redirected.

However, if I try to go to www.oldredirectedURL.com/randomdirectory as googlebot, I do get a forbidden message, like it's blocking it.

Pretty confused as I'm not very good with .htaccess stuff and just trying my best here.
Thanks all!

[edited by: phranque at 1:54 am (utc) on Dec 25, 2016]
[edit reason] [url=http://www.webmasterworld.com/apache/4452736.htm]IMPORTANT: Please Use [b]exampl [/edit]

phranque

4:32 am on Dec 25, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



you shouldn't mix mod_rewrite and mod_alias directives in the same configuration.

http://httpd.apache.org/docs/current/rewrite/avoid.html [httpd.apache.org]:
when there are Redirect(Match) and RewriteRule directives in the same scope, the RewriteRule directives will run first, regardless of the order of appearance in the configuration file


i'm not sure why you are getting inconsistent results, but i would change your RedirectMatch to a RewriteRule and see what happens.
if you are still having problems, fix that.


also there is no "yahoobot" or "microsoftbot", it's Slurp [help.yahoo.com] and Bingbot/MSNBot [bing.com].
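
for example, an untested sketch with everything in mod_rewrite so the rule order is exactly what you see (example.com standing in for your new domain):

```apache
RewriteEngine On

# known crawlers get 403 before the redirect can fire
RewriteCond %{HTTP_USER_AGENT} Googlebot|Slurp|bingbot|msnbot
RewriteRule ^ - [F,L]

# everyone else: 301 the whole old domain to the new one
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```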

phranque

4:34 am on Dec 25, 2016 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



oh, and welcome to WebmasterWorld, tdog553!

keyplyr

7:44 am on Dec 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Block googlebot from seeing the redirect(sounds dumb but I have a good reason)

Hi tdog553 and welcome to WebmasterWorld [webmasterworld.com]

Blocking Googlebot & the other Search Engines with a 403 Forbidden server response will cause your site and all your pages to be removed from their index. Is this really what you want?

tdog553

5:09 pm on Dec 25, 2016 (gmt 0)

5+ Year Member



Thanks for the replies guys!

I will change the RedirectMatch to RewriteRule and see if that makes any difference. Just wish there was a way to know for sure if Googlebot was being blocked. I thought changing my browser user-agent would do it, but I still get passed through to the new site via the 301 redirect even as googlebot.

Thanks for the concern Keyplyr, this is definitely something that would make you think twice, but it is what I am after. Basically, I want other SEO crawlers to see the link juice being passed from the 301 redirect, but I do not want google to see this.

It's not actually a site I'm trying to rank; it's more about bragging rights and artificially inflating metrics, if that makes sense.

lucy24

6:40 pm on Dec 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just wish there was a way to know for sure if Googlebot was being blocked.
There is. Just look in your logs and verify that Googlebot requests for the relevant pages get a 403 response rather than 301.
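
For instance, a quick way to check what status code Googlebot is getting. The log path and sample lines below are placeholders, not your real log; in a combined-format log the status code is the ninth whitespace-separated field:

```shell
# stand-in for the real access log, two sample combined-format lines
printf '%s\n' \
  '66.249.75.100 - - [25/Dec/2016:22:44:50 -0500] "GET / HTTP/1.1" 403 202 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"' \
  '10.0.0.5 - - [25/Dec/2016:22:45:01 -0500] "GET / HTTP/1.1" 301 0 "-" "Mozilla/5.0"' \
  > /tmp/access.log

# pull out the status code ($9) for every Googlebot request
grep 'Googlebot' /tmp/access.log | awk '{print $9}'   # prints: 403
```

If every Googlebot line shows 403 while ordinary visitors show 301, the block is working.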

RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
This is the textbook case of when not to use the [NC] flag. Learn the correct casing of the robot's name (also, as noted upthread, its correct name), and use it.

<tangent>
It's possible that your desired result would be better achieved by serving Google a 410 response for the specified URLs. But be awfully careful, because you don't want to be accused (or even suspected) of "cloaking".
</tangent>
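
An untested sketch of that 410 variant, again matching the exact Googlebot casing:

```apache
# serve 410 Gone to Googlebot instead of 403 Forbidden
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteRule ^ - [G]
```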

tdog553

6:02 am on Dec 26, 2016 (gmt 0)

5+ Year Member



I checked my logs and see this:
66.249.75.100 - - [25/Dec/2016:22:44:50 -0500] "GET / HTTP/1.1" 403 202 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +(google website here))"


So it looks like it's working then, correct?

Lucy24, shouldn't [NC] not matter, since that's just telling it to apply to any CaSe of the letters? I guess it's redundant, since the proper code would be to just use the case-sensitive "Googlebot", but it should still work fine and not cause any issues, right?

keyplyr

6:56 am on Dec 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Don't use the [NC] when there is only one casing for a known UA. This is for a couple of reasons:

• Explicitness - the genuine bot only ever sends one casing, so the other case variants come from impostors and can be handled separately.

• Resource efficiency - a case-sensitive match is cheaper, since the server doesn't have to test both cases of every letter in the UA.

[webmasterworld.com...]
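
As a sketch, the contrast looks like this:

```apache
# with [NC] the engine must test both cases of every letter:
#   RewriteCond %{HTTP_USER_AGENT} googlebot [NC]
# without it, only the one real casing matches:
RewriteCond %{HTTP_USER_AGENT} Googlebot
RewriteRule ^ - [F]
```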

lucy24

5:53 pm on Dec 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



shouldn't [NC] not matter

The element [NC] is easy to type, like clicking or unclicking "case sensitive" in a text editor. But where you see
Googlebot [NC]
the server --or whatever is construing your text string-- sees
[Gg][Oo][Oo][Gg][Ll]
... et cetera.