Fetch as Googlebot working strangely

Forum Moderators: phranque

Fetch as Googlebot returns 404 for 301'd pages. Strange?

nikhilrajr

12:41 pm on Oct 18, 2012 (gmt 0)

Hi,

I set up a 301 redirect from example.com/about-contact/ to example.com/contact/, and everything works fine, except for Googlebot. I tried to fetch the old URL in GWT, but it returns a 404 for Googlebot. Any idea why this happens?

g1smd

12:45 pm on Oct 18, 2012 (gmt 0)

Change your browser's UA (user agent) to Googlebot, install the Live HTTP Headers extension for Firefox, and check again.
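For anyone without the browser extension, roughly the same check can be scripted. What follows is a minimal sketch, assuming Python 3's standard library only. `fetch_status` is a hypothetical helper name, and both User-Agent strings are assumptions (the Googlebot one is the commonly documented form, so verify it against Google's own crawler documentation). It sends one request and does not follow redirects, so a 301 is reported as a 301 rather than as whatever page it leads to.

```python
import http.client
from urllib.parse import urlsplit

# Assumed User-Agent strings, for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0"


def fetch_status(url, user_agent, timeout=10):
    """Send a single GET and return (status code, Location header or None).

    http.client never follows redirects on its own, so the first
    response from the server is what gets reported.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

Calling `fetch_status("http://example.com/about-contact/", BROWSER_UA)` and then the same URL with `GOOGLEBOT_UA` shows whether the two user agents get the same status code and the same `Location` target.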

nikhilrajr

1:18 pm on Oct 18, 2012 (gmt 0)

Checked it. It says "Bots get the naked version", the same as Fetch as Googlebot in GWT. What might be the problem?

nikhilrajr

1:26 pm on Oct 18, 2012 (gmt 0)

SEOmoz also lists the old URL as a 404, so something is causing problems for the bots. Any idea?

lucy24

8:21 am on Oct 19, 2012 (gmt 0)

I'm missing something. Is the redirect happening? That was the point of using Live Headers or equivalent. If the redirect takes place as intended, it doesn't matter whether the old URL exists or not. All that matters is the new one.

"naked version" implies that you're redirecting via javascript or some other optional means, instead of an absolute, unconditional redirect from the server side-- php script, config/htaccess etc. How is the redirect coded?

nikhilrajr

9:04 am on Oct 19, 2012 (gmt 0)

Yes, the redirect is happening, but if I select Googlebot as the user agent then no redirection takes place and the old URL returns a page with "Bots get the naked version". The problem is with crawlers only, as SEOmoz also returned a 404 for the old URLs. I believe the redirect is in htaccess, but I'm not sure. Have to ask the web designer...

lucy24

9:49 am on Oct 19, 2012 (gmt 0)

"Have to ask the web designer..."

Uhm, yeah, sounds like a plan. The redirect is clearly not originating from htaccess, or else it would happen equally to everyone. Is the web designer also the ongoing webmaster, or is he/she only responsible for the front end of the site?

If the old page doesn't exist (404), what are the robots getting the naked version of? If you disable javascript do you get redirected yourself?

nikhilrajr

10:00 am on Oct 19, 2012 (gmt 0)

Ha, my designer is great :) He is the only person handling the front end of the site. I disabled JavaScript and the redirection works fine.

phranque

1:31 am on Oct 20, 2012 (gmt 0)

what type of content management system do you use?
have you checked the .htaccess file or the source code on the page served for anything unusual?

nikhilrajr

6:58 pm on Oct 20, 2012 (gmt 0)

WordPress. There is nothing in the old URL's page source, just "bots get the naked version". Can anyone tell me what that means? What do "naked version" and JavaScript have to do with each other? lucy24 said ""naked version" implies that you're redirecting via javascript or some other optional means, instead of an absolute, unconditional redirect from the server side-- php script, config/htaccess etc." Is there any reference link?

lucy24

9:24 pm on Oct 20, 2012 (gmt 0)

Oh. Oops. I thought that was google talking. So it's one of those programs that looks at what kind of browser the visitor has got, and builds the page accordingly? If so, I wonder what the plainclothes robots get. Saying outright "bots get the naked version" sounds unnervingly like "this site serves cloaked content" doesn't it?

Where does the redirect happen? Within the page code itself? What does your config file or htaccess say about the original URL?

nikhilrajr

6:41 am on Oct 22, 2012 (gmt 0)

It's actually happening at the host level so I've dropped them an email to see that they can ignore it.

phranque

7:19 am on Oct 22, 2012 (gmt 0)

"There is nothing on the old url's page source"

what is in the source of the script that generates your response?

i haven't seen any answers from you about the contents of your .htaccess file.

"It's actually happening at the host level"

what does that mean?

by the way, welcome to WebmasterWorld, nikhilrajr!

nikhilrajr

3:57 am on Oct 23, 2012 (gmt 0)

Thanks phranque..
Identified an issue at the server level where the bots are not properly being redirected. WPEngine is the hosting provider. WPEngine has a redirect management system at the Nginx level and I will add the redirects there so this can be avoided. I'll keep you posted once it's completed. Thanks for the help everyone.

phranque

6:15 am on Oct 23, 2012 (gmt 0)

make sure you understand the implications before you implement your solution.

Cloaking - Webmaster Tools Help:
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=66355

Official Google Webmaster Central Blog: How Google defines IP delivery, geolocation, and cloaking:
http://googlewebmastercentral.blogspot.com/2008/06/how-google-defines-ip-delivery.html

nikhilrajr

9:16 am on Oct 23, 2012 (gmt 0)

If the objective were to display alternate content to entice search engines, that would be cloaking... But not a 404. This is dumb :)

phranque

10:18 am on Oct 23, 2012 (gmt 0)

i'm not saying your current or future implementation will be seen as cloaking.

however, you should be aware of the issues whenever googlebot is seeing a different response than a human visitor.
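One way to keep an eye on this is to compare what each user agent receives for the same URL and flag any difference. A minimal sketch, assuming the (status, Location) pairs have already been collected by some other means. `find_mismatches` and the sample data are hypothetical, and a mismatch is only a prompt to investigate, not proof of cloaking.

```python
def find_mismatches(responses, baseline="browser"):
    """Return {agent: (baseline_response, agent_response)} for every
    agent whose response differs from the baseline agent's response.

    `responses` maps an agent label to a (status, location) tuple.
    """
    expected = responses[baseline]
    return {
        agent: (expected, got)
        for agent, got in responses.items()
        if agent != baseline and got != expected
    }


# Sample data shaped like the situation in this thread (illustrative only).
observed = {
    "browser": (301, "http://example.com/contact/"),
    "googlebot": (404, None),
}

for agent, (want, got) in find_mismatches(observed).items():
    print(f"{agent}: expected {want}, got {got}")
```

An empty result means every listed agent saw the same status and redirect target as the baseline.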

nikhilrajr

10:45 am on Oct 23, 2012 (gmt 0)

Yes, I understood what you said. Thanks for the reference links.
This is not directly related, but check this out in connection with cloaking: "Why hasn't Google banned Quora for hiding answers from search engine visitors?" [quora.com...]