Fetch as Googlebot returns 404 for 301'd pages. Strange?
12:41 pm on Oct 18, 2012 (gmt 0)
I set up a 301 redirect from example.com/about-contact/ to example.com/contact/, and everything works fine in a browser. But not for Googlebot: when I fetch the old URL in GWT, it returns a 404. Any idea why this happens?
12:45 pm on Oct 18, 2012 (gmt 0)
Change your Browser UA to Googlebot, install the Live HTTP Headers extension for Firefox and check again.
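For anyone who would rather script this check than use a browser extension, here is a minimal sketch in Python. It requests the same URL once per User-Agent without following redirects, so you see the raw status code and Location header each client gets. The User-Agent strings and the function names (fetch_status, compare, responses_differ) are illustrative assumptions, not anything taken from the site in question.

```python
import http.client
from urllib.parse import urlsplit

# Illustrative User-Agent strings (assumptions; substitute whatever you want to test).
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}


def fetch_status(url, user_agent, timeout=10):
    """Send one GET with the given User-Agent and return (status, Location).

    http.client never follows redirects, so a 301 shows up as a 301
    instead of being silently resolved to the target page.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def compare(url, agents=USER_AGENTS):
    """Fetch the URL once per User-Agent and collect the results by name."""
    return {name: fetch_status(url, ua) for name, ua in agents.items()}


def responses_differ(results):
    """True if at least two User-Agents received a different (status, Location)."""
    return len(set(results.values())) > 1
```

Calling compare("http://example.com/about-contact/") returns a dict keyed by "browser" and "googlebot". If responses_differ() is True for that dict, something in front of the page is treating the two clients differently. Note that this only spoofs the User-Agent header; a server that identifies Googlebot by IP address would not be fooled by it.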
1:18 pm on Oct 18, 2012 (gmt 0)
Checked it. It says "Bots get the naked version", the same as Fetch as Googlebot in GWT. What might be the problem?
1:26 pm on Oct 18, 2012 (gmt 0)
SEOmoz also lists the old URL as a 404. So something is causing problems for the bots. Any idea?
8:21 am on Oct 19, 2012 (gmt 0)
I'm missing something. Is the redirect happening? That was the point of using Live Headers or equivalent. If the redirect takes place as intended, it doesn't matter whether the old URL exists or not. All that matters is the new one.
9:04 am on Oct 19, 2012 (gmt 0)
Yes, the redirect is happening, but if I select Googlebot as the user agent, no redirection takes place and the old URL returns a page saying "Bots get the naked version". The problem affects crawlers only, as SEOmoz also returned 404 for the old URLs. I think the redirect is done in htaccess, but I'm not sure. Have to ask the web designer...
9:49 am on Oct 19, 2012 (gmt 0)
"Have to ask the web designer..."
Uhm, yeah, sounds like a plan. The redirect is clearly not originating from htaccess, or else it would happen equally to everyone. Is the web designer also the ongoing webmaster, or is he/she only responsible for the front end of the site?
10:00 am on Oct 19, 2012 (gmt 0)
1:31 am on Oct 20, 2012 (gmt 0)
what type of content management system do you use? have you checked the .htaccess file or the source code on the page served for anything unusual?
6:58 pm on Oct 20, 2012 (gmt 0)
9:24 pm on Oct 20, 2012 (gmt 0)
Oh. Oops. I thought that was google talking. So it's one of those programs that looks at what kind of browser the visitor has got, and builds the page accordingly? If so, I wonder what the plainclothes robots get. Saying outright "bots get the naked version" sounds unnervingly like "this site serves cloaked content" doesn't it?
Where does the redirect happen? Within the page code itself? What does your config file or htaccess say about the original URL?
6:41 am on Oct 22, 2012 (gmt 0)
It's actually happening at the host level, so I've dropped them an email to see if they can ignore it.
7:19 am on Oct 22, 2012 (gmt 0)
"There is nothing on the old url's page source"
what is in the source of the script that generates your response?
i haven't seen any answers from you about the contents of your .htaccess file.
"It's actually happening at the host level"
what does that mean?
by the way, welcome to WebmasterWorld, nikhilrajr!
3:57 am on Oct 23, 2012 (gmt 0)
Thanks, phranque. We identified an issue at the server level where the bots are not being redirected properly. WPEngine is the hosting provider. WPEngine has a redirect management system at the Nginx level, and I will add the redirects there so this can be avoided. I'll keep you posted once it's completed. Thanks for the help, everyone.
6:15 am on Oct 23, 2012 (gmt 0)
make sure you understand the implications before you implement your solution.
If the objective was to display alternate content to entice search that would be cloaking... But not a 404. This is dumb :)
10:18 am on Oct 23, 2012 (gmt 0)
i'm not saying your current or future implementation will be seen as cloaking.
however, you should be aware of the issues whenever googlebot is seeing a different response than a human visitor.
10:45 am on Oct 23, 2012 (gmt 0)
Yes, I understood what you said. Thanks for the reference links. This is unrelated to my issue, but here is something connected with cloaking: "Why hasn't Google banned Quora for hiding answers from search engine visitors?" [quora.com...]