|301 or 404|
What is best for users and googlebot?
Lately, I'm seeing some odd incoming traffic (not a lot, maybe 50 hits a day) to URLs that don't exist. They seem fairly consistent, in the form of:
The majority of the errors seem to be from Googlebot. My question is: should I be 404'ing them to the 404 not found page, or 301'ing them to the correct URL (http://www.mysite.com/directory/filename.html, which actually does exist)? Or should I send a 404 header but redirect (via PHP) to the correct URL? If it's an actual user coming in, I'd want them to find the correct page, but if I do that, won't Google just keep spidering them? Or do I tell Google specifically that they never existed, but redirect the non-spidering user?
To recap, it seems like my options are:
1) send 404 header - redirect to 404 not found page
2) send 301 header - redirect to correct page (for which there has always been one based on these particular errors)
3) send 404 header - redirect to correct page (or is that bad for Googlebot?)
4) send varied header - 301 for user with redirection to correct page, 404 for Googlebot and redirect to 404 not found page
Also, I should stress that it's not just Googlebot that I'm seeing these errors from, but it accounts for the majority of them. I am also seeing some actual users following these links and similar ones from somewhere.
I would suggest option 2: use a 301 status code and redirect to the canonical URL (i.e. without the query string).
Have you just migrated your site from Wordpress to some other system?
|3) send 404 header - redirect to correct page (or is that bad for Googlebot?) |
If you send a 404 header, how do you propose to send a 301 header as well? A URL can return only one status code. It does not matter how that code is generated (internally in Apache, by .htaccess rule or sent by PHP).
I would either 404 for all visitors and bots, or, IF there is real traffic for those URLs, use a 301 redirect to strip the parameters. Since I use URL rewriting a lot, and since I do not use parameterised URLs, I usually install a parameter-stripping redirect that sends every request to the canonical URL for the page, right from the outset.
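A parameter-stripping redirect of this kind could look roughly like the PHP sketch below. It is only a minimal sketch under stated assumptions, not the poster's actual setup: the helper name `canonical_redirect_target`, the `http://www.mysite.com` origin passed to it, and the idea of running it at the top of every page are all assumptions for illustration. It also assumes that none of the site's real pages rely on query parameters.

```php
<?php
// Hypothetical helper: returns the canonical URL to redirect to, or null
// when the request URI is already just a path (or cannot be parsed).
function canonical_redirect_target(string $requestUri, string $origin): ?string
{
    // parse_url() with PHP_URL_PATH returns only the path component,
    // i.e. the request URI without its query string.
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path) || $path === $requestUri) {
        return null; // already canonical, or malformed: leave to normal handling
    }
    return $origin . $path;
}

// Only act when served by a web server, never from the command line.
if (PHP_SAPI !== 'cli' && isset($_SERVER['REQUEST_URI'])) {
    // Assumed origin; adjust to the real host name.
    $target = canonical_redirect_target($_SERVER['REQUEST_URI'], 'http://www.mysite.com');
    if ($target !== null) {
        // The third argument sets the status code explicitly to 301.
        header('Location: ' . $target, true, 301);
        exit;
    }
}
```

With this in place, a request for `/directory/filename.html?foo=1` (a made-up parameter) would be answered with a single 301 pointing at the parameter-free URL, while a request for the clean URL falls through to the normal page.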
|Have you just migrated your site from Wordpress to some other system? |
No, those pages (with the odd parameters) have never existed. There are about 7-8 parameters that keep being added. A couple seem to come from RSS readers of some kind.
|If you send a 404 header, how do you propose to send a 301 header as well? |
I was wondering about only sending the 404 header, but then redirecting to the correct page using PHP:
header ("HTTP/1.1 404 Not Found");
header ("Location: http://www.mysite.com/correctpage.html");
I currently have the pages set to detect if the url is not the canonical one. I originally was having them send the 404 header and go to the "not found" page on the website. I was only recently logging these looking for a particular problem when I saw these patterns and wondered if I should be intercepting them instead.
The Location: response header is only prescribed for 3XX status codes (redirection) and 201 (Created):
A Location: header with a 404 (Not Found) status code makes no sense.
How can you provide a location for something you cannot find?
In PHP, the Location header on its own generates a 302 redirect.
If you combine it with a 404 header, it is unknown what will actually be sent (I assume only a 302 header).
If you somehow managed to send both 404 and 302, there is no knowing what Google would do and which one they would ignore.
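To make the "one status code per URL" point concrete, here is a minimal PHP sketch. The function names `plan_response` and `send_response` are hypothetical, made up for this example. As far as I know, PHP's `header()` switches the status to 302 when it is given a `Location:` line, unless a 201 or 3xx status has already been set, so the sketch passes the status code explicitly instead of relying on that default.

```php
<?php
// Hypothetical helper: pick exactly one status code for the request.
// $canonicalUrl is the page that really exists, or null if there is none.
function plan_response(?string $canonicalUrl): array
{
    if ($canonicalUrl !== null) {
        // A redirect: 3xx status plus a Location header.
        return ['status' => 301, 'location' => $canonicalUrl];
    }
    // A real "not found": 404 status, no Location header at all.
    // The error page is served as the body of this same response.
    return ['status' => 404, 'location' => null];
}

// Hypothetical helper: emit the headers for the chosen plan.
function send_response(array $plan): void
{
    if ($plan['location'] !== null) {
        // Third argument forces the status code (301 here) rather than 302.
        header('Location: ' . $plan['location'], true, $plan['status']);
    } else {
        http_response_code($plan['status']);
    }
}
```

Something like `send_response(plan_response('http://www.mysite.com/correctpage.html'));` would then send a 301, and `send_response(plan_response(null));` a plain 404, never a mix of the two.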
|If you combine it with a 404 header, it is unknown what will actually be sent (I assume only a 302 header). |
That's exactly what I was asking about. Thank you! So I either should send a strict 404 not found or use the 301 to redirect to the canonical url. The latter would be the most user friendly, of course.
|If you somehow managed to send both 404 and 302, there is no knowing what Google would do and which one they would ignore. |
It will give you an error in Google Webmaster Tools (WMT).
About Google accessing non-existent pages, a few things. First, make sure these URLs aren't somehow being generated from within your own domain through some combination of URL parameters, because if that's the case, Google will keep reporting them. Second, you can check my comments here: