Is 301 to 404 very bad?

     
9:11 am on Sep 9, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 17, 2005
posts: 109
votes: 0


I have an htaccess which 301's (simplified):

^/[a-z] to /thing/$1

and then if /thing/$1 isn't a valid "thing", 404s. Basically a 301 redirect to a 404 page.

Whilst the ideal would be to 404 at the first instance instead of sending two sets of headers (a bit tricky in this instance), I was wondering if there was a particular reason I should work towards removing the first 301 or if it's not actually that much of a problem (SEO speaking).

Would appreciate your opinions.
10:15 am on Sept 9, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Search engines will take a dim view of this, simply because it indicates poor site administration. Google will throw a warning in their Webmaster Tools for every URL they find that does this. So with my SEO hat on, I'd have to say that no, it's not just very bad... It's an outrage, a travesty, a crime against nature and the very HTTP protocol itself! Claim to have moved permanently to a non-existent address, eh? We'll have you for that bit of fraud, my son!

In 99% of all cases, this is not at all a "tricky" problem -- the exception being when "/thing" is on another server, in which case a scripted solution will be required. For static "/things" it's usually a trivial one-line addition to an existing rule. For dynamic "/things" it's usually a modification to your script(s) or the addition of an additional simple "wrapper" script.

So what is the specific reason that you deem your own case tricky?

And also, why are you redirecting the URL-path "/gizmo" to what *should* be an internal filepath -- "/thing/gizmo"?

Providing these details may reveal that the answer is quite simple once an earlier mistake has been corrected, and that the apparent complexity arises only as a result of an earlier design problem -- which is something that is very common.

Jim
10:46 am on Sept 9, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 17, 2005
posts: 109
votes: 0


Thanks. Travesty was my opinion too, but was hoping..

/gizmo is a short URL used as the display URL in ads, and it has invariably been scraped or re-created incorrectly as links around the interwebs. 90% of the time the links are fine and work as designed, but 10% strangely morph into complete rubbish.

/thing/gizmo/do/this is dynamic, ending up as do_this.php?what=gizmo

do_this.php checks if gizmo is a valid "do", and if not then do_this.php issues the 404.

I wondered if AliasMatch-ing rather than rewriting the first move from /gizmo to /thing/gizmo/do/this would be the answer, but so far am just hitting 500s and haven't been able to figure out why with AliasMatch ^([gizmo]+)\/?$ /thing/$1/do/this
11:25 am on Sept 9, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


This is only a problem if you try to redirect first, then rewrite to do_this.php, then validate the "what=" value.

Instead, rewrite the requests to do_this.php regardless of whether the "/thing" path-part is missing, and let do_this.php insert the "/thing" path-part if it is missing, validate the resulting "thing" against your database, and then issue a 301-Moved Permanently redirect to a corrected URL, a 410-Gone if the thing's page has been intentionally removed, or a 404-Not Found if it's missing, or serve a page for that thing if everything about the request was valid.

The only difficulty that arises is if you use extensionless URLs for other purposes such as blog or forum posts. In that case, you'll have to handle distinguishing between requests for those posts and requests for "things" when "/thing/" is missing from the URL. This might entail using the "wrapper script" method to check for both possible cases before dispatching to either do_this.php or the blog/forum script.

Barring that complication, the solution is simply to re-order/re-allocate the responsibility for 301 redirects of faulty "thing" URLs to the script that can check if the "thing" exists before actually issuing the redirect.

Do be sure to output the 301 "Status" header from the script in addition to the "Location" header, so that a 301-Moved Permanently redirect is generated -- if you don't specify the status, then the default will be a 302-Found, which would have "unfortunate" results in search engines...
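
As a minimal PHP sketch of that point, assuming the script has already confirmed that the target exists: the helper name `redirect_permanently` and the example URL are hypothetical, not part of the poster's setup. Passing the status code as the third argument to `header()` sets the 301 explicitly, because PHP otherwise answers a `Location` header with a 302.

```php
<?php
// Hypothetical helper: send an explicit 301 together with the Location header.
function redirect_permanently(string $url): void
{
    // The third argument forces the HTTP response code. Without it,
    // PHP sends a 302 for a Location header by default.
    header('Location: ' . $url, true, 301);
    exit; // stop so no page body follows the redirect
}

// Example call (hypothetical target URL):
// redirect_permanently('/thing/gizmo/do/this');
```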

Jim
11:58 am on Sept 9, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 17, 2005
posts: 109
votes: 0


Thanks again Jim, I appreciate your help.

Other extensions on URLs are handled independently, with /gizmo being the last unfound option to check.

So, did you mean:

1: RewriteRule ^([a-zA-Z0-9]+)\/?$ /thing/do_this.php?what=$1 [R=permanent,L]

2: do_this.php --if !exists($what) then 404

I guess not as that still creates 301 > 404, so I guess the real question is how to do #1 without incurring a 301?

Chris
5:15 pm on Sept 9, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


1: RewriteRule ^([a-z0-9]+)/?$ /thing/do_this.php?what=$1 [NC,R=301,L]

(Use [NC] for a 30% speed-up. No escaping required for "/" character. "301" is shorter and faster to parse than "Permanent".)

2: do_this.php:

if exists($what) then {
create and output the page content (using the database records)
}
elseif marked-as-replaced-in-database($what) then {
create and output 301-Moved Permanently status response and Location header
}
elseif marked-as-deleted-in-database($what) then {
create and output 410-Gone status response and page content
}
elseif no-database-entry-exists($what) then {
create and output 404-Not-Found status response and page content
}
endif

(Let your php file do everything once you have entered the content-handling phase of the API. If you wish you may "include" the 404 and 410 page contents from an external file, but you must still output the correct server status responses from the script itself.)
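
The outline above could be sketched in PHP roughly as follows. This is only an illustration under stated assumptions: the `$things` array stands in for the real database, and the `state` and `replacement` fields, the function names, and the `/thing/.../do/this` URL shape used for the redirect target are all hypothetical. The decision is kept separate from the output so it can be checked without sending headers.

```php
<?php
// Hypothetical stand-in for the database: slug => record.
$things = [
    'gizmo'  => ['state' => 'live'],
    'widget' => ['state' => 'replaced', 'replacement' => 'gizmo'],
    'gadget' => ['state' => 'deleted'],
];

// Decide which response a requested slug deserves, without sending anything.
function decide_response(string $what, array $things): array
{
    $record = $things[$what] ?? null;
    if ($record === null) {
        return ['status' => 404]; // no database entry
    }
    switch ($record['state']) {
        case 'live':
            return ['status' => 200];
        case 'replaced':
            return [
                'status'   => 301,
                'location' => '/thing/' . $record['replacement'] . '/do/this',
            ];
        case 'deleted':
            return ['status' => 410];
        default:
            return ['status' => 404]; // unknown state, treat as missing
    }
}

// Send the status (and Location, if any) from the script itself.
function send_response(array $response): void
{
    if (isset($response['location'])) {
        header('Location: ' . $response['location'], true, $response['status']);
        return;
    }
    http_response_code($response['status']);
    // The page, 404, or 410 body would be generated or included here.
}
```

With this split, `send_response(decide_response($what, $things))` is the only place a status is emitted, so a redirect is issued only after the lookup has succeeded.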

Jim
7:20 pm on Sept 9, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 17, 2005
posts:109
votes: 0


Hi Jim - that's pretty much what's going on right now with the exception of the 410 and an additional error trapping 301. 301 redirect resulting in either a page or a 404 page - hence, 301 to 404. The 301 must be sent from the .htaccess and the 404 header must come from the content system.

I didn't know about the NC & 301 for speed, thanks for that!
7:27 pm on Sept 9, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> The 301 must be sent from the .htaccess and the 404 header must come from the content system.

I'm not sure what the 'contingencies' of this statement are. This is certainly not an Apache requirement. I recommend that you do NOT do the 301 until you know that the 'target' URL is good, and therefore recommend that you do both in the script -- or in a "wrapper" script around your main script. Otherwise, there is no way to control the order of the 301-to-404 chain and correct it to a 301-or-404 case.

Jim
7:49 pm on Sept 9, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 17, 2005
posts:109
votes: 0


When checking the headers actually being sent, the redirect in the .htaccess appears to create the first 301 to the script; the script then sends the second.
11:35 pm on Sept 9, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Right, because if you implement the scripting approach I outlined, you should internally rewrite, not externally redirect, to that script.

The rule becomes:

RewriteRule ^([a-z0-9]+)/?$ /thing/do_this.php?what=$1 [L]
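
On the script side, an internal rewrite means the requested URL is unchanged for the client and the slug simply arrives as the `what` query parameter. A small sketch of reading it, assuming the script re-checks the value against the same character class as the rule (the script can also be requested directly, bypassing the rule); the helper name `read_what` is hypothetical:

```php
<?php
// Hypothetical helper: fetch the slug passed by the internal rewrite and
// re-check it against the character class used in the RewriteRule.
function read_what(array $query): ?string
{
    $what = $query['what'] ?? '';
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    if (!is_string($what) || preg_match('/\A[a-z0-9]+\z/', $what) !== 1) {
        return null; // caller should answer with a 404
    }
    return $what;
}

$what = read_what($_GET);
```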

Jim