Rewrite or direct to 404

         

glimbeek

11:10 am on Mar 23, 2010 (gmt 0)

10+ Year Member



After searching around for a fair few hours, I am again resorting to asking on this forum in the hope of getting a good answer like the last few times (:

I'm using Joomla! 1.5.x, so I have a default Joomla! 404 page and of course the default Apache 404 page.

I have a "few" URLs which are indexed by Google, but I don't want them to show up in Google or be accessible anymore. I could of course make a redirect for those URLs and send them to the homepage or a similar page. However, this is not a proper solution, and I would much rather "point" them to a 404 page.
I don't want to actually rewrite/redirect them to a new URL; I just want to display the default 404 page when people browse to the URL. So instead of seeing the content, they see a 404 page on the URL they browsed to. I prefer doing this using my .htaccess file.

I hope I'm making sense and I hope someone can help me.

jdMorgan

11:47 am on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For pages which are intentionally removed and have no "logical" replacement, neither a 404-Not Found nor a 301-Moved Permanently is the proper response. The correct response, in accordance with the HTTP/1.1 Protocol [w3.org], is 410-Gone.

Note that Google recently announced that they will now treat 410-Gone as different from 404-Not Found, likely resulting in faster removal from the index and fewer repeated attempts to fetch these removed URLs to see "if you really mean it."

RewriteRule ^(removed-page1\.html|removed-page2\.php|removed-page3\.jsp)$ - [G]

Jim
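
For anyone applying this to a Joomla! .htaccess file, a minimal sketch of how the rule might sit in context is shown below. It assumes mod_rewrite is enabled and that the rule is placed above Joomla!'s own SEF rewrite rules so that it is evaluated first. The page names are the placeholders from the post above, and /gone.html is a hypothetical custom error page, not something from this thread.

```apache
# Turn the rewrite engine on (Joomla!'s stock .htaccess usually does this already).
RewriteEngine On

# Answer 410 Gone for the intentionally removed pages.
# "-" means "no substitution"; [G] sends the 410 and stops rule processing.
RewriteRule ^(removed-page1\.html|removed-page2\.php|removed-page3\.jsp)$ - [G]

# Optional and hypothetical: serve a custom page body along with the 410 status.
# ErrorDocument 410 /gone.html
```

Requesting one of the listed URLs should then return a 410 status while the address shown in the browser stays unchanged.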

glimbeek

11:52 am on Mar 23, 2010 (gmt 0)

10+ Year Member



Thanks jdMorgan for yet again a fast and very good response!

To keep a better overview of what happened to which link, I'd prefer to create a redirect per link, i.e.:
RewriteRule ^removed-page1\.html$ - [G]

Shouldn't I add a ,L after the [G ?

jdMorgan

12:43 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As documented, [G] implies [L], and asking mod_rewrite to parse extra characters wastes CPU time...

Jim

glimbeek

12:50 pm on Mar 23, 2010 (gmt 0)

10+ Year Member



I did not know that.
Reading before asking...

Thanks for the great support jdMorgan!

jdMorgan

1:10 pm on Mar 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Per-URL redirect rules will be less efficient. A possible compromise between efficiency and readability/maintainability would be:

RewriteCond $1 ^removed-page1\.html$ [OR]
RewriteCond $1 ^removed-page2\.php$ [OR]
RewriteCond $1 ^removed-page3\.asp$
RewriteRule ^(.+\.[^/.]+)$ - [G]

Or, if all removed pages have the same file extension, this would be faster, since the rule can be immediately rejected if the file extension isn't matched, and the RewriteConds then won't have to be processed at all:

RewriteCond $1 ^removed-page1\.html$ [OR]
RewriteCond $1 ^removed-page2\.html$ [OR]
RewriteCond $1 ^removed-page3\.html$
RewriteRule ^(.+\.html)$ - [G]

Jim

glimbeek

1:16 pm on Mar 23, 2010 (gmt 0)

10+ Year Member



If it really does make a big difference I will try your first example.

Thanks again Jim

glimbeek

9:30 am on Mar 29, 2010 (gmt 0)

10+ Year Member



For extra info about the G flag:
[httpd.apache.org...]

glimbeek

6:11 am on Mar 30, 2010 (gmt 0)

10+ Year Member



Just for my peace of mind...

Is there a way to force a 404?

Only thing I could find is:
[groups.google.com...]

But that doesn't use a "flag" like the example above that returns a 410.

g1smd

7:15 am on Mar 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



For URL request
example.com/thisfile.html
which no longer exists

RewriteRule ^thisfile\.html$ /path-inside-server-does-not-exist [L]


will serve a 404 error, as there's no file on the server to fulfil the request.
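
As a hedged alternative for the "flag" question above: the current Apache mod_rewrite documentation describes the [R] flag as accepting status codes outside the 3xx range, in which case the substitution is dropped and rewriting stops as if [L] had been given. Assuming a server version that behaves this way, a 404 could be forced directly. The name thisfile.html is the example from the post above.

```apache
RewriteEngine On

# A non-3xx code on [R] returns that status instead of redirecting.
# The "-" substitution is ignored, and no separate [L] is needed.
RewriteRule ^thisfile\.html$ - [R=404]
```

On servers where this is not supported, the rewrite to a non-existent path shown above remains the fallback.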

glimbeek

7:28 am on Mar 30, 2010 (gmt 0)

10+ Year Member



Isn't this a rather "crude" solution? Creating a redirect to a URL that doesn't exist?

Does Google accept this solution?

jdMorgan

5:10 pm on Mar 30, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is not a redirect, it's a rewrite.

As such, no client can 'see' that it happened. All they see is the server's 404 response.

Jim

glimbeek

6:23 am on Mar 31, 2010 (gmt 0)

10+ Year Member



That still confuses me at times.

When I use for instance r=301 it's a rewrite.
When I just use a flag, like [L], it's a redirect?

Does Google accept this solution?

jdMorgan

1:07 pm on Mar 31, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You've got that backwards.

But it seems you're focusing on the mod_rewrite syntax, and not on what it actually does...

An internal rewrite says to the server, "If the client requests this URL-path, then serve that file to the client."

An external redirect says to the server, "If the client requests this URL-path, then send a message to the client telling it to re-request what it wants from this other URL."

Therefore, internal rewrites are totally invisible to clients (browsers, search engine robots, etc.) and unless you make a mistake while coding the rewrite, Google will see no indication whatsoever that this URL has been rewritten.

URLs and filepaths are completely different things. URLs are used only "out on the Web" and filepaths are used only inside your server. For example, look at your browser address bar right now: It says, "http://www.webmasterworld.com/apache/4102928.htm". But in fact, this entire site runs on Perl scripts and a database, and no such "HTML page" actually exists; this thread is simply a group of entries in a database. Google doesn't know this or care about this, because all it sees, ranks, or cares about is the URL.

Therefore, it is not up to Google to "accept" or "not accept" this solution. Because this solution only acts inside your server, Google won't even be aware that the solution has been implemented. All it will see is a standard 404-Not Found response when it requests "/removed-page.html" from your site.

If you don't trust the answers given here, I suggest that you either ask on a forum site that you *do* trust, or simply don't use mod_rewrite until you have learned enough about it to *know* what works and what doesn't.

Also, install and use the "Live HTTP Headers" add-on for Firefox and other Mozilla-based browsers. When testing your site (and this code) with that tool, you will be able to see all HTTP client request and server response headers. After learning to "read" the information that it shows, you will *know* that internal rewrites are not exposed to clients unless they are incorrectly-coded.

Jim
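
To make the rewrite/redirect distinction above concrete, here is a minimal side-by-side sketch. The file names and the www.example.com host are hypothetical and only serve to illustrate the two behaviours; they are not taken from the thread.

```apache
RewriteEngine On

# Internal rewrite: a request for /old-page.html is answered with the
# content of /new-page.html. No Location header is sent, and the URL in
# the client's address bar does not change.
RewriteRule ^old-page\.html$ /new-page.html [L]

# External redirect: the server answers with a 301 and a Location header,
# and the client then makes a second request to the new URL.
# (Alternative to the rule above; use one or the other for a given URL.)
# RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]
```

With a header viewer such as the add-on mentioned above, the first rule should show a single request and response, while the second should show a 301 followed by a new request.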

glimbeek

9:47 am on Apr 2, 2010 (gmt 0)

10+ Year Member



Jim,

thanks for the extensive reply.

"If you don't trust the answers given here, I suggest that you either ask on a forum site that you *do* trust, or simply don't use mod_rewrite until you have learned enough about it to *know* what works and what doesn't."

It's not that I don't trust it; this forum and the people on it have supported me more than once. I just thought it a bit odd that Google would accept this solution. But that's because of my lack of knowledge on the subject.

With your explanation it's all much clearer now, and I will follow up on the suggested solution.

Thanks again, Jim, and thank you too, g1smd. I appreciate the support you both provided!