Unique htaccess question


CainIV

1:49 am on Jan 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello all.

I'm in a bit of a bind with a content-management-type site that I am webmaster for.

The site URLs are all rewritten via htaccess to this format:

index.php/category/some-file.html

which is exactly how I want them. However, some old URLs are still indexed; they are not helping me and are creating duplicate content along these lines:

index.php?s=20034&paged=some-file

Since I rewrite everything to static-looking .html URLs, I simply want to return a 404 Not Found for every page accessed through this URL pattern:

index.php?

Is it possible to do this from htaccess?

RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule (.*) [mysite...] [R=301,L]

RewriteBase /
RewriteRule ^index.php/category/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed.php?category_name=$1&feed=$2 [QSA]
RewriteRule ^index.php/category/?(.*) /index.php?category_name=$1 [QSA]
RewriteRule ^index.php/author/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed.php?author_name=$1&feed=$2 [QSA]
RewriteRule ^index.php/author/?(.*) /index.php?author_name=$1 [QSA]
RewriteRule ^index.php/([0-9]{4})?-([0-9]{1,2})?-([0-9]{1,2})?/?([_0-9a-z-]+)?.html([0-9]+)?/?$ /index.php?year=$1&monthnum=$2&day=$3&name=$4&page=$5 [QSA]
RewriteRule ^index.php/([0-9]{4})?-([0-9]{1,2})?-([0-9]{1,2})/([_0-9a-z-]+)?.html/(feed|rdf|rss|rss2|atom)/?$ /wp-feed.php?year=$1&monthnum=$2&day=$3&name=$4&feed=$5 [QSA]
RewriteRule ^index.php/([0-9]{4})?-([0-9]{1,2})?-([0-9]{1,2})/([_0-9a-z-]+)?.html/trackback/?$ /wp-trackback.php?year=$1&monthnum=$2&day=$3&name=$4 [QSA]
RewriteRule ^feed/?([_0-9a-z-]+)?/?$ /wp-feed.php?feed=$1 [QSA]
RewriteRule ^comments/feed/?([_0-9a-z-]+)?/?$ /wp-feed.php?feed=$1&withcomments=1 [QSA]

jdMorgan

2:23 am on Jan 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We actually had a very similar question only yesterday... ;)

But since you want to 404 the old-style URLs instead of redirecting them to the new-style URLs, the solution is simpler. You just return a 410-Gone status for HTTP/1.1 or extended HTTP/1.0 requests. If you're on a unique IP address, then you can rewrite HTTP/1.0 requests to a filepath that does not exist, and let your standard 404 handler take care of them.
(True HTTP/1.0 clients don't understand the 'new-for-HTTP/1.1' 410-Gone response, but you won't get any true HTTP/1.0 requests if you're on a name-based virtual server, because HTTP/1.0 doesn't support name-based servers, either.)


# Handle HTTP/1.1 or enhanced HTTP/1.0 client requests for index.php using 410-Gone response
RewriteCond %{HTTP_HOST} .
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php$ - [G]
#
# Handle true HTTP/1.0 requests using 404 response
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php$ /some_file_path_that_does_not_exist [L]

You can also concatenate the first two RewriteConds if you like:

RewriteCond %{HTTP_HOST}<>%{QUERY_STRING} .<>.

Jim

CainIV

6:06 am on Jan 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks JD. I tried both sets of code, and both worked on anything accessed at index.php.

However, the issue is that the way the pages are rewritten is as follows:

[mysite.com...]

This htaccess also blocks access to this page, not just the index.php? requests.

Any workaround for this?

All of the pages are rewritten like the above example.

Thanks for your time

Todd

extras

3:01 pm on Jan 30, 2006 (gmt 0)

10+ Year Member



You need to match against %{THE_REQUEST}.
Otherwise, the rule can also match URLs that have already been rewritten by your other RewriteRules, producing unwanted results or even errors.
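
To illustrate the THE_REQUEST idea, here is a minimal sketch, assuming mod_rewrite in a per-directory .htaccess file and assuming the rule is placed before the other RewriteRules. The exact pattern is an assumption and would need testing against the real site:

```apache
# Sketch only: answer 410 Gone when the client itself asked for index.php with a query string.
# THE_REQUEST holds the original request line, for example
#   GET /index.php?s=20034&paged=some-file HTTP/1.1
# and it is not altered by internal rewrites, so a URL such as
# /index.php/category/some-file.html that is rewritten internally to
# /index.php?category_name=... does not satisfy this condition.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?
RewriteRule ^index\.php$ - [G]
```

The [G] flag ends rule processing for the request, so no [L] is needed on that rule.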

PS.

I noticed that there is no [L] flag in your rules.
Adding it (e.g. [L,QSA]) will make them more efficient.
Your current rule set isn't optimal at all.
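
As an illustration of the [L] suggestion, here is one of the feed rules from the original post with the flag added. This is only a sketch; the other rules would be changed the same way:

```apache
# [L] stops the current pass through the rule set once this rule matches,
# so the remaining RewriteRules are not tested against the rewritten URL.
RewriteRule ^feed/?([_0-9a-z-]+)?/?$ /wp-feed.php?feed=$1 [QSA,L]
```

Note that in .htaccess context mod_rewrite runs the rule set again on the rewritten URL, so [L] ends only the current pass rather than all processing.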

[edited by: extras at 3:04 pm (utc) on Jan. 30, 2006]

jdMorgan

3:03 pm on Jan 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The solution may be as simple as removing the end-anchors from the 'index\.php' patterns:

# Handle HTTP/1.1 or enhanced HTTP/1.0 client requests for index.php using 410-Gone response
RewriteCond %{HTTP_HOST} .
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php - [G]
#
# Handle true HTTP/1.0 requests using 404 response
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php /some_file_path_that_does_not_exist [L]

For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com], particularly, the material related to regular-expressions.

Jim