I'm in a bit of a bind with a content-management-type site that I am webmaster for.
The site URLs are all rewritten via .htaccess to this format:
index.php/category/some-file.html
which is exactly how I want them. However, some old-style URLs are still indexed; they are not helping me and are creating duplicate content along these lines:
index.php?s=20034&paged=some-file
Since I rewrite everything to static HTML, I simply want to return a 404 Not Found code for every page that is requested through this URL pattern:
index.php?
Is it possible to do this from .htaccess?
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mysite\.com
RewriteRule (.*) [mysite...] [R=301,L]
RewriteBase /
RewriteRule ^index.php/category/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed.php?category_name=$1&feed=$2 [QSA]
RewriteRule ^index.php/category/?(.*) /index.php?category_name=$1 [QSA]
RewriteRule ^index.php/author/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed.php?author_name=$1&feed=$2 [QSA]
RewriteRule ^index.php/author/?(.*) /index.php?author_name=$1 [QSA]
RewriteRule ^index.php/([0-9]{4})?-([0-9]{1,2})?-([0-9]{1,2})?/?([_0-9a-z-]+)?.html([0-9]+)?/?$ /index.php?year=$1&monthnum=$2&day=$3&name=$4&page=$5 [QSA]
RewriteRule ^index.php/([0-9]{4})?-([0-9]{1,2})?-([0-9]{1,2})/([_0-9a-z-]+)?.html/(feed|rdf|rss|rss2|atom)/?$ /wp-feed.php?year=$1&monthnum=$2&day=$3&name=$4&feed=$5 [QSA]
RewriteRule ^index.php/([0-9]{4})?-([0-9]{1,2})?-([0-9]{1,2})/([_0-9a-z-]+)?.html/trackback/?$ /wp-trackback.php?year=$1&monthnum=$2&day=$3&name=$4 [QSA]
RewriteRule ^feed/?([_0-9a-z-]+)?/?$ /wp-feed.php?feed=$1 [QSA]
RewriteRule ^comments/feed/?([_0-9a-z-]+)?/?$ /wp-feed.php?feed=$1&withcomments=1 [QSA]
But since you want to 404 the old-style URLs instead of redirecting them to the new-style URLs, the solution is simpler. You just return a 410-Gone status for HTTP/1.1 or extended HTTP/1.0 requests. If you're on a unique IP address, then you can rewrite HTTP/1.0 requests to a filepath that does not exist, and let your standard 404 handler take care of them.
(True HTTP/1.0 clients don't understand the 'new-for-HTTP/1.1' 410-Gone response, but you won't get any true HTTP/1.0 requests if you're on a name-based virtual server, because HTTP/1.0 doesn't support name-based servers, either.)
# Handle HTTP/1.1 or enhanced HTTP/1.0 client requests for index.php using 410-Gone response
RewriteCond %{HTTP_HOST} .
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php$ - [G]
#
# Handle true HTTP/1.0 requests using 404 response
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php$ /some_file_path_that_does_not_exist [L]
RewriteCond %{HTTP_HOST}<>%{QUERY_STRING} .<>.
However, the problem is that the pages are rewritten like this:
[mysite.com...]
This .htaccess blocks that page as well as the index.php? requests.
Is there any workaround for this?
All of the pages are rewritten like the above example.
Thanks for your time
Todd
PS.
I noticed that there is no [L] flag in your rules.
Adding it (e.g. [L,QSA]) will make them more efficient, because processing stops at the first rule that matches instead of testing every remaining rule.
Your current rule set isn't optimal at all.
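As a rough illustration of the [L] suggestion, here is how the two category rules from the original post might look with the flag added. This is only a sketch: the literal dot in index.php is also escaped here, which is my own tidy-up rather than something requested in the thread, and the more specific feed rule is assumed to stay ahead of the general one.

```apache
# Sketch: same category rules as in the original post, with [L] added.
# The specific feed rule must come before the catch-all category rule.
RewriteRule ^index\.php/category/(.*)/(feed|rdf|rss|rss2|atom)/?$ /wp-feed.php?category_name=$1&feed=$2 [QSA,L]
RewriteRule ^index\.php/category/?(.*) /index.php?category_name=$1 [QSA,L]
```

Note that in .htaccess context [L] only ends the current pass; after an internal rewrite the rule set is evaluated again against the new URL.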
[edited by: extras at 3:04 pm (utc) on Jan. 30, 2006]
# Handle HTTP/1.1 or enhanced HTTP/1.0 client requests for index.php using 410-Gone response
RewriteCond %{HTTP_HOST} .
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php - [G]
#
# Handle true HTTP/1.0 requests using 404 response
RewriteCond %{QUERY_STRING} .
RewriteRule ^index\.php /some_file_path_that_does_not_exist [L]
Jim
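One possible explanation for the category pages still being blocked, offered as an assumption rather than something confirmed in this thread: in .htaccess context mod_rewrite runs the rule set again after an internal rewrite, so index.php/category/some-file.html, once rewritten to /index.php?category_name=..., looks like a request for index.php with a query string and hits the [G] rule. A minimal sketch that tests the request line the client originally sent (the standard THE_REQUEST variable) instead of the current query string:

```apache
# Sketch only; place above the other RewriteRules.
# THE_REQUEST holds the original request line, for example:
#   GET /index.php?s=20034&paged=some-file HTTP/1.1
# It is not changed by internal rewrites, so URLs that were
# requested as index.php/category/... are not caught on a later pass.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?
RewriteRule ^index\.php$ - [G]
```

This version drops the HTTP/1.0 fallback for brevity; the same RewriteCond could be used in front of the rule that rewrites to a nonexistent path if a 404 is preferred.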