Forum Moderators: Ocean10000 & phranque

RewriteRule Problem

Trying...And Failing...to Redirect an Entire Directory

     
5:12 am on Nov 13, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 24, 2002
posts:512
votes: 5


Hi,

I need to redirect the contents of an entire directory back to the home page of the website. The code snippet below is what I came up with.


RewriteCond %{HTTP_HOST} ^domain\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.domain\.com$
RewriteRule ^ugly-folder-name\/?(.*)$ "http\:\/\/www\.domain\.com" [R=301,L]


While the actual redirect works, unfortunately the final URL isn't correct. Once the redirection happens, I end up with the home page having a final URL like this:

www.domain.com/page-title-of-redirected-page.php

Basically, what seems to be happening is that the URL of the page being redirected is being "added on" to the end of the domain.com (which would essentially lead to 1000's of copies of my home page).

Any ideas on what I'm doing wrong?

Jim
6:16 am on Nov 13, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


Not enough information. In random order:
> While the actual redirect works

The one thing I'm sure of is that the redirect is not happening because of the quoted rule. Some other rule somewhere else is creating a different redirect. Do you have any existing redirects using mod_alias (Redirect by that name)? It sure looks like a mod_alias pattern.
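For illustration only, here is a sketch of the kind of mod_alias line that would produce the symptom described. It is hypothetical, not taken from Jim's file; the folder and host names are just the placeholders used earlier in the thread. Redirect matches on a path prefix and appends whatever follows the prefix to the target, which is how a page name can end up glued onto the home-page URL.

```apache
# Hypothetical mod_alias rule (an assumption, not from the actual .htaccess).
# Redirect matches by path prefix and appends the remainder of the
# requested path to the target URL.
Redirect 301 /ugly-folder-name http://www.domain.com

# A request for   /ugly-folder-name/some-page.php
# is redirected to http://www.domain.com/some-page.php
```

If a line like this turns up elsewhere in the file, that would explain why the final URL carries the old page name.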

What's the %{HTTP_HOST} for? Are there other domains or subdomains sharing the same htaccess?

> "http\:\/\/www\.domain\.com"

Neither the quotation marks nor the escapes are necessary. (In a target, neither one is ever necessary.)

> I need to redirect the contents of an entire directory back to the home page of the website.

This is a use of the word "need" that I'm not familiar with. Most people in this situation would serve up a 410 with a nice custom 410 page, possibly one that's specific to the now-deleted directory. (As long as the directory itself exists, it can have its own htaccess file specifying its own ErrorDocuments. Doesn't matter what-if-anything else is present or not present in the directory.)

> (which would essentially lead to 1000's of copies of my home page).

I don't understand how this follows. Is there a CMS involved?

Can we assume Apache 2.2? If 2.4, we need to check a few more inheritance issues.

Edit: Is there a connection between this redirect and the problem discussed here [webmasterworld.com]? Awright, come clean now...
6:33 am on Nov 13, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 24, 2002
posts:512
votes: 5


> Edit: Is there a connection between this redirect and the problem discussed here [webmasterworld.com]? Awright, come clean now...

No relation. Sorry. Instead, it's an old directory of very outdated pages that are best put to death. I actually have several directories of old pages that just "need to go."
6:38 am on Nov 13, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 24, 2002
posts:512
votes: 5


> This is a use of the word "need" that I'm not familiar with. Most people in this situation would serve up a 410 with a nice custom 410 page, possibly one that's specific to the now-deleted directory. (As long as the directory itself exists, it can have its own htaccess file specifying its own ErrorDocuments. Doesn't matter what-if-anything else is present or not present in the directory.)

That is a fine idea. I've never used 410's before, so didn't know anything about them until now. But since these pages are "gone for good," using a 410 makes absolute sense.

Is this all that's needed for 410's? And should I put this code in a separate .htaccess file inside the folder? Or should I put it in the main .htaccess file?

Redirect gone /folder-name/
ErrorDocument 410 default
12:29 pm on Nov 13, 2015 (gmt 0)

Full Member

Top Contributors Of The Month

joined:Apr 11, 2015
posts: 328
votes: 24


Redirect gone /folder-name/
ErrorDocument 410 default


Well, sort of.

If you specify "default" for the ErrorDocument then you are only going to return the server's default error document, not the nice custom error document that lucy24 mentions. Presumably the reason for the redirect in the first place was to try to keep users on the site? If so, the error document should contain whatever information is needed to do that.
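A minimal sketch of the custom-page variant, assuming a page called /gone.html (a made-up name). The page presumably needs to live outside the folder being marked gone; otherwise the request for the error page would itself hit the 410 rule.

```apache
# Everything under /folder-name/ answers with 410 Gone.
Redirect gone /folder-name/

# Serve a custom page with the 410 instead of the server default.
# /gone.html is a hypothetical file in the site root, outside /folder-name/.
ErrorDocument 410 /gone.html
```

The target is given as a local path starting with a slash so that the 410 status is kept; a full http:// URL here would turn the response into a redirect.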

If you are using mod_rewrite (i.e. RewriteRule) anywhere else, then you should probably use it here too, rather than mod_alias (Redirect). mod_rewrite runs first, so if you have a conflicting rule anywhere, the Redirect might not work, and that will be confusing and possibly hard to debug.

This Redirect also includes a trailing slash, whereas your original RewriteRule did not (well, it was optional).

> And should I put this code in a separate .htaccess file inside the folder?


If this folder still exists then it does make sense (and is possibly "easier") to have a separate .htaccess file in the subfolder for this sort of thing. Especially if this is a separate 410 document just for this subfolder. However, having multiple .htaccess files can make a system more complex and harder to debug - just something to be aware of.

RewriteCond %{HTTP_HOST} ^domain\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.domain\.com$
RewriteRule ^ugly-folder-name\/?(.*)$ "http\:\/\/www\.domain\.com" [R=301,L]


This looks like the sort of carbuncle that cPanel generates?
6:59 pm on Nov 13, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


> the sort of carbuncle

memo to self: memorize this phrase.

Jim, if the htaccess file is only used by one domain, then you don't need a RewriteCond looking at HTTP_HOST at all. If you need to exclude other domains and/or subdomains use the single line
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com
The only reason the www. is optional is that this rule comes before the domain-name-canonicalization redirect. Otherwise you'd specify one or the other. No need for closing anchor except in the rare case where you've also got an exact-same-name.com.tld in the same place.

Edit: You could attach [NC] to the RewriteCond, but only if you particularly want to send the 410 to the rare unwelcome robot who asks for EXAMPLE.COM
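Put together with the earlier note about quotes and escapes, a cleaned-up version of the original redirect might look like the sketch below. The (/|$) at the end of the pattern is an addition of this sketch, not something from the posts above; it is there so the rule does not also catch a differently named folder that merely starts with the same text.

```apache
# Only needed if other hostnames share this .htaccess file.
RewriteCond %{HTTP_HOST} ^(www\.)?example\.com
# No quotes or backslashes in the target. The (/|$) keeps the pattern
# from also matching names such as ugly-folder-name-2.
RewriteRule ^ugly-folder-name(/|$) http://www.example.com/ [R=301,L]
```

Nothing is captured and nothing is appended, so every request under the folder lands on the bare home-page URL.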
2:03 am on Nov 14, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 24, 2002
posts:512
votes: 5


> the sort of carbuncle


That is a cool word.

Maybe I need to explain my problem further.

Visitors don't really access these pages. The pages have long been gone from Google's index. And there are few (if any) links that point to these old pages from my site. In short, "real people" don't attempt to visit the pages anymore.

Instead, the various scraper sites and such link to these older pages, and thus Googlebot follows the links from the scraper sites to the now non-existent pages on my site (I deleted these pages months ago). This has led to more than 1,500 404 errors showing up in the Webmaster Tools info for my site.

Basically, I'm trying to "clean up" the mess in my Webmaster Tools area, to make it easier to find "real problems" should they arise.

My .htaccess file, however, is a bit of a mess, too (with a size of about 110k). It works fine (I'm very careful editing it), but over 15 years it has grown as I've converted pages from .html to .shtm then finally to .php, as well as having added/deleted pages during this time.

This has resulted in many 301 redirects.

RewriteCond %{HTTP_HOST} ^domain\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.domain\.com$
RewriteRule ^ugly-folder-name\/?(.*)$ "http\:\/\/www\.domain\.com" [R=301,L]

> This looks like the sort of carbuncle that cPanel generates?

Yeah, precisely...which is why I modify the .htaccess file now by hand instead of through cPanel.

However, the example above is used in my .htaccess file (put on the very last line) to redirect all traffic (people and bots) from the directory that held my old vBulletin forum to my new Xen forum. My webhost helped me set it up. They inserted that particular redirect and everything works fine.

Since that redirect worked fine for redirecting traffic from my old forum to the new forum, I "assumed" I could copy and then modify that line to make it work for the other directories that held these old, dead pages. But alas...it hasn't proved that simple.
2:17 am on Nov 14, 2015 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 24, 2002
posts:512
votes: 5


I thought it might be helpful to include parts of my messy .htaccess file. So here it is:


RewriteOptions inherit
Options +Includes
RewriteEngine on
# -FrontPage-

# Re-write non-www to www
RewriteEngine On
RewriteCond %{HTTP_HOST} ^mydomain.com
RewriteRule (.*) http://www.mydomain.com/$1 [R=301,L]

IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
#The next line modified by DenyIP
order allow,deny
#The next line modified by DenyIP
#deny from all
allow from all
</Limit>
<Limit PUT DELETE>
order deny,allow
deny from all
</Limit>
AuthName www.mydomain.com
AuthUserFile /home/bigsky/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/bigsky/public_html/_vti_pvt/service.grp

AddType image/svg+xml svg svgz
AddEncoding gzip svgz

# Begin Cache Control

Header unset Pragma
FileETag None
Header unset ETag

# cache images/pdf/css docs for 1 Month
<FilesMatch "\.(ico|pdf|jpg|jpeg|png|gif|svg|css)$">
Header set Cache-Control "max-age=2629000, public, must-revalidate"
Header unset Last-Modified
</FilesMatch>

# cache html/htm/xml/txt files for 2 Days
<FilesMatch "\.(xml|txt|xsl|js|woff)$">
Header set Cache-Control "max-age=172800, must-revalidate"
</FilesMatch>

#End Cache Control

# compress text, html, javascript, css, xml:
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/htm
AddOutputFilterByType DEFLATE text/shtm
AddOutputFilterByType DEFLATE text/php
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript
#End Compression

(------ A Whole Pile of 301 Redirects Go Here ---------)
(and the last line of my .htaccess file is below)

RewriteCond %{HTTP_HOST} ^mydomain\.com$ [OR]
RewriteCond %{HTTP_HOST} ^www\.mydomain\.com$
RewriteRule ^phpbb2\/?(.*)$ "http\:\/\/www\.mydomain\.com\/xen\/" [R=301,L]
5:15 am on Nov 14, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15932
votes: 887


Pro tip: Google stops crawling a lot faster if you return a 410 response. (Other search engines don't seem to care.) If no humans visit the page, you may not need to bother about a nice 410 page-- assuming nobody really follows those outdated links. It's well to make sure, because the Apache-default 410 page is even more intimidating than the Apache-default 404 page. In fact, one easy copout solution is

ErrorDocument 410 /missing.html

where "missing.html" means whatever (physical) document you currently use for your 404 page. Humans are not likely to care whether the page used to exist or not if it's already been gone for a long time.

Assuming you have other RewriteRules (and please do convert anything that currently uses mod_alias!) the overall order is:
-- escape clauses (things like your 403 page or robots.txt that have to be accessible by everyone)
-- access control (rules in [F])
-- 410's if any (rules in [G])
-- external redirects (rules in [R])
-- internal rewrites (rules in [L] or nothing)

So you can see where your ex-directory goes:

RewriteRule ^old-discontinued-directory - [G]
after access control, before any existing redirects. The [G] flag means 410, like [F] means 403. It carries an implied L. No need for a closing anchor; you're serving up an all-encompassing 410 for everything in the directory.
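As a rough sketch of that ordering in one file: only the 410 rule and the non-www redirect come from this thread. The user-agent name, the article rewrite, and the file names in the first rule are invented placeholders to show where each kind of rule would sit.

```apache
RewriteEngine On

# 1. Escape clauses: files everyone must be able to reach.
RewriteRule ^(robots\.txt|missing\.html)$ - [L]

# 2. Access control ([F] = 403). "BadBot" is a placeholder name.
RewriteCond %{HTTP_USER_AGENT} BadBot
RewriteRule ^ - [F]

# 3. 410s ([G] = Gone, carries an implied L).
RewriteRule ^old-discontinued-directory - [G]

# 4. External redirects ([R]).
RewriteCond %{HTTP_HOST} ^mydomain\.com
RewriteRule (.*) http://www.mydomain.com/$1 [R=301,L]

# 5. Internal rewrites ([L] or nothing). Hypothetical example.
RewriteRule ^articles/([a-z-]+)$ /article.php?slug=$1 [L]
```

With the 410 ahead of the redirects, a request for the dead directory is answered before any 301 gets a chance to touch it.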

Unless you're a glutton for punishment, put all RewriteRules in the same htaccess file. (Did we establish that you're on 2.2, and/or that you're using the traditional mod_rewrite inheritance system?)