
Apache Web Server Forum

    
Simple rewrite question: file rename
devil in details
Slud




msg:1497803
 7:12 pm on May 29, 2003 (gmt 0)

I've asked this question before, but I think I've boiled it down to the fundamentals.

Goal:
Make an old URL accessible via a new URL & redirect requests for old URL to the new one.

Approach:
#redirect from old to new
Redirect /old.html [server...]
#rewrite from new to old
RewriteRule ^new\.html$ old.html [L]

Result:
Requests for old.html result in an infinite loop of 302 redirects to /new.html.

Notes:
Either line works fine by itself, but results in a loop when put together.

This seems like a *very* simple problem, but I haven't been able to find any combination of rules/flags that would achieve the goal.

 

jdMorgan




msg:1497804
 7:28 pm on May 29, 2003 (gmt 0)

Slud,

Since you have redirected old to new and new back to old, the infinite loop is the expected result.

The only line you need is this one:

Redirect 301 /old.html [server...]

-or-

RedirectPermanent /old.html [server...]

whichever you like. See Apache mod_alias [httpd.apache.org].

Jim

Slud




msg:1497805
 7:45 pm on May 29, 2003 (gmt 0)

Thanks (again) for the reply jd.

The 2nd line should be a re*write* (i.e., the contents of the "old" file are served up in response to a browser request for "new.html"). This is totally transparent to the browser. The server sends a 200 "OK", not a 302 redirect.

By itself

Redirect /old.html [server...]

results in a 404, because there is no "real" new.html; only old.html exists in the file system.
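
For the record, here are the two behaviours side by side (www.example.com is just a placeholder for the real host; combining the two for the same pair of files is exactly what produces the loop above):

# External redirect (mod_alias): the browser receives a 301/302 and itself requests the new URL
Redirect 301 /old.html http://www.example.com/new.html
# Internal rewrite (mod_rewrite): the browser requests new.html, gets a 200, and the content is served from old.html
RewriteRule ^new\.html$ old.html [L]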

jdMorgan




msg:1497806
 7:54 pm on May 29, 2003 (gmt 0)

Slud,

I'd suggest that you rename the file(s) and use only one redirect, then.

You could use the following, but any redirects done subsequent to the second rule -- by scripts, for example, will throw you into an infinite loop.


#redirect from old to new, using external redirect to force a new request, stop rewriting if rule matches.
RewriteRule ^old\.html$ http://server/new.html [R=302,L]
#internally rewrite from new to old in response to new request created by above rule.
RewriteRule ^new\.html$ /old.html [L]

I can't figure out why you want to do this, but there it is. I think renaming the actual file(s) and doing only one redirect will be much more "robust."

Jim

Slud




msg:1497807
 8:20 pm on May 29, 2003 (gmt 0)

RewriteRule ^old\.html$ [server...] [R=302,L]
RewriteRule ^new\.html$ /old.html [L]

It seems like the above would work, but it also results in an infinite redirect-request loop. (Apache 2.0.45 on Windows)

This thread [webmasterworld.com] explains some of the reason it wouldn't work out well to rename the file (or move many directories).

jdMorgan




msg:1497808
 8:49 pm on May 29, 2003 (gmt 0)

Slud,

> any redirects done subsequent to the second rule -- by scripts, for example, will throw you into an infinite loop.

Add to that any redirects done by .htaccess files in any subdirectories below the directory where this code is placed.

You might want to follow the progress of the rewrites step-wise using the WebmasterWorld server headers [webmasterworld.com] checker, and see where it goes wrong.

You might also want to check out the IS_SUBREQ environment variable for use with RewriteCond [httpd.apache.org].
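
For example, a sketch along those lines (untested; THE_REQUEST and IS_SUBREQ are standard mod_rewrite variables, and "server" is the same placeholder used above) would fire the external redirect only when the original client request asked for old.html, so the internal rewrite cannot re-trigger it:

# Only redirect when the client itself asked for old.html, never for internal subrequests
RewriteCond %{IS_SUBREQ} ^false$
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /old\.html
RewriteRule ^old\.html$ http://server/new.html [R=302,L]
# Serve the old file's content in answer to the resulting request for new.html (browser sees a 200)
RewriteRule ^new\.html$ /old.html [L]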

Jim

Slud




msg:1497809
 9:35 pm on May 29, 2003 (gmt 0)

I played with [NS], but haven't tried IS_SUBREQ. I may give that a go before I go looking for some non-Redirect/Rewrite workarounds.

Wizcrafts




msg:1497810
 5:57 pm on Jun 4, 2003 (gmt 0)

This topic is very close to a problem I am experiencing and need help to solve.

I have a series of files in a directory named "Testing" which are different file sizes and were named accordingly, i.e. 50kb_test_page.html, 100kb_test_page.html, 500kb_test_page.php, etc. They are meant to be called by links in a file in my root directory named "baudtest.html". They are not meant to be accessed directly as a point of entry into my website, are JavaScript-generated depending on the visitor's baud rate, and contain no information about the rest of my website.

I have learned that an SE problem with AOL Search has developed; AOL Search has ignored the Meta Robots="none" exclusions on the test pages (and probably indexed the files before I created a robots.txt), and has indexed all of the test files in the "Testing" directory, which is causing AOLers to try to enter my website from an invalid POE. Furthermore, I have been re-coding the test files and have renamed them, so now I am seeing 404 errors from AOL Search visitors looking for my renamed test pages.

What redirect code should I write into my .htaccess to redirect all requests for files containing "_test_page.html" to my root-directory file "baudtest.html"? The options for the old file-name prefixes are 50kb, 100kb, 250kb, 500kb, and 1mb, followed by "_test_page.html". I need to match those exact names because I still have files there with names like 250kb_test.html (without the _page) and 100kb_test_page.php, etc.

Really, I guess what I need is a line of code that will redirect any request for a file in the Testing directory that was not referred by my /baudtest.html control file straight to that file, along with the appropriate status code to make the search engine stop indexing files (and 404-ing renamed files) in the Testing directory.

TIA

jdMorgan




msg:1497811
 7:10 pm on Jun 4, 2003 (gmt 0)

Wizcrafts,

Slud's problem is one of recursion, and it is a tough one. Something is interfering with his rewrites, and causing them to be re-processed unexpectedly, leading to an infinite rewrite loop. Your problem bears only passing resemblance to his. However, you've got a tough problem too, so...

# Rule 1: permanently redirect requests for the old *_test_page.html names to /baudtest.html
RewriteRule ^Testing/.*\_test\_page\.html /baudtest.html [R=301,L]
# Rule 2: redirect Testing/ requests whose referer is present but is not baudtest.html
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://www\.yourdomain\.com/baudtest\.html$
RewriteRule ^Testing/.+ /baudtest.html [R=301,L]
# Rule 3: redirect known search-engine spiders, which rarely send a referer
RewriteCond %{HTTP_USER_AGENT} ^dloader\(NaverRobot\) [OR]
RewriteCond %{HTTP_USER_AGENT} ^(ETS\ v|ExactSeek) [OR]
RewriteCond %{HTTP_USER_AGENT} ^FAST-WebCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^Fluffy\ the\ spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Gigabot/|Googlebot) [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia\_archiver$ [OR]
RewriteCond %{HTTP_USER_AGENT} ^Lycos_Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^(MARTINI|Mercator-) [OR]
RewriteCond %{HTTP_USER_AGENT} ^MicrosoftPrototypeCrawler [OR]
RewriteCond %{HTTP_USER_AGENT} ^(NationalDirectory-WebSpider|NutchOrg) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Openfind\ data\ gatherer [OR]
RewriteCond %{HTTP_USER_AGENT} ^(polybot|Pompos) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Robozilla/ [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Scooter|Scrubby/|Seeker|SurveyBot|Szukacz) [OR]
RewriteCond %{HTTP_USER_AGENT} (Slurp|surfsafely) [OR]
RewriteCond %{HTTP_USER_AGENT} ^(Teoma|\(Teradex\ Mapper|T-H-U-N-D-E-R-S-T-O-N-E|Tulipchain) [OR]
RewriteCond %{HTTP_USER_AGENT} ^Vagabondo/ [OR]
RewriteCond %{HTTP_USER_AGENT} (Zealbot|Zyborg)
RewriteRule ^Testing/.+ /baudtest.html [R=301,L]

You'll need all three rules for best results; not all browsers - and very few search engine spiders - will provide a referer. Therefore, this ruleset accommodates the browsers, and then exempts the robots from the referer requirement. Without the first RewriteCond, you could deny access to a large number of visitors even though they had come in through baudtest.html.
However, this leaves a hole where some visitors will still be able to access your *new* test pages without a referer. There's not much you can do about that.

The complexity of the above "fix" should serve to illustrate a rule: always validate and publish your updated robots.txt before publishing any new page.

A further note on that complexity: the fact that some lines are missing [OR] flags and that some user-agent patterns do not have start-anchors "^" on them is intentional and necessary. I sure hope I didn't put a typo somewhere else in the code, but those particular omissions are intentional. If you add more spiders, be very careful!
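
On the robots.txt point, a minimal exclusion for the test directory would be enough for compliant spiders (assuming the test pages all live under /Testing/):

# robots.txt in the site root - ask compliant spiders to stay out of the test directory
User-agent: *
Disallow: /Testing/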

Your comment about AOL ignoring <meta name="robots" content="none"> is interesting, since AOL uses Google search results. AFAIK, Google handles <meta robots> tags correctly - I have never had such a problem.

Ref: Introduction to mod_rewrite [webmasterworld.com]

HTH,
Jim

Wizcrafts




msg:1497812
 7:45 pm on Jun 4, 2003 (gmt 0)

Thanks Jim, again.

I'm not sure that I need to worry about the other spiders indexing my test pages. I've only seen this problem when the search comes from AOL so far, but, as you know, I have been known to be wrong before, so I may throw that set in also.

Can I just use these rules then, and are they free-standing in my (getting rather large) .htaccess file?

RewriteRule ^Testing/.*\_test\_page\.html /baudtest.html [R=301,L]

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://www\.yourdomain\.com/baudtest\.html$
RewriteRule ^Testing/.+ /baudtest.html [R=301,L]

If so, are the multiple [L]s OK? I have a bad-bot 403 rewrite ruleset above where these codes will go, which you assisted me with, and it ends in [F] for all excluded bots.
You have mentioned somewhere that there should only be one [L] in a rewrite, so I am confused by your using it three times in your examples. Everything else makes sense to me, and thanks again for sharing the mystic knowledge.

jdMorgan




msg:1497813
 8:22 pm on Jun 4, 2003 (gmt 0)

Wizcrafts,

> You have mentioned somewhere that there should only be one [L] in a rewrite

No, not me... I have said that [F,L] and [G,L] are redundant though, since [F] and [G] both stop rewrite processing and redirect immediately, as does [L].

The [L] at the end of a rule only applies if the rule matches and the rewrite action is taken. Unless you are using multiple-step rules, where the output URL of one rule is meant to be rewritten again by a following rule, [L] should be included... which is to say, almost all the time for the applications we usually deal with here. If the output URL of a rule does not match the RewriteCond or RewriteRule patterns of any following ruleset, there is no point in having the rewrite engine process the rest of the file, so including [L] is a good idea to speed things up.
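
As a quick illustration of the multiple-step case (the patterns here are made up for the example, not taken from anyone's site):

# Step 1: no [L], so the rewritten URL falls through to the next rule in this pass
RewriteRule ^shop/(.*)$ catalog/$1
# Step 2: picks up the output of step 1; [L] then stops further rewrite processing
RewriteRule ^catalog/(.*)$ /cgi-bin/catalog.pl?item=$1 [L]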

You can use the rules I posted above stand-alone if you like, just be advised that SE spiders almost always *won't* have a referer, and so they will not be redirected, and will continue to try to spider your old test files.

The alternative is to remove the first RewriteCond, the one that allows blank-referer requests to reach the test files without requiring them to be referred by baudtest. If you do that, then perhaps 25% of your visitors won't be able to get to the test files at all; they will be repeatedly redirected to baudtest.html. Anyone coming in through a corporate or ISP proxy, or using internet security software that strips referers, will be affected.

Jim
