
Home / Forums Index / Code, Content, and Presentation / Apache Web Server
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Rewrite rule working fine, but tons of 404s
Same URLs are rewritten just fine in the browser, but the server error log shows 404s
parorrey




msg:4040644
 1:29 pm on Dec 10, 2009 (gmt 0)

Hi Jim,

here's the problem..

The rewritten URLs work fine and redirect properly where they should. The only problem is that the error log keeps growing every day with entries for the same URLs that open perfectly fine in a browser window.

example is

http://www.example.com/section

It should redirect to http://www.example.com/section/ and it does, and the page opens fine. But the server logs a 404 error hit.

Another example is http://www.example.com/section, and it redirects properly too, but it still shows 404 errors in the server logs.

What's worse, I think this specific issue has caused a drop in traffic. The cause could be something else, but the traffic drop and the error log entries for these URLs started around the same time.

P.S. It also started around the same time the website was moved to VPS hosting. Could that cause this? Or could something be missing on the VPS that was available in the shared hosting environment?

I'm at a loss here.

Thanks in advance.

[edited by: jdMorgan at 2:29 pm (utc) on Dec. 10, 2009]
[edit reason] example.com [/edit]

 

jdMorgan




msg:4040676
 2:30 pm on Dec 10, 2009 (gmt 0)

We can't likely make very useful comments without seeing a code sample.

You can verify that the 404 status that you see in your logs is accurate by using a server headers checker such as the "Live HTTP Headers" add-on for Firefox/Mozilla browsers (or similar).

Jim

parorrey




msg:4040710
 3:42 pm on Dec 10, 2009 (gmt 0)

Hi,

Just checked with the Live HTTP Headers add-on, and I can confirm that the status is 200; there's no 404 when the URLs are browsed. But the server shows some of these hits in the error log, albeit with a slightly different URL, such as one without the trailing slash or one ending with a comma, although there are no malformed URLs on the website.

here's some sample code

Options -Indexes

RewriteEngine on

#for all mal-structured urls
RewriteRule ^(.+)[^0-9a-z/]$ http://www.example.com/$1/ [NC,R=301,L]

RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_URI} !(\./$)
RewriteRule (.+) http://www.example.com/$1/ [R=301,L]

RewriteRule ^([a-z0-9\-]+)/$ /landing-page.php?%2uri=$1 [L]
RewriteRule ^(([a-z\-]+)/[a-z0-9\-]+/)$ /landing-page.php?uri=$1&tag_link=$2 [L]

RewriteRule ^(([a-z\-]+)/)([0-9]+)/$ /landing-page.php?uri=$2&lm_offset=$3 [L]
RewriteRule ^(([a-z\-]+)/[a-z0-9\-]+/)([0-9]+)/$ /landing-page.php?uri=$2&tag_link=$1&lm_offset=$3 [L]

jdMorgan




msg:4040741
 4:45 pm on Dec 10, 2009 (gmt 0)

The 'bad' URLs you cite should be handled by rule #1, which removes any trailing character except for alphanumerics or slash. As such, they should not log a 404, but should show a 301-Moved Permanently redirect status.

You're not providing a sufficient level of detail here. If URLs are 'bad,' please give a few specific examples. Also post the corresponding server access *and* server error log entries showing these same requests, and report your browser's behavior when you manually request them.

Malformed URLs of the type you describe often come from forum, blog, and wiki posts that include trailing punctuation, such as "Check out this site at forum.example.com!"... where the "linking" function of the forum/blog/wiki software does not validate the URL and remove such punctuation (Hover over this link and look at your browser's status bar to see the problem). So seeing them in your logs doesn't necessarily mean that the bad links are on your own site.

I can't see anything specific to your problem in your code, but the following problems need to be corrected:

Rule #3 should precede rule #2 to avoid a double redirect if a URL is requested from the non-canonical hostname *and* is missing a trailing slash. The rule of thumb is: put all of your external redirects first, in order from most-specific patterns and conditions to least-specific, followed by your internal rewrites, again in order from most- to least-specific. This avoids multiple stacked/chained redirects, prevents internal filepaths from being 'exposed' to the clients as URLs, and prevents unexpected rule matches and resulting malfunctions.
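
As a rough sketch of that ordering, here are the three external redirects exactly as posted earlier in the thread, only re-sequenced so that the slash-appending rule runs before the hostname rule. Nothing inside the individual rules has been changed or verified here, so this illustrates the order only, not a tested ruleset.

```apache
# External redirects first, most specific to least specific.

# Formerly rule #1: replace a trailing stray character with a slash
RewriteRule ^(.+)[^0-9a-z/]$ http://www.example.com/$1/ [NC,R=301,L]

# Formerly rule #3: append the missing trailing slash (condition copied as posted)
RewriteCond %{REQUEST_URI} !(\./$)
RewriteRule (.+) http://www.example.com/$1/ [R=301,L]

# Formerly rule #2: canonicalize the hostname
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Internal rewrites to /landing-page.php follow here, most specific first.
```

With this order, a request for example.com/section should reach the slash rule first and be sent to http://www.example.com/section/ in one step, since that rule's target already uses the canonical hostname.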

Rule #4 contains an undefined back-reference to the "%2" variable. If you want a literal "%2" in the query string, then escape the percent sign with a leading "\" and use the [NE] flag in addition to [L].
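
The two readings of rule #4 might look like the sketch below. Which one applies depends on whether the "%2" was intentional, which has not been stated, so treat these as alternatives (use only one) rather than as a confirmed fix. Alternative A is an assumption that the "%2" was simply a typo.

```apache
# Alternative A (assumption): "%2" was a typo, so drop it
RewriteRule ^([a-z0-9\-]+)/$ /landing-page.php?uri=$1 [L]

# Alternative B: a literal "%2" is wanted in the query string, so escape
# the percent sign and add [NE] so the output is not re-escaped
RewriteRule ^([a-z0-9\-]+)/$ /landing-page.php?\%2uri=$1 [NE,L]
```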

Rule #6 contains an unneeded set of parentheses, since $1 is never back-referenced. Assuming that you didn't intend to define and include "tag_link=$1" in the query passed to your script (as done in rule #7), that rule can be re-coded as
RewriteRule ^([a-z\-]+)/([0-9]+)/$ /landing-page.php?uri=$1&lm_offset=$2 [L]
for better efficiency.

Jim

parorrey




msg:4040793
 6:18 pm on Dec 10, 2009 (gmt 0)

Thanks for the corrections for rules. I've made them in my htaccess.

however, my original problem persists.

I've checked Google Webmaster Tools too: the XML sitemaps and URLs are getting indexed properly, and crawling is fine. There are a few errors there, but that's normal.

I'm running XENU at the moment and it's also reporting all 200 status messages.

A specific example of the problematic url is:

[Thu Dec 10 18:12:47 2009] [error] [client 119.73.**.97] File does not exist: /home/*account*/public_html/people-parties, referer: http://www.example.com/people-parties/

Another :

[Thu Dec 10 18:14:28 2009] [error] [client 119.73.**.97] File does not exist: /home/*account*/public_html/living-lifestyle, referer: http://www.example.com/living-lifestyle/

As you can see, both of these URLs are rewritten properly, and even a trailing comma is taken care of by the first rule, but the server still registers a 404.

If these urls are opened http://www.example.com/living-lifestyle/, they work just fine; they are cached properly in Google and crawled successfully.

Thanks for your help.

[edited by: jdMorgan at 7:04 pm (utc) on Dec. 10, 2009]
[edit reason] Obscured IP address for security. [/edit]

jdMorgan




msg:4040822
 7:05 pm on Dec 10, 2009 (gmt 0)

Please note that I asked you to post both the access log and the error log for each 'bad URL' request. I can't help efficiently without full information, and I don't have enough time or any inclination to help inefficiently.

We need to see the access log entry and the error log entry for the same request, because the access log shows the client request with the requested URL-path, HTTP method, and server status response, while the error log shows the filepath to which that URL-request was resolved. Thus, we can only get a before-and-after view of the URL-to-filepath rewrite if you post both.

Servers are complex, so every detail is important. I try very hard to ask for exactly what is needed: nothing more, nothing less. I may write a lot of words, but I try very hard to make sure that each one is important...

All I can see from the error log is that the filepath "living-lifestyle" apparently did *not* get a trailing slash appended to it by your first or hopefully-now-second rule, and as such would not have been rewritten to landing-page.php by your fourth rule. But I'm looking at a filepath there, and I need to see the requested URL, as otherwise it's impossible to tell whether that's a result of a bad or missing rule, or the interaction of two or more rules, or the interaction of mod_rewrite with one or more other Apache modules.

Jim

g1smd




msg:4040920
 8:48 pm on Dec 10, 2009 (gmt 0)

> If these URLs are opened http://www.example.com/living-lifestyle/, they work just fine

What happens if you request http://example.com/living-lifestyle/, or https://www.example.com/living-lifestyle/, instead?

parorrey




msg:4041170
 6:12 am on Dec 11, 2009 (gmt 0)

@g1smd, http://example.com/living-lifestyle/ resolves to http://www.example.com/living-lifestyle// and the page displays.

Jim, I'll check the access log and post it shortly. I really appreciate your time and help.

g1smd




msg:4041240
 9:39 am on Dec 11, 2009 (gmt 0)

You're serving Duplicate Content if the URL with // at the end 'exists'.

There are multiple problems in the rules that will need fixing.

parorrey




msg:4041375
 3:09 pm on Dec 11, 2009 (gmt 0)

By 'exists' or 'displays' I meant that it showed a custom error page, not the page that the originally intended URL was supposed to show.

Can you please suggest any corrections to the rules? Thanks.

jdMorgan




msg:4041394
 3:38 pm on Dec 11, 2009 (gmt 0)

> Can you please suggest any corrections to the rules? Thanks.

We need a solid definition of the problem first. I have asked for very specific 'data' above. Otherwise, we will likely end up giving you the right solution to the wrong problem. You have the necessary data. We do not.

Jim

 


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved