Using Rewrite, getting 404s - Apache Web Server forum at WebmasterWorld

Forum Moderators: phranque


Using Rewrite, getting 404s

no www rewrite causing 404s?

braiiins

4:24 pm on May 30, 2008 (gmt 0)

10+ Year Member

I have been using a no-www rewrite rule in my .htaccess which usually works. Occasionally, though, there seem to be 404s (I have my 404s set to email me upon every error). I'm not sure if it is the rewrite code, but I can't imagine anything else causing it.

For example, today I got one saying, "A user tried to go to http://example.com/local-honey-allergy-remedy/images/printericon.gif and received a 404 (page not found) error. It wasn't their fault, so try fixing it." It shouldn't even be trying to load from /images, as my CSS file is set to load it from /wp-content/themes/organic/images. When the 404 errors come in, they come in 10-15 at a time, so I assume it is from one viewer.
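The stray path in that email is what you get when a relative image reference is resolved against the wrong base URL. Browsers resolve url(...) references inside a stylesheet against the stylesheet's own URL, so only a client that resolves them against the page URL would ask for a directory under the post. A minimal sketch with Python's urllib.parse.urljoin, assuming (the thread does not show the actual CSS) that the stylesheet lives at a hypothetical .../organic/style.css and uses the relative reference images/printericon.gif:

```python
from urllib.parse import urljoin

# Hypothetical values: the real stylesheet name and CSS rule are not shown in the thread.
STYLESHEET = "http://example.com/wp-content/themes/organic/style.css"
POST_PAGE = "http://example.com/local-honey-allergy-remedy/"
REFERENCE = "images/printericon.gif"

# A conforming browser resolves url(...) in a stylesheet against the stylesheet's URL.
correct = urljoin(STYLESHEET, REFERENCE)

# A client that wrongly resolves the same reference against the page URL
# requests a path that does not exist on the server.
wrong = urljoin(POST_PAGE, REFERENCE)

print(correct)  # http://example.com/wp-content/themes/organic/images/printericon.gif
print(wrong)    # http://example.com/local-honey-allergy-remedy/images/printericon.gif
```

The second result is exactly the URL in the 404 notification, which is what you would expect from a client that mishandles relative URLs rather than from the rewrite rule.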

A weird one I got yesterday was trying to load "http://example.com/http://example.com/wp-comments-post.php" See the problem there?

Here is what I have in my .htaccess file (I've removed sections I know definitely are not causing the problem):


php_flag register_globals off
 
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^example.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
 
ErrorDocument 404 /index.php?error=404
 
Options All -Indexes

[edited by: jdMorgan at 3:32 am (utc) on May 31, 2008]
[edit reason] No URLs, please. See Terms of Service. [/edit]

jdMorgan

4:56 pm on May 30, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

"A user tried to go to http://example.com/local-honey-allergy-remedy/images/printericon.gif and received a 404 (page not found) error. It wasn't their fault, so try fixing it."

I hope you're not paying attention to the "it wasn't their fault" part of this message...

There's nothing in the code you posted that is going to work "only some of the time." The code is correct in all ways that matter, and will either always work or never work - depending only on whether mod_rewrite is enabled on this server.

The only minor errors I see are a missing escape on the literal period in the hostname pattern, an unnecessary RewriteBase directive, and two regex tokens that could be removed. The following RewriteRule should function identically to what you have there:


RewriteEngine on
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/$1 [L,R=301]
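To see what the escape changes, the two hostname patterns can be run through Python's re module, which behaves like Apache's PCRE for patterns this simple. The unescaped period matches any character, so a bogus Host header such as exampleXcom would pass the condition; the escaped version matches only a literal dot. The sketch also shows that the unanchored (.*) captures the whole path just as ^(.*)$ does. The function names are hypothetical, [NC] is modelled with re.IGNORECASE, and paths are given without a leading slash because that is how a RewriteRule sees them in .htaccess.

```python
import re

# Hostname patterns from the thread; [NC] is modelled with re.IGNORECASE.
UNESCAPED = re.compile(r"^example.com$", re.IGNORECASE)   # "." matches any character
ESCAPED = re.compile(r"^example\.com$", re.IGNORECASE)    # "\." matches only a dot


def needs_redirect(host: str, pattern: re.Pattern) -> bool:
    """Mirror 'RewriteCond %{HTTP_HOST} !pattern [NC]': redirect when the host does NOT match."""
    return pattern.search(host) is None


def redirect_target(path: str) -> str:
    """Mirror 'RewriteRule (.*) http://example.com/$1' in .htaccess context,
    where the leading slash has already been stripped from the path."""
    m = re.search(r"(.*)", path)  # unanchored but greedy: captures the whole path
    return "http://example.com/" + m.group(1)


for host in ["example.com", "EXAMPLE.COM", "www.example.com", "exampleXcom"]:
    print(host, needs_redirect(host, UNESCAPED), needs_redirect(host, ESCAPED))
print(redirect_target("local-honey-allergy-remedy/"))
```

Only the exampleXcom row differs between the two patterns, which is why the missing escape is a minor issue rather than the cause of the 404s.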

But none of this makes any difference to your problem.

It is more likely that these requests are coming from badly-coded scraper robots or from users trying to 'explore' the directory structure of your site, perhaps to find your image library for a mass-copy operation. A look at your server's raw log files and analysis of those users' transactions should determine whether this is the case.

If it's not malicious/suspicious user activity, then you may have a problem with WP or its database.

Jim

braiiins

5:39 pm on May 30, 2008 (gmt 0)

10+ Year Member

Ha, yes, I know it's not their fault. When I said "I assume it is from one viewer," I meant that only one viewer at a time is getting the error messages. I don't think that it is malicious (well, I hope not), just because the errors that come in at one time are all from the same page (the ones that came in with the first one I listed were all from the same post, all trying to load images from an images directory that does not exist in the root).

The logs do show bots, but the ones I recognize as bots seem legit (GoogleBot, Yahoo Slurp, HostTracker). There are some logs I'm not sure of:

84.16.233.*** - - [30/May/2008:05:16:47 -0700] "GET /local-honey-allergy-remedy/trackback/ HTTP/1.1" 302 5 "-" "-"

There are a few like that one, with no referrer and no user-agent.

And here's the weirdest thing: my error log is showing a 404 for pretty much everything (although the viewer is not getting an error, and I am not getting the error emails)... "[Fri May 30 10:36:37 2008] [error] [client 69.89.31.210] File does not exist: /home/focusorg/public_html/green-car-insurance/"

I know for a fact it does exist (it is a post I made just a few minutes ago), as does pretty much everything that is showing up in the error log...

I feel like I screwed something up big time and it's worrying me. I did move my WP install from /wp to the root, but I followed the instructions on the WP support site.

[edited by: jdMorgan at 5:59 pm (utc) on May 30, 2008]
[edit reason] Obscured possibly-private IP address. [/edit]

braiiins

5:47 pm on May 30, 2008 (gmt 0)

10+ Year Member

Here is the user-agent of the visitor who got the most recent batch of 404s:

Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en; rv:1.8.1.14) Gecko/20080409 Camino/1.6 (like Firefox/2.0.0.14)

jdMorgan

5:58 pm on May 30, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

That 84.16.233.x request in your access log is from a German ISP that is often used by scrapers.
The 69.89.31.x request in your error log is from a Web server, not a user.

Look at the URL-paths in your access log, and correlate them with the filepaths in your error log. Are both the URL-path and the filepath 100% correct? If so, then you've got a malfunctioning server.
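One way to do that correlation without eyeballing two files is a small script that pairs each "File does not exist" line from the error log with the access-log request from the same client in the same second, then checks whether the filepath Apache looked for equals the document root plus the requested URL-path. This is a minimal sketch, assuming Apache's default combined access-log format, the error-log line format shown in this thread, and that both logs use server local time (the access log's UTC offset is dropped so the timestamps compare). DOCROOT and the function names are illustrative; the sample lines are the ones the poster gives in the next reply.

```python
import re
from datetime import datetime

DOCROOT = "/home/focusorg/public_html"  # from the thread; adjust for your server

ACCESS_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) ')
ERROR_RE = re.compile(r'^\[([^\]]+)\] \[error\] \[client ([^\]]+)\] File does not exist: (.+)$')


def parse_access(line):
    m = ACCESS_RE.match(line)
    if not m:
        return None
    ip, stamp, _method, path, status = m.groups()
    # Access-log stamps carry a UTC offset; error-log stamps do not, so drop it.
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z").replace(tzinfo=None)
    return {"ip": ip, "when": when, "path": path, "status": int(status)}


def parse_error(line):
    m = ERROR_RE.match(line)
    if not m:
        return None
    stamp, ip, filepath = m.groups()
    when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return {"ip": ip, "when": when, "filepath": filepath}


def correlate(access_lines, error_lines):
    """Return (url_path, status, error_filepath, filepath_matches_url) tuples."""
    requests = [r for r in map(parse_access, access_lines) if r]
    results = []
    for err in filter(None, map(parse_error, error_lines)):
        for req in requests:
            if req["ip"] == err["ip"] and req["when"] == err["when"]:
                expected = DOCROOT + req["path"]
                results.append((req["path"], req["status"], err["filepath"],
                                expected == err["filepath"]))
    return results


access = ['65.83.***.** - - [30/May/2008:10:04:39 -0700] "GET /strange-ecofriendly-products/ HTTP/1.1" '
          '404 33141 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"']
errors = ['[Fri May 30 10:04:39 2008] [error] [client 65.83.***.**] '
          'File does not exist: /home/focusorg/public_html/strange-ecofriendly-products/']

for row in correlate(access, errors):
    print(row)
```

A True in the last field means the path Apache looked for is exactly the requested URL-path under the document root, so the URL-to-filepath mapping itself is not what is wrong.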

Posting uncorrelated access/error log entries here doesn't help much. For example, why was the allergy remedy request redirected using a 302-Found response? In general, you NEVER want to use a 302 redirect, unless the URL has changed only temporarily and will be restored soon. But a 302 redirect is often the result of poorly-implemented code, and so they're common.

I don't have much else to recommend to you, except to say that you should keep digging until you find the problem. Failing that, it might be time to back up your site, uninstall WP and everything related to it, and then reinstall clean, restoring your posts from your backup after testing to make sure this weird behavior has been fixed.

In the future, once established, don't ever change domains or change URLs unless there is a lawyer with a court order forcing you to do it. I mean this quite literally.

Jim

[edited by: jdMorgan at 6:02 pm (utc) on May 30, 2008]

braiiins

6:30 pm on May 30, 2008 (gmt 0)

10+ Year Member

Honestly, I don't know why it is using a 302 redirect. I have never set up a 302 redirect manually. I do notice that most of the 302s are when accessing my feed or on the trackback links for posts. Don't know if that means anything.

The paths all seem correct. Here is one corresponding error from the error log and from the access log:

[Fri May 30 10:04:39 2008] [error] [client 65.83.***.**] File does not exist: /home/focusorg/public_html/strange-ecofriendly-products/

65.83.***.** - - [30/May/2008:10:04:39 -0700] "GET /strange-ecofriendly-products/ HTTP/1.1" 404 33141 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

I apologize for not providing enough information; this is all still pretty new to me (as I'm sure you hear a lot!). I would understand if this problem were happening all of the time; it just boggles my mind that it only happens occasionally.

Oh, and thank you for your help thus far, I really appreciate it.

jdMorgan

7:50 pm on May 30, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

So is "/home/focusorg/public_html/strange-ecofriendly-products/" the absolutely-correct filepath -- as perhaps visible using FTP or the "File manager" on your server? This is a directory listing of the directory /strange-ecofriendly-products/, requested with the URL "example.com/strange-ecofriendly-products/" right?

Jim

braiiins

11:29 pm on May 30, 2008 (gmt 0)

10+ Year Member

strange-ecofriendly-products is the path to a post, and since WP is in the root, /home/focusorg/public_html/strange-ecofriendly-products is the right path. I double-checked with my file manager through cPanel.

Every post that gets loaded is logging a 404 error in the error log without giving the viewer any errors.

jdMorgan

11:41 pm on May 30, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Then it is likely that the problem is not the code in your .htaccess file, but rather code in the WP path, or the WP script itself.

Jim

braiiins

11:56 pm on May 30, 2008 (gmt 0)

10+ Year Member

Any idea why it is happening now, when I went a few days with no errors? Or what I should do to fix it?

braiiins

1:26 am on May 31, 2008 (gmt 0)

10+ Year Member

Just got confirmation from someone that trying to load my latest post resulted in a 404 on their side. I'm seriously considering backing everything up and starting from scratch, but I'm still looking for possible easier answers...

braiiins

1:54 am on May 31, 2008 (gmt 0)

10+ Year Member

OK, so I added back the code that the Permalinks options section of the WP admin wanted in my .htaccess:


<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
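Read as a decision procedure, this block says: if the requested name is neither an existing file (!-f) nor an existing directory (!-d), internally rewrite the request to /index.php and let WordPress work out which post was asked for. The pattern "." needs at least one character, so a request for the site root is left alone for DirectoryIndex to handle. The function below is a rough, hypothetical model of that logic using a temporary directory as the document root; it does not reproduce every detail of how Apache builds REQUEST_FILENAME. Without rules like these, a pretty permalink such as /strange-ecofriendly-products/ has no matching file or directory on disk, which is consistent with the "File does not exist" entries stopping once this block was restored.

```python
import os
import tempfile


def wp_rewrite(docroot: str, url_path: str) -> str:
    """Rough model of:
        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule . /index.php [L]
    url_path is the per-directory path the rule sees (no leading slash)."""
    if not url_path:              # "." needs at least one character, so "" is untouched
        return url_path
    filename = os.path.join(docroot, url_path)
    if os.path.isfile(filename) or os.path.isdir(filename):
        return url_path           # real file or directory: served as-is
    return "/index.php"           # anything else is handed to WordPress


with tempfile.TemporaryDirectory() as root:
    theme = os.path.join(root, "wp-content", "themes", "organic")
    os.makedirs(os.path.join(theme, "images"))
    open(os.path.join(theme, "style.css"), "w").close()
    print(wp_rewrite(root, "wp-content/themes/organic/style.css"))  # existing file
    print(wp_rewrite(root, "wp-content/themes/organic/images"))     # existing directory
    print(wp_rewrite(root, "strange-ecofriendly-products/"))        # permalink -> /index.php
    print(wp_rewrite(root, ""))                                     # site root, unchanged
```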

This stopped the constant 404 entries in the error log every time someone loads a post, but now it's back to giving me errors for files that aren't even loaded from my site. Example: "A user tried to go to http://example.com/eco-blog-carnival-volume/%slogolink.js and received a 404 (page not found) error. It wasn't their fault, so try fixing it." This is coming from a JavaScript file loaded from Blog Carnival, and this is the code to load it:


<script type="text/javascript" src="http://example.com/bc/logolink_20650.js"></script>

This 404 doesn't get logged in the error log, but it does get logged as a 404 in the access log.

jdMorgan

2:24 am on Jun 1, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

So where is the "%s" in this string coming from? http://example.com/eco-blog-carnival-volume/%slogolink.js
... and why is the "bc/" path-part missing?

I'd call that a broken link, and return a 404, myself!

That's either a broken or malicious client, there.

Jim

braiiins

3:08 am on Jun 1, 2008 (gmt 0)

10+ Year Member

I have NO idea where it is coming from. The only time that JS is being called is with the script code I showed in my last post. What's odd is that, in spite of the 404, the JS is running the way it is supposed to. When I removed the IfModule rewrite code, it stopped returning 404s on that javascript code.

Nuts to this. I'm just going to remove it, it's just the rating thing for Blog Carnivals, it's not necessary!

Thank you for your help!

braiiins

4:51 pm on Jun 21, 2008 (gmt 0)

10+ Year Member

I hate to beat a dead horse, but I am having problems... again. I didn't want to start a new thread. If I should, let me know.

The 404s stopped for a while (and are not showing in my error log), but I am getting the notifications in my email again.

The access logs are showing things like this:

[21/Jun/2008:09:20:42 -0700] "GET /images/clouds.jpg HTTP/1.1" 404 12510 "http://example.com/ecofriendly-friday-tips-volume/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"

It is trying to call images from the root that are not in the root (for example, this image is supposed to be /wp-content/themes/organic/images/clouds.jpg, not /images/clouds.jpg). It is also dropping the start of the full Google Analytics address and requesting the rest as a path under the post's URL (like this: http://example.com/ecofriendly-friday-tips-volume/google-analytics.com/ga.js )
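If the page uses the standard ga.js tracking snippet of that era, the script URL is assembled in JavaScript from a prefix ("http://www." or "https://ssl.") plus the string "google-analytics.com/ga.js". A client that scrapes that string out of the page without running the script sees something with no scheme and no leading "//", and URL resolution treats that as a relative path under the current page. A minimal sketch with urllib.parse.urljoin, assuming the http://www. form of the address:

```python
from urllib.parse import urljoin

PAGE = "http://example.com/ecofriendly-friday-tips-volume/"

# Full address (assumed http://www. form): absolute, so the page URL is ignored.
full = urljoin(PAGE, "http://www.google-analytics.com/ga.js")

# Scheme and "www." dropped: no scheme and no leading "//", so it becomes
# a relative path under the post's URL.
stripped = urljoin(PAGE, "google-analytics.com/ga.js")

# Protocol-relative form, shown only for contrast: the host is kept.
proto_relative = urljoin(PAGE, "//www.google-analytics.com/ga.js")

print(full)            # http://www.google-analytics.com/ga.js
print(stripped)        # http://example.com/ecofriendly-friday-tips-volume/google-analytics.com/ga.js
print(proto_relative)  # http://www.google-analytics.com/ga.js
```

Only the middle case reproduces the URL in the access log, which fits a client that mishandles the page rather than a problem in .htaccess.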

All the 404s I am notified of seem to have GET in them. Here is one that has me completely lost:

http://example.com/eco-blog-carnival-volume/++GET+http://example.com/eco-blog-carnival-volume/+%5B0,21110,34683%5D+-%3E+

which coincides with this in the access log:

[21/Jun/2008:06:53:31 -0700] "GET /eco-blog-carnival-volume/++GET+http://example.com/eco-blog-carnival-volume/+%5B0,21110,34683%5D+-%3E+ HTTP/1.0" 404 27767 "http://example.com/eco-blog-carnival-volume/++GET+http://example.com/eco-blog-carnival-volume/+%5B0,21110,34683%5D+-%3E+" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MyIE2; Deepnet Explorer)"

It happens once a day.
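Decoding that path makes it easier to read. Treating "+" as a space and expanding the %XX escapes shows a path that contains what looks like a line of some program's own request log: a method, an absolute URL, a bracketed list of numbers, and an arrow. The referrer is the same nonexistent URL. The sketch below decodes it with urllib.parse.unquote_plus and adds a few purely hypothetical heuristics (the function looks_broken and its rules are illustrations, not any standard) for flagging requests like this one, and the earlier %slogolink.js one, when reviewing logs.

```python
from urllib.parse import unquote_plus

# The request path from the access-log line above.
raw = ("/eco-blog-carnival-volume/++GET+http://example.com/eco-blog-carnival-volume/"
       "+%5B0,21110,34683%5D+-%3E+")

# "+" becomes a space and %5B, %5D, %3E become "[", "]", ">".
decoded = unquote_plus(raw)
print(repr(decoded))


def looks_broken(path: str) -> list:
    """Hypothetical log-review heuristics; returns the reasons a path looks suspicious."""
    reasons = []
    text = unquote_plus(path)
    if "http://" in text or "https://" in text:
        reasons.append("absolute URL embedded in the path")
    if " GET " in text or " POST " in text:
        reasons.append("HTTP method embedded in the path")
    if "%s" in path:
        reasons.append("literal %s in the path")
    return reasons


print(looks_broken(raw))
print(looks_broken("/eco-blog-carnival-volume/%slogolink.js"))
print(looks_broken("/strange-ecofriendly-products/"))
```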

jdMorgan

5:34 pm on Jun 21, 2008 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

As I previously stated:

That's either a broken or malicious client, there.

Jim