For example, today I got one saying, "A user tried to go to http://example.com/local-honey-allergy-remedy/images/printericon.gif and received a 404 (page not found) error. It wasn't their fault, so try fixing it." It shouldn't even be trying to load from /images, as my CSS file sets it to load from /wp-content/themes/organic/images. When the 404 errors come in, they come in 10-15 at a time, so I assume they are from one viewer.
A weird one I got yesterday was trying to load "http://example.com/http://example.com/wp-comments-post.php". See the problem there?
Here is what I have in my .htaccess file (I've removed sections I know definitely are not causing the problem):
php_flag register_globals off
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^example.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [L,R=301]
ErrorDocument 404 /index.php?error=404
Options All -Indexes
[edited by: jdMorgan at 3:32 am (utc) on May 31, 2008]
[edit reason] No URLs, please. See Terms of Service. [/edit]
I hope you're not paying attention to the "it wasn't their fault" part of this message...
There's nothing in the code you posted that is going to work "only some of the time." The code is correct in all ways that matter, and will either always work or never work - depending only on whether mod_rewrite is enabled on this server.
The only minor errors I see are a missing escape on the literal period in the hostname pattern, an unnecessary RewriteBase directive, and two regex tokens that could be removed. The following code should function identically to what you have there:
RewriteEngine on
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule (.*) http://example.com/$1 [L,R=301]
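To illustrate the missing-escape point: an unescaped period in a regular expression matches any single character, so the original pattern also accepts hostnames that merely look like example.com. The sketch below uses Python's re module, not mod_rewrite itself, so treat it as an approximation of how Apache evaluates the two hostname patterns above. The function name and the test hostnames are made up for the demonstration.

```python
import re

# Patterns copied from the two RewriteCond lines above.
# The [NC] flag is approximated here with re.IGNORECASE.
UNESCAPED = re.compile(r"^example.com$", re.IGNORECASE)
ESCAPED = re.compile(r"^example\.com$", re.IGNORECASE)


def redirect_fires(pattern, host):
    """Mimic 'RewriteCond %{HTTP_HOST} !pattern': true when the host does NOT match."""
    return pattern.search(host) is None


# Hypothetical hostnames, chosen only to show where the two patterns differ.
for host in ("example.com", "EXAMPLE.COM", "www.example.com", "exampleXcom"):
    print(host, redirect_fires(UNESCAPED, host), redirect_fires(ESCAPED, host))
```

The two patterns disagree only on the last hostname: "exampleXcom" slips past the unescaped pattern and is not redirected. That is why this is only a minor error and would not explain intermittent 404s.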
It is more likely that these requests are coming from badly-coded scraper robots or from users trying to 'explore' the directory structure of your site, perhaps to find your image library for a mass-copy operation. A look at your server's raw log files and analysis of those users' transactions should determine whether this is the case.
If it's not malicious/suspicious user activity, then you may have a problem with WP or its database.
Jim
The logs do show bots, but the ones I recognize as bots seem legit (GoogleBot, Yahoo Slurp, HostTracker). There are some logs I'm not sure of:
84.16.233.*** - - [30/May/2008:05:16:47 -0700] "GET /local-honey-allergy-remedy/trackback/ HTTP/1.1" 302 5 "-" "-"
There are a few that have no info like that one.
And it's the weirdest thing, my 404 error logs are showing an error for pretty much everything (although the viewer is not getting an error, and I am not getting the error emails)... "[Fri May 30 10:36:37 2008] [error] [client 69.89.31.210] File does not exist: /home/focusorg/public_html/green-car-insurance/"
I know for a fact it does exist (it is a post I made just a few minutes ago), as does pretty much everything that is showing up in the error logs...
I feel like I screwed something up big time and it's worrying me. I did move my WP install from /wp to the root, but I followed the instructions on the WP support site.
[edited by: jdMorgan at 5:59 pm (utc) on May 30, 2008]
[edit reason] Obscured possibly-private IP address. [/edit]
Look at the URL-paths in your access log, and correlate them with the filepaths in your error log. Are both the URL-path and the filepath 100% correct? If so, then you've got a malfunctioning server.
Posting uncorrelated access/error log entries here doesn't help much. For example, why was the allergy remedy request redirected using a 302-Found response? In general, you NEVER want to use a 302 redirect, unless the URL has changed only temporarily and will be restored soon. But a 302 redirect is often the result of poorly-implemented code, and so they're common.
I don't have much else to recommend to you, except to say that you should keep digging until you find the problem. Failing that, it might be time to back up your site, uninstall WP and everything related to it, and then reinstall clean, restoring your posts from your backup after testing to make sure this weird behavior has been fixed.
In the future, once established, don't ever change domains or change URLs unless there is a lawyer with a court order forcing you to do it. I mean this quite literally.
Jim
[edited by: jdMorgan at 6:02 pm (utc) on May 30, 2008]
The paths all seem correct. Here is one corresponding error from the error log and from the access log:
[Fri May 30 10:04:39 2008] [error] [client 65.83.***.**] File does not exist: /home/focusorg/public_html/strange-ecofriendly-products/
65.83.***.** - - [30/May/2008:10:04:39 -0700] "GET /strange-ecofriendly-products/ HTTP/1.1" 404 33141 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
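Matching entries like these by hand gets tedious, so here is a minimal sketch of automating the correlation. It assumes both logs use exactly the layouts of the two sample lines above; the regular expressions, function names, and the two-second matching window are my own choices, not anything Apache prescribes.

```python
import re
from datetime import datetime

# Layouts inferred only from the two sample lines above; other servers may differ.
ERROR_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[error\] \[client (?P<client>[^\]]+)\] "
    r"File does not exist: (?P<path>.+)$"
)
ACCESS_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) '
)


def parse_error(line):
    """Return (client, timestamp, filepath) or None if the line does not fit."""
    m = ERROR_RE.match(line)
    if not m:
        return None
    return m["client"], datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y"), m["path"]


def parse_access(line):
    """Return (client, timestamp, url_path, status) or None if the line does not fit."""
    m = ACCESS_RE.match(line)
    if not m:
        return None
    # The error log carries no UTC offset, so compare local wall-clock times only.
    ts = datetime.strptime(m["ts"].split()[0], "%d/%b/%Y:%H:%M:%S")
    return m["client"], ts, m["url"], int(m["status"])


def correlate(error_lines, access_lines, window_seconds=2):
    """Pair each error entry with access entries from the same client at about the same time."""
    accesses = [a for a in map(parse_access, access_lines) if a]
    pairs = []
    for client, ts, path in filter(None, map(parse_error, error_lines)):
        for a_client, a_ts, url, status in accesses:
            if a_client == client and abs((a_ts - ts).total_seconds()) <= window_seconds:
                # The last field is True when the filepath ends with the requested URL-path.
                pairs.append((url, path, status, path.endswith(url)))
    return pairs


# The two sample lines quoted above, with the IP address still masked.
ERROR_SAMPLE = ("[Fri May 30 10:04:39 2008] [error] [client 65.83.***.**] "
                "File does not exist: /home/focusorg/public_html/strange-ecofriendly-products/")
ACCESS_SAMPLE = ('65.83.***.** - - [30/May/2008:10:04:39 -0700] '
                 '"GET /strange-ecofriendly-products/ HTTP/1.1" 404 33141 "-" '
                 '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"')

for pair in correlate([ERROR_SAMPLE], [ACCESS_SAMPLE]):
    print(pair)
```

For real use, the two sample lists would be replaced by the lines read from each log file.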
I apologize for not providing enough information, this is all still pretty new to me (as I'm sure you hear a lot!). I would understand if this problem were happening all of the time, it just boggles my mind why it is only happening occasionally.
Oh, and thank you for your help thus far, I really appreciate it.
Jim
Every post that gets loaded is logging a 404 error in the error log without giving the viewer any errors.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
This stopped the constant 404 errors in the 404 log every time someone loads a post, but now it's back to giving me an error for files that aren't even loaded from my site. Example: "A user tried to go to http://example.com/eco-blog-carnival-volume/%slogolink.js and received a 404 (page not found) error. It wasn't their fault, so try fixing it." This is coming from a JavaScript file loaded from Blog Carnival, and this is the code to load it:
<script type="text/javascript" src="http://example.com/bc/logolink_20650.js"></script>
This 404 error doesn't get logged in the 404 logs, but it does get logged as a 404 in the access logs.
Nuts to this. I'm just going to remove it, it's just the rating thing for Blog Carnivals, it's not necessary!
Thank you for your help!
The 404s stopped for a while (and are not showing in my error log), but I am getting the notifications in my email again.
The access logs are showing things like this:
[21/Jun/2008:09:20:42 -0700] "GET /images/clouds.jpg HTTP/1.1" 404 12510 "http://example.com/ecofriendly-friday-tips-volume/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11"
It is trying to call images from the root that are not in the root (for example, this image is supposed to be /wp-content/themes/organic/images/clouds.jpg, not images/clouds.jpg). It is also dropping the full address for Google Analytics and appending my domain name (like this: http://example.com/ecofriendly-friday-tips-volume/google-analytics.com/ga.js )
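One possible explanation, offered only as a guess: these are the URLs a client would build if the references in the page were relative or had lost their "http://" prefix. The sketch below uses Python's urllib.parse.urljoin to show standard relative-reference resolution against the post's address. The reference strings are assumptions for illustration, since the thread does not show what the page markup or stylesheet really contains.

```python
from urllib.parse import urljoin

# The post address taken from the referrer field in the access log above.
PAGE = "http://example.com/ecofriendly-friday-tips-volume/"


def resolve(reference, base=PAGE):
    """Resolve a reference the way a client would, relative to the page it appeared on."""
    return urljoin(base, reference)


# Hypothetical references; the real ones are not shown in the thread.
for ref in (
    "google-analytics.com/ga.js",             # scheme missing, so treated as a relative path
    "http://www.google-analytics.com/ga.js",  # absolute, so left untouched
    "images/clouds.jpg",                      # relative to the post's own "directory"
    "/images/clouds.jpg",                     # relative to the site root
):
    print(ref, "->", resolve(ref))
```

The first reference resolves to the same mangled Google Analytics address reported above, and the last one resolves to the /images/clouds.jpg request seen in the access log. If this guess is right, the thing to look for is a template, stylesheet, or client-side rewriter that drops the scheme or emits relative paths.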
All the 404s I am notified of seem to have GET in them. Here is one that has me completely lost:
http://example.com/eco-blog-carnival-volume/++GET+http://example.com/eco-blog-carnival-volume/+%5B0,21110,34683%5D+-%3E+
which coincides with this in the access log:
[21/Jun/2008:06:53:31 -0700] "GET /eco-blog-carnival-volume/++GET+http://example.com/eco-blog-carnival-volume/+%5B0,21110,34683%5D+-%3E+ HTTP/1.0" 404 27767 "http://example.com/eco-blog-carnival-volume/++GET+http://example.com/eco-blog-carnival-volume/+%5B0,21110,34683%5D+-%3E+" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MyIE2; Deepnet Explorer)"
This happens once a day.