

Files with %20 in name overwhelming error log

Looking for a way to eliminate the error via htaccess

   
11:42 pm on May 19, 2013 (gmt 0)

10+ Year Member



I'm helping a friend's web site that has a large number of blog entries. The problem I'm having is that whoever set it up originally allowed the use of spaces in the filenames of the jpg's. There are over 20,000 jpg's on the site that have names like "picture of tree.jpg".

The problem comes from $_SERVER['REQUEST_URI'] adding %20 to the spaces. The server displays the correct file for the request, but every single jpg request is showing up in the error log as "File does not exist: /path/picture%20of%20tree.jpg" even though the proper file is displayed.

I know the right solution is to rename the jpg's, but there are so many of them, plus all the links pointing to them, that it would be overwhelming. Is there a mod_rewrite rule I can use in htaccess to stop these requests from ending up in the error log? There are so many "File does not exist" entries that it's nearly impossible to find other errors.

2:14 am on May 20, 2013 (gmt 0)

WebmasterWorld Senior Member lucy24



"The problem comes from $_SERVER['REQUEST_URI'] adding %20 to the spaces."

Splitting hairs here: This is not strictly accurate. $_SERVER['REQUEST_URI'] only reports the URL as it was requested. All characters except alphanumerics and a very short list of others are "escaped" (percent-encoded) in transit, so the %20 is not added to the space but replaces it.
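
To make the "replacing, not adding" point concrete, here is a minimal sketch in Python (not the PHP your site runs, just a neutral way to show the mapping). The filename is the one from your post; nothing else about your setup is assumed.

```python
from urllib.parse import quote, unquote

# Filename taken from the example in the question.
filename = "picture of tree.jpg"

# quote() percent-encodes characters that are not safe in a URL path.
# Each space is replaced by %20; the space itself is gone.
encoded = quote(filename)
print(encoded)  # picture%20of%20tree.jpg

# unquote() reverses the mapping, which is roughly what a server does
# before it goes looking for the file on disk.
decoded = unquote(encoded)
print(decoded)  # picture of tree.jpg
```

So a request for the %20 form and a file on disk with literal spaces are normally the same thing, which is why the error log entry is the odd part.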

Yes, you can do almost anything you like in mod_rewrite. But not until you've precisely identified the problem. They don't call it the Swiss army knife of Apache for nothing; there are many many different ways to hurt yourself.

#1 The end user's browser asks for filenames in the form it sees them in the page source. Does the page itself say {literal space} or %20? If the pages are generated by php, look at the final text that the user sees, not at your original code.
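
For #1, one way to check is to save the generated page from the browser and scan its img tags. A rough sketch, assuming the images are referenced through ordinary src attributes; the sample HTML below is made up for illustration and is not from your site.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


def classify(src):
    """Says how (or whether) spaces appear in an image URL."""
    if " " in src:
        return "literal space"
    if "%20" in src:
        return "percent-encoded"
    return "no spaces"


# Hypothetical page source; substitute the HTML the browser actually receives.
sample_html = (
    '<p><img src="/images/picture of tree.jpg">'
    '<img src="/images/picture%20of%20tree.jpg"></p>'
)

collector = ImgSrcCollector()
collector.feed(sample_html)
report = [(src, classify(src)) for src in collector.sources]
for src, kind in report:
    print(kind, "->", src)
```

If everything comes back as "literal space", the browser is doing the encoding; if it is already "percent-encoded", look at whatever generates the links.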

#2 According to your post there's a difference between error logs and observed behavior. First it can't find the file, then it displays the file it just got through saying doesn't exist. So there has to be a missing piece. What do you see in the access logs? Does each image request come in pairs-- first the 404'd version and then a second request that results in successfully serving the file?
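
For #2, if the access log is too big to eyeball, something along these lines would show whether the same client gets both a 404 and a 200 for the same image. It assumes the usual Apache common/combined log layout, and the two sample lines are invented for illustration; they are not your logs.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

# Matches the start of a common/combined format line:
# ip ident user [timestamp] "METHOD path protocol" status size ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def statuses_by_request(lines):
    """Groups response codes by (client IP, decoded path)."""
    seen = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            key = (match.group("ip"), unquote(match.group("path")))
            seen[key].append(int(match.group("status")))
    return dict(seen)


def paired_requests(grouped):
    """Returns the keys that got both a 404 and a 200."""
    return [key for key, codes in grouped.items() if 404 in codes and 200 in codes]


# Made-up sample lines; in practice, pass the lines of the real access log.
sample_lines = [
    '192.0.2.1 - - [19/May/2013:23:40:01 +0000] '
    '"GET /images/picture%20of%20tree.jpg HTTP/1.1" 404 512',
    '192.0.2.1 - - [19/May/2013:23:40:01 +0000] '
    '"GET /images/picture%20of%20tree.jpg HTTP/1.1" 200 48211',
]

grouped = statuses_by_request(sample_lines)
pairs = paired_requests(grouped)
print(pairs)
```

An empty result on the real log would mean the 404 and the successful response are not separate requests, and the missing piece is somewhere inside the server.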

Incidentally, renaming files is not a huge problem, especially if they're all collected in just a few directories. But then you'd have to redirect all the search engines coming in asking for the old nameform-- or resign yourself to a temporary flurry of 404s. (Don't know about the others, but g### doesn't make much fuss about missing image files. Certainly nothing like pages, where it comes back sporadically for years.)
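
If you do go the renaming route, it is worth building the old-to-new list first and only then touching the files, since the same list drives the redirects. A minimal dry-run sketch, assuming hyphens as the replacement character (my choice, not a requirement) and using made-up filenames:

```python
import re


def plan_renames(filenames):
    """Maps each name containing spaces to a hyphenated name.

    Nothing is renamed on disk; this only builds the plan. Raises
    ValueError if two files would end up with the same new name.
    """
    taken = {name for name in filenames if " " not in name}
    plan = {}
    for name in filenames:
        if " " not in name:
            continue
        new_name = re.sub(r" +", "-", name)
        if new_name in taken:
            raise ValueError(f"{name!r} would collide with {new_name!r}")
        taken.add(new_name)
        plan[name] = new_name
    return plan


# Hypothetical directory listing.
plan = plan_renames(["picture of tree.jpg", "logo.jpg", "old  barn.jpg"])
for old, new in plan.items():
    print(old, "->", new)
```

The collision check matters with 20,000 files: "old barn.jpg" and "old-barn.jpg" could both already exist.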
 
