|error logs showing URL with garbage added|
I recently noticed a bunch of errors in my cpanel error log that show URLs formatted something like this:
The correct URL paths should be (respectively):
What can cause the "width=0" or "(null)" to be added to a path when the referring page links to the correct path?
The spurious "width=0" sounds like a bad (mistyped and possibly hidden) HTML image link. It could actually be a link in any HTML element that allows a width attribute, so I'm just guessing here.
The (null) sounds like something we were discussing over in the WebmasterWorld Search Engine Spider Identification forum -- possibly an IE toolbar or plug-in, etc. It generates requests for /(null), and that's about all we know about it.
If the traffic is important and you cannot get the source link corrected, then 301-redirect these bogus URLs to the correct URLs. If the traffic is negligible, then let the requests go 404 as they should.
Thanks for the response Jim. I've dug a little deeper into the problem of width=0.
It appears that the referring URL is on my own site. However, I've triple-checked my link code and there are no badly formed links on the page.
These pages receive thousands of visitors per day. I've been trying to find a pattern among the users who trigger this error. At first, I thought it was just IE 8 browsers doing something quirky, but then I saw a couple of other browsers in the list as well.
One consistent factor is that the IPs triggering the error are all from Australia. This is real traffic, and it will be heavy through Sunday. Numerous individual IPs are involved, along with numerous referring pages on my own site that supposedly include "width=0" at the end of the URL. I've tested the same pages in Firefox, IE and Safari and can't generate the error. The Australian visitors generating errors are using IE8, IE7 and Firefox.
Viewing the source code of these referring pages shows nothing amiss. What could append an attribute to the end of a URL?
> What could append an attribute to the end of a URL?
A fake "browser" -- that is, a badly-coded 'bot could do this.
Load your "referring page" in a browser, then use the 'Save page as' option to save the HTML, and then validate it. If nothing turns up --such as a missing end-quote or closing ">" on a link-- then you may be dealing with a distributed scraper.
Another possibility is that this may be caused by an ISP that has deployed some sort of 'accelerator' in their network, and it is buggy. Check the WHOIS on these IP addresses to see if these visitors are all using the same ISP or if their ISPs are all using the same 'backbone' provider.
If you have the capability, you might want to log all of these so-called browsers' HTTP request headers and compare them to those of your own real browsers. It wouldn't surprise me if they didn't match. If you can, include a small Perl or PHP script in your 404 error page to log (at least) the HTTP Accept, Accept-Language, Accept-Encoding, Connection, X-Forwarded-For, and Via headers to a file. Then, by intentionally typing a bad URL on your own domain into your browser, you can collect a 'reference set' of headers from your own real browsers to compare against the headers from these suspect visitors.
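For anyone who would rather not use Perl or PHP, here is a rough sketch of the same idea as a Python WSGI 404 handler. Treat it as an illustration only: the function names, the log file name, and the extra User-Agent and Referer fields are my own assumptions, not anything from the posts above.

```python
import json
import time

# The six headers named in the post, plus User-Agent and Referer
# (those two are an assumption, added to make comparisons easier).
HEADERS_TO_LOG = [
    "Accept", "Accept-Language", "Accept-Encoding",
    "Connection", "X-Forwarded-For", "Via",
    "User-Agent", "Referer",
]


def wsgi_key(header_name):
    """Map an HTTP header name to the key WSGI uses in environ."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def collect_headers(environ):
    """Build one log record from a WSGI environ; absent headers become None."""
    record = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "ip": environ.get("REMOTE_ADDR", ""),
        "path": environ.get("PATH_INFO", ""),
        "query": environ.get("QUERY_STRING", ""),
    }
    for name in HEADERS_TO_LOG:
        record[name] = environ.get(wsgi_key(name))
    return record


def log_404(environ, log_path="404_headers.log"):
    """Append the record as one JSON line (log_path is a hypothetical name)."""
    record = collect_headers(environ)
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def not_found_app(environ, start_response):
    """Minimal WSGI app: log the request headers, then answer 404."""
    log_404(environ)
    body = b"404 Not Found\n"
    start_response("404 Not Found", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Request a deliberately bad URL from your own browsers first to build the reference set, then compare those lines with the ones logged for the suspect visitors. A Via or X-Forwarded-For header that appears only on the suspect requests would point toward a proxy or accelerator.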
If you've got server config access, then you could do this logging using 'conditional logfiles' instead of a script -- See the Apache mod_log_config docs for details.
I look after a website with a predominantly Australian user base and have noticed this issue occurring on our site since late September. We have yet to identify the cause and, like you, cannot pin it on any specific code on our site.
Did you ever find a cause for this issue?
No, I never found an issue on my end. The links all checked out fine, and I went through everything with a fine-tooth comb.
All errors are being generated from IPs originating in Australia. I keep checking webmaster tools to see if there are any repercussions. So far, nothing detrimental is showing up there. My guess is that some kind of proxy server or accelerator (one that doesn't get indexed) is adding the width=0 parameter to the URL, which causes the 404 error. It still drives me crazy - let me know if you come up with a better answer. :)
You could always set up detection for that type of request and serve a 301 redirect for it.
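As a rough illustration of that detection, here is a small Python sketch meant to be called from a 404 handler. The example URLs from the original post aren't shown in this thread, so the pattern below is only a guess at what the garbage looks like (a literal "width=0" or "(null)" at the very end of the path, optionally preceded by a space or %20). The function names are made up for the example, so adjust everything against your real error log.

```python
import re

# Assumed shape of the garbage; check it against real log entries.
GARBAGE_SUFFIX = re.compile(r"(?:%20|\s)*(?:width=0|\(null\))$", re.IGNORECASE)


def redirect_target(path):
    """Return the cleaned path to redirect to, or None if nothing was stripped."""
    cleaned = GARBAGE_SUFFIX.sub("", path)
    if cleaned == path:
        return None
    # If the whole path was garbage, fall back to the site root.
    return cleaned or "/"


def respond(path):
    """Choose a status line and headers for a request that would otherwise 404."""
    target = redirect_target(path)
    if target is None:
        return ("404 Not Found", [])
    return ("301 Moved Permanently", [("Location", target)])
```

With that assumed pattern, a request for /widgets/blue.htmlwidth=0 would be sent to /widgets/blue.html, while an ordinary mistyped URL still gets a normal 404. Whether bare /(null) requests deserve a redirect at all is debatable; as noted earlier in the thread, letting negligible traffic go 404 is a reasonable choice.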