---- error logs showing URL with garbage added


jdMorgan - 4:14 am on Sep 5, 2009 (gmt 0)


> What could append an attribute tag after an URL?

A fake "browser" (that is, a badly-coded 'bot) could do this.

Load your "referring page" in a browser, use the 'Save page as' option to save the HTML, and then validate the saved file. If nothing turns up (such as a missing end-quote or closing ">" on a link), then you may be dealing with a distributed scraper.

Another possibility is that an ISP has deployed some sort of 'accelerator' in its network, and it is buggy. Check the WHOIS on these IP addresses to see if these visitors are all using the same ISP or if their ISPs are all using the same 'backbone' provider.

If you have the capability, you might want to log all of these so-called browsers' HTTP request headers and compare them to those of your own real browsers. It wouldn't surprise me if they didn't match. If you can, include a small Perl or PHP script in your 404 error page to log (at least) the HTTP Accept, Accept-Language, Accept-Encoding, Connection, X-Forwarded-For, and Via headers to a file. Then, by intentionally typing a bad URL on your own domain into your browser, you can collect a 'reference set' of headers from your own real browsers to compare against the headers from these suspect visitors.
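
As a rough illustration of that kind of 404 logging script, here is a minimal sketch written in Python rather than Perl or PHP. It assumes the error page runs as a CGI script and that the server exposes each request header under the usual CGI name (HTTP_ACCEPT and so on). The log path, the record layout, and the function names are made up for the example.

```python
#!/usr/bin/env python3
"""Sketch of a CGI-style 404 handler that logs selected request headers."""
import os
import time

# The headers named above, under their conventional CGI variable names.
HEADER_VARS = [
    "HTTP_ACCEPT",
    "HTTP_ACCEPT_LANGUAGE",
    "HTTP_ACCEPT_ENCODING",
    "HTTP_CONNECTION",
    "HTTP_X_FORWARDED_FOR",
    "HTTP_VIA",
]

# Hypothetical location; use any file the web server user can append to.
LOG_PATH = "/tmp/404-headers.log"


def format_record(environ):
    """Build one tab-separated log line from a CGI-style environment mapping."""
    fields = [
        time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        environ.get("REMOTE_ADDR", "-"),
        # REQUEST_URI is set by Apache, not by the CGI spec itself.
        # repr() escapes control characters in whatever garbage was requested.
        repr(environ.get("REQUEST_URI", "-")),
    ]
    for var in HEADER_VARS:
        value = environ.get(var)
        # A bare "-" marks a header that was not sent at all, which is
        # different from a header sent with an empty value ('').
        fields.append(f"{var}=-" if value is None else f"{var}={value!r}")
    return "\t".join(fields)


def main():
    # Append the record, then emit a plain 404 response.
    with open(LOG_PATH, "a", encoding="utf-8") as log:
        log.write(format_record(os.environ) + "\n")
    print("Status: 404 Not Found")
    print("Content-Type: text/html")
    print()
    print("<h1>404 Not Found</h1>")


if __name__ == "__main__":
    main()
```

Requesting a bad URL from your own browser adds a reference line to the same file, so the two kinds of visitor can be compared field by field.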

If you've got server config access, then you could do this logging using 'conditional logfiles' instead of a script; see the Apache mod_log_config docs for details.
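
A conditional logfile along those lines might look something like the fragment below. It is only a sketch: the address pattern (taken from the 192.0.2.0/24 documentation range), the variable name suspect_request, the format nickname reqheaders, and the log path are all placeholders to replace with your own values.

```apache
# Flag requests from the address range under suspicion (placeholder range).
SetEnvIf Remote_Addr "^192\.0\.2\." suspect_request

# One line per request: client, time, request line, then the headers of interest.
LogFormat "%h %t \"%r\" \"%{Accept}i\" \"%{Accept-Language}i\" \"%{Accept-Encoding}i\" \"%{Connection}i\" \"%{X-Forwarded-For}i\" \"%{Via}i\"" reqheaders

# Write the extra log only for requests that were flagged above.
CustomLog logs/suspect_headers.log reqheaders env=suspect_request
```

Adding a second SetEnvIf line that matches your own IP address and sets the same variable would put your 'reference set' in the same file.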

Jim


Thread source: http://www.webmasterworld.com/apache/3984474.htm