Referrer same as request URL
Are these invalid log entries?
I was analyzing my log file today when I found 750 requests for a single page. Not actually a page, but a handler script that logs and redirects the user off my site. Of those 750 requests, 748 had a referrer string identical to the requested URL. Example:
22.214.171.124 - - [21/Nov/2011:22:53:50 -0500] "GET /cgi-bin/item.cgi?s=41&t=3&id=1617234 HTTP/1.0" 302 483 "http://www.mysite.com/cgi-bin/item.cgi?s=41&t=3&id=1617234" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 95) Opera 6.01 [en]"
When I look at all the item.cgi requests, they generally have a referrer that is a page on my site, or sometimes a Yahoo email account or such.
Is it valid to have a page request like this where the URL is the same as the referrer?
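For anyone who wants to count these hits in their own logs, here is a minimal sketch in Python. It assumes the Apache "combined" log format shown in the example above; the names `LOG_RE` and `is_self_referral` are made up for illustration. Because that format does not record the requested host, the check only compares the referrer's path and query string against the request target.

```python
import re
from urllib.parse import urlsplit

# Assumed layout (Apache "combined" format):
# ip ident user [time] "METHOD target PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+)(?: (?P<proto>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

def is_self_referral(line):
    """Return True if the referrer's path and query equal the request target."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return False  # line is not in the assumed format
    ref = urlsplit(m.group("referer"))
    if not ref.netloc:
        return False  # blank ("-") or relative referrer
    ref_target = ref.path + ("?" + ref.query if ref.query else "")
    return ref_target == m.group("target")
```

Running each line of the access log through `is_self_referral` and counting the `True` results should reproduce a tally like the 748 above, provided the log really is in combined format.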
There's a relatively recent thread on this in the SSID forum.
For a given definition of "valid". I kinda think robots do it to get around rules that begin by looking at requests with blank referers. If you're going to forge a referer, may as well just grab the name of whatever you're asking for. My Ukrainians have started doing this; it only took them a few months to figure out that any referer in their geographic region gets locked out at the gate. Darn. I really thought they were gone.
Should be a short-lived robot fad, though. Unlike blank referers, it would never happen with bona fide humans. Unless, ahem, you goofed in your own page code.
@wilderness, you wouldn't happen to have a link to that thread, would you? I didn't find it right away. @lucy24, for sure it's not my code; the redirect script is plain and simple. Thanks for the info. I didn't know it, but what you're saying sure makes sense. The interesting thing is this: (1) there are 120,000 such item.cgi links that could be formulated, (2) these false hits to the page were arriving one at a time, every few minutes or even tens of minutes or more apart, and (3) of the 750 such hits I saw, about 400 were attributable to unique IP addresses! The biggest count for any single IP among the 750 was 24. That was what was confounding, because I would have expected a huge bunch from one IP or a bunch of hits every few seconds. And the same URL over and over! Not one after another in numerical sequence. Strange.
Look at the UA. My latest batch had to be a botnet, because they were all different IPs, but they all used, er, forged the same UA. I looked up all the IPs anyway. Found another half-dozen ranges from a region I block routinely, so it was worth it.
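A rough way to check for the "many IPs, one UA" pattern is to group the suspicious hits by user-agent string and count the distinct IPs behind each one. The sketch below is in Python and assumes you have already pulled (IP, UA) pairs out of the log; `ips_per_user_agent` is a made-up name, not part of any tool.

```python
from collections import defaultdict

def ips_per_user_agent(hits):
    """Given (ip, user_agent) pairs, return [(user_agent, distinct_ip_count), ...]
    sorted so the user agent shared by the most IPs comes first."""
    ips_by_ua = defaultdict(set)
    for ip, ua in hits:
        ips_by_ua[ua].add(ip)
    counts = [(ua, len(ips)) for ua, ips in ips_by_ua.items()]
    # Sort by distinct-IP count, largest first; ties are broken by UA string.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))
```

A single UA string sitting on top of hundreds of distinct IPs is a hint, not proof, that one botnet is behind the requests.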
Is there a distinguishing feature to the URL they wanted? With me it's always my fattest file. Actually second-fattest, but the one that's slightly bigger is in a directory that robots seem to dislike. I don't mean statistically plumper, I mean that the HTML is bigger by an order of magnitude than most of my pages. (It's a MiSTing, if you must know.) So there's something they expect to find based on filesize.
>>Unlike blank referers, it would never happen with bona fide humans. Unless, ahem, you goofed in your own page code.
how about say if you are on the home page and you click the homepage logo/link, or if you have a navigation system where all the categories are shown down the left side, including the category page the user is actually on and they click that link.
... wouldn't that reload the page and send a self-referencing referrer header?
"Referrer equals request url" bot
Identify a bot where the referrer repeats the requested url
URI=REF is definitely not a short-lived fad.
I've observed literally tens of thousands of URI=REF hits and very, very rarely does a real user click one same-page, directory-esque link. (And that single link appears to be an image map thing because it only happens in certain browsers.)
>>how about say if you are on the home page and you click the homepage logo/link, or if you have a navigation system where all the categories are shown down the left side, including the category page the user is actually on and they click that link.
I count that as a goof, because you don't need to link to the page you're already on. My current version shows the name of the page but instead of being an active link it's in boldface. (I do it manually, but I can't imagine it would be very hard to have php do the same thing.)
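The same idea can be automated in most server-side languages. Here is a minimal sketch in Python rather than PHP; the function name `render_nav` and the list-of-pairs input are assumptions for illustration. It prints the current page's name in boldface and every other page as a normal link.

```python
from html import escape

def render_nav(pages, current_path):
    """Build an HTML list from (path, title) pairs.
    The entry matching current_path is bold text, not a link."""
    items = []
    for path, title in pages:
        if path == current_path:
            # Already on this page: show the name, but do not link to it.
            items.append(f"<li><b>{escape(title)}</b></li>")
        else:
            items.append(
                f'<li><a href="{escape(path, quote=True)}">{escape(title)}</a></li>'
            )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

With a menu built this way, a human visitor has nothing to click that would produce a request whose referrer equals the requested URL.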
As a user it tends to make me feel stupid when I click on a link and all I get is the current page refreshing. Oh. Oops. Guess that means I was there already.