Find where bad URL was accessed

Is there a way to determine where the bad URL originated?

netsites

4:21 pm on Nov 4, 2009 (gmt 0)

10+ Year Member



Whenever a site visitor gets a 404 error I log the bad referring URL.

Is there a way to determine from where the bad URL was accessed (another web site, or search engine)? Is it possible to capture this on the server (.htaccess or other means)?

jdMorgan

4:53 pm on Nov 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That is the definition of the contents of the "HTTP Referer" header. So this question is confusing.

The logged "Referer" tells you where the bad requested URL was linked from, so are you sure that that is what you are logging, or are you confusing the requested (bad) URL with the referring URL?

Jim

netsites

7:33 pm on Nov 4, 2009 (gmt 0)

10+ Year Member



OK, I see what is happening. I was logging both the REQUEST_URI and HTTP_REFERER in the same string. I didn't realize the HTTP_REFERER was always blank in my log, which made me think the REQUEST_URI was actually the HTTP_REFERER.

I just tested it by adding a bad URL on another site and clicking the link to see it log correctly. Sorry for the confusion.

So now I'm wondering why I have so many 404 log entries without any HTTP_REFERER. Are these all produced by bad bots?
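The mix-up described above (a blank HTTP_REFERER making the REQUEST_URI look like the referrer) can be avoided by labeling each field and writing an explicit placeholder when the referrer is empty. A minimal sketch, assuming the two values have already been read from the server environment; the function name and the "-" placeholder are illustrative, not taken from the poster's script:

```python
from typing import Optional


def format_404_entry(request_uri: str, referer: Optional[str]) -> str:
    """Build one log line with each field labeled (hypothetical helper).

    An empty or missing referrer is written as "-" so that it cannot be
    mistaken for the requested URL when reading the log later.
    """
    shown_referer = referer if referer else "-"
    return f'requested="{request_uri}" referer="{shown_referer}"'
```

With this layout, a request for /old-page.html that arrives without a referrer is logged as requested="/old-page.html" referer="-".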

netsites

7:43 pm on Nov 4, 2009 (gmt 0)

10+ Year Member



Actually, there are some log entries where the REQUEST_URI is the same as HTTP_REFERER. How is this possible? Is this done with some sort of masking?

g1smd

11:39 pm on Nov 4, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No referrer is sent when the browser or bot accesses your site from a bookmark or a stored URL list.

The UA doesn't have to send a referrer and, in any case, it is easily faked. Additionally, some internet 'security' systems strip it out.
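To sort logged 404s by where they appear to come from (no referrer, your own site, a search engine, or another site), something along these lines may help. It is only a sketch: the list of search-engine host fragments is a made-up, incomplete assumption, the function name is illustrative, and since the referrer can be faked or stripped, the result is a hint rather than proof.

```python
from urllib.parse import urlsplit

# Hypothetical, incomplete list of host fragments; extend for your own traffic.
SEARCH_ENGINE_HINTS = ("google.", "bing.", "yahoo.")


def classify_referer(referer, own_host):
    """Roughly classify a logged Referer value (illustrative helper)."""
    if not referer:
        # Bookmark, typed-in URL, stored URL list, or a stripped header.
        return "none"
    host = (urlsplit(referer).hostname or "").lower()
    if not host:
        # Value was present but is not an absolute URL with a host.
        return "unparseable"
    if host == own_host.lower():
        return "internal"
    if any(hint in host for hint in SEARCH_ENGINE_HINTS):
        return "search"
    return "external"
```

Entries classified as "internal" point to broken links on your own pages, while "external" and "search" entries are the ones worth chasing up with the linking site or fixing with a redirect.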

jdMorgan

12:33 am on Nov 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If the referer to a URL is the same as that URL, and the page at that URL does not link to itself, then you're simply dealing with a scraper.

I block all such self-referring requests with a 403-Forbidden response. If that agent comes back again, then I'll block the IP address or their entire sub-net if it's a server farm... No time to bother with these leeches; my sites are for people to read and use, and that is all.

Jim
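The self-referral check described above can be sketched as follows, for anyone doing it in a script rather than in the server configuration. This is an assumption-laden illustration, not the method used in the post: the function name is hypothetical, the comparison is a plain string match on host, path and query, and it does not know whether the page legitimately links to itself, so that case has to be excluded separately.

```python
from urllib.parse import urlsplit


def is_self_referral(request_uri, host, referer):
    """Return True when the Referer points at the very URL being requested.

    request_uri: path plus optional query string, e.g. "/page.html?id=1"
    host:        the Host the request was sent to
    referer:     the raw Referer header value, possibly empty
    """
    if not referer:
        return False
    ref = urlsplit(referer)
    ref_target = ref.path or "/"
    if ref.query:
        ref_target += "?" + ref.query
    same_host = (ref.hostname or "").lower() == host.lower()
    return same_host and ref_target == request_uri
```

A caller could answer with a 403 when this returns True and the page is known not to link to itself.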