confusing apache log entries

just learning logs

         

jammera

3:05 am on Mar 21, 2014 (gmt 0)

10+ Year Member



Hi all,
I have a small site that I let trundle along for a long time, until I started seeing weird amounts of bandwidth being used. The hosting company is really cheap and can't help me analyze the logs. I got jannet.org's Apache viewer, and most of it I understand just fine except for some referrer entries: somehow I've got a couple hundred instances of a media file referring itself! Is this some weird hijacking? Is a site referring to itself, or even media files "referring" themselves, anywhere close to normal? I can't believe it is, since the IP noted in the line is not my server or my computer. I've been doing low-level web stuff since '98, but I've never had to crawl into base server logs before.

lucy24

3:48 am on Mar 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Paste in some specimens so we can see what you're talking about. Auto-referers* are a tried and true robot trick, but I'm not sure that's what you are describing.


* When each page request names itself as referer. The idea is to bypass most referer blocks by pretending to come from your own site. It's especially (in)effective when they get your domain name wrong.

jammera

12:46 pm on Mar 21, 2014 (gmt 0)

10+ Year Member



I'll have to go get the CSVs, I guess; but the phrase you used there, auto-referrer, sounds like exactly what's happening. What would a generic solution for those be?

lucy24

9:06 pm on Mar 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



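Unfortunately you can't block auto-referers as such-- at least not in Apache alone. They can be detected in an instant using any programming language; my own log wrangling uses JavaScript, and they're flagged automatically as robots. But if you try to make a RewriteRule you slam straight into limitations on what can be on the left vs. what can be on the right: the test string on the left of a RewriteCond can contain server variables, but the pattern on the right is a fixed regular expression, so you can't simply write "the referer ends with whatever URL is being requested".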
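A minimal sketch of that kind of log-side detection, in Python rather than JavaScript. It assumes the logs are in Apache's Combined Log Format; the function name is made up for illustration, and it compares only the path part of the referer with the requested path, since robots often get the hostname form wrong.

```python
import re
from urllib.parse import urlsplit

# Combined Log Format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)"'
)


def is_auto_referer(line):
    """Return True if the request names its own URL path as its referer."""
    m = LOG_LINE.match(line)
    if not m:
        return False  # not a line in the assumed format
    referer = m.group("referer")
    if referer in ("", "-"):
        return False  # no referer sent at all
    # Compare paths only, ignoring scheme, hostname and query string.
    return urlsplit(referer).path == urlsplit(m.group("path")).path
```

Run every log line through `is_auto_referer` and count the hits per file and per IP; that shows whether a handful of big files are being hit over and over or whether the problem is spread across the whole site.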

My personal compromise is to make individual rules for a handful of very large files:

RewriteCond %{HTTP_REFERER} /hovercraft/april_blues\.html$
RewriteRule ^hovercraft/april_blues\.html - [F]


and so on. (Note that you can only do this if your navigation is structured so that no page ever does link to itself! Internal # fragment links don't count, because those are handled by the browser without server involvement.) If you find that only certain files are being hit over and over, this is a reasonable compromise. If there are lots and lots of them, you have to go to more complicated remedies.

One more possibility, though. You said "media files". Does that mean files that don't, themselves, link to anything-- you can link to them but not from them? Then you can make an unconditional referer block that looks something like this:

RewriteCond %{HTTP_REFERER} \.xtn
RewriteRule \.xtn$ - [F]


where ".xtn" means whatever extension your media files use. The only purpose of ".xtn" in the body of the rule is to keep the server from having to evaluate conditions on every single request, ever.

If they all live in certain directories, include that part in the body of your RewriteRule so the server doesn't have to read all the way to the end:

RewriteCond %{HTTP_REFERER} \.xtn($|/)
RewriteRule ^mediafiles/blahblah\.xtn - [F]


If these directories contain nothing but non-page files, you don't even need to spell out the full filename.

jammera

1:43 pm on Mar 22, 2014 (gmt 0)

10+ Year Member



"One more possibility, though. You said "media files". Does that mean files that don't, themselves, link to anything-- you can link to them but not from them?"
That's precisely it: the mp3s for my podcast.

"If these directories contain nothing but non-page files, you don't even need to spell out the full filename."

So if the only directories that contain the media files are /castfiles/xxx, then I could put that path in
"RewriteRule ^mediafiles/blahblah\.xtn - [F]"?
Sounds easy enough.

lucy24

8:39 pm on Mar 22, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes. Put the directory name in the condition, and then do the same in the body of the rule, so it becomes

RewriteCond %{HTTP_REFERER} /castfiles/
RewriteRule ^castfiles/ - [F]

meaning: whenever there's a request for a file living in /castfiles/ -- i.e. an mp3 file -- check the referer. And if the referer claims to also be in /castfiles/, block them on the spot.

You can shave a few nanoseconds by front-anchoring the referer, so it becomes

^http://(www\.)?example\.com/castfiles/

substituting your actual domain name and path. Say https if that's what the files really use. Normally a site uses only one form of its hostname, so the (www\.) wouldn't need to be optional. But auto-referers don't necessarily get the form right, so block them either way.
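Before putting a rule like this on a live site, the two patterns can be tried against sample values. Here is a rough Python sketch of that; it only imitates the matching logic and is not how Apache itself evaluates anything. The function name and sample URLs are made up, and it assumes the rule lives in the root .htaccess, where the RewriteRule sees the path without its leading slash.

```python
import re

# The two patterns from the rule above; example.com stands in for the real host.
REFERER_PATTERN = re.compile(r'^http://(www\.)?example\.com/castfiles/')
RULE_PATTERN = re.compile(r'^castfiles/')


def would_block(url_path, referer):
    """Imitate the rule: forbid /castfiles/ requests whose referer is also in /castfiles/.

    url_path is the requested path without the leading slash.
    """
    # mod_rewrite tests the RewriteRule pattern first and only then the condition.
    if not RULE_PATTERN.search(url_path):
        return False
    return REFERER_PATTERN.search(referer) is not None


for referer in (
    "http://example.com/castfiles/ep12.mp3",      # auto-referer, bare domain
    "http://www.example.com/castfiles/ep12.mp3",  # auto-referer, with www
    "http://example.com/podcast.html",            # ordinary page linking to the file
    "https://example.com/castfiles/ep12.mp3",     # https: not caught by an http-only pattern
):
    print(referer, would_block("castfiles/ep12.mp3", referer))
```

The last sample shows why the scheme matters: with the pattern written for http only, an https referer slips through.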

jammera

3:01 pm on Mar 25, 2014 (gmt 0)

10+ Year Member



OK, so that's pretty cool and I'll implement it soon, but... what do these fools actually get by doing this? What's gained by this process of self-referral?

lucy24

8:39 pm on Mar 25, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The idea is to bypass referer-based blocks. Ordinarily your server or PHP supplement will look no further than "Oh, this is OK, he's already on the site so we don't need to check anything further." A close variation is to name your root as referer for all requests. In fact I've got a parallel lockout that says:
RewriteCond %{HTTP_REFERER} example\.com/?$
RewriteCond %{REQUEST_URI} !index\.html
RewriteCond %{REQUEST_URI} !/blahblah/
RewriteRule ^([^/.]+/)+[^/.]+(\.html|/)$ - [F]

That is: "If there's a request for any interior file-- excluding 'index.html' and the /blahblah/ directory-- naming the root as referer, lock them out on the spot."
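The same logic can be spelled out step by step in Python, purely as an illustration of what the four lines test. This is a sketch, not a replacement for the rule; the function name is hypothetical, and it assumes the rule sits in the root .htaccess, so the RewriteRule pattern is matched against the path without its leading slash while REQUEST_URI keeps it.

```python
import re

REFERER_IS_ROOT = re.compile(r'example\.com/?$')
INTERIOR_PAGE = re.compile(r'^([^/.]+/)+[^/.]+(\.html|/)$')


def root_referer_lockout(request_uri, referer):
    """Return True if the request would be forbidden by the root-referer lockout."""
    # The RewriteRule pattern is checked first, against the slash-less path.
    if not INTERIOR_PAGE.search(request_uri.lstrip('/')):
        return False
    # Then each RewriteCond in order; all of them must hold.
    if not REFERER_IS_ROOT.search(referer):
        return False
    if re.search(r'index\.html', request_uri):
        return False  # excluded: !index\.html
    if re.search(r'/blahblah/', request_uri):
        return False  # excluded: !/blahblah/
    return True
```

For example, `root_referer_lockout("/hovercraft/april_blues.html", "http://example.com/")` comes out True, while the same request with a referer inside /hovercraft/ comes out False.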

And, conversely:
RewriteCond %{HTTP_REFERER} example\.com/\w+\.(html|php)$
RewriteRule (^|\.html|/)$ - [F,NS]


Both of these are specific to my site. Since I have no direct links from the front page to interior pages other than the /blahblah/ directory, the request is obviously bogus. I also have no named pages in the top directory, and don't allow "index.html" as a visible URL, so "example.com/something.html" can't possibly be a real referer.

Referer-based blocks by their nature will be site-specific. Even my test site can't have the same rules, because that one does have top-level named pages.