Forum Moderators: phranque

Get Rid Of The Turkeys Who Use XXX:+++++++++++ As A Referer?

         

jim_w

4:02 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I searched here, but I couldn’t find anything on this.

Can I do something like…

RewriteCond %{HTTP_REFERER} XXX:+++++++* [OR]

To get rid of those turkeys who use…

XXX:+++++++++++, etc.

And are now using

XXXX:+++++++++++, etc.

They are really screwing up my charts in my backup log.

I mean, come on, XXXX:+++++++++++ is more than sufficient.

Birdman

4:07 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, those turkeys could be potential clients... I wouldn't write them off so quickly. I believe that's just some custom browser, but it's a real person just the same. I don't mind them at all :)

Birdman

jim_w

4:13 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They're not customers. They are information seekers. And even if they were potential customers, they are the kind that create more support problems than they're worth. Been there, done that.

I heard it was from a router.

jdMorgan

4:27 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jim_w,

I agree with Birdman that you should not block based solely on this referer, since it comes from a common proxy server in use at many corporations. The user often has no control over whether this "security" feature is turned on. But for an academic exercise:


RewriteCond %{HTTP_REFERER} ^X{3,4}:(\+){2,}$ [OR]

That should take care of 3 or 4 "X" characters followed by a colon and then two or more "+" characters. The parentheses around "\+" are not strictly necessary; I just threw them in to be safe. The backslash preceding "+" is required, however, because "+" is a special character in regular expressions and must be escaped.
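For context, here is a minimal standalone sketch of how that condition might sit in an .htaccess file. The `RewriteEngine On` line and the forbidden-response rule are assumptions of mine, not part of jdMorgan's post, and the `[OR]` flag is dropped because it is only needed when chaining this condition with others:

```apache
# Sketch only: return 403 Forbidden when the referer is 3-4 "X"
# characters, a colon, and two or more "+" characters.
# Assumes mod_rewrite is available and .htaccess overrides are allowed.
RewriteEngine On
RewriteCond %{HTTP_REFERER} ^X{3,4}:(\+){2,}$
RewriteRule .* - [F]
```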

Jim

jim_w

4:42 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perfect, Jim. Just what I needed. I need to dig into that Latin-looking stuff more deeply.

Every one that I ever had, and I get about 2 a month, was on an ISP. I've had hits from every Fortune 1000 company and none have used that.

It goes into my AXS log, and then the chart and percentages are out in right field. Then I have to edit the bloody thing every month. Whoever is doing it should realize that it is annoying to some, and using nothing or a '-' would still get the job done.

Muchas gracias

jdMorgan

4:49 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jim_w,

Well, that was my point. The proxy authors used X's and +'s to mask the real referer. The actual user may not even know it's happening. Most have never seen a log file and never will.

If you want to redirect them to a special page using that RewriteCond above, you can. But of course, at least the one original request (complete with annoying referer string) will still be in your logs. :(

But it's up to you, of course.

One other thing... If it doesn't work, it may be because those "+" characters are really spaces. Just as "-" is logged for a blank referer or user-agent, "+" is often used in logs for space characters.

Jim

grahamstewart

4:50 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So why don't you use a regular expression to edit it out of your logs - rather than just blocking people who have decided that they don't particularly like telling you what website they have just been looking at.

You could easily write a script to erase the entries from the log, or any decent text editor should be able to handle a regular expression search and replace.
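A minimal sketch of that idea, assuming a combined-format log where the referer is a quoted field on each line. The filename and sample entries below are hypothetical:

```shell
# Sketch: filter masked-referer entries out of a combined-format log.
# First create a tiny sample log (one masked entry, one normal entry).
cat > access_log <<'EOF'
1.2.3.4 - - [02/May/2003:04:02:00 +0000] "GET / HTTP/1.1" 200 512 "XXX:+++++++++++" "Mozilla/4.0"
5.6.7.8 - - [02/May/2003:04:03:00 +0000] "GET / HTTP/1.1" 200 512 "http://example.com/" "Mozilla/4.0"
EOF

# Drop any line whose quoted referer is 3-4 X's, a colon, and one
# or more +'s; write a cleaned copy rather than editing in place.
grep -Ev '"X{3,4}:\++"' access_log > access_log.clean
```

Any editor with regex search-and-replace could do the same substitution interactively.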

jim_w

5:16 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They do get redirected to an html file. I'll add the HTTP_REFERER to the html and tell them.

I don't care that it is in my access_log; I have another log where I filter out SE bots, etc., and I don't want it to show there because it makes a real mess of it.

You could easily write a script to erase the entries from the log

Yeah, in my spare time. It's more of a principle thing. It's not required, and to say it is overkill would be an understatement. They are making a statement, and so am I.

If they stop making theirs, mine automatically goes away.

Chris_R

5:24 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If people went to as much effort to get people to their sites as they do in blocking people to their sites - the world would be a better place.

Well not really.

However, be careful who you block - you might want them in the future. You never know when some person who is paranoid is paranoid for a reason - or when some company like alexa (which is listed in some of the master block lists) might be incorporated in a metric you care about. This is not to say you shouldn't block the truly abusive, but otherwise - I just don't get it.

I was recently "hired" more or less to help increase the SE traffic to this company's site.

At first inspection, it looked to me like his site was blocked by Google. On further inspection, the geniuses he hired before me (for web design) had BOTH a robots.txt and meta tags on EVERY page blocking all search engines.

$10,000 and 30 days later, he is in Google.

Ok I am kidding about the 30 days....

and the $10K - I turned the job down.

Brett_Tabke

5:32 am on May 2, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It was webmasters looking the other way that allowed products such as Gator to worm their way into the market place.

Blocking referrals is fine. It is when users start falsifying them, that action must be taken.

Unless someone can specifically name a product that uses this referer, I still maintain it is a server exploit bot that is being used. It isn't a proxy issue, a privacy issue, it is a site security issue.

I block any and all IPs coming in with the xxx:+++ referrer. Give these types of products no quarter.

jim_w

5:38 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, we'll find out 'cause I put that sucker in.

Ooh Nooo another Browser Nazi

dmorison

7:09 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is there a good reason why this proxy server changes the referrer to something like XXX:++++++++++++++ as opposed to an empty string?

grahamstewart

9:40 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, if it just omitted the Referer header then it would have to recalculate the content length of the message.

So to avoid hassle it just overwrites it with spaces instead, which in turn usually get logged as +.
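The overwrite-then-encode claim can be sketched in a few lines, using a hypothetical referer value: every byte of the URL is replaced with a space, and the log writer then encodes each space as '+':

```shell
# Tiny demo of the claim above (hypothetical values throughout).
referer="http://example.com/somepage.html"          # the original referer
masked=$(printf '%s' "$referer" | tr -c '\n' ' ')   # overwrite every byte with a space
logged=$(printf '%s' "$masked" | tr ' ' '+')        # '+' encoding applied at log time
printf '%s\n' "$logged"
```

The logged string ends up as a run of '+' characters exactly as long as the original URL, which matches the behavior described in this thread.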

You can scratch ZoneAlarm.

Ah but do they have ZoneAlarm Pro and do they have the privacy options cranked up?

jim_w

10:46 am on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



grahamstewart

So why don't you use a regular expression to edit it out of your logs - rather than just blocking people who have decided that they don't particularly like telling you what website they have just been looking at.

I don't have a problem with ppl not telling me where they've been; I have a problem with the way they are doing it. I have a problem with it costing me more work to fix the problem. It's overkill and not required. It fills my logs and takes bandwidth when a simple [Generic Search Engine] or something like that would do. 80 +'s is unacceptable.

Yeah, if it just omitted the Referer header then it would have to recalculate the content length of the message

Sounds like poor coding practices to me. But I'm not quite sure what you're talking about.

Ah but do they have ZoneAlarm Pro and do they have the privacy options cranked up?

I don't know I'll have to check.

grahamstewart

1:54 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is pretty poor coding - but it keeps the process fast, which is what the users want.

The reason you get 80 +'s is because the Referer URL was 80 bytes long and it has been overwritten with spaces.

That's not so unreasonable.
[webmasterworld.com...]
is 78 bytes long.

If they simply removed the header or they replaced it with [Generic Search Engine] or whatever, then they would also have to calculate a new value for the Content-Length header.

It fills my logs and takes bandwidth

It should take up exactly the same bandwidth and possibly the same amount of space in your logs (depending how they work).

jim_w

2:12 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a second log that only has pages. No .js or css, no images, etc. and I control what pages go into it via…

<!--#exec cgi="cgi-bin/log.cgi" -->

I put the above in each page I want in it. This way I can quickly look and see 'evil doers' and whose eyeballs are actually on the page.

I have a bot trap that just says no entry because you look like a bot, smell like a bot and ergo must be a bot. It will now also say your HTTP_REFERER is unacceptable, get some software that isn’t so annoying, or change your HTTP_REFERER to something less than 15 characters.

<edit>log.cgi also has a filter so that I can filter SE IP's, etc.</edit>
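A rough sketch of what such a page-only logger might look like, assuming a CGI environment that sets `REMOTE_ADDR` and `HTTP_REFERER`. The log name, fallback values, and crawler range below are all hypothetical; jim_w's actual log.cgi is not shown in this thread:

```shell
# Sketch of a minimal page-only logger in the spirit of log.cgi above.
# An SSI directive like <!--#exec cgi="..."--> runs it once per page
# view, so it never sees image, .js, or .css requests.
LOG=page_log
REMOTE_ADDR="${REMOTE_ADDR:-203.0.113.9}"    # set by the server under CGI
HTTP_REFERER="${HTTP_REFERER:-unknown}"

# Filter out known crawler IPs before writing (example range only).
case "$REMOTE_ADDR" in
  66.249.*) exit 0 ;;
esac

printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  "$REMOTE_ADDR" "$HTTP_REFERER" >> "$LOG"
```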

[edited by: jim_w at 2:29 pm (utc) on May 2, 2003]

korkus2000

2:16 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is an earlier thread about it.
[webmasterworld.com...]

I don't like it. The developers of the software or proxy should just write 'private' or something like that. Seems suspicious to me.

jim_w

2:16 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



is 78 bytes long.

Can we say proportional spacing? They should have used 'l'; it would have been much narrower.

They used some of the widest characters they could find to make a statement. So I am making a statement also.

TomWaits

2:26 pm on May 2, 2003 (gmt 0)

10+ Year Member



This must be going over my head, because we receive a number of valuable buyers who have the XXX+++ referrer. Has never bothered me. What do I care if I don't get complete information, I got a buyer.

Heck if visitors knew even a quarter of the stuff we know about them, we'd have far fewer customers.

jim_w

2:32 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Tom

We must be selling different things. None of these people ever purchased anything; they just save my site for offline viewing, etc. I don't recall them even going to the price page, let alone the purchase page. The last one just went straight to an article I had written.

I don't care where they came from. It's the statement they are making. What they are saying could just as well be said with N/A. And that is causing me extra work for nothing.

korkus2000

2:45 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's my point also, jim_w: they could have chosen a different, inconspicuous string. Instead they are calling attention to their visit.

jim_w

2:51 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And....

If they simply removed the header or they replaced it with [Generic Search Engine] or whatever, then they would also have to calculate a new value for the Content-Length header

If the string was 'N/A' it would only take milliseconds to recalculate it. It is either some of the poorest coding I have ever seen (and God knows I have written my fair share of poor code in years past), or it is something else.

jdMorgan

3:34 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Back before all the hoopla started in this thread, I commented that I would not block based on this referer alone.

I think it is each individual webmaster's right to block anybody they want to block, which is why I posted the code. However, in this case, I personally rely on a variant of key_master's bad-bot trap script [webmasterworld.com] to catch the real troublemakers, and ignore the fact that this UA makes a messy log entry. That's what I'd recommend to stop site downloaders, but everyone's needs are different.

Jim

jim_w

8:34 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think I'm comfortable with changing the .htaccess on the fly (besides the fact that a fly is too small); it wouldn't take much to get it all fubar'ed (Fouled Up Beyond All Repair). Is that what you are talking about (if so, good idea), or is there a way to change the HTTP_REFERER on the fly without writing the .htaccess? As I sit here and think about it, that makes sense.

I have to look up the commands and syntax to see how to do it. I just wrote my first from-scratch Perl script last week (it does FP, so that was fun), so I need to look up how to actually do it, but something like…

if ($ENV{'HTTP_REFERER'} =~ /^X{3,4}:/) {
    $ENV{'HTTP_REFERER'} = 'The X ppl';
}

But can you change the value of HTTP_REFERER like that? It's an environment variable, right?

Then Server Side it before my log entry.

<!--#exec cgi="cgi-bin/ref-fix.cgi" -->

<!--#exec cgi="cgi-bin/log.cgi" -->

jdMorgan

8:56 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jim_w,

Sure, you could do something like that with an environment variable, but then that user-agent or referer could come right back again with the very next request, and access your pages. Environment variables only have meaning for the current http request. http is a "stateless" protocol - each http request has no knowledge or memory of previous or concurrent (or subsequent) http requests.

As to being worried about changing .htaccess on the fly, note that the thread I cited contains a modified version of key_master's original script. It flocks (file-locks) .htaccess while it adds an entry so that no "collisions" can occur. Several of us here on WebmasterWorld are using this script and talking about it; I'd guess that many, many more are using it and not talking about it. I've had it up and running for six months with no problems whatsoever. And remember, it only gets invoked when a troublemaker hits your site, so performance impact is very, very low.
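The locking idea can be sketched portably with an atomic `mkdir` lock. This is a stand-in for the Perl `flock()` the real trap script uses; the filename and IP here are hypothetical:

```shell
# Sketch of the locking idea: take an exclusive lock before appending a
# deny line, so two trap-script instances can't interleave their writes.
BAD_IP="192.0.2.1"

# mkdir is atomic: exactly one process can create the directory, so it
# serves as a crude mutex. Spin until we hold it.
while ! mkdir htaccess.lock 2>/dev/null; do sleep 1; done

printf 'deny from %s\n' "$BAD_IP" >> htaccess.sample   # append while locked
rmdir htaccess.lock                                    # release the lock
```

Because the lock is held only for a single append, the performance cost is negligible, and a half-written line can never be observed by a concurrent writer.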

The way I see it, either the script works or it doesn't. If it doesn't work, then it's just like any of the other 1000's of lines of code in your scripts and in Apache itself - it can screw things up.

But it works.

I encourage you to try it and watch it work. Once you're comfortable with it, you'll be able to walk away from your logs for days at a time, knowing that your guard dog is still watching over your site. :)

Jim

jim_w

9:15 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jim

I'm sure it works. What I'm skittish about is having a file open for writing on a system component like that, when, if the server crashes for some reason, it could corrupt the file system.

I'm on a hosting service using a virtual SUN Unix box. At my first hosting service, I had my log file corrupted several times, so I changed services. (This is why I run a second log.) The new service I got was great: never a problem for more than a year. Then they were purchased by a hosting service in Atlanta, GA, and I have had at least one problem a week, sometimes several a week, for the last 30 days. Toooooo risky for me right now. They have corrupted my mailbox files, and I have already lost about $500 in business. So at this stage, I really need to play it really, really safe.

After thinking about it for a while longer: since I only want to change the HTTP_REFERER for the pages I log in my second log, I could just put the logic in my log.cgi before the write.

bird

11:19 pm on May 2, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah, if it just omitted the Referer header then it would have to recalculate the content length of the message

A simple HTTP GET request doesn't have a content length. And even with a POST request, it would only count the length of the data payload, not the headers. Replacing the referer URL with XXX:+++ is more work for the software, and therefore slower, than just omitting that header line.

Yes, it may indeed be some kind of "security product". But if so, then it's an incredibly stupid one. And anyone thinking that they will improve their own (or their customers) privacy by making their visits stick out of my logs like a sore thumb should seriously get their sanity checked.

grahamstewart

2:05 am on May 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A simple HTTP GET request doesn't have a content length

Hmm.. good point. Oh well, that was just the explanation I had heard.

Heck if visitors knew even a quarter of the stuff we know about them, we'd have far fewer customers.

Ahh... I take it you're not a big fan of P3P [w3.org] then!
Any and all personally identifiable information that you record about visitors should be openly declared on your site via a privacy statement. Otherwise I suspect you risk being in breach of the Data Protection Act (or your country's equivalent law).

Jim: I still think you're shooting yourself in the foot just to make a statement ("cutting off your nose to spite your face," as my mum used to say).

They may have only looked at an article this time, but presumably they read it, so their awareness of your site has been raised. Next time they return, they may be a customer.

Brett_Tabke

2:23 am on May 3, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



No one has yet produced a program that uses ++++ by default.

grahamstewart

2:29 am on May 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think the thought process was that the various privacy programs (e.g. AdSubtract) replace the referer string with space characters.

It's the logging software that then encodes the space characters using the old URL-encoding convention of '+'.
