Forum Moderators: martinibuster

Adsense pages drawing false visits

Always with +SV1;+.NET+CLR+1.1.4322

     
1:07 pm on Oct 26, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 25, 2006
posts:475
votes: 0


Some of our pages that display Adsense (and no others) are drawing increasing numbers of false visits in our server logs. I call them false visits because they aren't real people or even bots, but they show up as if they were genuine direct hits (no referrer) from random IP addresses. Many of these IP addresses appear to be related to China; a couple appear on blacklists.

These false visits aren't hurting us with Adsense because neither Adsense nor Analytics counts them. Apparently the JavaScript Google uses is smarter than the server logging, or maybe Google has an extensive IP blacklist. However, they are growing both in the number of false visits and in the number of pages affected. They appear in our logs as:

GET (ourpage.html) - 80 - (random IP Addy) HTTP/1.1 Mozilla/4.0+compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)

That the visits are false is apparent because they increase the logged traffic to a given page by a factor of 4X to 8X, resulting in hundreds of false visits a day on some pages. What's more, they never go away. Once these false visits appear, they simply continue, for months in the case of our earliest affected page. Overall, our server logs are showing over a thousand false visits a day.

I did some research on this, saw quite a few comments from last year that it may be related to search engine caching or to a flaw in the .NET architecture, but I'm not enough of an infrastructure guy to understand those discussions. I didn't find any discussions relating the false visits to pages showing Adsense, which is what concerns me. Since we only run Adsense on a fraction of our pages and don't do huge traffic to start with, it shows up pretty clearly. Also, removing Adsense from an affected page does not free it from the daily beating.

Has anybody else seen such a rise in false traffic and does the Adsense relation hold up? Is it an attack, somebody trying to spam the system with noise to cover their tracks? Is it (to use a hardware analogy) just chattering relays out on the Internet infrastructure?

2:16 pm on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 26, 2002
posts:3295
votes: 9


they aren't real people or even bots

Another 'Net life form discovered? ;-)

This looks like a regular XP user agent. A hint could possibly be in the IP addresses and frequency of hits. The best place to get help for something like this is over at the tracking and logging [webmasterworld.com] or spider [webmasterworld.com] forums where all the log obsessed folks hang out.

2:37 pm on Oct 26, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:July 31, 2006
posts:629
votes: 0


If your Google Analytics does not count them then those "visitors" do not have JavaScript enabled. If JavaScript is not enabled then these visits have no effect on your AdSense either.
Maybe they just "ping" some high-PageRank pages to steal content/updates.

3:20 pm on Oct 26, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 25, 2006
posts:475
votes: 0


If JavaScript is not enabled then these visits have no effect on your AdSense either.

Correct, but it doesn't explain what they are trying to do. It's only a handful of Adsense pages drawing this attention, with no relation to PageRank.

Maybe they just "ping" some high-pagerank pages to steal content/updates.

No, they wouldn't be coming back to the same pages (which are static, by the way) day after day, generating hundreds of false visits on moderately trafficked pages and dozens of false visits on pages that only draw a couple of real visitors a day.

This looks like a regular XP user agent. A hint could possibly be in the IP addresses and frequency of hits. The best place to get help for something like this is over at the tracking and logging or spider forums where all the log obsessed folks hang out.

I thought of that, but since it's only happening with pages that show Adsense, I thought I'd try here first to see if anybody else is seeing the same issue. On a page that only gets a couple dozen of these false visits per day, they come every hour or so from unique IPs. I haven't yet carefully analyzed the logs for one of the pages with 300 or 600 false visits a day to see whether the IPs repeat; I suppose I'll try that if I get really annoyed.
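One way to run that check is sketched below. It assumes the logs are in IIS's W3C extended format, where a `#Fields:` directive names the columns (`c-ip`, `cs-uri-stem`, `cs(User-Agent)`, `cs(Referer)`); if your logging is configured differently, the field names will need adjusting. The UA fragment and the no-referrer filter come from the log line in the first post, and `summarize` is a made-up name. Since plenty of genuine visitors send this same UA, the counts will mix real and false hits; the useful signal is how hits per page compare with distinct IPs per page.

```python
from collections import defaultdict

# Filter taken from the thread's log line (IIS logs write spaces as '+').
SUSPECT_UA_FRAGMENT = "SV1;+.NET+CLR+1.1.4322"


def summarize(lines, ua_fragment=SUSPECT_UA_FRAGMENT):
    """Return {page: (hits, distinct_ips)} for direct hits whose UA contains ua_fragment.

    Assumes IIS W3C extended logs: a '#Fields:' directive names the columns,
    and each data line is space-separated in that order.
    """
    fields = []
    hits = defaultdict(int)
    ips = defaultdict(set)
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            fields = line.split()[1:]  # column names after the directive
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other directives (#Software, #Date, ...)
        row = dict(zip(fields, line.split()))
        if ua_fragment not in row.get("cs(User-Agent)", ""):
            continue
        if row.get("cs(Referer)", "-") != "-":
            continue  # keep only direct hits with no referrer, as in the first post
        page = row.get("cs-uri-stem", "?")
        hits[page] += 1
        ips[page].add(row.get("c-ip", "?"))
    return {page: (hits[page], len(ips[page])) for page in hits}
```

Feed it one day's log file (any iterable of lines works). If hits and distinct IPs are nearly equal for a page, each address shows up once and never returns, which is the pattern described above.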

3:32 pm on Oct 26, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Jan 14, 2006
posts:222
votes: 0


My guess is someone is using a bot net to scrape your content. They probably manually reviewed your pages, figured if the content was good enough for you to place adsense on, then the pages were good enough to scrape.

They then use a bot net to keep the scraped content updated. I know you said they were static, but a scraper that uses a bot net is lazy and will keep automatically checking anyway.

It's just my guess, but I would still post your question in the tracking and logging forum.

3:39 pm on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


I see all kinds of crazy crypto-bot activity on my website, dating from long before I ever used AdSense.

To this day I have no idea whether it's due to some kind of ISP pre-caching, or slow-motion email harvesting, or search engines doing some under-the-radar crawling in order to identify cloaking, or what.

If it becomes a major problem you can try to ban the IP blocks involved. Personally I don't see simple crawling as anything worth losing sleep over, except when they ignore my robots.txt file and hammer my executable scripts with requests.
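If you do go the IP-block route, the membership test is straightforward with Python's standard `ipaddress` module. This is a minimal sketch: the CIDR ranges below are documentation placeholders (RFC 5737), not the blocks seen in this thread, and `is_banned` is a hypothetical helper you would call from whatever code handles incoming requests. It only helps while the unwanted traffic is concentrated in a few ranges.

```python
import ipaddress

# Hypothetical ban list: RFC 5737 documentation ranges used as placeholders.
BANNED_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_banned(ip: str) -> bool:
    """Return True if the client address falls inside any banned block."""
    addr = ipaddress.ip_address(ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed input is safe.
    return any(addr in block for block in BANNED_BLOCKS)
```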

4:16 pm on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Unless this is an artifact of your logging method, the user-agent is invalid. Note the "+" characters substituted for spaces. As such, it should be easy to deny it from your site.

Jim
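The caveat about logging method matters: as noted further down the thread, IIS writes spaces as `+` in its own logs, so a `+` in a log line proves nothing by itself. What gives a client away is a literal `+` in the raw `User-Agent` header the server receives. A rough heuristic, offered as a sketch: flag headers that contain `+` but no spaces at all. The function name is hypothetical. The "no spaces" condition is there because some legitimate crawlers, Googlebot among them, put a `+` in front of their info URL while still using normal spaces.

```python
def ua_has_plus_for_spaces(user_agent: str) -> bool:
    """Heuristic: True if a raw User-Agent header uses '+' where spaces belong.

    Apply this to the header as received, not to IIS log lines,
    because IIS itself writes every space as '+' in its logs.
    """
    # Genuine browsers send e.g. "Mozilla/4.0 (compatible; MSIE 6.0; ...)".
    # Some legit crawlers include '+' before a URL, but they still contain spaces.
    return "+" in user_agent and " " not in user_agent
```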

4:17 pm on Oct 26, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 25, 2006
posts:475
votes: 0


I know you said they were static, but a scraper that uses a bot net is lazy

But why hit the same page hundreds of times a day? I understand the scraper paradigm; before Google worked out their duplicate content algo, I used to see tens of thousands of scrapes of our pages in their index. I guess I need to study up on bot nets; that sounds like something that would use unlimited random IPs. And getting back to this forum, why only target pages displaying Adsense? It's not like we have Adsense only on the high-quality pages or only on the low-quality pages; it's pretty random on our site, depending on where Adsense converts well and where it doesn't interfere with our core business.

4:20 pm on Oct 26, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 25, 2006
posts:475
votes: 0


the user-agent is invalid. Note the "+" characters substituted for spaces.

Thanks, I'll look into this, but I haven't seen any mention of the + character being suspicious in other threads related to this type of log entry.

4:41 pm on Oct 26, 2006 (gmt 0)

Full Member

10+ Year Member

joined:Jan 14, 2006
posts:222
votes: 0


But why hit the same page hundreds of times a day?

I hear you, but to illustrate why I think bot nets can and will do this because they are stupid and lazy, I'll give you a real example that is off topic from adsense, but might answer your question.

On one of my sites, I had a form where users could submit info to a page. I didn't properly sanitize one of the fields, but figured it was ok, because whatever a user submitted, I had to approve prior to it showing up on the page.

Well, someone used that field to insert some JavaScript from another site. When I viewed the field to approve it, I also ran the JavaScript (because I viewed the field as HTML). No other users ever ran that JavaScript, but I didn't appreciate it much, so I made sure that field was sanitized. I fixed the issue immediately. Also, the site hosting the code was taken down.

Well, I guess whoever did it thought they had found a vulnerable page, because the next thing I know I'm getting hits to that field from random IPs, all inserting the same JavaScript.

Even though the field has been sanitized for over a month and the site hosting the bad code has also been down for over a month, this one page is still getting these hits from random IPs to this day, all doing the exact same thing and still pointing to the site that's down.

All I can do is just sit back and shake my head, I guess.

But as I said in my earlier post, it's just my guess (with some tin hat stuff mixed in) that a bot net is causing your log entries.

4:52 pm on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


BTW, that browser ID string is also missing an opening parenthesis. If, as jdMorgan says, that's exactly the way it appears in your log file, then that would be another indicator of a nonstandard browser.

5:08 pm on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jimbeetle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 26, 2002
posts:3295
votes: 9


Note the "+" characters substituted for spaces

Sheesh, I completely missed the fact that the "+" was ignored in my searches, a quoted search at that. Saw a ton of results and didn't look closely enough to see if what was returned was actually what I searched for.

Live and learn.

5:13 pm on Oct 26, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 25, 2006
posts:475
votes: 0


BTW that browser ID string is also missing an opening parenthesis.

Good eyes. Typo on my part, should be open parenthesis before "compatible." One day I'll learn how to cut and paste right:-)

7:57 pm on Oct 26, 2006 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


GET (ourpage.html) - 80 - (random IP Addy) HTTP/1.1 Mozilla/4.0+compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322)

Those aren't false visitors, it's a bot of some sort, you're getting scraped.

The + signs are there because whoever set that user agent string copied it from a Windows IIS server log, which replaces spaces with +, and they think that's how the world sees that information, which is hysterical.

You can learn more about these things in Pubcon Vegas:
[pubcon.com...]

9:13 pm on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 18, 2005
posts:817
votes: 0


Here is a URL for the botnets mailing list: [whitestar.linuxbox.org...]

10:22 pm on Oct 26, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 25, 2006
posts:475
votes: 0


You can learn more about these things in Pubcon Vegas:

Going to be overseas for business in November, Boston Pubcon was it for the year. I'm glad the consensus is scraper bots, better than some sort of attack on Adsense. I can't say it makes any sense to me that the same page would get scraped hundreds of times a day, but I'm a white hat, so what do I know:-)

10:29 pm on Oct 26, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 1, 2004
posts:1987
votes: 0


could it be that someone is framing those pages?

I had that happen many times until I added code to always break out of frames.

10:32 pm on Oct 26, 2006 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


could it be that someone is framing those pages?

Framing doesn't change the user agent of the browser, unless they are downloading the page each time via a proxy server that used that UA.

12:17 am on Oct 27, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 25, 2006
posts:475
votes: 0


could it be that someone is framing those pages?

I'm not sure what framing is, but if it works like it sounds, it still wouldn't explain the number of hits, since I can't see a framed copy with no search engine ranking outdrawing the original by a factor of four.

1:17 am on Oct 27, 2006 (gmt 0)

Preferred Member from US 

10+ Year Member

joined:Mar 10, 2004
posts:471
votes: 51


I found log entries like this as well.

Now blocked.

1:47 am on Oct 27, 2006 (gmt 0)

New User

10+ Year Member

joined:Apr 29, 2006
posts:15
votes: 0


Could someone be framing those pages and injecting their own PUB ID into the AdSense ads?

That could explain the odd behavior coupled with the fact that the hits aren't showing in your AS stats.

- P -

4:17 am on Oct 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member jomaxx is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 6, 2002
posts:4768
votes: 0


Not likely, because
(a) simple framing doesn't work that way,
(b) a site grabbing the page as a proxy wouldn't normally show a variety of random IP addresses (as described in post 1), and
(c) if anyone wanted to accomplish this, it would be vastly easier and less traceable to simply copy the page to their own server, then modify it to their heart's content.

4:36 am on Oct 27, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:June 10, 2006
posts:606
votes: 0


This is an eye-opening thread. For all I know, some of this is happening to me too and I've just never noticed.

It's got me thinking. I had been considering becoming a malnourished data center detective, but now I'm intrigued with the thought of being a log-obsessed, crypto-bot-fearing recluse.

Thanks!

5:27 am on Oct 27, 2006 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


Could someone be framing those pages and injecting their own PUB ID into the AdSense ads?

No, that won't change the USER AGENT in the browser, it's something else.

2:36 pm on Oct 27, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Apr 25, 2006
posts:475
votes: 0


Now blocked.

What did you block, a series of IPs? We're getting a thousand different IPs a day with this junk, so that route won't work for us, and there's plenty of legit traffic with the 1.1.4322 bit. When I first saw this, maybe half a year ago, the false hits were coming from just two IPs in a known bad neighborhood, but now they're random.

11:52 pm on Oct 27, 2006 (gmt 0)

System Operator from US 

incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14664
votes: 99


a thousand different IP's a day with this junk

Please forward a sample of at least 100 to me in stickymail.

Bot busting is what I do and I'd like to evaluate whether these are actual bots or a botnet.
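A quick first pass at the bots-versus-botnet question is to see how the IPs cluster by network. The sketch below groups addresses into /16 blocks with Python's `ipaddress` module; the prefix length is an arbitrary choice (try /24 as well), and `prefix_spread` and `spread_ratio` are illustrative names. A low ratio of distinct networks to requests points at a few hosts or a hosting range that could be blocked; a ratio near 1, with almost every request coming from a different network, looks more like a distributed botnet. It is a rough heuristic, not a verdict.

```python
import ipaddress
from collections import Counter


def prefix_spread(ips, prefix_len=16):
    """Count requests per IPv4 network of the given prefix length."""
    counts = Counter()
    for ip in ips:
        # strict=False masks off the host bits, e.g. 10.1.2.3/16 -> 10.1.0.0/16
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(net)] += 1
    return counts


def spread_ratio(ips, prefix_len=16):
    """Distinct networks divided by total requests (near 1.0 = widely scattered)."""
    counts = prefix_spread(ips, prefix_len)
    total = sum(counts.values())
    return len(counts) / total if total else 0.0
```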

11:55 pm on Oct 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 1, 2004
posts:1987
votes: 0


I for one am going to stick to the stuff I know.

...

Crap, now I have nothing to do.

4:03 am on Oct 28, 2006 (gmt 0)

Preferred Member from US 

10+ Year Member

joined:Mar 10, 2004
posts:471
votes: 51


I'm blocking via a rewrite rule that targets any user agent with:
(compatible;+

So far it's nailed 5 IPs, with dozens of page requests each, feeding them an empty 403 page.
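The rewrite rule itself isn't quoted in the post. If a site is served through a Python WSGI application instead, roughly the same idea can be sketched as middleware: look for the `(compatible;+` fragment in the raw `User-Agent` header and answer with an empty 403. This is a swapped-in technique, not the poster's rule, and `block_plus_ua` is a hypothetical name.

```python
# Fragment quoted in the post above; a real browser sends "(compatible; " with a space.
BLOCKED_UA_FRAGMENT = "(compatible;+"


def block_plus_ua(app):
    """WSGI middleware: send an empty 403 when the raw User-Agent contains
    BLOCKED_UA_FRAGMENT; otherwise pass the request to the wrapped app."""

    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA_FRAGMENT in ua:
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", "0")],
            )
            return [b""]
        return app(environ, start_response)

    return middleware
```

Wrap your existing WSGI callable, e.g. `application = block_plus_ua(application)`. A plain substring test keeps the filter as narrow as the poster's rule, so browsers sending a normal UA with spaces pass straight through.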