
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


Been getting 200+ page loads per day from the same IP

recorded by JS-based analytics

     
9:33 pm on May 11, 2012 (gmt 0)

Preferred Member

5+ Year Member

joined:Nov 16, 2010
posts:533
votes: 0


Hi All

This visitor is recorded by StatCounter, so it doesn't appear to be a traditional spider, and there can be up to 5 minutes between some visits.

It's no load problem for my servers, but it does make nonsense of the analytics for a low-traffic site, especially given the shortcomings in unique-visit identification.

Could it be one person visiting 200 pages day after day, sometimes the same pages?

Do other folks see this type of profile?

Do you ban them?

I can supply the IP range if helpful.
5:00 am on May 12, 2012 (gmt 0)

Senior Member

wilderness: WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


You haven't provided any information that would enable anybody to assist you.

Stats and stats software provide generalized data.

You need to view and provide the data from your "raw visitor logs", before anybody may help.
7:40 am on May 12, 2012 (gmt 0)

Senior Member from US 

lucy24: WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month

joined:Apr 9, 2011
posts:13210
votes: 347


:: peering into crystal ball ::

It started out as a human. But then they forgot to close the tab containing your site, and

(1) every so often their computer crashes, and when it restarts, the browser reopens all former tabs (OK, this is not likely to be happening every few minutes rather than every few days)

(2) after a while the browser's auto-caching kicks in and the page is quietly reloaded at fixed intervals

(3) every time the user does anything else in the browser, all open tabs also reload.

Oh, wait. 200 different pages? I get weirdly repeated visits to the same page.

Does it seem to be a human? Don't look at the spacing of hits to the pages themselves. See whether all the associated files (CSS, images, etc.) are loaded as well. This part happens faster with a human than with a robot. The JS-based analytics alone isn't enough; I'm waging an ongoing battle to keep robots out of Piwik. And I don't mean slimy Ukrainian bots, either; I mean well-known search engines.
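A minimal sketch of that check, assuming Apache common/combined-format access logs: count, per client IP, page requests versus requests for supporting files. The function names, the extension list, and the thresholds are hypothetical; tune them to your own site.

```python
import re
from collections import defaultdict

# Matches the start of an Apache "common"/"combined" log line:
# client IP, ident, user, [timestamp], then the quoted request path.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

# Extensions treated as supporting files (assumption: adjust per site).
ASSET_EXTS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg")


def asset_ratio_by_ip(lines):
    """Return {ip: (page_hits, asset_hits)} counted from raw log lines."""
    counts = defaultdict(lambda: [0, 0])
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines in some other format
        ip, path = m.groups()
        path = path.split("?", 1)[0].lower()  # ignore query strings
        if path.endswith(ASSET_EXTS):
            counts[ip][1] += 1
        else:
            counts[ip][0] += 1
    return {ip: tuple(c) for ip, c in counts.items()}


def suspicious_ips(lines, min_pages=20, max_assets_per_page=0.5):
    """IPs with many page hits but few supporting-file hits (hypothetical thresholds)."""
    flagged = []
    for ip, (pages, assets) in asset_ratio_by_ip(lines).items():
        if pages >= min_pages and assets / pages < max_assets_per_page:
            flagged.append(ip)
    return sorted(flagged)
```

Returning human visitors may already have the CSS and images cached and not re-request them, so a low ratio is a reason to read the raw log for that IP, not proof of a robot.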
8:47 am on May 12, 2012 (gmt 0)

Moderator This Forum from US 

keyplyr: WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month

joined:Sept 26, 2001
posts:6538
votes: 114



You need to view and provide the data from your "raw visitor logs", before anybody may help.


ditto
1:48 pm on May 12, 2012 (gmt 0)

Preferred Member

5+ Year Member

joined:Nov 16, 2010
posts:533
votes: 0


38.99.***.*** HTTP/1.1 Mozilla/5.0+(X11;+U;+Linux+x86_64;+en-US)+AppleWebKit/533.3+(KHTML,+like+Gecko)+Qt/4.7.1+Safari/533.3


There appear to be up to 4 JavaScript-enabled crawlers from the 38.99. range.

I am now certain they are crawlers, because I diverted them into a cul-de-sac and they just keep hitting the same page at the same rate, even though they've been redirected from up to 500 different pages.
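One way to put a number on "the same page at the same rate" is to measure how regular the gaps between one client's requests are. This sketch (hypothetical names and threshold) computes the coefficient of variation of the inter-request gaps; a value near zero suggests a scheduled client rather than a person reading pages.

```python
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between successive hits.

    timestamps: request times in seconds for ONE client, in any order.
    Returns None with fewer than three hits (fewer than two gaps).
    Values near 0 mean the client requests at a near-constant rate.
    """
    ts = sorted(timestamps)
    if len(ts) < 3:
        return None
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        return 0.0
    return pstdev(gaps) / avg


def looks_scheduled(timestamps, max_cv=0.1):
    """Hypothetical threshold: flag clients whose gaps vary by under 10%."""
    cv = interval_regularity(timestamps)
    return cv is not None and cv < max_cv
```

A human who leaves a tab open and lets the browser reload it (one of the scenarios suggested above) could also look regular, so this narrows things down rather than settling them.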
2:21 pm on May 12, 2012 (gmt 0)

Senior Member

wilderness: WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month

joined:Nov 11, 2001
posts:5459
votes: 3


38.99.***.***


Forum practice is to obscure only the Class D (the last octet).

Deny from 38.
or
RewriteCond %{REMOTE_ADDR} ^38\. [OR]

You need two lines, or one combined line, to catch this UA and similar pests:
1) missing semicolons
2) plus signs instead of spaces.
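The IP and plus-sign checks can also be expressed outside Apache. This is a rough Python sketch with a hypothetical function name; the "missing semicolons" rule is left out because its exact pattern isn't spelled out here.

```python
def matches_block_rules(remote_addr: str, user_agent: str) -> bool:
    """Rough equivalent of the IP-prefix and plus-sign checks (a sketch).

    - remote_addr in 38.x.x.x (same idea as `Deny from 38.`)
    - a User-Agent that uses '+' where a real browser sends spaces
    """
    # Prefix test on the first octet; "138.1.2.3" must not match.
    if remote_addr.startswith("38."):
        return True
    # Require the absence of spaces too, so legitimate UAs that contain
    # "+http://..." (e.g. Googlebot's) are not caught.
    return "+" in user_agent and " " not in user_agent
```

One caution: some log formats (IIS's W3C extended format, for example) write spaces in the User-Agent field as `+`, so the plus-sign test is only meaningful against the raw request header, not a logged copy of it.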

There are some other things you could key on; however, they are personal preference.
I don't allow "Linux" users, at least when so designated in the UA.
2:34 pm on May 12, 2012 (gmt 0)

Preferred Member

5+ Year Member

joined:Nov 16, 2010
posts:533
votes: 0


Thanks, I'll give those a shot.
6:41 pm on May 12, 2012 (gmt 0)

Senior Member from GB 

dstiles: WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month

joined:May 14, 2008
posts:3121
votes: 3


I have 96.0 - 99.255 listed (and enabled) as the ScoutJet robot from Blekko.

Beyond that, I have Cogent completely blocked at 38.96.0.0 - 38.127.255.255.
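That Cogent range is exactly one CIDR block. A short sketch using Python's standard `ipaddress` module confirms this and adds a membership test (`in_blocked_range` is a hypothetical helper name):

```python
import ipaddress

# First and last address of the blocked range quoted above.
first = ipaddress.IPv4Address("38.96.0.0")
last = ipaddress.IPv4Address("38.127.255.255")

# summarize_address_range() collapses the span into the fewest CIDR blocks.
blocks = list(ipaddress.summarize_address_range(first, last))
print(blocks)  # [IPv4Network('38.96.0.0/11')]


def in_blocked_range(addr: str) -> bool:
    """True if an IPv4 address string falls inside the blocked span."""
    ip = ipaddress.IPv4Address(addr)
    return any(ip in net for net in blocks)
```

Apache 2.2's `Deny from` also accepts CIDR notation, so the same block could be written as `Deny from 38.96.0.0/11`.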

I have notes of bots from Discovery, Trustwave and Voyager/Kosmix in the blocked range going back more than two years; they may no longer be on that range.

Whatever, if it's a (semi-)genuine bot, it should have an identity within the UA. Your example UA has no such thing.
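A crude test for whether a UA carries an identity: does it name itself as a crawler or link to an info page? The token list below is an assumption, not any standard, and it will produce both false positives and false negatives; treat it as a sketch only.

```python
import re

# Hypothetical markers that self-identifying crawlers often include:
# a "bot"/"crawl"/"spider" name or a contact URL. Extend from your logs.
IDENTITY_RE = re.compile(r"bot|crawl|spider|https?://", re.IGNORECASE)


def declares_identity(user_agent: str) -> bool:
    """True if the UA names itself as a crawler or links to an info page."""
    return bool(IDENTITY_RE.search(user_agent))
```

Run against the UA posted earlier in this thread, it returns False: nothing in that string says what it is.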

I would be suspicious of Safari coming from Linux unless it were a skilled user or hacker OR a bot. The default browser (at least for Ubuntu) is Konqueror, but I think most people use Firefox or Opera. WebKit is mostly a Mac or Chrome browser engine, or is used by Google as a site scraper (sorry, "Web Preview" bot). I can't see the latter running on Cogent, but it's always possible, I suppose, although I think the UA is wrong for that.
6:53 pm on May 12, 2012 (gmt 0)

Preferred Member

5+ Year Member

joined:Nov 16, 2010
posts:533
votes: 0


Interesting that you mention Cogent.

Were these automata you blocked JS-enabled?
7:01 pm on May 12, 2012 (gmt 0)

Senior Member from GB 

dstiles: WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member, Top Contributors Of The Month

joined:May 14, 2008
posts:3121
votes: 3


Sorry, no idea about JS. I think ScoutJet probably is, though. Most large engines seem to be now.