Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Been getting 200+ page loads per day from same IP
recorded by js based analytics
scooterdude




msg:4452393
 9:33 pm on May 11, 2012 (gmt 0)

Hi All

These visits are recorded by StatCounter, so it doesn't appear to be a traditional spider, and there are up to 5 minutes between some visits.

It's no load problem for my servers, but it does make a nonsense of analytics for a low-traffic site, especially given shortcomings in unique visit identification.

Could it be 1 person visiting 200 pages day after day, same pages too sometimes?

Do other folk see this type of profile?

Do you ban 'em?

Can supply IP range if helpful.

 

wilderness




msg:4452474
 5:00 am on May 12, 2012 (gmt 0)

You haven't provided any information that would enable anybody to assist you.

Stats and stats software provide generalized data.

You need to view and provide the data from your "raw visitor logs", before anybody may help.

lucy24




msg:4452499
 7:40 am on May 12, 2012 (gmt 0)

:: peering into crystal ball ::

It started out as a human. But then they forgot to close the tab containing your site, and

(1) every so often their computer crashes, and when it restarts, the browser reopens all former tabs (OK, this is not likely to be happening every few minutes rather than every few days)

(2) after a while the browser's auto-caching kicks in and the page is quietly reloaded at fixed intervals

(3) every time the user does anything else in the browser, all open tabs also reload.

Oh, wait. 200 different pages? I get weirdly repeated visits to the same page.

Does it seem to be a human? Don't look at the spacing of hits to pages themselves. See whether all associated files-- css, images etc-- are loaded up. This part happens faster with a human than with a robot. The js-based analytics alone isn't enough; I'm waging an ongoing battle to keep robots out of piwik. And I don't mean slimy Ukrainian bots either-- I mean well-known search engines.

keyplyr




msg:4452517
 8:47 am on May 12, 2012 (gmt 0)


You need to view and provide the data from your "raw visitor logs", before anybody may help.


ditto

scooterdude




msg:4452581
 1:48 pm on May 12, 2012 (gmt 0)

38.99.***.*** HTTP/1.1 Mozilla/5.0+(X11;+U;+Linux+x86_64;+en-US)+AppleWebKit/533.3+(KHTML,+like+Gecko)+Qt/4.7.1+Safari/533.3


There appear to be up to 4 JavaScript-enabled crawlers from the 38.99. range.

I am now certain they are crawlers cos I diverted them into a cul-de-sac and they just keep hitting the same page at the same rate even though they've been redirected from up to 500 different pages.

wilderness




msg:4452592
 2:21 pm on May 12, 2012 (gmt 0)

38.99.***.***


Forum practice is to obscure only the Class D (last group) of the IP.

Deny from 38.
or
RewriteCond %{REMOTE_ADDR} ^38\. [OR]

You need two lines or one combined to catch this UA and similar pests.
1) missing semi-colons
2) plus signs as opposed to spaces.
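
A minimal sketch of the combined form, assuming Apache with mod_rewrite in .htaccess, and assuming the plus signs really are sent by the client rather than written in by the log format (some server logs substitute + for spaces, so check the raw request first). The UA pattern is only an illustration built from the sample UA above, and the missing-semicolon test is left out because no pattern for it is given here.

```apache
RewriteEngine On
# Condition 1: any address starting with 38. (the range quoted above)
RewriteCond %{REMOTE_ADDR} ^38\. [OR]
# Condition 2 (assumption): UA starts "Mozilla/5.0+" where a browser sends "Mozilla/5.0 "
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\+
# Answer 403 Forbidden when either condition matches
RewriteRule ^ - [F]
```

The [OR] flag only makes sense when another RewriteCond follows it; the last condition before the RewriteRule carries no flag.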

There are some other things you could key on, however they are personal preference.
I don't allow "Linux" users, at least when so designated in the UA.

scooterdude




msg:4452600
 2:34 pm on May 12, 2012 (gmt 0)

Thanks, I'll give those a shot.

dstiles




msg:4452653
 6:41 pm on May 12, 2012 (gmt 0)

I have 96.0 - 99.255 listed (and enabled) as Scoutjet robot from Blekko.

Beyond that I have Cogent completely blocked at 38.96.0.0 - 38.127.255.255.
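
For anyone wanting to reproduce those two ranges in .htaccess, here is a rough sketch in Apache 2.2 syntax (2.4 would need mod_access_compat for these directives). It assumes "96.0 - 99.255" means 38.96.0.0 - 38.99.255.255. In CIDR terms the blocked range works out to 38.96.0.0/11 and the enabled sub-range to 38.96.0.0/14.

```apache
# Deny is evaluated first, then Allow; a match on Allow wins
Order Deny,Allow
# 38.96.0.0 - 38.127.255.255 (the Cogent block described above)
Deny from 38.96.0.0/11
# 38.96.0.0 - 38.99.255.255 (assumed Scoutjet sub-range, left enabled)
Allow from 38.96.0.0/14
```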

I have notes of bots from Discovery, Trustwave and Voyager/Kosmix with the blocked range going back more than two years; they may be no longer on that range.

Whatever, if it's a (semi-)genuine bot it should have an identity within the UA. Your example UA has no such thing.

I would have suspicions about Safari coming from Linux unless it were a skilled user or hacker OR a bot. Default browser (at least for Ubuntu) is Konqueror but I think most people use Firefox or Opera. WebKit is mostly a Mac or Chrome browser or used by Google as a site scraper - sorry, "Web Preview" bot. Can't see the latter running on Cogent but it's always possible, I suppose, although I think the UA is wrong for that.

scooterdude




msg:4452659
 6:53 pm on May 12, 2012 (gmt 0)

Interesting you mention Cogent,

Were these automata you blocked JS-enabled?

dstiles




msg:4452661
 7:01 pm on May 12, 2012 (gmt 0)

Sorry, no idea about JS. I think scoutjet probably is, though. Most large engines seem to be, now.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved