Forum Moderators: open


Spider tracking

Tools to analyse crawling

         

rupalis

2:32 am on Jun 21, 2006 (gmt 0)

10+ Year Member



Hi

Is there a comparison chart of all the tools available to analyse spider visits?

Log-analyser tools have a robots section, but I was looking for tools that do just spider tracking.

Thanks.

Pfui

12:08 am on Jun 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Sorry, but all too many spiders don't ID themselves as such -- they may send the same user-agent your browser does, for example -- so any tool claiming to track spiders would be missing an awful lot. Even those of us admitting to bot-tracking obsessive-compulsiveness (raises hand) miss plenty.

So thank goodness for the really impressive (and impressively scary) compilation at Psychedelix.com: the "Database of robots, spiders & other user-agents [psychedelix.com]" pages. I don't know how they do it, but I'm glad they do, because they do a terrific job.
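The basic idea behind a user-agent database like that can be sketched in a few lines: match the UA string against a maintained list of known crawler markers. This is a minimal illustration only -- the substrings below are examples I've picked, not entries from the psychedelix.com database, and as noted above this approach by definition misses any spider that lies about its identity.

```python
# Hand-maintained list of substrings seen in known crawler user-agents.
# These particular entries are illustrative, not exhaustive.
KNOWN_BOT_SUBSTRINGS = [
    "googlebot",
    "slurp",      # Yahoo! Slurp
    "msnbot",
    "crawler",
    "spider",
]

def looks_like_known_bot(user_agent: str) -> bool:
    """Return True if the UA string contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in KNOWN_BOT_SUBSTRINGS)

print(looks_like_known_bot("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # True
print(looks_like_known_bot("Mozilla/5.0 (Windows NT 5.1; rv:1.8)"))     # False
```

A stealth crawler sending a plain browser UA sails right past a check like this, which is why the string match is only the first, weakest layer of any bot-tracking setup.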

incrediBILL

12:26 am on Jun 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Whatever tool you get will just undercount horrendously: only the lamest stealth crawlers can be detected in a log file, since it takes real-time activity analysis to catch most of them.

For instance, so far today I've blocked the following:

* 65 identified crawlers blocked requesting 357 pages.

* 116 stealth crawlers blocked requesting 980 pages.

The only reason the identified crawlers ask for fewer pages is that the new ones get stopped at the index page. Older crawlers that have visited my site before, and learned the page names prior to being blocked, still ask for what they already know about before going away.

Stealth crawlers unfortunately get further before you can determine bot or human, and so tend to get more pages. Since they got a peek at the full site navigation, they'll keep asking for more pages too, even though they're being bounced by that point.
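The real-time approach described above can be sketched as a simple per-IP sliding-window rate check -- flag a client once it requests more pages in a short window than any human plausibly would. The threshold and window length here are illustrative placeholders, not recommendations, and real stealth-bot catching layers several signals on top of raw rate.

```python
import time
from collections import defaultdict, deque

# Illustrative limits: more than 20 requests in any 10-second window
# from one IP is treated as probable crawler behaviour.
WINDOW_SECONDS = 10
MAX_REQUESTS = 20

_recent = defaultdict(deque)  # ip -> timestamps of recent requests

def is_probable_bot(ip, now=None):
    """Record one request from `ip`; report whether it exceeds the rate limit."""
    now = time.time() if now is None else now
    window = _recent[ip]
    window.append(now)
    # Drop timestamps that have aged out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > MAX_REQUESTS
```

Called from the request path (rather than from an after-the-fact log report), a check like this can bounce a greedy crawler mid-crawl -- exactly the real-time detection that a log file alone can't give you.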

It's actually a slow day; yesterday was about 4K blocked pages total, but there are a few hours to go and I have faith a couple of greedy pigs will hit my server before the day is over.