Spiders: All You Ever Wanted to Know

Identification and the Software to Use

         

frontpage

10:30 pm on Mar 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a problem.

I want to identify spider activity on my websites, but the software I have tested is marginal: some programs miss spiders, misidentify them, or provide no information at all. (I've used Urchin, online stats services like Addfreestats, and Webtrends Log Analyzer.)

I have lurked on this site and noticed that you webmasters are able to determine which spider visited, whether it obeyed robots.txt, and what it did on the site.

Please let me know what software you like the best.

I am sure there are others out there who would like to know so please DISH.

Key_Master

11:10 pm on Mar 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You need some type of server-side logging software to accurately detect all spider hits. Most online stat services use JavaScript to track hits, and that isn't very reliable for this purpose: spiders generally don't execute JavaScript, so their visits never get counted.

As for identifying spiders, if the spider's IP or agent can't be traced to a source, a Google search will often bring up some usable info. There is also a ton of info on different spiders on this website.

Tapolyai

11:15 pm on Mar 24, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Or you can just use the poor man's logger. Write a short script on your front page that logs all visits. It might not be the best, but if you store the data in a DB, you can extract some decent info...
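
For anyone who wants a starting point, here is a minimal sketch of that kind of logger. It assumes Python with SQLite, which nobody in this thread specified; the file name `visits.db`, the table layout, and the function names are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def open_db(path="visits.db"):
    """Open (or create) the visit log database. The file name is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits ("
        " ts TEXT, ip TEXT, user_agent TEXT, path TEXT)"
    )
    return conn


def log_visit(conn, ip, user_agent, path):
    """Record one page request with a UTC timestamp."""
    ts = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO visits (ts, ip, user_agent, path) VALUES (?, ?, ?, ?)",
        (ts, ip, user_agent, path),
    )
    conn.commit()


def agents_seen(conn):
    """Return (user_agent, hit_count) pairs, most frequent first."""
    return conn.execute(
        "SELECT user_agent, COUNT(*) FROM visits"
        " GROUP BY user_agent ORDER BY COUNT(*) DESC, user_agent"
    ).fetchall()
```

The page script would call `log_visit` with whatever the server hands it (under CGI, the `REMOTE_ADDR` and `HTTP_USER_AGENT` environment variables). Keep in mind that it only sees requests for pages that run the script, so hits on images or robots.txt go unrecorded.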

Crazy_Fool

2:22 am on Mar 25, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



i have webalizer statistics set up on my RaQ servers to show every user agent, not just the top 20 or top 50. if i spot a new or unusual user agent there, i download the log file and search it for that agent, which tells me which pages / files were visited and when. with windows servers i just download the log files and scan them manually every now and then, although i sometimes run them through analog stats (www.analog.cx).
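
the "search the log for one agent" step can be sketched roughly like this. it assumes Python and the Apache "combined" log format, which is an assumption (windows / IIS logs are laid out differently); the regex and the name `hits_for_agent` are made up for illustration.

```python
import re

# Apache "combined" log format (assumed; adjust for your server's format).
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_for_agent(lines, needle):
    """Return (time, path, status) for each request whose user agent
    contains `needle` (case-insensitive). Unparseable lines are skipped."""
    needle = needle.lower()
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and needle in m.group("agent").lower():
            hits.append((m.group("time"), m.group("path"), int(m.group("status"))))
    return hits
```

an open file works as `lines`, so something like `hits_for_agent(open("access.log"), "examplebot")` would list every page that agent requested and when.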

you might want to write a script (or modify an existing one) to read through your log files and save the results into a database for analysis. you might find some suitable scripts floating around the net. i'd love to do this myself and fully automate it etc, but i just don't have the time....
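
a rough sketch of what such a script could look like, assuming Python, SQLite and the Apache "combined" log format (all assumptions, not something i actually run). the table name `hits` and the function names are hypothetical.

```python
import re
import sqlite3

# Apache "combined" log format (assumed; adjust for your server's format).
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def load_log(conn, lines):
    """Parse log lines into the `hits` table; return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        " ip TEXT, time TEXT, path TEXT, status INTEGER, agent TEXT)"
    )
    rows = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # silently skip lines that do not fit the assumed format
            rows.append(
                (m["ip"], m["time"], m["path"], int(m["status"]), m["agent"])
            )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def agent_summary(conn):
    """Per user agent: total hits and 1/0 for whether it requested /robots.txt."""
    return conn.execute(
        "SELECT agent, COUNT(*), MAX(path = '/robots.txt')"
        " FROM hits GROUP BY agent ORDER BY COUNT(*) DESC, agent"
    ).fetchall()
```

the robots.txt column only shows that an agent fetched the file. checking whether it then stayed out of disallowed paths would mean comparing its requests against your own Disallow rules, which this sketch does not attempt.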