

unknown robot

robot

         

kool002

3:45 am on Apr 6, 2007 (gmt 0)

10+ Year Member



Hi,
In my logs, under spider visits, I regularly see this line:
Unknown robot (identified by 'crawl')
listed beside the Google and MSN bots, with 491 hits.

Who is this? Is he scraping my links?

My site is a niche directory, and it is human-edited. I display the links using frames. Can he still steal all my links?

How can I prevent him?

cyberdyne

8:40 pm on Apr 10, 2007 (gmt 0)

10+ Year Member



Sorry I can't help, kool, but I'm interested too:


Robot                                              Hits     Bandwidth
Unknown robot (identified by 'crawl')              5512+10  74.69 MB
Unknown robot (identified by 'robot')              87+8     2.06 MB
Unknown robot (identified by hit on 'robots.txt')  0+36     3.02 KB
Unknown robot (identified by 'spider')             10+9     171.19 KB

How can I also stop them from using my bandwidth?

goodroi

1:16 pm on Apr 11, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You can use .htaccess to block access from specific IP addresses. That might be a simpler solution for you.
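As a rough sketch of the .htaccess approach, this Python snippet builds the access-control lines for a list of addresses. It assumes Apache 2.2 with mod_authz_host (the Order/Allow/Deny directives); Apache 2.4 uses Require instead. The function name and the example addresses (taken from the RFC 5737 documentation ranges) are placeholders, not anything from this thread.

```python
import ipaddress


def htaccess_deny_block(entries):
    """Return Apache 2.2-style .htaccess lines that deny the given IPs or CIDR ranges.

    Assumes mod_authz_host (Order/Allow/Deny); Apache 2.4 uses Require instead.
    """
    # "Order Allow,Deny" evaluates Allow first, then Deny, so a Deny match wins.
    lines = ["Order Allow,Deny", "Allow from all"]
    for entry in entries:
        # Validate each entry so a typo cannot produce a broken .htaccess file.
        ipaddress.ip_network(entry, strict=False)
        lines.append(f"Deny from {entry}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses from the documentation ranges; use the IPs from your logs.
    print(htaccess_deny_block(["192.0.2.10", "198.51.100.0/24"]), end="")
```

Paste the output into the .htaccess file in your site root. Blocking by IP works even when a robot fakes its user-agent, but it has to be kept up to date if the robot moves to new addresses.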

jdMorgan

1:59 pm on Apr 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The key to solving this problem is to take the timestamp and/or IP address information from your 'stats' program, and look up these accesses in your raw server access logs. There you will see the full client request information, including the full user-agent string of the agent accessing your site.
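A minimal sketch of that lookup, assuming your server writes Apache's "combined" log format (other formats will need a different pattern). The script name find_agents.py and the function agents_matching are hypothetical. It lists every distinct IP address and full user-agent string whose user-agent contains a keyword such as 'crawl', most frequent first:

```python
import re
import sys
from collections import Counter

# Apache "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
# Quoted fields may contain backslash-escaped quotes, hence (?:[^"\\]|\\.)*
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def agents_matching(lines, keyword):
    """Count (ip, user-agent) pairs whose user-agent contains keyword, ignoring case."""
    counts = Counter()
    needle = keyword.lower()
    for line in lines:
        m = COMBINED.match(line)
        if m and needle in m.group("agent").lower():
            counts[(m.group("ip"), m.group("agent"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_agents.py /path/to/access_log crawl
    path, keyword = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as f:
        for (ip, agent), n in agents_matching(f, keyword).most_common():
            print(f"{n:6d}  {ip:15s}  {agent}")
```

Lines that do not match the pattern are skipped silently, so if the script prints nothing, check that your log really is in combined format.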

Stats programs, while extremely useful at getting an overview of your site's traffic, hide information by their very nature, and so are only marginally useful for detailed technical work. In this case, the stats program is hiding the full user-agent string, making it impossible to give you any kind of useful answer here.

Dig into the raw access logs, get the full user-agent string, and then you can do something about this robot if it is malicious or unwelcome.
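Once you have the full user-agent string, one option is to deny that robot by user-agent. This is a sketch assuming Apache 2.2 with mod_setenvif and mod_authz_host; "ExampleCrawler" and the variable name unwanted_bot are placeholders. Keep in mind that the user-agent is supplied by the client, so this only stops robots that identify themselves consistently; one that fakes its user-agent has to be blocked by IP instead.

```python
import re


def htaccess_block_agents(substrings):
    """Return Apache 2.2-style .htaccess lines denying requests whose
    User-Agent contains any of the given substrings (case-insensitive)."""
    lines = []
    for text in substrings:
        if '"' in text:
            # A double quote would terminate the quoted regex in the directive.
            raise ValueError("substring must not contain a double quote")
        # re.escape turns the literal substring into a regex that matches it exactly.
        lines.append(f'SetEnvIfNoCase User-Agent "{re.escape(text)}" unwanted_bot')
    # Allow everyone except requests tagged with the environment variable.
    lines += ["Order Allow,Deny", "Allow from all", "Deny from env=unwanted_bot"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder; substitute the real user-agent text from your raw logs.
    print(htaccess_block_agents(["ExampleCrawler"]), end="")
```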

Jim