Forum Moderators: goodroi


bad bots


rhodopsin

11:42 am on Oct 2, 2004 (gmt 0)

10+ Year Member



Can I assume that a robots.txt Disallow rule saying that no bots may browse my site will work for most of the search engine bots, such as Googlebot, BUT will not work for bots searching for e-mail addresses to spam, offline browsers (web rippers), etc.? That is, other types of robot. Does anyone know whether marcspider would obey robots.txt? This is a robot that looks for images on the web - images with registered watermarks.
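For reference, this is the minimal robots.txt that asks every robot to stay out of the whole site - and, as discussed below, it is only a request, not an enforcement mechanism:

```
User-agent: *
Disallow: /
```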

jdMorgan

4:59 am on Oct 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



robots.txt contains disallow requests. Good spiders comply with them, and bad ones don't, so e-mail harvesters are simply going to ignore robots.txt. Certain trademark and copyright 'bots may feel they have a right to ignore robots.txt as well, but I can't answer your specific question about marcspider.
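To illustrate what "good spiders comply" means in practice: a well-behaved crawler parses robots.txt and checks each URL against it before fetching. A minimal sketch in Python using the standard-library urllib.robotparser (the bot name and URL here are made up for the example):

```python
from urllib.robotparser import RobotFileParser

# A compliant crawler parses the site's robots.txt rules first.
rp = RobotFileParser()

# Feed the rules directly so this sketch runs offline; a real
# crawler would call rp.set_url("http://example.com/robots.txt")
# followed by rp.read().
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# Before each request, the crawler asks whether the fetch is allowed.
allowed = rp.can_fetch("GoodBot/1.0", "http://example.com/page.html")
print(allowed)  # False - a compliant bot now skips this URL
```

A bad 'bot simply never performs this check, which is why robots.txt alone cannot keep harvesters out.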

Bad 'bots require stronger measures.

Jim

rhodopsin

11:16 am on Oct 3, 2004 (gmt 0)

10+ Year Member



Bad 'bots require stronger measures

Can you elaborate on what kinds of measures I can implement? I realise this might be a big topic - are there any resources or other threads you can point me to?

ncw164x

11:40 am on Oct 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This method uses the .htaccess file
[webmasterworld.com...]
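A sketch of the sort of thing that thread describes - blocking requests by User-Agent string with mod_rewrite in .htaccess. The 'bot names below are only examples; build your own list from the user-agents you actually see in your access logs:

```apache
# Refuse requests from unwanted user-agents (example names only).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC]
# Match any URL and return 403 Forbidden.
RewriteRule .* - [F]
```

Note that this only stops 'bots that announce themselves honestly; anything can fake its User-Agent header.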

The above method is the best way, but it can also be implemented in the httpd config file if you use a Unix server and have root access.

There is a file you can use on a Windows server, but I don't know how to implement it.

Hope this helps