Forum Moderators: goodroi


bad bots


rhodopsin

11:42 am on Oct 2, 2004 (gmt 0)

10+ Year Member



Can I assume that a robots.txt Disallow rule saying that no bots may browse my site will work for most of the search engine bots, such as Googlebot, BUT will not work for bots searching for e-mail addresses to spam, offline browsers (web rippers), etc.? That is, other types of robot. Does anyone know whether marcspider would obey robots.txt? This is a robot that looks for images on the web - images with registered watermarks.
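For reference, this is the minimal robots.txt that asks every robot to stay out of the whole site - and, as discussed below, it is only a request, not an enforcement mechanism:

```
User-agent: *
Disallow: /
```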

jdMorgan

4:59 am on Oct 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



robots.txt contains disallow requests. Good spiders comply with them, and bad ones don't, so e-mail harvesters are simply going to ignore robots.txt. Certain trademark and copyright 'bots may feel they have a right to ignore robots.txt as well, but I can't answer your specific question about marcspider.
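To illustrate what "good spiders comply" means in practice: a well-behaved crawler parses robots.txt and checks each URL against it before fetching. A minimal sketch in Python using the standard-library urllib.robotparser (the bot name and URL here are made up for the example):

```python
from urllib.robotparser import RobotFileParser

# A compliant crawler parses the site's robots.txt rules first.
rp = RobotFileParser()

# Feed the rules directly so this sketch runs offline; a real
# crawler would call rp.set_url("http://example.com/robots.txt")
# followed by rp.read().
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# Before each request, the crawler asks whether the fetch is allowed.
allowed = rp.can_fetch("GoodBot/1.0", "http://example.com/page.html")
print(allowed)  # False - a compliant bot now skips this URL
```

A bad 'bot simply never performs this check, which is why robots.txt alone cannot keep harvesters out.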

Bad 'bots require stronger measures.

Jim

rhodopsin

11:16 am on Oct 3, 2004 (gmt 0)

10+ Year Member



Bad 'bots require stronger measures

Can you elaborate on what kinds of measures I can implement? I realise this might be a big topic - are there any resources or other threads you can point me to?

ncw164x

11:40 am on Oct 3, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This method uses the .htaccess file
[webmasterworld.com...]
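A sketch of the sort of thing that thread describes - blocking requests by User-Agent string with mod_rewrite in .htaccess. The 'bot names below are only examples; build your own list from the user-agents you actually see in your access logs:

```apache
# Refuse requests from unwanted user-agents (example names only).
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC]
# Match any URL and return 403 Forbidden.
RewriteRule .* - [F]
```

Note that this only stops 'bots that announce themselves honestly; anything can fake its User-Agent header.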

The above method is the best way, but it can also be implemented in the httpd config file if you use a Unix server and have root access.

There is a file you can use on a Windows server, but I don't know how to implement it.

Hope this helps