Need to block a specific spider

Help with robots.txt

         

coosblues

3:18 am on May 22, 2003 (gmt 0)

10+ Year Member



First, let me admit my ignorance of the robots.txt file; I never found the need for one until I just looked at my logs. It appears a spider/bot is stealing my copyrighted photos and then displaying them on its site. How do I stop this one particular spider but let the rest have access? I don't have the IP address of the spider, only the web address (perhaps I should not post it here; mods, remove it if necessary). I want to block this site (http://infopop.univision.com/2/OpenTopic?a=tpc&s=864094322&f...) but allow all other bots. Thanks, and enjoy my ignorance.

jatar_k

3:23 am on May 22, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Maybe this will help you with robots.txt:
Robots.txt Tutorial [searchengineworld.com]
That's if the spider obeys robots.txt.

coosblues

3:32 am on May 22, 2003 (gmt 0)

10+ Year Member



I did read that tutorial before I posted here, and thanks for the reference to it. However, I cannot find anything in it that explains how to block a specific site like the one I mentioned. I do appreciate your help.

jatar_k

4:01 am on May 22, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



It depends on what you want to block. If they have a spider and it obeys robots.txt, then you can use robots.txt.
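As a rough sketch of what "block one spider, allow the rest" looks like in robots.txt, the snippet below feeds a small rule set to Python's standard urllib.robotparser and asks it what each bot may fetch. The user-agent token "BadBot", the name "SomeOtherBot", and the example.com URL are placeholders, not names taken from this thread, and the rules only matter to bots that choose to honor them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents. "BadBot" is a placeholder token;
# substitute the user-agent string the unwanted spider reports in your logs.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    photo = "http://www.example.com/photos/a.jpg"  # placeholder URL
    print("BadBot allowed:", can_fetch("BadBot", photo))
    print("SomeOtherBot allowed:", can_fetch("SomeOtherBot", photo))
```

The empty "Disallow:" line under "User-agent: *" means every other bot may fetch everything, so only the named spider is asked to stay out.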

If you want to block referers from that site you can do it using the scripting language of your choice.

If you want to block a person from accessing it you can do that by IP using the scripting language of your choice.
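A minimal sketch of the IP check mentioned above, written in Python as one possible "scripting language of your choice". The addresses are documentation-only placeholder ranges, not anyone's real IP, and the function name is_blocked is made up for illustration; how the script obtains the visitor's address depends on your server setup.

```python
import ipaddress

# Placeholder networks (reserved documentation ranges). Replace them with
# the addresses or ranges that actually show up in your own logs.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # a whole range
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]


def is_blocked(remote_addr: str) -> bool:
    """Return True if the visitor's address falls in a blocked network."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Assumption: treat an unparseable address as blocked.
        return True
    return any(addr in net for net in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.55", "198.51.100.17", "203.0.113.9"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

A script using this would answer blocked visitors with a 403 response instead of the page or image they asked for.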

You may also be able to use something like this:
modified "bad-bot" script blocks site downloads [webmasterworld.com]

wilderness

4:04 am on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



coos,
Robots.txt is a "suggestion only" for mannerly bots which act responsibly.
Robots.txt will NOT prevent any bot from doing anything.

If you're having difficulties with robots.txt, then it is rather unlikely that you will be able to understand htaccess (which implements your desired controls to deny/ban specific IP ranges).

This thread will provide some examples:
[webmasterworld.com...]

jdMorgan

4:23 am on May 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



coosblues,

Robots.txt is unlikely to help here. Only well-behaved/honest robots pay any attention to it.

You might also want to check out the threads on image hot-linking.

It really depends on whether this is an actual spider copying the images from your site, or if they are just hot-linking to your site, and loading the images off your server each time they are displayed on the other site.

If they actually copied them, you need to send them a notice of copyright violation. If they don't respond, then contact their hosting service, and ask that the disputed images be removed. If that doesn't work, file a DMCA complaint with the major search engines, and send a lawyer after the copiers.

Jim

Busynut

9:38 pm on May 25, 2003 (gmt 0)

10+ Year Member



Hi Coosblues,
The URL you provided is not for a robot/spider. Even though the URL was truncated, I've seen this website before. It's a website where individuals/groups can set up their own forums. It may be a bit similar to MSN Groups, but since I don't speak/read the language I'm not entirely sure on that point.

In this case the robots.txt file won't do anything for you. It seems one of their forum groups is hotlinking your images. The way to stop them is to use your htaccess file to prevent access to images from sites other than your own. Do a search here for hotlinking as Jim suggested and you'll find lots of info on this topic. Here's one thread to get you started but it's best to read as much as you can on this topic because there are different solutions that may be better for you depending on how your server is set up.

[webmasterworld.com...]
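To make the hotlink check described above concrete, here is a small Python sketch of the decision an anti-hotlinking rule makes: serve an image only when the Referer header is empty or names your own site. On Apache this is normally written as rules in the htaccess file rather than as a script; the host names, the extension list, and the function name allow_image_request are all assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical values: put your own domain names here.
ALLOWED_HOSTS = {"example.com", "www.example.com"}
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")


def allow_image_request(path: str, referer: Optional[str]) -> bool:
    """Decide whether a request should be served.

    Non-image requests are always allowed. Image requests are allowed when
    the Referer is missing or points at one of ALLOWED_HOSTS.
    """
    if not path.lower().endswith(IMAGE_EXTENSIONS):
        return True
    if not referer:
        # Assumption: allow blank referers, since some browsers and
        # proxies never send the header.
        return True
    host = urlparse(referer).hostname
    return host is not None and host in ALLOWED_HOSTS


if __name__ == "__main__":
    print(allow_image_request("/photos/a.jpg", "http://www.example.com/gallery.html"))
    print(allow_image_request("/photos/a.jpg", "http://forum.example.net/topic"))
```

Allowing blank referers is a trade-off: it keeps the images working for visitors whose software strips the header, at the cost of letting some direct requests through.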

Hope this helps.