Forum Moderators: goodroi


Robots.txt Help request.

I need help with stopping spider robots.

         

westmore

6:22 pm on Apr 19, 2005 (gmt 0)

10+ Year Member



I've downloaded the robots.txt tutorial and written a simple robots.txt file using vi:
User-agent: *
Disallow: /
I put it in the directory that contains the directories and pages that make up the website. Here's the hitch:
www.mydomain.com is located on a Windows 2003 server running IIS v6. A link on the home page redirects to the Unix server (running Apache) where the website is located. When I look at the http.log file on the Unix server, I still see references to googlebot.com. I'm trying to stop the Googlebot spider. The tutorial says to put the robots.txt file in www.mydomain.com. Can you create a robots.txt file and put it on a Windows server, or does this only work on a Unix box?
If it only works on a Unix box, then what's an equivalent solution on a Windows box?

mack

4:14 am on Apr 20, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The robots.txt file will only work for the server it is physically installed on. In order for this to work, you will need to have a robots.txt file on your *nix server as well.

Robots.txt is a universal standard for all web servers. It works on Windows and *nix servers equally well.

Mack.
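[Editor's note: a quick way to sanity-check a robots.txt before deploying it is Python's standard-library parser. The rules below are an illustrative sketch that blocks only Googlebot, since that was the original poster's goal; they are not taken from anyone's actual file.]

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules: block only Googlebot, allow every other crawler.
# (An empty Disallow value means "allow everything" for that record.)
rules = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "/index.html"))  # False: blocked
print(parser.can_fetch("Slurp", "/index.html"))      # True: still allowed
```

The same check works against a live site by calling `parser.set_url("http://www.mydomain.com/robots.txt")` followed by `parser.read()` instead of `parse()`.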

westmore

3:34 pm on Apr 20, 2005 (gmt 0)

10+ Year Member



Mack,

Many thanks for the feedback. I do have the robots.txt file on the nix box. I'll create one for the Windows box also.

devi8or

6:42 pm on Apr 20, 2005 (gmt 0)

10+ Year Member



I just go for the gold and avoid the headaches. If a bot, robot, or an SE crawls my site excessively, I look up the IP it came from, get the info I need (the company name and IP range), then I simply ban the entire IP range. My robots.txt is configured to prevent any SEs from browsing any restricted areas; however, both SLURP and GOOGLEBOT do a LOT of crawling, and they often look for URLs and subdirectories that do not exist, which uses up a lot of bandwidth...

I have been using this process for about a week now and it has freed up a lot of bandwidth...

Unfortunately, both Google and Yahoo are blocked. However, I am not worried about it; Google still has my original site cached from a year and a half ago, and I am still using the same URL, so it all evens out :D ...

Just thought this might help y'all out.

-- The DEVI8OR
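[Editor's note: the lookup-and-ban step described above can be sketched in a few lines of Python with the standard ipaddress module. The ranges below are reserved documentation blocks used as stand-ins, not real crawler netblocks; look up the actual range via WHOIS on the offending IP.]

```python
from ipaddress import ip_address, ip_network

# Hypothetical banned ranges -- substitute the real netblocks found
# via WHOIS for the crawler you want to block.
BANNED_RANGES = [
    ip_network("192.0.2.0/24"),      # documentation range, stand-in only
    ip_network("198.51.100.0/24"),   # documentation range, stand-in only
]

def is_banned(visitor_ip: str) -> bool:
    """Return True if the visitor's IP falls inside any banned range."""
    addr = ip_address(visitor_ip)
    return any(addr in net for net in BANNED_RANGES)

print(is_banned("192.0.2.55"))   # True: inside the first banned /24
print(is_banned("203.0.113.9"))  # False: not in any banned range
```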

westmore

7:40 pm on Apr 20, 2005 (gmt 0)

10+ Year Member



Well, thanks for the feedback. I have blocked the IP range at the firewall, but I have run into a hitch once or twice using that method. I once blocked an IP range assigned to Yahoo and blocked inbound Yahoo mail along with it.
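[Editor's note: one way around that hitch is to scope the firewall block to web traffic only instead of all protocols. On a Linux/iptables firewall, for example, that might look like the rule below; the address range is a made-up placeholder, not a real crawler netblock.]

```
# Drop only HTTP (port 80) traffic from the offending range, so that
# mail traffic from the same range still gets through.
iptables -A INPUT -s 192.0.2.0/24 -p tcp --dport 80 -j DROP
```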

devi8or

8:00 pm on Apr 20, 2005 (gmt 0)

10+ Year Member



Blocked inbound Yahoo mail? I am assuming you are using an email client like Thunderbird or Outlook. Just curious, because I block it with the server program, not the firewall...

westmore

8:56 pm on Apr 20, 2005 (gmt 0)

10+ Year Member



We don't want it blocked. But I'm curious about how you block it with the server. We're using Exchange 5.5/Outlook 2000. What kind of server are you talking about?

devi8or

9:41 pm on Apr 20, 2005 (gmt 0)

10+ Year Member



I am using ABYSS X1; it is free from [aprelium.com...] and I use it to block IPs. It blocks IPs from the site only, not from the entire computer the way a firewall does, which lets me keep using my email client, etc. And it is REALLY easy to use...

westmore

9:49 pm on Apr 20, 2005 (gmt 0)

10+ Year Member



Thanks, I'll check it out.

devi8or

6:22 am on Apr 21, 2005 (gmt 0)

10+ Year Member



No prob... Always glad to help out a fellow admin.