Forum Moderators: phranque


Spider.txt and Siphons

Need a new txt file excludes trouble

         

Putz

9:47 am on Apr 23, 2002 (gmt 0)

10+ Year Member



Hello All:

I've been away from webmastering for a few years and, judging by all the activity here, much has changed since then.

I was a victim of email siphons, and I am now looking for either a place to learn how to write an updated Spider.txt file, or a freeware/shareware tool or sample file I can modify to keep out the pests while still allowing weekly updates from all the SE spiders.

Will anyone here help this 55 yr old returning mayor/webmaster, or provide a map to where I can update my education? TIA if U do!

Woz

10:08 am on Apr 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld Putz, although I am sure you aren't one...

I think you are looking for help with a robots.txt file, and there has been a good deal of discussion about them here over the last few months, particularly one memorable thread started by Brett with what could be the ultimate robots.txt file.

Try using the search function above, search for robots.txt, and see how you go. Holler if you get lost and we'll help out.

>55 yr old
Ah, so there's hope for me yet. ;)

Onya
Woz

Putz

10:59 am on Apr 23, 2002 (gmt 0)

10+ Year Member



OK, I went to the search, typed in spider.txt and read what came up: no Brett post found, and I even had my glasses on. So maybe it's my too-high blood sugar, at 498 as checked yesterday.

OK, I'll bite (not myself or U) where wood I find the ultimate spider.txt :) or do you provide maps ?

I use Putz cuz it fits me well :)

TallTroll

11:29 am on Apr 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Putz, the correct name for the file is robots.txt. If you try a search on that, you will find plenty.

Although we are in the habit of referring to them as "spiders", officially they are web robots. It's just one of those bits of jargon, I guess.

There is an excellent resource at [robotstxt.org...] including links to the spec documents
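
For anyone who wants to see how a robots.txt file is read, here is a minimal sketch using Python's standard `urllib.robotparser`. The file contents, bot names and paths below are made up for illustration and are not the "ultimate" file mentioned earlier. Bear in mind that robots.txt is only a request: well-behaved search engine robots honour it, but email harvesters generally ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the bot name and paths are examples only.
ROBOTS_TXT = """\
User-agent: EmailSiphon
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the rules above permit `agent` to fetch `path`."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("EmailSiphon", "/index.html"),
        ("Googlebot", "/index.html"),
        ("Googlebot", "/cgi-bin/form.pl"),
    ]:
        print(agent, path, allowed(agent, path))
```

The first record asks one named robot to stay out entirely; the `*` record applies to every other robot and only fences off one directory.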

Putz

11:41 am on Apr 23, 2002 (gmt 0)

10+ Year Member



But of course...robots.txt, did that and wow, thanks so much and for that link U provided, thanks, just returned from there, great place...

I was sent a file like this...

<Files .htaccess>
deny from all
</Files>
RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^Crescent [OR]
RewriteCond %{HTTP_USER_AGENT} ^CherryPicker [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebEMailExtrac.* [OR]
RewriteCond %{HTTP_USER_AGENT} ^NICErsPRO [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^LinkWalker [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
RewriteCond %{HTTP_REFERER} ^http://www.iaea.org$
RewriteRule !^http://[^/.]\.your-site.com.* - [F]

What exactly is this, how duz it work, is there a top to it, what duz it look like and where can I find it? I know, a bunch of questions, but the help here is of such great quality. Thanks for all the help!

Named appropriately Putz, sumtimes with an R added...;)
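
For readers following along: the file quoted above is an Apache .htaccess file. The `<Files .htaccess>` block stops visitors from reading the file itself, and each `RewriteCond` line tests the visitor's User-Agent header against a pattern anchored at the start of the string. The `[OR]` flags chain the tests together, and `RewriteRule ^.* - [F]` answers with 403 Forbidden when any of them matches. The Python below is only a rough sketch of that matching logic, using a handful of the quoted patterns; the function name `status_for` is invented for the example and it is not how Apache is implemented.

```python
import re

# A few patterns copied from the RewriteCond lines quoted above.
BLOCKED_AGENT_PATTERNS = [
    r"^EmailSiphon",
    r"^EmailWolf",
    r"^Mozilla.*NEWT",
    r"^[Ww]eb[Bb]andit",
    r"^Wget",
]
BLOCKED = [re.compile(pattern) for pattern in BLOCKED_AGENT_PATTERNS]


def status_for(user_agent: str) -> int:
    """Hypothetical helper: 403 if any pattern matches (the [OR] chain), else 200."""
    if any(pattern.search(user_agent) for pattern in BLOCKED):
        return 403  # same effect as the [F] flag on the RewriteRule
    return 200


if __name__ == "__main__":
    for agent in ["EmailSiphon", "Wget/1.8.1", "Mozilla/4.0 (compatible; MSIE 6.0)"]:
        print(agent, status_for(agent))
```

Because every pattern starts with `^`, an agent is blocked only when its name begins with the listed text; a harvester that changes its User-Agent string slips through, so a list like this needs regular updating.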

richlowe

10:37 pm on Apr 23, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Now the question is: how do you do this in IIS?