

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt specific query
IanTurner




msg:1526968
 12:50 pm on Sep 24, 2001 (gmt 0)

If a robots.txt file has

User-agent: *
Disallow: /this.htm

will this disallow robots from all instances of this.htm within the site, or just from the one in the root directory?

 

Rumbas




msg:1526969
 1:09 pm on Sep 24, 2001 (gmt 0)

IMO the syntax you describe would disallow the file - this.htm - no matter where you place it.
But let's see what the experts say ;)

In the meantime you can take a look at WebmasterWorld's own

Robots Checker [searchengineworld.com]

It validates a robots.txt file and has some nice info on robots.txt.

Brett_Tabke




msg:1526970
 1:21 pm on Sep 24, 2001 (gmt 0)

This is a question where ambiguity reigns. According to the spec, a rule only blocks a URL if the URL STARTS with the Disallow value:

This can be a full path, or a partial path; any URL that starts with this value will not be retrieved. For example, Disallow: /help disallows both /help.html and /help/index.html, whereas Disallow: /help/ would disallow /help/index.html but allow /help.html

User-agent: *
Disallow: /this.htm

That would only get the copy in the root:
/this.htm

But it would also get anything else that starts with that path:
/this.htm/rocks

The problem is that I don't believe all spiders follow the spec that way. They tend to do sliding regexes, matching the value anywhere in the URL.
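To make the spec's "starts with" rule concrete, here is a minimal Python sketch. The function name is_disallowed and the sample paths are my own, purely for illustration; it models only the prefix rule quoted above and is not how any particular spider actually works.

```python
def is_disallowed(path, disallow_values):
    """Return True if path starts with any non-empty Disallow value.

    Hypothetical helper: models only the prefix rule from the spec text
    quoted above. An empty Disallow value blocks nothing.
    """
    return any(bool(value) and path.startswith(value) for value in disallow_values)


if __name__ == "__main__":
    rules = ["/this.htm"]
    # Compare a root-level file, a longer URL with the same prefix,
    # and a file of the same name in a subdirectory.
    for path in ["/this.htm", "/this.htm/rocks", "/dir/this.htm"]:
        print(path, "blocked" if is_disallowed(path, rules) else "allowed")
```

Under this reading, /dir/this.htm is not blocked, so covering copies in subdirectories would need one Disallow line per path. A spider that does sliding matches would behave differently.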

IanTurner




msg:1526971
 1:27 pm on Sep 24, 2001 (gmt 0)

Hmm, and I thought this was going to be easy for an expert on robots.txt files.

Oh well, off to the W3C again, though this probably won't help.

bufferzone




msg:1526972
 6:50 pm on Sep 24, 2001 (gmt 0)

It is my understanding that spiders work in very different ways (and most of the time mysterious ones). Sometimes it even seems that some spiders "eat" in directories they are not allowed in.

Regards
Kim (snipped URL.....please no signatures)

(edited by: agerhart at 6:51 pm (gmt) on Sep. 24, 2001)

agerhart




msg:1526973
 6:55 pm on Sep 24, 2001 (gmt 0)

Bufferzone,

What I think you are referring to are rogue spiders, the ones that do not adhere to or abide by robots.txt, which is not the case for the major search engines.

bufferzone




msg:1526974
 7:16 pm on Sep 24, 2001 (gmt 0)

agerhart>> Thanks, and sorry for the URL. I now know it is not allowed.

Kim

agerhart




msg:1526975
 7:19 pm on Sep 24, 2001 (gmt 0)

No problem Kim.....I hope that you enjoy the forums. There is a whole lot of great information here.

