
Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Validation Error
decdim



 
Msg#: 309 posted 2:22 am on Mar 4, 2004 (gmt 0)

I kept getting an error when validating the robots.txt for my site, which blocks certain directories:

User-agent: *
Disallow /block/this/
Disallow /blocked/

However, when I put in a full address, the validator approved it:

User-agent: *
Disallow http//www.domain.tld/block/this/
Disallow http//www.domain.tld/blocked/

(I left out the ":" after "http" so that it doesn't create a live link here, but it is present in my robots.txt.)

Anyone else running into this issue?
I don't know whether it really matters which form is used, or whether one of them should be avoided.

 

ncw164x

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 309 posted 9:28 am on Mar 4, 2004 (gmt 0)

The robots.txt file need not exist, but if it does, it must be named "robots.txt" and must be written and uploaded in ASCII mode.

It must be in the root directory of the web site, as spiders will not look for it anywhere else.
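As a rough illustration of the "root directory only" rule, here is a small Python sketch that works out where a spider would look for robots.txt given any page address. The function name robots_url is made up for this example, and www.domain.tld is just the placeholder host used earlier in the thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url.

    Only the scheme and host are kept; the path is always /robots.txt,
    because a robots.txt placed in a subdirectory is not consulted.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.domain.tld/block/this/page.html"))
```

Whatever directory the page lives in, the result points at the same file in the site root.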

To exclude all robots from parts of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/

Exclude a specific spider from parts of the server
User-agent: slurp.so/
Disallow: /cgi-bin/
Disallow: /secure/
Disallow: /products/
Disallow: /misc/sitestats/

This indicates that nothing is disallowed and the spider can follow all links
User-agent: *
Disallow:

To allow a single robot complete access and exclude all others
User-agent: Googlebot/1.0
Disallow:

User-agent: *
Disallow: /

This would prevent your entire web site from being indexed
User-agent: *
Disallow: /
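If you want to check rule sets like these without an online validator, one option is Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper name can_fetch, the agent name "ExampleBot" and the sample URLs are invented for this post, and the agent is written as plain "Googlebot" without a version number to keep the matching simple. Other parsers and real spiders may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Rules taken from the examples above (agent name simplified to "Googlebot").
PARTIAL_BLOCK = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /misc/sitestats/
"""

ONE_ROBOT_ONLY = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_fetch(robots_txt, agent, url):
    """Parse robots_txt from a string and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "ExampleBot" is a hypothetical spider name used only for this check.
print(can_fetch(PARTIAL_BLOCK, "ExampleBot", "http://www.domain.tld/cgi-bin/form"))
print(can_fetch(PARTIAL_BLOCK, "ExampleBot", "http://www.domain.tld/index.html"))
print(can_fetch(ONE_ROBOT_ONLY, "Googlebot", "http://www.domain.tld/index.html"))
print(can_fetch(ONE_ROBOT_ONLY, "ExampleBot", "http://www.domain.tld/index.html"))
```

This only tells you how one parser reads the file; it says nothing about whether a given spider actually obeys it.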

Hope this helps

ncw164x

decdim



 
Msg#: 309 posted 11:05 pm on Apr 10, 2004 (gmt 0)

No, it didn't help, because I already knew that information, but I 'tested' my robots file to verify it was 'clean'.

The validator gave me errors when I used only the directory path:

/this
/and/this

but when I use a full link, it comes back as 'okay':

http//www.site.tld/this
http//www.site.tld/and/this

I have found that most bots don't listen anyway, including Yahoo... they keep going after disallowed directories.

I'm so disgruntled with the net that I'm going to make my site go stealth.

What the hell is the point of search engines posting how to deal with their bots/spiders if they can't even control them properly?

And as for all those worthless sites like DMOZ that fail to update on a timely basis, I have no respect for them and couldn't care less about dealing with them...

ncw164x

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 309 posted 12:12 am on Apr 11, 2004 (gmt 0)

>>No it didn't help because I already knew that information
Well, if you already knew the information above, you would also have known that you have left the ":" off all of your Disallow lines:

Disallow: /blocked/
not
Disallow /blocked/
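To make the difference concrete, here is a hedged Python sketch that flags field lines with no ":" directly after the field name, and then shows how Python's standard urllib.robotparser treats the two files. The helper names lines_missing_colon and is_blocked, the agent name "ExampleBot" and the test URL are all made up for this example, and the field list covers only the two fields used in this thread. This shows one parser's behaviour, not how the validator in question or any particular spider works.

```python
import re
from urllib.robotparser import RobotFileParser

# Field names that appear in this thread; real files may use others.
KNOWN_FIELDS = ("user-agent", "disallow")

BROKEN = "User-agent: *\nDisallow /block/this/\nDisallow /blocked/\n"
FIXED = "User-agent: *\nDisallow: /block/this/\nDisallow: /blocked/\n"


def lines_missing_colon(robots_txt):
    """Return (line_number, text) for lines that begin with a known field
    name but have no ':' separator right after it."""
    problems = []
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        for field in KNOWN_FIELDS:
            if re.match(re.escape(field) + r"\b(?!\s*:)", line, re.IGNORECASE):
                problems.append((number, line))
    return problems


def is_blocked(robots_txt, url, agent="ExampleBot"):
    """True if this parser reads robots_txt as forbidding agent from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


print(lines_missing_colon(BROKEN))
print(is_blocked(BROKEN, "http://www.domain.tld/blocked/page.html"))
print(is_blocked(FIXED, "http://www.domain.tld/blocked/page.html"))
```

With this parser, the lines without a colon are silently skipped, so the broken file blocks nothing. The full-address form (Disallow http://...) is also flagged by the check above; its only colon is the one inside "http:", which may be why some validators let it through even though it is not a valid Disallow line.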

ncw164x


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved