Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Robots.txt Question :)
Question regarding how to use robots txt
Prana




msg:1526073
 3:37 pm on Sep 26, 2003 (gmt 0)

Hi,

If I were to enter:

User-Agent: *
Disallow: /

Would this block the spiders from ALL pages including HOME page?

Also, how would I set it up to block just the inner pages built using CFM, while leaving the home page and reciprocal links page spiderable?

[sitename.com...]

Thanks a bunch in advance for your knowledge and assistance!

WebMasterWorld ROCKS! ;)

 

Mike12345




msg:1526074
 4:20 pm on Sep 26, 2003 (gmt 0)

[searchengineworld.com...] - should help you with both Questions :)

Prana




msg:1526075
 4:59 pm on Sep 26, 2003 (gmt 0)

Thanks for the reply Mike!

I had searched WW and read several other posts, as well as the robots.txt tutorial, before posting my question. I could not find anything saying whether it blocks the home page or whether I could block all pages written in ASP.

Any other suggestions completely appreciated! :)

Site is ready to fly and I want to make sure to please the SE Gods. Only the finest Homemade Organic Cookies and Almond Milk for Ms. Googlebot! She works very hard and we've got to keep her healthy ;)

Prana




msg:1526076
 6:14 pm on Sep 26, 2003 (gmt 0)

If I were to put:

User-agent: *
Disallow:/ingr
Disallow:/1AR
Disallow:/1AU
Disallow:/1bk
Disallow:/1BU
Disallow:/1CA
Disallow:/1cd
Disallow:/1CL
Disallow:/1CR
Disallow:/1di
Disallow:/1FE
Disallow:/1FS
Disallow:/1GI
Disallow:/1GR
Disallow:/1HE
Disallow:/1HO
Disallow:/1HH
Disallow:/1IN
Disallow:/1JE
Disallow:/1MA
Disallow:/1ME
Disallow:/1MT
Disallow:/1NA
Disallow:/1PC
Disallow:/1PA
Disallow:/1SN
Disallow:/1TA
Disallow:/1VB
Disallow:/1VD
Disallow:/1WO
Disallow:/1WT
Disallow:/SCAT
Disallow:/VENDORS
Disallow:/bk
Disallow:/AU
Disallow:/VD
Disallow:/ITEMGRAPHICS
Disallow:/THUMBS
Disallow:/DOCUMENT
Disallow:/REFERENCE
Disallow:/CSS
Disallow:/ART
Disallow:/IMPORT2
Disallow:/INTERNATSTORE
Disallow:/CGI-BIN
Disallow:/JAVA
Disallow:/JAVASCRIPT
Disallow:/keywds
Disallow:/NEWSLETTER
Disallow:/nEWgfx
Disallow:/keywds2

Would spiders still be able to access my home and links.html page?

Thanks again :)

jdMorgan




msg:1526077
 8:05 pm on Sep 26, 2003 (gmt 0)

Prana,

If those page names don't start with something already in your list, then yes, they'd be spiderable. Robots.txt uses prefix matching. For example, the second of these two lines is redundant:

Disallow: /JAVA
Disallow: /JAVASCRIPT

The first line already blocks anything the second line might block. You may find other opportunities to make use of this prefix matching to reduce the size of your robots.txt file.
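To make the prefix-matching idea concrete, here is a minimal Python sketch. It is only an illustration of plain prefix matching as described above, not the code any particular robot runs; the helper names `is_blocked` and `drop_redundant` are made up for this example, and the sample rules are a handful taken from the list earlier in the thread.

```python
def is_blocked(path, disallow_prefixes):
    """Return True if the path starts with any non-empty Disallow prefix."""
    # An empty Disallow value blocks nothing, so empty prefixes are skipped.
    return any(p and path.startswith(p) for p in disallow_prefixes)


def drop_redundant(disallow_prefixes):
    """Drop prefixes that are already covered by a shorter prefix."""
    kept = []
    # Shortest first, so a covering prefix is always seen before what it covers.
    for prefix in sorted(dict.fromkeys(disallow_prefixes), key=len):
        if not any(prefix.startswith(k) for k in kept):
            kept.append(prefix)
    return kept


# A few Disallow values from the list posted above (hypothetical subset).
rules = ["/JAVA", "/JAVASCRIPT", "/bk", "/keywds", "/keywds2"]

print(is_blocked("/JAVASCRIPT/menu.js", ["/JAVA"]))  # covered by /JAVA alone
print(is_blocked("/links.html", rules))              # no rule is a prefix of it
print(is_blocked("/", rules))                        # home page matches no rule
print(is_blocked("/", ["/"]))                        # "Disallow: /" matches every path
print(drop_redundant(rules))
```

Under this model, /JAVASCRIPT and /keywds2 are both redundant, and a lone "Disallow: /" matches every path on the site, the home page included. Matching is case-sensitive here, so /java would not be covered by /JAVA.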

Note that you should have a space after the ":"

As hinted at several times above, you need to read and fully understand the robots.txt standard [robotstxt.org]. We prefer to keep the discussion here general -- and useful to more than one person. Because of that, we prefer not to have "fix my site" threads.

Once you've got your file sorted out, validate it here [searchengineworld.com]

After you're comfortable with the basics, you might find this thread [webmasterworld.com] interesting.

Jim

Prana




msg:1526078
 10:34 pm on Sep 26, 2003 (gmt 0)

Thanks Jim,

Your fast assistance and links are highly appreciated.

Out of curiosity, if I LEFT OUT the part you noted:
***The robots.txt should have a space after the ":" ***

Would ALL the pages on website be spiderable?

Once again..... Sincere Thanks :)

jdMorgan




msg:1526079
 10:40 pm on Sep 26, 2003 (gmt 0)

I don't know. Your robots.txt might be considered invalid by some robots. Then they would decide for themselves whether to spider your site - or not.
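One way to see how a single parser treats the missing space is to try it. The sketch below uses Python's standard-library `urllib.robotparser`, which splits each line at the first colon and strips whitespace from both halves. It shows only what that one parser does and says nothing about how any search engine's robot behaves; the `allowed` helper and the sample paths are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, path, agent="*"):
    """Parse robots.txt lines and report whether the agent may fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)


# The same rule written without and with a space after the colon.
no_space = ["User-agent: *", "Disallow:/ingr"]
with_space = ["User-agent: *", "Disallow: /ingr"]

for label, lines in (("no space", no_space), ("with space", with_space)):
    # /ingr/page.cfm and /links.html are hypothetical paths for illustration.
    print(label, allowed(lines, "/ingr/page.cfm"), allowed(lines, "/links.html"))
```

This parser reads both spellings the same way, but a stricter parser could reject the line, so writing the space remains the safer choice.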

Jim

Prana




msg:1526080
 9:33 pm on Sep 27, 2003 (gmt 0)

Thanks again Jim, and best of luck with all your endeavors :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved