

Robots.txt Question :)

Question regarding how to use robots.txt

     
3:37 pm on Sep 26, 2003 (gmt 0)

New User

10+ Year Member

joined:July 26, 2003
posts:11
votes: 0


Hi,

If I were to enter:

User-agent: *
Disallow: /

Would this block the spiders from ALL pages including HOME page?

Also, how would I set it up to block only the inner pages built using CFM, while leaving the home page and reciprocal links page spiderable?

[sitename.com...]

Thanks a bunch in advance for your knowledge and assistance!

WebMasterWorld ROCKS! ;)

4:20 pm on Sept 26, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Nov 28, 2002
posts:317
votes: 0


[searchengineworld.com...] - should help you with both questions :)

4:59 pm on Sept 26, 2003 (gmt 0)

New User

10+ Year Member

joined:July 26, 2003
posts:11
votes: 0


Thanks for the reply Mike!

I had searched WW and read several other posts, and also the robots.txt tutorial, before posting my question, but could not find anything saying whether it blocks the home page or whether I could block all pages written in ASP.

Any other suggestions completely appreciated! :)

Site is ready to fly and I want to make sure to please the SE gods. Only the finest Homemade Organic Cookies and Almond Milk for Ms. Googlebot! She works very hard and we've got to keep her healthy ;)

6:14 pm on Sept 26, 2003 (gmt 0)

New User

10+ Year Member

joined:July 26, 2003
posts:11
votes: 0


If I were to put:

User-agent: *
Disallow:/ingr
Disallow:/1AR
Disallow:/1AU
Disallow:/1bk
Disallow:/1BU
Disallow:/1CA
Disallow:/1cd
Disallow:/1CL
Disallow:/1CR
Disallow:/1di
Disallow:/1FE
Disallow:/1FS
Disallow:/1GI
Disallow:/1GR
Disallow:/1HE
Disallow:/1HO
Disallow:/1HH
Disallow:/1IN
Disallow:/1JE
Disallow:/1MA
Disallow:/1ME
Disallow:/1MT
Disallow:/1NA
Disallow:/1PC
Disallow:/1PA
Disallow:/1SN
Disallow:/1TA
Disallow:/1VB
Disallow:/1VD
Disallow:/1WO
Disallow:/1WT
Disallow:/SCAT
Disallow:/VENDORS
Disallow:/bk
Disallow:/AU
Disallow:/VD
Disallow:/ITEMGRAPHICS
Disallow:/THUMBS
Disallow:/DOCUMENT
Disallow:/REFERENCE
Disallow:/CSS
Disallow:/ART
Disallow:/IMPORT2
Disallow:/INTERNATSTORE
Disallow:/CGI-BIN
Disallow:/JAVA
Disallow:/JAVASCRIPT
Disallow:/keywds
Disallow:/NEWSLETTER
Disallow:/nEWgfx
Disallow:/keywds2

Would spiders still be able to access my home and links.html page?

Thanks again :)

8:05 pm on Sept 26, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


Prana,

If those page names don't start with something already in your list, then yes, they'd be spiderable. Robots.txt uses prefix matching. For example, the second of these two lines is redundant:


Disallow: /JAVA
Disallow: /JAVASCRIPT

The first line blocks anything the second line might also block. You may find other opportunities to use this prefix matching to reduce the size of your robots.txt file.
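The prefix matching described above can be sketched in a few lines of Python. This is only an illustration, not how any particular spider is implemented: the helper name `is_blocked` and the sample paths are made up for the example, and it ignores per-robot User-agent sections and any extensions to the original standard.

```python
def is_blocked(path, disallow_prefixes):
    """Return True if the URL path starts with any non-empty Disallow prefix.

    Hypothetical helper for illustration only; it models plain prefix
    matching and nothing else.
    """
    return any(prefix and path.startswith(prefix) for prefix in disallow_prefixes)


# "Disallow: /" is a prefix of every path, so the home page is blocked too.
block_all = ["/"]
print(is_blocked("/", block_all))            # True
print(is_blocked("/links.html", block_all))  # True

# A few entries from the longer list: anything not starting with one of
# these prefixes stays spiderable.
partial = ["/ingr", "/JAVA", "/CSS"]
print(is_blocked("/", partial))                    # False
print(is_blocked("/links.html", partial))          # False
print(is_blocked("/JAVASCRIPT/menu.js", partial))  # True, already covered by /JAVA
```

The last line shows why a separate /JAVASCRIPT entry adds nothing once /JAVA is listed.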

Note that you should have a space after the ":"

As hinted at several times above, you need to read and fully understand the robots.txt standard [robotstxt.org]. We prefer to keep the discussion here general -- and useful to more than one person. Because of that, we prefer not to have "fix my site" threads.

Once you've got your file sorted out, validate it here [searchengineworld.com]

After you're comfortable with the basics, you might find this thread [webmasterworld.com] interesting.

Jim

10:34 pm on Sept 26, 2003 (gmt 0)

New User

10+ Year Member

joined:July 26, 2003
posts:11
votes: 0


Thanks Jim,

Your fast assistance and links are highly appreciated.

Out of curiosity, if I LEFT OUT the part you noted
***The robots.txt should have a space after the ":" ***

Would ALL the pages on the website be spiderable?

Once again..... Sincere Thanks :)

10:40 pm on Sept 26, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


I don't know. Your robots.txt might be considered invalid by some robots. Then they would decide for themselves whether to spider your site - or not.
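As one data point only, here is a small sketch using the robots.txt parser from Python's standard library (`urllib.robotparser`). That parser happens to accept a rule with no space after the colon, but this says nothing about how any other robot treats such a file. The sample paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Two rule lines written without a space after the colon.
lines = [
    "User-agent: *",
    "Disallow:/ingr",
    "Disallow:/CSS",
]

parser = RobotFileParser()
parser.parse(lines)  # parse from memory; no network access involved

# This particular parser still applies the rules as prefix matches.
print(parser.can_fetch("*", "/ingr/page.cfm"))  # False
print(parser.can_fetch("*", "/links.html"))     # True
```

Writing the space is still the safer choice, since a stricter parser may not be this forgiving.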

Jim

9:33 pm on Sept 27, 2003 (gmt 0)

New User

10+ Year Member

joined:July 26, 2003
posts:11
votes: 0


Thanks again Jim, and best of luck with all your endeavors :)
 
