
Forum Moderators: goodroi


How to manage this in robots.txt

disallow a folder but include one page

     
11:08 am on Apr 28, 2011 (gmt 0)

New User

5+ Year Member

joined:Apr 28, 2008
posts: 32
votes: 0


Hi,

I have disallowed crawlers from indexing a folder, but I want one page in that folder to be indexed. Since there are hundreds of pages in the folder, with more being added, listing all of them would not be possible. Is there a way to handle this?
11:12 am on Apr 28, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member sem4u is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 18, 2002
posts:3064
votes: 1


Yes, you can use the Allow directive. Example:

Disallow: /widgets/
Allow: /widgets/widget-news
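
To sanity-check a pair of rules like this, here is a minimal sketch using Python's stdlib urllib.robotparser (the example.com URLs are just placeholders). One caveat worth knowing: Python's parser applies the first matching rule, so the Allow line must come before the Disallow line for this parser, whereas Google picks the most specific matching rule regardless of order.

```python
from urllib import robotparser

# Rules as they would appear in robots.txt. Allow comes first because
# urllib.robotparser stops at the first rule whose path prefix matches;
# Google instead uses the longest (most specific) match.
rules = """\
User-agent: *
Allow: /widgets/widget-news
Disallow: /widgets/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "https://example.com/widgets/widget-news"))  # True
print(rp.can_fetch("*", "https://example.com/widgets/other-page"))   # False
```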
1:44 pm on Apr 28, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 24, 2002
posts:894
votes: 0


Since it is only one page amongst hundreds, why not put this one page in an allowed folder and link to it from there?
1:53 pm on Apr 28, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Not all agents understand "Allow", so moving the page to a different folder is likely a good idea.
11:23 am on July 12, 2011 (gmt 0)

New User

5+ Year Member

joined:Apr 28, 2008
posts: 32
votes: 0


Disallow: /widgets/
Allow: /widgets/widget-news

This works. Thanks

Pankaj
11:26 am on July 12, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


It works for Google and some others.

It does not work for all.
12:01 am on July 21, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 28, 2002
posts:3443
votes: 1


Hello,

How can I block these endings/versions of pages from getting spidered?

Ex. 1: the query-string version of a page:

/gallery/page
/gallery/page?noCount=1

Ex. 2: everything that has slideshow.php in the URL:

/gallery/slideshow.php?set_albumName=page

Ex. 3: everything that ends with ?page=1:

/gallery/newpage?page=1

Here is another situation:

/page/TV-widget/endpage
/page/TV-widgets/endspage

Here I would like to block TV-widget so that only TV-widgets shows up in the SERPs.

So I guess some 301 or robots.txt rule is needed, but how would it look?
7:31 am on July 21, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0



Blocking specific endings is very easy. You specify a "wildcard" for the beginning and then the unique string to block.

Disallow: /*?noCount=1


Disallow: /*?set_albumName=page
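
Since robots.txt has no built-in test mode, here is a rough Python sketch of how a wildcard-aware crawler (following Google's documented rules: patterns anchored at the start of the path, `*` matching any sequence, `$` anchoring the end) might match these patterns. The helper robots_pattern_to_regex is an illustrative name, not part of any library.

```python
import re

def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex, following
    Google's matching rules: the pattern is anchored at the start of
    the path, '*' matches any character sequence, and a trailing '$'
    anchors the match to the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = "".join(".*" if ch == "*" else re.escape(ch) for ch in pattern)
    return re.compile("^" + body + ("$" if anchored else ""))

no_count = robots_pattern_to_regex("/*?noCount=1")
album = robots_pattern_to_regex("/*?set_albumName=page")

print(bool(no_count.match("/gallery/page?noCount=1")))                 # True
print(bool(no_count.match("/gallery/page")))                           # False
print(bool(album.match("/gallery/slideshow.php?set_albumName=page")))  # True
```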



For the other examples, remember that robots.txt matches "from the left", so you need to specify enough of the URL to match what you want to match and not match what you don't want to match.

Disallow: /page/TV-widget

will block both
example.com/page/TV-widget/<anything>

and
example.com/page/TV-widgets/<anything>


Disallow: /page/TV-widget/

will block only
example.com/page/TV-widget/<anything>
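
This left-anchored prefix matching can be checked with Python's stdlib parser, which implements exactly this kind of starts-with matching (paths taken from the examples above):

```python
from urllib import robotparser

def blocked(disallow_path: str, url_path: str) -> bool:
    """Return True if url_path is blocked by a single Disallow rule."""
    rp = robotparser.RobotFileParser()
    rp.parse(["User-agent: *", f"Disallow: {disallow_path}"])
    return not rp.can_fetch("*", url_path)

# Without the trailing slash the rule is a bare prefix, so it also
# catches the plural folder:
print(blocked("/page/TV-widget", "/page/TV-widgets/endspage"))   # True
# With the trailing slash, only the singular folder is blocked:
print(blocked("/page/TV-widget/", "/page/TV-widgets/endspage"))  # False
print(blocked("/page/TV-widget/", "/page/TV-widget/endpage"))    # True
```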


However, the 301 redirect is preferred, as it will also preserve some PageRank. It also stops the incorrect URLs from proliferating further as they get copied and pasted into new links.
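
For the 301 route, a minimal .htaccess sketch (assuming Apache with mod_rewrite enabled; the paths are the examples above, so adjust for your own setup) that sends the singular folder to the plural one:

```apache
RewriteEngine On
# Permanently redirect /page/TV-widget/... to /page/TV-widgets/...
# In .htaccess context the pattern matches the path without its leading slash,
# and the trailing slash in the pattern keeps /page/TV-widgets/ from matching.
RewriteRule ^page/TV-widget/(.*)$ /page/TV-widgets/$1 [R=301,L]
```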
10:22 am on July 21, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 28, 2002
posts:3443
votes: 1


Thanks, buddy. If I want to block anything that has to do with slideshow.php?<anything>, e.g.

/gallery/slideshow.php?set_albumName=page

is it like this?

Disallow: /*slideshow.php
1:36 pm on July 21, 2011 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Yes. That blocks requests that begin with <somestuff>, followed by "slideshow.php", which may or may not then be followed by <otherstuff>.
 
