
Forum Moderators: goodroi


How to manage this in robots.txt

disallow a folder but include one page

     

pankajj

11:08 am on Apr 28, 2011 (gmt 0)

5+ Year Member



Hi,

I have disallowed crawlers from a folder, but I want one page in it to be indexed. Since there are hundreds of pages in that folder, with more being added, listing them all individually is not practical. Is there a way to handle this?

sem4u

11:12 am on Apr 28, 2011 (gmt 0)

WebmasterWorld Senior Member sem4u is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Yes, you can use the Allow directive. Example:

Disallow: /widgets/
Allow: /widgets/widget-news

Staffa

1:44 pm on Apr 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since it is only one page among hundreds, why not put that page in an allowed folder and link to it from there?

g1smd

1:53 pm on Apr 28, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Not all agents understand "Allow", so moving the page to a different folder is likely a good idea.

pankajj

11:23 am on Jul 12, 2011 (gmt 0)

5+ Year Member



Disallow: /widgets/
Allow: /widgets/widget-news

This works. Thanks

Pankaj

g1smd

11:26 am on Jul 12, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It works for Google and some others.

It does not work for all.
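
To illustrate why the result can differ between agents, here is a minimal sketch using Python's standard-library urllib.robotparser as a stand-in for "some other crawler". As far as I know, that parser applies the first rule that matches rather than the most specific one, so the order of the Allow and Disallow lines can change the answer. The "User-agent: *" line and the helper name can_fetch are assumptions added for the example; they are not from the posts above.

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_lines, path, agent="*"):
    """Parse robots.txt lines and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)

# The rules from this thread, in the order they were posted.
DISALLOW_FIRST = [
    "User-agent: *",
    "Disallow: /widgets/",
    "Allow: /widgets/widget-news",
]

# The same rules with the Allow line moved above the Disallow line.
ALLOW_FIRST = [
    "User-agent: *",
    "Allow: /widgets/widget-news",
    "Disallow: /widgets/",
]

if __name__ == "__main__":
    for name, rules in (("disallow first", DISALLOW_FIRST),
                        ("allow first", ALLOW_FIRST)):
        # Print what this particular parser decides for each ordering.
        print(name, can_fetch(rules, "/widgets/widget-news"))
```

Putting the Allow line first should be the safer ordering for parsers that stop at the first match, and it makes no difference to agents that pick the most specific rule. It does not help with agents that ignore Allow entirely.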

zeus

12:01 am on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Hello

How can I block these endings/versions of pages from getting spidered?

Ex.

/gallery/page
/gallery/page?noCount=1

Ex.
Everything that has slideshow.php in the URL:

/gallery/slideshow.php?set_albumName=page

Ex.
Everything that has ?page=1 at the end:

/gallery/newpage?page=1

Here is another situation:

/page/TV-widget/endpage
/page/TV-widgets/endspage


Here I would like to block TV-widget so that only TV-widgets is in the SERPs.

So I guess a 301 or robots.txt is needed, but what would it look like?

g1smd

7:31 am on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




Blocking specific endings is very easy. You specify a "wildcard" for the beginning and then the unique string to block.

Disallow: /*?noCount=1


Disallow: /*?set_albumName=page



For the other examples, remember that robots.txt matches "from the left", so you need to specify enough of the URL to match what you want to match and not match what you don't want to match.

Disallow: /page/TV-widget

will block both
example.com/page/TV-widget/<anything>

and
example.com/page/TV-widgets/<anything>


Disallow: /page/TV-widget/

will block only
example.com/page/TV-widget/<anything>


However, the 301 redirect is preferred, as that will also preserve some PageRank. It also stops further proliferation of the incorrect URLs being copied and pasted into new links.
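
The "from the left" matching and the "*" wildcard described in this post can be sketched in a few lines of Python. This is only an approximation of how wildcard-aware crawlers treat a Disallow path, not any crawler's actual code; the function name rule_matches is made up for the example.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path.

    The pattern is anchored at the left of the path only, so anything
    may follow the matched part. Each '*' matches any run of characters.
    """
    # Escape the literal pieces, then join them with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # re.match anchors at the start of the string but not at the end.
    return re.match(regex, path) is not None

if __name__ == "__main__":
    checks = [
        ("/page/TV-widget", "/page/TV-widget/endpage"),
        ("/page/TV-widget", "/page/TV-widgets/endspage"),
        ("/page/TV-widget/", "/page/TV-widgets/endspage"),
        ("/*?noCount=1", "/gallery/page?noCount=1"),
        ("/*slideshow.php", "/gallery/slideshow.php?set_albumName=page"),
    ]
    for pattern, path in checks:
        print(pattern, path, rule_matches(pattern, path))
```

Without the trailing slash, the TV-widget rule also catches the TV-widgets URLs, because "TV-widgets" begins with "TV-widget". The trailing slash is what separates the two.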

zeus

10:22 am on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member zeus is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Thanks buddy. What if I want to block anything that has to do with slideshow.php?<anything>, for example:

/gallery/slideshow.php?set_albumName=page

Would it be like this?
Disallow: /*slideshow.php

g1smd

1:36 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes. That blocks requests that begin with <somestuff>, followed by "slideshow.php", whether or not that is then followed by <otherstuff>.
 
