Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
How to manage this in robots.txt
Disallow a folder but include one page
pankajj
11:08 am on Apr 28, 2011 (gmt 0)

Hi,

I have disallowed crawlers from crawling a folder, but I want one page in it to be indexed. Since there are hundreds of pages in that folder, with more being added, listing all of them is not possible. Is there a way to handle this?

 

sem4u
11:12 am on Apr 28, 2011 (gmt 0)

Yes, you can use the Allow directive. Example:

Disallow: /widgets/
Allow: /widgets/widget-news
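If you have Python handy, you can sanity-check this pair of rules with the standard library's robots.txt parser. One caveat: urllib.robotparser applies rules in file order (first match wins), so the Allow line is listed before the Disallow line in this sketch; Google instead picks the most specific (longest) matching rule, so the order does not matter to Googlebot. The paths and the "ExampleBot" user-agent are placeholders.

```python
# A quick check of the Disallow/Allow pair with Python's stdlib parser.
from urllib import robotparser

rules = """\
User-agent: *
Allow: /widgets/widget-news
Disallow: /widgets/
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The single allowed page is fetchable; the rest of the folder is not.
print(rp.can_fetch("ExampleBot", "/widgets/widget-news"))  # True
print(rp.can_fetch("ExampleBot", "/widgets/widget-blue"))  # False
print(rp.can_fetch("ExampleBot", "/other-page"))           # True
```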

Staffa
1:44 pm on Apr 28, 2011 (gmt 0)

Since it is only one page amongst hundreds, why not put that one page in an allowed folder and link to it from there?

g1smd
1:53 pm on Apr 28, 2011 (gmt 0)

Not all user agents understand "Allow", so moving the page to a different folder is likely a good idea.

pankajj
11:23 am on Jul 12, 2011 (gmt 0)

Disallow: /widgets/
Allow: /widgets/widget-news

This works. Thanks

Pankaj

g1smd
11:26 am on Jul 12, 2011 (gmt 0)

It works for Google and some others.

It does not work for all.
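One concrete example of this, as a sketch: Python's stdlib urllib.robotparser does honour Allow, but it applies rules in file order (first match wins) rather than using Google's longest-match rule, so with Disallow listed first the "allowed" page still comes back blocked. "ExampleBot" is a placeholder user-agent.

```python
# Demonstration that rule ORDER matters to some parsers:
# urllib.robotparser takes the first matching rule, so the earlier
# Disallow line wins even though an Allow line follows it.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /widgets/
Allow: /widgets/widget-news
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Google (longest match wins) would allow this URL; this parser does not.
print(rp.can_fetch("ExampleBot", "/widgets/widget-news"))  # False
```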

zeus
12:01 am on Jul 21, 2011 (gmt 0)

Hello,

How can I block these endings/versions of pages from getting spidered?

Ex. 1: the version of a page with ?noCount=1 appended:

/gallery/page
/gallery/page?noCount=1

Ex. 2: everything that has slideshow.php in the URL:

/gallery/slideshow.php?set_albumName=page

Ex. 3: everything that has ?page=1 at the end:

/gallery/newpage?page=1

And here is another situation:

/page/TV-widget/endpage
/page/TV-widgets/endspage

Here I would like to block TV-widget so that only the TV-widgets pages are in the SERPs.

So I guess some 301 or robots.txt rule is needed, but what would it look like?

g1smd
7:31 am on Jul 21, 2011 (gmt 0)


Blocking specific endings is very easy: you specify a "wildcard" for the beginning and then the unique string to block.

Disallow: /*?noCount=1

Disallow: /*?set_albumName=page

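A side note: not every robots.txt library implements the "*" wildcard (Python's urllib.robotparser, for one, does not), but Google-style matching can be sketched as a small regex translation. The helper below is illustrative only, not part of any standard API; it assumes Google's documented semantics: rules are anchored at the left of the path, "*" matches any run of characters, and a trailing "$" (a Google extension) anchors the rule to the end of the URL, which covers the "ends with ?page=1" case.

```python
import re

def robots_match(pattern: str, path: str) -> bool:
    """Sketch of Google-style robots.txt matching: left-anchored,
    '*' matches any run of characters, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then turn each '*' into '.*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.match(regex + ("$" if anchored else ""), path) is not None

print(robots_match("/*?noCount=1", "/gallery/page?noCount=1"))  # True
print(robots_match("/*?noCount=1", "/gallery/page"))            # False
print(robots_match("/*?set_albumName=page",
                   "/gallery/slideshow.php?set_albumName=page"))  # True
print(robots_match("/*?page=1$", "/gallery/newpage?page=1"))    # True
```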

For the other examples, remember that robots.txt matches "from the left", so you need to specify enough of the URL to match what you want to block without also matching what you want to keep.

Disallow: /page/TV-widget
will block both
example.com/page/TV-widget/<anything>
and
example.com/page/TV-widgets/<anything>

Disallow: /page/TV-widget/
will block only
example.com/page/TV-widget/<anything>
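The left-match behaviour of those two rules can be checked with Python's stdlib parser; no wildcards are involved, so it agrees with Google here. The small helper and the "ExampleBot" user-agent are just for illustration.

```python
from urllib import robotparser

def blocked(rule: str, path: str) -> bool:
    """True if the given Disallow rule blocks the path."""
    rp = robotparser.RobotFileParser()
    rp.parse(["User-agent: *", "Disallow: " + rule])
    return not rp.can_fetch("ExampleBot", path)

# Without the trailing slash, both spellings are blocked:
print(blocked("/page/TV-widget", "/page/TV-widget/endpage"))   # True
print(blocked("/page/TV-widget", "/page/TV-widgets/endpage"))  # True

# With the trailing slash, only the singular form is blocked:
print(blocked("/page/TV-widget/", "/page/TV-widget/endpage"))   # True
print(blocked("/page/TV-widget/", "/page/TV-widgets/endpage"))  # False
```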

However, the 301 redirect is preferred, as it will also preserve some PageRank. It also stops the incorrect URLs proliferating further by being copied and pasted into new links.
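If the singular TV-widget URLs should permanently point at the plural ones, the 301 could be sketched in Apache mod_rewrite along these lines (assuming an Apache server with an .htaccess in the site root; adjust to your own setup):

```apache
# .htaccess sketch: 301 the singular spelling to the plural one.
# In per-directory (.htaccess) context the leading slash is stripped
# from the matched path, hence no "^/" in the pattern.
RewriteEngine On
RewriteRule ^page/TV-widget/(.*)$ /page/TV-widgets/$1 [R=301,L]
```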

zeus
10:22 am on Jul 21, 2011 (gmt 0)

Thanks buddy. If I want to block anything that has to do with slideshow.php?blah-blah, e.g.

/gallery/slideshow.php?set_albumName=page

is it this:

Disallow: /*slideshow.php

g1smd
1:36 pm on Jul 21, 2011 (gmt 0)

Yes. That blocks requests that begin with <somestuff>, followed by "slideshow.php", which may or may not then be followed by <otherstuff>.
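Under the same Google-style wildcard semantics, that rule reduces to "path contains slideshow.php". A quick regex sketch of the match (illustrative only, with made-up example paths):

```python
import re

# "/*slideshow.php" translated to a left-anchored regex: "/" then
# anything, then the literal "slideshow.php" (no end anchor, so
# query strings after it still match).
regex = "/.*" + re.escape("slideshow.php")

print(bool(re.match(regex, "/gallery/slideshow.php?set_albumName=page")))  # True
print(bool(re.match(regex, "/gallery/photos.php")))                        # False
```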


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved