Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Spiders and redirects to disallowed directories
Are the disallowed pages indexed?
cigjonser

5+ Year Member



 
Msg#: 711 posted 6:47 pm on Aug 19, 2005 (gmt 0)

If a spider follows a link to a page that redirects to a different page which is disallowed in robots.txt, what happens?

For example, let's say someone's site links to www.domain.com/hello.php:

www.domain.com/hello.php
--------------------------
<?php
// Redirect (PHP sends a 302 by default) to the disallowed page, then stop.
header("Location: http://www.domain.com/disallowed/index.html");
exit;
?>

robots.txt
----------
User-agent: *
Disallow: /disallowed

Will the spider index the page from the disallowed directory?

My guess is that the spider will index the page, because robots.txt only keeps it from requesting the page directly; in this case it didn't request the page by name but was instead presented with it "through" a different (allowed) link.

Does anyone have any experience with this who can answer for sure?

 

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 711 posted 7:18 pm on Aug 19, 2005 (gmt 0)

The PHP redirect makes the client issue a new request for the new URL, so robots.txt applies to that URL and a compliant spider will not fetch the disallowed page.
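As a rough illustration of that behaviour, here is a minimal Python sketch of how a well-behaved crawler might treat the redirect. It uses the standard library's urllib.robotparser with the robots.txt from the question. The helper names (may_fetch, follow_redirect) and the "ExampleBot" user agent are made up for the example, and real spiders may well handle this differently.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# robots.txt from the original question
ROBOTS_TXT = """\
User-agent: *
Disallow: /disallowed
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def follow_redirect(robots_txt, user_agent, source_url, location):
    """Resolve a Location header and return the target URL only if
    robots.txt allows it; otherwise return None (do not fetch)."""
    target = urljoin(source_url, location)
    if may_fetch(robots_txt, user_agent, target):
        return target
    return None


# Hypothetical crawl step: hello.php is allowed, its redirect target is not.
source = "http://www.domain.com/hello.php"
location = "http://www.domain.com/disallowed/index.html"
print(may_fetch(ROBOTS_TXT, "ExampleBot", source))
print(follow_redirect(ROBOTS_TXT, "ExampleBot", source, location))
```

The point is that the check runs again on the redirect target, as a separate request, rather than inheriting the "allowed" status of the URL that issued the redirect.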

However, many spiders, such as Google, Yahoo, and Ask Jeeves/Teoma, will list a URL-only result in their SERPs if they "know about" the URL but are disallowed by robots.txt from actually fetching the page. In Yahoo's case, they will use the link text they found with the link (if any) to create a listing.

A partial solution is to allow the page to be spidered, but include a <meta name="robots" content="noindex"> tag on the page. However, I've seen Google ignore this occasionally as well, and include a URL-only listing anyway.
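To show why the page has to stay crawlable for this to work, here is a small Python sketch of how an indexer might read the robots meta tag once it has fetched the page. The class and function names (RobotsMetaParser, is_indexable) are invented for illustration; this is not how any particular search engine implements it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def is_indexable(html):
    """Return False if the page carries a robots "noindex" directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


page = '<html><head><meta name="robots" content="noindex"></head><body>Hi</body></html>'
print(is_indexable(page))
```

The tag is only visible to a spider that is allowed to download the page, so a URL blocked in robots.txt never gets its noindex read.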

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved