
Sitemaps, Meta Data, and robots.txt Forum

Static mod_rewritten URLs vs. physical path in robots.txt
Robots.txt: static and dynamic URLs

5+ Year Member

Msg#: 3826029 posted 8:03 am on Jan 14, 2009 (gmt 0)

Hello All!

Question: when disallowing paths in robots.txt, do I have to disallow both the real path and the rewritten path?

for instance,

real path: /stores/products/electronics/plasma.php?=33
mod_rewritten virtual path: /televisions/plasma-33-sony-trinitron.html

If I want to keep search bots from indexing either of the above paths, do I have to disallow both, or just the physical path?

So if I Disallow: /stores/products/electronics/plasma.php?=33

will bots still index: /televisions/plasma-33-sony-trinitron.html ?

In other words, do I have to disallow both the physical path AND the virtual path?

Thank you for your response.

regards, frogz



phranque (WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month)

Msg#: 3826029 posted 9:52 am on Jan 14, 2009 (gmt 0)

robots only know about urls, not physical file paths.
if you disallow the virtual path (the url the robot initially requests), a well-behaved robot will not make the request at all.
the rewrite happens internally on the server, so the robot never sees the rewritten request and never gets a chance to "behave well" for the rewritten url.
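
to make the point above concrete, here is a minimal sketch using Python's standard urllib.robotparser. it only illustrates prefix matching on the requested url, using the two paths from the question. the helper name can_fetch and the two rule sets are made up for this example, and real crawlers may differ in how they match query strings.

```python
from urllib.robotparser import RobotFileParser

# The two paths from the question above.
VIRTUAL = "/televisions/plasma-33-sony-trinitron.html"
PHYSICAL = "/stores/products/electronics/plasma.php?=33"


def can_fetch(robots_lines, path, agent="*"):
    """Return True if `agent` may request `path` under the given robots.txt lines.

    Hypothetical helper for this example: it feeds the lines straight to the
    parser instead of downloading a robots.txt file.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, path)


# Rule set A: only the physical path is disallowed.
only_physical = ["User-agent: *", "Disallow: " + PHYSICAL]

# Rule set B: only the virtual (rewritten) url is disallowed.
only_virtual = ["User-agent: *", "Disallow: " + VIRTUAL]

# The robot requests the virtual url, so only a rule on that url stops it.
print(can_fetch(only_physical, VIRTUAL))  # True: still crawlable
print(can_fetch(only_virtual, VIRTUAL))   # False: blocked
```

the rule on the physical path has no effect on the virtual url, because the robot compares rules against the url it is about to request and knows nothing about what the server maps that url to.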

assuming the physical path is externally accessible as a url:
what you probably want to do for url canonicalization reasons is to externally redirect requests for the physical path to the virtual path.
the physical path should be allowed so the robot can make the request and get the 301 or 302 response.
when the robot then requests the virtual path, robots.txt disallows it, so the robot never makes the request that would have triggered the internal rewrite and served the resource.
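
the sequence described above can be sketched as a small simulation. this is not how any particular crawler is implemented: the REDIRECTS table stands in for the server's external 301/302 redirects, the crawl function and its log labels are invented for illustration, and the robots.txt content assumes you disallow the exact virtual url from the question while leaving the physical path allowed.

```python
from urllib.robotparser import RobotFileParser

PHYSICAL = "/stores/products/electronics/plasma.php?=33"
VIRTUAL = "/televisions/plasma-33-sony-trinitron.html"

# Assumed robots.txt: the virtual url is disallowed, the physical path is not.
ROBOTS_TXT = ["User-agent: *", "Disallow: " + VIRTUAL]

# Hypothetical stand-in for the server's external redirects (301 or 302).
REDIRECTS = {PHYSICAL: VIRTUAL}


def crawl(start, max_hops=5):
    """Follow redirects from `start`, checking robots.txt before every request."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT)
    log = []
    url = start
    for _ in range(max_hops):
        if not parser.can_fetch("*", url):
            # A well-behaved robot stops here and never sends the request.
            log.append(("blocked", url))
            return log
        target = REDIRECTS.get(url)
        if target is None:
            # No redirect: the server would serve the resource.
            log.append(("fetched", url))
            return log
        # The request was allowed and answered with a redirect.
        log.append(("redirected", url))
        url = target
    return log


for step in crawl(PHYSICAL):
    print(step)
# ('redirected', '/stores/products/electronics/plasma.php?=33')
# ('blocked', '/televisions/plasma-33-sony-trinitron.html')
```

if the physical path were disallowed instead, the robot would never see the redirect, so it could not learn that the physical url has moved to the virtual one.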

check out this WebmasterWorld thread for more information:
Robots.txt and Mod Rewrite [webmasterworld.com]
