Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Syntax for disallowing folder with '.' in the name?
Asking for clarification
ALbino

10+ Year Member



 
Msg#: 3681885 posted 9:36 pm on Jun 23, 2008 (gmt 0)

Hey there,

I know virtually nothing about robots.txt so forgive me if this question is silly. I figured better safe than sorry. Anyway...

For whatever reason, this site has its directory structure set up like this:

http://www.example.com/product.php/0001
http://www.example.com/product.php/0002
http://www.example.com/product.php/0003
etc.

And:

http://www.example.com/company.php/1234
http://www.example.com/company.php/1235
http://www.example.com/company.php/1236
etc.

They want to block only the /company.php/* URLs for fear of duplicate content (many of the companies have only 1 product, so the company page and the product page are virtually identical).

I was just wondering what the correct disallow syntax is for that? Thanks!

 

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3681885 posted 9:57 pm on Jun 23, 2008 (gmt 0)

User-agent: *
Disallow: /compan

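Put as much or as little of the /company.php/.... part into the Disallow statement as you like, as long as it is enough to be unique across the site. Disallow matches by URL-path prefix, so the rule blocks every URL whose path starts with that string; /compan is safe only if no other URL you want crawled also begins with /compan.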
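
The prefix matching can be checked locally before the file goes live. This is a minimal sketch, assuming Python's standard urllib.robotparser is a reasonable stand-in for how a crawler reads the file (real crawlers may differ in details). The user agent ExampleBot and the /company-news.html URL are hypothetical, added only to show what an overly short prefix would also catch.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in a string, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The short prefix suggested in the reply above.
SHORT_RULES = """User-agent: *
Disallow: /compan
"""

# A longer prefix that spells out the whole /company.php/ segment.
FULL_RULES = """User-agent: *
Disallow: /company.php/
"""

short = build_parser(SHORT_RULES)
full = build_parser(FULL_RULES)

AGENT = "ExampleBot"  # hypothetical crawler name, for illustration only
URLS = [
    "http://www.example.com/product.php/0001",
    "http://www.example.com/company.php/1234",
    "http://www.example.com/company-news.html",  # hypothetical page
]

for url in URLS:
    # can_fetch() returns True when the rules allow the agent to crawl the URL.
    print(url, "short:", short.can_fetch(AGENT, url),
          "full:", full.can_fetch(AGENT, url))
```

Under these assumptions both rule sets leave the /product.php/ URLs crawlable and block the /company.php/ ones; only the short prefix also blocks the hypothetical /company-news.html, which is the case the "unique" advice guards against.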

ALbino

10+ Year Member



 
Msg#: 3681885 posted 10:40 pm on Jun 23, 2008 (gmt 0)

Great, thanks g1smd!


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved