
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Robots.txt and htm - html
is disallow for htm killing pages that are html?

 10:27 pm on Feb 27, 2006 (gmt 0)

I've got a system where we have all .htm pages marked with noindex / nofollow, and just to make sure, we also used a disallow:

User-agent: *
Disallow: /*.htm

The reason is that we use .htm to track PPC traffic etc, and all .html pages are organic.

In a pinch, this works. Been doing it for years. I don't recommend it if you have a better way, but for us, this works.

Is the disallow potentially going to keep the spiders away from the .html pages too?



Lord Majestic

 10:35 pm on Feb 27, 2006 (gmt 0)

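Your disallow statement is not correct: wildcards are not part of the original robots.txt standard, though Googlebot supports them as an extension.

If you list all the .htm files by name, then you will automatically disallow the matching .html ones too, because standard URL matching is a prefix match: a Disallow rule blocks every URL path that starts with it.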
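
To illustrate the prefix matching described above, here is a minimal Python sketch. The function name `is_disallowed` and the file name `/landing.htm` are made up for the example; real crawlers implement more than this (per-agent groups, Allow rules, and so on).

```python
def is_disallowed(path, disallow_rules):
    """Original-style robots.txt check: a rule blocks any path that starts with it.

    An empty Disallow value blocks nothing, so empty rules are skipped.
    """
    return any(rule and path.startswith(rule) for rule in disallow_rules)


# Hypothetical rule list: one .htm file listed by name.
rules = ["/landing.htm"]

print(is_disallowed("/landing.htm", rules))   # True
print(is_disallowed("/landing.html", rules))  # True: ".html" starts with ".htm"
print(is_disallowed("/about.html", rules))    # False: no rule is a prefix of it
```

So a plain `Disallow: /landing.htm` also covers `/landing.html`, which is the side effect the original question was worried about.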


 6:22 am on Feb 28, 2006 (gmt 0)

hoo boy -- what would you suggest?

robots.txt removed.

perhaps this explains why things are looking so flooey...


 4:51 pm on Mar 7, 2006 (gmt 0)

For best results, put the disallow in your .htaccess file instead. You have more control and no one can see what you are doing.




 6:37 pm on Mar 8, 2006 (gmt 0)

Both Google and MSN accept

Disallow: /*.htm$

Using this should block the .htm files on your website, but not the .html ones.

I've only just read their guidelines, because I'm moving from asp to aspx, so I haven't tested it.
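
A rough Python sketch of how the wildcard form could be evaluated, assuming the commonly documented extension semantics: `*` matches any run of characters and a trailing `$` anchors the rule to the end of the URL path. The helper names `rule_to_regex` and `is_blocked` are made up for this example, and it is not any search engine's actual code.

```python
import re


def rule_to_regex(rule):
    """Translate a wildcard Disallow rule into a regex plus an 'anchored' flag."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with ".*" where "*" appeared.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(pattern), anchored


def is_blocked(path, rule):
    regex, anchored = rule_to_regex(rule)
    if anchored:
        # "$" means the rule must cover the whole path.
        return regex.fullmatch(path) is not None
    # Without "$" the rule only has to match from the start (prefix-style).
    return regex.match(path) is not None


print(is_blocked("/page.htm", "/*.htm$"))   # True
print(is_blocked("/page.html", "/*.htm$"))  # False
print(is_blocked("/page.html", "/*.htm"))   # True
```

Under these assumptions, the unanchored `/*.htm` still catches .html pages, while `/*.htm$` does not. Note that in this model a URL with a query string, such as `/page.htm?src=ppc`, would not match the `$` form either, so test against your real PPC URLs before relying on it.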


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved