
Robots.txt and .htm vs .html

Is a disallow for .htm killing pages that are .html?

10:27 pm on Feb 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 7, 2003
posts:1048
votes: 0


I've got a system where we have all .htm pages marked with noindex / nofollow, and just to make sure, we also used a disallow:

User-agent: *
Disallow: /*.htm

The reason is that we use .htm to track PPC traffic etc, and all .html pages are organic.

In a pinch, this works; we've been doing it for years. I don't recommend it if you have a better way, but for us it works.

Is the disallow potentially going to keep spiders away from the .html pages too?

-c
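[Editorial note: since the setup above is described only in prose, here is a minimal sketch of a sanity-check script for it - it scans a docroot and flags any .htm page that is missing a noindex robots meta tag. The docroot path and the exact tag wording are assumptions for illustration, not details from the post.]

import pathlib
import re

DOCROOT = pathlib.Path("/var/www/example")      # assumed document root

for page in DOCROOT.rglob("*.htm"):             # "*.htm" needs a full-name match here, so .html files are skipped
    text = page.read_text(errors="ignore").lower()
    has_noindex_meta = re.search(r'name=["\']robots["\']', text) and "noindex" in text
    if not has_noindex_meta:
        print("missing noindex meta:", page)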

10:35 pm on Feb 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


Your disallow statement is not valid standard robots.txt - wildcards are not part of the original spec, though Googlebot does support them.

And if you disallow /*.htm you will automatically disallow the .html URLs too, because robots.txt matching works by prefix: anything whose path starts with the disallowed pattern is blocked.
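[Editorial note: to make the prefix point concrete, here is a rough sketch of Google-style rule matching - not Googlebot's actual code, just an illustration. '*' matches any run of characters, a trailing '$' anchors the rule at the end of the path, and without '$' the rule only has to match a prefix of the path.]

import re

def rule_matches(rule, path):
    # Translate a Google-style Disallow value into a regex:
    # '*' becomes '.*', a trailing '$' stays an end anchor,
    # everything else is literal; without '$' a prefix match is enough.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(rule_matches("/*.htm", "/widgets.htm"))    # True  - blocked
print(rule_matches("/*.htm", "/widgets.html"))   # True  - also blocked, ".html" starts with ".htm"
print(rule_matches("/*.htm$", "/widgets.html"))  # False - the $ anchor spares .html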

6:22 am on Feb 28, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 7, 2003
posts:1048
votes: 0


hoo boy -- what would you suggest?

robots.txt removed.

perhaps this explains why things are looking so flooey...

4:51 pm on Mar 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 12, 2003
posts:723
votes: 0


For best results, put the blocking in your .htaccess file instead. You have more control, and no one can see what you are doing.

Cheers,

CaboWabo

6:37 pm on Mar 8, 2006 (gmt 0)

New User

10+ Year Member

joined:Apr 28, 2005
posts:17
votes: 0


Both Google and MSN accept

Disallow: /*.htm$

Using this should put an end to the .htm files being crawled on your sites, but not the .html ones.

I'd read their guidelines first, though - I'm going from .asp to .aspx myself, so I haven't tested it.
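[Editorial note: for anyone who wants to sanity-check the '$' behaviour before relying on it, here is a tiny self-contained test along the same lines as the earlier sketch - again an illustration, not from either search engine's documentation.]

import re

# "/*.htm$": '*' -> '.*', the trailing '$' stays an end anchor, the rest is literal.
pattern = re.compile(r"/.*\.htm$")

for path in ("/landing.htm", "/landing.html"):
    print(path, "blocked" if pattern.match(path) else "crawlable")
# /landing.htm  blocked
# /landing.html crawlable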