


Robots.txt and .htm vs .html

Is a disallow for .htm killing pages that are .html?

   
10:27 pm on Feb 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got a system where all .htm pages are marked with a noindex/nofollow meta tag, and just to make sure, we also use a disallow in robots.txt:

User-agent: *
Disallow: /*.htm

The reason is that we use .htm to track PPC traffic and the like, and all .html pages are organic.

In a pinch, this works. Been doing it for years. I don't recommend it if you have a better way, but for us, it works.

Is the disallow potentially going to kill spidering of the .html pages?

-c

10:35 pm on Feb 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your disallow statement is not correct - wildcards are not part of the robots.txt standard, though Googlebot does support them.

If you list the .htm files individually, you will automatically disallow the matching .html ones too, because standard robots.txt URL matching is by prefix: Disallow: /page.htm also blocks /page.html.
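
For illustration, here is a minimal sketch using Python's standard urllib.robotparser, which does the same original-spec prefix matching (it has no wildcard support); the example.com URLs are just placeholders:

from urllib import robotparser

# A Disallow path in the original spec is a plain prefix, so listing an
# individual .htm file also blocks the .html file that shares its name.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /page.htm",
])

print(rp.can_fetch("*", "http://example.com/page.htm"))    # False - blocked
print(rp.can_fetch("*", "http://example.com/page.html"))   # False - blocked too (prefix match)
print(rp.can_fetch("*", "http://example.com/other.html"))  # True  - crawlable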

6:22 am on Feb 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hoo boy -- what would you suggest?

robots.txt removed.

perhaps this explains why things are looking so flooey...

4:51 pm on Mar 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For best results, do the blocking in your .htaccess file instead and deny the bots at the server level. You have more control, and no one can see what you are doing.

Cheers,

CaboWabo

6:37 pm on Mar 8, 2006 (gmt 0)

10+ Year Member



Both Google and MSN accept

Disallow: /*.htm$

Using this should put an end to .htm files being crawled on your sites, but not .html, because the $ anchors the pattern to the end of the URL.

I'd just read their guidelines, because I'm about to do the same thing going from .asp to .aspx (so I haven't tested it myself).
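
A rough sketch of those documented wildcard semantics (a model for illustration, not Googlebot's actual code; the paths are placeholders): * matches any run of characters and $ anchors the end of the URL path.

import re

def wildcard_match(pattern: str, path: str) -> bool:
    # Rough model of the extended matching Google and MSN document:
    # '*' matches any run of characters, '$' anchors the end of the path.
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\$", "$")
    return re.match(regex, path) is not None

print(wildcard_match("/*.htm$", "/landing.htm"))   # True  - blocked
print(wildcard_match("/*.htm$", "/landing.html"))  # False - still crawlable
print(wildcard_match("/*.htm",  "/landing.html"))  # True  - blocked too, without the $ anchor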