
Robots.txt and .htm vs .html

Is a disallow for .htm killing pages that are .html?

10:27 pm on Feb 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 7, 2003
posts:1048
votes: 0


I've got a system where we have all .htm pages marked with noindex / nofollow, and just to make sure, we also used a disallow:

User-agent: *
Disallow: /*.htm

The reason is that we use .htm to track PPC traffic etc, and all .html pages are organic.

In a pinch, this works; we've been doing it for years. I don't recommend it if you have a better way, but for us it works.

Is the disallow potentially going to keep spiders away from the .html pages too?

-c
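[Editorial note: since the setup above is described only in prose, here is a minimal sketch of a sanity-check script for it - it scans a docroot and flags any .htm page that is missing a noindex robots meta tag. The docroot path and the exact tag wording are assumptions for illustration, not details from the post.]

import pathlib
import re

DOCROOT = pathlib.Path("/var/www/example")      # assumed document root

for page in DOCROOT.rglob("*.htm"):             # "*.htm" needs a full-name match here, so .html files are skipped
    text = page.read_text(errors="ignore").lower()
    has_noindex_meta = re.search(r'name=["\']robots["\']', text) and "noindex" in text
    if not has_noindex_meta:
        print("missing noindex meta:", page)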

10:35 pm on Feb 27, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2004
posts:1679
votes: 0


Your disallow statement is not valid standard robots.txt - wildcards are not part of the original spec, though Googlebot does support them.

And if you disallow /*.htm you will automatically disallow the .html URLs too, because robots.txt matching works by prefix: anything whose path starts with the disallowed pattern is blocked.
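[Editorial note: to make the prefix point concrete, here is a rough sketch of Google-style rule matching - not Googlebot's actual code, just an illustration. '*' matches any run of characters, a trailing '$' anchors the rule at the end of the path, and without '$' the rule only has to match a prefix of the path.]

import re

def rule_matches(rule, path):
    # Translate a Google-style Disallow value into a regex:
    # '*' becomes '.*', a trailing '$' stays an end anchor,
    # everything else is literal; without '$' a prefix match is enough.
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None

print(rule_matches("/*.htm", "/widgets.htm"))    # True  - blocked
print(rule_matches("/*.htm", "/widgets.html"))   # True  - also blocked, ".html" starts with ".htm"
print(rule_matches("/*.htm$", "/widgets.html"))  # False - the $ anchor spares .html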

6:22 am on Feb 28, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 7, 2003
posts:1048
votes: 0


hoo boy -- what would you suggest?

robots.txt removed.

perhaps this explains why things are looking so flooey...

4:51 pm on Mar 7, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 12, 2003
posts:723
votes: 0


For best results, put the blocking in your .htaccess file instead. You have more control, and no one can see what you are doing.

Cheers,

CaboWabo

6:37 pm on Mar 8, 2006 (gmt 0)

New User

10+ Year Member

joined:Apr 28, 2005
posts:17
votes: 0


Both Google and MSN accept

Disallow: /*.htm$

Using this should put an end to the .htm files being crawled on your sites, but not the .html ones.

I'd read their guidelines first, though - I'm going from .asp to .aspx myself, so I haven't tested it.
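[Editorial note: for anyone who wants to sanity-check the '$' behaviour before relying on it, here is a tiny self-contained test along the same lines as the earlier sketch - again an illustration, not from either search engine's documentation.]

import re

# "/*.htm$": '*' -> '.*', the trailing '$' stays an end anchor, the rest is literal.
pattern = re.compile(r"/.*\.htm$")

for path in ("/landing.htm", "/landing.html"):
    print(path, "blocked" if pattern.match(path) else "crawlable")
# /landing.htm  blocked
# /landing.html crawlable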