


Robots.txt and .htm vs .html

Is a disallow for .htm killing pages that are .html?


chewy

10:27 pm on Feb 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've got a system where we have all .htm pages marked with noindex / nofollow, and just to make sure, we also used a disallow:

User-agent: *
Disallow: /*.htm

The reason is that we use .htm to track PPC traffic etc, and all .html pages are organic.

In a pinch, this works; we've been doing it for years. I don't recommend it if you have a better way, but it works for us.

Is the disallow potentially going to block the spiders from our .html pages?

-c

Lord Majestic

10:35 pm on Feb 27, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Your disallow statement is not correct: wildcards are not part of the robots.txt standard, though Googlebot does support them.

If you list all the .htm files individually, you will automatically disallow the .html ones too, because of the way URL prefix matching works.
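
Rough sketch of what I mean, in Python, with a made-up filename (classic robots.txt matching is just a prefix test):

def is_blocked(path, rule):
    # Classic robots.txt matching is a plain prefix test:
    # a Disallow rule blocks every path that starts with it.
    return path.startswith(rule)

rule = "/landing.htm"                      # a PPC page listed explicitly (hypothetical name)
print(is_blocked("/landing.htm", rule))    # True  - the .htm page is blocked
print(is_blocked("/landing.html", rule))   # True  - the organic .html page is blocked as well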

chewy

6:22 am on Feb 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hoo boy -- what would you suggest?

robots.txt removed.

perhaps this explains why things are looking so flooey...

cabowabo

4:51 pm on Mar 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For best results, put the blocking in your .htaccess file instead. You have more control, and no one can see what you are doing.
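
Something along these lines, say (just a rough sketch assuming Apache with mod_rewrite; the bot names are only examples, and this returns a 403 rather than asking the spiders nicely via robots.txt):

# Rough sketch, assuming Apache with mod_rewrite enabled.
# Sends 403 to these crawlers for any .htm URL - a server-side block,
# not a robots.txt directive, so use with care.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|msnbot|Slurp) [NC]
RewriteRule \.htm$ - [F,L]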

Cheers,

CaboWabo

Madx

6:37 pm on Mar 8, 2006 (gmt 0)

10+ Year Member



Both Google and MSN accept

Disallow: /*.htm$

Using this should block the .htm files on your site, but not the .html ones.

I'd read their guidelines first, though; I'm moving from .asp to .aspx myself, so I haven't tested it.
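
Roughly how the Google/MSN wildcard matching behaves, as far as I understand it (Python sketch with made-up paths; * matches any run of characters, $ anchors the end):

import re

def google_style_match(pattern, path):
    # Sketch of Google/MSN-style matching: '*' matches anything,
    # '$' anchors the end of the URL, everything else is literal.
    regex = "".join(".*" if c == "*" else "$" if c == "$" else re.escape(c)
                    for c in pattern)
    return re.match(regex, path) is not None

print(google_style_match("/*.htm$", "/landing.htm"))   # True  - .htm is blocked
print(google_style_match("/*.htm$", "/article.html"))  # False - .html still gets crawled
print(google_style_match("/*.htm", "/article.html"))   # True  - without the $, .html is caught too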
