Moving pages from .html to .shtml

Want to avoid duplicate page penalty

         

jaeden

9:33 pm on Jan 13, 2003 (gmt 0)

10+ Year Member



If I want to take a bunch of my pages that are already indexed with .html and change them all to .shtml, how can I get around getting smacked with a duplicate page violation? Someone mentioned something once about a 302 redirect. What is that?

jatar_k

9:36 pm on Jan 13, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



It's a big read, jaeden, but it is great stuff:

An Introduction to Redirecting URLs on an Apache Server [webmasterworld.com]

Why change your extensions at all? I assume it is to use SSI, but can't you just enable SSI for the .html extension?

jaeden

9:38 pm on Jan 13, 2003 (gmt 0)

10+ Year Member



I guess I never thought of that. Isn't that normally a big overhead for the HTTP server?

jatar_k

9:41 pm on Jan 13, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Not in my personal experience. I have PHP enabled for .html on most of my sites and have had SSI enabled on a bunch, and the only problems I had were because of dumb programmers. ;)

jaeden

9:45 pm on Jan 13, 2003 (gmt 0)

10+ Year Member



I'll try that then. You were right on the nose when you asked if it was because I wanted to try SSI. I just figured out how to do it on my server (I am actually using an AS/400 with the standard HTTP server, not Apache). I've got so many HTML files with the same darn code in them that it's getting to be a pain to change all of them. I figured SSI could reduce that pain, but thought I had to go to .shtml to use it. Now that I think about it, almost every one of my HTML files has repeated code that could be removed with SSI, so if 80+ percent of my .html files would end up as .shtml anyway, I might as well just turn on SSI for my .html files.

Thanks for the wake-up call.

Jaeden

Ove

9:46 pm on Jan 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree about SSI. I am doing that and it works fine for me.

Visit Thailand

10:34 pm on Jan 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just so you know, I changed a few hundred pages from .htm to .shtml and was very nervous about it but it had to be done.

I left the pages that were going to die alone and concentrated on the move. I went with 301s, and for a time Google had the old .htm page and the new .shtml page next to each other (which was nice when that page was in the no. 1 position!). After the following update, the .shtml page was the only one there, in the exact position of the old .htm page.

Since then I have had no problems, and SSI is a life saver.

ADD IN

In case you are wondering why I changed to .shtml rather than just parsing .htm files for SSI: there were so many pages that I wanted to be able to see clearly which pages were calling SSI and which weren't.
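
The 301 approach described above boils down to answering every request for an old URL with a permanent redirect to the same path under the new extension. Here is a minimal Python sketch of that mapping. The function name and the example paths are made up for illustration, and a real site would normally do this in the web server's own redirect configuration rather than in application code.

```python
import posixpath


def redirect_old_extension(path, old_ext=".htm", new_ext=".shtml"):
    """Return (status, location) for a request path.

    Paths ending in old_ext get a 301 pointing at the same path with
    new_ext. Anything else returns (None, None), meaning "serve normally".
    """
    root, ext = posixpath.splitext(path)
    if ext.lower() == old_ext:
        # 301 = moved permanently, so crawlers can swap the old URL for the new one.
        return 301, root + new_ext
    return None, None


if __name__ == "__main__":
    print(redirect_old_extension("/products/widgets.htm"))
    print(redirect_old_extension("/products/widgets.shtml"))
```

A permanent (301) redirect, rather than a temporary (302) one, is what tells a crawler that the old address has been replaced for good.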

jatar_k

10:44 pm on Jan 13, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



I use .htm as plain vanilla and .html has some kind of parsing on every site. It is a good thing to have a plain vanilla option, helps keep things straight.

Flippi

11:01 pm on Jan 13, 2003 (gmt 0)

10+ Year Member



It's also possible to have SSI without changing to .shtml extension.
On an Apache server it works fine and is not a performance issue. Perhaps it is also possible on an AS/400 server?

born2drv

11:11 pm on Jan 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I also converted .html to .shtml with no problems. I didn't use redirects or anything, since it wasn't too important in my mind and the pages didn't pull a lot of traffic. I just put up a custom 404 page to show people where the content was now.

I was never penalized. Once Google picked up my new link structure (the .html files were no longer being linked to), it dropped them naturally after 2 cycles (a month and a half) and picked up the .shtml ones.

jaeden

11:17 pm on Jan 13, 2003 (gmt 0)

10+ Year Member



Yep, just found out how to do it on the AS/400. Just added this to my HTTP config.

Imbeds file http

Basically this says to look for imbedded code within files in my Integrated File System with a file type of html. So every html file will be parsed to see if there are any SSI commands. With a server as big as ours, I really don't think this will add any noticeable overhead.
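
To make "parsed to see if there are any SSI commands" concrete, here is a rough Python sketch of what that parsing step amounts to: scan the page for include directives and splice in the shared fragment. It assumes the common Apache-style `<!--#include ... -->` syntax, which may not match the AS/400 server exactly, and the fragment names are invented for the example.

```python
import re

# Matches the common Apache-style include directive (an assumption here;
# the exact syntax on other servers may differ).
INCLUDE_RE = re.compile(r'<!--#include\s+(?:file|virtual)="([^"]+)"\s*-->')


def expand_includes(page, fragments):
    """Replace each include directive in page with the named fragment's text.

    fragments maps an include path to its content. Unknown paths are left
    in place so the missing fragment stays visible.
    """
    def substitute(match):
        name = match.group(1)
        return fragments.get(name, match.group(0))

    return INCLUDE_RE.sub(substitute, page)


if __name__ == "__main__":
    shared = {"/inc/header.html": "<h1>Site</h1>"}  # hypothetical fragment
    page = '<body><!--#include virtual="/inc/header.html" --><p>Hi</p></body>'
    print(expand_includes(page, shared))
```

A page with no directives passes through unchanged, so the extra cost of parsing a plain page is just the scan.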

As far as keeping things straight, which have SSI commands and which don't, I think I'll just have to come up with some kind of note, maybe a comment in the header or something.

Okay, next question... I've heard some servers allow you to map a request that looks static but is actually dynamic. Something like /CategoryDisplay/26/1026/ mapped to /CategoryDisplay?cgmenbr=26&cgrfnbr=1026. How does Google look at the first address? Does it think it is looking for a directory instead of a file? Does that work just as well as if you were mapping to a .html file? Do both get indexed just as well?

jatar_k

12:24 am on Jan 14, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



/CategoryDisplay/26/1026/

Yes, this appears as a directory structure; the default document for it would be index.something. That is what a bot is looking for.

As far as the rest goes the thread I referenced originally talks about mod_rewrite and there are tons of threads around about it as well.
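
The mapping in question, from /CategoryDisplay/26/1026/ to /CategoryDisplay?cgmenbr=26&cgrfnbr=1026, is a pattern match on the path. A small Python sketch of the idea follows. It is not mod_rewrite itself, and the function name and the assumption that both segments are numeric are mine.

```python
import re

# Two numeric path segments after /CategoryDisplay/, trailing slash optional.
CATEGORY_RE = re.compile(r"^/CategoryDisplay/(\d+)/(\d+)/?$")


def rewrite_category_url(path):
    """Map a static-looking category path to the dynamic URL behind it.

    Returns None when the path does not match the expected pattern.
    """
    match = CATEGORY_RE.match(path)
    if match is None:
        return None
    member, reference = match.groups()
    return f"/CategoryDisplay?cgmenbr={member}&cgrfnbr={reference}"


if __name__ == "__main__":
    print(rewrite_category_url("/CategoryDisplay/26/1026/"))
```

The rewrite happens inside the server, so a visitor or bot only ever sees the static-looking address.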

jaeden

4:38 am on Jan 14, 2003 (gmt 0)

10+ Year Member



Is the bot really expecting to see an index.html file, or isn't it the job of the server to decide which file it is going to serve up? As far as I know, it is just like a request for the root page. It would be like [mysite.com...] except that it would go further, to

[mysite.com...]

I don't believe it's the bot that looks for the index.html, but the server that decides what it is going to present when that "directory" is chosen.
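
Put as a sketch: the client asks for the "directory" URL and the server picks the file. The Python below is only an illustration of that lookup, with a made-up list of index names and a made-up set of files. Real servers take the list from their own configuration.

```python
def resolve_directory_request(path, existing_files,
                              index_names=("index.html", "index.shtml")):
    """Return the file a server would serve for path, or None if there is none.

    A path that does not end in "/" is treated as a plain file request.
    For a directory-style path, the first index name that exists wins.
    """
    if not path.endswith("/"):
        return path
    for name in index_names:
        candidate = path + name
        if candidate in existing_files:
            return candidate
    return None


if __name__ == "__main__":
    files = {"/index.html", "/catalog/index.shtml"}  # hypothetical site
    print(resolve_directory_request("/", files))
    print(resolve_directory_request("/catalog/", files))
```

The requesting side never needs to know which index file was chosen. It only sees the response for the URL it asked for.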

andreasfriedrich

4:43 am on Jan 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's the server indeed.

jatar_k

4:55 am on Jan 14, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Sorry, that is what I meant. I should have been clearer.

jaeden

3:19 pm on Jan 14, 2003 (gmt 0)

10+ Year Member



So does Google index and present in the SERPs those types of URLs (directory structure) just as well as it would links to an .html file? I'm assuming it would handle them more easily than the standard cgi-bin address with a ? and a few parameters.