Forum Moderators: phranque

what to do with *.htm and *.shtm

Do I have to add the new URLs?

Danielo22

8:37 am on Mar 4, 2003 (gmt 0)

10+ Year Member



Hi,

I am from Germany and I hope you understand my question.

I've got a site whose pages all end in *.htm, and they are listed well in Google :)

Now I have to change the endings from *.htm to *.shtm.

If I open a *.htm page in the browser, the *.shtm page is opened automatically (but the *.htm files aren't on the server anymore).

Question: Do I have to add all the pages to Google again with the *.shtm endings, or will Google find them by itself?

Thanks a lot for your help.

hakre

8:45 am on Mar 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hi Danielo22, welcome to webmasterworld [webmasterworld.com].

to pick the answer out of your question: google won't find the .shtm pages by itself. :(

depending on your webserver (for example apache with .htaccess files and mod_rewrite allowed by your hosting company), you don't have to upload the .htm files again. you can create a simple, so-called rewrite rule to handle this easily and in a professional manner.

check out your host's features ;) setting up such a .htaccess file later will go fast.
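
For illustration, a rewrite rule of the kind described above might look roughly like the sketch below. It assumes Apache with mod_rewrite enabled, .htaccess overrides allowed by the host, and a .htaccess file in the site's document root; none of that is confirmed for the original poster's server.

```apache
# Sketch only: assumes mod_rewrite is available and this file sits in the
# document root. Some hosts also require "Options +FollowSymLinks" here.
RewriteEngine On

# Send any request for something.htm to something.shtm with a permanent
# (301) redirect. The pattern needs a literal ".htm" at the end, so URLs
# that already end in ".shtm" are not matched and no redirect loop occurs.
RewriteRule ^(.+)\.htm$ /$1.shtm [R=301,L]
```

With a rule like this in place, old links and search results pointing at the .htm addresses keep working without the old files being on the server.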

aus_dave

8:50 am on Mar 4, 2003 (gmt 0)

10+ Year Member



Hi Danielo22, welcome to WebmasterWorld ;).

If I understand you correctly you renamed files to .shtm (or .shtml?), maybe to use server side includes?

As a shortcut you can add this to your .htaccess file instead:


AddType text/html .html
AddHandler server-parsed .html

AddType text/html .htm
AddHandler server-parsed .htm

This will force all .htm and .html files to be parsed for includes. I use this as I don't like renaming extensions all the time.

Unless you use a site map or redirect pages, Google probably won't find the 'new' pages if there aren't any links to them.

Danielo22

8:57 am on Mar 4, 2003 (gmt 0)

10+ Year Member



Hi,

Thanks for answering! That .htaccess idea is really wonderful :)

Okay, I was too fast, and now I can rename everything again, this time from shtm back to good old htm ... ;)

Thanks a lot and have a nice day.

EBear

3:55 pm on Mar 4, 2003 (gmt 0)

10+ Year Member



If you are using a combination of SSI pages and non-SSI pages, then there may be a small problem with the AddHandler approach above. This causes ALL .htm and .html files to be parsed for SSI, even when it's not necessary, adding a slight delay to the delivery of the document.

Another approach, which avoids this, is to use the following code in your .htaccess:

Options +Includes
XBitHack on

This tells the server to parse any file that has its execute bit set, irrespective of the file extension (XBitHack only applies to files served as text/html). Now you need to make any file with an SSI directive in it executable. To do this through FTP use the CHMOD command to set permissions to 755. For example, in WS_FTP select the file, right click on it, choose chmod (UNIX) and set RWE R-E R-E.

I think you may need to do this for the folders as well. Someone else may confirm this.

EBear

3:56 pm on Mar 4, 2003 (gmt 0)

10+ Year Member



And could I suggest that a mod move this thread to where it will be more easily found in future?

jamie

4:31 pm on Mar 4, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi

i'd be careful about using ssi on .htm pages with addhandler

addhandler on .htm prevents the pages from being cached, which, depending on the weight of the html code, can result in sizeable delays in loading. of course this only matters when the page has not been modified since the visitor's last visit.

xbithack needs to be "full" in order to allow the correct determination of Last Modified Date by searchbots and browsers.

i posted this recently in webserver technologies:
[webmasterworld.com...]

i have since left all SSI out of my pages - and am a very happy man for it.

jcistheman

7:12 pm on Mar 4, 2003 (gmt 0)

10+ Year Member



I've got a similar problem,

I had .htm pages, changed them to .shtml pages for quicker updates, and then I noticed no one could get to my .htm pages from Google because I had to take them off the server. I had my hosting company do mod rewrites to redirect the .htm links to my .shtml pages and now all my .shtml pages have a gray bar on Google...

I want to get rid of the rewrite asap because I'm afraid this redirecting is going to mess up my backlinks...

Do I just have to tough it out for a month with people clicking on links that send them to pages not found or is there something I can do?

Thanks for any help!

aus_dave

1:15 pm on Mar 5, 2003 (gmt 0)

10+ Year Member



Jamie, I had not heard of this caching problem before and am now slightly confused. Just when I thought I had SSI under control! ;)

When you say the 'AddHandler etc.' prevents pages being cached, do you mean by a local browser or a search engine spider? I would have thought each individual element (images mostly) in the include file would be cached locally.

When I view my sites with includes there seems to be some kind of caching going on as they download quicker than when I do a Ctrl+Refresh. But then I'm on ADSL so I don't really know what is happening for modem users!

Will read some of the sources you mentioned and try to get up to speed with this.

ciml

1:30 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld, Danielo22 and jcistheman.

It would have been better to keep the URIs as Danielo's going to do, but now that you have put the content at different URLs the redirects make perfect sense. Depending on the type of redirect, you may lose the PageRank benefit of links to the old addresses, but switching them back to 404s won't help anyway.
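
To make the "type of redirect" point concrete, here is a hedged sketch using mod_alias instead of mod_rewrite. It assumes mod_alias is enabled and uses www.example.com as a placeholder hostname; the real setup done by the hosting company is not shown anywhere in this thread.

```apache
# Sketch only: www.example.com is a placeholder, not the poster's site.
# "permanent" makes Apache answer with a 301. Leaving the status out gives
# the default 302 (temporary) redirect.
RedirectMatch permanent ^/(.+)\.htm$ http://www.example.com/$1.shtml
```

A 301 tells clients and spiders that the page has moved for good, which is the intent when the old .htm files are gone.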

aus_dave, it's probably just that your ISP and browser caches don't use the HTTP headers in the way that you expect. The caching problem is just for proxies and user agents. The headers won't affect the Google cache, but may affect how deeply Googlebot spiders you. See Are you using If Modified Since? [webmasterworld.com]

jamie

1:56 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



aus_dave - that post which ciml recommends and the links going from it were my saviour too. also take a look at web-caching.com - you can test your .htm parsed pages for last modified information.

in my case:

i used addhandler .htm - which prevented a cache (or googlebot) from correctly determining the last modified date for all .htm pages in my site. meaning every time someone viewed a page for the second time it had to be downloaded again (i only speak of the html code here - the graphics should be cacheable anyway).

now that i have taken off ssi my pages are being correctly cached (test using web-caching.com) and my site has become much much faster - there are many pages which i don't change in months, and these now display in an instant.

you could always look at the xbithack method to work around this, although it does mean chmodding every page you create - which on a large site might get tedious.

best thing is to test your pages now and see if they return a last-modified date.

chiyo

2:25 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, this is what Brett's "Server Headers" said about one of our sites,

HTTP/1.1 200 OK
Date: Wed, 05 Mar 2003 14:21:40 GMT
Server: Apache/1.3.27
X-Powered-By: PHP/4.2.2
Connection: close
Content-Type: text/html

Is this good or bad?

Very bad, I guess.. we have to do something about the If-Modified-Since thingo in .htaccess or something.

jamie

3:14 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi chiyo,

the way i understand it is: as long as a 'last modified date' is showing there is no need for .htaccess modifications.

where caching really comes into its own though is by specifying expiry dates for objects on your server. this is done in the .htaccess

e.g. we use a graphical header bar which will not change this year - so i'd specify that this graphic will not expire within the next 12 months. whereas the index page is updated every day at 6 am, so i'd put that it expires every day at 6.02 for example. this is giving robots and caches / browsers and proxies really detailed information about how to deal with your site.

that is real fine tuning though - i think a last modified date is enough for most purposes.

i am sure a real guru can add to this.
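
As a rough illustration of the expiry idea above, the sketch below uses mod_expires. It assumes the module is compiled in and that the host allows these directives in .htaccess; the MIME types and time spans are examples, not the poster's actual settings. mod_expires only supports times relative to the access or modification time, so "every day at 6.02" can only be approximated.

```apache
# Sketch only: assumes mod_expires is available on the server.
<IfModule mod_expires.c>
ExpiresActive On

# Graphics that rarely change: cacheable for a year after each access.
ExpiresByType image/gif "access plus 1 year"

# HTML that is regenerated daily: expires one day after the file was last
# modified, which approximates "expires shortly after the next update".
ExpiresByType text/html "modification plus 1 day"
</IfModule>
```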

jimbeetle

3:31 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As always, when I'm even just thinking of how to do something I find an already active thread on WW.

Bottom line, what's the consensus? Is it:

AddType text/html .html
AddHandler server-parsed .html

AddType text/html .htm
AddHandler server-parsed .htm

With the XBitHack:

Options +Includes
XBitHack [on] or [full]

Is there a specific order for this in .htaccess?

Thanks,

Jim

jamie

7:21 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi jim,

best of both worlds allowing caching and SSI is:

add

XBitHack Full

to your .htaccess file

and chmod 744 any files which have SSI includes in them (can be both .htm and .html)

p.s. if you have XBitHack, you don't need the AddHandler in your .htaccess. it's either / or.

(at least that worked for me when testing ;-)

jimbeetle

10:56 pm on Mar 5, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks Jamie. I'm going to give it a shot on a site I'm working on over the weekend.

aus_dave

12:45 am on Mar 6, 2003 (gmt 0)

10+ Year Member



Thanks from me also Jamie. I gave the xbithack method a go on a development site and it works ok. Now I need to bite the bullet and do it on some existing sites. Now I need to validate for HTML, CSS and cacheability - phew! ;)

As mentioned in the other thread the permissions do seem to stick so it's not as bad as I thought it would be.

I'm not entirely sure it's all working correctly though as I get the green light from the validators when the permissions are 644 (but includes don't work) and when set to 744 the includes work but the checkers (including the header checker here) don't validate.

Further testing required I think!

[edit] setting permissions to 754 allows the last modified header to be sent but is this a security risk? :( [/edit]
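
The behaviour observed above matches how mod_include documents XBitHack: with "full", the user-execute bit switches SSI parsing on, and the group-execute bit decides whether a Last-Modified header is sent. A short annotated sketch, assuming SSI is permitted for the directory:

```apache
# Sketch only: assumes Includes may be enabled from .htaccess on this host.
Options +Includes
XBitHack full

# chmod 644 (rw-r--r--): no execute bits set
#   -> served as a plain file, includes are not processed
# chmod 744 (rwxr--r--): user-execute set, group-execute clear
#   -> parsed for SSI, but no Last-Modified header is sent
# chmod 754 (rwxr-xr--): user-execute and group-execute both set
#   -> parsed for SSI, Last-Modified is set to the file's own modification time
```

Note that the Last-Modified value comes from the page file itself, so a change to an included file alone does not update it.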

chiyo

3:27 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Does any of this info re XBitHack apply to php includes as well as SSI? We use php and it looks like all our php pages are not "Keep-Alive" but "Closed" and there is no cache info, making me think they need to be read every time.

jamie

9:02 am on Mar 6, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi aus_dave. looks like 754 is right for 'Full' - i wasn't sure about that. nice one.

i asked back a while ago in this forum if there were any security issues with chmodding everything executable, but no one replied. hmmm i'm not sure - this is fairly new to me too.

i did find this though :-)

744 (or 754)

The file will be in fact executable. If you accidentally run it from the command line, with all the < and > in it, you can possibly trash your site and spend the rest of the day bringing up furballs.

p.s. would be a nice topic for a tutorial like the css one and mod_rewrite one. could cover dynamic pages too?