Forum Moderators: phranque
I am from Germany, and I hope you understand my question.
I've got a site whose pages all have the ending *.htm, and they are listed well in Google. :)
Now I have to change the endings from *.htm to *.shtm.
If I open the site in the browser with *.htm, the page automatically opens as *.shtm (but the *.htm files aren't on the server anymore).
Question: do I have to submit all the pages to Google again with the *.shtm endings, or will Google find them by itself?
Thanks a lot for any help.
To pick the answer out of your question: Google won't find the .shtm pages by itself. :(
Depending on your web server (for example, Apache with .htaccess files and mod_rewrite allowed by your hosting company), you don't have to re-upload the .htm files. You can create a simple so-called rewrite rule to handle this easily and in a professional manner.
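A minimal sketch of such a rewrite rule, assuming Apache with mod_rewrite enabled and .htaccess overrides allowed by your host:

```apache
# .htaccess sketch: permanently redirect every old .htm URL to its .shtm twin
RewriteEngine On
RewriteRule ^(.*)\.htm$ /$1.shtm [R=301,L]
```

A 301 (permanent) redirect also tells search engines that the pages have moved, so the existing Google listings should carry over to the new URLs over time.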
Check out your host's features ;) setting up such a .htaccess file afterwards will be quick.
If I understand you correctly, you renamed the files to .shtm (or .shtml?), maybe to use server-side includes?
As a shortcut you can add this to your .htaccess file instead:
AddType text/html .html
AddHandler server-parsed .html
AddType text/html .htm
AddHandler server-parsed .htm
Apart from a site map or redirect pages, Google probably won't find the 'new' pages if there aren't any links pointing to them.
Another approach, which avoids this, is to use the following code in your .htaccess:
Options +Includes
XBitHack on
This tells the server to parse all executable files, irrespective of the file extension. Now you need to make any file with an SSI directive in it executable. To do this through FTP, use the CHMOD command to set the permissions to 755. For example, in WS_FTP select the file, right-click on it, choose chmod (UNIX), and set RWE R-E R-E.
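If you have shell access, the same permission change is a single command; a quick sketch (the file name is hypothetical):

```shell
# create a test page and make it executable: 755 = rwxr-xr-x (RWE R-E R-E in WS_FTP terms)
touch page.shtml
chmod 755 page.shtml
ls -l page.shtml
```

ls then shows the permission string -rwxr-xr-x for the file.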
I think you may need to do this for the folders as well. Someone else may confirm this.
I'd be careful about using SSI on .htm pages with AddHandler.
AddHandler on .htm prevents the pages from being cached, which, depending on the weight of the HTML code, can result in sizeable loading delays. That hurts, of course, only when the page has not been modified since the last visit.
XBitHack needs to be set to "full" in order to allow search bots and browsers to correctly determine the Last-Modified date.
I posted this recently in Webserver Technologies:
[webmasterworld.com...]
I have since left all SSI out of my pages, and I am a very happy man for it.
I had .htm pages, changed them to .shtml pages for quicker updates, and then noticed that no one could reach my .htm pages from Google because I had taken them off the server. I had my hosting company set up mod_rewrite redirects from the .htm links to my .shtml pages, and now all my .shtml pages show a gray bar on Google...
I want to get rid of the rewrite ASAP because I'm afraid this redirecting is going to mess up my backlinks...
Do I just have to tough it out for a month with people clicking on links that send them to 'page not found' errors, or is there something I can do?
Thanks for any help!
When you say the 'AddHandler etc.' prevents pages from being cached, do you mean by a local browser or by a search engine spider? I would have thought each individual element (mostly images) in the included file would be cached locally.
When I view my sites with includes, there seems to be some kind of caching going on, as they download quicker than when I do a Ctrl+Refresh. But then I'm on ADSL, so I don't really know what is happening for modem users!
Will read some of the sources you mentioned and try to get up to speed with this.
It would have been better to keep the URIs, as Danielo is going to do, but now that you have put the content at different URLs, the redirects make perfect sense. Depending on the type of redirect, you may lose the PageRank benefit of links to the old addresses, but switching them back to 404s won't help anyway.
aus_dave, it's probably just that your ISP and browser caches don't use the HTTP headers in the way you expect. The caching problem only concerns proxies and user agents. The headers won't affect the Google cache, but they may affect how deeply Googlebot spiders you. See Are You Using If-Modified-Since? [webmasterworld.com]
In my case:
I used AddHandler on .htm, which prevented a cache (or Googlebot) from correctly determining the last-modified date for any .htm page on my site. That meant every time someone viewed a page for the second time, it had to be downloaded again (I'm only speaking of the HTML code here; the graphics should be cacheable anyway).
Now that I have taken SSI off, my pages are being cached correctly (tested using web-caching.com) and my site has become much, much faster. There are many pages I don't change for months, and these now display in an instant.
You could always look at the XBitHack method to work around this, although it does mean chmodding every page you create, which on a large site might get tedious.
The best thing is to test your pages now and see if they return a Last-Modified date.
HTTP/1.1 200 OK
Date: Wed, 05 Mar 2003 14:21:40 GMT
Server: Apache/1.3.27
X-Powered-By: PHP/4.2.2
Connection: close
Content-Type: text/html
Is this good or bad?
Very bad, I guess... we have to do something about the If-Modified-Since thing; maybe something in .htaccess.
The way I understand it: as long as a Last-Modified date is showing, there is no need for .htaccess modifications.
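As a quick sanity check, you can grep a header dump for Last-Modified. A shell sketch using the headers posted above:

```shell
# the response headers posted earlier in the thread
headers='HTTP/1.1 200 OK
Date: Wed, 05 Mar 2003 14:21:40 GMT
Server: Apache/1.3.27
X-Powered-By: PHP/4.2.2
Connection: close
Content-Type: text/html'

# report whether a Last-Modified header is present
if printf '%s\n' "$headers" | grep -qi '^Last-Modified:'; then
    echo 'Last-Modified present: caches can revalidate the page'
else
    echo 'no Last-Modified header: caches must re-download the page'
fi
```

With the headers above it prints the 'no Last-Modified header' message, which matches the 'very bad' verdict.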
Where caching really comes into its own, though, is in specifying expiry dates for objects on your server. This is also done in the .htaccess file.
E.g. we use a graphical header bar that will not change this year, so I'd specify that this graphic will not expire within the next 12 months. The index page, on the other hand, is updated every day at 6 a.m., so I'd set it to expire every day at 6:02, for example. This gives robots, caches, browsers, and proxies really detailed information about how to deal with your site.
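With Apache's mod_expires (if your host has it enabled), that kind of rule only takes a few lines in .htaccess. A sketch, with the times as examples only:

```apache
ExpiresActive On
# header graphic changes rarely: cache it for a year
ExpiresByType image/gif "access plus 12 months"
# HTML pages are updated daily: cache them for a day
ExpiresByType text/html "access plus 1 day"
```

Note that mod_expires counts from the access (or modification) time, so an exact "expires at 6:02 every day" isn't possible this way; "access plus 1 day" is the usual approximation.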
That is real fine-tuning, though; I think a Last-Modified date is enough for most purposes.
I'm sure a real guru can add to this.
Bottom line, what's the consensus? Is it:
AddType text/html .html
AddHandler server-parsed .html
AddType text/html .htm
AddHandler server-parsed .htm
With the XBitHack:
Options +Includes
XBitHack on (or XBitHack full)
Is there a specific order for this in .htaccess?
Thanks,
Jim
The best of both worlds, allowing both caching and SSI, is:
add
XBitHack full
to your .htaccess file
and chmod 744 any file that has SSI directives in it (this works for both .htm and .html).
P.S. If you have XBitHack, you don't need the AddHandler in your .htaccess; it's either/or.
(At least, that worked for me when testing ;-)
As mentioned in the other thread, the permissions do seem to stick, so it's not as bad as I thought it would be.
I'm not entirely sure it's all working correctly, though: I get the green light from the validators when the permissions are 644 (but the includes don't work), and when they're set to 744 the includes work but the checkers (including the header checker here) don't validate.
Further testing required I think!
[edit] Setting the permissions to 754 allows the Last-Modified header to be sent, but is this a security risk? :( [/edit]
I asked in this forum a while back whether there were any security issues with chmodding everything executable, but no one replied. Hmmm, I'm not sure; this is fairly new to me too.
i did find this though :-)
744 (or 754)
The file will in fact be executable. If you accidentally run it from the command line, with all the < and > in it, you can possibly trash your site and spend the rest of the day bringing up furballs.
P.S. This would be a nice topic for a tutorial, like the CSS one and the mod_rewrite one. It could cover dynamic pages too.