Forum Moderators: coopster

parsing html as php

confirmation of method

         

Patrick Taylor

6:35 am on Apr 12, 2004 (gmt 0)

As previously advised in this forum, I'm using this in my .htaccess file:

AddType application/x-httpd-php .php .htm .html

I only have my index page with an .htm file suffix - all the other pages have a .php file suffix. Is this the best way to have the index page parsed as php? I believe there's an issue regarding server resources in doing it this way.

And is there any possibility of this doing any harm to the index page being crawled by search engine spiders? I'm only asking this to be on the safe side.

jamie

11:03 am on Apr 12, 2004 (gmt 0)

patrick,

a search engine spider knows nothing about what happens server-side; all it sees is the html that is generated.

re. resources:
it could mean a slight extra processing hit, because instead of serving plain vanilla html files, the server first runs every .htm file through the php parser. but i think you'd have to have a *very busy* site to notice any difference. we do it on a site with 100,000 views per day and don't notice any performance hit.
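
if the extra parsing is a concern, one possible way to limit it to the index page alone is sketched below. it assumes a mod_php style setup where the application/x-httpd-php type from the AddType line above is what triggers parsing, and a host that allows FileInfo overrides in .htaccess; neither is confirmed in this thread.

```apache
# .htaccess sketch: parse only index.htm as php and leave every other
# .htm/.html file to be served as static html
<Files "index.htm">
    ForceType application/x-httpd-php
</Files>
```

with something like this in place, the .htm and .html entries could presumably be dropped from the AddType line.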

however, you might like to run your .htm files through the server header checker [searchengineworld.com] because once they are processed as php files, unless you specifically send last-modified / expires headers, they will be non-cacheable. see this post about php and cacheability [webmasterworld.com]

hope that helps

Patrick Taylor

1:28 pm on Apr 12, 2004 (gmt 0)

jamie,

Thanks for the reply, which is partly reassuring. So I've got my .htaccess file as it should be, and I presume this won't delay the loading of the page.

I'm interested in what you said about caching. I presume you're talking about the browser cache and not, for example, Google's cache, but this is an area I've obviously neglected. My php pages are cached by Google, and, as I would expect, the cached copy is not usually the current page. So is what you're saying that my php pages will never be cached by the user's browser, and that therefore each time they visit, the page will be downloaded from the server?

The http header for a typical php page is:

HTTP/1.1 200 OK
Keep-Alive: timeout=5, max=150
Connection: Keep-Alive

What does this mean? My .htm suffix pages (on another site) show the actual date the page was last modified.

timster

2:07 pm on Apr 12, 2004 (gmt 0)

FWIW, there's another approach you might want to consider. You can set up the server to treat index.php as the index page of the directory (or directories). That way your server won't look for PHP in all your .html pages.
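
A minimal sketch of that setup, assuming the host allows the DirectoryIndex directive in .htaccess (this depends on the server's AllowOverride settings, which the thread does not mention):

```apache
# .htaccess sketch: when a directory is requested, serve index.php first
# and fall back to index.htm if no index.php exists
DirectoryIndex index.php index.htm
```

The index page would then be renamed to index.php, and requests for the bare domain would keep working without any change to incoming links.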

You might want to run your question by the Apache board.

jamie

2:37 pm on Apr 12, 2004 (gmt 0)

>> So is what you're saying that my php pages will never be cached by the user's browser, and that therefore each time they visit, the page will be downloaded from the server?

exactly.

normal htm pages on your other sites are sent with a last-modified header by apache itself. with php files (or htm parsed as php), you have to send the headers manually.
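
a rough sketch of what sending them manually could look like. the helper name, the one hour lifetime and the use of the script's own modification time are all assumptions for illustration, not something from this thread:

```php
<?php
// Hypothetical helper (not from the thread): builds the kind of header
// lines Apache adds for a static file, so they can be sent from PHP.
function static_style_headers(int $lastModified, int $now, int $lifetime): array
{
    $format = 'D, d M Y H:i:s'; // HTTP dates are always expressed in GMT
    return [
        'Last-Modified: ' . gmdate($format, $lastModified) . ' GMT',
        'Expires: ' . gmdate($format, $now + $lifetime) . ' GMT',
    ];
}

// Usage: send the headers before any output. Skipped on the command line,
// where there is no HTTP response to attach them to.
if (PHP_SAPI !== 'cli') {
    // Assumption: the script file's own mtime stands in for "last modified".
    foreach (static_style_headers(filemtime(__FILE__), time(), 3600) as $line) {
        header($line);
    }
}
```

if the page content comes from a database or from included files, the script's own modification time is probably the wrong timestamp to report, and the newest timestamp of the underlying data would be a better fit.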

ergophobe

3:11 pm on Apr 12, 2004 (gmt 0)

I was also wondering, like Timster, why you don't just make index.php your default index page (or default.php or whatever). Unless you already have a lot of incoming links to your index page, it seems like the best solution, and I doubt you have a lot of links to "index.html" since typically the "index.html" part of the URI would be omitted from links anyway. Google certainly doesn't care. Try these two searches:

link:www.webmasterworld.com
link:www.webmasterworld.com/index.html

Both return exactly 8610 hits.

Jamie,

I have to beg to differ slightly, and I think the point is worth noting. I don't believe you can say that his PHP pages will not be cached, only that they may not be cached.

From bitter experience, I can say that if you do not send any headers to tell the browser whether or not to cache, the result is unreliable. I have had to add headers

header("Cache-Control: no-store, no-cache, must-revalidate");
header("Pragma: no-cache");

to prevent constantly changing dynamic pages from getting cached. I believe that some browsers cache more aggressively than others. So if you want to control caching, one way or the other, you need to send headers. But you can't say that by not sending headers a page does or does not get cached. You can only say that the result may not be as expected.
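
As a sketch of being explicit in both directions, something like the following could work. The function name and the max-age value are my own assumptions for illustration; the no-cache lines are the same two headers shown above.

```php
<?php
// Hypothetical helper (not from the thread): returns explicit caching
// headers for either policy, so nothing is left to browser defaults.
function cache_policy_headers(bool $allowCaching, int $maxAge = 3600): array
{
    if ($allowCaching) {
        // Let browsers and proxies reuse the page for $maxAge seconds.
        return ['Cache-Control: public, max-age=' . $maxAge];
    }
    // Same headers as above: tell browsers not to store or reuse the page.
    return [
        'Cache-Control: no-store, no-cache, must-revalidate',
        'Pragma: no-cache',
    ];
}

// Usage: a constantly changing page asks not to be cached.
// Skipped on the command line, where there is no HTTP response.
if (PHP_SAPI !== 'cli') {
    foreach (cache_policy_headers(false) as $line) {
        header($line);
    }
}
```

Either way the choice is stated in the response instead of being left to each browser's guesswork.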

Patrick Taylor

4:14 pm on Apr 12, 2004 (gmt 0)

Thanks for the replies. I don't have any links to index.htm - they all go to the domain, so I might consider switching to index.php (though as it happens, I always thought that Google would treat the domain and the index page as 2 different URLs). However, I do have other sites where all the pages are .htm suffixed, so the feedback has been doubly helpful.

jamie

8:04 pm on Apr 12, 2004 (gmt 0)

hi tom,

i've never noticed a difference myself between sending no headers and sending explicit no-cache headers, but now you mention it, i remember several posts in this very forum asking how to explicitly stop caching... so obviously what you say is true :-)

one lives and learns :-)