
Should I parse or mod_rewrite?

         

landguy

12:50 pm on Sep 24, 2009 (gmt 0)

10+ Year Member



I am very new to this, and any advice would be appreciated.

I have a website that currently ranks well with Google.
I am changing content providers, and all of the current pages have a .htm extension.

I have written new code with the same page names, except the new pages have a .php file extension. I did this to use includes, before I realized that you can parse .htm files as PHP:

AddType application/x-httpd-php .htm

Since I am not using a database and am using PHP only for includes, and since from some things I have read Google likes .htm over .php, my questions are:

1. Should I mod_rewrite or parse?

2. If I do parse, can I just change the file extensions of the current .php pages to .htm?

3. The old content provider/CMS had every page off of the main directory (www.site.com/page.htm). I would like to have subdirectories for pages; do I have to have a laundry list in the .htaccess for every page?

4. Would those redirects only be needed until the search engines re-index?

I hope these are not stupid questions.

Thanks

jdMorgan

1:25 pm on Sep 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1. I'd say parse .htm files for PHP code, because you should avoid changing the URLs if you wish to keep your search rankings intact. Although the search engines will follow your redirects and 'figure it out' after a while, there is a fair chance that your rankings will suffer --possibly badly-- while this back-end process takes place.

2. Yes, that's the point of parsing .htm files for PHP includes.

3. This implies that you may be changing your URL-structure. If possible, try to implement this as an internal rewrite, so that although the file structure changes, the URLs remain as they were (see the sketch below this list). This can be done, but to use your term, it may require an extensive 'laundry list' of URL rewrites. If that list gets out of hand, you may prefer to simply 'bite the bullet,' take the temporary ranking hit, and get it over with.

4. No, the redirects (if any are actually needed after considering the above) must stay in place forever, or until you determine that the ranking factors passed by old URLs are no longer important to the ranking of the new URLs. (Note that it is URLs that are ranked -- not files, not pages, and not sites, just URLs).
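To illustrate points 3 and 4, here is a minimal .htaccess sketch (using your www.site.com placeholder; the 'newdir' name is hypothetical). An internal rewrite keeps the old URL while serving the file from its new location:

RewriteEngine On
# Old URL stays in the address bar; the file is served from the subdirectory
RewriteRule ^page\.htm$ /newdir/page.htm [L]

If you change the URLs instead, each old URL needs a permanent (301) redirect to its new one -- and per point 4, these stay in place indefinitely:

# Externally redirect the old URL to its new home
RewriteRule ^page\.htm$ http://www.site.com/newdir/page.htm [R=301,L]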

Google doesn't have any preference for any particular 'filetype' or 'file extension' in the URL; the URL is 'just text' to them, and you could just as well name your pages '/page.goog' as anything else. The only time the 'file extension' on a URL would make a difference is when the searcher includes that filetype as a search term. That is, it might matter somewhat on a technical site targeted at PHP or Perl programmers or at HTML authors, and it might matter if you're selling images for use on other Web sites (where the search terms would likely include the desired image filetype), but not on a general-topic site.

In general, it is best to *never* change any URL. Search engines prefer to see the Web as a library, rather than a street-corner newspaper stand. They don't want to waste their time on URLs that are here today and gone tomorrow, and they don't want to send their users to pages that no longer exist or that redirect to some other URL instead of resolving directly. So to the extent possible you'll want to keep your URLs forever [w3.org].

However, if your URL-architecture is a big mess and is failing to meet your needs, AND if your site is young and you believe that the best part of its growth is in the near future, then you may opt to re-arrange the URL architecture now to get it over with. If so, then work hard to come up with an architecture that meets your needs for administration, for easy robots.txt construction, for cache-control, and for encoding and language controls.

That is, you may wish to place resources into different URL-groups based on any or all of these attributes. Of these, the only attribute that is actually based on the URL is robots.txt, which specifies access policies based on URLs; all of the others are based on filepaths. Since mod_rewrite can be used to change the URL-to-filepath mapping for the others, it's not quite so critical to get them perfectly right the first time, except as it may save you from having to write and use numerous mod_rewrite rules in the future to address any shortcomings.
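For example, if each URL-group lives under its own path prefix, the robots.txt entries stay short (a sketch; the group names are hypothetical):

User-agent: *
Disallow: /drafts/
Disallow: /printable/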

Also, for future-proofing, you may wish to opt for extensionless URLs, so that any future changes in the underlying technology of your site need not affect the URLs at all. As stated above, "/page.goog" is just as good as "/page.htm", but just "/page" is even better.
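A sketch of how extensionless URLs might be served, assuming the .htm files stay where they are and only the URLs drop the extension:

# Internally map extensionless URLs onto the matching .htm files
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}.htm -f
RewriteRule ^(.+)$ $1.htm [L]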

Jim

landguy

1:39 pm on Sep 24, 2009 (gmt 0)

10+ Year Member



Thank you for your quick reply.
I will take your advice and parse, but what about the HTTP header?
Should it be

Content-Type: text/html; charset=iso-8859-1
or
Content-Type" content="text/html; charset=utf-8"

It is currently charset=iso-8859-1.

Or does it not matter?

jdMorgan

1:51 pm on Sep 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The first is an HTTP server response header, while the second appears to be a line from the <head> section of an HTML page. So, I may not understand the question, really.

But generally, yes, it's important that the Content-Type and character encoding correctly match the actual page content and the character set used to represent it, and that the server header and the HTML page's <head> tags agree as to what content-type and charset constitute the page.
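For example, in a parsed page you can set the server header yourself and keep the <head> tag in agreement (a minimal sketch; this must run before any output, and the charset shown assumes you keep iso-8859-1):

<?php
// Send the HTTP response header; must come before any page output.
// The charset must match the page's actual encoding.
header('Content-Type: text/html; charset=iso-8859-1');
?>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">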

Jim

landguy

5:16 pm on Sep 24, 2009 (gmt 0)

10+ Year Member



Thanks for your help. Everything works fine except that I have a PHP include that reads a .txt file 'database' for meta tags, and it doesn't seem to work anymore. get_meta.php has the script in it that looks for meta_tags.txt. Can I exclude that file in my .htaccess from getting parsed, or what can I do?

This is in my page.htm:

<?php include('get_meta.php'); ?>
<title><?php print $title; ?></title>
<meta name="description" content="<?php print $meta_description; ?>">
<meta name="keywords" content="<?php print $meta_keywords; ?>">

This is get_meta.php:

<?php
// Flat-file 'database' holding one '*'-delimited record per page:
// page name, title, description, keywords
$database = $_SERVER['DOCUMENT_ROOT'].'/meta_tags.txt';
$meta_db = fopen($database, 'r');

// Current page, e.g. 'page.htm' (strip the leading slash)
$page = substr($_SERVER['SCRIPT_NAME'], 1);

while ($data = fgetcsv($meta_db, 9000, '*')) {
    if ($data[0] == $page) {
        $title = $data[1];
        $meta_description = $data[2];
        $meta_keywords = $data[3];
        break; // found this page's record; stop scanning
    }
}
fclose($meta_db);
?>
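Given the '*' delimiter, each line of meta_tags.txt would look something like this (made-up values for illustration):

page.htm*My Page Title*A short description of this page*keyword1, keyword2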

Thanks Again

jdMorgan

10:58 pm on Sep 24, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're going to need to test and find out why this no longer works. If all you're doing is parsing .htm files for PHP includes, then that does not seem to have anything to do with the problem, since this is a PHP file.

Also, saying "it doesn't work" doesn't tell us anything useful -- another reason to dig into this and find out what specifically is failing. You're also likely to get faster/better help with such a problem in our PHP forum, which generally has more traffic than this forum and is full of PHP users.
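One quick way to see what's actually failing (a sketch; for temporary debugging only, not for a live page) is to turn on error display at the top of get_meta.php:

<?php
// Temporarily surface all PHP errors and warnings while debugging;
// remove these two lines once the problem is found.
error_reporting(E_ALL);
ini_set('display_errors', '1');
?>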

Jim

jdMorgan

12:31 pm on Sep 25, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As an afterthought, it might be as simple as changing your AddType to read:

AddType application/x-httpd-php .htm .php

On some hosts, the AddType line in your .htaccess can take precedence over the server's own PHP mapping, so a line that lists only .htm may leave the .php files unparsed; listing both extensions covers them both.

Jim