Forum Moderators: Robert Charlton & goodroi

.PHP or .HTML? – That is the Question

PR1 ranks higher than PR3 – What to do?

         

spina45

1:09 pm on Nov 30, 2006 (gmt 0)

10+ Year Member



My site is 6 years old and the most well-known in its category. I have 1000s of incoming links. Until recently my SE rankings were excellent.

Background:
Four years ago my web developer installed a file into my OsCommerce shop that would convert product URLs from: product_info.php?products_id=nn to: product-nn.html and list them in a sitemap linked to my homepage. This was done to achieve better SE rankings. Within a month 100s of my individual product pages started to appear in the SEs and sales took off.

Over the years I’ve noticed that 90% of the URLs that appear in SEs were the .php version and NOT the .html. This seemed odd because the .php URLs produce a PR1 while the .html URLs produce a PR3. Weird?!

In June ’06 a well known company listed my website in a print publication and misprinted my domain name. I found the domain was available so I purchased it along with another that I thought was cool. I then pointed both new domain names to my existing website. But I did NOT do a 301 redirect because I didn’t know any better.

About 3 months later my site’s ranking plummeted in Google and I began to research why. I concluded (though I am not 100% sure) that Duplicate Content was the culprit.

I modified my .htaccess with a 301 for both domains, including www and non-www, and waited until the next noticeable “update” in Google. Bummer! No change in my rankings.

I’ve now been reexamining everything about my site to eliminate Duplicate Content.

So here’s my question: should I standardize on product_info.php?products_id=nn or product-nn.html?

Note: there are many incoming links from sites, blogs, articles, etc. that point to the .php URL version.

Any insight would be appreciated as to what is going on, how long I’ll have to wait and what to do to regain my once stellar rankings in Google.

spina45

8:45 pm on Nov 30, 2006 (gmt 0)

10+ Year Member



Note: for clarity I should mention that the .PHP and .HTML page versions mentioned in the above post contain identical content. Hence the desire to remove one to eliminate dupe content.

g1smd

11:43 pm on Nov 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Serve a meta robots noindex on one of them, so that it drops out of the index, whilst still allowing old inlinks and old bookmarks to function.

Additionally, look at setting up a 301 redirect from one version to the other. That might require a fair amount of tricky work to set up, though.
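
A minimal sketch of the meta robots noindex suggestion, assuming the .php form is the one that should drop out of the index and that `$_SERVER['REQUEST_URI']` still holds the URL the visitor originally asked for. The function name `robots_meta` is hypothetical and is not part of osCommerce.

```php
<?php
// Hypothetical helper: returns a robots meta tag for the URL form that should
// drop out of the index, or an empty string for the form that should stay.
function robots_meta($requestUri)
{
    // Assumption: the dynamic form is recognisable by the script name.
    if (strpos($requestUri, 'product_info.php') !== false) {
        return '<meta name="robots" content="noindex">';
    }
    return '';
}

// Usage inside the page's <head> section:
// echo robots_meta($_SERVER['REQUEST_URI']);
```

Both URL forms keep working for visitors and old bookmarks; only the tagged form is asked to leave the index.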

spina45

12:27 am on Dec 1, 2006 (gmt 0)

10+ Year Member



g1smd,

I was hoping you'd respond. Thanks!

> Serve a meta robots noindex on one of them

1) Which one? The higher PR .html that doesn't rank as well, or the lower PR .php that ranks higher? That's the heart of my problem -- I don’t know which one to go with.

2) Could I also achieve this via robots.txt, using Disallow: /*php?products_id= ?

> Additionally, look at setting up a 301 redirect

Will this send the PR over to the chosen version?

> That might require a fair amount of tricky work to set up though.

But is it doable?

Thank you for your help.

g1smd

12:45 am on Dec 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, the 301 redirect will send most of the PR over.

Yes, it is doable.

It needs very careful testing before you let it go live on the real site.

I helped someone do some of this work last year on a live osCommerce site and some basic coding errors we introduced made the problem worse for a short while. I am not a PHP guru, and neither was the site owner; and we both were aware of that.

jtara

1:25 am on Dec 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



An aside not directly related to your problem: like "www", ".html" is an anachronism, and adds nothing for the user. If you can, drop it.

Not so good: product_info.php?products_id=nn

Better: product-nn.html

Best: product-nn

Decius

2:20 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> Better: product-nn.html
>
> Best: product-nn

I do not agree with this. Although this may be permissible currently, I feel it is probable that Google will prioritize urls with page types over ones without. Similar to a lot of the other SEO filters that kick in, it is very probable that a url that contains keywords and no document type is a dynamically generated URL. That being said, Google would prefer to index sites that do not have dynamically generated urls, which is in tune with their prioritization of reliable long-term urls. A dynamic url is less likely to remain long term.

This may not be a large issue, but I believe it will contribute to the possibility of tripping a filter.

spina45: My suggestion is to do this - If they are identical documents (as in the same document) create a function that creates the proper URL format for a given product or category page (as those are the only two pages that I assume you want to focus on). Then you check the current URL being displayed. If it differs at all with what you have as the proper URL, do a 301 redirect.
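
A rough sketch of that check, assuming the `product-nn.html` pattern from the first post and a shop installed at the site root. The function names and the `www.example.com` host are placeholders, not osCommerce code.

```php
<?php
// Hypothetical helper: builds the one "proper" path for a product page,
// following the product-nn.html pattern described in the thread.
function canonical_product_path($productId)
{
    return '/product-' . (int) $productId . '.html';
}

// Returns the path to 301 to, or null when the request already uses the proper URL.
function redirect_target($requestUri, $productId)
{
    $canonical = canonical_product_path($productId);
    return ($requestUri === $canonical) ? null : $canonical;
}

// Usage at the top of the product page, before any output is sent:
// $target = redirect_target($_SERVER['REQUEST_URI'], $_GET['products_id']);
// if ($target !== null) {
//     header('Location: http://www.example.com' . $target, true, 301);
//     exit;
// }
```

Because the comparison is exact, any other working form of the URL (extra parameters, the raw .php address) gets redirected to the single proper one.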

I see no reason why you would ever want anyone to link to or see the .php? pages if you have already SEOd them to look static.

If you need help implementing this, sticky me. Although I loathe the oscommerce infrastructure... :-)

Decius

2:23 am on Dec 1, 2006 (gmt 0)

10+ Year Member



Another great benefit: if anyone ever links to your site using a misspelled or otherwise incorrect URL that still works, Google will find it, and without the redirect you may get hit for duplicate content. The redirect, in essence, funnels everything into one unique product page and does not permit anyone, including Google, to see it any other way. This will push all your PR and inbound links into that one page.

spina45

2:27 am on Dec 1, 2006 (gmt 0)

10+ Year Member



1) product_info.php?products_id=nnn has a PR1 and ranks well

2) product-nnn.html has a PR3 and does NOT rank well...

Which version should I standardize on, php or html?

Decius

2:30 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> 1) product_info.php?products_id=nnn has a PR1 and ranks well
> 2) product-nnn.html has a PR3 and does NOT rank well...

The HTML one not ranking well probably has nothing to do with the fact that it is HTML. I am assuming it is a newer url than the PHP?

This is sort of up to you. I prefer static pages (as they look cleaner and Google then does not know whether it is dynamic or not) so I would lean towards the HTML and hope that all the strength of the PHP pages is carried over when you do the 301 redirect.

On the other hand if you want to stick with the PHP because it is stronger, forward everything to that.

In my opinion, you should 301 to the HTML pages, not the PHP ones. This "may" cause a minor bump while the PHP docs are taken out of the listings but will in the long term support the HTML pages.

Decius

2:32 am on Dec 1, 2006 (gmt 0)

10+ Year Member



Additionally, the HTML ones might not get good rankings because of the very problem you are talking about - duplicate content. In fact that is likely the issue. That being said, I would almost completely recommend sticking to the HTML docs.

spina45

2:47 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> In my opinion, you should 301 to the HTML pages, not the PHP ones.
> This "may" cause a minor bump while the PHP docs are taken out of
> the listings but will in the long term support the HTML pages.

Yes, I see your point. Thank you for your input. I'm not thrilled about any bumps but at this point my Holiday sales are out the window so I may as well get it right for next year. -sigh-

So...choosing to go with HTML,

can I prevent Google and other SEs from seeing my OSC product and category PHP pages by adding the following to my robots.txt?

Disallow: /*php?products_id=
Disallow: /*?cPath=

Or do I have to use Meta NOINDEX? (or both?)

Jordo needs a drink

2:51 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> spina45: My suggestion is to do this - If they are identical documents (as in the same document) create a function that creates the proper URL format for a given product or category page (as those are the only two pages that I assume you want to focus on). Then you check the current URL being displayed. If it differs at all with what you have as the proper URL, do a 301 redirect.

I agree 100%! In fact, this needs to become a dynamic page developer's best practice considering how easy it is for the duplicate content issue to appear on dynamic pages.

[edited by: Jordo_needs_a_drink at 2:51 am (utc) on Dec. 1, 2006]

spina45

3:02 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> I agree 100%! In fact, this needs to become a dynamic page
> developer's best practice considering how easy it is for the
> duplicate content issue to appear on dynamic pages.

Oh yeah, good point. I've read that dynamic page variables can be out of sequence and still serve up the desired page, i.e. yet more Dupe Content.
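
To illustrate the out-of-sequence point: two query strings with the same parameters in a different order are different URLs to a spider, but they can be reduced to one form by sorting the parameters. A small sketch, with `normalized_query` as a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the query string with its parameters sorted
// by name, so "b=2&a=1" and "a=1&b=2" both come out as "a=1&b=2".
function normalized_query($query)
{
    parse_str($query, $params); // split the query string into name => value pairs
    ksort($params);             // fix the order by parameter name
    return http_build_query($params, '', '&');
}
```

If the requested query string differs from its normalized form, that is one more case where a 301 to the proper URL would apply.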

jtara

3:04 am on Dec 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's a myth that Google prefers static to dynamic content. Google actually goes out of their way to handle dynamic content.

Historically, Google has had two problems with URLs that use query strings:

1. In the past, they did not index pages that use an id= parameter. The singular case of "id=". Not, "product_id=", "framitz=", etc. Just "id=". This was disallowed because this particular parameter is often used for tracking users, etc. But Google now permits "id=" and is able to distinguish when it is used to identify content and when it is used to track users.

2. Obviously, there are issues because parameters can appear in any order, can be optional, etc. Google has largely ironed this out.

The issue has never been one of Google preferring static content over dynamic. The issue has been the difficulty of uniquely identifying dynamic content, due to the flexible nature of query strings and their use and misuse to identify things other than content.

It's naive to think that ".html" signals that content is static. If Google DID want to filter-out dynamic content, there are far better ways of doing so than by relying on URL parts.

".html" serves no purpose whatsoever. Historically, it served to let the SERVER know the content format. It has NEVER served to let the browser know the content format. The browser knows the content format from the MIME type sent in the Content-Type header.

".html" is just excess baggage that is of no use to anyone. If anything, I'd expect Google to reward webmasters for clarity, and give a slight edge to URLs that DON'T include ".html".

> I have a system that can replicate any url you have and incorporate in any new site structure, no need for 301's and this works with dynamic and static content.

And I have oceanfront lots in Arizona available for sale. Please don't contact me, though, unless you are prepared to spend at least $1,000,000. I am very busy hustl... selling these rare properties.

The 301's are absolutely necessary in order to let search engine spiders know that old URLs have been moved to new URLs. It might be acceptable for users to simply be served-up the new page without redirect. It's absolutely unacceptable for spiders.

tedster

3:25 am on Dec 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree with a lot of this, but with a couple of clarifications:

1. Google's former guidelines, as printed, said not to use "&id" -- with the ampersand in front. That kind of query string held more potential to be a session ID. Long before they dropped this "prohibition" they were already indexing query strings like this:
example.com/page.php?id=1443

2. Getting indexed and getting crawled frequently, effectively, and deeply are different issues. I still feel that complex dynamic urls require extra processing from Google to safeguard their crawlers from getting into loops. That extra processing has a price attached - it can be seen in more time to index, and less frequently updated pages in the cache. Or so it seems to me.

spina45

3:33 am on Dec 1, 2006 (gmt 0)

10+ Year Member



Thanks to all who have replied.

One thing that no one has commented on...

Why would a PR1 rank higher than a PR3?

Both pages are identical content. Both pages were created simultaneously. The PR1 page is .php; the PR3 page is .html.

This seems odd.

Any reasons why this would happen?

Decius

4:34 am on Dec 1, 2006 (gmt 0)

10+ Year Member



It would be clearly explained by the fact that your PHP docs are older than the HTML ones, yet exactly the same in content. Google will prefer the older one to the newer one. Also, PR does not relate directly to your position in the listings anymore... quality backlinks that are relevant hold a higher importance than irrelevant ones from high PR.

The argument that HTML does not indicate static is not relevant: it is obvious Google reacts to the format of the URL in some way, and most certainly will note whether there is a document type or not. So even though HTML does not certify you are static, having no doctype at all certainly tells us that it is dynamic and that the webmaster knows all about htaccess. Always remember that Google is trying to mimic a human... a human that sees an HTML doc is not going to give it more weight, but a human that sees a keyword-oriented url that ends in a keyword and no doctype will know, in an instant, that it has been seo'd.

Decius

4:43 am on Dec 1, 2006 (gmt 0)

10+ Year Member



I missed your question... Do NOT disallow people from seeing the old docs.

A 301 redirect is not the same as NOINDEX.

A 301 redirect is a special redirect that PHP can perform that tells the receiver of the page that this page has permanently been moved to the target page, and then it tells the receiver the target page.

If you noindex the PHP documents, Google will toss those out of the listings ASAP. That's not what you want. You want Google to think those are wonderful pages, but they are now a part of the new and improved HTML versions.

header("HTTP/1.1 301 Moved Permanently");
header("Location: http://www.newdomain.com/newpage/newurl.htm");
exit();

This is a Google-safe way (as in, you can use this exact code) to forward your PHP doc to the HTML one. In order to do this you need to know what the HTML one is (exactly).

Additionally, this cannot be done in htaccess... this can only be performed on every product and/or category page, and must be executed via PHP because only PHP can tap into the database to find out what the proper SEOd HTML url should be.

[edited by: Decius at 4:46 am (utc) on Dec. 1, 2006]

Jordo needs a drink

5:19 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> header("HTTP/1.1 301 Moved Permanently");
> header("Location: [newdomain.com...]
> exit();

Taking it one step further...


if ($_SERVER['REQUEST_URI'] != '/newpage/newurl.htm') { // REQUEST_URI holds only the path and query string, never the scheme or host
header("HTTP/1.1 301 Moved Permanently");
header("Location: [newdomain.com...]
exit();
}
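
To also catch a wrong host name (for example the non-www version), the comparison can be made on the full URL. `$_SERVER['REQUEST_URI']` contains only the path and query string, so the scheme and host have to be added from other `$_SERVER` entries. A sketch under those assumptions; `current_url` and `needs_redirect` are hypothetical names, and the `$_SERVER` values are passed in as an array so the logic can be tried without a web server.

```php
<?php
// Hypothetical helper: rebuilds the absolute URL of the current request.
function current_url(array $server)
{
    // HTTPS is set to a non-empty value other than "off" on secure requests.
    $scheme = (!empty($server['HTTPS']) && $server['HTTPS'] !== 'off') ? 'https' : 'http';
    return $scheme . '://' . $server['HTTP_HOST'] . $server['REQUEST_URI'];
}

// True when the request did not arrive on the proper URL and should be 301'd.
function needs_redirect(array $server, $properUrl)
{
    return current_url($server) !== $properUrl;
}

// Usage before any output is sent:
// if (needs_redirect($_SERVER, $properUrl)) {
//     header('Location: ' . $properUrl, true, 301);
//     exit;
// }
```

Comparing `REQUEST_URI` alone against a full `http://...` address would never match, so the redirect would fire on every request, including the one it has just redirected to.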

spina45

5:19 am on Dec 1, 2006 (gmt 0)

10+ Year Member



So are you saying that every time I add a new product to my store I should continue to simultaneously create a .php and .html page -- only now a 301 should be included on the .php page that points to the .html page? YES?

If I do not use Meta NOINDEX or robots.txt Disallow, Google will continue to spider both identical pages. YES?

Is this really what I want?

Also, I have over 600 product pages, I assume my web developer can create some code that performs the 301 you describe in an automated way? YES?

Decius

5:38 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> So are you saying that every time I add a new product to my store I should continue to simultaneously create a .php and .html page -- only now a 301 should be included on the .php page that points to the .html page? YES?

The PHP page will be created no matter what... the HTML page does not really exist. It just appears like it does.

> If I do not use Meta NOINDEX or robots.txt Disallow, Google will continue to spider both identical pages. YES?

No. A 301 redirect forces Google, or Firefox, or Internet Explorer to not only note that it is a permanent redirect, but also go to the new url.

> Also, I have over 600 product pages, I assume my web developer can create some code that performs the 301 you describe in an automated way? YES?

Your product pages all have titles and descriptions and what not in a database. Your HTML files don't really exist... they are created on the fly by PHP whenever you link to them internally, and your htaccess file converts a received HTML file to its appropriate PHP file (behind the scenes).
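
The behind-the-scenes conversion can be pictured as a simple mapping from the static-looking path to the real script. On a live site this is normally done by rewrite rules in .htaccess, not in PHP; the sketch below only illustrates the mapping, assuming the `product-nn.html` and `product_info.php?products_id=nn` forms from the first post. `internal_target` is a hypothetical name.

```php
<?php
// Hypothetical helper: maps a static-looking path to the real script and
// query string, or returns null for any other path.
function internal_target($path)
{
    // Capture the numeric product id between "product-" and ".html".
    if (preg_match('#^/product-(\d+)\.html$#', $path, $m)) {
        return '/product_info.php?products_id=' . $m[1];
    }
    return null; // not a rewritten product URL
}
```

The visitor and the spider only ever see the .html address; the server quietly runs the .php script for it.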

In order to accomplish what we have discussed in this thread, you simply add a function at the top of your product.php and your category.php files that checks the current URL and compares it to the target URL. If they do not match, 301 to the target URL.

If you implement this, your PHP docs will no longer be accessible from the web as product.php... it will only work via the HTML url.

Decius

5:39 am on Dec 1, 2006 (gmt 0)

10+ Year Member



One thing you seem to be misunderstanding is that you don't have two different pages... You only have the PHP documents. All that is happening is you are permitting people to view that document using a more search engine friendly URL.

spina45

5:59 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> One thing you seem to be misunderstanding is that you don't have
> two different pages... You only have the PHP documents.

Okay, I can see that. I guess it was causing me confusion because Google indexed two unique URLs leading to the exact same page, one .php, the other .html -- i.e. Dupe Content.

So in summary, for this solution...

I will standardize on .html
I will NOT use NOINDEX
I will NOT use Disallow
This is NOT a .htaccess fix, but done on each .php page
I will add a 301 to .php product and category pages pointing to their .html counterparts

Do I have it right? Missing anything?

Jordo needs a drink

6:06 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> This is NOT a .htaccess fix, but done on each .php page

One thing that concerns me is you said in your original post that your web developer only changed some pages to .html and not all of them.

You need to have him change the .htaccess file to rewrite all of your product and category .php pages before you add the 301 to your .php files wholesale, or you may be 301'ing to html's that your server thinks do not exist.

[edited by: Jordo_needs_a_drink at 6:08 am (utc) on Dec. 1, 2006]

Decius

6:12 am on Dec 1, 2006 (gmt 0)

10+ Year Member



What Jordo said is quite important -

What your developer did is add some lines to .htaccess that tell your server:

"If someone looks for product-blahblah.htm make them think they found it, but actually serve up product.php?var=nnn"

Now you must make sure he did that for all the pages, not just a few of them. This involves .htaccess syntax which needs to be carefully tested before it is implemented.

> I will add a 301 to .php product and category pages pointing to their .html counterparts

You are adding the conditional 301 to the PHP pages... this will also add them to the HTML ones... but since the HTML ones will be served properly, it won't be activated.

The rest looks good!

[edited by: Decius at 6:14 am (utc) on Dec. 1, 2006]

spina45

6:14 am on Dec 1, 2006 (gmt 0)

10+ Year Member



-Jordo

That's correct, the only .php pages that were converted to .html pages are my 600+ PRODUCT pages.

I am planning on having my developer also convert my CATEGORY pages to html.

Then, on ONLY the .php PRODUCT and CATEGORY pages, I will have him do the 301 Redirect to the .html counterparts.

In the above scenario do I need to modify my .htaccess?

Decius

6:15 am on Dec 1, 2006 (gmt 0)

10+ Year Member



In that scenario you will have to alter the .htaccess for the category pages only.

But in reality, you might as well consider making everything look static, including the contact us page etc. This is not necessary, however, for the rest to work.

[edited by: Decius at 6:16 am (utc) on Dec. 1, 2006]

spina45

6:23 am on Dec 1, 2006 (gmt 0)

10+ Year Member



> What your developer did is add some lines to .htaccess that tell your server:
> "If someone looks for product-blahblah.htm make them think they found it, but actually serve up product.php?var=nnn"

No, I don't think he did. He either created or modified a contribution called "sitemap_products.php" that creates a sitemap of PRODUCT names. The hyperlinks to these product names are product-nnn.html

He did not add to the .htaccess to make this happen.

I'm planning on having him add my CATEGORY pages to this sitemap and then do the 301 redirect on only the PRODUCT and CATEGORY .php pages.

Is this correct?

Jordo needs a drink

6:37 am on Dec 1, 2006 (gmt 0)

10+ Year Member



Is your whole site php based? If so, there's really no reason not to use the .htaccess to rewrite all your .php files to .html.

I still think your web developer is using the .htaccess for the html conversion. If he really is using a sitemap file, then he's doing it a very weird and difficult way. It may be that he used the .htaccess and then modified your sitemap file to use the html url's to get the SE's to index them.

[edited by: Jordo_needs_a_drink at 6:39 am (utc) on Dec. 1, 2006]
