Background:
Four years ago my web developer installed a file into my OsCommerce shop that would convert product URLs from: product_info.php?products_id=nn to: product-nn.html and list them in a sitemap linked to my homepage. This was done to achieve better SE rankings. Within a month 100s of my individual product pages started to appear in the SEs and sales took off.
Over the years I’ve noticed that 90% of the URLs that appear in SEs were the .php version and NOT the .html. This seemed odd because the .php URLs produce a PR1 while the .html URLs produce a PR3. Weird?!
In June ’06 a well known company listed my website in a print publication and misprinted my domain name. I found the domain was available so I purchased it along with another that I thought was cool. I then pointed both new domain names to my existing website. But I did NOT do a 301 redirect because I didn’t know any better.
About 3 months later my site’s ranking plummeted in Google and I began to research why. I determined (but not 100% sure) that Duplicate Content was the culprit.
I modified my .htaccess with a 301 for both domains, including www and non-www, and waited until the next noticeable “update” in Google. Bummer! No change in my rankings.
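For readers wondering what such a 301 can look like, here is a minimal sketch for a root .htaccess, assuming mod_rewrite is enabled. The host name www.example.com is a placeholder for whichever single host name is chosen as canonical; it is not taken from the post.

```apache
RewriteEngine On

# Skip requests that send no Host header, to avoid a redirect loop.
RewriteCond %{HTTP_HOST} !^$
# Any host other than the canonical one (placeholder: www.example.com)...
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
# ...is sent to the same path on the canonical host with a permanent redirect.
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

One thing worth checking: by default, rewrite rules in a parent .htaccess are not inherited by a subdirectory whose own .htaccess sets up its own rewrite rules, so such a subdirectory may need the same host check (or RewriteOptions Inherit).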
I’ve now been reexamining everything about my site to eliminate Duplicate Content.
So here’s my question: should I standardize on product_info.php?products_id=nn or on product-nn.html?
Note: there are many incoming links from sites, blogs, articles, etc that point to the .php URL version.
Any insight would be appreciated as to what is going on, how long I’ll have to wait and what to do to regain my once stellar rankings in Google.
There is one in my root directory and another in my /store directory.
I see the code below in .htaccess in the /store directory...
RewriteEngine on
RewriteRule product-(.*).html$ /store/product_info.php?products_id=$1
This corresponds to the "sitemap_products.php" contribution.
So, I'm sorry, he DID modify my .htaccess. I guess I need to get some sleep! It's 1:40AM!
Thanks for all your help.
Actually, no. When I first launched I only had an MSFrontPage site (don't ridicule me!). When I added the OsCommerce shop, I kept the MSFP portion, with links into the store.
I've had two sitemaps, one for the non-store pages that I create manually and the other for the store products that is generated via "sitemap_products.php".
That just converts your urls from PHP urls to HTML urls but doesn't include any keywords in it such as product titles and what not. The only possible benefit you are getting then is a site that looks static instead of dynamic, which is minimal.
All the current information still stands, but you should consider pressing your developer to look into actually including keywords in the HTML urls. This would give your site all completely new pages that at first won't carry any PR, and is risky, but is definitely a better long term idea.
--------
The code you have shown us is quite standard. Next he will probably add:
RewriteRule category-(.*).html$ /store/category.php?category_id=$1
or something along those lines which will accomplish the same thing, just for the categories.
But again, there is very little benefit to this unless you put the whole actual category name in there. If he does not know how to do this, just have him look up some RewriteRule examples on Google. It's quite cool and fun once you get to know it, and extreeeeeeeeemely useful.
I have considered this but...
1a) Will the 301 preserve the 1000s of links external people have posted directly to my "product_info.php?products_id=nnn" pages?
1b) Will the old PR flow into the new Keyword URL?
2) I'm worried about losing existing rank AND if I do a wholesale change to all my URLs could Google consider this spam? I've read that adding too many new pages too quickly is bad.
3) If Google saw that a URL and the anchor text contained the exact same (or very similar) keywords, couldn't they assume I'm gaming the algorithm and weigh me down in the rankings?
If you are not in trouble right now, and you already have multiple urls in play, then I can appreciate your reluctance. It is worth considering, however. Done properly, I've seen solid improvements from using more human-readable urls. Of course when technical errors get in the way, then you can have a mess to fix.
Considering Google has indexed my URLs in two formats: 1) "product_info.php?products_id=nnn" and, 2) "product-nnn.html" (both serving the same content), do you think it is beneficial to introduce yet a third URL 3) "keywordwidgetname.html" and 301 the other two to this new URL?
It appears my Google rankings are currently being suppressed, so I'd like to do what is best long term. But I'd rather not have to wait a year or more!
I doubt Google will think you are spamming if you alter your site structure.
Also, it is possible that Google will think you are spamming if you make the links match the urls too much. This is my personal belief. Therefore, focus on making sure the URLs are perfect (since you do not want to change those over and over) and then tweak the titles and links to be variations of that. This will provide an environment where everything matches up topically, but does not look duplicated.
Remember, the more you make it look like a human went in and wrote every single url, every single title, and every single link, the less likely you are to trip filters.
> Considering Google has indexed my URLs in two formats: 1) "product_info.php?products_id=nnn" and, 2) "product-nnn.html" (both serving the same content), do you think it is beneficial to introduce yet a third URL
Absolutely.
You currently have duplicate content indexed.
When you are done, you will have a single URL indexed for each product.
The 301 tells Google to forget about the old locations, and use the new ones. You will no longer have your pages indexed using two different sets of URLs.
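As a rough illustration of collapsing two URL forms into one, the sketch below would go in the /store .htaccess and makes product-nn.html the single public form. It assumes the /store path and script name quoted earlier in the thread, numeric product IDs, and that products_id is the first query parameter. If a keyword URL is chosen instead, the redirect target changes but the idea stays the same.

```apache
RewriteEngine On

# 1) A client asked for the old dynamic URL directly: 301 to the .html form.
#    THE_REQUEST is the raw request line ("GET /store/... HTTP/1.1"), which an
#    internal rewrite does not change, so rule 2 cannot re-trigger rule 1.
#    Assumption: products_id is numeric and is the first query parameter.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/store/product_info\.php\?products_id=([0-9]+)(\s|&)
# The trailing "?" drops the old query string from the redirect target.
RewriteRule ^product_info\.php$ /store/product-%1.html? [R=301,L]

# 2) Serve the .html URL from the script without the visitor seeing it.
RewriteRule ^product-([0-9]+)\.html$ /store/product_info.php?products_id=$1 [L]
```

Note that rule 1 discards any other query parameters on the old URL. Whether that is acceptable depends on what else the shop passes in its links.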
Feel free to change the structure to something more logical while you are at it.
I don't know if you have simplified things for the post, but I notice that you have no top-level "products" or "store", etc. I would certainly do that, to distinguish your product pages from other miscellaneous parts of your site structure - "about us", "privacy policy", "shipping", etc.
In what way?
> Therefore, focus on making sure the URLs are perfect
Should I separate words within URL?
Should I use a dash -?
Should I use an underscore _?
Is there an ideal number of characters?
All Upper Case?
All lower case?
Mixed?
What matters most?
> I don't know if you have simplified things for the post
Yes I did.
domain.com/store/product_info.php?products_id=nnn
->
domain.com/producttype/nnn/productname.htm
Separate all spaces with dashes, remove all special characters, lowercase it all so it looks cleaner.
domain.com/store/category.php?category_id=nnn
->
domain.com/producttype-plural/nnn/categoryname.htm
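A minimal sketch of how this scheme could be wired up in a root .htaccess follows. Here "widget" and "widgets" are placeholder product-type names, and category.php with category_id are simply the names used earlier in this thread; the real script and parameter may differ.

```apache
RewriteEngine On

# Product page, e.g. /widget/123/blue-widget.htm
# Only the number is passed on; the name segment is matched but ignored.
RewriteRule ^widget/([0-9]+)/[a-z0-9-]+\.htm$ /store/product_info.php?products_id=$1 [L]

# Category page, e.g. /widgets/7/garden-widgets.htm
# category.php and category_id are assumed names taken from this thread.
RewriteRule ^widgets/([0-9]+)/[a-z0-9-]+\.htm$ /store/category.php?category_id=$1 [L]
```

Because the name segment is ignored, /widget/123/anything.htm would serve the same page as the intended URL. The script would probably need to 301 requests carrying the wrong name to the correct one, or the duplicate-content problem can creep back in.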
That sounds just about right.
YOUR SUGGESTION: domain.com/producttype/nnn/productname.htm
Are you saying to segment products into separate directories?
And then put each product into its own subdirectory?
That seems like it could get rather L-O-N-G.
The "nnn" is my current unique product identifier. If I use "productname.html" there is no need for "nnn." CORRECT?
Do you refer to products by name? By some industry-standard number? By catalog number? By an alphanumeric code?
I'd stick to one (or two) level(s) under "catalog", "products", etc.
Often, products cross categories. If this is the case, don't put the products under categories! Make separate category indices. The indices should point to the product pages under a unified scheme.
Of course, I am going to illustrate my preference for no file type. Add ".html" if you insist...
Example:
example.com (home page)
  about_us
  contact
  privacy
  ordering
  customer_service
  downloads
  manuals
    459z456
  catalog
    widgets
      frenulated_widgets
      frapulated_widgets
    wadgets
    fragets
      client_side_fragets
      server_side_fragets
  products
    0x435q (or) super_excelsior
    459z456
Try to give each product a UNIQUE URL. I think it's useful to distinguish your catalog - which may have categories, and subcategories, and have multiple references to the same products - from your product pages.
On the other hand, it may be conventional in your industry to have a linearly-sequenced catalog. You are probably going to incur some duplicate-content penalty in this case. But perhaps not. Catalog pages are often != product data pages.
Forget the SEO. Think about your customers. IMO, that's the best SEO.
Why only dashes, periods, and hyphens? I think underscores most convey the visual appearance of spaces, and this can make things clearer to your users. Commas are awful, IMO.
OK, now I see what Decius is trying to do - include both a product ID and name in the URL, but the product name is just a decoration and not actually used to identify the page to serve. Cute. I like the idea, but I'd use something other than a "/" to separate them, then. The "/" is confusing, and implies a hierarchy to the user.
Again, try to follow the conventions of your industry. If your industry uses names, use names. If not, don't. Try to avoid a totally meaningless product ID in any case. Try to avoid using a product ID forced on you by your database or CMS. Use a meaningful product code - the same thing that would appear in a paper catalog.
Catalogs, indices, etc. are separate. Don't organize your product page URLs under a taxonomy, especially if the same products are likely to be categorized under multiple headings. Make their URLs flat. products/123, products/456, not products/widgets/123. You can have one or more taxonomies elsewhere in your URL tree that refer to product pages.
[edited by: jtara at 5:10 pm (utc) on Dec. 2, 2006]
>> Why only dashes, periods, and hyphens? I think underscores most convey the visual appearance of spaces, and this can make things clearer to your users. Commas are awful, IMO. <<
The problem is that a search for "two words" will only match "two-words" and "two.words" and "two,words" and "two words" but will NOT match "two_words".
Only a search for "two_words" will match "two_words".
Additionally, spaces in a URL are converted to %20 and that makes them%20very%20hard%20to%20read.
Hence the recommendation to use only hyphens, dots, or commas when separating words in a URL.
(Caution: highlighting of keywords in the title, snippet, and URL, in the SERPs is only a display process and does not indicate a search algorithm match.)
> The problem is that a search for "two words" will only match "two-words" and "two.words" and "two,words" and "two words" but will NOT match "two_words".
But the words being separated will surely be found in the product page as well, right? And without the underscores. Search generally doesn't rely on the URL to find things.
Maybe this is good SEO (or maybe it isn't...) but I don't see underscores in URLs as preventing search from finding things people are looking for.
Google wants to allow programmers to find computer code examples, and so they treat the underscore as a literal match: like define_print_options as opposed to define print options, etc.
domain.com/store/product_info.php?products_id=nnn
What is the consensus on moving to this path...
domain.com/store/widget-name-nnn.html
VERSUS THIS...
domain.com/store/widgets/widget-name-nnn.html
I don't understand including "widgets/" in the path other than adding a keyword to the url.
Also, if "widgets/" is not really a directory structure, what exactly is it?
I'm sorry, I'm not a programmer just a dude who's trying to fix his Google Grief and wants to "measure twice and cut once" in terms of making big changes. Thanks to all who are providing their input.
It is the other way about. The literal underscore is matched in a search because computer programs often use them in procedure and function names.
I realize that.
I was pointing-out that if you have, say, "red_widgets" in the URL, you probably have "red widgets" in the actual content.
Google will still be able to find "red widgets".
If you need keywords in URLs for Google to be able to find things, you have bigger problems.
Now, the presence or absence of keywords in the URLs might have SEO implications. Whether those are positive or negative from one day to the next is anybody's guess. Which is why I say make it make sense to you and your users, and forget the SEO implications.
> I don't understand including "widgets/" in the path other than adding a keyword to the url.
Nor do I...
> Also, if "widgets/" is not really a directory structure, what exactly is it?
It is simply part of the URL. But, by convention, users read it as a component of a tree-structured hierarchy.
In the early days, URLs were mapped 1-1 to a directory structure on the server's disk. Dynamic page serving and mod_rewrite frees us from this structure, and allows us to structure URLs in any meaningful (or meaningless) way.
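To make that concrete, here is a small sketch for the /store .htaccess in which "widgets/" exists only in the URL. The segment name and the widget-name-nnn.html shape come from the question above; numeric IDs are an assumption.

```apache
RewriteEngine On

# /store/widgets/widget-name-123.html -> product_info.php?products_id=123
# "widgets/" is matched as plain text; no such directory has to exist on disk.
RewriteRule ^widgets/[a-z0-9-]+-([0-9]+)\.html$ /store/product_info.php?products_id=$1 [L]
```

The server treats the whole path as a string to match, so the "/" carries meaning only for the people (and perhaps the search engines) reading the URL.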
I say, go with the flow, and use "/" to represent hierarchy. That's why I said earlier that it would be confusing to use a "/" to separate a product number from a product name.
Give some thought as to whether the term "catalog" is really appropriate here. I think in most cases, "product" or "products" might be better. Your catalog is something else - it IS typically organized as a hierarchy and may well have multiple-level subcategories. Your catalog might simply consist of index pages that link to product pages, might include some heading material, might be a reproduction of your paper catalog (in which case you will have to be careful about duplicate content - but I wouldn't be all that paranoid about it).
You probably have one or more separate indices on your site for locating products using a hierarchy. But this is distinct from the product pages themselves, and I think it makes sense to keep the URLs for the latter as flat as possible - UNLESS, say, your company has different divisions, with entirely separate products, or you have completely different product lines. Again, whatever makes sense for your company and your industry.
One final thought - consider whether you might have - or might some day have - multiple bits of information for each product. A product information page, a catalog page, a data sheet, a material safety sheet, a manual, downloads, etc. If so, give some thought as to how to organize this. As in: inside-out, or outside-in?
e.g. example.com/products/45tzq67/data_sheet, example.com/products/45tzq67/manual, or: example.com/manuals/45tzq67, example.com/data_sheets/45tzq67, etc.
You don't have to be a programmer. Sit down and think about what really makes sense for your users. The programmers will sort it out, unless you come up with something REALLY goofy!
[edited by: jtara at 6:52 pm (utc) on Dec. 2, 2006]
They are bold when they appear in search engine listings and Yahoo and MSN and even Google use them to determine keyword strength. Don't overdo it by stuffing it with keywords, but the product name will suffice.
I don't think you should spend too much time worrying about how it will "look" to users or other people in your business as jtara is saying. This is not a priority according to me. The average user does not attempt to understand how your website is organized.
As I stated above:
domain.com/product-type/nnn/product-name.html
domain.com/product-types/nnn/category-name.html
This provides anyone who looks at it a topical hierarchy - you have product-name which belongs to product-type.
You have category-name which carries product-types.
I believe Google will like this very much as well.
if you do:
domain.com/product-type/nnn-product-name.html
domain.com/product-types/nnn-category-name.html
This is also acceptable.
domain.com/product-type/nnn-product-name.html
domain.com/product-types/nnn-category-name.html
Is worse than the url I suggested IMO... this is because if you want to stick in any additional variables:
domain.com/product-type/nnn-yyy-product-name.html
domain.com/product-types/nnn-yyy-category-name.html
It gets sticky, because you are already using dashes for spaces in the category name.
I would stick with:
domain.com/product-type/nnn/product-name.html
domain.com/product-types/nnn/category-name.html
I think you should run this by your programmer before responding here to get a more thorough idea of what is implied by all this.
I hate trying to traverse up the folder tree, to get to some other place on the site, only to receive a "virtual folder listings have been deactivated" (or similar) error message.
To assume in this day and age that you can alter a URL and expect to find categorically sorted information that isn't linked to by the webmaster is not very logical.
I do it all the time. And it works.
The most common case is when a company has come out with a new version of software, and I want to download it, get the manual, etc. They don't always get the links into all the index pages right away. If the site is laid-out logically, I can take a good guess and most of the time it will be right.
Better that users are able to find stuff that you've forgotten to link than not.
I do think the URL should make sense to the user. It would be different if the URL wasn't displayed to the user in the URL bar - but it is.
If nothing else, it can show your users that you are organized. Or not.