The site is an online retail site, 600+ pages. It's been online for almost a year, and Google has about 300-400 of them indexed by now (approximately - I just used the "site:" command; last week it showed ~200).
It wouldn't be so bad, but the URLs look horrible - they're very long and complex, and you can find anything in them: underscores, commas, about four "&" symbols in most of them... anything but keywords. I guess it's not a disaster for SEO, but it doesn't seem right at all, especially considering there isn't enough keyword content for the product items.
Do you think mod_rewrite and 301 redirects to new "user-friendly" URLs with keywords (same content on the pages) would be worth doing here? Or should I stick with the current URLs so as not to ruin the current rankings, this being my first time doing SEO for this client? ;)
Also, from your experience, what are the effects of a redesign with 301s? While Google finds the new pages, indexes them and drops the old ones - do the rankings of the new pages fall compared to the old ones? If so, what recovery period should be anticipated?
Looking forward to your replies..
trying not to bite nails ..
I can imagine two scenarios once the new URLs are online:
1st, Google, as it indexes the new pages, gradually transfers its "opinion" of the old pages to the new ones without hurting rankings during this period;
and 2nd - "complete disaster"..;) I mean positions drop and it takes a long, long time to recover.
However, I've had no experience with it so far.
c'mon guys, any suggestions? I know you've been through that before ;)...
If the client is insisting, then I'd call in an expert and distance myself from the job.
However, do take your time to read everything you possibly can about mod_rewrite and 301s - it's not that easy to get a handle on initially, but makes good sense with time.
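Just to illustrate the general shape of it, here is a rough sketch of the two-part pattern people usually mean (the parameter names, script name and path scheme are invented for the example - yours will differ, and in practice a keyword segment in the new URL usually has to be built by the application, which knows the product names):

RewriteEngine On

# 1) 301 the old dynamic URL to the new friendly URL. Matching against
#    THE_REQUEST means only the client's original request triggers it,
#    which avoids a redirect loop with the internal rewrite below.
RewriteCond %{THE_REQUEST} \s/product\.php\?cat=([0-9]+)&prod=([0-9]+)\s
RewriteRule ^product\.php$ /products/%1/%2/? [R=301,L]

# 2) Internally map the new friendly URL back onto the real script
#    (no R flag, so the visitor and the spider just see a 200)
RewriteRule ^products/([0-9]+)/([0-9]+)/$ /product.php?cat=$1&prod=$2 [L]

The trailing "?" on the first substitution drops the old query string from the redirected URL.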
Finally, make sure you run a headers checker on your new set-up to check that old URLs are redirecting correctly with 301s, not 302s. Check, check and check again.

I have a 5-year-old site. After a year I sorted out my horrid URLs into better ones using mod_rewrite. Last year I had to 301 quite a few pages for some technical reasons. Both transitions went perfectly, without any drops in ranking, but it took time and very careful planning. I am sure you can do it, and best of luck!
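On the 301-not-302 point, it's worth knowing that Apache's defaults will quietly give you 302s if you don't spell the status out (the paths below are just placeholders):

# mod_alias: with no status given, Redirect sends a 302
Redirect /old-page.html http://www.example.com/new-page/

# mod_rewrite: a bare [R] flag also defaults to 302
RewriteRule ^old-page\.html$ http://www.example.com/new-page/ [R,L]

# What you actually want - the 301 stated explicitly
Redirect 301 /old-page.html http://www.example.com/new-page/
RewriteRule ^old-page\.html$ http://www.example.com/new-page/ [R=301,L]

Hence the advice to check the headers rather than trust the config.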
I agree that there are plenty of profitable e-commerce sites with horrid URLs, but this is precisely where a start-up e-commerce site can make a jump on the 'big guys' - by being smarter!
Don't take exceptional measures to do this fast or all at once, or you can "pull the rug out from under your site" in search. Proceed slowly and very deliberately with regard to your top-ranking pages and main landing pages.
Someone here (I wish I could remember who, so as to give credit) has argued that starting with updating the links on your lowest-level, least-important pages (at step 2 in the list above) is a good plan, and I tend to agree -- build new internal supports for your top pages before removing the old supports. On a per-page basis, consider this a balancing act between maintaining the PageRank/link-pop support for a page, and avoiding long-term duplicate (old & new) URLs for the same page. This should work well for sites with a small number of well-ranked landing pages, and lots of supporting pages below -- for example, an e-commerce site with a few "main" pages and categories, and lots of product pages below that.
Jim
Jim's advice below your post is what I meant by 'get an expert'. For some, the procedure would be technically difficult and would most likely hurt traffic for a while. So if this drop is going to cost a client some money, then really it should be requested by the client (though of course I don't know what the contract is in this case) and undertaken by someone professionally competent, with the client's acceptance of a likely drop in traffic for a while.
I have one or two Zen Cart installs on the go (not my sites) and to attempt to clean up the URLs on those sites would be a big step to take - even if a good one in the very long run.
No offense intended and I trust none taken! I'm not familiar with out-of-the-box type scripts like the Zen shopping cart. The mess I made at the beginning of my site was all my own mess, and I knew how to sort it out. Trying to fix someone else's mess - well, you're right - I'm not up for that and nor should a webmaster be handling that sort of 'learning' on a client's time and money (IMHO)
The risks/rewards clearly need to be spelled out to the client before proceeding. I have to say that if I were the client, my first question would be - "Why did you not tell me a year ago?"
So even discounting the query string problem, most sites still have four duplicates of their home page available:
example.com
www.example.com
example.com/index.html
www.example.com/index.html
(Substitute htm, shtml, php, asp, or cfm if you like)
This is because the Webmasters haven't taken any steps to rectify these duplicates caused by default server configurations and the behaviour of common HTML authoring tools.
Some of these sites even link to all of their own home page URL variants through ignorance or incompetence, and I don't see any evidence that any of them are actually "penalized" for it.
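For what it's worth, the fix for those four home-page duplicates is usually a couple of rules along these lines (a sketch only - example.com and index.html stand in for your own host and index file):

RewriteEngine On

# Requests that explicitly name the index file go back to the root URL.
# Checking THE_REQUEST means the rule ignores the server's own internal
# DirectoryIndex subrequest, so it can't loop. Pointing straight at the
# www hostname keeps this to a single hop even from the non-www domain.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/index\.html
RewriteRule ^index\.html$ http://www.example.com/ [R=301,L]

# Everything else on the non-www hostname gets one 301 to www
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]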
So to quantify my statement, if you have four or fewer duplicates of any given page in your site's URL-space, the worst that will happen to you is that the PageRank/Link-pop of a page may be split across one or more of those URLs. Since I don't know the logarithmic base that G uses to calculate PR, I can't say what effect that might have on what you see in the Google Toolbar PR display, but in the linear PR domain we can assume that two URLs for one page could -- worst-case, assuming both are equally linked-to -- divide the linear page PR by two, splitting it equally between the two URLs. Four URLs, also equally-linked, could divide the page's linear PR by four.
However, we are discussing replacing old URLs with new ones here anyway, so a temporary split in PR is a given -- It *is* going to happen. As I stated above, the trick is to watch the changes being picked up by the search engines, and when they're solidly locked-on to a new URL, then complete the redirection for that old URL as soon as possible. So, there is an important timing element needed to minimize the disruption and limit it to a short period of time.
If your site already has the split-PR problem described above, and if you fix that at the same time as re-architecting your site's URLs, then you can come out way ahead in the end.
Jim
> I have to say that if I were the client, my first question would be - "Why did you not tell me a year ago?"
I'm not sure what there would have been to tell, and this does of course highlight the immaturity of the web and much that goes with it (compared to other long-established professions). My guess is that most 'contracts' between website owners and whoever builds the sites are very vague on such matters as URL design and exactly how a site will comply with Google guidelines, how it might perform in one search engine or another, etc etc, not to mention who is actually responsible if (say) a site is dropped, penalised, filtered, whatever... for reasons that no-one may even fully understand.
So really, if you were paid for the job a year ago and everyone is happy, think hard before opening up a can of worms like redoing all the site's URLs - unless it's part of a 'brief' with an agreed aim in mind, and you know what you're doing.
I agree with your point about fixing the split-PR issue, but I would suggest doing this at a different time rather than at the same time as redirecting old URLs to new ones. My experience has been that it's best to fix one thing, wait until that beds down, then fix another, particularly if it involves .htaccess files, regular expressions and the like. It's really tempting, say, to fix the non-www to www issue (or vice-versa) and restructure your site architecture with new URLs at the same time, but this can often cause 'double jump' issues, whereby the server gives three responses to an old URL - 301, 301, 200. In my experience, Google can handle one jump but has difficulties with two. It probably thinks it is being spoofed...
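To put some flesh on the 'double jump', this is how it typically arises from two rules that are each fine on their own (the host and page names are placeholders):

RewriteEngine On

# Rule 1: non-www to www
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Rule 2: old URL to new URL
RewriteRule ^old-page\.html$ /new-page/ [R=301,L]

# A request for http://example.com/old-page.html now goes:
#   301 -> http://www.example.com/old-page.html
#   301 -> http://www.example.com/new-page/
#   200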
I'm probably not being very clear here but the bottom line is my approach would be:
fix one thing at a time, check, leave for at least a few weeks, then fix the next thing.
I've seen countless examples in these forums (fora!) where rankings tank and the webmaster struggles to know the reasons because the drop coincides with a whole raft of changes that have been made. But there's a thread on Changelogs which I think covers this better than I could...
I would have to argue that a year ago, search-engine-friendly URLs were nothing new. The idea seemed to start gaining a lot of coverage around 2002, with a decent number of articles emerging at that time. I'd say a webmaster would have had an excuse if asked "Why did you not tell me a year ago?" in 2003 or 2004. But not now, surely?
But point taken. Anyone who tackles this needs to know what they are doing. As webmastering becomes more complicated, it does get harder and harder to keep on top of it all. However, I'd have to say that creating sites with good spiderable architecture should be an absolute basic that a client should expect of any webmaster. It shouldn't be the obligation of the client to ask questions about things he's personally not expected to know! The onus is on the provider - the webmaster in this case.
I'm rambling here, but I hope my point is clear-ish. When I buy a car, I can probably choose the colour I like and aircon or not. But I'm not expected to have a degree in engineering so I can quiz the manufacturer about the engine. I expect it to have the latest accepted technology behind it - fuel consumption, emissions and so on. Surely the same goes for websites?
It is abhorrent to me to permit the viewing of the same content via two different URLs. The www and non-www issue is definitely a problem (with regard to PR), which is why it is recommended that you never permit users to see your site both ways. All of the sites I run make sure you view every single page in the exact form I have arbitrarily set:
www.domain.com/pagename/id/keyword.htm
If you try to view it without the www, it does a 301 to the www version. If you try to view it by PHP, or without the keywords, or even without the .htm, it 301s to the correct page.
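For anyone curious, that boils down to rules roughly like these (a sketch only: the /pagename/id/keyword.htm shape is from my example above, but the script and parameter names here are invented, and the keyword lookup itself happens application-side, since .htaccess alone doesn't know which keyword belongs to which id):

RewriteEngine On

# Force the www hostname with a single 301
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]

# Map the canonical URL internally onto the real script
RewriteRule ^pagename/([0-9]+)/[^/]+\.htm$ /pagename.php?id=$1 [L]

# Direct requests for the raw script, or for the URL minus the keyword
# or the .htm, get 301'd to the full canonical form by the script itself
# once it has looked the keyword up.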
So, I would have to say I think it is incorrect to advise people to have two different dynamic URLs pointing to the same content. This is probably exactly what Google is attempting to police, regardless of one's intentions. This is especially tricky ground given the outcries of people who are being penalized for duplicate content and have no idea why.
In my opinion, you should set up the .htaccess so the site is ready for the new URLs, then 301 all the old ones to the new ones and hope Google picks it up ASAP without much of a burp. Anything else is far more risky, IMO.
> It's really tempting, say, to fix the non-www to www issue (or vice-versa) and restructure your site architecture with new URLs at the same time, but this can often cause 'double jump' issues, whereby the server gives three responses to an old URL - 301, 301, 200.
It's not necessary to use multiple redirects. You can do it all at once if the code is properly implemented.
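The single-hop version of the double-jump example above just means ordering the rules so the specific old-to-new mappings come first and always point at the full canonical hostname (same placeholder host and page names as before):

RewriteEngine On

# Specific old-to-new mappings first, pointing at the canonical host
RewriteRule ^old-page\.html$ http://www.example.com/new-page/ [R=301,L]

# Then the general non-www to www rule catches everything else
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# http://example.com/old-page.html now gets one 301 straight to
# http://www.example.com/new-page/, then a 200.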
Actually, given a choice, I'd prefer to fix the canonical domain problem before even starting the URL re-design, as long as there was a considerable ranking difference and it favored my preferred domain. Otherwise, I'd prefer to build up that ranking difference before forcing it with redirects. This makes a good project while you go to all the URL re-design meetings and briefings. And of course, the best cure is prevention: On a new site, install the domain canonicalization redirect before the first page or even the robots.txt file goes up.
Decius,
Google has often gotten confused about domain canonicalization, since www and non-www are different domains and they must use back-end processing to detect that the two domains are aliases. Sometimes that processing is apparently defective, maybe because their confidence threshold is set too high for cross-domain comparisons of extremely-dynamic pages, or maybe they just don't get it all done before the next re-spidering phase, but problems crop up from time-to-time and we can't take it for granted. A 301 from the non-preferred domain to the preferred domain is best practice.
I'm not advising people to intentionally link to their own pages using multiple URLs, but rather stating that it is a fact that your internal links (all updated) and inbound links (relatively few updated) will and must co-exist. Based only on my own experience, the SEs like to see your site pointing to its new URLs before they see a massive number of redirects from old URLs to those new URLs. Because of all the latencies in the Web and in the SEs' indexing itself, I think it's best to let them find your internally-consistent link changes, and then roll in the 301's to correct the obsolete inbounds. Otherwise, because of the indexing and datacenter latencies, you risk them fetching a bunch of 301 redirects for URLs that are (apparently) still present on your own site -- actually, still present in their stored 'snapshot' of your site, containing a mixture of older and recently-updated pages, which to them represents 'current reality'. It wouldn't surprise me if that 'snapshot' were distributed too, with pieces of it older and newer at different datacenters, and internally inconsistent over intervals in the hours-to-daily range.
Ignoring all these temporal intricacies, and differences of opinion on the above, the one thing that I'd say is critical is to *not* do the redirects first -- before updating all your own internal links. That would almost certainly put the "site quality meter" well into the red zone.
Jim