I have been asked to redesign a 450-page website packed with content. Some of it is good, but most was put there by the original webmaster in the hope that it would generate traffic. The website owners have been happy with their traffic and with people finding them through search engines, but they are NOT happy with their crazy, HUGE, disorganized website.
I am trying desperately to weed through the site's site map to figure out what to "throw away", but I am terrified that leaving pages out of the new site will damage the website's traffic and search engine ranking.
How can I redesign an ENORMOUS 450-page website without losing traffic and search engine ranking?
Keeping all 450 pages would be nuts, right?
Thanks for any light you can shed on this problem :-(
It is tempting to introduce a redesigned site (of which you are proud) as an "opus" -- a new masterwork to be appreciated all at once as a whole. But that is something that search engines don't like at all.
I've been weeding through the site's traffic reports and see that the Top Entry Pages are very consistent from month to month. This tells me to keep the Top Entry Pages on the redesigned site, with both the file name (page.html) and the URL (http://www.site/directoryname/page.html) exactly the same. Does it sound like I'm on the right path here?
Thanks again for helping out the Newbie :-)
Rewriting URLs on the server (with mod_rewrite on Apache, say) makes it possible, for example, to replace an old static-html site with a new PHP-and-database-driven site without changing the old URLs [w3.org] in any way. Unfortunately, most Webmasters are unaware of this, and many of them manage to destroy their search ranking for a year or more when undertaking such a site redesign project.
But the biggest hurdle I'm trying to figure out is "removing" a large number of old, outdated html pages from the site without jeopardizing the site's ranking.
Should I keep the old pages as they are on the server? Should I remove them from the server?
See, the newly redesigned HTML site will not include a ton of the older, outdated html pages, because we just have to have a smaller, manageable site, not 450 pages of old "crap".
I guess I need to trouble you to walk me through mod_rewrite on an Apache server, because I'm not understanding how to do that or how it really works in my situation, where I'd have hundreds of old pages that I don't want to include on the newly designed site anymore, but I fear the loss of search engine ranking.
Do I need root access to use mod_rewrite?
Thanks for your patience.
Only you can decide whether these old pages have any value. Try considering them as 'archival material' -- Are they of any use as such? If so, you can move them to an 'archive' directory, and add a page header to them that states that they're somewhat/mostly/completely outdated (as applicable). Sitting in that archive directory, they'll be out of the way, and need little or no maintenance. As previously noted, putting them in a different filepath need have no effect on their URLs.
The paper I cited, by the man credited with inventing the WWW, reflects the 'academic mindset' of the Web. Although the Web has since been more-or-less taken over by commercial interests, it's important to realize that this academic mindset persists, especially at search engine companies; they view the Web as a library of information, and not as a temporary roadside billboard sign or a street-corner magazine/newspaper kiosk. For this reason, they noticeably favor persistent content, and the kinds of sites that host persistent content.
To illustrate, a librarian does not go through the library and toss out books just because they are old -- Imagine if we'd tossed all copies of Shakespeare for that reason alone...
Divide the pages of this site into classes according to what makes sense:
If a page is to be removed (and I suggest that any page that might be of any historical or research value to anyone be retained) then install a 301-Moved Permanently redirect to one or more of:
Your old pages may offer valuable information for people trying to discover what your industry was like five years ago. They may offer you the benefit of PageRank and Link-Pop they've accrued over the years. They may serve as a traffic draw and/or as link-bait because of their content.
On the other hand, they may indeed be totally useless, but only you can decide that. However, the last criterion I would consider is the "convenience" of their maintenance.
Again, the above is generalized -- and perhaps to the point of irrelevance; I don't know anything about the site.
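For a page that really is being removed, the 301 redirect mentioned above could be sketched in a .htaccess file roughly as follows. This is only a sketch: it assumes mod_rewrite is enabled and that the file sits in the site's top-level web directory, and both paths are made-up placeholders, not pages from the site in question.

```apache
# Assumption: this .htaccess lives in the site's document root
RewriteEngine On

# Hypothetical retired page -> hypothetical replacement page.
# R=301 sends "301 Moved Permanently" to the browser or spider;
# L makes this the last rule applied to the request.
RewriteRule ^old-section/retired-page\.html$ http://example.com/new-section/replacement-page.html [R=301,L]
```

Unlike an internal rewrite, this changes the URL the visitor ends up on, so it suits pages whose URLs are being retired rather than pages that are merely being moved on disk.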
If I have a well ranked html page:
and I move wellranked.html to:
That is a totally different URL - isn't it?
Sorry, I just don't get that one...thanks again for all of your great insight.
[edited by: encyclo at 11:05 pm (utc) on Aug. 12, 2007]
[edit reason] switched to example.com [/edit]
URLs are used to locate 'resources' --pages, images, multimedia, etc.-- on the Web. They are meaningless inside a server. Filenames are used to locate files, either data files or executable (e.g. script) files inside a server, and are meaningless on the Web. Simply put, the fundamental job of a server is to accept a URL request and translate that URL to a filesystem path.
This seems to be a difficult concept to convey, but let's take a simple, common example:
Let's say your homepage URL is http://example.com/
There is no such location in your server, though, since no disk drive or filename appears in that URL.
So, when a request for this URL arrives at your server, the server removes the now-unneeded "http://example.com" part, and adds the partial filepath specified by the server's DocumentRoot configuration directive, "C://Program Files/Apache/httpd/dev-sites/my-site" (on a server running on a Windows PC, for example, just to keep on familiar ground here) to the remaining "/".
So far, we now have "C://Program Files/Apache/httpd/dev-sites/my-site/" as the partially-translated filepath.
However, we're still missing any filename, because "/" isn't a filename.
So, the server uses the value defined by the DirectoryIndex configuration directive, and finds that your default index file is called "index.html". So it adds that to complete the filepath.
The completely-resolved filepath is now "C://Program Files/Apache/httpd/dev-sites/my-site/index.html".
So, the URL is
http://example.com/
which resolves to the server filepath
C://Program Files/Apache/httpd/dev-sites/my-site/index.html
The one and only surviving token from the URL that appears in the filepath is a slash...
So again, URLs and filenames are not the same thing, and need not have any fixed relationship with each other.
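The two directives in that walk-through might look something like this in the server configuration. This is a sketch under assumptions: the values simply mirror the example above, and a real httpd.conf will contain much more.

```apache
# httpd.conf fragment (illustrative values taken from the example above)

# Filesystem directory that the URL-path "/" maps to.
# Apache on Windows takes forward slashes in paths.
DocumentRoot "C:/Program Files/Apache/httpd/dev-sites/my-site"

# File to serve when the requested URL-path names a directory, such as "/"
DirectoryIndex index.html
```

Change either value and http://example.com/ would be served from a different file, while the URL itself stays exactly the same.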
It's important to grasp this concept because the successful use of mod_rewrite or ISAPI Rewrite depends on it. And far from being a pedantic distinction, it's important to business as well, as you will discover if you change all your URLs and tank your site's rankings...
So, to re-cast your question in orthodox terminology:
If I have a well ranked html page (file):
and I replace that file with:
How do I tell the server about the new file location, while retaining the same URL?
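# Serve one relocated file at its original URL (internal rewrite, no redirect)
RewriteRule ^Directory/wellranked\.html$ /Totally_Different_Directory/wellranked.html [L]
# Same idea for every .html page in that directory; $1 carries the page name
RewriteRule ^Directory/([^.]+)\.html$ /Totally_Different_Directory/$1.html [L]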
When executed, the $1 token in the new substitution path (on the right) will take the value of the requested URL-path that matches the first parenthesized subpattern in the RewriteRule regular-expressions pattern (on the left).
In addition, you can define a few exclusions, if needed, by using mod_rewrite's conditional-rewriting directive, RewriteCond.
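As a rough illustration of such an exclusion, a RewriteCond placed directly above a rule restricts when that rule fires. The page names contact and sitemap are hypothetical, and the sketch assumes the rules are in a .htaccess file in the document root with mod_rewrite enabled.

```apache
RewriteEngine On

# Skip the rewrite for two (hypothetical) pages that stay in the old directory.
# The "!" negates the pattern, so the rule below runs only when the
# requested URL-path is NOT one of these.
RewriteCond %{REQUEST_URI} !^/Directory/(contact|sitemap)\.html$
RewriteRule ^Directory/([^.]+)\.html$ /Totally_Different_Directory/$1.html [L]
```

A RewriteCond applies only to the single RewriteRule that follows it, so any other rule needing the same exclusion has to repeat the condition.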
Even though I have experience with and know how to deal with the above situation, I wish I could afford jdMorgan.
Anyway, what I mean to say is very good suggestions. Right on target buddy...