My latest project started out looking at the possibilities of trimming the fat out of my hefty 27,875 byte .htaccess file.
17,494 of those bytes are 179 lines representing old indices whose file names needed changing, which I handled using 'Redirect permanent (Rp)'.
(To my best recollection, I've had these Rps up for maybe three, four months.)
Looking thru my access log files, I notice something that has been nagging me for some time now and I'm gonna ask.
As of a few minutes ago, spanning [30/Dec/2002:03:32:47 -0800] to [01/Jan/2003:03:49:57 -0800], Gigabot/1.0 is still requesting those old file names, as are all the Google bots.
Fast ditto
Ask Jeeves/Teoma ditto
inktomisearch ditto. I'm thinking Slurp read 'em last week, too.
Any ideas why bots/spiders can't understand what they read and delete the old from their current task?
Are they (or anyone/thing else) not capable of modifying the tasks?
Seems like it'd save wear and tear. <shrug>
I'd really like to make some space available in my access log files for banning and such. Time to expand my knowledge base, so-to-speak.
The premise was to have an even flow of data transfer, not to have that transfer repeated over and over, and over, and over, and over, and over, and over, and over...
Anyway, lest I digress, how long must I leave the Rps up? For that matter, how important are they if all the bots ever do is clutter up my access log files with repeated requests?
Thanks.
Pendanticist.
The SE needs to see a 301 code and to be made aware the file has moved. For this to happen the redirection url has to be a full url, rather than a relative url.
Redirect 301 old-relative-url new-full-url
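For example, with made-up file names and domain - the old url is given as a path relative to the site root, and the new one as a full url:
Redirect 301 /old-page.html http://www.example.com/new-page.html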
Do the SEs list your page under the old name or the new one?
Are there any links to those urls?
If there is a link from another site to the url that returns a redirect - it might be requested over and over just to make sure it still returns the same redirect.
This is a snippet of what I have - minus my real root url and the http:// prefix:
Redirect permanent /1ABTribes-Councils.html www.dontclick.com/Aboriginal_Tribes-Councils_A-O.html
Redirect permanent /1ABTribes-Councils-2.html www.dontclick.com/Aboriginal_Tribes-Councils_P-Z.html
Redirect permanent /1AborWor.html www.dontclick.com/Aboriginal_International.html
Redirect permanent /1Acct.html www.dontclick.com/Accounting.html
Redirect permanent /1AcctForensic.html www.dontclick.com/Accounting_Forensic.html
Redirect permanent /1Agric.html www.dontclick.com/Agriculture.html
Redirect permanent /1AnimalRights.html www.dontclick.com/Animal_Rights.html
Redirect permanent /1Anthro.html www.dontclick.com/Anthropology.html
Ex: I changed the old index "1AnimalRights.html" to "Animal_Rights.html", "1Anthro.html" to "Anthropology.html", and so on for clarity.
There are roughly 160 additional entries.
The SE needs to see a 301 code and to be made aware the file has moved. For this to happen the redirection url has to be a full url, rather than a relative url.
Redirect 301 old-relative-url new-full-url
Relative - meaning the shorter, old file - and new-full-url meaning http://blahblah.com/Final_Destination_File.html?
Do the SEs list your page under the old name or the new one?
That's the gist of my question: both.
The initial page request is always for the 'old' file - then it redirects (via 301) to the new full destination url.
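In the access_log it shows up something like this (combined log format; the host and byte count here are made up, the timestamp, file name and user-agent are from the entries above):
crawler.example.com - - [01/Jan/2003:03:49:57 -0800] "GET /1Anthro.html HTTP/1.0" 301 245 "-" "Gigabot/1.0"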
One would think the bots woulda/coulda/shoulda have done been (don't know if this is the right term to use here) uh, 'amended' by now, saving server run time as well as the bloat in my access_log files. :o
I'll check back in later today.
I appreciate the response folks. :)
Pendanticist.
It is likely that the SEs have updated their databases, but are still finding links on other Web sites which link to your old page URLs. So, since there is a possibility that those old pages might have been resurrected, they have to check again (and get the 301 response) to be sure.
You might want to dig through your backlinks on Fast to see how many point to old page URLs - Google won't show the lower-PR backlinks.
NB: Gigabot is not a Google robot. It belongs to GigaBlast.com
Happy New Year!
Jim
It is likely that the SEs have updated their databases, but are still finding links on other Web sites which link to your old page URLs. So, since there is a possibility that those old pages might have been resurrected, they have to check again (and get the 301 response) to be sure.
You might want to dig through your backlinks on Fast to see how many point to old page URLs - Google won't show the lower-PR backlinks.
Oh, I know there are some out there, just too many to do manually.
The intended purpose for doing this was to minimize the potential SE and viewer losses. Lessen the impact, so-to-speak.
In this regard, would I have been better to let all requests (for old files) go out to my custom 404 handler, rather than re-directing in the first place?
The webmaster having my link up would be more likely to notice the 404 than the redirect anyway, don't you think?
Feels Catch-22ish.
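For anyone following along, by 'custom 404 handler' I just mean the usual .htaccess directive pointed at my own error page (file name here is made up):
ErrorDocument 404 /custom_404.html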
NB: Gigabot is not a Google robot. It belongs to GigaBlast.com
I wuz just lumpin all the 'Gs' tagether on the same line, Vern :)
Happy New Year!
Thanks. And the Very Same to You and Yours!
Pendanticist.
Without a notice, the webmasters whose sites link to you may or may not ever notice a problem with their link - it all depends on whether they ever check their sites for link-rot. The people who do notice will be the people who try to use those links, and being busy or seeking instant gratification, they may not take the time to report the expired link to the webmaster - they'll likely just try the next link in the list.
If you contact them, some webmasters will take action, some won't. Mark the updated sites off your list (consider sending a "thank-you" note, too). Wait a couple of months, then try again. After three tries, fuhgedabouddit...
I've had some success with this approach, reducing the number of incorrect links to 15% or so of the original number.
After this, you decide whether to continue with the redirection, or to just 303 or 410 the page as appropriate - My opinion of 404 is fairly well-known, I believe... ;)
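If you go the 410 route, it's one line per retired page - the 'gone' keyword makes Apache answer with a 410 status, no target url needed (path is hypothetical):
Redirect gone /old-page.html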
After having done this exercise a couple of times, one learns the advantage of creating a carefully-planned site structure, so that page name changes are rarely or never needed after the first few "ordeals". BTDT, got the T-shirt... ;)
HTH,
Jim
I'll probably be trashing a couple of hundred pages and would like to keep the .htaccess file as slim as possible, occasionally deleting old redirects. So, if there aren't any of the problems pendanticist encountered, about how long should I keep the redirects? Six months? A year?
I've only got a few, so it makes little difference, but until all links are changed and I see the transition complete at Google, Inktomi, FAST, Teoma and AltaVista, I'm not making the change.
about how long should I keep the redirects? Six months? A year?
As Marcia points out, you just watch your logs, and then you decide when to pull the plug based on the traffic levels you see on the old URLs.
One thing that may help pendanticist's primary problem of .htaccess bloat is to look for things the old URLs have in common; sometimes you have a pattern, like a defunct subdirectory, subject, etc. In that case, you can use RedirectMatch or a mod_rewrite regex pattern to redirect multiple pages with one directive.
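For example, if everything under a retired subdirectory had simply moved to a new one, a single directive could cover the lot (directory names and domain are hypothetical):
RedirectMatch permanent ^/oldsection/(.*)$ http://www.example.com/newsection/$1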
Jim
As Marcia points out, you just watch your logs, and then you decide when to pull the plug based on the traffic levels you see on the old URLs.
I just manually checked over 8,000 lines, and of those nearly 1,000 were 301s and six were actual page requests for the old file names. All the rest were bots. Seems like useless traffic to me.
One thing that may help pendanticist's primary problem of .htaccess bloat is to look for things the old URLs have in common; sometimes you have a pattern, like a defunct subdirectory, subject, etc. In that case, you can use RedirectMatch or a mod_rewrite regex pattern to redirect multiple pages with one directive.
Ah, a new learning curve :)
Would that be similar to putting multiple bots (all beginning with the same letter) on the same line continuously and then only starting a new line if there is another letter you wish to add?
Assuming we're still in .htaccess.
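Something like this is what I'm picturing - one line covering several user-agents via alternation (bot names made up):
SetEnvIfNoCase User-Agent "(GrabBot|GrubBot|GulpBot)" bad_bot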
Pendanticist.
1) Redirect widget1.html, widget2.html, etc. to newwidget.html using RedirectMatch:
RedirectMatch "^widget.?\.html" /newwidget.html
RedirectPermanent oldwidget /newwidget
RewriteRule ^(.*)/(red¦orange¦yellow¦green¦blue¦indigo¦violet)widget\.html$ /$2/$1widget.html [R=301,L]
What I mean is if you can accomplish multiple redirects using only one directive, do so. This is especially useful when you are trying to change the organization of your directories or to rename groups of pages which have common name-parts, rather than renaming a few unrelated pagenames here and there.
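For instance, had the only change been dropping the leading "1" from each page name, a single directive could have stood in for all 179 lines (hypothetical - pendanticist's new names changed more than that, so this wouldn't fit his actual list):
RedirectMatch permanent ^/1(.*)\.html$ http://www.dontclick.com/$1.html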
Jim
AOL switched from Ink to Google, Yahoo once switched from Ink to Google and is now buying Ink - who knows what will happen with others down the road.
Who knows, if they've still got the old URLs, how long it would take to get back in again. And maybe it will take paying to get in, in the future. We can never know when partnerships will shift, and we never know who will end up sending quality traffic down the road.
At this point the search terms involved are limited, they might not produce much at engines that now seem insignificant, and the volume is tiny - but I keep those 301s as insurance that I stay in those databases.