Forum Moderators: DixonJones


Redirect permanent

How long do they need to remain before the bots get a clue?

         

pendanticist

1:25 pm on Jan 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Greetings,

My latest project started out looking at the possibilities of trimming the fat out of my hefty 27,875 byte .htaccess file.

17,494 of those bytes are 179 lines representing old indices whose file names needed changing, which I handled using 'Redirect permanent (Rp)'.

(To my best recollection, I've had these Rps up for maybe three, four months.)

Looking thru my access log files, I notice something that has been nagging me for some time now and I'm gonna ask.

As of a few minutes ago, covering [30/Dec/2002:03:32:47 -0800] to [01/Jan/2003:03:49:57 -0800]:

  • Gigabot/1.0 is still requesting those old files (as are all the Google bots).

  • Fast ditto

  • Ask Jeeves/Teoma ditto

  • inktomisearch ditto

    I'm thinking slurp read 'em last week too.

  • Any ideas why bots/spiders can't understand what they read and delete the old from their current task?

    Are they (or anyone/thing else) not capable of modifying the tasks?

    Seems like it'd save wear and tear. <shrug>

    I'd really like to make some space available in my access log files for banning and such. Time to expand my knowledge base, so-to-speak.

    The premise was to have an even flow of data transfer, not to have that transfer repeated over and over, and over, and over, and over, and over, and over, and over...

    Anyway, lest I digress, how long must I leave the Rps up? For that matter, how important are they if all the bots ever do is clutter up my access log files with repeated requests?

    Thanks.

    Pendanticist.

    bcc1234

    1:54 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Are there any links to those urls?
    If there is a link from another site to the url that returns a redirect - it might be requested over and over just to make sure it still returns the same redirect.

    HarryM

    2:52 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I'm no expert on htaccess, but I have a couple of suggestions.

    The SE needs to see a 301 code and to be made aware the file has moved. For this to happen the redirection url has to be a full url, rather than a relative url.

    Redirect 301 old-relative-url new-full-url
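
For instance, a hedged sketch using a placeholder domain (both file names here are made up):

```apache
# The old path is site-relative; the target is a full URL, so the
# crawler gets an absolute Location: header along with the 301.
# Domain and file names below are placeholders.
Redirect 301 /old-page.html http://www.example.com/new-page.html
```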

    Do the SEs list your page under the old name or the new one?

    pendanticist

    4:15 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Are there any links to those urls?
    If there is a link from another site to the url that returns a redirect - it might be requested over and over just to make sure it still returns the same redirect.

    This is snippet of what I have - minus root url and http:

    Redirect permanent /1ABTribes-Councils.html www.dontclick.com/Aboriginal_Tribes-Councils_A-O.html
    Redirect permanent /1ABTribes-Councils-2.html www.dontclick.com/Aboriginal_Tribes-Councils_P-Z.html
    Redirect permanent /1AborWor.html www.dontclick.com/Aboriginal_International.html
    Redirect permanent /1Acct.html www.dontclick.com/Accounting.html
    Redirect permanent /1AcctForensic.html www.dontclick.com/Accounting_Forensic.html
    Redirect permanent /1Agric.html www.dontclick.com/Agriculture.html
    Redirect permanent /1AnimalRights.html www.dontclick.com/Animal_Rights.html
    Redirect permanent /1Anthro.html www.dontclick.com/Anthropology.html

    Ex: I changed the old index "1AnimalRights.html" to "Animal_Rights.html", "1Anthro.html" to "Anthropology.html", and so on for clarity.

    There are roughly 160 additional entries.

    The SE needs to see a 301 code and to be made aware the file has moved. For this to happen the redirection url has to be a full url, rather than a relative url.

    Redirect 301 old-relative-url new-full-url

    Relative - meaning the shorter, old file path, and new-full-url - meaning http://blahblah.com/Final_Destination_File.html?

    Do the SEs list your page under the old name or the new one?

    That's the gist of my question: both.

    The initial page request is always for the 'old' file - then it re-directs (via 301) to the new full destination url.

    One would think the bots woulda/coulda/shoulda (don't know if this is the right term to use here) uh, 'amended' this by now, saving server run time as well as the bloat in my access_log files. :o

    I'll check back in later today.

    I appreciate the response folks. :)

    Pendanticist.

    jdMorgan

    5:24 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    pendanticist,

    It is likely that the SEs have updated their databases, but are still finding links on other Web sites which link to your old page URLs. So, since there is a possibility that those old pages might have been resurrected, they have to check again (and get the 301 response) to be sure.

    You might want to dig through your backlinks on Fast to see how many point to old page URLs - Google won't show the lower-PR backlinks.

    NB: Gigabot is not a Google robot. It belongs to GigaBlast.com

    Happy New Year!
    Jim

    pendanticist

    7:39 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    It is likely that the SEs have updated their databases, but are still finding links on other Web sites which link to your old page URLs. So, since there is a possibility that those old pages might have been resurrected, they have to check again (and get the 301 response) to be sure.

    You might want to dig through your backlinks on Fast to see how many point to old page URLs - Google won't show the lower-PR backlinks.

    Oh, I know there are some out there, just too many to do manually.

    The intended purpose for doing this was to minimize the potential SE and viewer losses. Lessen the impact, so-to-speak.

    In this regard, would I have been better to let all requests (for old files) go out to my custom 404 handler, rather than re-directing in the first place?

    The webmaster who has my link up would be more likely to notice the 404 than the redirect anyway, don't you think?

    Feels Catch-22ish.

    NB: Gigabot is not a Google robot. It belongs to GigaBlast.com

    I wuz just lumpin all the 'Gs' tagether on the same line, Vern :)

    Happy New Year!

    Thanks. And the Very Same to You and Yours!

    Pendanticist.

    jdMorgan

    8:40 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I dunno, I think the best way to not lose traffic is to check the back-links of the moved/expired pages, make a list, look up the e-mail addresses of the sites which link to them (or try "webmaster@domain_name.tld"), and send them a change request. Best results will be had if you send them the exact URL of their page, and the old link and link text. Use the opportunity to ask for better link text, if needed, along with the new link URL.

    Without a notice, the webmasters whose sites link to you may or may not ever notice a problem with their link - it all depends on whether they ever check their sites for link-rot. The people who do notice will be the people who try to use those links, and being busy or seeking instant gratification, they may not take the time to report the expired link to the webmaster - they'll likely just try the next link in the list.

    If you contact them, some webmasters will take action, some won't. Mark the updated sites off your list (consider sending a "thank-you" note, too). Wait a couple of months, then try again. After three tries, fuhgedabouddit...

    I've had some success with this approach, reducing the number of incorrect links to 15% or so of the original number.

    After this, you decide whether to continue with the redirection, or to just 303 or 410 the page as appropriate - My opinion of 404 is fairly well-known, I believe... ;)
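
For the 410 option, mod_alias has a shorthand; a hedged sketch (the page name is hypothetical):

```apache
# "gone" sends HTTP 410 - no target URL is given, since the page
# has no replacement. The path below is a placeholder.
Redirect gone /retired-page.html
```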

    After having done this exercise a couple of times, one learns the advantage of creating a carefully-planned site structure, so that page name changes are rarely or never needed after the first few "ordeals". BTDT, got the T-shirt... ;)

    HTH,
    Jim

    jimbeetle

    10:03 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Planning to do a major cleanup/cleanout this month so this thread is very timely.

    I'll probably be trashing a couple of hundred pages and would like to keep the .htaccess file as slim as possible, occasionally deleting old redirects. So, if there aren't any of the problems pendanticist encountered, about how long should I keep the redirects? Six months? A year?

    Marcia

    10:15 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I've got listings in Ink with both the new URL and the old; it's not over even though the 301 went up at the end of July. Also, Teoma/Ask is still using the old so I don't see any sense in giving that up.

    I've only got a few so it makes little difference, but until all links are changed and I see the transition complete at Google, Inktomi, FAST, Teoma and Alta Vista I'm not making the change.

    jdMorgan

    11:49 pm on Jan 1, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    about how long should I keep the redirects? Six months? A year?

    As Marcia points out, you just watch your logs, and then you decide when to pull the plug based on the traffic levels you see on the old URLs.

    One thing that may help pendanticist's primary problem of .htaccess bloat is to look for things the old URLs have in common; Sometimes you have a pattern, like a defunct subdirectory, subject, etc. In that case, you can use RedirectMatch or a mod_rewrite regex pattern to redirect multiple pages with one directive.
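
A hedged sketch of the defunct-subdirectory case (the directory and domain names are placeholders, not pendanticist's actual paths):

```apache
# One pattern rule replaces dozens of per-page Redirect lines when
# the old URLs share a common prefix. $1 carries the old file name over.
RedirectMatch permanent "^/oldwidget/(.*)$" http://www.example.com/newwidget/$1
```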

    Jim

    pendanticist

    4:51 am on Jan 2, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    As Marcia points out, you just watch your logs, and then you decide when to pull the plug based on the traffic levels you see on the old URLs.

    I just manually checked over 8,000 lines; of those, nearly 1,000 were 301s, and six were actual page requests for the old file names. All the rest were bots. Seems like useless traffic to me.

    One thing that may help pendanticist's primary problem of .htaccess bloat is to look for things the old URLs have in common; Sometimes you have a pattern, like a defunct subdirectory, subject, etc. In that case, you can use RedirectMatch or a mod_rewrite regex pattern to redirect multiple pages with one directive.

    Ah, a new learning curve :)

    Would that be similar to putting multiple bots (all beginning with the same letter) on the same line continuously and then only starting a new line if there is another letter you wish to add?

    Assuming we're still in .htaccess.

    Pendanticist.

    jdMorgan

    5:30 am on Jan 2, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Well, sort of, and no...

    1) Redirect widget1.html, widget2.html, etc. to newwidget.html using RedirectMatch:


    RedirectMatch permanent "^/widget.?\.html$" /newwidget.html

    2) Redirect /oldwidget directory (with a ton of widget product pages in it) to /newwidget directory

    RedirectPermanent /oldwidget /newwidget

    3) Redirect multiple old pagenames to new pagenames (but with common name elements, here used to support a new directory organization by color, assuming all URLs started with "<texture>/<color>widget"):

    RewriteRule ^(.*)/(red|orange|yellow|green|blue|indigo|violet)widget\.html$ /$2/$1widget.html [R=301,L]

    This changes the organization of the page directories from a texture-first structure to a color-first structure (based on the results of recent marketing analysis of fuzzy blue widget sales). The colors are included in the pattern-match so that the rule only swaps texture and color if necessary - otherwise, it would rewrite old URLs to new URLs and vice-versa!

    What I mean is if you can accomplish multiple redirects using only one directive, do so. This is especially useful when you are trying to change the organization of your directories or to rename groups of pages which have common name-parts, rather than renaming a few unrelated pagenames here and there.

    Jim

    Marcia

    6:05 am on Jan 2, 2003 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I probably should say why I won't remove them. It's not because of logs, there's 1 MSN hit for every 9 Google, with the others not worth much of a mention right now.

    AOL switched from Ink to Google, Yahoo once switched from Ink to Google and is buying Ink, who knows what will happen with others down the road.

    Who knows, if they've still got the old, how long it would take to get in again. And maybe it will take paying to get in, in the future. We can never know when partnerships will shift, and we never know who will end up sending quality traffic down the road.

    At this point, though they're for limited search terms that might not produce much at search engines that now seem insignificant, and the volume is tiny, I keep those 301s as insurance that I stay in those databases.