|in my ".htaccess" file I have a few rules of this type: |
... where the full requested URL is
Otherwise the pattern would never match in htaccess. Have you verified that the redirect works as intended when you request the page manually? Does bing also ask for the new URL?
Robots never completely stop asking for redirected pages. They just slow down.
The bingbot seems to behave quite differently from the googlebot when it comes to non-200 responses. I've got a slew of pages that have returned 410 for a year or more. Bing still asks for them regularly; Google never.
The bingbot also is far more likely to ask for pages with the "wrong" form of the domain name (with/without www). Maybe it's doing it on purpose to verify that the redirect is still in place.
Hm. A thought, there. Maybe it really doesn't care about the page at all. What it cares about is seeing that the redirect is in place, meaning that the site is properly maintained.
Here's some additional information:
|Have you verified that the redirect works as intended when you request the page manually? |
Yes, all of these rewrite operations have been working for years. :-)
|Does bing also ask for the new URL? |
Very occasionally... it appears to request a new page soon after it has encountered a new 301 pointing to that page, but after a while it no longer follows those 301s directly, meaning it checks both the old and the new file names, but in no particular connection with each other, time-wise...
|Robots never completely stop asking for redirected pages. They just slow down. |
... which leaves me curious as to the reason: 301 is about as definite as it gets, so why the bother?
|The bingbot seems to behave quite differently from the googlebot when it comes to non-200 responses. I've got a slew of pages that have returned 410 for a year or more. Bing still asks for them regularly; Google never. |
Yes, Bingbot is the main perpetrator. Google does not seem to be checking my sites for old 410s, but occasionally keeps looking for old 301s...
It seems the bot is programmed quite differently from how Google does things.
Serving the 301 response is lightweight on server resources so I wouldn't worry too much about it.
|RewriteRule /linklist.html /links.html [R=301,L] |
Of course, your code above redirects www to www and non-www to non-www. I would add protocol and hostname to the target so that both requests end up in the same place.
I would also escape the literal period in the rule pattern.
Make sure that all rules that are like that one are listed before your site-wide canonical non-www/www redirect.
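Putting those three suggestions together, a sketch of how the rules might be ordered (example.com stands in for the real hostname, which isn't shown in the thread):

```apache
# Page-specific redirect first: anchored pattern, escaped literal dot,
# absolute target so www and non-www requests both land in one place.
# Note: in per-directory (.htaccess) context the pattern has no leading slash.
RewriteRule ^linklist\.html$ https://www.example.com/links.html [R=301,L]

# Site-wide canonical www redirect listed afterwards
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ https://www.example.com/$1 [R=301,L]
```

With the page-specific rule first, an old-URL request gets a single 301 straight to the canonical target instead of a redirect chain.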
|in no particular connection with each other, time-wise... |
That's normal for a robot. It handles a redirect the same as a newly discovered link: the new URL goes on a shopping list for later. The old URL is logged as "This URL redirects to such-and-such".
It makes sense if you look at it from the other side. The new URL doesn't carry a built-in tag that says "This page was formerly known as such-and-such". So its existence doesn't really give any information about the status of the old URL. True, most robots do get the hint after receiving the identical redirect eighty-seven times in a row. But remember, this is the same bingbot that will check your robots.txt fifty times a day on the off chance that something might have changed :)
|I would add protocol and hostname to the target so that both requests end up in the same place. |
In my case the nameserver settings take care of that, so I can ignore it here... :)
|I would also escape the literal period in the rule pattern. |
Yes, I do that with sites that have more than just a handful of pages or that have a BBS / Forum system installed...
|this is the same bingbot that will check your robots.txt fifty times a day |
Don't get me started on that topic... ;)
A follow up note:
As a test I have changed 301s to 410s on some sites - will see whether Bingbot and the like will take that hint any better...
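For reference, one way to make that swap in .htaccess is mod_rewrite's G flag, shown here with the filename from the earlier example:

```apache
# Instead of: RewriteRule ^linklist\.html$ /links.html [R=301,L]
# the G flag returns 410 Gone (no target needed; G implies L)
RewriteRule ^linklist\.html$ - [G]
```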
That will effectively cut off any visitor who happens to click on a link to the old pages, and it will cut off any link equity you had coming in. Is that what you want? Do you have no incoming links to the old URLs? An incoming link would also explain why Google requests the page more frequently.
Thanks for the comments.
Those pages that Bing and others keep requesting (mostly Bing, the others do it rather infrequently) belong to various "first versions" of websites that I had just created from scratch (the associated domain names are recent new registrations that have never been used before). I had changed those page names within a few days and provided 301 redirects. So there is no equity and there are no legitimate links to those page names.
There are, of course, certain domain information services who monitor the reports coming from the registrars and start scouring new sites within a day of them being registered (and Google, in its near infinite wisdom, even places those services ahead of the associated sites themselves in the search results!), but to my knowledge those services do not provide links to individual pages, just the domain roots.
My next research will be to explicitly disallow all those old pages in robots.txt and see whether that makes a difference to Bing. :)
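The disallow lines for that test might look like this (the path is a placeholder taken from the earlier example; the real old page names aren't listed in the thread):

```
User-agent: *
Disallow: /linklist.html
```

A bot that honors this should stop requesting the path entirely, though it will then never see the 301 or 410 for it either.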