
Webmaster General Forum

    
Bots keep looking for redirected pages
Bingbot et al don't seem to understand 301
yaimapitu



 
Msg#: 4598611 posted 1:20 am on Aug 2, 2013 (gmt 0)

Hi,

in my ".htaccess" file I have a few rules of this type:

RewriteRule /linklist.html /links.html [R=301,L]

Bingbot and MSNBot (frequently) and Google (once in a while) keep looking for the file "linklist.html" (and the other old files), although the rules have been in place for several months. One of the pages redirected in this manner even shows up in the search engine results still!

Any explanation?
TIA!

 

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4598611 posted 1:54 am on Aug 2, 2013 (gmt 0)

in my ".htaccess" file I have a few rules of this type:

RewriteRule /linklist.html

... where the full requested URL is
www.example.com/one-or-more-directories/linklist.html
?
Otherwise the pattern would never match in htaccess. Have you verified that the redirect works as intended when you request the page manually? Does bing also ask for the new URL?
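To illustrate the point about the pattern: in a per-directory (.htaccess) context, mod_rewrite strips the leading slash before matching, so a pattern that begins with "/" cannot match a file at the site root. A minimal sketch, assuming the .htaccess file sits in the document root and the old file lived at the root as well (both are assumptions; the original post does not say):

```apache
RewriteEngine On

# In .htaccess the path is matched without its leading slash,
# so the pattern starts at "linklist", not "/linklist".
# The anchors ^ and $ keep it from matching inside longer paths.
RewriteRule ^linklist\.html$ /links.html [R=301,L]
```

With the unanchored pattern from the original post, the rule would only fire for requests such as /some-directory/linklist.html, which is what the question above is getting at.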

Robots never completely stop asking for redirected pages. They just slow down.

The bingbot seems to behave quite differently from the googlebot when it comes to non-200 responses. I've got a slew of pages that have returned 410 for a year or more. Bing still asks for them regularly; Google never.

The bingbot also is far more likely to ask for pages with the "wrong" form of the domain name (with/without www). Maybe it's doing it on purpose to verify that the redirect is still in place.

Hm. A thought, there. Maybe it really doesn't care about the page at all. What it cares about is seeing that the redirect is in place, meaning that the site is properly maintained.

yaimapitu



 
Msg#: 4598611 posted 5:24 am on Aug 2, 2013 (gmt 0)

Here's some additional information:

Quoting lucy24:
Have you verified that the redirect works as intended when you request the page manually?


Yes, all of these rewrite operations have been working for years. :-)

Does bing also ask for the new URL?


Very occasionally... it appears to request any new page soon after it has encountered a new 301 pointing to that page, but after a while it no longer seems to follow those 301s directly. It checks both old and new file names, but in no particular connection with each other, time-wise...

Robots never completely stop asking for redirected pages. They just slow down.


... which leaves me curious as to the reason: 301 is about as definite as it gets, so why the bother?

The bingbot seems to behave quite differently from the googlebot when it comes to non-200 responses. I've got a slew of pages that have returned 410 for a year or more. Bing still asks for them regularly; Google never.


Yes, Bingbot is the main perpetrator. Google does not seem to be checking my sites for old 410s, but occasionally keeps looking for old 301s...

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4598611 posted 5:49 am on Aug 2, 2013 (gmt 0)

It seems the bot is programmed quite differently to how Google does things.

Serving the 301 response is lightweight on server resources so I wouldn't worry too much about it.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4598611 posted 5:51 am on Aug 2, 2013 (gmt 0)

RewriteRule /linklist.html /links.html [R=301,L]

Of course, your code above redirects www to www and non-www to non-www. I would add protocol and hostname to the target so that both requests end up in the same place.

I would also escape the literal period in the rule pattern.

Make sure that all rules that are like that one are listed before your site-wide canonical non-www/www redirect.
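The three suggestions above could be combined roughly as follows. This is only a sketch: the hostname www.example.com, the choice of www as the canonical form, and the http scheme are assumptions for illustration, not details from the original post.

```apache
RewriteEngine On

# Page-specific redirects come first. The full protocol and hostname
# in the target send both www and non-www requests to the same URL
# in a single hop. The period in the pattern is escaped.
RewriteRule ^linklist\.html$ http://www.example.com/links.html [R=301,L]

# Site-wide canonical hostname redirect comes last
# (assumed here: www is the canonical form).
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

If the order were reversed, a non-www request for the old page would first be redirected to the www form of the old URL and only then to the new page, producing a chain of two redirects instead of one.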

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4598611 posted 5:55 am on Aug 2, 2013 (gmt 0)

in no particular connection with each other, time-wise...

That's normal for a robot. It handles a redirect the same as a newly discovered link: the new URL goes on a shopping list for later. The old URL is logged as "This URL redirects to such-and-such".

It makes sense if you look at it from the other side. The new URL doesn't carry a built-in tag that says "This page was formerly known as such-and-such". So its existence doesn't really give any information about the status of the old URL. True, most robots do get the hint after receiving the identical redirect eighty-seven times in a row. But remember, this is the same bingbot that will check your robots.txt fifty times a day on the off chance that something might have changed :)

yaimapitu



 
Msg#: 4598611 posted 6:30 am on Aug 2, 2013 (gmt 0)

Quoting g1smd:
I would add protocol and hostname to the target so that both requests end up in the same place.


In my case the nameserver settings take care of that, which is why I can ignore it here... :)

I would also escape the literal period in the rule pattern.


Yes, I do that with sites that have more than just a handful of pages or that have a BBS / Forum system installed...

Quoting lucy24:
this is the same bingbot that will check your robots.txt fifty times a day


Don't get me started on that topic... ;)

yaimapitu



 
Msg#: 4598611 posted 2:22 am on Aug 4, 2013 (gmt 0)

A follow-up note:

As a test I have changed 301s to 410s on some sites - will see whether Bingbot and the like will take that hint any better...
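For reference, a 410 response can be produced in mod_rewrite with the G flag. A minimal sketch, assuming the same root-level file name used earlier in the thread (the exact rules used in the test are not shown in the post):

```apache
RewriteEngine On

# "-" means no substitution; the G flag makes the server
# answer 410 Gone for any request matching the pattern.
RewriteRule ^linklist\.html$ - [G]
```

Unlike the 301, this gives the client no new location, so anyone following an old link gets an error page rather than the replacement page.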

JS_Harris

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4598611 posted 1:17 pm on Aug 10, 2013 (gmt 0)

That will effectively cut off any visitor who happens to click on a link to the old pages, and it will cut off any link equity you had coming in. Is that what you want? You have no incoming links to the old URLs? An incoming link would also explain why Google requests the page more frequently.

yaimapitu



 
Msg#: 4598611 posted 11:54 pm on Aug 10, 2013 (gmt 0)

Thanks for the comments.

Those pages that Bing and others keep requesting (mostly Bing, the others do it rather infrequently) belong to various "first versions" of websites that I had just created from scratch (the associated domain names are recent new registrations that have never been used before). I had changed those page names within a few days and provided 301 redirects. So there is no equity and there are no legitimate links to those page names.

There are, of course, certain domain information services that monitor the reports coming from the registrars and start scouring new sites within a day of them being registered (and Google, in their near infinite wisdom, even places those services ahead of the associated sites themselves in the search results!), but to my knowledge those services do not provide links to individual pages, just the domain roots.

My next research will be to explicitly disallow all those old pages in robots.txt and see whether that makes a difference to Bing. :)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved