Msg#: 4404350 posted 6:44 am on Jan 6, 2012 (gmt 0)
As my site evolved, I 301-redirected a number of pages that, naturally, had inbound links known to Bing.
In Bing's Index Explorer, I can see that many of these old URLs are still listed.
If I click the "Block Directory and Cache" button for those pages or directories, will I hurt the new page's ranking by removing those inbound (now redirected) links? Or will I help it by confirming that the old pages should no longer be indexed and that the link love should flow to the new page?
Msg#: 4404350 posted 9:32 am on Jan 6, 2012 (gmt 0)
I suspect it's safer not to do anything. If you block the old URL, Bing may decide that you're trying to block the new URL too. But while you're there, take a closer look at their list of 301s and make sure nothing still links to any of the old URLs. They'll never stop crawling an old URL if it keeps getting reinforced by links pointing to it rather than to the new URL.
They'll probably never stop crawling the old URL anyway, but the crawling will slow down over time.
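For the internal half of that check, here is a minimal Python sketch that scans your own pages for links still pointing at redirected paths. It assumes you have the pages' HTML locally and a hand-maintained set of old paths copied from your redirect rules; `OLD_PATHS`, `find_links_to_old_urls` and `example.com` are hypothetical names for illustration. It cannot see external links pointing at the old URLs; those have to be checked some other way.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical set of paths that now answer with a 301 (copy them from your redirect rules).
OLD_PATHS = {"/old/widgets.html", "/old/gadgets.html"}


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_links_to_old_urls(pages, old_paths, site_host):
    """pages maps each page's path to its HTML; returns (page, href) pairs that hit an old path."""
    hits = []
    for page_path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.hrefs:
            # Resolve relative links against the page's own path before comparing.
            target = urlparse(urljoin(page_path, href))
            # Links to other sites are not ours to fix here.
            if target.netloc and target.netloc != site_host:
                continue
            if target.path in old_paths:
                hits.append((page_path, href))
    return hits


pages = {
    "/index.html": '<a href="/old/widgets.html">W</a> <a href="new/gadgets.html">G</a>',
    "/about.html": '<a href="http://example.com/old/gadgets.html">G</a>'
                   '<a href="http://other.com/old/widgets.html">X</a>',
}
for page, href in find_links_to_old_urls(pages, OLD_PATHS, "example.com"):
    print(f"{page} still links to {href}")
```

Host matching is exact, so if your site answers on both `www.example.com` and `example.com` you would need to treat both as your own.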
Incidentally, if you have a simple way of crunching the numbers, those crawls of old URLs are a terrific way to see which pages the search engines consider important. I moved a batch of pages to a new directory a few months ago, and the crawl rate varies strikingly from one page to another.
Msg#: 4404350 posted 9:49 am on Jan 6, 2012 (gmt 0)
Thanks, Lucy. Interesting insight. I'll need to do some pretty tricky splunking to work out the counts. In your case, do you mean that the crawl rate was high on certain old URLs but lower on their new equivalents?
I have never really dug into the crawl rates for specific pages. That does sound interesting. - Peter
Msg#: 4404350 posted 8:43 pm on Jan 6, 2012 (gmt 0)
I've been pulling out all the 301s from my logs to see when I can start pruning the .htaccess file. (Answer: probably never for HTML files, but search engines do learn pretty fast when you've moved or renamed an image or, ahem, mistyped a link.)
When I then group them by filename, I can see that the search engines ask for some files more often than others. These are files that were all moved at the same time and are all modified at about the same rate, so there's no other variable to explain the difference.
In fact, the most striking case I've found is a group of files that are no longer being crawled at all, because I moved them to a no-robots directory. Every couple of months Google asks for an old address just as a formality, but there's one particular file that it requests about five times as often as all the others. There must have been something in that one file that triggered a lot more search-engine hits.
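For anyone wanting to try the same counting, here is a minimal Python sketch of "pull out the 301s and group them by filename". It assumes an Apache access log in the standard "combined" format and picks out crawlers by a crude user-agent substring match; `BOT_MARKERS` and `tally_redirects` are hypothetical names, and the marker list is only an example. Along with the counts it records when each file was last requested, which is the number you would watch before pruning a redirect rule.

```python
import re
from collections import Counter
from datetime import datetime

# Apache "combined" format: host ident user [time] "request" status size "referer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical user-agent substrings used to recognise search-engine crawlers.
BOT_MARKERS = ("Googlebot", "bingbot", "msnbot")


def tally_redirects(lines):
    """Count crawler requests answered with a 301, grouped by filename, plus each file's last hit."""
    counts = Counter()
    last_seen = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "301":
            continue
        if not any(bot in m.group("agent") for bot in BOT_MARKERS):
            continue
        # Group by filename only: drop the query string and the directory part.
        path = m.group("path").split("?", 1)[0]
        filename = path.rsplit("/", 1)[-1] or path  # keep the full path for directory URLs
        when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        counts[filename] += 1
        if filename not in last_seen or when > last_seen[filename]:
            last_seen[filename] = when
    return counts, last_seen


GOOGLE = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
BING = '"Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"'
sample = [
    f'66.249.66.1 - - [06/Jan/2012:06:44:00 +0000] "GET /old/widgets.html HTTP/1.1" 301 245 "-" {GOOGLE}',
    f'157.55.39.1 - - [06/Jan/2012:09:32:10 +0000] "GET /old/widgets.html HTTP/1.1" 301 245 "-" {BING}',
    f'66.249.66.1 - - [06/Jan/2012:20:43:05 +0000] "GET /old/gadgets.html?ref=x HTTP/1.1" 301 245 "-" {GOOGLE}',
    '10.0.0.5 - - [06/Jan/2012:21:00:00 +0000] "GET /old/widgets.html HTTP/1.1" 301 245 "-" "Mozilla/5.0"',
    f'66.249.66.1 - - [06/Jan/2012:21:05:00 +0000] "GET /new/widgets.html HTTP/1.1" 200 5120 "-" {GOOGLE}',
]
counts, last_seen = tally_redirects(sample)
for name, n in counts.most_common():
    print(name, n, last_seen[name].date())
```

Grouping by bare filename deliberately merges files that share a name across directories, which matches the "group them by filename" approach above; key on the full `path` instead if that is not what you want. A file whose last hit keeps receding into the past is the kind of redirect that might eventually be pruned.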