| 9:22 pm on Jul 22, 2009 (gmt 0)|
Hello Kris, and welcome to the forums.
Are these urls still showing in the search results? That should not be the case if your 301 redirects are working properly and the server is sending a 301 status in the http header.
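If you want to verify that header yourself, something like the sketch below would do it. It is only an illustration in Python: the standard library's `http.client` does not follow redirects on its own, so you see the raw status the server sends. The function names `head_status` and `describe_status` are made up for this example.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Send a HEAD request and return (status, Location header).

    http.client never follows redirects, so a 301 is reported as 301
    instead of the status of the page it points to.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe_status(status):
    """Give a short label for the status codes discussed in this thread."""
    labels = {
        301: "permanent redirect",
        302: "temporary redirect",
        404: "not found",
        410: "gone",
    }
    return labels.get(status, "other")
```

Calling `head_status()` on one of the redirected urls should return 301 along with the redirect target if the server is set up the way you intend. A 302 or a 200 there would mean the redirect is not doing what you think it is.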
Another question is whether the urls involved are actually redirected to good content - content that is actually a viable replacement for the original content. If so, leave the 301 redirect. If not, change to 404.
In either case, the url should fall from the Google search results without you doing anything. Google may have been a bit slow on automated removal of 301 and 404 urls recently, since they were involved in an apparently intensive change to the algo.
| 10:12 pm on Jul 22, 2009 (gmt 0)|
Hello Tedster and thank you for the quick response.
Let me try to explain the mess we have made for ourselves and why we implemented 301’s and hopefully will answer your questions as well.
First off we have an old site (about 9 years) which contains both a content side and a store side. On the store side, we decided to make some adjustments to the “flow” and it is something we now regret.
What we had for approximately 8 years and was doing just fine:
What we changed to:
The 2 new ones you see were now above all the old directories which were nicely ranked and quite well placed. We were very stupid and did not think out the implications of what seemed like such a small change. Really, it was quite huge. We had just completely redefined the flow of the store. *ouch* So, the bot rearranged all the old categories under these new pages with no rank and everything below it suffered. We waited and waited... and waited and it seemed that things were just not going to recover and in fact, things were just getting worse. The more that it realized was now under these new pages, the more things would sink.
We thought that the new way would make more sense to site visitors. As it turns out, it really didn’t make any improvement for visitors. We asked many of them, and the consensus was that the original way was just fine. So we are now back to the first way.
So, we decided we no longer wanted the new top level departments and brands and wanted to revert back to our old way which ranked perfectly fine.
So we took:
and 301 redirected them all to:
This seemed the most logical place to redirect them to. Maybe we should have just done a 404 on all of them....
To your questions:
1: Are they showing in the search results? Well, they never did all that well as they were new so the very few we did find, are either falling or dropping out. I don’t think the bot has done a deep enough crawl to figure out yet that all these pages are now 301. They are still in the “site:example.com” results and so far only a small handful of the approximately 250 have in fact disappeared.
2: Are they showing 301 status in the http header? Yup, sure are. ;)
3: We are redirecting them all to www.example.com/store.html. There are approximately 250 of these department and brand pages that we are letting go of.
One thing to know, the instant we reverted back to the old way, we started seeing some sales flowing back in again. We were feeling quite encouraged. The new departments and brands started falling in the “site:example.com” results and moving the old ones back up in the ranks. Then, about a week or so after that, those darn top level departments and brands started creeping back up in the “site:example.com” results and pushing the old ones back down! At the same time, sales started tapering off again. This is why I am writing and asking if maybe we should be doing 404’s or even requesting the deletion of these pages in the Webmaster Tools. We sure made a mess and we are well aware of it. We know the bot must be utterly confused by this mess so we want to figure out the best way to release these older (good) pages from the wrath of these new top level departments and brands pages that we have 301’ed. They seem to be holding them down and until the bot figures out what is attached to what, I don’t think we can start moving forward again. As long as the bot seems to think they exist, we seem to have a problem.
Lesson learned for us, never move nicely ranked and nicely positioned pages BELOW new pages. All the good stuff seems to have lost its “oomph”!
I hope this all makes sense... and thank you again.
P.S. This has been going on for about the last 3 months and the content side is OK.
Bet you didn't expect a novel back. ;)
[edited by: tedster at 10:21 pm (utc) on July 22, 2009]
[edit reason] switch to example.com - it can never be owned [/edit]
| 10:32 pm on Jul 22, 2009 (gmt 0)|
I would go with the 404 (or even more correctly, 410) and let things sort out naturally. The home page was not really a replacement for the content you removed. And I wouldn't suggest any removal requests, especially since the removed urls are not showing up in the SERPs anyway. One typo in your request and you could create a further problem.
When you make a previous url go 404 or 410, googlebot will continue to request it (with decreasing frequency) just to be sure you haven't changed your mind. If that is problematic for you, you can always disallow the pattern in your robots.txt file.
|Lesson learned for us, never move nicely ranked and nicely positioned pages |
I'm with you! And I can tell you that, unfortunately, many other businesses have also learned this lesson the hard way.
It sounds like what you did was disrupt the internal link flow - is that correct?
| 10:46 pm on Jul 22, 2009 (gmt 0)|
Thank you again for such a quick reply!
Yes, that is 100% correct. The link flow for the store was completely disrupted from this change. In our heads before we did it, it didn't seem so disruptive. Hindsight says, "good lord, what were we thinking..."
Ok, so you suggest 404 or 410. Even if the bot requests the page, as long as it is a 404 or 410 I am assuming it will stop being part of the disruption? As soon as it knows it is a 404 or 410 maybe it will just be set aside so the rest of the store can sort out? Am I understanding that correctly? I don't want them to linger around as top level 404 pages and just hold the rest of the store under their wrath.
Our logic was that if they were 301, the bot would hit it, then redirect and dump. 404 concerned us as a longer waiting period but we know we could be totally wrong. After all, look at the mess we just made!
I hope maybe at least one business out there reads this before making such a change so they don't have to also learn this lesson the hard way like us and many others. We feel so stupid.
P.S. Have businesses been known to recover from such link flow disruption? When they fixed the problem, were they able to get their standings back? The bot liked our store before we made the mess, I am hoping it will like it again once it's all fixed.
[edited by: KrisE at 10:56 pm (utc) on July 22, 2009]
| 10:55 pm on Jul 22, 2009 (gmt 0)|
In your analysis, don't confuse the three parts of the process - spidering, indexing, and ranking.
You just want to make sure that all the original click paths are back in place, from the top level pages on down.
- The spider (bot) just requests a url and stores its content in Google's back end crawl cache (not the cache you see in the SERPs).
- Then the content gets indexed. The indexing process takes the url's content and tags and shards and stores it away in a bazillion different places.
- Now the ranking algo gets access to all that data.
| 10:59 pm on Jul 22, 2009 (gmt 0)|
Ah ha... Ok, so since the click paths are now back to normal, that is the path it will pay attention to and as it keeps finding the old pages back in place, they will just slowly move back in place. That is once all information is spidered, indexed and then ranked.
Ok, so we just decide on 404 vs 410. 410 sounds like it might be more logical.
Thank you. :)
| 11:05 pm on Jul 22, 2009 (gmt 0)|
The click paths will be highly important for the RANKING algo - PageRank assignment and so on. That's more the issue than any crawling factor.
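To make "click path" concrete, here is a rough sketch in Python. It counts the fewest clicks needed to reach each page from the store's entry page, before and after a new layer of pages is placed on top. The page names are hypothetical, and click depth is only a simple proxy for internal link flow; this is not how Google calculates PageRank.

```python
from collections import deque


def click_depth(links, start):
    """Breadth-first search: fewest clicks from `start` to each reachable page."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical store layout before the change.
old_structure = {
    "store": ["category-a", "category-b"],
    "category-a": ["product-1"],
    "category-b": ["product-2"],
}

# Same store with new top-level pages inserted above the old categories.
new_structure = {
    "store": ["departments", "brands"],
    "departments": ["category-a", "category-b"],
    "brands": ["category-a"],
    "category-a": ["product-1"],
    "category-b": ["product-2"],
}

before = click_depth(old_structure, "store")
after = click_depth(new_structure, "store")

# Show how far each original page moved from the entry page.
for page in sorted(before):
    print(page, before[page], "->", after[page])
```

In this toy layout every old category and product ends up one click further from the entry page, which is the kind of shift being described here.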
Spidering these days is often not done by the old-fashioned "crawl the links" approach. Instead, the crawl team has an algo that assigns a "crawl budget" as well as a list of previously identified urls to request.
It is also possible to knock a url into the supplemental index if you remove too much of its internal backlink support. Then it would see much less frequent spidering, because its indexing had changed.
[edited by: tedster at 11:18 pm (utc) on July 22, 2009]
| 11:16 pm on Jul 22, 2009 (gmt 0)|
Ok, great... Information like this does really seem to confirm the disaster we created. Very good to know. By what you said, since we moved the click path down for many of our pages, we could have inadvertently moved them closer to supplemental, which has slowed the spider process, which could be why it's seeming so slow to get this sorted back out. Many factors involved but now that the click path is back in place, it sounds like it is just a matter of time.
Switching the 301's to either logical 301's or changing them completely to 404's or 410's is just to get rid of the pages at some point.
A huge chunk of these pages are still cached from before we switched back so my guess is it just hasn't spidered enough yet.
| 2:18 pm on Jul 26, 2009 (gmt 0)|
One final follow up question to this one... Do you recommend adding these pages to the robots.txt file?
Right now it has been nearly a month and 90% of the pages that were 301's and are now 404's are still sitting high in the site:example.com/ results.
In the past we have seen 301's or 404's drop to the bottom of those results pretty quickly, but not this time. Instead each day we just see more of them moving up in that "site" result, which suggests the bot hasn't even found that they have changed.
The sooner we can help the bot to realize these are gone so it can calculate the correct results, the better.
| 3:23 pm on Jul 26, 2009 (gmt 0)|
Do not add the old URLs to your robots.txt. If you do, they won't be fetched, and the spider will therefore never see your 301/404/410 responses.
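A small sketch of why that is, using Python's standard `urllib.robotparser`. The `/store/departments/` path and the `allowed` helper are hypothetical, chosen only for illustration; the point is that a compliant crawler checks robots.txt before it requests anything, so a disallowed URL is never fetched and its status code is never seen.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="Googlebot"):
    """Return True if a compliant crawler may request `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Hypothetical robots.txt that blocks the removed section.
blocking_rules = [
    "User-agent: *",
    "Disallow: /store/departments/",
]

# A robots.txt that blocks nothing (an empty Disallow allows everything).
open_rules = [
    "User-agent: *",
    "Disallow:",
]

# Hypothetical URL of one of the removed pages.
old_url = "http://www.example.com/store/departments/widgets.html"

print(allowed(blocking_rules, old_url))  # crawler may not request it
print(allowed(open_rules, old_url))      # crawler may request it and see the 301/404/410
```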
BTW, a general method for choosing the correct response for an obsolete URL is:
- URL has strong backlinks that need to be preserved, and a sensible replacement URL exists: 301-Moved Permanently
- URL has been intentionally removed, but has no important backlinks or a sensible replacement does not exist: 410-Gone
- URL is one of so many (many hundreds) that 301s or 410s are not a viable solution: 404-Not Found
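The three rules above can be sketched as a small Python function. This is only one way to encode them; the function name, the parameter names, and the choice to check the "too many URLs" case first are assumptions made for the example.

```python
def choose_status(strong_backlinks, has_replacement, too_many_to_handle=False):
    """Pick an HTTP status for an obsolete URL using the rules of thumb above.

    strong_backlinks:   the URL has backlinks worth preserving
    has_replacement:    a sensible replacement URL exists
    too_many_to_handle: so many URLs are affected that per-URL 301s or 410s
                        are not practical
    """
    if too_many_to_handle:
        return 404  # Not Found
    if strong_backlinks and has_replacement:
        return 301  # Moved Permanently
    return 410      # Gone


print(choose_status(strong_backlinks=True, has_replacement=True))
print(choose_status(strong_backlinks=False, has_replacement=False))
```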
If a 301 is used, then plan on leaving it in place forever. It only "works" to preserve the PageRank/Link-popularity of the old URL --and to re-capture the old URL's link and bookmark traffic-- for as long as it's still in place.
Web site URL 'systems' should be designed, and not left to develop haphazardly. Spend ten times longer designing the URL-structure of your site and the underlying directory structure (not necessarily even similar to the URL-structure) as you do implementing redirect/rewrite code, and that should be about right. Search engines 'hate it' when URLs are changed or removed [w3.org]; They see the Web as a library with only a very-slowly-changing inventory, not as a corner newspaper/magazine stand with "contents updated daily."
| 4:08 pm on Jul 26, 2009 (gmt 0)|
Ok, great. Any logical links that need to be preserved and have sensible replacements are still 301's. The ones that didn't have a good replacement are now 404's. So it sounds like we got that right according to what we learned from tedster. Thanks for confirming this even further.
I agree with you 100% and for several years our URL system had been relatively the same with a few minor changes over the years. This was the largest and stupidest change we had made which is why we are trying to return back to our old URL system.
I hope once it sees we have returned to the old way, things will smooth out again.
I do wonder though... If there is no way to naturally get to any of these URLs through our site anymore, (the 404's and 301's), how will the bot know to fix them? Are we relying on data the bot has stored and it eventually taps those old pages only to find they are gone or moved?
I only ask that one because we were told a few days ago (by a local neighbor who also has an online business) that we should have left up a path for those old links to be followed and seen as 404's or 301's. According to him, removing that path is making the process that much longer.
I don't want to act on that suggestion unless you guys at webmaster world in fact agree with such a statement.
Thanks again for the help. It has been greatly appreciated.
| 4:39 pm on Jul 26, 2009 (gmt 0)|
The "old links" will exist in the search engines' databases and/or on other sites. If they don't, then they won't be fetched, but then they won't be listed in search results, either... So no problem.
| 4:55 pm on Jul 26, 2009 (gmt 0)|
Ok, good. We realized the new URL structure was not going to work out well for us pretty quickly so few if any other sites had managed to link to them. The bot picked them all up within days through the sitemap so we will just rely on the engines' database to filter them out as 301's or 404's.
We just are so eager to see them drop out of the "site:example.com" results but it's obviously coming down to patience now.
Wanted to make sure we had everything in place properly according to advice and I think we are set up now so now we just pray to the google gods, do a few rain dances and wait it out.
| 7:35 pm on Jul 26, 2009 (gmt 0)|
I thought I was done asking questions under this post but a colleague asked me to ask this final one. ;)
For the 301's that are still intact (because it was obvious which page to point them to), which PR will the bot eventually take?
As explained above, we introduced top level departments to our store which threw everything below into hell. So, we have removed them and used 301 on many of them that made sense.
The worry: the pages that we are removing had 0 PR. The old pages that they are now pointing to have 2-5 PR. I hope this doesn't mean that because we 301'ed 0 PR pages to 2-5 PR pages, the 2-5 PR pages will switch to 0 PR!
That was a tongue twister to even type...
| 8:57 pm on Jul 26, 2009 (gmt 0)|
PR is continually recalculated behind the scenes, even though the toolbar values are only updated once in a while. PR will be forwarded through a 301, so have no concerns - no matter what you see on the toolbar.
| 9:09 pm on Jul 26, 2009 (gmt 0)|
Ah, ok. Will not fret about this one then. I was figuring it would probably take the one with the highest PR or combine or whatever.
One less thing to worry about.
| 10:01 pm on Jul 26, 2009 (gmt 0)|
Why didn't you get rid of the .html's when you were implementing the new URL structure? Next time huh? :-)
| 11:19 pm on Jul 26, 2009 (gmt 0)|
We didn't change the whole site. We only added about 250 pages to our store that were above the previous pages. Basically a new click flow we thought would be more logical. It just ended up causing us havoc by pushing all the ranking pages lower in click flow. So it's not like we changed the URL structure of the whole site.
We have never had any problems with .html's. The whole rest of the site is indexed fine. The store was fine too until we added those 250 pages above the old pages.
So hopefully I didn't give the impression it was whole site. We have about 40,000 pages. 250 were the new ones that caused problems and it's those 250 we are trying to remove to get back to our old click flow. Anything below those 250 are now in a pitiful limbo. The rest of the pages that have nothing to do with the store are just fine.
It's sad because the site was very well thought out. Somehow we all had one massive unanimous brain fart and made a stupid mistake. If only we had made this mistake on an area of the site that isn't related to making money!
| 10:36 pm on Aug 8, 2009 (gmt 0)|
Tedster and jdMorgan,
I wanted to thank you both for your great advice. We implemented the suggestions and sat tight and sure enough, everything is falling back into place. Those pages we pushed lower by changing our click path are returning into position while the 301's are falling out of the index and "site:" results. We are very pleased. Hopefully the trend will continue but I wanted to update you guys as you did help keep us focused rather than making an even bigger mess.
I am glad I came here to ask my questions!
| 3:06 am on Aug 9, 2009 (gmt 0)|
And thank you for the update, Kris. That's sure to help the future readers of this thread. I'm happy to hear it's working out.