| 2:12 am on Mar 14, 2011 (gmt 0)|
|As I've been watching for pages to re-enter Google's index, I've noticed Google indexing some of them using a seemingly arbitrary path for the page. But these pages have absolutely no incoming links. |
What HTTP status do the pages that are showing up in the index return? 200 OK?
| 2:17 am on Mar 14, 2011 (gmt 0)|
Putting the category in the URL is almost always a bad idea, especially when the site script doesn't check that the requested category is actually a valid match for the currently requested page.
This is a major design flaw in most CMS, blog, forum and cart software. Once the script recognises that the URL is for a valid page but the wrong category, it should issue a 301 redirect to the correct URL.
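To show the kind of check I mean, here is a rough Python sketch. It is not taken from any particular CMS: the `PAGE_CATEGORY` table, the `resolve` function and the `/<category>/<slug>` URL layout are all made up for illustration, and a real site would look the category up in its database.

```python
from typing import Optional, Tuple

# Hypothetical lookup table: page slug -> the single category it belongs to.
# A real CMS would query its database instead.
PAGE_CATEGORY = {
    "blue-widget": "widgets",
    "red-gadget": "gadgets",
}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, redirect_target) for a requested path.

    The last path segment is treated as the page slug. Any path that
    reaches a valid page but is not exactly its canonical
    /<category>/<slug> form gets a 301 to the canonical URL.
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        return 404, None  # nothing to look up in this sketch

    slug = parts[-1]
    category = PAGE_CATEGORY.get(slug)
    if category is None:
        return 404, None  # unknown page: do not return 200

    canonical = f"/{category}/{slug}"
    if path != canonical:
        return 301, canonical  # valid page, wrong or convoluted path
    return 200, None  # exact canonical URL: serve the page


if __name__ == "__main__":
    for p in ("/widgets/blue-widget", "/gadgets/blue-widget", "/a/b/c/red-gadget"):
        print(p, resolve(p))
```

The point is that only one URL per page ever answers 200. Everything else either redirects to it or returns 404, so an arbitrary path has nothing to be indexed under.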
| 12:43 pm on Mar 14, 2011 (gmt 0)|
Believe me, I want to 301 them. The basic problem is that we want to maintain the breadcrumb structure, which the CMS builds from extra information in the path. But because of the way it works, we're returning a 200 for practically everything. What really confuses me is that some very convoluted paths have ended up being indexed, but there's absolutely no one linking to the pages that way (internally or externally). While I try to figure out the best way to properly redirect to the canonical URL, I'm going to be asked why Google is even picking up these frivolous paths, and I have no intelligent answer to that. :(
| 1:51 pm on Mar 14, 2011 (gmt 0)|
Is this osCommerce or Zen Cart perchance? I gave up trying to educate those guys about URLs and site structure. They have their own band of "SEO experts" but this stuff isn't in their area of expertise. They've recently added canonical tag band-aids, but haven't fixed any of the core issues.
One major error on your site was using the canonical tag to signify that every page is a copy of the root home page. The canonical tag is merely a hint to search engines, but using it in that way is very likely to harm the site's indexing.
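If you want to confirm the fix has reached every template, a small script can flag any page whose canonical tag still points at the home page. This is only a sketch using the Python standard library. The function names `canonical_of` and `points_at_home` and the example.com URLs are my own placeholders, and it assumes you already have each page's HTML in a string.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.href = attr.get("href")


def canonical_of(html: str, page_url: str) -> Optional[str]:
    """Return the absolute canonical URL declared in html, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    return urljoin(page_url, finder.href)  # resolve relative hrefs


def points_at_home(html: str, page_url: str) -> bool:
    """True if a non-home page declares the site root as its canonical."""
    canonical = canonical_of(html, page_url)
    if canonical is None:
        return False
    page, canon = urlsplit(page_url), urlsplit(canonical)
    page_is_home = page.path in ("", "/")
    canon_is_home = canon.path in ("", "/") and canon.netloc == page.netloc
    return canon_is_home and not page_is_home


if __name__ == "__main__":
    bad = '<head><link rel="canonical" href="http://www.example.com/"></head>'
    print(points_at_home(bad, "http://www.example.com/widgets/blue-widget"))
```

Run it over a crawl of your own site. Any URL it flags is still telling search engines that it is a copy of the home page.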
| 2:03 pm on Mar 14, 2011 (gmt 0)|
Yeah, I'm pretty sure that's what happened with the canonical tag. Any thoughts on how long it should take Google to get these back into the index? The problem has been fixed for about three weeks now. At this point I can't tell if I'm just being impatient or if I should be worried.
| 2:37 pm on Mar 14, 2011 (gmt 0)|
It could take months. The problem is that you have said "this page is a copy of the root home page", and that means Google has little incentive to crawl it again and see that in fact it is not such a copy. You have to hope their system has "ignored" the canonical tag because the content of page "X" and the content of the root home page were actually found to be vastly different.