
Forum Moderators: Robert Charlton & goodroi


Managing Radical Changes in Sitemap - How will G see this?

     
3:36 pm on Jun 21, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


I just made some significant changes to URL structure on my main site. Here's the basic concept.

Old URL example: domain/1/CategoryName/2/SubcategoryName
New URL example: domain/CategoryName/SubcategoryName

Suffice it to say the old URLs were cumbersome and unnecessarily deep. The new ones are more "hackable" and "readable" and also provide some other site structuring features not pertinent to this question.

Anyway, my old sitemap file submitted to the SEs contains about 100,000 of these old-style URLs. All old URLs are properly 301 redirected to the new version. So, the question is whether to submit a new sitemap or just delete the old one.
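
For illustration, the mapping between the two URL styles could be sketched in Python roughly as follows. The regular expression and the function names are assumptions made for this example, not the site's actual rewrite rules, and the `1` and `2` segments of the old URL are assumed to be numeric IDs.

```python
import re
from typing import Optional, Tuple

# Assumed old-style path: /<numeric id>/<CategoryName>/<numeric id>/<SubcategoryName>
OLD_PATH = re.compile(r"^/\d+/(?P<category>[^/]+)/\d+/(?P<subcategory>[^/]+)/?$")


def new_path_for(old_path: str) -> Optional[str]:
    """Return the new-style path for an old-style path, or None if it is not old-style."""
    match = OLD_PATH.match(old_path)
    if match is None:
        return None
    return f"/{match['category']}/{match['subcategory']}"


def redirect_for(path: str) -> Optional[Tuple[int, str]]:
    """Return (301, target) for an old-style path, or None when no redirect applies."""
    target = new_path_for(path)
    if target is None:
        return None
    return (301, target)
```

A request for `/1/CategoryName/2/SubcategoryName` would map to a 301 pointing at `/CategoryName/SubcategoryName`, while a path that is already new-style yields no redirect, so there is no redirect chain.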

This site involves millions of pages, so it could take a while for the SEs to pick up this volume of change. I'm worried that if I submit a new sitemap, the new URLs will be indexed in short order while the old URLs are still in the index (leading to duplicate content penalties in the short run). My gut feeling is to simply remove the old sitemaps and wait for Google to find and apply the redirects via its natural crawling process. I'm also thinking that by allowing discovery solely through the crawling process, I won't get into a situation where I have to manually remove pages from the index to solve potential duplicate content issues.

Any thoughts or suggestions would be much appreciated as always.
5:35 pm on June 21, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4297
votes: 288


If your old URL structure is rewriting cleanly (301) to the new structure, Google would never see the old URLs again, and it seems to me that you don't want to submit those old URLs in your sitemap. I would give them the new URLs and let the rewrites handle old results and links. You might find definitive information in Google's recent how-to for site moves, since it discusses how to handle URL or path changes with Google. It is on their Blogspot site: [googlewebmastercentral.blogspot.com...]
6:23 pm on June 21, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


Thanks not2easy. Yes, I've seen that section before but it doesn't really address this question (at least directly). I'm definitely convinced any reference to the old URLs (e.g. in any existing sitemaps) needs to go. I guess my primary question is whether submitting a new set of sitemap files with the new URLs will help the process along or create some temporary confusion.

As G is crawling, it will find the 301s and should update things based on that alone. Asking G to crawl the new URLs directly though (per the new URLs in the sitemap) seems like it could (temporarily) cause G to see two pages with the same content, i.e. the one with the old URL already in its index (but not yet re-crawled -- thus exposing the 301) and the new URL submitted in the sitemap. Perhaps I'm over-thinking this but who needs to get completely slammed in the SERPs (even temporarily) if it can be avoided? ;)
8:35 pm on June 21, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4297
votes: 288


Agreed! I do know that URLs listed in a sitemap are supposed to be URLs you want Googlebot to crawl, hence the thought that it is better to replace the sitemap. I did some digging around to find what they say about duplicate content due to same content/different URLs/same site questions and found some things you might find comforting and useful: [googlewebmastercentral.blogspot.com...]
Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results. If your site suffers from duplicate content issues, and you don't follow the advice listed above, we do a good job of choosing a version of the content to show in our search results.
9:16 pm on June 21, 2014 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Apr 30, 2008
posts:2630
votes: 191


My experience with URL structure changes is that when Google encounters new URLs on the site, Google will hungrily crawl them. As a result you will almost certainly have duplicate content for a period of time, until Google re-crawls the OLD URLs and sees the 301 redirects.

You must be prepared for the site to take a bit of a dip during a URL structure change, although there are cases when this does not happen. In any case, from my experience, the site will fully recover - provided you do not make a technical mistake during the new URL structure implementation.

There was speculation in the past that if you leave the OLD URLs in the sitemap, Google would crawl the old URLs faster and therefore see the 301 redirects sooner, but I have not tried this - I am not convinced by it, especially since Google WMT will report a sitemap error if a URL in the sitemap redirects.

So my recommendation would be to have only new URLs in the sitemap.
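
A minimal sketch of generating sitemap files that contain only the new URLs, in Python. The function name is hypothetical; the 50,000-URL cap per file comes from the sitemaps.org protocol, so a list of about 100,000 URLs would be split across at least two files.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit defined by the sitemaps.org protocol


def build_sitemaps(urls):
    """Return a list of sitemap XML strings, each holding at most MAX_URLS_PER_FILE URLs."""
    documents = []
    for start in range(0, len(urls), MAX_URLS_PER_FILE):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url in urls[start:start + MAX_URLS_PER_FILE]:
            # ElementTree escapes characters such as "&" in the URL text.
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        documents.append(ET.tostring(urlset, encoding="unicode"))
    return documents
```

Each returned string would still need an XML declaration and to be written to its own file before being submitted; that part is left out here.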

You can monitor your duplicate content in the "Duplicate Titles / Descriptions" section of WMT - each page will report 2 titles until Google re-crawls the old page and drops it from its index.
9:46 pm on June 21, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


Thanks for the above replies. I've gone ahead and generated new sitemaps with only the new URLs in them and in just a few short hours, the new URLs are starting to show up in the index. Not sure if that's a result of G finding them via redirect or a combination of the redirect and the sitemap telling Google that that is the preferred URL. Will be monitoring this closely as you can imagine and will report any significant developments when and if they occur. As for WMT reports, I've found the Duplicate Titles / Descriptions report to be very valuable in the past and will certainly be watching it like a hawk in the coming weeks/months. Again, thanks for the corroborating insights.
5:51 am on June 23, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 6, 2003
posts:1487
votes: 132


I've done this:

1. Create new sitemaps (different filename.xml) and submit, delete the old sitemaps.
I found this out by mistake: it forces Google to recrawl the entire site.
2. Put ONLY the new URLs inside.
3. Redirect 301.

Google will storm you!
1:19 pm on June 23, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


Don't mind the storm. I want the new URLs indexed as fast as possible and the old ones removed from the index. The sooner the old ones are gone, the sooner the redirecting will settle down.

I did this...

1. Built new sitemaps (same names as the old ones)
2. Removed all old URLs from the sitemaps
3. Included only new URLs in the sitemaps
4. Resubmitted the new sitemaps via WMT
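
Before resubmitting, a quick check that no old-style URLs are left in a sitemap could look something like the Python sketch below. The regular expression for old-style paths is an assumption based on the example URLs at the top of the thread, and the function name is made up for illustration.

```python
import re
from urllib.parse import urlparse
from xml.etree import ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumed old-style path: /<numeric id>/<CategoryName>/<numeric id>/<SubcategoryName>
OLD_PATH = re.compile(r"^/\d+/[^/]+/\d+/[^/]+/?$")


def old_style_urls(sitemap_xml):
    """Return every <loc> URL in the sitemap whose path still looks old-style."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    return [url for url in locs if OLD_PATH.match(urlparse(url).path)]
```

An empty result means every listed URL is at least not old-style; it does not confirm that each one actually resolves with a 200.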

The results were almost immediate. (I should say, some results. Many pages are still pending update in the index).
11:19 pm on June 26, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


And now I'm starting to see duplicate title and meta descriptions reported in GWT. This could get ridiculous or could clear up in short order. We'll see I guess. Come on GBot, get with the program. How to Handle a Redirect 101 (er, I meant 301). It's an advanced course perhaps not suitable for search engines with limited intelligence. After looking at this, I'm thinking it might be better to just delete the old sitemap and let GBot figure out the changes at its own pace (e.g. at a crawl) next time. Wait, I vow there will be no next time.
1:53 pm on June 28, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


Funny thing is, Bing is picking up the changes faster than Google. This site has fewer pages indexed in Bing than in Google but still, Google is in a complete state of disarray where this site is concerned. Same stuff (at least some of it) is rising to the top in Bing with the new URLs. Google is crawling about 100,000 pages a day on this site. Not sure what it's crawling exactly, but the index isn't reflecting that it's picking up 301s at a very fast pace.
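
One way to see what Googlebot is actually requesting is to tally the server access log by URL style and status code. The Python sketch below assumes logs in the common "combined" format and the same old-style path pattern as the example URLs in the first post; both are assumptions, and the user-agent string alone does not prove a request really came from Google.

```python
import re
from collections import Counter

# Assumed "combined" log format: ... "GET /path HTTP/1.1" 301 0 "referer" "user agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) .*"(?P<agent>[^"]*)"$'
)
# Assumed old-style path: /<numeric id>/<CategoryName>/<numeric id>/<SubcategoryName>
OLD_PATH = re.compile(r"^/\d+/[^/]+/\d+/[^/]+/?$")


def googlebot_summary(lines):
    """Count Googlebot requests keyed by (URL style, HTTP status)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line.strip())
        if match is None or "Googlebot" not in match["agent"]:
            continue
        style = "old" if OLD_PATH.match(match["path"]) else "other"
        counts[(style, match["status"])] += 1
    return counts
```

A large count under `("old", "301")` would suggest Googlebot is working through the redirects, while mostly `("other", "200")` would suggest it is spending its crawl on the new URLs first.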
2:46 pm on June 28, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member editorialguy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:June 28, 2013
posts:3421
votes: 747


Funny thing is, Bing is picking up the changes faster than Google


Bing has more free time. :-)
4:31 pm on June 28, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


Here's a strange fluctuation I've been watching using the "site" search operator.

Old URL site:mydomain.com/oldDirectory (this number is growing)
New URL site:mydomain.com/newDirectory (this number is shrinking)

I can imagine the old URL numbers could keep increasing for a while because my whole site is not yet in the index. Having said that, every non-indexed URL of the old style has a 301 redirect to the new style, so go figure why old-style URL numbers continue to increase while new-style URLs got to a certain level and are now decreasing. All I can think of is that this is all a delayed reaction, and hearing many others talk about content that doesn't even exist any more showing up in various reports kind of supports that contention. At least on a gut level, anyway. This is torture, but I did it to myself I suppose. Still have great faith that the changes will some day result in better treatment in the SERPs.
8:23 pm on June 28, 2014 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4297
votes: 288


The kind of changes that you made are not ever going to be all wrapped up neat in a few weeks. I generally expect closer to 6 months myself - though I would think that much depends on the number of URLs and "value" of the content. Another factor is the number and value of inbound links those URLs had.

Yes, I still see them complaining about 404's from almost 10 years ago, but at the same time I see scrapers requesting the same files so they are still linked to from something out there. (and it's not the wayback machine!)
4:09 am on June 29, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member Top Contributors Of The Month

joined:Aug 1, 2013
posts:1338
votes: 22


@not2easy -- I'll admit I expected the worst from this change, so nothing much in the way of temporary consequences is gonna dissuade me from thinking it was a good move for the long run. I pretty much planned for total meltdown prior to taking this step. Just saying, I'm prepared to take my lumps on this one as they come. In the meantime, I'm watching and trying to learn, and your mention of inbound links is appreciated as a factor to consider.

I've never been a link builder but this site has some quality links to it that will need to be addressed, some of which I've already successfully gotten updated at the source. Then there are some junk links out there (directories that want me to pay them to claim and update a link). Not sure how to deal with those but I'm considering disavowing them. Any suggestions on that front would be most appreciated (even if it's another topic in its own right). I can definitely see how getting the link profile straightened out ASAP could only be a good thing.

BTW, I've added canonical headers to pretty much every page impacted by this on the site. So right now, most everything is 301'd, sitemapped and/or canonical-ized.
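
It isn't clear from the post whether "canonical headers" means a `<link rel="canonical">` element in each page's head or an HTTP `Link` response header, so here is a rough Python sketch of both forms. The function names are made up for illustration and the URL used in the note below is a placeholder.

```python
from html import escape


def canonical_link_tag(url):
    """Return an HTML <link rel="canonical"> element pointing at the given URL."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


def canonical_link_header(url):
    """Return a (name, value) pair for an HTTP Link header with rel="canonical"."""
    return ("Link", f'<{url}>; rel="canonical"')
```

For a new-style page, either form would point at the new URL itself, for example `canonical_link_tag("https://example.com/CategoryName/SubcategoryName")`.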

And again, your time-frame doesn't seem unreasonable. Just trying to document progress as I see it and keeping the door open for any insights or miracles that may happen to pop up along the way. ;)

Hmmm. Pop-up miracles. There might be something in that...
 
