
Migrating a 6-yr-old website with 8k pages - 180 day update

     
3:15 am on Aug 22, 2017 (gmt 0)

Junior Member

joined:Aug 22, 2017
posts:46
votes: 2


I put in a request to migrate our website (indexed on Google News).

It had both http and https indexed pages, around 8k of them (duplicate content).

You can see some of the data -- such as the pace of migration/re-indexation -- here: [productforums.google.com...]

It's been 180 days since I put in the request, and there are still around 42 pages on the old domain that have not yet been migrated to the new domain.

I keep track of this by searching for 'site:old.domain -new.domain'. This query gives fewer and fewer results every day, and has fallen to 42 today. When I do a search using the subject lines of these pages, I get results that are indeed on the old domain.

For all other pages (thousands of them), I get search results from the new domain and nothing from the old domain on a regular search. But if I add 'site:old.domain' to the query, then I get the same results from the old domain as well. However, if I pull up the Google cache for those pages, it shows 'This is Google's cache of https://new.domain/uri'. In other words, those pages seem to have migrated fine.

My question is: is the 180 days a hard deadline, or just a rough estimate of how long the process takes?

We have been hit hard by this migration, as the new domain doesn't rank well. Smallseotools shows a domain authority of 1.00 for the new domain and 30.27 for the old domain. The Moz rank for the old domain is 3.97.

The funny thing is that the old pages seem to rank fine under the new domain. I think Google has simply transferred the reputation or rank of the old urls to the new urls. But the new domain itself doesn't rank anywhere close to the old one.

I still have the option of taking off the 301s and canceling the migration request and moving back to the old domain. But I want to see how this goes since I've already invested 6 months of our company's earnings in this move (more or less).

Has anybody had any experience with such a migration in recent time? What should I expect now? Should I hang in?

[edited by: goodroi at 11:56 am (utc) on Aug 22, 2017]
[edit reason] Fixed formatting [/edit]

6:16 am on Aug 26, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton

joined:Nov 11, 2000
posts:12292
votes: 389


Btw, the links that point to these 404 pages are mostly from within the site - one reason I'm willing to just block them all with robots.


If I'm reading all this correctly, I think you're saying that your CMS is still generating links internally to the pages you've removed.

That's the problem that needs fixing, not via your server, but in your site's code, which is giving Google inconsistent messages.

10:19 am on Aug 27, 2017 (gmt 0)

Junior Member

joined:Aug 22, 2017
posts: 46
votes: 2


I'll check this. It could be links generated by the related-posts plugin, probably from long ago. The robots.txt solution has helped bring down the error count in GSC.

6:52 am on Aug 28, 2017 (gmt 0)

Junior Member

joined:Aug 22, 2017
posts: 46
votes: 2


Just an update - Google's stopped the 180-day 'change of address' process.

Now all of the site's settings, such as preferred domain, are available for modification.

I've mostly got rid of the 404 errors in GSC (I get rid of 1,500 pages per day) by using robots.txt.
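For anyone curious, the blocking is nothing fancy -- just plain Disallow rules in robots.txt, roughly along these lines (the paths here are placeholders, not my actual patterns):

User-agent: *
Disallow: /old-section/
Disallow: /2011/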

There's some benefit as far as ranking is concerned. Some of the pages/news articles now turn up on the first page. Hope to see an improvement in coming days.
9:33 am on Aug 28, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton

joined:Nov 11, 2000
posts:12292
votes: 389


I've mostly got rid of the 404 errors in GSC (I get rid of 1,500 pages per day) by using robots.txt.

robots.txt is not a good way to do it, and Google warns against it.

See discussion in this thread about Google and 404s.

17 May 2013 - GWT Sudden Surge in Crawl Errors for Pages Removed 2 Years Ago?
https://www.webmasterworld.com/google/4575982.htm [webmasterworld.com]

I'm going to excerpt some relevant parts of my last post in the thread that fit your situation, but I suggest reading the whole discussion...

By reporting the 404s, Google is just telling you that they requested the url for the page, and that your server didn't find anything and returned a "404 Not Found" response to Googlebot.

If you think that your server should have found something... ie, that you believe the pages are still around and that Google should not have gotten a 404 Not Found response when it requested the url, then Google's message is useful because it alerts you to a possible problem. Otherwise, 404s are the expected response and are perfectly normal.

As to why Google recrawls urls that you think are gone or non-existent, there are numerous reasons. One is that links to the urls may persist somewhere on the web....

...It might be... that a site will still have internal nav links to the urls of pages that have been removed.... It can be worth checking a site with Xenu or Screaming Frog... to make sure that these urls aren't in the site's code.

I also note John Mueller's recommendations...
For large-scale site changes like this, I'd recommend:
- don't use the robots.txt
- use a 301 redirect for content that moved
- use a 410 (or 404 if you need to) for URLs that were removed
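In server terms, that works out to something like this -- a rough sketch in Nginx syntax, with placeholder domain and paths, not a drop-in config -- a 410 for the removed URLs and a blanket 301 for everything that moved:

server {
    server_name old.example.com;    # the old domain (placeholder)

    # URLs that were removed outright: answer 410 Gone
    location ~ ^/(removed-section|discontinued-feed)/ {
        return 410;
    }

    # everything else has moved: 301 to the same path on the new domain
    location / {
        return 301 https://new.example.com$request_uri;
    }
}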

Two very helpful resources I've found...

John Mueller - 404 Crawl Error Reading List
John Mueller > Public
https://plus.google.com/+JohnMueller/posts/RMjFPCSs5fm [plus.google.com]

Do 404s hurt my site?
Official Google Webmaster Central Blog
Monday, May 02, 2011
[webmasters.googleblog.com...]

3:19 am on Aug 29, 2017 (gmt 0)

Junior Member

joined:Aug 22, 2017
posts: 46
votes: 2


From that discussion on the spike in 404s, this seems to fit my situation, as Google's reincarnating a lot of dead urls when I'm moving from the old domain to the new domain.

"This most likely means that Google is running an old dataset, possibly among other datasets, perhaps for purposes of comparison. This seems to happen at times of big change. "

Question is -- would it be better to let them 404 or just prevent spider access to these pages? If I let them 404, would the errors subside on their own (like they presumably did in the case of the old domain name)?

PS: I also note that some of the urls are from 2011-12, when the website was on Drupal. Those urls are not possible on WordPress, which is what I use now.

9:27 am on Aug 29, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton

joined:Nov 11, 2000
posts:12292
votes: 389


...this seems to fit my situation, as Google's reincarnating a lot of dead urls when I'm moving from the old domain to the new domain.
"This most likely means that Google is running an old dataset, possibly among other datasets, perhaps for purposes of comparison. This seems to happen at times of big change. "

The situation that the "17 May 2013" thread was discussing was a large system-wide Google update, and I was talking about the entire web, not old urls for your site. It's possible, though, perhaps even likely, that when you move a site, Google looks way back into its index for your domain... but I'm not sure of that. My experience suggests that over the years, Google has been looking more carefully at domains that get changed.

I cited the thread, though, because it presents a good overview of what 404 errors are about... including why Google revisits them and why the information might be helpful to you. I suggest you reread it a couple of times, including John Mueller's "404 Crawl Error Reading List", the Google blogpost, and also the interview in the thread with Google's sitemaps team.

You ask...
Question is -- would it be better to let them 404 or just prevent spider access to these pages?

I don't know how much clearer I can be. In this (shortened) list of John Mueller's suggestions, I'm going to reemphasize what John said about preventing spider access. He's very explicit about the situations and the options....

- don't use the robots.txt
- use a 301 redirect for content that moved
- use a 410 (or 404 if you need to) for URLs that were removed

I would let them subside on their own, and "subside" is an appropriate word to describe how it happens. But note (with my emphasis added)...
As to why Google recrawls urls that you think are gone or non-existent, there are numerous reasons. One is that links to the urls may persist somewhere on the web....

...It might be... that a site will still have internal nav links to the urls of pages that have been removed.... It can be worth checking a site with Xenu or Screaming Frog... to make sure that these urls aren't in the site's code.

So, if you have something on your site that's still either containing or generating old urls, Google will persist in checking them. The fix is to remove them, not to try to block them.

I should add that it's often natural for old dropped or even 301ed urls to persist in the index for a period of months... particularly if you search for them directly, but they should redirect to the destination you've specified when you click on them. Again, use a server header checker.
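A plain curl call will show you the same thing a header checker does (the URL below is just a placeholder):

curl -sI https://old.example.com/some-old-page

You want to see a "301 Moved Permanently" status (or "410 Gone" for the removed urls) and, for the 301s, a Location: header pointing at the new domain.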

Also, spider your old domain to check that you've gotten all the old links.

One more thing, and this might save you some work... note in John's list, in item #5...
5) We list crawl errors in Webmaster Tools by priority, which is based on several factors. If the first page of crawl errors is clearly irrelevant, you probably won't find important crawl errors on further pages.
https://webmasters.googleblog.com/2012/03/crawl-errors-next-generation.html

10:51 am on Aug 29, 2017 (gmt 0)

Junior Member

joined:Aug 22, 2017
posts: 46
votes: 2


OK. I've cleaned out the robots.txt and created a rule in Nginx to return 410 for the old urls.

I wonder if Google will again try to access those old urls, since all of them have been purged from Google's index over the past three days thanks to the robots.txt. I guess the ones with links will at least be tried again.

I'll keep an eye out, and if there are many, I'll put in 301s.

Thanks a lot, Rob, for your help; it means a lot to part-time webmasters like us.

In case it's of use to anyone, one can use the following for returning 410 in Nginx:

location ~ ^(url-pattern1|regex2|regex3) { return 410; }

Or, if it's an exact URI, one can use = instead of ~. With ~, the ^ anchors the match to the beginning of the uri, so it won't match in the middle.
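For a single exact URI, for instance, that would be something like this (placeholder path):

location = /some-removed-page { return 410; }
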
8:02 pm on Aug 29, 2017 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator robert_charlton

joined:Nov 11, 2000
posts:12292
votes: 389


I wonder if Google will again try to access those old urls, since all of them have been purged from Google's index over the past three days thanks to the robots.txt. I guess the ones with links will at least be tried again.

No, they haven't been purged from Google's index. Google has a record of them and will keep it forever. It may retry them periodically over the years, but less and less often over time. The 410 is a "signal" that you've done this purposely, and that will reduce the frequency of Google's retries. What you need to do is make sure that these urls don't exist in your current nav code.

11:21 am on Sept 1, 2017 (gmt 0)

Junior Member

joined:Aug 22, 2017
posts: 46
votes: 2


You were right. A day after I removed the robots.txt restrictions, it's again started filling my GSC with error reports, this time under 'not found' 410.

I'm tempted to put the robots.txt restrictions back on, because redirecting all these old urls would require very complicated rewrite rules -- some of which would even contradict each other. Each request would have to pass through so many clauses and conditions that I'm pretty sure it would affect performance as well.
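If I do end up adding 301s for these later, I'm thinking an Nginx map might avoid that -- one hash lookup per request instead of a chain of rewrite conditions. A rough sketch with placeholder paths and a placeholder domain, not something I've tested:

# in the http {} context: lookup table of old path -> new path
map $uri $new_location {
    default                 "";
    /old-drupal-path-1      /new-path-1;
    /old-drupal-path-2      /new-path-2;
}

# in the relevant server {} block
if ($new_location != "") {
    return 301 https://new.example.com$new_location;
}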

On a positive note, I've seen a marked improvement in the placement of news stories in the last two days. I would say the ranking is now very very close to what it was on the old domain. At least that's resolved. Hopefully, this won't be reversed in coming days as the not found errors start piling up.
5:39 pm on Sept 1, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24

joined:Apr 9, 2011
posts:15638
votes: 795


it's again started filling my GSC with error reports, this time under 'not found' 410

Reminder: Most GSC “errors” are not errors. You didn’t necessarily do anything wrong and they’re not trying to say you did; they’re just providing you with information.

Granted, it's a shame they do not distinguish between 404 and 410, since they clearly recognize the difference. They could at least provide an option for filtering out 410s from the report.
2:59 am on Sept 15, 2017 (gmt 0)

Junior Member

joined:Aug 22, 2017
posts: 46
votes: 2


Just wanted to update -- things have become far, far better now, over the last three days. I think the ranking for the articles/domain is now almost on par with what it used to be before February, when I moved to the new domain. Earlier, the 'average' users online figure used to be around 40 at any time of day; now it's about 20-25. I suspect that's because we've been missing the 'long tail' effect for the last seven months, when the articles were not showing up in search and therefore weren't being shared on social media either.

Not sure whether the change is because Google just finished migrating the domain, or whether it's the removal of those 3.5k 404s. At present, there are about 500 410s in GSC.

Phew!
Thanks everyone.