Forum Moderators: Robert Charlton & goodroi


How to Get Google to Index and Update the Canonicals Faster


NYCTech

5:26 pm on Apr 20, 2017 (gmt 0)

5+ Year Member Top Contributors Of The Month




System: The following 1 message(s) were cut out of thread at: https://www.webmasterworld.com/google/4842918.htm [webmasterworld.com] by engine - 4:43 pm on Apr 21, 2017 (utc +1)


We got hit very hard by Fred and seemingly are getting hurt by some of the more recent updates. We've fixed a ton of on-site issues, and I suspect a technical mistake we made with URL parameters might have gotten us hit with a cloaking penalty.

Now that everything is cleaned up, I am hoping to see a recovery. The biggest issue that still shows in Search Console is that there are tens of thousands of pages in the index that probably shouldn't be there (they're basically slight variations based on a parameter). We also had a mistake in how MVC changed URLs without updating the canonicals, so Google inaccurately thinks lots of pages are canonical to lots of other pages. Obviously this could be seen by Google as lots of duplicate content and a terrible ratio of low quality pages to high quality pages. I think the only reason we still rank on page 1 for lots of searches (though lower on the page) is that user experience is good - decent dwell time, bounce rate, time on site, pages per visit, fast loading, etc.

Traffic from search is down about 85%. Is there any way to get Google to reprocess the site faster, update the canonicals, and clean up the index? I know we could potentially do URL removal, but we're talking about tens of thousands of pages, and we don't actually know which ones are in the index (so we'd have to submit hundreds of thousands of requests to make sure we caught most of them).

not2easy

12:29 am on Apr 22, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I've had good results by deleting the sitemap for URLs I really need to get rid of, then resubmitting a new sitemap that accurately lists the pages that should be indexed. If the duplicate-content URLs share a /directory/ in the URL that doesn't also contain any pages you do want indexed, you can disallow that directory in your robots.txt file. Note: don't disallow it if you have recently changed the meta robots tag from "index" to "noindex", because Google won't see the change on pages it is blocked from crawling.
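For example, if all the parameterized duplicates lived under one directory with nothing else in it, the rule would be as simple as this (a sketch only; /filters/ is a made-up directory name, substitute your own):

```
# robots.txt
User-agent: *
Disallow: /filters/
```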

tangor

6:19 am on Apr 22, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Return 410 for the parameterized URLs in question. That not only shows you've removed them, it instructs g to drop them from the index as well.
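On Apache, a mod_rewrite rule along these lines is one way to do it (a sketch only; "colorid" is a stand-in for whatever parameter you're killing off):

```
# .htaccess sketch: answer 410 Gone for any URL carrying a
# "colorid" query parameter ([G] is mod_rewrite's 410 flag)
RewriteEngine On
RewriteCond %{QUERY_STRING} (^|&)colorid= [NC]
RewriteRule ^ - [G]
```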

View your raw logs to see how often g visits and how many URLs it allocates to your site's crawl budget. From that you can extrapolate how long it will take to both remove the bad and index the good.
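A rough sketch of that extrapolation, assuming a standard Apache/nginx "combined" log format (the sample lines and regex here are illustrative; adapt them to your own server's layout):

```python
# Estimate Googlebot's daily crawl rate from raw access-log lines.
import re
from collections import Counter

# Matches the date portion of a combined-format timestamp,
# e.g. [22/Apr/2017:06:19:00 +0000]
DATE = re.compile(r'\[(\d{2}/\w{3}/\d{4})')

def googlebot_hits_per_day(lines):
    """Count requests per day whose user-agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        if 'Googlebot' not in line:
            continue
        m = DATE.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits

sample = [
    '66.249.66.1 - - [22/Apr/2017:06:19:00 +0000] "GET /widgets/6749/ HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '66.249.66.1 - - [22/Apr/2017:06:20:00 +0000] "GET /widgets/1953/ HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '10.0.0.5 - - [22/Apr/2017:06:21:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(googlebot_hits_per_day(sample))  # Counter({'22/Apr/2017': 2})
```

Divide your total of bad URLs by that daily rate and you have a ballpark figure for how long the cleanup will take.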

When you're talking about ten thousand or more pages, just know it won't happen overnight, next week, or even within a month. Even years from now, g will still check whether those pages exist as part of its normal crawling.

buckworks

10:20 pm on Apr 22, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



410s are appropriate if the pages are actually gone. That's not always the case, though. Near-duplicate pages such as color or size variations might have good value for users even if it would be better to keep them out of the search indexes.

If pages with unwanted parameters have proper canonical elements (it sounds as though yours do), Google will remove the variant URLs from the index as soon as it notices the canonicals. You don't need to specifically ask for removal.

Getting the canonicals noticed is the bottleneck, though. How quickly that happens will depend on how often and how deeply your site is crawled. Some URLs will be deindexed quickly while others may linger for months.

"Fetch as Google" in the Search Console is your friend here, to ask for pages to be crawled. I usually prefer to request crawling rather than removal, to make sure I capture the SEO benefits of the canonicals. You can make 500 "Fetch as Google" requests in a month for individual pages. Tedious, but it works.

Even more powerful is to request spidering for a page and all its links. You can do that ten times a month, and I haven't found a limit to how many links can be on the page.

I've been experimenting with that recently, to get Google visiting "backwater" pages more quickly.

(1) Make a page that links to lots of your unwanted URLs. This page can be on a different domain if you like, as long as it's a domain you have verified in the Search Console.

(2) Use "Fetch as Google" to request crawling for that page and the links on it.

That will get Google revisiting corners of your site that might otherwise wait a long time to be crawled.

To create my list of links I do searches like these in Firefox ...

allinurl: UnwantedParameter1 DomainName
allinurl: UnwantedParameter2 DomainName

... and use Link Gopher to collect the URLs that turn up. I then build a list of links to the unwanted URLs, hundreds of 'em, and submit that page to "Fetch as Google". Some copy and paste work can create a long page of links in just a few minutes.
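The copy-and-paste step can be scripted, too. A small sketch that turns a plain list of URLs (e.g. pasted from Link Gopher) into an HTML page of links ready to submit; the URLs and the noindex precaution here are my own illustrative choices, not part of buckworks' method:

```python
# Build a simple HTML page of links from a list of URLs.
from html import escape

def build_link_page(urls, title="URLs to recrawl"):
    """Return a standalone HTML page linking to every URL in `urls`."""
    items = "\n".join(
        f'<li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls
    )
    return (
        "<!DOCTYPE html>\n<html><head>"
        f"<title>{escape(title)}</title>"
        # Assumption: noindex keeps the helper page itself out of the
        # index while still letting its links be followed.
        '<meta name="robots" content="noindex">'
        "</head><body><ul>\n"
        f"{items}\n"
        "</ul></body></html>"
    )

urls = [
    "https://example.com/widgets/6749/?color=blue&colorid=10",
    "https://example.com/widgets/1953/?color=purple&colorid=114",
]
page = build_link_page(urls)
print(page)
```

Upload the output somewhere crawlable, then submit that one page with "Fetch as Google" and its links.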

This is especially useful for URLs that are not otherwise linked from the current site navigation and might take Google a long time to revisit.

buckworks

3:14 pm on Apr 23, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Here's an example of parameter problems. Eight months ago a client had over 45,000 URL variations along these lines in the index:

https://example.com/widgets/6749/?color=blue&colorid=10
https://example.com/widgets/6749/?color=oyster&colorid=235

https://example.com/widgets/1953/?colorid=1307
https://example.com/widgets/1953/?color=purple&colorid=114

When we added canonical link elements...

<link rel="canonical" href="https://example.com/widgets/6749/" />

... the "colorid" variations started dropping out of the index and whatever link popularity and other SEO factors they had became credited to the "main" URL.

A few weeks ago there were still about three thousand "colorid" URLs in the index, and I finally hit on the idea described above to get them crawled more quickly by linking to them from an outside page. Now the total comes down every time I update the page of links, and there are just a few hundred to go.

The client is now more likely to rank at the top in searches for his own products!