Forum Moderators: Robert Charlton & goodroi

How to no index CDN sites showing up in Google results?

     
2:01 pm on Jun 26, 2019 (gmt 0)

New User

joined:June 26, 2019
posts: 15
votes: 0


Let's say we have a website, test.com, and these CDN sites: [cdn1.test.com,...] [cdn2.test.com,...] etc.
When I use the site: command, the Google results also include pages from the CDN sites, which I don't think should be there.
Since they are already indexed, what is the best way to remove them from the Google results?
I was thinking of adding the noindex code to the header of all the affected pages, but is this enough?

Also, our webmasters claim that Google doesn't index redirects, so what they have done is redirecting all CDN sites to the main site.
Is this correct?
2:29 pm on June 26, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts:1017
votes: 244


It's better if the CDN is transparent and handled at the DNS level, with the same URL served from different servers based on the visitor's geographical location.

Otherwise, if test.com, cdn1.test.com and cdn2.test.com are serving the same content, use canonical URLs on cdn1 and cdn2.
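
To illustrate the canonical idea, here is a minimal Python sketch of building the tag so that a page served from cdn1 or cdn2 points back to the same path on the main site. The function name canonical_link and the MAIN_HOST constant are made up for this example, and it assumes the main site is served over https; how the tag actually gets into your templates depends on your setup.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: test.com is the main site and it is served over https.
MAIN_HOST = "test.com"


def canonical_link(request_url: str) -> str:
    """Build a canonical <link> tag that points at the main host.

    The path and query string of the requested URL are kept; only the
    scheme and hostname are replaced.
    """
    parts = urlsplit(request_url)
    canonical = urlunsplit(("https", MAIN_HOST, parts.path or "/", parts.query, ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


# Example: a page requested on a CDN hostname.
print(canonical_link("https://cdn1.test.com/products/widget"))
# prints: <link rel="canonical" href="https://test.com/products/widget">
```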

Also, our webmasters claim that Google doesn't index redirects, so what they have done is redirecting all CDN sites to the main site.

Then what's the point of having a CDN?
8:39 am on June 27, 2019 (gmt 0)

New User

joined:June 26, 2019
posts: 15
votes: 0


So, it seems that the redirection works, as the site: command no longer returns the CDN websites.

Regarding the canonicals, I cannot check at the moment if they are there, because of the redirection.

So Dimitri u think that having this redirection is the wrong way to handle this, right?
11:18 am on June 27, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Nov 13, 2016
posts:1017
votes: 244


Regarding the canonicals, I cannot check at the moment if they are there, because of the redirection.

Your "webmasters" should know whether they set up canonicals... but if they had, the CDN sites shouldn't have been indexed by search engines (this is the purpose of canonicals).

So Dimitri u think that having this redirection is the wrong way to handle this, right?

Wait for the opinion of wiser guys, but in "my" opinion, "yes" (it's also possible I didn't fully understand the situation and your use of the CDN).
11:25 am on June 27, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3482
votes: 376


Is this a 301 redirect or another type of redirect? When you look at the HTML source, do you see a canonical tag? I suspect the CDN is not set up in the wisest way, but I don't know your situation well enough.
12:50 pm on June 27, 2019 (gmt 0)

New User

joined:June 26, 2019
posts: 15
votes: 0


It's a 302 redirect.

And when you write that I should look at the HTML source, are you referring to the source of the normal website or of the CDN site?
At the CDN I currently have no way to check the source.
On the normal website, there is a canonical pointing to the same URL.
2:19 pm on June 27, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4352
votes: 300


True, Google does not index the target of a 302 redirect, because it is a temporary redirect: the true location is treated as where it is being redirected from, not where it is being redirected to. This will ensure that the source domain (cdn1.test.com or cdn2.test.com) is indexed, but not the site where it is viewed. I don't think this is what you had in mind.
1:13 am on July 1, 2019 (gmt 0)

Full Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 291
votes: 132


If canonicals are set up correctly, there should be no need for redirects or anything like that.

The main site should be the ORIGIN of all the CDN content, so all main site content should have its canonical set to its own domain URLs.
All content on the CDN should be pulled from the ORIGIN (main site), so the canonicals on the CDN should point to the main site domain URLs.
So in theory, if set up correctly, Google should never index the CDN content, as the canonicals point to the main site. Something has been set up wrong, as you have guessed, so I would ...
1. Ensure canonicals are set correctly on the main domain
2. Delete / expire the CDN cache and resync from the Origin again
3. Then in Google Webmaster Tools OLD VERSION go to Google Index > Remove URLs and remove the CDN domain from the index - this lasts for a few months, by which time hopefully you have cleared everything else up

You should probably look into your CDN and site setup a bit more to work out why it happened, though, to make sure it doesn't continue. Which CDN are you using?
11:37 am on July 2, 2019 (gmt 0)

New User

joined:June 26, 2019
posts: 15
votes: 0


@Milchan thanks for the comments
1. There is only 1 canonical at the moment: <link href='https://xxxxx.com/' rel='canonical'> targeting the same url
2. I guess this is something that the webmasters need to handle
3. No need to do that. All CDN links are already hidden from the Google results

About the type of CDN that we are using, it's something that I never had to check before, so I would need to ask our webmasters also about this.
12:36 pm on July 2, 2019 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4352
votes: 300


Meta canonicals are not the same as canonical rewrites. A canonical rewrite rule (usually in your .htaccess file) makes sure that your URLs can only be accessed at one URL. A meta canonical only lets some robots know which version you prefer.

If you can visit your site at https://example.com/ and/or https://www.example.com/ that means you have no canonical rewrites.
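
As a rough illustration of what a canonical rewrite does (this is a Python sketch of the logic, not an .htaccess rule), the function below decides whether a request should be answered with a 301 to the one preferred hostname. The names canonical_redirect, CANONICAL_SCHEME and CANONICAL_HOST are hypothetical, and picking www.example.com as the preferred host is only an assumption for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: https://www.example.com is the one hostname you want to serve.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"


def canonical_redirect(request_url):
    """Return (301, location) if the URL is not on the preferred host.

    Returns None when the request already uses the preferred scheme and
    hostname, meaning the page can be served normally.
    """
    parts = urlsplit(request_url)
    if parts.scheme == CANONICAL_SCHEME and parts.hostname == CANONICAL_HOST:
        return None
    location = urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, "")
    )
    return 301, location
```

With a rule like this in place, a request for http://example.com/page is answered with a 301 to https://www.example.com/page, so only one version of each URL can actually be loaded.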
6:19 pm on July 2, 2019 (gmt 0)

New User

joined:June 26, 2019
posts: 15
votes: 0


I also just received a coverage issue email, regarding 2 of our cdn sites.

"Duplicate without user-selected canonical"

Does this mean that some canonicals are missing from the cdn sites?
7:14 pm on July 2, 2019 (gmt 0)

Full Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 291
votes: 132


Yes, that would seem to say that the CDN content is being published without a canonical set to indicate that the content is from maindomain.com. My points above were meant to address that specific issue. So, to expand a little:

1. Ensure canonicals set correctly on main domain - by this I mean EVERY page on your main site should have canonical set on it. How exactly you would implement this I can't say without knowledge of the site. Usually, though, you don't do it manually; you use some PHP that sets the canonical tag in each page's head to something like https://$maindomain.com/$categoryname/$productname or https://$maindomain.com/article-url.

If you're using WordPress or some other CMS, there are lots of plugins that automate all this very easily, and that is the way to go.

Plus, it should be configured so that things like filter pages have the canonical pointing to the target (unfiltered) page. E.g. a page with the URL https://domainname.com/category, when you choose to filter its products by color, size & weight, might add parameters to the URL so that it becomes https://domainname.com/category?filter=color,size,weight - this will be seen as a different page URL by Google and could be indexed as such, which can cause duplicate content issues. On a large ecommerce site with lots of filter options there are potentially hundreds of thousands or millions of filter URL combinations, so it can cause major problems and will additionally use up crawl budget very quickly.
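
A minimal Python sketch of the filter case, assuming the parameters named in NON_CANONICAL_PARAMS only filter or sort a listing and never change which page it is (the set and the function name canonical_url are made up for this example, so adjust them to whatever your site actually uses):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters only filter or sort a listing page.
NON_CANONICAL_PARAMS = {"filter", "sort", "order"}


def canonical_url(url):
    """Drop filter/sort parameters so every variant maps to one canonical URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in NON_CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


print(canonical_url("https://domainname.com/category?filter=color,size,weight"))
# prints: https://domainname.com/category
```

The returned value is what would go into the canonical tag of the filtered page, so all filter combinations declare the same unfiltered URL.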


2. delete / expire the CDN cache and resynch from the Origin again - this was because it sounded like you didn't have your canonicals set correctly on your main site (which you have confirmed), so your CDN copies would also have the incorrect configuration in their caches. To fix things you need to fix the main site config first and then make sure the CDN caches contain the corrected pages.

In your last post, you saying "regarding 2 of our cdn sites." makes me suspect that you might not be using a proper CDN at all, or at least not what I'd consider a standard CDN. I think you need to find out from whoever set it up or is responsible for it exactly how your CDN setup is configured.


[edited by: not2easy at 7:41 pm (utc) on Jul 2, 2019]
[edit reason] unlinked for readability [/edit]

7:42 pm on July 2, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15705
votes: 812


EVERY page on your main site should have canonical set on it
If you have a hostname canonicalization redirect in place as not2easy discussed above, there is no earthly reason to clutter up your site with another meta on every single page. The "canonical" meta is only for situations where it is out of your power to constrain all requests to a single URL; typically this applies to multiple paths leading to the same content, not to the hostname. It shouldn't matter what CDN is physically serving up your content, since that has--or should have--nothing to do with the visible URL.
2:38 am on July 3, 2019 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:9914
votes: 972


^^^ Word!
4:01 am on July 3, 2019 (gmt 0)

Full Member

Top Contributors Of The Month

joined:June 28, 2018
posts: 291
votes: 132


If you have a hostname canonicalization redirect in place as not2easy discussed above, there is no earthly reason to clutter up your site with another meta on every single page. The "canonical" meta is only for situations where it is out of your power to constrain all requests to a single URL; typically this applies to multiple paths leading to the same content, not to the hostname. It shouldn't matter what CDN is physically serving up your content, since that has--or should have--nothing to do with the visible URL.


You kind of contradicted yourself there by stating that there is no earthly reason to use a canonical on every page and then giving one of the very good reasons why you should: having multiple paths to the same content. Now, we do not know what system/CMS/framework etc. the OP's site is running on. If it is a simple enough site with few pages/features that has a bunch of plain old vanilla URLs in a straightforward hierarchy, then fine, a canonical redirect will do the job. But the moment you introduce anything that starts adding URL parameters that create multiple URLs for the same content, you run the risk of duplicate content issues. And let's be honest, the majority of sites out there now have some kind of complexity along those lines - filters, sorting, reviews, comments, WP tags, WP archives, articles or products in multiple category sections etc. can all cause this issue.

The hostname canonicalization is a separate thing and doesn't seem to be the issue here - that's just used for serving up only one http / https / www combo, but the problems reported are seemingly because a CDN domain is serving up a duplicate copy of a page (which has nothing to do with the http/https/www domain redirect configuration).
The problem we know about is that cdn.domain.com is/was being indexed and there are reports of duplicate content. That brings up 2 things to solve. Firstly, of course, stopping the CDN from being indexed. Secondly, though, it demonstrates that meta canonicals are not present - if they were, the duplicate content issue wouldn't have occurred. So basically, if the CDN was configured correctly there might not be a need for meta canonicals, but if they had been in place they would have mitigated the CDN misconfiguration problem. Also, it is very likely that there is a need for meta canonicals anyway, due to the parameter URLs leading to duplicate content as explained above. Overall, and Google advises this themselves, it is better to have canonicals in place for every page than not, as it acts as a safety measure and can save a site from huge problems.

The CDN config is a separate thing, and I was only asking about it because clearly there is a misconfiguration in that regard if Google is indexing it (which it was/is), and also because the wording of some of the posts triggered my suspicion that it might be set up in a non-standard way. I.e. getting a "coverage issue email, regarding 2 of our cdn sites" sets alarm bells ringing for me straight away. To explain: firstly, why "2 of our cdn sites"? Usually a site just runs a single CDN to serve static assets, and that would include a number of edge locations. How would there be a report about 2 "sites"? Of course we are analyzing blind here, and it is feasible there are 2 CDN domains in use for some reason (separate ones for images, JS etc., for example), but I can't see any reason why they would be registered in Google Webmaster Tools for sending those messages. The OP does let us know that he isn't completely clear on the setup of everything, and it could simply be that the wording is due to not knowing much about these things - that is why I didn't go into too much detail on that but suggested they investigate it, as no matter what, it seems there was/is a misconfig, or otherwise Google wouldn't have indexed anything.
2:32 pm on July 3, 2019 (gmt 0)

New User

joined:June 26, 2019
posts: 15
votes: 0


Hey Michan,
actually, there is more than 1 CDN site available (3, in fact), as I stated in my first post.
I don't know why, but this is the current status.

Also, they are registered in GSC, as this is suggested by many articles online.
Regarding the setup, unfortunately I can't help much.
3:15 pm on July 3, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member Top Contributors Of The Month

joined:Apr 1, 2016
posts:2625
votes: 774


Secondly, though, it demonstrates that meta canonicals are not present - if they were, the duplicate content issue wouldn't have occurred.

Canonical tags are not always followed. Google has said many times that they act as a hint and that Google may decide to ignore them.

@jediviper
If you don't want the content from the CDNs to be indexed, simply add an X-Robots-Tag: noindex header to the HTTP response for the pages being served from the CDNs.

see info here:
[developers.google.com...]
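
As a rough sketch of the idea in Python, here is a small WSGI middleware that adds the header only when the request came in on a CDN hostname. The name noindex_on_cdn and the CDN_HOSTS set are hypothetical, and this assumes the CDN hostnames reach an application you control; on a real CDN you would more likely set the header in the CDN or web server configuration. Keep in mind that Googlebot has to be able to fetch the CDN URL to see the header.

```python
# Assumption: these are the CDN hostnames from the thread.
CDN_HOSTS = {"cdn1.test.com", "cdn2.test.com"}


def noindex_on_cdn(app):
    """WSGI middleware: add 'X-Robots-Tag: noindex' for requests on CDN hosts."""

    def wrapper(environ, start_response):
        # Strip an optional port and normalise case before comparing.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()

        def patched_start_response(status, headers, exc_info=None):
            if host in CDN_HOSTS:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapper
```

Responses on the main hostname pass through unchanged, so the main site stays indexable.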
5:53 am on July 4, 2019 (gmt 0)

New User

joined:June 26, 2019
posts: 15
votes: 0


So I had a look at the GSC and there seems to be a canonical set for the cdn sites.
It's a User-declared canonical targeting the main domain https://example.com/

Also, I noticed that the last crawl took place 10 days ago. If I request a reindex, will the "Duplicate without user-selected canonical" problem be solved?

@NickMNS
Thanks, I will also propose this, although the CDN sites have completely stopped appearing in the Google results since the redirection.



[edited by: not2easy at 2:14 pm (utc) on Jul 4, 2019]
[edit reason] exemplified for readability [/edit]

5:59 am on July 4, 2019 (gmt 0)

New User

joined:June 26, 2019
posts:15
votes: 0


also, I think our webmasters are following the steps from here:
[webmasters.stackexchange.com...]

1. Change domain and apply 301 redirects between the old and the new one.
2. Create Google Search Console property of the new domain (with http and https).
3. Apply 301 redirects from http to https.
4. Make sure sitemap.xml contains only https URLs.
5. Make sure internal links point to https URLs.
6. There should be a canonical tag pointing to the https version of the URL.
8:50 am on July 11, 2019 (gmt 0)

New User

joined:June 26, 2019
posts:15
votes: 0


So there are some strange things happening with my Crawl stats.

I have noticed in Webmaster Tools that on the dates when the redirection from the CDN pages to the main domain took place, the number of pages crawled per day suddenly went up 8 times, and 5 days later there was another spike at 16 times.
The latest data, from 3 days ago, shows the number back at the level of the first spike.

Can we assume that the CDN redirection and this sudden increase are related?
What could the consequences be, apart from the extra stress on our servers?
12:33 am on July 12, 2019 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 12, 2004
posts: 656
votes: 13


If you're using the CDN to host images/media files, restrict CDN configurations to only those files/folders. Start returning 404 for the rest of the URLs as soon as possible.

You probably have relative paths in your pages, and Googlebot is constantly discovering new URLs to crawl and index from those subdomains. Adding canonicals or noindex headers won't stop this.

If you don't have incoming links to these URLs on the CDN domain, 404 is the way to go. If they've been around for months and accumulated organic links, you may consider 301.
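
A minimal Python sketch of the "assets only" rule, assuming the CDN hostnames are meant to serve nothing but static files. The extension list and the function name cdn_status are made up for the example; the real restriction would normally live in the CDN or web server configuration.

```python
import posixpath

# Assumption: only these file types should ever be served from the CDN hosts.
ASSET_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".css", ".js"}


def cdn_status(path):
    """Return 200 for static assets and 404 for any other path on a CDN host."""
    extension = posixpath.splitext(path)[1].lower()
    return 200 if extension in ASSET_EXTENSIONS else 404
```

Under this rule /images/logo.png is still served from the CDN host, while /category/widgets returns 404 there and is only reachable on the main domain.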