
Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

Canonical tag scraped from my site, will it hurt?

 5:04 am on Apr 29, 2014 (gmt 0)

I recently found out that an extremely spammy site is using my theme. They probably did it just by copying and pasting, so the layout looks terrible.

Because they did it that way, all my header properties were copied along with it, including the rel=canonical, which points to my homepage URL. OMG, I have no control over this. How can I tell search engines that this site isn't mine?

Will this kind of canonical from a spammy site hurt mine? Also, can disavow help in this kind of situation? It's just a canonical, so no inbound link is detected from that site.

Please advise. Thank you.




 9:37 am on Apr 29, 2014 (gmt 0)


Do not ignore this situation; please use the Google spam form to report it to Google. When you report it, Google will check the actual owner of the domain, and the spammer will get listed in Google's scraper index database.

So you do not need to worry at all; just log in to your WMT account and submit the report.
Find the report link: google.com/webmasters/tools/spamreport?hl=en

Do not use the Disavow Tool for this, because the Disavow Tool is only for disavowing links, not for disavowing a canonical tag.


[edited by: Robert_Charlton at 11:04 pm (utc) on Apr 29, 2014]
[edit reason] no promotional signatures, per TOS [/edit]


 11:41 am on Apr 29, 2014 (gmt 0)

Thanks for the reply, but I can't seem to find a category in the spam report that fits this situation. That site isn't doing this to "trick Google into ranking them highly" (as stated on the webspam report page). It is just using my theme (which I don't really care about) and has a canonical to my domain. I'm worried because the spammy site contains all sorts of spammy content such as lyrics, mp3s, etc.

Well, I submitted a disavow list last year and I've actually recovered from Penguin once, so I know what disavowing is.

Thank you but I don't really think the webspam report will help. Anyone else? =(


 11:58 am on Apr 29, 2014 (gmt 0)

Welcome to WebmasterWorld, kikolani!

If your canonical link element has the full URL (as recommended), which includes your protocol, domain name and path (e.g. it is set up in the format "http://www.example.com/your-site-page"), then you have nothing to worry about with regard to Google ranking.

Google would take your canonical into account, which says the page on your site is the canonical version of the page.

I hope the rel canonical elements on your site's pages are not all the same, pointing to your home page. Each unique page you want indexed should have its own canonical.
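
As a rough illustration of the two checks described above (full absolute URL, and not every page pointing at the same target), here is a minimal Python sketch. The function name `audit_canonicals` and the input format (a dict mapping each page URL to the href found in its canonical element) are assumptions made for this example, not part of any Google tool.

```python
from urllib.parse import urlsplit

def audit_canonicals(canonicals):
    """Check a {page_url: canonical_href} map (hypothetical input format).

    Returns a list of (page_url, problem) tuples; an empty list means
    neither of the two problems discussed above was found.
    """
    problems = []
    for page_url, href in canonicals.items():
        parts = urlsplit(href)
        # A full canonical needs both a protocol and a domain name.
        if not (parts.scheme and parts.netloc):
            problems.append((page_url, "canonical is not an absolute URL"))
    # If several pages all share one canonical target, flag it once.
    if len(canonicals) > 1 and len(set(canonicals.values())) == 1:
        problems.append(("*", "every page points to the same canonical"))
    return problems

site = {
    "http://www.example.com/": "http://www.example.com/",
    "http://www.example.com/about": "/about",
}
print(audit_canonicals(site))
```

With the sample data, only the "/about" entry is flagged, because its href has no protocol or domain.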


 12:25 pm on Apr 29, 2014 (gmt 0)

Hi thanks again for the follow ups.

Actually, my question is: since bad links from sites you don't own can hurt your site, will a canonical from random bad sites hurt you too?

Thank you.


 1:34 pm on Apr 29, 2014 (gmt 0)

As I understand it, the canonical tag is supposed to be used on pages with the same or similar content to identify the preferred page. If those other pages have totally different content than your pages, then the tag is being mis-used, and the Google algorithm should ignore it.


 4:02 am on Apr 30, 2014 (gmt 0)

First Thing:

As I told you, first log in to your Gmail account and then open this URL: google.com/webmasters/tools/spamreport?hl=en

You can easily find the "This page is really webspam" Report Spam button at the bottom. Click the Report Spam button and the spam form will load. There you can clearly describe your details and situation.

Second Thing:

The canonical tag is used just to consolidate duplicate URLs, or you could say the variants of a URL, like:
1- http://www.example.com
2- www.example.com
3- http://example.com
4- example.com/index.php
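
To make the idea of URL variants concrete, here is a small Python sketch that maps the four variants listed above onto one preferred form. The function name `normalize`, the default preferred host, and the list of index file names are all assumptions for illustration; a real site would pick its own preferred form.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url, preferred_host="www.example.com",
              index_names=("index.php", "index.html")):
    """Map common variants of a URL onto one preferred form (illustrative only)."""
    # Variants written without a protocol get a default one so they parse.
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www form as the same host.
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    if host in (bare, "www." + bare):
        host = preferred_host
    # An empty path and a trailing index file both mean the directory root.
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last in index_names:
        path = head + "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))

variants = [
    "http://www.example.com",
    "www.example.com",
    "http://example.com",
    "example.com/index.php",
]
print({normalize(v) for v in variants})
```

All four variants collapse to a single URL, which is the one you would put in the canonical element.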

We cannot control what other sites are doing, but Google knows who the actual owner of a particular domain is. If someone is using your canonical on his/her website, you do not need to worry at all, because Google will check WHOIS details to find the correct ownership details.


 4:27 am on Apr 30, 2014 (gmt 0)

Just to get this crystal clear:

In normal usage, you'd have something like this:

And then each of those pages would include a line in the <head> saying something like
<link rel = "canonical" href = "http://example.com/pagename.php">
<link rel = "canonical" href = "/pagename.php">

So you're talking about a site at
et cetera, and the <head> of that site's pages says
<link rel = "canonical" href = "http://example.com/pagename.php">
giving the full name of your domain?

Is that what you're describing?
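
For anyone who wants to check a copied page for this themselves, here is a minimal Python sketch using only the standard library. It pulls the canonical href out of a page's HTML and reports whether it points at a different host than the page itself. The names `CanonicalFinder`, `canonical_target` and `is_cross_domain`, and the scraper host in the usage lines, are made up for this example; fetching the HTML is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> seen."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel and attr.get("href"):
            self.href = attr["href"]

def canonical_target(page_url, html):
    """Return the canonical URL declared in html, resolved against page_url."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return None
    # A relative href such as "/pagename.php" resolves to the page's own host.
    return urljoin(page_url, finder.href)

def is_cross_domain(page_url, html):
    """True if the page declares a canonical on a different host.

    Note: www.example.com and example.com count as different hosts here.
    """
    target = canonical_target(page_url, html)
    if target is None:
        return False
    return urlsplit(target).hostname != urlsplit(page_url).hostname

copied_head = '<head><link rel="canonical" href="http://example.com/pagename.php"></head>'
# Hypothetical scraper URL: the copied head still names the original domain.
print(is_cross_domain("http://scraper.example.net/pagename.php", copied_head))
# The same head on the original site is an ordinary same-host canonical.
print(is_cross_domain("http://example.com/pagename.php", copied_head))
```

A relative canonical copied by a scraper would resolve to the scraper's own host, which is one more reason the full-URL form matters in this situation.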

To me that sounds like a scraper who is just too stupid to live. I would hope any major search engine can recognize them for what they are, at no risk to you.

Edit: ravisharmaseo, why do you keep saying "log in to gmail" when we're talking about Webmaster Tools? Yes, it's all the same login* whether you like it or not. But you can have wmt without gmail.

* In this week's version, attempting to log out from wmt in Camino takes me instead to my own g### profile page. I can only log out from Safari, or from Google Search now that I'm logged in.


 11:34 am on Apr 30, 2014 (gmt 0)

If someone using your canonical for his/her websites you no need to worry at all because Google will check whois details to find correct ownership details.

As far as I know, Google does not use whois when they find a canonical on one site pointing to another. Have you got a Google source for this information?

We need to be careful not to spread misinformation (which is often the case in SEO).


 5:29 am on May 1, 2014 (gmt 0)


As far as websites are concerned, especially when we are talking about Google and other search engines, yes, it is quite right: every search engine knows all about your websites. There are several mechanisms, algorithms, index servers, rules and protocols that verify your website's data. Google definitely knows your website's IP address, where it is hosted, who the registrar is, when the domain was registered, the domain's crawl history, etc.

A search engine does more than just crawl and index web pages. So please do not lead people astray, and always speak genuinely. You are in the IT world; you cannot hide anything.

Kindly keep in your mind before saying anything:

“Google always knows your WhoIs data but currently is not using in the ranking algorithm. WhoIs data will probably be used when a site comes under a manual review.”

As I already suggested, he/she has to report it manually (about the canonical) using the Google spam form, and after that Google will check WHOIS details for both sites to find out who the real owner is.

Google always validates your sites whenever you submit a manual report, such as for a malware action, a disavow action, or any scraping action taken by a third party.

brotherhood of LAN

 5:54 am on May 1, 2014 (gmt 0)

ravisharmaseo, please post a link to an authoritative source regarding the relevance of WHOIS in this situation as you seem to be quite certain that it's relevant here.

Regardless of the amount of information Google collects... it doesn't necessarily mean that said information is actually used in all the various ways it can be used. Besides, there are millions of domains that are hidden behind WHOIS privacy.


 6:24 am on May 1, 2014 (gmt 0)

Google's Spam report is for reporting a site that you believe is abusing Google's Quality Guidelines for Webmasters - according to their site:
If you believe that another site is abusing Google's quality guidelines, please let us know by filing a spam report. Google prefers developing scalable and automated solutions to problems, so we attempt to minimize hand-to-hand spam fighting. While we may not take manual action in response to every report, spam reports are prioritized based on user impact, and in some cases may lead to complete removal of a spammy site from Google's search results. Not all manual actions result in removal, however. Even in cases where we take action on a reported site, the effects of these actions may not be obvious.

kikolani had pages copied. Reporting scraping as spam is not the best action to take, and Google's own site does not mention verifying ownership for scraped content. That is what the canonical tags are for.


 2:10 pm on May 1, 2014 (gmt 0)

“I already written WhoIs data not used in ranking algorithm but it is very important when we talking about Spam Websites and Black Hat sites.”

@brotherhood of LAN

For your kind information, in 2006 Google became an ICANN-accredited registrar like GoDaddy! As a registrar, Google has access to the WHOIS API.

Yes, I know you can use privacy on your domain name, called WHOIS privacy, where you can hide the details, but Google devalues such sites because of a lack of openness or reputation.


 2:37 pm on May 1, 2014 (gmt 0)


We all know that Google has access to WhoIs data because they are a registrar; this is old news.

However, you said that Google is using this WhoIs data in their algorithm.

So far you have made two statements on this: using WhoIs when looking at a cross-domain canonical, and devaluing sites with a WhoIs privacy service.

So if you have an authoritative source, please provide us with a link.

Otherwise this is just your opinion which, for example, does not match my experience in either of these circumstances.


 2:49 pm on May 1, 2014 (gmt 0)


Thinking about your question again, there are two aspects:

a) the other site being spammy
b) the other site perhaps having spammy links

I think that b) could be a bigger problem than a). Since a canonical consolidates links to the page the canonical points to, theoretically you could be "inheriting" all the spammy links in this way. Whether Google has something to safeguard scraped sites from this I do not know, but I suspect that there are lots of sites in the same position as yours, and perhaps we would have heard if someone's site had been tanked in this way.

brotherhood of LAN

 5:32 pm on May 1, 2014 (gmt 0)

>but I suspect is that there are lots of sites in the position the same as yours

Indeed, I'd suspect Google would ignore the scraped version, but it'd be good to hear from someone who's witnessed that that's the case.

Google most definitely does not have access to .uk whois data, domains registered by proxy, and a host of others. Google potentially devaluing the 'trust' of private WHOIS isn't relevant. The issue is far more likely to be whether Google totally ignores the scraped page or attributes it in some way with the domain/page inside the canonical, regardless of who owns the domain(s) and what characters are inside the WHOIS fields.


 8:10 pm on May 2, 2014 (gmt 0)

Google potentially devaluing the 'trust' of private WHOIS isn't relevant.

FWIW, do we really know what all registrars see?

Do they see the real registration data, the private registration, or both?

Knowing that trivial tidbit would answer a lot of the unknowns really fast.

brotherhood of LAN

 3:23 am on May 3, 2014 (gmt 0)

They see what the rest of us see... registrant, admin, technical and billing information for a domain and any other fields provided in the WHOIS for the TLD. If you register a domain via some kind of WHOIS guard that uses their own details, then those details are what Google sees. ravisharmaseo has suggested that private/obfuscated WHOIS may be a signal of lower quality (can't say I disagree), but IMO WHOIS isn't really relevant here other than in a manual check at Google, on the assumption that some kind of manual review would take place here.

Having some kind of algo component that takes 2 sites and evaluates their WHOIS sounds shaky to me. I don't buy it (happy to be proven wrong though).

In regards to the OP, my best guess is that this kind of situation isn't going to warrant a manual review, and again IMO, the scraped page would be ignored. As aakk9999 pointed out, if it did matter it's a 2-prong potential issue, one being any backlinks to the scraped site potentially being counted towards the OP's site. Perhaps in this case it'd be obvious in GWT whether that's the case.

My instinct is that Google's seen this 000's of times before and it makes sense just to totally ignore the cross-domain-canonical-scraped-version.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved