I was worried that would happen when google announced they would work cross-site.
Does a canonical tag have to be in the <head> section of the page to be effective?
If not, any site with a cross-site scripting vulnerability that allows HTML code to be posted somehow is open to canonical tag PageRank siphoning.
One other question: Can hijacked canonical tags be cloaked? Meaning, can they be shown only to Googlebot instead of to site owners looking at the source code in their browsers?
There are a few ways I can think of to show the bad canonical tag to Google and make the site owner not realize the canonical tag is hacked. Since I'm not looking to train people in this I'll keep those specifics to myself.
Oh, it's being exploited now? Really? LOL
Sorry, but imo it doesn't take a rocket scientist to figure out it's not going to be exploited much until you make it exploitable ... All it takes is a bit of common sense ... Unfortunately, they employ a bunch of rocket scientists!
Would someone be able to spot the exploit using something like fetch as googlebot?
And would straight html pages be vulnerable to this exploit? Or would it only be for server side scripting pages like PHP and ASP, etc?
You can spot it using "Fetch as googlebot" - but only if you think to look for it. Most people, even those who are pretty savvy about defending against hacks, will probably be looking in the <body> tag for parasite nonsense, not within the <head>.
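For anyone who wants to look for it systematically, here is a rough Python sketch of the check. It assumes you already have the page source (from "Fetch as googlebot" or a plain view-source) and simply lists every rel=canonical link whose host differs from the page's own host. The class and function names are made up for illustration, and a legitimate cross-domain canonical (or a www vs non-www difference) will be flagged too, so treat the output as a prompt for a manual look rather than proof of a hack.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> anywhere in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr.get("href"):
            self.hrefs.append(attr["href"])


def foreign_canonicals(page_url, html):
    """Return canonical targets whose host differs from page_url's host."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    page_host = urlsplit(page_url).hostname
    flagged = []
    for href in finder.hrefs:
        target = urljoin(page_url, href)  # resolve relative hrefs
        if urlsplit(target).hostname != page_host:
            flagged.append(target)
    return flagged
```

Calling `foreign_canonicals("http://www.example.com/page", source)` on a clean page should give an empty list. The parser looks at the whole document, not just the `<head>`, so a tag injected further down is still picked up.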
[edited by: tedster at 2:27 pm (utc) on May 14, 2011]
Tags can be placed anywhere; most browsers will pick up a "title" tag in the footer, for example. It may not be correct, but browsers try to be lenient. Look at the source code, everywhere.
I don't think this would be a very effective tool for hackers.
I'm working on a site that has a .com and a .co.uk domain. They are identical sites, so every page has a canonical tag on it, telling Google that the .co.uk is the preferred domain.
Made ZERO difference, and Google continued to index, revisit and reindex the .com pages.
Only since we put a noindex on every .com page and had the site successfully 'removed' via WMT has Google stopped indexing both domains.
Google was even indexing and showing a staging/dev version of the site in its results: a site with absolutely zero links, and again with the canonical tag telling Google to index the www. domain. The only way it could have found this site was via the toolbar reporting back on visits to the URLs. (We've since done the same with this in WMT to remove it from Google's index.)
Not really impressed with Google's adherence to a tag that it conceived and implemented.
There must be a monetization method or ROI for all this 'work' (or why bother with the hack?). Would you mind saying how the spammy site was monetized, goodroi? Did they make money through sign-ups, email harvesting, affiliate links, AdSense, etc.?
Or did the spammy site in return link off to other monetized sites, re-routing the increased pagerank from these 'links' (<?if any?>) through a third-party website - trying to disguise the 'theft' of 'link-juice' as it were? In my gut I get the feeling that this (however far down the internet chain) is another pollution of the internet because of lax adsense policies ...
The only reason I ask is that IF this is an effective monetization 'hack' it WILL spread, and something that was supposed to be of benefit to the internet will once again be subverted for dirty ends ... to the point where, like meta keywords, the canonical tag's purpose becomes obsolete, when it could be of great value ...
Canonical Link Element
Mechanics don't call engines pistons because they're not pistons.
From experience of WordPress getting hacked, the head is the favored element to inject code into.
@mattcutts tweeted about it so it's really happening:
|A recent spam trend is hacking websites to insert rel=canonical pointing to hacker's site. If U suspect hacking, check 4 it. |
Allowing cross-domain canonical is just stupid IMO, what PhD didn't see this exploit coming?
If you allow it, it should only be between registered domains in the same Google account; that pretty much solves the problem.
Besides, the hacker could just as easily 301 redirect your pages elsewhere, among other things they can do; this is just less obvious.
Canonical tag is just a suggestion for Google and not a direction...
To be more precise, several Google spokespeople have said that Google takes the canonical link as a strong suggestion. They do reserve the right to ignore it; however, that is not what commonly happens, even when there is a hack or an error.
For more discussion about this area, see the discussion Why Are There "Canonical Disasters" - Is Google Messing Up? [webmasterworld.com]
Thanks for the heads up incrediBILL and I completely agree ...
|Allowing cross-domain canonical is just stupid IMO, what PhD didn't see this exploit coming? |
The Canonical Link Element should be for internal domain reference - <link rel="canonical" href="http://"> which carries more 'weight' and minimum link-juice.
The Original Source Meta Tag should be for external and cross-domain reference - <meta name="original-source" content="http://"> which carries less 'weight' and no link-juice.
Google designed both tags (and are busy trying to solidify a use for both) so why don't they tell us how to implement them 'properly' and assign them concrete tasks, one internal and one external?
Honestly it's not rocket science ...
[edited by: JoePublisher at 2:27 pm (utc) on May 16, 2011]
|The Original Source Tag should be for external and cross-domain reference |
However, the Original Source tag is currently in use only for Google News - not Google Web Search
@Tedster, totally aware of this ... not the point I was making. Google is messing around and changing the purposes of both these tags, so why not adopt them as I suggested? Unless there is a reason not to spread the use of the original source tag (which at the moment is experimental) outside of Google News? Why are they allowing new loopholes in the canonical tag to be exploited when they already have a tag which is cross-domain in purpose?
Assign a definitive 'purpose' for both and allow both to be in general use ...
The Original Source tag can be used between news publishers and their affiliate feeds as well as by general purpose webmasters using it to define the 'source point' of the page between their domains ... as the Canonical tag is now becoming (to its detriment). The Canonical tag should remain the stronger of the two, the original source tag the weaker of the two. One internal and one external ...
I would rather the Canonical Link Element be put back to internal linking only, which would remove the main 'reason' for the hacker to alter it ... at worst Google would just ignore the tag if it had been altered to point externally. It is far too useful in its original purpose to become diluted to almost nothing because of it being exploited.
I would also rather wait for Google to sort out any issues and put safeguards in place for the 'Original Source' tag before rolling it out for general use outside of Google News. Why is there this sudden rush for the Canonical tag to be cross-domain and alter its basic premise, opening it up to hacking, which was a bleeding obvious next step, when they were working on a cross-domain link tag already?
[edited by: JoePublisher at 3:25 pm (utc) on May 16, 2011]
|I was worried that would happen when google announced they would work cross-site. |
Not really Google's fault, though, there's not much they can do to stop sites from getting hacked.
|Can hijacked canonical tags be cloaked? |
Yes, technically they could be.
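On the detection side, a rough way to check for the simplest kind of cloaking is to compare the canonical targets in the source served to a normal browser with those served to something announcing itself as Googlebot. The Python sketch below is only that: the helper names are made up, the browser user-agent string is an arbitrary example, and a cloak keyed on Googlebot's IP addresses rather than its user-agent would not show up this way (which is where "Fetch as googlebot" comes in).

```python
from html.parser import HTMLParser
import urllib.request

# Googlebot's published user-agent string; the browser one is an arbitrary example.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; rv:4.0) Gecko/20100101 Firefox/4.0"


class _Canonicals(HTMLParser):
    """Gather the href of every rel=canonical link in a document."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.found.add(attr.get("href"))


def canonicals(html):
    parser = _Canonicals()
    parser.feed(html)
    parser.close()
    return parser.found


def fetch(url, user_agent):
    """Download a page while sending the given User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def cloaking_diff(html_for_browser, html_for_bot):
    """Return canonical targets served to the bot but not to the browser."""
    return canonicals(html_for_bot) - canonicals(html_for_browser)
```

Typical use would be `cloaking_diff(fetch(url, BROWSER_UA), fetch(url, GOOGLEBOT_UA))`; anything in the returned set deserves a closer look.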
|Not really Google's fault, though, there's not much they can do to stop sites from getting hacked. |
They provide the incentive when you can easily game the system by hacking a site.
Do you think the SEO pharma hacks would happen if Google was smart enough not to credit links to viagra and cialis from a page about plumbing?
Stupid is as stupid does and Google is doing some real stupid stuff, things I'd be personally embarrassed about if I were them.
Downright disgraceful, in fact, that they don't fix it when the fix isn't that complicated.
Considering the hacker pointed the site to a specific URL, can't they figure out who owns that URL, and throw that hacker in jail?
It's certainly a legal possibility, in any country where hacking into someone else's server is illegal at least. The same would hold for iframe injection, too - and conceivably even for parasite links.
The problem that would come up for law enforcement is one of scale - sound familiar?
But it seems to me that it's long past time that cyber-crimes stopped being treated as just small nuisances. It's at least as serious as breaking and entering in the physical world, and far beyond mere trespassing or even shoplifting, IMO. Given the financial damages, it's actually worse than breaking and entering and more like the severity of theft.
|Considering the hacker pointed the site to a specific URL, can't they figure out who owns that URL, and throw that hacker in jail? |
Someone could hack a server and point the canonicals at your server just to send you to jail if it were that simple.
You need more evidence, although it's pretty damning on its own.
incrediBILL, I understand, but if the site that it's being pointed to is obvious spam, it would be a different story.
I think at least an inquiry to the domain owner by the FBI would scare some 13 year old hackers from doing it again.
|I understand, but if the site that it's being pointed to is obvious spam, it would be a different story. |
Obvious spam does not mean an obvious hacker. The two are not mutually inclusive or exclusive, just highly suspect.
Most of the spammers/hackers I've investigated on the web were a far cry from 13-year-olds; we're talking organized-crime-type stuff making some serious coin.
The only thing we agree on is the FBI should be all over this stuff, hopefully they are.
[edited by: incrediBILL at 9:09 pm (utc) on May 16, 2011]
|long past time that cyber-crimes stopped being treated as just small nuisances |
As noted: Mandatory 3 year sentences sought [webmasterworld.com...]
If you think about it, this is going to be pretty much an edge case though folks.
1. Find a hackable site with some good content that ranks well for terms that can be monetised by the spam site
2. Create a burner site and add all the content or add it to an existing spam site
3. Hack the site, put the canonical tags in place and 'hope' that Google takes the 'hint' and transfers all worth to the spam site
I can't see this working in most cases. The spam site may have problems or it may be a new site. There are no 301 redirects, just hacked tags. The content is established on site A. The hacks may get a common signature that is easily detected by Google. The spam sites will likely be on different countries' IP ranges, on different domains, in bad neighbourhoods, etc.
It's something else that we can kind of be aware of, but in most cases I would imagine this is not going to be a big money earner for those dastardly under-worldly sorts. :)
Matt Cutts just weighed in on this:
He says a bunch of specifics about when rel canonical will not be honored:
1) In the <body>, only honored in the <head> (answers my question from my earlier response, YAY!)
2) If Google suspects the site is hacked
3) If the <head> section is not closed
4) If it points to a 404 page.
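Points 1 and 3 can be checked from the page source alone. The Python sketch below (names are hypothetical) reports where each rel=canonical link sits and whether a closing `</head>` was seen. It says nothing about points 2 and 4, and it is only a guess at the kind of test involved, not how Google actually parses pages.

```python
from html.parser import HTMLParser


class CanonicalPlacement(HTMLParser):
    """Record whether each rel=canonical link appears inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.head_closed = False
        self.placements = []  # list of (href, "head" or "outside head")

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            # A <body> tag ends the head even if </head> is missing
            self.in_head = False
        elif tag == "link":
            attr = dict(attrs)
            if "canonical" in (attr.get("rel") or "").lower().split():
                where = "head" if self.in_head else "outside head"
                self.placements.append((attr.get("href"), where))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
            self.head_closed = True


def check_placement(html):
    """Return (placements, head_closed) for a page's source."""
    parser = CanonicalPlacement()
    parser.feed(html)
    parser.close()
    return parser.placements, parser.head_closed
```

A page with more than one `<head>` section would have links in every one of them reported as "head" here; how Google treats that case is not something this sketch can answer.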
I've seen sites with two or more <head> sections - all closed. Bad design, I know, but how would that affect point 1? :)
|brotherhood of LAN|
|how would that affect point 1? |
Fair point, lots of bad code out there.
I think points 1 to 4 are moot if the hacker has access to the server, though the rules Matt Cutts posted should eliminate some loopholes, which is a good thing.