
Google SEO News and Discussion Forum

Matt Cutts and Amit Singhal Share Insider Detail on Panda Update
tedster - msg:4276281 - 10:54 pm on Mar 3, 2011 (gmt 0)

Senior member g1smd pointed out this link in another thread - and it's a juicy one. The Panda That Hates Farms [wired.com]

Wired Magazine interviewed both Matt Cutts and Amit Singhal and in the process got some helpful insight into the Farm Update. I note that some of the speculation we've had at WebmasterWorld is confirmed:

Outside quality raters were involved at the beginning
...we used our standard evaluation system that we've developed, where we basically sent out documents to outside testers. Then we asked the raters questions like: "Would you be comfortable giving this site your credit card? Would you be comfortable giving medicine prescribed by this site to your kids?"


Excessive ads were part of the early definition
There was an engineer who came up with a rigorous set of questions, everything from: "Do you consider this site to be authoritative? Would it be okay if this was in a magazine? Does this site have excessive ads?"


The update is algorithmic, not manual
...we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons.

 

tedster - msg:4280246 - 4:05 pm on Mar 11, 2011 (gmt 0)

That is what I see, too, at least anecdotally. Got data, Shaddows?

Much as it hurts, I can see that maybe it's a necessary trade-off for now. In Google News, there's more effort on accurate attribution, but not in organic search.

falsepositive - msg:4280268 - 4:41 pm on Mar 11, 2011 (gmt 0)

Then it wouldn't be so hard to game this system. All I have to do is create a site that steals content from everyone else, make it look better, have no ads, build links, and I should be fine. What a fine way to bring down my competitors! This new mindset for search lends itself to an easy way to sabotage hardworking businesses.

tedster - msg:4280276 - 4:57 pm on Mar 11, 2011 (gmt 0)

All that is why they say they are only going to look at the data for now and not use it for ranking.

Shaddows - msg:4280296 - 5:26 pm on Mar 11, 2011 (gmt 0)

@falsepositive
Apart from two things:
1) DMCA
2) Monetisation

If you're not selling ads, you need to sell stuff. At which point, you compete with all the other ecoms. And selling ads makes you a target of both Scraper and Panda.

@tedster
My data is pretty distorted by Panda. It took me a while to look at data through the right lens, so now I only have "what sites are new" to go on. Doing comparative analysis after two major updates have gone through is tricky.
----------

Take a look at some unfamiliar competitive SERPs - try a few keyword combinations. Some things hit you:
a) YOU do not know who published first
b) Top results are differentiated offerings
c) There are very few sites returned which look like total dross

Once you step away from your own emotionally charged corner of the web, just looking at what is returned is useful.

(Ted, I think your response to falsepositive was more in the context of the [google] "hide your competitor" thing from another thread?)

walkman - msg:4280302 - 5:44 pm on Mar 11, 2011 (gmt 0)

I was reading an article from CNN on Panda's $$ impact and decided to google a direct quote to see if others were discussing it. #1 was a scraper, CNN second, and the CNN article was copied 100% there. I will post the link if I can find it. Maybe I was logged in, but even then, CNN should have come first since I was there, no?

Edit: [google.com...]
(CNN is second for me)

Regarding improved results: I am sure if the US drops 100 nukes in Afghanistan and Pakistan they will kill many Taliban, along with....

Shaddows - msg:4280310 - 5:57 pm on Mar 11, 2011 (gmt 0)

[google.com...]

When searching for the topic, not the quote, CNN comes first (well, third, after Google AdWords results).

Regarding improved results: I am sure if the US drops 100 nukes in Afghanistan and Pakistan they will kill many Taliban, along with

G hasn't been that indiscriminate. More like: if the US dropped daisy-cutters in front of all the caves on the Afghan/Pakistan border, the majority of those killed would be Taliban.

tedster - msg:4280312 - 5:57 pm on Mar 11, 2011 (gmt 0)

more in the context of the [google] "hide your competitor" thing from another thread?

You're right, sorry - too many windows open ;(

walkman - msg:4280319 - 6:11 pm on Mar 11, 2011 (gmt 0)

Shaddows, you are nitpicking. The article is a 100% copy. It should NOT rank anywhere above CNN. My guess is that since CNN needs to pay the writer, they had to put up ads; the 'scraper', not so much, and Google likes that.

Bewenched - msg:4280321 - 6:12 pm on Mar 11, 2011 (gmt 0)

@walkman, yup ... CNN is second for me too.

The difference between the sites... well... CNN is running ads and the other is not... and the other has some pretty ugly HTML to it... CNN is all divs and styles.

I'm seeing this for other searches as well. Almost always the top two spots are scrapers and, in my opinion, bad sites that I wouldn't give my credit card to, that's for sure.

walkman - msg:4280329 - 6:20 pm on Mar 11, 2011 (gmt 0)

This shows me that looks and ads are more important than content. I assume that CNN is crawled almost instantly (or at least before the scraper) and that CNN published it first.

AlyssaS - msg:4280333 - 6:32 pm on Mar 11, 2011 (gmt 0)

I was reading an article from CNN on Panda's $$ impact and decided to google a direct quote to see if others were discussing it. #1 was a scraper, CNN second, and the CNN article was copied 100% there. I will post the link if I can find it. Maybe I was logged in, but even then, CNN should have come first since I was there, no?


Are you sure it is "scraped" and not "syndicated"? There is a difference. G is targeting scraped content, but not syndication.

Lots of the major news sites syndicate their content. Taking a look at CNN's RSS TOS:

[edition.cnn.com...]

They state that you can use it as long as there are no adverts near it - and #1 complies, does it not?

browsee - msg:4280337 - 6:39 pm on Mar 11, 2011 (gmt 0)

End of days for G. Look at the market share for G search.

[searchengineland.com...]

If they don't fix search, their market share will go down further.

TheMadScientist - msg:4280338 - 6:41 pm on Mar 11, 2011 (gmt 0)

Yeah, AlyssaS, that's exactly what I was going to add to Shaddows' list:

Take a look at some unfamiliar competitive SERPs - try a few keyword combinations. Some things hit you:
a) YOU do not know who published first
b) Top results are differentiated offerings
c) There are very few sites returned which look like total dross

d) If it's a legal copy which has been syndicated / copied with permission.

walkman - msg:4280339 - 6:42 pm on Mar 11, 2011 (gmt 0)

"Are you sure it is "scraped" and not "syndicated"? There is a difference. G is targeting scraped content, but not syndication."

I am sure Google called the guy up, asked if it was scraped or syndicated, and then ranked them :)

Come on, Alyssa. Maybe it's syndicated, but CNN should still be #1 - and CNN's RSS apparently is just one sentence, linked to CNN.

If I had to bet, it would be cut and paste: see

"Google cuts its content farm subsidy

J.C. Penney gets busted juicing its Google results "

at the bottom, those are CNN's 'related stories.' [money.cnn.com...]

Also, each syndicated story normally carries a sentence saying that the article is from CNN and was published there. MSNBC does it with NYT articles, for example.

browsee - msg:4280342 - 6:49 pm on Mar 11, 2011 (gmt 0)

Agree with Walkman, it looks like a copy & paste. You can see all three related article links as text.

AlyssaS - msg:4280343 - 6:49 pm on Mar 11, 2011 (gmt 0)

walkman - given that CNN is a major news org, with powerful lawyers, and their news is their main intellectual property, it is reasonable for G to assume that if copies of their articles appear, it is with CNN's permission and under licence. If not, CNN will take them down.

Now think about it from the licencee's point of view. Why pay CNN, AP, Reuters etc. any money if they will ALWAYS outrank you? You wouldn't bother, would you? And from CNN's point of view, maybe the income they get from licencing this stuff out to lots of orgs far, far outweighs the money they could get from ranking a solitary article and monetising it with ads. So from their point of view it's OK to be outranked, and their brand spreads that much more with syndication.

IMO the "scraper" update was targeting known abuses - e.g. people copying reviews off Amazon and passing them off as their own, that sort of thing. Not news syndication.

walkman - msg:4280345 - 6:53 pm on Mar 11, 2011 (gmt 0)

"If not, CNN will take them down."

In 5 milliseconds, no need to file a DMCA, contact the owner / host, wait for an answer, file a suit or whatever, right? Anyway, I'm done on the CNN topic :)

TheMadScientist - msg:4280359 - 7:14 pm on Mar 11, 2011 (gmt 0)

AlyssaS I think you have a good point, and imo people are looking at G rankings with too much emotion and too little of a 'big picture' view ... If the content is syndicated then it's not a big deal, and if it's not, then it really is CNN's responsibility to do something about it, not G's. (Did I really just defend G not putting the original source first? WOW! Didn't think I would ever do that.)

Most of the complaints I read are about what people see when clicking back and forth from one result to another, as if the G algo can 'see the same thing' ... I guess until you've done some data processing you may not realize how hard their job is ... You can look at those links at the bottom and draw a conclusion (possibly accurate), but an algo doesn't 'see' things the same way.

There's way too much emotion and too little logic about why the rankings are the way they are most of the time imo.

TheMadScientist - msg:4280371 - 7:23 pm on Mar 11, 2011 (gmt 0)

If anyone wants an unemotional idea of how tough it is, spider the CNN page and the duplicate, then 'detect' the duplicate as a duplicate with a script, then apply the 'duplicate detection process' to a set of results (or pages you find where there's duplication or syndication ... use a set where there's not a 'major source' to detect) and see how accurate it is for detecting origination ... Really, before you say 'they should (blah)' go try and do it ... My guess is you'll have a whole newfound appreciation for the job they do and what a challenge it is, and you'll quite possibly run into issues you would not have thought existed.

The only 'fair' way I can come up with is 'discovered first' and we don't have any clue which of the two results GBot hit first ... It would seem like it should be CNN in the case above, but the reality is we have no way of knowing.
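For anyone who wants to try that experiment, here is a minimal sketch of one standard near-duplicate technique - word shingling plus Jaccard overlap. The shingle size and threshold are arbitrary choices for illustration, not anything Google has disclosed:

import re

def shingles(text, size=5):
    # Lowercase and split into words, then build the set of
    # overlapping word n-grams ("shingles").
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    # 1.0 = identical shingle sets, 0.0 = nothing in common.
    return len(a & b) / len(a | b) if a | b else 0.0

def is_near_duplicate(page_a, page_b, threshold=0.8):
    return jaccard(shingles(page_a), shingles(page_b)) >= threshold

Even this toy version surfaces the hard part TheMadScientist describes: it can tell you two pages match, but nothing about which one originated the text.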

crobb305 - msg:4280378 - 7:36 pm on Mar 11, 2011 (gmt 0)

I hope the Google share price begins to go back up. It started a steady decline on Feb 18 (NASDAQ:GOOG). Fluctuations happen, but Google has been in the headlines A LOT lately, with news about how they are spanking companies and receiving a backlash.

tedster - msg:4280384 - 7:44 pm on Mar 11, 2011 (gmt 0)

before you say 'they should (blah)' go try and do it

I agree - we begin to see why Google News is introducing the idea of an "original source" meta tag.

Unfortunately, even if news sites implemented that idea flawlessly it still wouldn't address the rest of the duplicate attribution problems around the web.
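For reference, the tags Google News announced in late 2010 were syndication-source (the preferred URL for a syndicated article) and original-source (the URL of the first article to report a story). A syndicating page would point back at the original roughly like this, with the URL being a placeholder:

<meta name="syndication-source" content="http://www.example.com/original-story.html">
<meta name="original-source" content="http://www.example.com/original-story.html">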

TheMadScientist - msg:4280386 - 7:51 pm on Mar 11, 2011 (gmt 0)

Yeah, tedster, the ONLY way I can think of to make it 'more fair' is to go with discovery order, because everything else depends on a webmaster being honest with attribution ... You can't go by domain, because of all the syndication, and it has to be something you can apply on a large scale, not just to 'major sources' of information ... You can't go by links / PR, because they don't say anything about who published what first ... You can't go by writing style, because it's all the same with duplication ... You can't go by [insert a bunch of other variables here] ... The only thing I can come up with is the 'original discovery time' of a near-duplicate deciding which copy gets devalued when ordering results by origination, and then it's still up to the publisher to act if it's incorrect.
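As a sketch of what that could look like on the index side - assuming each crawled copy is stored with a content fingerprint and a first-seen timestamp, which is pure speculation about Google's internals - the attribution step reduces to a few lines:

def attribute_originals(crawled_pages):
    # crawled_pages: iterable of (url, fingerprint, first_seen) tuples,
    # where fingerprint identifies near-duplicate content and
    # first_seen is the crawl timestamp.
    earliest = {}
    for url, fingerprint, first_seen in crawled_pages:
        # Credit whichever copy GBot discovered first; later copies
        # sharing the fingerprint get devalued as duplicates.
        if fingerprint not in earliest or first_seen < earliest[fingerprint][0]:
            earliest[fingerprint] = (first_seen, url)
    return {fp: url for fp, (ts, url) in earliest.items()}

Nothing in this depends on the publisher's own markup - only on when the crawler got there - which is why the honesty problem disappears.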

zerillos - msg:4280418 - 9:19 pm on Mar 11, 2011 (gmt 0)

I don't think discovery order is the best solution. Some websites are spidered more slowly than others, and I've seen countless situations where the scraper got indexed faster than the original source.

I read this idea here a few days back, and I think it's a better solution to the scraper issue: when you publish something new, you don't post it publicly immediately. First you ping Google and wait for a signal of some sort that G has spidered your new content. If it agrees with you that it's new and original, it sends you a token. After receiving it, you can make your content public, knowing that G is aware that you're the original source.
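To make that handshake concrete, a rough sketch follows. The endpoint, parameters, and token are all invented for illustration - no such Google API exists:

import json, time, urllib.parse, urllib.request

CLAIM_ENDPOINT = "https://example.com/claim-origin"  # hypothetical service

def claim_origin(draft_url):
    query = urllib.parse.urlencode({"url": draft_url})
    while True:
        # Ask the engine to crawl the not-yet-public draft and
        # confirm it has no match already in the index.
        with urllib.request.urlopen(CLAIM_ENDPOINT + "?" + query) as resp:
            claim = json.load(resp)
        if claim.get("status") != "pending":
            break
        time.sleep(30)
    # Publish publicly only once the token arrives.
    return claim.get("token")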

The CNN article is second if I search it from my desktop, but it's the first result if searched on G mobile. Even if AlyssaS has a very good point there, that doesn't make it right. Copyright should still be respected.

walkman - msg:4280423 - 9:23 pm on Mar 11, 2011 (gmt 0)

To change the topic a bit, has anyone changed something and got back up in the SERPs? Like "I deleted a gazillion thin pages, they were indexed / seen as gone last week, and now I'm all OK".

Anyone?

TheMadScientist - msg:4280426 - 9:33 pm on Mar 11, 2011 (gmt 0)

I read this idea here a few days back, and I think it's a better solution to the scraper issue: when you publish something new, you don't post it publicly immediately. First you ping Google and wait for a signal of some sort that G has spidered your new content. If it agrees with you that it's new and original, it sends you a token. After receiving it, you can make your content public, knowing that G is aware that you're the original source.

LOL. That idea is just discovery order, made complicated.

You don't agree with discovery order, but you think you should ping G to make sure it gets spidered first (discovery order credit to you) and then do a bunch of extra ish, rather than having them just credit the one discovered first and solve the issue without all the extra work.

If they credit originality based on discovery order, all you have to do is: ping, tweet, (possibly fetch as GoogleBot) and watch your stats for 2 to 5 seconds before you publish a link ... Why complicate the issue?

You don't need a token on your page, they need an internal token to solve it, because any good scraper is just going to remove any token you publish anyway ... and if they don't remove it, then they're credited too, aren't they?

[edited by: TheMadScientist at 9:44 pm (utc) on Mar 11, 2011]

crobb305 - msg:4280430 - 9:35 pm on Mar 11, 2011 (gmt 0)

@Walkman, I am just now trying to come up with a plan after my hit yesterday. I'm still unsure about what caused it: whether it was Panda disliking my 5 pages that have 1 or 2 affiliate links, or tweaks I made to my homepage back in January (2 months ago) that triggered an OOP (over-optimization penalty). The changes I made were to place my slogan header inside an <h1> tag (that slogan contains the 2-word phrase that got demoted from page 1 to page 56 yesterday). I also added a few more links to some of my internal affiliate pages (so my homepage could now be deemed "lower quality"). So am I facing an OOP (2 months after the changes), or a low-quality signal from 5 monetized pages (despite 104 unmonetized, good-content pages)? That is the big question.

Walkman, did you see a demotion similar to mine (not a 100% removal, but a significant drop in rankings and about a 60% drop in traffic), or was your site completely removed from the index?

zerillos - msg:4280436 - 9:46 pm on Mar 11, 2011 (gmt 0)

Why complicate the issue?


As I said, some websites get spidered more slowly. They could wait much longer than 2-5 seconds before GBot comes to fetch the page. If it's a news site, sometimes an hour is just too long to wait before publishing a breaking story.

TheMadScientist - msg:4280439 - 9:50 pm on Mar 11, 2011 (gmt 0)

First you ping google

You can ping Google right now today, either with a regular ping or a tweet.

(I actually think Fetch as GoogleBot might work too, because when I do it, the regular GBot is what fetches the page - but I haven't tested it, and it takes longer than a tweet to get the bot there in the testing I've done.)

Why don't you go put up two new pages that are unlinked, then ping Google about one and tweet a link to the other one and see how fast GBot hits each page ... Really, try it ... The systems for getting GBot to a new page 'now' are already in place ... Your idea is simply discovery order ... All they need is an internal token, like DiscoveryTime, to be attached to the page.
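For what it's worth, the 'regular ping' is just an HTTP GET. A sketch using Google's Blog Search ping service - the documented endpoint circa 2011, as best I can tell; whether it still affects discovery is another question:

import urllib.parse, urllib.request

def ping_google(page_url):
    # Google Blog Search ping endpoint, as published at the time.
    ping = "http://blogsearch.google.com/ping?" + urllib.parse.urlencode({"url": page_url})
    with urllib.request.urlopen(ping) as resp:
        return resp.getcode() == 200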

NixRenewbie - msg:4280536 - 2:58 am on Mar 12, 2011 (gmt 0)

Ah, Seven_Cubed ... agreed! If I can't go crazy I might as well stay home!

aaronbaugher - msg:4289049 - 3:43 pm on Mar 29, 2011 (gmt 0)

What I do not understand is the following: They say they cannot reveal information about the update because it would be gamed. But what would be the consequence if everyone knew Google's definition of quality? Right, everyone would modify or create sites based on that standard, at least if they want to rank in Google.

Yes, that's always been an especially lame argument. If Google said, for instance, "As of this update, pages with more than 25% of space above-the-fold devoted to ads will have their rankings lowered," what would happen? Every webmaster who's paying attention would make sure his pages met that standard, and the web would be a better place for it. Sure, a few who only had 10% of space in ads might increase theirs to 24%, but so what? Google's doing the research to determine what users want to see, so if that research determines that 25% is okay, then it's okay. Or they could make it 10% -- whatever they think works. The point is, they claim to know better than anyone else what users want to see, so why not pass that info along to webmasters so they can adhere to it?

Sure, spammers and black hat types will follow the instructions too. But so what? They're more likely to figure out the limits through trial and error anyway; it's the mom-and-pop place without a dedicated SEO budget that can't keep up. Some clear directions would help those small- to medium-sized operations most of all.
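That hypothetical rule is simple arithmetic once you have the rendered layout. A sketch, where the 25% cutoff and the 600px 'fold' are assumptions taken from the example above rather than anything Google has published:

FOLD_HEIGHT = 600  # assumed pixel height for "above the fold"

def ad_ratio_above_fold(viewport_width, ad_rects):
    # ad_rects: list of (x, y, width, height) for rendered ad units,
    # with y measured down from the top of the page.
    fold_area = viewport_width * FOLD_HEIGHT
    ad_area = 0
    for x, y, w, h in ad_rects:
        # Count only the portion of each ad that sits above the fold.
        visible_h = max(0, min(y + h, FOLD_HEIGHT) - max(y, 0))
        ad_area += w * visible_h
    return ad_area / fold_area

def fails_hypothetical_rule(viewport_width, ad_rects, cutoff=0.25):
    return ad_ratio_above_fold(viewport_width, ad_rects) > cutoff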

tedster - msg:4289107 - 4:56 pm on Mar 29, 2011 (gmt 0)

Hello Aaron, and welcome to the forums.

My take on it is that the algorithm is much more complex than a simple "doing X will hurt a page by N%." To share more than they have already done (by describing the training set in this interview) would give away not only the specifics about what is being measured right now (and that will evolve) but also exactly how their processes work.

Google already gave webmasters a lot more detail than they did with other major updates, and more than any other search engine ever has. Also, even though there is a focus on what can hurt a site, the document classifier algorithm is also designed to classify some sites as high quality or mixed quality.

"we actually came up with a classifier to say, okay, IRS or Wikipedia or New York Times is over on this side, and the low-quality sites are over on this side. And you can really see mathematical reasons...
