Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 34 message thread spans 2 pages: [1] 2
Matt Cutts and Hacker News
seodudez




msg:4373845
 10:59 pm on Oct 12, 2011 (gmt 0)

There is a great post going on at Hacker News with Matt Cutts. Someone is complaining about getting banned from Google and Matt is talking about quality and do no evil...

[news.ycombinator.com...]

"It's a fair question, but our quality guidelines have been quite consistent for the last decade or so. Our team takes action without regard to whether a website is an advertiser, partner, or competitor. The long-term loyalty/trust of our users is worth much more than any sort of short-term revenue that a sneaky deal would provide. And Google's culture is such that anyone inside or outside of the company can claim that a particular practice isn't in line with "Don't be evil" and kick off a fair amount of self-scrutiny."

 

walkman




msg:4373873
 2:27 am on Oct 13, 2011 (gmt 0)

And Google's culture is such that anyone inside or outside of the company can claim that a particular practice isn't in line with "Don't be evil" and kick off a fair amount of self-scrutiny.
Say what you want about Google but they do have a sense of humor. Personally, I am going to skip this "self-scrutiny," simply not going to buy it. I know enough of their practices to support [en.wikipedia.org...]
1script




msg:4374006
 2:38 pm on Oct 13, 2011 (gmt 0)

Matt Cutts starts off with a very troubling premise right off the bat:
You can make an infinite number of autogenerated pages on your site
and then proceeds to give examples of various query strings you can (which does not mean you should, or that anyone did) submit to get a new URL/content. But that's just the issue: if Googlebot had not submitted forms, this would not be an issue. These autogenerated URLs do not exist, for all intents and purposes, until they are at least linked to (usually maliciously by a competitor) or Googlebot starts to try every combination of every query string parameter. In other words, Googlebot is creating new URLs for itself as it goes, and then the webmaster gets punished.

He then mentions that another search engine claims they have to sift through 20 billion URLs to find 1 billion non-spam pages. Well, if you did not let your bot create URLs on the fly, you would have far fewer URLs to sift through. Say, 5:1 instead of 20:1?

deadsea




msg:4374033
 4:11 pm on Oct 13, 2011 (gmt 0)

The bigram post by option1138 is quite interesting. He claims that certain pairs of words indicate spam on web pages.

Based on the number of legit emails I get that my email client marks as spam using its Bayesian filter, a big percentage of the web would get marked down as spam if a similar approach were taken by Google.

Lends a lot more credence to the "poison words" theory that I saw a thread here about last week.
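To make the bigram idea concrete, here is a minimal sketch of how word pairs could be pulled from a page and compared against a list of suspect pairs. The pairs in `SUSPECT_BIGRAMS` and the helper names are invented for illustration; nothing in the Hacker News post says which pairs or which scoring any search engine actually uses.

```python
import re
from typing import List, Set, Tuple

# Hypothetical word pairs, made up purely for illustration.
SUSPECT_BIGRAMS: Set[Tuple[str, str]] = {
    ("cheap", "pills"),
    ("buy", "now"),
    ("free", "download"),
}


def bigrams(text: str) -> List[Tuple[str, str]]:
    """Split text into lowercase words and return adjacent word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return list(zip(words, words[1:]))


def suspect_ratio(text: str) -> float:
    """Fraction of the page's bigrams that appear in the suspect list."""
    pairs = bigrams(text)
    if not pairs:
        return 0.0
    hits = sum(1 for pair in pairs if pair in SUSPECT_BIGRAMS)
    return hits / len(pairs)


if __name__ == "__main__":
    print(bigrams("Buy now and save"))
    print(suspect_ratio("Buy now and save"))
```

A fixed list like this would flag plenty of legitimate pages, which is the same false-positive worry raised above about Bayesian email filters.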

g1smd




msg:4374212
 11:37 pm on Oct 13, 2011 (gmt 0)

If requesting any set of random words or characters in the path part of the URL always returns a page of content and 200 OK status then you have infinite URL space. I'd call spam on this too.

A website should have a finite number of URLs that will return 200 OK status and some content. Others will return 404 or 410, and certain formats will 301 redirect to the canonical URL for the content. If anything and everything returns content with 200 OK status, that is not a good signal.
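As a rough sketch of the finite URL space described above, the lookup below answers 200 only for paths the site explicitly knows about, 301 for known non-canonical variants, 410 for pages that were deliberately removed, and 404 for everything else. The paths and the `resolve` helper are hypothetical and not tied to any particular web framework.

```python
from typing import Optional, Tuple

# Hypothetical site data: every path that should exist is listed explicitly.
CANONICAL_PAGES = {"/", "/widgets", "/widgets/blue"}
REDIRECTS = {"/widgets/": "/widgets", "/Widgets": "/widgets"}
REMOVED_PAGES = {"/widgets/discontinued"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (status_code, redirect_target) for a requested path."""
    if path in CANONICAL_PAGES:
        return 200, None             # real content
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # point at the canonical URL
    if path in REMOVED_PAGES:
        return 410, None             # gone on purpose
    return 404, None                 # anything else does not exist


if __name__ == "__main__":
    for p in ["/widgets", "/Widgets", "/widgets/discontinued", "/asdf/qwerty"]:
        print(p, resolve(p))
```

The point of the default branch is that a random string in the path never produces a 200 response with content.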

FranticFish




msg:4374344
 7:43 am on Oct 14, 2011 (gmt 0)

if Googlebot had not submitted forms, this would not be an issue

Not really in this case. The site in question appears to me to have been deliberately set up to try to take advantage of Google's penchant for url discovery, then punt people on to Amazon to make an affiliate buck.

I have to admire the webmaster's front though, in acting as if his site were some sort of quality resource that deserved any attention whatsoever from anyone including Google.

jecasc




msg:4374357
 9:08 am on Oct 14, 2011 (gmt 0)

I do not think the site deserves a total ban. It deserves to be included in the SERPs with exactly one page: the homepage. It would then turn up when someone searches for the service it provides.

However, I guess it turned up in the SERPs when someone was searching for one of the products. When that is the case, the website is not a service but an obstacle for users. So Google should simply have stopped including search result pages from the site.

Sgt_Kickaxe




msg:4374371
 10:22 am on Oct 14, 2011 (gmt 0)

When Google's webspam team takes action on websites in our websearch index, we can pass that information on to the ads group so they can check for violations.


Nice to know that one fallen domino may cause you other problems. As for the site, its ONE major mistake, other than not implementing a proper word filter and 404 system, was not adding mashup content from other sites besides Amazon. Google hates the site as is, obviously, but add items from other services and you get, well, a copy of Google's shopping pages.

and the links/keywords are duplicate content.

I think that was the red flag. I'm strongly believing that Google has a "duplicate content % rating" assigned to all sites, perhaps even several independent ratings (one for link anchor text, one for non-link text, one for sitewide template text, one for hot zone text etc). Affiliate sites often repeat link anchor text and so rate poorly without additional text.

Check your anchor text and NO, just switching the word order will not help anymore.

Hissingsid




msg:4374378
 10:53 am on Oct 14, 2011 (gmt 0)

I think that was the red flag. I'm strongly believing that Google has a "duplicate content % rating" assigned to all sites, perhaps even several independent ratings (one for link anchor text, one for non-link text, one for sitewide template text, one for hot zone text etc). Affiliate sites often repeat link anchor text and so rate poorly without additional text.

Check your anchor text and NO, just switching the word order will not help anymore.


For goodness sake will you please read about Semantic Vectors, it explains all of these things we are seeing with Panda.

seodudez




msg:4374492
 3:22 pm on Oct 14, 2011 (gmt 0)

Hissingsid: I tried to read about semantic vectors, but it went way over my head. Can you give me the dummies version?

walkman




msg:4374495
 3:30 pm on Oct 14, 2011 (gmt 0)

For goodness sake will you please read about Semantic Vectors, it explains all of these things we are seeing with Panda.
Maybe, but not all sites have articles, many barely have any text that can be analyzed. Yet, some are doing much better, others have crashed. So, at best it may describe part of Panda.
Hissingsid




msg:4374500
 3:54 pm on Oct 14, 2011 (gmt 0)

Biswanath Panda is a very clever chap who works for Google. The Panda update was named after him.

His speciality is categorisation using decision trees.

If you couple decision tree categorisation with semantic analysis (the output of which is a vector model of a document) then you have a breakthrough in the storage space required and speed of retrieval of data stored.

It is the application of this as a component of the ranking algorithm that I believe is what Panda is all about.
Do a search on some of the vocabulary that has become very popular with the Google spokespeople, "False Positives" for example, and semantic vectors and you will see papers on "scalable detection of semantic clones" etc etc etc.

It seems to me that there can be no doubt that this is what Panda is about. The problem is, once you come to that conclusion, what do you do about it?
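For anyone who finds "vector model of a document" too abstract, here is a minimal sketch of the general idea: each document becomes a bag of term counts, and two documents are compared by the cosine of the angle between those count vectors. This uses plain term frequencies as a stand-in; it is an assumption for illustration, not a description of what Google or the Panda update actually does, and the function names are made up.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Represent a document as a sparse vector of lowercase term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    page_a = term_vector("blue widget for sale")
    page_b = term_vector("blue widget on sale today")
    print(cosine_similarity(page_a, page_b))
```

Two pages that reuse the same words in a different order get the same vectors and score as identical, which fits the earlier remark that switching the word order will not help.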

walkman




msg:4374511
 4:25 pm on Oct 14, 2011 (gmt 0)

Biswanath Panda is a very clever chap who works for Google. The Panda update was named after him.

:) I never said he wasn't smart and I don't doubt his brain power, at least in math. I can't really classify him as the smartest on earth or #8457, but that's beside the point. We do not know what 'Panda' offered to Google to make Panda happen. He may simply have come up with the 'What is a good page site' questions.

I said that there is a lot, lot more than just text analysis and in fact text is ignored at least in my niche--and even in news (See Huff Post or Business Insider type of sites). Several sites in my niche that gained massively in February Panda got slammed in future Pandas, even though they had a lot better text not found in any other site.

Can Google rank a story as 'great' by sending Googlebot? Maybe, when it comes to grammar, spelling and uniqueness but that's not 'great.' Even with that, I sincerely doubt they can apply that to the entire web, or even subsections.

edit: In fact Google lumps them all in one: low quality, poorly written, shallow, not useful etc etc. No way an algo can tell that by reading a page.

aristotle




msg:4374533
 5:06 pm on Oct 14, 2011 (gmt 0)

Can Google rank a story as 'great' by sending Googlebot? Maybe, when it comes to grammar, spelling and uniqueness



Wouldn't an analysis of the on-page content also be able to evaluate its depth, comprehensiveness, and relevance? Then the algorithm could combine this with statistical information about visitor behavior to evaluate quality.

tedster




msg:4374538
 5:15 pm on Oct 14, 2011 (gmt 0)

In a recent video, Matt Cutts said that Google does not use spelling or grammar as a direct ranking or quality signal. From what I've seen, that is the straight scoop. There is a noticeable correlation, he clarified, but it's not a direct cause-and-effect thing.

-----

With regard to this Hacker News discussion, the site in question isn't a Panda demotion - it's a manual action penalty to keep it out of the search rankings and I've got to agree with that judgment. There's no way I want to land on one of those pages just because there's some kind of keyword match.

wheel




msg:4374539
 5:17 pm on Oct 14, 2011 (gmt 0)

Wouldn't an analysis of the on-page content also be able to evaluate its depth, comprehensiveness, and relevance? Then the algorithm could combine this with statistical information about visitor behavior to evaluate quality.

I suspect part of the problem is that we don't even know what's being measured. If they're looking for depth, or relevance, or comprehensiveness, then webmasters can provide content that matches these attributes, even if we don't know how they're being measured.

For links, we know that relevant and authoritative matter. I don't need an algo to tell me if the site is relevant and authoritative - I can tell by looking. And we know relevant and authoritative both matter as general attributes.

But Panda and page content? Is it depth? Comprehensiveness? Relevance? It seems like we don't even know *what* they're looking for, never mind figuring out how to offer that.

walkman




msg:4374562
 5:56 pm on Oct 14, 2011 (gmt 0)

Wouldn't an analysis of the on-page content also be able to evaluate its depth, comprehensiveness, and relevance? Then the algorithm could combine this with statistical information about visitor behavior to evaluate quality.

Yes and no. 500 more words don't equal a better explanation for example and Google cannot tell what's true or not. Can Google tell that "Obama is the President of Kenya, the country where his father was born" is false as they analyze, sort and rank the gazillions and gazillions of pages? What if you use satire? Or debunk what others said? If a topic is questionable, who is Google to decide what's true?

And then a lot of e-commerce sites with no real sentence structures have been destroyed as well by Panda, when others (can't mention that almost all are top brands) have done extremely well without much more text.

aristotle




msg:4374569
 6:10 pm on Oct 14, 2011 (gmt 0)

walkman, if you read my statements, I didn't say that an analysis of on-page content is the only factor. It needs to be combined with other factors such as data on user behavior. But it obviously has to be one of the major factors.

walkman




msg:4374570
 6:15 pm on Oct 14, 2011 (gmt 0)

aristotle, yes, I buy that. Along with brand power of course

tedster




msg:4374571
 6:19 pm on Oct 14, 2011 (gmt 0)

I noticed this exchange with interest:

lewispb: I think he [Matt Cutts] can probably already see your analytics

Matt_Cutts: Nope, I can't. I can see how often the site shows up in e.g. our search results, but Analytics is a separate property and they don't send data to the search team either.

Robert Charlton




msg:4374620
 8:00 pm on Oct 14, 2011 (gmt 0)

Yes and no. 500 more words don't equal a better explanation for example and Google cannot tell what's true or not. Can Google tell that "Obama is the President of Kenya, the country where his father was born" is false as they analyze...

walkman - It's extremely likely that the phrases extracted from "Obama is the President of Kenya", at the least, would send Google a strong signal that something is amiss with the page.

Yes, Google does use "term vectors", as Hissingsid has been suggesting, along with a considerable degree of semantic parsing and statistical analysis, both onpage and off. I myself can't do the topic justice, nor do I claim to be extremely fluent in the area, but I understand it enough to know it's being used and to understand why, and here and there how to apply that to websites. Some big hints that relate to ecommerce have been dropped in the forums, but there's been so much noise here that it's often a chore to find them.

You might try reading some of Bill Slawski's discussions on phrase-based indexing and on semantic factors... [seobythesea.com...] ...which I find very easy to follow. In the Plex by Steven Levy, a book which you mentioned you hadn't read when you criticized it, presents a superbly clear though very basic introduction to the kind of semantic stuff that Google has been doing for years. I recommend you take a look at it.

Robert Charlton




msg:4374681
 9:42 pm on Oct 14, 2011 (gmt 0)

PS: Also take a look at WebmasterWorld's 2007 discussion of some of the core Phrase Based Indexing patents....

"Phrase Based Indexing and Retrieval" - part of the Google picture?
http://www.webmasterworld.com/google/3247207.htm [webmasterworld.com]

walkman




msg:4374683
 9:45 pm on Oct 14, 2011 (gmt 0)

Regarding the Kenya thing, it was an example. But we've all done searches and seen good results, decent ones, so-so and many times plain horrible ones.

In the Plex by Steven Levy, a book which you mentioned you hadn't read when you criticized it, presents a superbly clear though very basic introduction to the kind of semantic stuff that Google has been doing for years. I recommend you take a look at it.
I actually know/knew that Google does that, and by the time it shows up in a Google-approved book it's too late anyway. But the debate was about something different.

As far as the book, I mentioned that he is not fair and questioned a conclusion that one apparently got from the book (Larry and money).

But you made me look at Amazon reviews:
"Google booster"

"may have lost some journalistic objectivity by his wonderment of the company and their significant accomplishments. I didn't feel he represented the reasonable criticisms of Google's practices"

"Either he is truly enamored with Google or he agreed not to say anything negative. It's almost a PR piece for Google. No organization is flawless, but he paints Google and its founders as angels. "

"As Steven Levy wrote on Quora, this book has approved by Larry Page, Sergey Brin and Eric Schmidt. "

"If Google were a person, this is probably what its autobiography would look like"

"Another one of the book's weaknesses is the lack of critical assessment and analysis of various products, projects, policy decisions, and inevitable failures. The author appears a bit too eager to present Google's version; any criticism remains of the mildest variety. One gets a sense that this book was thoroughly vetted by Google's PR department."

So in short, Robert, I already read Google's PR releases and read other people's blind praise of Google almost every day. Why read it in a book format? I can see live how Google thinks, acts, shapes our lives or whatever.

Hissingsid




msg:4374712
 11:19 pm on Oct 14, 2011 (gmt 0)

So in short, Robert, I already read Google's PR releases and read other people's blind praise of Google almost every day. Why read it in a book format? I can see live how Google thinks, acts, shapes our lives or whatever.


What I take from what Robert is saying is that there is little real information but understanding the background and history helps you to see the threads in what they are doing now.

When you do that much of the stuff that might otherwise be opaque starts to become clear.

walkman




msg:4374732
 12:50 am on Oct 15, 2011 (gmt 0)

What I take from what Robert is saying is that there is little real information but understanding the background and history helps you to see the threads in what they are doing now.

Interesting. Maybe if we learned Larry's master-plan (to the extent we normal people can) we'll be more understanding. I'm trying to see how driving small sites out of business and filling his SE with ads and Google junk is going to change the world, for non-Googlers and Google shareholders.
[wired.com...]
Maybe because ...
Even Googlers, no Luddites themselves, joke that Page “went to the future and came back to tell us about it.”

Next quarter he might add even more ads, he's that visionary ;) [webmasterworld.com...]

Sorry but I am not impressed by the press crap, after 13 years they have exactly one product that they keep milking. They are a one trick pony, ruthless (self-censored) (self-censored).

Hissingsid




msg:4374854
 10:01 am on Oct 15, 2011 (gmt 0)

Sorry but I am not impressed by the press crap


Don't get angry - get even!

I see all this noise like being in a very busy and noisy bar. Occasionally you hear snippets of conversations and tune in on what is being said.

There are things in the specific vocabulary of spokespeople from Google that confirm what Google is working on, what direction it is going in and specifically what Panda is. Since they will not actually tell us, that is the best we can hope for.

CMidd




msg:4375045
 4:35 am on Oct 16, 2011 (gmt 0)

I now have to advise anyone we work with that any income that relies on Google for either lead generation or direct income (AdSense) has to be treated as though it could evaporate at any time and for any reason without any real opportunity given to restore it in a timely fashion.


Powerful statement. This should be part of an SEO/AdSense/AdWords bible.

diberry




msg:4375068
 5:48 am on Oct 16, 2011 (gmt 0)

Can Google tell that "Obama is the President of Kenya, the country where his father was born" is false as they analyze, sort and rank the gazillions and gazillions of pages? What if you use satire? Or debunk what others said? If a topic is questionable, who is Google to decide what's true?


Google has always done a bad job with satire and criticism. While humans can instantly categorize a site as pro this and anti that, none of the algos in the past 6 years have ever caught up with one site I own that analyzes opposing viewpoints. The site has PR5 and sitelinks and other indicators that Google thinks highly of it, and yet they can't even figure out that the relevant keywords to an article analyzing pro and anti views on a topic would be "[topic]".

Strangely, other engines do better at this.

tristanperry




msg:4375106
 10:47 am on Oct 16, 2011 (gmt 0)

@CMidd: It is indeed a powerful statement. It's one that I've been realising is true more and more over the past few months... and - since being hit big time 2 days ago (and replaced by spammers, content scrapers and crap content) - now is something I'll live by. Even if my site recovers (doubt it), I'm changing my long-term business model. I'll never rely on Google again; it's simply not worth it.

tedster




msg:4375145
 4:06 pm on Oct 16, 2011 (gmt 0)

Google has always done a bad job with satire and criticism.

I'm very close to a multi-year sentiment analysis project - and I know that satire, irony and such are massively difficult to analyze accurately without human editorial input. Even those top-of-mind companies that advertise their sentiment analysis to major corporations get it wrong a lot of the time (like 30% or more.) Their technology prefers to report "no sentiment detected" too much of the time.

Matt Cutts mentioned last year that Google News uses sentiment analysis, but as far as I know, no web search engine does. If Bing seems to do better in this area, it may be because their indexing for most sites is a lot less deep, so they miss a lot of red herrings.

I'm going to go out on a limb and say that sentiment analysis and satire/irony detection is just not part of Google's organic ranking - although it may be in play for Reviews, where it's a more critical factor.

I also can't see how automated fact-versus-error assessment could be part of any algorithm today. It would take the equivalent of IBM's Watson, but it would need also to update dynamically. Still, if anyone is in a position to attempt this, it would be Google.

------

I don't see how this area could apply to the Hacker News site under discussion, either. The reasons for their exclusion from the index are much simpler than assessing factuality.
