
Google News Archive Forum

This 260 message thread spans 9 pages: < < 260 ( 1 2 3 4 5 6 7 [8] 9 > >     
Google's Florida Update - a fresh look
We've been around the houses - why not technical difficulties?
superscript




msg:212181
 10:20 pm on Dec 12, 2003 (gmt 0)

For the past four or five weeks, some of the greatest (and leastest) Internet minds (I include myself in the latter) have been trying to figure out what has been going on with Google.

We have collectively lurched between one conspiracy theory and another - got ourselves into a few disagreements - but essentially found ourselves nowhere!

Theories have involved Adwords (does anyone remember the 'dictionary' concept - now past history).

And Froogle...

A commercial filter, an OOP filter, a problem caused by mistaken duplicate content, theories based on the contents of the Directory (which is a mess), doorway pages (my fault mainly!) etc. etc.

Leading to the absurd concept that you might be forced to de-optimise, in order to optimise.

Which is a form of optimisation in itself.

But early on, someone posted a reference to Occam and his razor.

Perhaps - and this might sound too simple! - Google is experiencing difficulties.

Consider this, if Google is experiencing technical difficulties regarding the sheer number of pages to be indexed, then the affected pages will be the ones with many SERPs to sort. And the pages with many SERPs to sort are likely to be commercial ones - because there is so much competition.

So the proposal is this:

There is no commercial filter, there is no Adwords filter - Google is experiencing technical difficulties in a new algo due to the sheer number of pages to be considered in certain areas. On page factors have suffered, and the result is Florida.

You are all welcome to shoot me down in flames - but at least it is a simple solution.


 

Kirby




msg:212391
 5:27 pm on Dec 18, 2003 (gmt 0)

>>If Google thinks the searcher is looking for something specific it will return results without broadmatching.

Definitely true with stemming. It only comes into play when Google thinks the search is specific. For example, a search on "cat" is taken literally, because the word can mean more than felines (#3 is caterpillar.com). However, for "Manx cat" not only is "cats" highlighted, but also "cattery". Google picks up that you are searching about a specific type of domestic feline, and allows stemming.

This still means that content is king. The broader the subject, the more important additional specific content will be.
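Kirby's stemming observation can be sketched as a toy rule: stem expansion fires only when a context word makes the query's topic unambiguous. The word lists here (STEM_VARIANTS, FELINE_CONTEXT) are illustrative guesses for this sketch, not Google's actual data:

```python
# Toy model: stemming is applied only when a context word pins down the topic.
STEM_VARIANTS = {"cat": ["cat", "cats", "cattery"]}

# Context words that (in this sketch) disambiguate "cat" as the feline.
FELINE_CONTEXT = {"manx", "siamese", "persian", "kitten"}

def expand_query(query: str) -> list[str]:
    """Return the terms to match/highlight for a query."""
    words = query.lower().split()
    terms = []
    for w in words:
        if w in STEM_VARIANTS and FELINE_CONTEXT.intersection(words):
            terms.extend(STEM_VARIANTS[w])  # specific query: allow stemming
        else:
            terms.append(w)                 # ambiguous: match literally
    return terms

print(expand_query("cat"))       # -> ['cat']
print(expand_query("manx cat"))  # -> ['manx', 'cat', 'cats', 'cattery']
```

Under this model, the broader "cat" query stays literal while "manx cat" is expanded - matching the highlighting behaviour described above.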

>Perhaps Google has traded in its white hat. If they have, their share of the search market will fall significantly over 2004.

Perhaps we are missing the obvious here. Google already knows that their market share will change in 2004 with Y! dropping Google at some point and the possibility of M$N's new foray into search. Acknowledging this fact, it makes sense for Google to carve out new ground and move forward with new technology.

usavetele




msg:212392
 5:40 pm on Dec 18, 2003 (gmt 0)

Any news of a new update? It's been over a month now!

nileshkurhade




msg:212393
 6:14 pm on Dec 18, 2003 (gmt 0)

Google indexed my entire site today. Maybe a new update.

rfgdxm1




msg:212394
 6:23 pm on Dec 18, 2003 (gmt 0)

>Perhaps we are missing the obvious here. Google already knows that their market share will change in 2004 with Y! dropping Google at some point and the possibility of M$N's new foray into search.

My sites do well pretty much with every search engine. From my logs Yahoo has a very healthy share of the market, and msn.com is quite significant. Unless MSN's new foray into search is at least as good as the current Ink SERPs, I wouldn't expect that to change. Barring something very unexpected, Google's market share WILL have a significant drop. Competition is heating up.

webdude




msg:212395
 2:31 pm on Dec 19, 2003 (gmt 0)

Thought everyone might be interested in this. It is a reply from answers.google. Someone paid $10 to get this reply from Google...

[answers.google.com ]

[edit] make that $100[/edit]

[edited by: webdude at 3:10 pm (utc) on Dec. 19, 2003]

superscript




msg:212396
 2:53 pm on Dec 19, 2003 (gmt 0)

Edit: Link now working.

That's a hell of a lot of answer for $10 - you have to give them credit. Since this is such a sensitive topic, I assume it has been carefully sanctioned by Google; it's well worth downloading and analysing carefully. Nice find!

[edited by: superscript at 3:07 pm (utc) on Dec. 19, 2003]

Kirby




msg:212397
 2:56 pm on Dec 19, 2003 (gmt 0)

Quite a bit of over-generalizing in that piece.

"... fake credibility (as in reciprocal linking schemes) is finally a thing of the past. At least until the scammers figure out a new way to scam."

There are still several sites that crosslink every page and still rank well.

webdude




msg:212398
 3:07 pm on Dec 19, 2003 (gmt 0)

whoops. The price is $100. Sorry for the mistake...

superscript




msg:212399
 3:15 pm on Dec 19, 2003 (gmt 0)

Even for $100, it's still (on the face of it) a comprehensive and detailed piece of research.

[edited by: superscript at 3:45 pm (utc) on Dec. 19, 2003]

Bobby




msg:212400
 3:16 pm on Dec 19, 2003 (gmt 0)

Great work webdude!

There's a lot of good stuff rolled up in that piece.
Too bad there wasn't any direct reference to the new filters Google has implemented.

There are still several sites that crosslink every page and still rank well.

And there are still sites that have duplicate content.
In my sector there is a competitor who has 80 of the first 100 results for "mycity accommodation" and they are ALL the same content (but the backgrounds have been changed to protect the innocent).

There was one interesting point, the one about spam reporting. I can't believe they really have the resources to check out all of them but it does make you think...

DeValle




msg:212401
 3:38 pm on Dec 19, 2003 (gmt 0)

Ontology, semantics, yadda yadda yadda. Much of the speculation parallels "how many angels will fit on the head of a pin?" And that's already been done.

All I know is that, in my brutally competitive group of kw's it's the usual suspects who've risen to the top again - a cosy cartel of people who link between the five or six of themselves and who won't even talk to anybody else about exchanging links. These are the same sites who came to the top after Google's reshuffle of a year ago. Is this cynical manipulation of link exchanging within the scope of Google's vision of How The Web Ought To Be? I think not; it's a form of spamming.

The top SERPs are about links. That little link exchange group is still flavored with sites that have nothing, or very little, to do with the search terms. One is the site of a British transvestite who is apparently only visiting us here on this planet temporarily; others are similarly unrelated sites that have tons of links. The SERP for the search term went like a laser to an internal page that, as a user experience for that search term, is abysmal and useless. Clearly, the page got high placement solely on the basis of how many links pointed to the site.

Other sites ranking above legitimate destinations (but who run afoul of Google's latest vision) are bizarre in their degree of unrelatedness.

It's apparent that Google wants to catch over-optimized pages, and I emphasize over-optimized, but at present Google is broken. The idea that some people have put forth in this forum - that Google is penalizing sites that unfortunately have domain names that include targeted keywords - is I believe right on. Colossally dumb, but right on.

But wait! There's more! Google, in their arrogance of worshipping at the altar of a pure Internet, have destroyed many Christmases for Mom & Pop operations. These victims are people and families, and not just a bunch of statistics fleetingly considered by a little coterie of disinterested computer engineers. "A single death is a tragedy. A million deaths is only a statistic." But when you're an 800 pound gorilla...

darkroom




msg:212402
 3:39 pm on Dec 19, 2003 (gmt 0)

I am seeing quite a lot of changes in the -in, www2 and www3 datacenters. Anyone else see that?

superscript




msg:212403
 3:42 pm on Dec 19, 2003 (gmt 0)

"A single death is a tragedy. A million deaths is only a statistic." But when you're an 800 pound gorilla...

It's gorillicide?

James_Dale




msg:212404
 4:05 pm on Dec 19, 2003 (gmt 0)

Interesting further quotes from Craig Nevill-Manning, Senior Research Scientist at Google:

Froogle's pre-identified a sub-set of the web which is about products. So if you know you're interested in buying a product - you go to Froogle! But, if you're kind of doing research, if you want a review or you want the manufacturers site - you go to Google.

We'll see how this experiment goes, we'll probably modify things in the future...

We determined, for certain kinds of queries, the search results had... Well, many irrelevant results. We've tried to weed those out by changing our algorithm.

And... We don't claim that we're perfect either! You know, we rolled out a change in the search results... We're going to tweak that and change it and get feedback from people and improve it.

This may be a larger change than many changes that have happened in the past. But the bottom line is the engineers that have worked on that, and I know them very well, so I know their motivation, and they're totally focused on making the top ten results as relevant as possible.

(quotes from Mike Grehan's newsletter.)

Hissingsid




msg:212405
 4:06 pm on Dec 19, 2003 (gmt 0)

how many angels will fit on the head of a pin?

Hi,

I did a search for exactly that "how many angels will fit on the head of a pin?" and this is the answer I got.

The quotes are specifically on Adsense but it does not take a massive leap to see how these things could easily be applied to Google Web search.

In October 2002 Applied Semantics Issued a Press Release on Adsense

Here’s an interesting quote from that Press release.
“CIRCA technology, a semantic engine that understands and extracts the key concepts on any web page.”

In May 2002 they issued a press release on some new taxonomies being introduced.
“The company says the four industry-standard taxonomies supplementing the existing Open Directory Project (ODP) and International Press Telecommunications Council (IPTC)”

I found an interesting web page that put two and two together here is a quote from it.

One taxonomy that could be used for the AdSense product is the ODP taxonomy, which is a hierarchy of more than a quarter of a million categories and sub-categories. This taxonomy forms the basis of DMOZ.

By extracting meaning and relationship from pages by means of an ontology and then matching this information into a taxonomy Google AdSense can serve relevant ads for your site even if the ads on a strict (and primitive) keyword basis doesn't match anything on your site.

If you turn that around then using the same approach you can serve the most relevant SERPs for a given search term. Add in an analysis of pages that you link to and that link to you and perhaps we have our answer.

Co-incidence - the Google directory tab results are exactly the same as the web search results.

Co-incidence – results are now returned much quicker. Could this indicate a predetermined categorization?

It would not take a massive brain to decide to add one last category to a Google taxonomy tree, and that is spam. So what is spam? That's up to Google to decide. Applied Semantics had/has an Autocategorizer product that allowed the user to specify hierarchies.
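The categorisation idea Sid describes - classify a page into a taxonomy first, then serve ads or rank results by category rather than by strict keyword overlap - can be sketched like this. The TAXONOMY categories and keyword sets are invented for illustration; CIRCA's actual method is not public:

```python
# Toy taxonomy classifier: map a page to the category whose keyword set
# best overlaps the page's words. Categories and keywords are invented.
TAXONOMY = {
    "Recreation/Pets/Cats": {"cat", "kitten", "cattery", "feline"},
    "Shopping/Gifts": {"gift", "gifts", "present", "wrapping"},
    "Computers/Hardware": {"keyboard", "mouse", "monitor"},
}

def categorize(page_text: str):
    """Pick the category with the largest keyword overlap, or None."""
    words = set(page_text.lower().split())
    best, best_score = None, 0
    for category, keywords in TAXONOMY.items():
        score = len(words & keywords)
        if score > best_score:
            best, best_score = category, score
    return best

print(categorize("buy a cat kitten from our cattery"))  # -> Recreation/Pets/Cats
```

Once every page carries a category label computed offline like this, serving "relevant" results (or ads, or a spam flag) becomes a cheap lookup at query time - which would also fit the faster result times noted above.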

Most of the press releases and papers that you can find on the Applied Semantics Web site via a Google search for "ontology", "semantics", "taxonomy" and similar terms have been pulled from the site, but are available in the Google cache.

Best wishes

Sid

Bobby




msg:212406
 4:09 pm on Dec 19, 2003 (gmt 0)

It's apparent that Google wants to catch over-optimized pages

...and over in the far corner weighing in at 800 pounds and rattling its chains ladies and gentlemen...

Right On DeValle! Right On!

claus




msg:212407
 4:30 pm on Dec 19, 2003 (gmt 0)

merlin30:
The assumption that a search engine must make about any page it finds is that the page contains mostly nonsense and is of little value - until *reliable* evidence suggests otherwise.
For some odd subsets of pages this seems like an okay assumption, but across the whole 4 billion page set, i'd say it was the reverse: The page in question generally has value, you just have to figure out for what purposes that page has said value.

>> Google doesn't yet know that a Keyboard Gift is a type of Gift.
>> So it doesn't highlight Gifts and Gifts. Why doesn't it know?

Now, that's an interesting one, and it does shed some light on what's really happening. The word "gift" is not just a gift, you see. Here are five queries with the last two in Danish, none are so specific that they can harm or benefit any members, so i think they're okay to post:

1) christmas gift
- identifies topic of "gifts", stemming or broad match occurs.

2) birthday gift
- identifies topic of "gifts", stemming or broad match occurs.

3) keyboard gift
- does not identify topic of "gifts", only singular "gift" is matched/highlighted.

4) blev gift (Danish for "were married" as in "they were married")
- does not identify topic of "gifts", only singular "gift" is matched/highlighted.

5) rotte gift (Danish for "rat poison")
- does not identify topic of "gifts", only singular "gift" is matched/highlighted.

So, clearly there is some kind of ruleset that decides that if "gift" is used nearby "christmas" or "birthday", then it's a search on the topic of gifts, and both the singular and plural versions are matched. If "keyboard" was a common occasion for gift-giving the stemming would occur here too, but it isn't so it doesn't.

As a lot of words have more than one meaning (except for nonsense-words) it does not make sense to focus exclusively on one particular sense of the word, unless you are confident that this sense of the word is the intended one. For "gift" it seems that "christmas" and "birthday" are two such helper words that make the sense (or topic) of the word "gift" apparent - if no such helper words are found, the query is ambiguous.
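claus's helper-word observation can be written down as a minimal ruleset. The GIFT_HELPERS list here is a guess based only on the five queries above, not on any knowledge of Google's internals:

```python
# Minimal ruleset for claus's five queries: the plural/stem match for "gift"
# fires only when a "helper word" pins down the gift-giving sense.
GIFT_HELPERS = {"christmas", "birthday"}  # occasions that disambiguate "gift"

def highlighted_terms(query: str) -> set[str]:
    """Return the set of terms Google would match/highlight (toy model)."""
    words = query.lower().split()
    terms = set(words)
    if "gift" in words and GIFT_HELPERS.intersection(words):
        terms.update({"gift", "gifts"})  # topic identified: stem to plural too
    return terms

assert "gifts" in highlighted_terms("christmas gift")
assert "gifts" in highlighted_terms("birthday gift")
assert "gifts" not in highlighted_terms("keyboard gift")
assert "gifts" not in highlighted_terms("blev gift")   # Danish: "were married"
assert "gifts" not in highlighted_terms("rotte gift")  # Danish: "rat poison"
```

All five of the observed queries come out as described - including the two Danish ones, where no helper word appears and "gift" stays literal.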

/claus


Added: yup, "married" and "poison" really are the same word in Danish, don't say Danes don't have a sense of humor (humour, even)
Edit: replaced "most words have more than one meaning" with "a lot of words have more than one meaning" as i don't even know all words, much less their meaning.

[edited by: claus at 5:58 pm (utc) on Dec. 19, 2003]

Bobby




msg:212408
 4:44 pm on Dec 19, 2003 (gmt 0)

Precisely Claus, and helper words help Google (possibly through applied semantics) determine whether or not to apply the OOF (over optimization filter) to a search phrase.

Hissingsid




msg:212409
 5:47 pm on Dec 19, 2003 (gmt 0)

Precisely Claus, and helper words help Google (possibly through applied semantics) determine whether or not to apply the OOF (over optimization filter) to a search phrase.

What if your pages have been pre-categorised for certain search terms? For "Christmas gift" your page might be assessed as a spam (OO) site, but for "keyboard gift" it isn't. If I was a Google exec that would make me feel good - "So you say that even spam sites are treated fairly! Great, let's roll it out!" I can hear him say.

We know that they have a list of Adsense/Adwords terms, and in $ and basic-reality terms those are probably the ones that are categorised, and probably the ones that are subject to OO and spam checks.

Pre-categorising pages offline makes a lot of sense, as it makes searching really quick, reduces processing overhead, etc.

Best wishes

Sid

merlin30




msg:212410
 6:20 pm on Dec 19, 2003 (gmt 0)

Claus,

I didn't mean to imply that most of the indexed pages are valueless - only that it turns out AFTER ANALYSIS that the pages have value. My point is that the analysis should produce evidence that the page has value - and that evidence has to be more than self-certification.

Bobby




msg:212411
 6:34 pm on Dec 19, 2003 (gmt 0)

What if your pages have been pre categorised for certain search terms

My take is that the filter is applied to searches in a dictionary based on whatever Google's engineers thought would work best.

For example, let's say that the words "christmas gift" are in that dictionary. A search including that phrase or "term" if you like would be subject to the filter.

This does not mean that your site will not show up for that particular search phrase, just that it will be scrutinized.

Now MY site may use that phrase equally to another site but I may get filtered out. Why? Ah...that is the $64,000 question.

Perhaps page rank plays a role, and backlinks which have that phrase give you "bonus points" so you don't get filtered, whereas another site with the same PR but NOT HAVING backlinks with that phrase would be filtered.

You can build all sorts of algorithms to determine relevance to a particular phrase or set of keywords, but in the end there is a threshold point at which you either make the team or you don't.
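Bobby's dictionary-plus-threshold idea can be sketched as a toy filter: a search is only scrutinised if it contains a flagged phrase, and backlinks whose anchor text contains the phrase earn "bonus points" against the threshold. The phrase list, weights and threshold are all invented for illustration:

```python
# Toy dictionary-based filter: flagged phrases trigger scrutiny, and
# matching backlink anchors offset the filter. All numbers are invented.
FILTERED_PHRASES = {"christmas gift"}
THRESHOLD = 5.0

def survives_filter(query: str, pagerank: float, matching_anchors: int) -> bool:
    """Return True if the page stays in the SERPs for this query."""
    if not any(p in query.lower() for p in FILTERED_PHRASES):
        return True                      # query not in the dictionary: no filter
    score = pagerank + matching_anchors  # anchors with the phrase act as bonus
    return score >= THRESHOLD

print(survives_filter("cheap christmas gift", pagerank=4.0, matching_anchors=0))  # False
print(survives_filter("cheap christmas gift", pagerank=4.0, matching_anchors=3))  # True
```

This reproduces the puzzle Bobby raises: two sites using a phrase equally can land on opposite sides of the threshold purely because of off-page factors like anchor text.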

Hissingsid




msg:212412
 6:50 pm on Dec 19, 2003 (gmt 0)

You can build all sorts of algorithms to determine relevance to a particular phrase or set of keywords, but in the end there is a threshold point at which you either make the team or you don't.

Hi Bobby,

I guess it could work like this:

Pages are given a score for that search term from 1 to 10, with 10 being the highest. If your page would score over 10, it must be spam, OO or whatever, and so is flagged to be given a sliding-scale "penalty" in the final ranking process. This is balanced against other ranking factors, so if your page is a bit over-optimised or borderline spammy but has high PR, many backlinks etc., and the gist of the page is that it is on the subject being searched, then it might still appear somewhere in the SERPs, but not rank as high as previously.

If its negatives are not outweighed by its positives when compared to the 1000 returned in the SERPs, then it doesn't appear anywhere in the results.
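Sid's scoring scheme can be sketched as a combining function: an optimisation score on a 1-10 scale, where anything over the cap is flagged and given a sliding penalty that other ranking factors (PR, backlinks) can partially offset. All the weights and numbers below are invented for illustration:

```python
# Toy ranking combiner: over-optimisation gets a sliding penalty that
# strong off-page factors can offset. Weights are invented for this sketch.
def final_rank_score(opt_score: float, pagerank: float, backlinks: int) -> float:
    """Combine ranking factors; scores over 10 incur a sliding penalty."""
    base = pagerank * 10 + min(backlinks, 1000) / 100
    if opt_score > 10:                  # flagged as over-optimised/spam
        penalty = (opt_score - 10) * 5  # sliding scale: worse the further over
        return max(base - penalty, 0.0)
    return base + opt_score             # within bounds: optimisation helps

# A borderline-spammy page with high PR still appears, just lower:
print(final_rank_score(opt_score=12, pagerank=7, backlinks=500))  # 65.0
print(final_rank_score(opt_score=8,  pagerank=7, backlinks=500))  # 83.0
```

Note the behaviour Sid describes falls out directly: the over-optimised page is demoted rather than removed, and only drops out entirely when the penalty swamps its positives.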

This categorisation could be just one pass of a new algo that uses technology bought with Applied Semantics at a number of different levels.

Best wishes

Sid

DeValle




msg:212413
 7:08 pm on Dec 19, 2003 (gmt 0)

Hissingsid,
"how many angels will fit on the head of a pin?"

Now that I've vented and have become lucid, I moused through some prior posts - those starting around p11-12. And I secured a copy of the White Paper. The Applied Semantics theory fits all the SERP facts that I've observed.

I've seen that all the sites on page one for my brutally competitive search term are very light on graphics (almost nonexistent), have reams of copy (much of it unnervingly, to my prior-Florida mindset, off center to the search terms so as to fit the semantics/ontology criteria), and tons of incoming links.

Even though my site actually has more content-- and much of it original content and not just fluff artificially constructed just to carry a bucketful of keywords-- apparently Google zapped me as spam because of being over-SEO'd for two particular keyword phrases. Guilty.

The semantics theory also explains how several sites that are unrelated (under pre-Florida criteria) to my search terms rose to the top.

Still, as one earlier post said, Google have to realize that people often search the Web for products and don't want returns for arcane academic papers to appear at the top in resulting SERPs.

BTW here's an interesting take from Fortune magazine arguing that Google need adult supervision:
[fortune.com...]

superscript




msg:212414
 7:16 pm on Dec 19, 2003 (gmt 0)

how many angels will fit on the head of a pin?

O.K. - an analogy - but a poor one, and not very helpful. Neither this thread nor the algo has much to do with medieval religious nonsense.

So let's get back on track, and try to work this out. Current opinion based upon what is observed please.

caveman




msg:212415
 7:38 pm on Dec 19, 2003 (gmt 0)

G's SERPS: Commercial vs. Informational Spectrum:

Commercial Only------------------------------------------------Information Only

Pre-Florida:
-------------------------X----------------------------------------------------

Post-Florida:
------------------------------------------------------------------------X-----

New WWW2:
----------------------------------------------------------------------X-------

Should Be:
---------------------------------------------X--------------------------------

But what do I know anyway... :-)

marin




msg:212416
 8:15 pm on Dec 19, 2003 (gmt 0)

"Due to the similarities between spam and non-spam our original semantic analyzers are not an effective method to classify spam content. Since spam and non-spam documents are so similar"

citation from: Stanford report [webmasterworld.com]

IMHO this Quek report [cs.cmu.edu] better explains the Florida update.

"A first step to this datamining operation is to be able to classify web pages according to some predetermined ontology"

- this could explain the commercial filters

dotme




msg:212417
 8:42 pm on Dec 19, 2003 (gmt 0)

A sort of on-topic, off-topic post... A user on Google's newsgroup posted this link today as a tongue-in-cheek explanation of Florida. I have no idea how old the page is, but it does make you smile - so I thought others here might get a grin from it too.

[google.com...]

John_Caius




msg:212418
 8:55 pm on Dec 19, 2003 (gmt 0)

April Fools 2002, oft quoted...

Sunset_Jim




msg:212419
 9:21 pm on Dec 19, 2003 (gmt 0)

Does anyone know if the Applied Semantics white paper "CIRCA Technology: Applying Meaning to Information Management" is still available anywhere on the Web? It appears to have been removed from the Applied Semantics Web site and the Google cache.

Bobby




msg:212420
 12:39 am on Dec 20, 2003 (gmt 0)

It's still in the cache, Jim:

[216.239.41.104...]

Save it locally before it disappears for good - or for Google's good...

Essex_boy




msg:212421
 1:28 pm on Dec 20, 2003 (gmt 0)

So to sum up the threads so far, Google is now judging pages not by the number of keywords appearing on them, but by natural (human) speech patterns.

Is that right? It should be interesting. If it works, pages should have a wider range of subjects within them; they will not just feature green widgets or blue ones, but also the process by which the raw materials are extracted, resulting in a green widget.

Hmmmmm interesting indeed.

One thing I have noticed is the low ratio of keywords to filler text on top-ranking sites - somewhere between 5% and 10% - which tends to suggest that I am correct about speech patterns.
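The 5-10% ratio above is just keyword density, which is easy to measure on your own pages. A minimal sketch (the sample copy and keyword set below are made up):

```python
# Minimal keyword-density check: fraction of words that are target keywords.
def keyword_density(text: str, keywords: set) -> float:
    """Fraction of words in `text` that appear in `keywords`."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in keywords)
    return hits / len(words)

copy = ("our green widgets are made from raw materials extracted in wales "
        "and each widget is finished by hand before shipping")
print(round(keyword_density(copy, {"widget", "widgets"}), 2))  # 2 of 20 words -> 0.1
```

A page written in natural speech patterns, as described above, would land in that 5-10% band, while a keyword-stuffed page would score far higher.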

Being the brave Englishman that I am, I have just edited a page to prove my point, so if it works I'll brag about it on WW forever more or... you'll all hear my screams.
