|Update Brandy Part 3|
Continued From: [webmasterworld.com...]
"Any clue as to the possible role greater reliance on semantics is playing in your never ending quest for more relevant results?"
I'd say that's inevitable over time. The goal of a good search engine should be both to understand what a document is really about, and to understand (from a very short query) what a user really wants. And then match those things as well as possible. :) Better semantic understanding helps with both those prerequisites and makes the matching easier.
So a good example is stemming. Stemming is basically SEO-neutral, because spammers can create doorway pages with word variants almost as easily as they can optimize for a single phrase (maybe it's a bit harder to fake realistic doorways now, come to think of it). But webmasters who never think about search engines don't bother to include word variants--they just write whatever natural text they would normally write. Stemming allows us to pull in more good documents that are near-matches. The example I like is [cert advisory]. We can give more weight to www.cert.org/advisories/ because the page has both "advisory" and "advisories" on the page, and "advisories" in the url. Standard stemming isn't necessarily a win for quality, so we took a while and found a way to do it better.
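The stemming idea above can be sketched in a few lines. This is purely an illustrative toy, not Google's method: a naive suffix-stripping stemmer (real systems use something closer to the Porter stemmer) plus a score that counts query terms whose stem appears among the page's stems.

```python
# Minimal sketch of matching word variants via stemming.
# The stemmer below is deliberately naive and only handles a few suffixes.
def naive_stem(word: str) -> str:
    """Reduce a word to a crude stem by stripping common English suffixes."""
    word = word.lower()
    if word.endswith("ies"):
        return word[:-3] + "y"   # advisories -> advisory
    if word.endswith("es"):
        return word[:-2]
    if word.endswith("s"):
        return word[:-1]
    return word

def variant_match_score(query_terms, page_terms):
    """Count query terms whose stem appears among the page's stems."""
    page_stems = {naive_stem(t) for t in page_terms}
    return sum(1 for t in query_terms if naive_stem(t) in page_stems)

# A page containing "advisories" still matches the query [cert advisory]
# on every term once both sides are stemmed.
score = variant_match_score(
    ["cert", "advisory"],
    ["cert", "advisories", "security", "alerts"],
)
```

With stemming, the page matches both query terms even though it never contains the literal word "advisory"; without it, only "cert" would match.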
So yes, I think semantics and document/query understanding will be more important in the future. pavlin, I hope that partly answers the second of the two questions that you posted way up near the start of this thread. If not, please ask it again in case I didn't understand it correctly the first time. :)
I think if anyone has problems with these spammy sites, I'm sure GG would like to get a spam report and investigate them.
From what I have noticed, I personally believe that the aim of these updates is to get rid of those crappy spam sites while trying to keep the high-quality sites at the top of the SERPs.
[edited by: penfold25 at 6:18 am (utc) on Feb. 16, 2004]
allanp73, you say, amongst other things, "Commercial sites for commercial terms makes sense. And hopefully the mistake of Florida and Austin is realize."
While I am not exactly sure what you mean, I assume you mean that, since mid-November, non-commercial sites were appearing more than they "should" have in the Google SERPs for search terms that are used by commercial sites, and that because commercial sites choose to use those terms, they should automatically be conferred some sort of priority in the SERPs.
Comment: My observations about what had been happening with Google's money-search-term SERPs run completely opposite to yours. I posted about that issue several days ago, and I pointed out that information sites about money were "filtered", leaving just the commercial ones, for the most popular money search terms. In any case, isn't it a bit over the top to argue for a preference for commercial over non-commercial use of certain terms in SERPs that no one pays for anyway? If you want that sort of preference you can go to pay-per-click engines, or use AdWords.
I can't comment on the broader balance in the serps due to the Brandy changes, however I can do so in my area of activity: the money search terms serps in 64 appear to be much better balanced between commercial and info sites than before Brandy. However the spammers are still there, as I have advised GoogleGuy in a "BrandyUpdate" email. (Keeping fingers crossed on that one!)
|Kirby/Allan- I didn't take GG's comments to mean that all or any of the sites missing from the big-city searches were spammy, but rather that the methods used to fight spam may have been more severe than those in less competitive (spammy) areas. If you have to throw out more bathwater, naturally more babies will go with it. I think that's why sending in examples of poor results is so important. |
Trumble- If you have lost 70% of your Google referrals already, that may not be a problem since the "change" hasn't propagated to enough datacentres to have that sort of effect yet. Have you tried your queries here- http://188.8.131.52
GoogleGuy- Are we able to use the "What are some other things to look out for?" section from that page on our websites?
Perhaps the black=white crew might spend some time productively and compare "good" city results to "bad" city results. Do the good city topics have well-edited yahoo/dmoz directory sections, or very authoritative local directories? Do the bad city topics have weaker yahoo/dmoz directory sections for the topic? If you don't right this minute already know the answer to that question then you are just blasting away in the dark. Authority is to some degree a part of the changes we are seeing, so one thing that would make sense to do is to study the relative accuracy of the perceived authority sites in each niche.
|I noticed that on 64 that major city areas were showing pre-florida style rankings; however, smaller city or town areas were not. |
I've noticed this too. It's helpful to learn it's not just me. Let's hope they fix it soon. Nice to see what's on 64 as far as it goes.
|Do your results coincide with leading cities being hit the hardest? Put another way, are you fine with 80% of your cities, but wiped out in the majors? |
This is the way the pattern started for me, but I think allan may be right about the sequence in which they're fixing it.
I've felt for years, though, that Google has weighted CityName in inbound link text much more than it should, so CityName Widget searches have often been skewed.
On CityName Widget searches, high PR sites of entities that contain CityName as a part of their name and also contain the word "widget" somewhere on their home page will often rank ahead of solid widget sites in local areas.
It could be that the multiplying factors/filters in the recent updates were also sensitive to CityName anchor text in a way that compounded the problem on big-city searches for less-than-major sites.
It may (also) be that widget inbounds might tend to contain fewer CityName keywords than widget keywords.
I'm also seeing from (just a few) test searches in areas I monitor that Google 64 is particularly sensitive to word order, so searching for cityname widgets gives surprisingly different results than widgets cityname. I haven't compared this to Florida and Austin, though.
One of my longtime test searches to gauge where Google is at in CityName search is san francisco public relations. (Mods... I hope it's OK to cite this specific search. It's one we've talked about for years). Take a look at the results on 64, and compare them with AllTheWeb (both with and without the phrase quote rewriting), and you'll see how much better ATW handles this particular search than Google does. (At least the results are a lot prettier and more apparently what the search is looking for). At the same time, Google is notably better than ATW in some other searches... for the same reasons, I think, that it suffers on this one.
GOOGLE AS AN ENCYCLOPAEDIA?
GoogleGuy wrote ...
<If I want to buy a diamond for someone, I might go on the web and just search for a place to buy a diamond. But a typical user is also going to want to know about their purchase. Things like color, carats, clarity, and so on that people want to find out about. I probably would want to know about the different organizations that certify diamonds, along with some believable opinions about the organizations themselves and their value. >
<This is all just my personal take of course, but I'd recommend building the sort of resource site that people can use to read and research, the sort of site that people bookmark and return to.>
Does this mean that Google's vision of the future Internet is as a massive online encyclopaedia with little commercial content?
I mean if I just want to buy something and I already know about it I may not want all this information. For example if I want to buy a car I probably don't want to read about stuff like the inner workings of an internal combustion engine. If I did I would search for "internal combustion engine", (which I did as a test and got some great results.)
Conversely, I also tried searching for "car" and 6 of the top ten results were car rental sites, which contradicts GoogleGuy's statement. I mean, I did not even add the words hire or rental! This must be latent semantics in its broadest sense :-)
My own personal problem, as a one-man consultancy, is that I have an authority, original-content site that is just about 100% informational, but it has been completely dropped for reasons unknown to me. All my communications to G about this have been ignored so far. I was effectively providing the kind of content Google was seeking before all this started, and they pulled the plug on me? A bit of work to be done yet, methinks.
To conclude, I have no problem either way, shopping or encyclopaedia. I would be very happy for Google to go down the encyclopaedia route. Just make a statement, let us know clearly that this is the intention and that Google is no longer for people who are shopping, and the SEOs and shoppers can move somewhere else.
I think the key here is "Information".
I personally like FreshBot and put "News" and "Information" on my pages.....
I have one site using two domains: one domain is for international use and its language is English, and the other domain is the same site in Dutch. I don't know if this is considered spam? The languages are different, and some products are for Dutch use only.
I also have another question. On the site mentioned before, I sell, for example, widgets. In this thread I read that Google is becoming more and more like an encyclopedia. If you are searching for artistA widgets, you are looking for a commercial product. If the results are non-commercial, those results are illegal, because you have to pay Buma/Stemra if you offer those widgets.
I also saw that my competitor is using hidden words and so on, and he uses 8-10 domain names all containing almost the same site. He interlinks between his sites (a link farm) and is in almost the #1 position for every search term. I don't think this is fair. I'm trying to make a site that offers products, and I don't use these techniques. My site is very clean and yet almost at the bottom of the page, while the spammy site is in the #1 position. I thought Google recognized this kind of spam; it is very clear that he uses it, but he is still at #1. I reported it as spam a while ago, but still nothing has changed. I know that G receives a lot of spam reports. Are those spam reports handled manually or not? If so, maybe my report hasn't been looked at yet. If not, the bot still doesn't recognize it.
|Does this mean that Google's vision of the future Internet is as a massive online encyclopaedia with little commercial content? |
I don't see why information and commercial content should be incompatible - just so long as users can find the one they're looking for. I still believe that most users start by looking for content and then go on to compare prices.
It's the old marketing adage: People don't buy products, they buy solutions. Say my problem is to keep my girlfriend happy on Valentine's day. One solution would be an "inexpensive" artificial diamond that looks as good as the more expensive ones and is guaranteed not to fall apart for three months ;-) So give me content to convince me she'll love it then a button to buy.
As far as an online encyclopaedia goes: If I was a diamond retailer I would be extremely happy for my advert to be printed alongside the "Diamond" entry in the Encyclopaedia Britannica.
|I mean if I just want to buy something and I already know about it I may not want all this information. For example if I want to buy a car I probably don't want to read about stuff like the inner workings of an internal combustion engine. If I did I would search for "internal combustion engine", (which I did as a test and got some great results.) |
Yes, but it's all about what is relevant to your search.
Google will return your site so you can buy a car, but it's going to give you the best site that has the most information regarding the car or buying a car.
It does take some time to get your head around Google's thinking, but at the end of the day Google is trying to provide you with search results that don't just contain the keywords you typed, but pages that are relevant to the keywords entered.
property is relevant to apartments
Automobile is relevant to Cars
Sindy is relevant to Barbie
ooops sorry about the last one got carried away :)
Anyway, you catch my drift. Google also returns plural and non-plural words. They know which words are the most popular and cross-reference the keywords entered against other words that are also popular and related.
The sites that rank high will have a few good related links, outbound and inbound, plenty of unique content and, above all, no spammy techniques.
P.S. Still waiting for 64.xx.xx to come through here in Europe.
Valeyard wrote ...
<As far as an online encyclopaedia goes: If I was a diamond retailer I would be extremely happy for my advert to be printed alongside the "Diamond" entry in the Encyclopaedia Britannica. >
You may be extremely happy with this, but do you really think it would do you much good? I would think that the Encyclopaedia Britannica gets about as much traffic as Magellan.
But then again, Encyclopaedia Britannica (I spelt that again without a dictionary) KNOWS that it is an encyclopaedia, and as such its job is not to sell stuff.
>Does this mean that Google's vision of the future Internet is as a massive online encyclopaedia with little commercial content?
Or maybe a search engine where the article on "what to look for when buying a widget" comes up before 30 identical pages that list nothing but widget product specs...
"This is all just my personal take of course, but I'd recommend building the sort of resource site that people can use to read and research, the sort of site that people bookmark and return to."
Can the toolbar tell if your site is bookmarked, or if you've returned via a bookmark? Might give weight to Brett's ideas on toolbar usage and SERPs.
From the UK, google.com now showing 216.**** results.
google.ca now showing something like 64.**** results.
European Googles still showing austin.
|"There are no rules at general right now"|
I meant guidelines, not rules.
>I bet the two keyword searches are more relevent to buying or renting.
Want to bet? I happen to do a lot of stuff in the car hire area and can show you loads of destinations where the top 10 sites have nothing to do with renting or buying!
"Does this mean that Google's vision of the future Internet is as a massive online encyclopaedia with little commercial content? "
I don't really think so. That would be DMOZ, not Google. As far as we've seen on Google Romania (which accidentally has some reversed links, which sort of ruins Google's credibility in our country), they did not update the SERPs as previously done each Friday. Nor did the PR change, as done each Wednesday. And as far as we can see, Google only displays results on the first three SERP pages; if you quickly browse past the first three, you'll see Google doing another search for your term. Results on Google are very easy to manipulate when backed up by a traffic legion. And what's with that error we're getting? (E.g., link a web page to a web site that has PR 6 and is located on the same web server you're on; sooner or later, you'll get its PR on your web page.) Furthermore, how come Google gives carte blanche to randomly generated forum web pages? For instance, we freely receive a PR 1 on each randomly generated web page, while webmasterworld gets 3. Is that some sort of "trust vote"?
You mean that you don't see commercial results in DMOZ?
|It does not explain what happened to my site, which contains the keyword in the URL, in the site's name, and in all of the backward links. |
Well, Pavlin, maybe you have gone too far by trading links with the same anchor text every time?
In the area my business is in, the 64 results are very, very good, and in every search I needed to do on 64, even outside my business, I found relevant information in the top 10 results.
So for me the 64 index is absolutely great!
I went out for a very nice meal last night, started late this morning and find that GoogleGuy has confirmed all the stuff we have been belly aching about for three months.
SteveB has it exactly right IMHO re "quality signals". But what does that mean? Well, if you boil it down to its pure essence, it means that Google can understand what the page, and to some extent the site, is about even if you blank out the term searched for. Just try it on your pages. Print out the source and take a thick felt tip. Score out all of the HTML, then score out all of the words in your top term. Do you still know what it's about? How does it compare with the top three in the SERPs for that term when you do the same to their pages?
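The felt-tip exercise above can be roughly automated. A hedged sketch with an invented sample page: strip the markup, blank out the target phrase, and read what text is left.

```python
import re

# Rough automation of the "felt tip" test: score out the HTML, then
# score out every word of the target phrase, and inspect the remainder.
def blank_out(html: str, phrase: str) -> str:
    text = re.sub(r"<[^>]+>", " ", html)                  # remove markup
    for word in phrase.lower().split():                    # blank the term
        text = re.sub(word, "_" * len(word), text, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()

# Invented sample page, purely for illustration.
page = "<h1>Widget Hire</h1><p>Rent quality widgets for any occasion.</p>"
remaining = blank_out(page, "widget")
# If the leftover words still say what the page is about, the page
# carries quality signals beyond the bare keyword.
```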
City searches are particularly difficult here, because very often there are no synonyms or stems for the city name. You need to look for what the top sites have as triggers. Build those things into your page, and have links to pages on those terms using that term in the anchor text. Google can't assess quality subjectively, even though quality is a subjective measure; it therefore measures objective things that approximate a subjective assessment.
Everyone here who is interested in this stuff should go and read the thread started by Marin about Latent Semantic Indexing. Read the paper that he cites and try and find the white paper on CIRCA. The penny will drop.
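For the curious, the core trick of Latent Semantic Indexing can be shown with a toy example (nothing here reflects Google's actual implementation; the vocabulary and counts are invented): factor a term-document count matrix with a truncated SVD, and terms that share document contexts end up with nearly parallel latent vectors, while unrelated terms stay nearly orthogonal.

```python
import numpy as np

# Toy LSI: rows are terms, columns are documents, entries are raw counts.
terms = ["car", "automobile", "engine", "diamond", "carat"]
A = np.array([
    [2, 0, 0, 0],   # "car" appears only in doc 0
    [0, 2, 0, 0],   # "automobile" appears only in doc 1
    [1, 1, 0, 0],   # "engine" bridges docs 0 and 1
    [0, 0, 3, 2],   # "diamond"
    [0, 0, 1, 2],   # "carat"
], dtype=float)

# Truncated SVD: keep the top-k latent dimensions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # one row per term in the latent space

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# "car" and "automobile" never co-occur in any document, yet the shared
# context word "engine" pulls their latent vectors together, while
# "car" and "diamond" stay orthogonal.
sim_syn = cosine(term_vecs[0], term_vecs[1])    # car vs automobile
sim_diff = cosine(term_vecs[0], term_vecs[3])   # car vs diamond
```

This is the sense in which an LSI-style engine can associate a page about "automobile" with a query for "car" even when the literal word never appears.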
I'm certain that Steveb has not implemented this in a contrived way; his site is just so full of large pages of rich language around his subject. He has achieved high rankings by doing what comes naturally to him. For those of us who need to make a change to break old habits and give the Google algo what it is looking for, there are ways to do so. It's metaphorically like following a diet: you just need to learn the basics of what to do and stick to it.
If you want to find what Google has in its ontology (if you don't know what one is, do this search: define:ontology), then do a search like ~widgets -widgets and note the words that are bold in the results (if you have prefs set to 100 you can quickly scan them). Then feed these words back in to create a map of associated words. Search for the term and look at what the top three pages use in terms of associated terms, and where they use them. Now use this new vocabulary you have learned to broaden the language in your pages and in your site. Pretty soon we'll all be doing what Steveb does naturally.
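The map-of-associated-words exercise above boils down to finding which words keep company with your term. A rough sketch of the same idea over a handful of page texts, just counting co-occurrence (the sample pages are made up):

```python
import re
from collections import Counter

# Count which words most often co-occur with a seed term across a set
# of page texts -- a crude, by-hand version of mining related vocabulary.
def associated_terms(seed, pages, top_n=5):
    """Return the words that most often co-occur with `seed`."""
    counts = Counter()
    for text in pages:
        words = re.findall(r"[a-z]+", text.lower())
        if seed in words:                       # only pages that rank for the seed
            counts.update(w for w in words if w != seed)
    return [w for w, _ in counts.most_common(top_n)]

# Invented sample pages, purely for illustration.
pages = [
    "widget specs and widget reviews with prices",
    "compare widget prices and reviews before buying",
    "gadgets unrelated to anything here",
]
related = associated_terms("widget", pages)
```

In practice you would feed in the text of the top-ranking pages for your term and filter out stopwords; the surviving high-frequency words are candidates for broadening your page's vocabulary.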
PS The roast partridge was excellent.
I can't see even slight changes to the 64 SERPs. Anyone? Any flux over the last few days?
Hissingsid, you are absolutely right. But this set of rules can help only in some industries/themes.
I guess you have read the LSI paper, and it's clear that this way of handling sites works only when the search engine has a set of semantically connected words. I still think it's a dangerous AI game.
Anyway, now I'm going to make my sites "preferred": get rid of some of the content and add some booble link pages. I have a directory page with no content, just links, that is performing great, so for me that's the way to go.
Thanks GoogleGuy for being so forthcoming. You know we watch your words as carefully as Alan Greenspan's ;)
I just have one question which nobody has ever adequately answered for me. It seems to be generally accepted practice among SEOs to have a links page and perform link swaps. These are necessary to rank well in competitive areas especially for commercial sites that don't receive many natural links, and where everybody who is anybody has a (user indifferent) links page. Probably on topic - but about as targeted as a double barrel shotgun.
Of course the whole concept is ridiculous, and users would never think of reading most of the links pages on these sites. The Google Guidelines even prohibit artificial linking to deceive the search engine algorithm.
On the other hand. For many competitive areas, no links page means no good ranking, and so far Google has tolerated sites which have these pages. So what is your take on this activity?
There is another important question: what happens to the pages that SHOULD NOT be handled by the new semantic algo? I think this is the key to MIA sites and the reason for all of the OOP rumours.
So if you have a page that is non-English but uses an English word as a keyword in the URL and the site's name, or is otherwise closely tied to a topic, with LSI you are in trouble.
It seems the new algo really is based on those dictionaries of close words, and G expects that if your site is dedicated to topic "kw1", it has to say something about "kw2", "kw3", "kw4" and so on. If not, the algo assumes it is spam.
So if you have done good optimisation for kw1 but do not use the rest of the kw's, you end up "penalised".
The problem is that G is using this algo everywhere, even when it knows the pages are non-English.
I guess that's the problem with my MIA page: it is in a non-English language but uses an English word in its title and as its main kw. When G sees this kw (kw1), it expects to see the other kw's from its dictionary, and when they do not show up, it thinks the site is spam. The truth is that the rest of the kw's are there (I did some "~kw1" testing and know what my synonyms are), but they are written in another language and even another alphabet (cyr). (It would be a pain for the users if I went and used all the English words.)
I guess it's the same with the sites that are so tightly focused on a subject that they do not include the other words. And that's why there are so many portal sites on top: their directory listings contain links with descriptions that use almost every kw that G expects to see.
So the question I asked until now was wrong. The real question is what happens to the sites that the algo fails to understand, and the answer is that they are handled the old way (pre-Austin).
! -> So it would be nice if G stopped applying the semantic algo to the sites it knows are non-English!
As for the English sites, Hissingsid is absolutely right: do some "~kw1" testing and try to use as many of the other kw's that come up as possible.
Also, make those kw's links to some highly relevant sites.
It's not what people think is relevant any more. It's what the machine (in this case G) thinks is relevant. Obey and God help us all!
[edited by: pavlin at 1:55 pm (utc) on Feb. 16, 2004]
Sorry for interrupting the discussion, but there's still lots of spam in the SERPs. Is it ok to file a spam-report and mention "brandyupdate GoogleGuy mynick"? I just tried it and I'm curious if it works.
I'm seeing 64 results on google dot com this morning.
Does that mean that it's over? If it is, my site is toast!
still seeing 216 on www in the US (East Coast). Have not seen 64 on www yet...
I saw 64 on co.uk about 2 hours ago for about 3 minutes (it was only on one or two datacentres and I had to refresh a few times) and then it was gone! Aol.co.uk is still showing 64 - results are good here. Thanks everyone for keeping us up to date with this.
Not seeing it here in the Great Lakes area (Ohio) either. 216 and 64 are not even close, so I figure it must not be done cooking yet. At least I hope so, since the results are so dramatically different.
sloney - are you sure aol.co.uk is showing 64 results? They look identical to 216 (and different to 64) for the keywords I'm checking.
Sorry, I spoke too soon, it's back to 216 results. Still cooking.
This is the first big update that I've been through. Does GG let us know when it's done?