Forum Moderators: open


Google finally decides to relax its filter?

         

jaffstar

11:31 am on Dec 11, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have been closely monitoring Google's filter and have started to see it being relaxed. Here is some proof.

Below is a history of how many sites have been filtered from the top 100.

The query is "online keyword":

December 06: 58 filtered
December 07: 41 filtered
December 08: 41 filtered
December 09: 41 filtered
December 10: 41 filtered
December 11: 2 filtered

If this holds, many of the "dropped" sites should come right back.

Anyone else seeing this? Is life back to normal?

[edited by: vitaplease at 12:09 pm (utc) on Dec. 11, 2003]
[edit reason] made less specific [/edit]

claus

2:51 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nice to see that a discussion about the concept of "filters" has finally emerged :)

FWIW and AFAIK, here's my list of Google filters:

  1. A query filter - collecting, from the whole 3-billion-page set, only those pages that match the query
  2. A SERP limit (not really a filter) - showing only the 1,000 most relevant ones returned from (1)
  3. An adult content filter, or the "safe search" filter
  4. A duplicate filter, removing or hiding pages with duplicate content

I have often wished i could add a spam filter to this list. It seems to me, however, that Google prefers to deal with spam algorithmically instead of filtering it. (Added: they might actually have done something here with Florida; i'm not sure what to think about it, as a lot of other things have happened as well, and there's not one entirely clear definition of spam.)

As for the three actual filters - (1), (3) and (4) - the only one i've seen relaxed is (1), as Google now uses "stemming" on the words entered in the query box instead of matching the exact terms. Filter (3) i don't know about, as i tend to disable it when doing such searches, and filter (4) has been tightened, imho.
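
To make the distinction concrete, here's a toy Python sketch of how a pipeline of such filters might be chained together. Purely illustrative - the data fields and the matching rules are my own assumptions, not anything Google has published:

# Illustrative only: a toy pipeline in the spirit of filters (1)-(4) above.
def query_filter(pages, terms):
    # (1) keep only pages that contain every query term
    return [p for p in pages if all(t in p["text"].lower() for t in terms)]

def serp_limit(pages, limit=1000):
    # (2) not really a filter: cap the output at the 1,000 most relevant pages
    return sorted(pages, key=lambda p: p["score"], reverse=True)[:limit]

def safe_search(pages, enabled=True):
    # (3) drop pages flagged as adult content when SafeSearch is on
    return [p for p in pages if not p["adult"]] if enabled else pages

def duplicate_filter(pages):
    # (4) hide pages whose content fingerprint has already been seen
    seen, out = set(), []
    for p in pages:
        fingerprint = hash(p["text"].strip().lower())
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(p)
    return out

def run_pipeline(pages, terms):
    return duplicate_filter(safe_search(serp_limit(query_filter(pages, terms))))

The point of writing it this way is that each stage only removes or hides candidates; none of them re-score anything, which is what separates a "filter" from the ranking algorithm itself.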

/claus

michael heraghty

3:07 pm on Dec 12, 2003 (gmt 0)

10+ Year Member



If you dropped in rankings, go back and look at who you linked to and who’s linking to you. If any of these people are using spam techniques, they're the reason your site no longer appears on Google.

Marissa Mayer made this statement directly to Garrett French of WebProWorld yesterday at the Search Engine Strategies conference (search for "Marisa Mayer" within the WPW site to find the interview).

Doesn't this directly conflict with what GG was saying -- that competitors can't sabotage your site? Or was he choosing his words carefully? Perhaps competitors can't knock your site out of Google, but they *can* knock it out for particular keywords and phrases...?

Although I take GG at his word when he says that Google wouldn't allow competitors to act in ways it considers unfair, it is nevertheless distressing to have Mayer make a seemingly conflicting public statement.

[edited by: michael_heraghty at 3:40 pm (utc) on Dec. 12, 2003]

espeed

3:15 pm on Dec 12, 2003 (gmt 0)

10+ Year Member



Both of these research papers describe algorithms that are indicative of what's going on with Florida -- more large sites and directories are showing up, since they have more links to and from "important" sites.

Improved Algorithms for Topic Distillation in a Hyperlinked Environment [jamesthornton.com]

Hilltop: A Search Engine Based on Expert Documents [jamesthornton.com]

While I think this type of algorithm will create a kind of boys' club that is hard to get into once it is well established, you should be able to help your chances by removing outgoing links to "junk" sites and increasing outgoing links to the sites coming up first in the SERPs. Then hope you can get enough incoming links from a few of these expert sites that your site is deemed an expert site too.
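
As a rough illustration of the expert-document idea in those papers, here's a toy Python sketch. The threshold and the scoring rule are invented for illustration; they are not taken from the papers or from anything Google has confirmed:

# Toy Hilltop-style scoring; the 5-host threshold is an arbitrary assumption.
from collections import defaultdict
from urllib.parse import urlparse

def is_expert(page, min_distinct_hosts=5):
    # An "expert" page links out to many distinct hosts other than its own.
    hosts = {urlparse(u).netloc for u in page["outlinks"]}
    hosts.discard(urlparse(page["url"]).netloc)
    return len(hosts) >= min_distinct_hosts

def target_scores(pages):
    # A target page scores by how many distinct expert hosts point at it.
    experts_pointing_at = defaultdict(set)
    for page in pages:
        if not is_expert(page):
            continue
        expert_host = urlparse(page["url"]).netloc
        for link in page["outlinks"]:
            experts_pointing_at[link].add(expert_host)
    return {url: len(hosts) for url, hosts in experts_pointing_at.items()}

If something like this is in play, it fits what we're seeing: directories qualify as "experts" almost by definition, and the pages they point at inherit the benefit.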

Hissingsid

3:21 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



(1), as Google now uses "stemming" on the words entered in the query box instead of matching the exact terms.

filter (4) has been tightened, imho.

(1)+(4)+(a bit of antispam)=Florida!

They just changed too many variables at one time and got unpredictable results.

I also think that they have tightened what they consider to be spam or perhaps this is directly linked to the duplicates filter and duplicates =~ spam.

Assuming that none of us are spammers, the only thing that we have direct control over is (4). For anyone who still thinks there is an OOP penalty, I would really dig deep into duplication before trying to find an OOP penalty.

The duplicates issue is not just about www vs. non-www, exactly duplicated pages on the same IP range, etc. I think that it is now about inadvertent duplication of text in the key areas of a page when compared with other linked pages. This is easily diagnosed, and if you have it you can cure yourself of it.
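
One rough way to diagnose it yourself - a toy Python sketch of my own, not necessarily how Google measures it - is to compare the "key area" text (title, headings, anchor text) of your pages pairwise and see how much of the wording overlaps:

# Pairwise word overlap of "key area" text; the 0.8 threshold is my guess.
from itertools import combinations

def key_text(page):
    # combine title, headings and outgoing anchor text into one set of words
    parts = [page["title"]] + page["headings"] + page["anchors"]
    return set(" ".join(parts).lower().split())

def similarity(a, b):
    # Jaccard overlap of the two word sets
    return len(a & b) / len(a | b) if (a | b) else 0.0

def flag_near_duplicates(pages, threshold=0.8):
    flagged = []
    for p1, p2 in combinations(pages, 2):
        score = similarity(key_text(p1), key_text(p2))
        if score >= threshold:
            flagged.append((p1["url"], p2["url"], round(score, 2)))
    return flagged

Anything this flags between pages on the same site, or sites you link to, is worth rewording.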

Best wishes

Sid

ciml

3:22 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Steve, giving less weight to some things wouldn't be expected to produce the dramatic changes we've been seeing. Certainly I'd agree that it's a new data batch, plus we have stemming and probably other things. The filter idea came from looking at the results when it was toggled on and off by adjusting the search string.

> semantics

It's more than semantics, kackle. Whether this is a major part of the core ranking (e.g. theming) or some post-processing filter has significant implications for understanding Google.

I think claus is sensible to compare it to SafeSearch, not to say they work the same or even nearly the same. If we think back to the geolocation/localisation threads of late last year, we're probably nearer, IMO.

Michael, you're not the only one to wonder about Google's stance on externally influencing a site downwards. Anyone remember the subtle language change on Google's webmaster pages? Something like the word "impossible" changing to "very difficult".

Hissingsid

3:29 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



directories are showing up, since they have more links to and from "important" sites

Or they use tracking URLs that pass your link through a script, so there is no backlink for Google to count. <rant starts>These parasitic PageRank grabbers that have none of their own content should be stamped on.

I would like to see Googlebot being able to traverse these links, or sites that use these techniques being flagged as spam sites. They give nothing, just take, and only achieve any ranking because they hang on to what they have got.</rant>

Sid

espeed

3:32 pm on Dec 12, 2003 (gmt 0)

10+ Year Member



The -nonsense queries returned different results because they made the query more complex without changing the relevant search terms. From what I can tell, if Google has started incorporating algorithms like Hilltop or other topic-distillation algos, they would only come into play with broad (simple) queries.
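
For anyone who hasn't followed the earlier threads, the trick simply appends excluded nonsense terms that should appear on no real page, leaving the relevant search terms themselves untouched. A quick Python illustration (the tokens are arbitrary):

# Build the plain and "-nonsense" variants of the same query.
from urllib.parse import urlencode

def google_url(query):
    return "http://www.google.com/search?" + urlencode({"q": query})

print(google_url("blue widgets"))             # the normal query
print(google_url("blue widgets -ggg -hhh"))   # excludes only nonsense tokens; relevant terms unchanged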

espeed

3:36 pm on Dec 12, 2003 (gmt 0)

10+ Year Member



I would like to see Googlebot being able to traverse these links, or sites that use these techniques being flagged as spam sites. They give nothing, just take, and only achieve any ranking because they hang on to what they have got.

Google can follow these links fine -- I use them, and the pages they point to show them as backlinks (it's like a redirect, which as you know Google can follow).

Hissingsid

4:10 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google can follow these links fine -- I use them, and the pages they point to show them as backlinks (it's like a redirect, which as you know Google can follow).

I'm really talking about PR transfer. So if Googlebot goes down a link like this [url.com...], which page does it transfer PR from, and how does it know which page to transfer it to?

If I have a link to a page which JS-redirects to another page, but that page has a meta robots noindex, nofollow, does Googlebot follow? And if it does, does it transfer PR from the first page or the redirect page?
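
To be clear about the kind of tracking link I mean, here's a minimal Python sketch of such a redirect script - my own made-up example, not any particular directory's code. The visible link points at the local script, which then bounces the visitor on to the real destination (a plain 302 here; a JS-based version would bounce the browser instead):

# Minimal "tracking URL" sketch: linked as <a href="/go?url=http://example.com/">...</a>
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class Go(BaseHTTPRequestHandler):
    def do_GET(self):
        target = parse_qs(urlparse(self.path).query).get("url", [""])[0]
        self.send_response(302)                 # plain HTTP redirect
        self.send_header("Location", target or "/")
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), Go).serve_forever()

The question is which end of that hop the PR credit sticks to.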

Last time I looked at this it was about as certain as a commercial-words filter ;-)

Confused

Sid

kaled

4:56 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Claus said
I have often wished i could add a spam filter to this list. It seems to me, however, that Google prefers to deal with spam algorithmically instead of filtering it. (Added: they might actually have done something here with Florida; i'm not sure what to think about it, as a lot of other things have happened as well, and there's not one entirely clear definition of spam.)

Careful what you wish for...

I'm afraid this is what you've now got. From the problems described here in WW, a dynamic spam filter is the most likely explanation - although people are calling it an over-optimisation penalty (OOP).

As I have said before, eventually, Google will have no choice but to drop this ludicrous concept. A highly optimised page is not necessarily a spam page. If the content is relevant to the user that's just fine - IT SHOULD NOT BE FILTERED OUT.

There are pure spam sites out there and loads of duplicate (spam) domains, yet Google seems content to do nothing about them even when reported. Instead they think that they can serve good results with dynamic spam filters. Well, such results may look OK to users, but it means that good sites can be banished from the SERPs for no good reason. Eventually, users will realise this (with a little help from bad publicity) and start using other search engines.

If SERPS are dominated by directories and users want real sites then they will switch to MSN/AV/OV or whatever.

Big corporations hate to admit to getting something wrong. Perhaps Google will gradually drop this, perhaps not. If not, their share value will be much poorer. If the bad publicity continues, no one will want to buy shares in Google.

Kaled.

layer8

6:52 pm on Dec 12, 2003 (gmt 0)

10+ Year Member



Well, I have been following Google's advice (even though I think it is a pointless exercise). I found 5 sites I linked to that had 5 different domains but were branded the same, i.e. not duplicate content but duplicate layout/graphics. The paranoid side of me is now removing good sites like these at the expense of my visitors, because Google will not make direct statements about any new algo changes.

My conclusion to the 'Florida Update' is this:

Keyword stemming has now flooded UK search results with a load of US sites, i.e. widgets now becomes widgetz (well, it doesn't, but for many words it does). American sites do not use the correct language or selling techniques to market correctly in the UK. Also, making the query algo match more word variants per search term, instead of producing more 'relevant' results in the SERPs (the intention of such a change), has had the opposite effect.

It creates a match on more high-PR sites than before, but for less 'specific' keyword matches.
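
Roughly what I mean by stemming broadening the match - a crude Python sketch with an invented stemmer, nothing like whatever Google actually uses:

# Crude illustration of why stemming broadens a query: several surface
# forms collapse to one stem, so all of them now match the same search.
def crude_stem(word):
    for suffix in ("ing", "ers", "er", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

query = "widgets"
candidates = ["widget", "widgets", "widgeting", "widgeters"]
matches = [w for w in candidates if crude_stem(w) == crude_stem(query)]
print(crude_stem(query), matches)   # widget ['widget', 'widgets', 'widgeting', 'widgeters']

Every extra form that matches is another chance for a big, high-PR US site to outrank the exact-phrase UK page.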

It sounds too easy to be the problem you're all describing, but I think it makes sense. I am very interested in the people affected and what keyterms were affected, i.e. could this be the overriding factor?

The new results have the following consequences: a) people can't find a suitable site for their query; b) more overhead for webmasters - analysing logs, asking themselves why no one likes their site anymore and no one is buying (the answer is that they are using the wrong language) and their visitors are looking for something else - site content they want to see; c) sites that should be in the rankings ask themselves why that site is in the rankings and not theirs, even though theirs is the better match; d) searchers can't find what they are looking for, so they start using another search engine; and e) Google's reputation is put into question yet again.

I think that before we see a spamming filter implemented, we now need an extended word-stemming filter to filter out the excessive part of the stemming filter recently implemented; this would counter the filtering of the geofilter and make the stemming filter find more relevant sites without the adverse effect of filtering out the geofilter.

That's how I'm starting to read the situation; however, I could be wrong. I'm interested to hear from webmasters affected by the Florida update: could this be the problem?

Maybe they realised this mid-update and had to put a block on a load of sites, hence the loss of rankings for no good reason. Now when your site returns to the rankings it is in position 40, as they have toned down the stemming; however, it is still present, therefore your ranking is not restored. Could I be right?

People are now saying search engine optimisation strategies are a waste of time; all it means is that more sites have been added to the rankings, but the intention to get more relevant sites has backfired and had the opposite effect.

Maybe they know this and are asking every webmaster to check their linking strategies. This could just be a ploy to buy time to put it right. In which case, they can sit back and see what percentage of people jumped onto an AdWords campaign.

It also means that if you do nothing, your rankings could be restored. I spent £300 on AdWords and did not get one email, so for me it's a complete waste of time unless you have a BIG pocket full of cash. I am now going to wait until the New Year to see how the rankings change again.

I have also seen my site crawling back very slowly each day; in the last few days it has gone from page 6 to page 4. Pre-Florida it was in position 2 on the first page.

allanp73

7:05 pm on Dec 12, 2003 (gmt 0)

10+ Year Member



I want to try to end the discussion by SteveB and others who still believe that there is no filter and the results are wonderful.
First, the anchor-text search and the double-minus search (example: blue widgets -ggg -hhh) never produced the same search results. They are not related. The allinanchor query still shows different results from the SERPs.

Second, the effect the new algo is having does filter out sites. You can easily see this by comparing the results the double-minus search produced (though it no longer works) with the current results. Sites are not just re-ranked; they are completely removed from the SERPs.

Third, the filter is only for certain money words. It was not applied to all searches; I have evidence of this fact.
Also, it is easy to see that the results for a commercial search and a non-commercial search are very different. Non-commercial results show informational sites unaffected by Florida, whereas for commercial results the informational sites are removed and only directories remain.

Finally, the filter is not filtering based on spam or over-optimization. Too many sites are affected to believe this. I still see directories ranked highly which have high keyword density and other on-page factors associated with over-optimization.

The Hilltop theory seems to be the best explanation for what is happening. This would explain why directories and link sites dominate the SERPs for commercial terms. However, ranking in this way has really reduced the quality of the SERPs. It is like being operated on by someone who knows a good doctor rather than by the doctor him/herself.

On a positive note, the new SERPs make it easy to find places to get directory links.

mikeD

7:10 pm on Dec 12, 2003 (gmt 0)

10+ Year Member



I run a directory, allanp73, and it's been hit by the -in results quite a bit. It seems to be just big players such as Walmart, Epinions, etc. who are ranking.

Hissingsid

9:52 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Third, the filter is only for certain money words. It was not applied to all searches; I have evidence of this fact.
Also, it is easy to see that the results for a commercial search and a non-commercial search are very different. Non-commercial results show informational sites unaffected by Florida, whereas for commercial results the informational sites are removed and only directories remain.

Hi Alan,

I share your understandable frustration.

What you might be describing here could be a series of coincidences. In the non-commercial web, there is a strong correlation with describing pages as what they are: simple but correct HTML, with tags used for what they were designed for. In the commercial part of the web, there is a strong correlation with putting whatever gets the best results into as many tags as possible. One rule applied to everyone could therefore create an apparent dichotomy in the treatment of commercial and non-commercial pages.

I'm probably worse than most here in terms of trying to find the answer, give it a name and analyse it. In broad terms, I think that they may have introduced something (at least one something) that forced them to introduce something else to balance it. If they are now using stemming and broad matching, then they would also need to introduce something else to re-rank the sites found by broad matching. Whatever that something else was, it was probably completely broken by too much spam, so they had to introduce a spam filter of some kind.

The fact is that no matter what we call it, and even if we find the silver bullet, we can't change anything that Google decides to do. We can however do something about the part of the Web that each of us influences or controls.

I think the key things we can do now are to check and recheck that our pages do not have too many common features (use of keywords in key areas) between pages on the same site/IP range, to work tirelessly to build our sites'/pages' PageRank, and to cut any links with sites that are clearly spam or have too much content (text in key areas of the page) duplicated between pages and/or sites.

I think that I'm seeing far more influence of PageRank in the final ranking on the SERPs. In my niche there is a mix of crap directories and general pages ranked by PageRank. It kind of works like this: find a bunch of sites that are vaguely related to what the user was searching for and then rank them by PageRank. Real relevance has gone.
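
As a crude sketch of what I mean - the matching rule and the ordering are pure guesses on my part, not anything confirmed:

# "Find vaguely related pages, then rank them by PageRank" as a toy function.
def broad_match(pages, terms):
    # a page qualifies if it contains ANY crude variant of ANY query term
    def variants(t):
        base = t[:-1] if t.endswith("s") else t
        return {t, base, base + "s"}
    wanted = set().union(*(variants(t) for t in terms))
    return [p for p in pages if wanted & set(p["text"].lower().split())]

def rank(pages, terms):
    pool = broad_match(pages, terms)
    # relevance barely figures after this point; PageRank dominates the order
    return sorted(pool, key=lambda p: p["pagerank"], reverse=True)

If that is anywhere near what's happening, it would explain why the top of the SERPs feels like a PR-sorted pile of loosely related pages.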

I would be very interested in us discussing how broad match and stemming actually work, and then looking for a rational reason why this might have a greater effect on commercial search terms. I don't dispute that there is a correlation between commercial search terms and crap results, but I think that this might be a side effect of whatever medicine they are trying to administer to us.

Best wishes

Sid

kaled

11:00 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



HissingSid said
In broad terms, I think that they may have introduced something (at least one something) that forced them to introduce something else to balance it. If they are now using stemming and broad matching, then they would also need to introduce something else to re-rank the sites found by broad matching. Whatever that something else was, it was probably completely broken by too much spam, so they had to introduce a spam filter of some kind.

On November 26, 2003, I wrote the following:

I studied Cybernetics at university. One vital lesson was that non-linear components are bad. If Google are applying non-linear filters, as people here believe, then this marks a monumental step backwards in design. If this is part of Google's philosophy then results will not get better; they will get worse and worse as they try to fix one botch with another.

GoogleGuy, if you're listening, non-linear filters (if they exist) will be the death of Google.

Make of this what you will.

Kaled.

superscript

11:06 pm on Dec 12, 2003 (gmt 0)



Think, think - and then think again!

Nothing adds up - and if nothing adds up, then something is probably broken.

This 'broken' thesis is by far the simplest one, and deserves much more consideration than it has previously been given.

steveb

11:29 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Steve, giving less weight to some things wouldn't be expected to have the dramatic changes we've been seeing."

Of course it would, and that is why the sort of filtering being talked about here cannot be responsible. The changes are drastic, and they are thematically comprehensible. The weighting algorithm is different. There is no possible way any filtering could cause such drastic changes.

"First the anchor text search and the double minus search (example blue widgets -ggg -hhh) never produced the same search results. They are not related. The allinchor still works to show different results in the serps."

allanp, allinanchor was always different than the old algorithm. Similar but different. This is very basic stuff.

Stefan

11:34 pm on Dec 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google is not broken. The [www-in.google.com...] SERPs are excellent for research purposes. My understanding of the internet is that it was established to enable exactly that type of communication. Easy worldwide access to the Home_Shopping_Network was not one of the original goals.

Google might not be enabling commerce at the moment, but it is better than ever for tapping into the global academic database.

The commercial end of things might have problems, but they're going after the link spammers tooth and nail... so it goes. When the dust settles, it might be very good.

<edit>fixing the URL... I'm an idiot... :-)</edit>

frup

11:42 pm on Dec 12, 2003 (gmt 0)

10+ Year Member



The reason people think there is a filter is that sites drop from #1 to oblivion for certain searches. That can't be just weighting; if it were weighting, the sites would dip lower but not disappear.

The filter is applied to search phrases, not to sites, in many cases. As I have posted, I have a site that disappeared for Keyword1 Keyword2 but is now ranking relatively well for Keyword1 (Keyword1 is the world's most popular keyword).

steveb

12:45 am on Dec 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"The reason people think there is a filter is that sites drop from #1 to oblivion for certain searches. That can't be just weighting, if it were weighting, then the sites would dip lower but not disappear."

It's like people are being intentionally blind. Stop fixating on one aspect of what is going on. If you do that, you end up drawing incomplete conclusions. All kinds of sites are affected, only some go to oblivion. Ignoring what occurs with the great majority and fixating on the minority makes no sense at all.

Google has always employed filters, and it has always employed an algorithm for ranking. Florida is an algorithm, and needs to be considered as such. At the same time, sites are filtered, as always, for whatever reasons Google wants. Anchor text counts for less now algorithmically. Duplicate content is filtered. These are different things conceptually.

The results being displayed now are the result of a completely different algorithm, not because the old algorithm is being filtered.

allanp73

1:14 am on Dec 13, 2003 (gmt 0)

10+ Year Member



Okay Steveb,

What causes 90% of sites to disappear? What are all these people doing wrong? Google is perfect ;)

Stefan

1:22 am on Dec 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What causes 90% of sites to disappear?

Dodgy links and duplicate content?

kaled

1:38 am on Dec 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know we're not supposed to post searches but this is just too good an example of rubbish results.

I wanted to find a solution to a fault in Delphi when running under XP so I searched for delphi xp patch.
Just take a quick look at Google's results and compare with ATW.

It's fair to say that neither of these searches yielded the answer (running Delphi in Compatibility Mode seems to be working, though - not sure yet), but Google's results look like the distilled essence of black-hat rubbish, whilst ATW's results look pretty much pure white-hat at a glance.

Kaled.

btolle

1:38 am on Dec 13, 2003 (gmt 0)



The whole thing is crazy.

Do a search on "real estate denver colorado" (without the quotes) and see what comes up on the first page.

#4 is phonebook.superpages.com/yellowpages/C-Real+Estate/S-CO/T-Denver/PI-58156

#9 is dir.yahoo.com/Regional/U_S__States/Colorado/Metropolitan_Areas/Denver_Metro/Real_Estate/

So now Google is sending traffic to Yahoo?

allanp73

1:48 am on Dec 13, 2003 (gmt 0)

10+ Year Member



Stefan,

So 90% of sites are spammers with duplicate content? I never knew this. Wow, thank you, Google, for removing all those spammy sites. Directories are so much better than those evil sites with actual content ;)
