Below is a history of how many sites have been filtered from the top 100 results for the term "online keyword":
December 06: 58
December 07: 41
December 08: 41
December 09: 41
December 10: 41
December 11: 2
If this holds out, many of the "dropped" sites should come right back.
Anyone else seeing this? Is life back to normal?
[edited by: vitaplease at 12:09 pm (utc) on Dec. 11, 2003]
[edit reason] made less specific [/edit]
[webmasterworld.com...]
Trying to move forward is extremely difficult as no one knows what is going to trigger the filter.
I have put everything on ice :(
We all know about the commercial kw filter
If you mean we've all heard about it, I agree. But if you are saying there is now a general consensus that one is in place - this is still a matter of serious dispute.
(Despite the name of this discussion - which might be taken to imply that one exists!)
I think these are among the key questions to be answered:
1. is there a filter / filters in place? (or is it simply a new algo we can't fathom)
2. if there is a filter in place, what is the nature of the filter; is it an OOP filter, or a commercial one, or both?
It's not difficult to see that an OOP filter could easily be mistaken for a commercial filter due to commercial sites' understandable tendency to optimise.
Trying to move forward is extremely difficult as no one knows what is going to trigger the filter.
If results are poor, either the algo is wrong or the implementation is wrong, i.e. pesky, creepy-crawly little bugs are at work. If it's bugs, all the analysis by outsiders is a waste of time. Literally, it could be as simple as a bracket in the wrong place, a plus instead of a minus, anything, and it could take a very long time to find.
Kaled.
PS Call me an idiot, but what does OOP stand for in this context?
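PPS To show how tiny a bug like that could be, here is a contrived example (obviously not Google's code, just an illustration of a one-character mistake):

```python
# Contrived illustration of a one-character scoring bug.
# The formula and the variables are made up for the example.

def score_intended(relevance, spam_penalty):
    return relevance - spam_penalty   # penalise spammy signals

def score_buggy(relevance, spam_penalty):
    return relevance + spam_penalty   # one key away: spam now boosts rank

print(score_intended(10, 4))  # 6  - spammy page demoted
print(score_buggy(10, 4))     # 14 - same page promoted instead
```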
We all know about the commercial kw filter, but the 64 million dollar question is why certain sites are filtered and others not. Lots of speculation about spam filters, yet some sites do not look like they spammed at all and are out of the race.
From Hilltop: A Search Engine based on Expert Documents [jamesthornton.com]...
"An alternative to PageRank is Topic Distillation [Kleinberg 97, Chakrabarti et al 98, Bharat et al 98, Chakrabarti et al 99]. Topic distillation first computes a query specific subgraph of the WWW. This is done by including pages on the query topic in the graph and ignoring pages not on the topic. Then the algorithm computes a score for every page in the subgraph based on hyperlink connectivity: every page is given an authority score. This score is computed by summing the weights of all incoming links to the page. For each such reference, its weight is computed by evaluating how good a source of links the referring page is. Unlike PageRank, Topic Distillation is only applicable to broad queries, since it requires the presence of a community of pages on the topic."
This is of course assuming that Florida was not just a fluke and that we will go back to monthly updates in the future.
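To make the scoring idea in that quote concrete, here is a toy sketch of it (my own simplified reading, not the published algorithm; in particular, the "good source of links" weight here is just a stand-in):

```python
# Toy sketch of the Topic Distillation scoring quoted above.
# on_topic, the link list, and the out-degree weighting are all
# simplified stand-ins for what the papers actually describe.

def authority_scores(links, on_topic):
    """links: list of (source, target) hyperlinks;
    on_topic: set of pages judged to be about the query topic."""
    # 1. Query-specific subgraph: ignore pages not on the topic.
    subgraph = [(s, t) for (s, t) in links if s in on_topic and t in on_topic]

    # 2. Weight each reference by how good a source of links the
    #    referring page is (here, simply its on-topic out-degree).
    out_degree = {}
    for s, _ in subgraph:
        out_degree[s] = out_degree.get(s, 0) + 1

    # 3. Authority score = sum of the weights of all incoming links.
    score = dict.fromkeys(on_topic, 0.0)
    for s, t in subgraph:
        score[t] += out_degree[s]
    return score
```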
i.e. for a "search term", in the top 30 results:
over 1/2 are just pages filled with useless links, either comparison shopping pages or some kind of search engine results.
3 results were pure spam.
there were 2 results in Japanese! The term I searched had nothing to do with that language.
1 result was in Brazilian Portuguese. Why would I want results in other languages which I can't read?
go figure... the search is not only returning irrelevant results, it's also returning SERPs in other languages, so users have an even tougher time finding stuff.
[edit] - other DCs like -gv, -ex, -dc etc... are showing much better results. I also noticed that these other DCs have a fresh date of Dec. 10 for many pages, whereas on -in and www2, www3, the same pages either have a fresh date of Dec. 9 or no fresh date at all. Any ideas?
I'd like to think though that -in's weaknesses are so readily apparent that another datacenter will show the next significant change.
Lots of fresh stuff picked up the past few days is only appearing on -in, and not the other datacenters, so that definitely has to mean something too.
From my SERP perspective, I don't see a relaxation of a filter on -in... our .org is restored to #1 on one important kw combination, up from #2 in Florida, over an about.com page that has 5 links to our site on it and had scooped #1 in Florida... still there in all the other DCs. It's good to see those parasites back underneath us; they have no real content, just kw-heavy text linking to sites like us. We've also gone up on other SERPs that commercial sites target. Sorry man, -in is great for us. It likes info sites.
I have been closely monitoring Google's filter and have started to see them relaxing it. Here is some proof.
You mean "here are some findings", don't you? Also, while many who have suffered via Florida take comfort in saying there is a "filter" in place, it is by no means fact, nor has it been proven.
However, it is a fact that things have changed dramatically
Yes it is, but the "filter" is not.
I'm guessing you weren't hit
Yes and no. As I constantly add content pages, some pages went up while others went down; overall we did well. Yes, our site is a commercial one and yes, we sell software, so the "money terms" theory doesn't wash with me either.
I_am_back, did you do much testing while the filter tests were running?
I don't waste my time. I would rather focus on adding content than running around in circles. There are 3.6 billion pages so any 'good' testing would take FAR too long and probably still prove little.
==================================
I tend to believe it is only temporary and partial, and I hope I'm wrong. It seems to me that G never trashes this type of algo, as seen in Florida, but keeps on improving it. For example, Dominic was the advanced version of the Sept 02 update, and Florida is the advanced version of Dominic. If Florida is still not good enough, perhaps we'll see more index pages return, but be aware that the next version could be rolled out within the next couple of months, or in the next 6-7 months. Just my thought.
For example, PageRank, Google's seminal algorithm/filter, gives a higher score to pages that have more incoming links. Links are viewed as votes and are determined by people. PageRank is one algorithm in the broader category of research called collaborative filtering [jamesthornton.com], or value filtering.
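If it helps to see the "links as votes" idea in code, here is a minimal textbook power-iteration sketch of PageRank (the damping factor and iteration count are the usual textbook defaults; Google's real implementation is obviously far more involved):

```python
# Minimal textbook PageRank: each page splits its score as "votes"
# among the pages it links to. Dangling pages (no outlinks) are
# ignored here for brevity, so this is a sketch, not the real thing.

def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    pages = set(links) | {t for outs in links.values() for t in outs}
    n = len(pages)
    rank = dict.fromkeys(pages, 1.0 / n)

    for _ in range(iterations):
        new_rank = dict.fromkeys(pages, (1.0 - damping) / n)
        for src, outs in links.items():
            if outs:
                share = rank[src] / len(outs)  # split this page's vote
                for dst in outs:
                    new_rank[dst] += damping * share
        rank = new_rank
    return rank

print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```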
Why is everyone so adamant about distinguishing whether Florida is or isn't a new algorithm or filter?
To me these are 2 separate things. An algorithm is used to determine your page's ranking position in the SERPs. A filter is used to omit certain sites from the SERPs based on some criteria. Just like when you use a negative keyword, you are filtering out that keyword.
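In code terms, the distinction I mean looks something like this (a toy sketch; score() and meets_filter_criteria() are made-up placeholders, not anything Google has published):

```python
# Toy illustration of algorithm vs. filter; both functions passed
# in are hypothetical placeholders for whatever Google actually does.

def rank_pages(pages, score):
    # An *algorithm* orders every page by its computed score.
    return sorted(pages, key=score, reverse=True)

def filter_pages(pages, meets_filter_criteria):
    # A *filter* omits pages outright; it never re-orders them.
    return [p for p in pages if not meets_filter_criteria(p)]

def build_serp(pages, score, meets_filter_criteria):
    # Filter first, then rank whatever survives.
    return rank_pages(filter_pages(pages, meets_filter_criteria), score)
```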
Yes and no. As I constantly add content pages, some pages went up while others went down; overall we did well. Yes, our site is a commercial one and yes, we sell software, so the "money terms" theory doesn't wash with me either.
Yes, keep adding content and then cross-link it. That is definitely the way to go.
I cannot see how a page (that meets a filter criteria) can be pushed *down*. It's either filtered out or it's not. Standard database filtering!
I would also like to know what specific methods were used....
I guess this is asking for someone to share their "bag of tricks", but if it really is purely based on best practices, then it shouldn't be a secret at all....right?
Anyway, I'm a newbie and have read tons and tons of tutorials online, but to me it seems what normally works is either having a huge marketing budget and being a well known company, or using scams and trickery. I haven't seen the little guy succeed.
If you can help me out, please sticky mail me.
Thanks!
Any algorithm/filter can increase or decrease a page's score based on an infinite number of factors, or change how much weight they give to certain aspects.
Can we change that to read:
Any filter can indirectly increase the ranking of a page that does not meet the filter criteria.
Just like when you search for, e.g.,
"Widgets -blue", you are telling Google you do not want the word blue, not that you want it ranked lower. Without the negative keyword you may have 10,000 pages. With the negative keyword you will get 5,000 pages, meaning some pages just increased in ranking because you have *filtered* out the other 5,000 pages.