Forum Moderators: Robert Charlton & goodroi
Google victim of redirect too ;):
Search for "Google" and [desktop.google.com...] shows first. If you click, [desktop.google.com...] redirects to Google.com
[edited by: ciml at 4:35 pm (utc) on May 9, 2005]
As of a few days ago it had dropped to 9 million, and today it is just under 7 million. It looks like "something" is working its way through the index correcting the "true" counts; maybe.
1) It forces publishers to use AdWords more aggressively.
2) It makes people search longer, and the longer they search the more AdSense/AdWords they encounter.
The 302 bug is a double winner for them! I'm sure they understand it all too well and are dragging their feet as much as possible.
The only solution I can think of is healthy competition from a rival SE.
That makes no sense at all. If a page ranks at #1 the day before it is hijacked, then gets hijacked, then gets unhijacked and appears at #794, it is ludicrous to think the drop to #794 is caused by sun spots or *anything* else besides the hijacking and/or "cure".
search yahoo for: desktop.google.com [search.yahoo.com]
GoogleGuy, please fix this, we're hurting, badly. You guys can do it. Think of what other challenges have faced Google, this is probably nothing.
I doubt that's the case, but even if the site is down for 15-30 minutes, it shouldn't lose its rankings. Many things can go wrong: cut cables, server reboots, congestion, etc.
I think eventually Yahoo and MSN, and whoever comes along next, will stop showing results that have AdWords on them, because it will slowly eat up the space reserved for their own ads. Then G will suffer the most, I think. A big hit is coming, I am sure of it. Scraper sites are a source of income for Google, a big one. It's not advertising; it's polluting other SEs with their brand name, and they stick to it.
I don't believe that my webstore software, which I wrote myself, is not good enough to be in the top 1000 on Google, yet good enough to be in the top 10 on Yahoo for 50 keyword phrases and the top 5 for 50 keyword phrases on MSN. There is a glitch, and it's not on my side of the wire, sorry.
QUERY 1 for Banned_IP Table:
SELECT COUNT(DISTINCT banned_date) AS Expr1
FROM tbl_ban_ipaddr
WHERE (user_agent LIKE '%grub-client%')
OUTPUT: 1289
QUERY 2 for Banned_IP:
SELECT COUNT(DISTINCT banned_ip) AS Expr1
FROM tbl_ban_ipaddr
WHERE (user_agent LIKE '%grub-client%')
OUTPUT: 217
QUERY 3 for Banned_IP:
SELECT COUNT(DISTINCT banned_date) AS Expr1
FROM tbl_ban_ipaddr
OUTPUT: 2069
QUERY 4 from tbl_unique_visitor
SELECT COUNT(DISTINCT ip) AS Expr1
FROM tbl_unique_visitor
OUTPUT: 11217
I don't know a lot about grub-client, but those are sad stats for me and for the IPs that got banned.
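The four counts above can be reproduced end to end; here is a small sqlite3 sketch using the same table and column names as the queries above. The rows are invented for illustration, so the counts naturally differ from the real outputs.

```python
# Toy reproduction of "QUERY 2 for Banned_IP" above.
# Table/column names follow the forum post; the data is made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tbl_ban_ipaddr (banned_ip TEXT, banned_date TEXT, user_agent TEXT)"
)
rows = [
    ("10.0.0.1", "2005-05-01", "grub-client/4.3"),
    ("10.0.0.1", "2005-05-02", "grub-client/4.3"),   # same IP, second ban date
    ("10.0.0.2", "2005-05-02", "grub-client/4.3"),
    ("10.0.0.3", "2005-05-02", "SomeOtherBot/1.0"),  # not grub-client
]
con.executemany("INSERT INTO tbl_ban_ipaddr VALUES (?, ?, ?)", rows)

# Distinct IPs banned with a grub-client user agent
(n_ips,) = con.execute(
    "SELECT COUNT(DISTINCT banned_ip) FROM tbl_ban_ipaddr "
    "WHERE user_agent LIKE '%grub-client%'"
).fetchone()
print(n_ips)  # 2 distinct grub-client IPs in this toy data
```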
And that's what puzzles me, because there's no apparent reason that these errors and malfunctions should be desirable.
With perfect, non-spammy results there would be less temptation to click on ads.
In one area that I follow, where there are fewer than 10 real vendors worldwide, the most likely search term yields over 8 MILLION results. That has grown from under 1 million results in a year. The same search term on Yahoo and MSN yields most of the vendors in the top 10.
This is a business search where the searcher will click on any link that he thinks will solve his problem. If all the results are spam, he's going to click the ads. The ads run for at least the first 5 pages of results.
Bottom line: spammy SERPs are profitable SERPs, and Google is now a bottom-line-driven business.
With perfect, non-spammy results there would be less temptation to click on ads.
Sorry, but that logic fails to address the fact that search quality (or the lack thereof) ultimately has an effect on market share. Besides, do you seriously think that Google's search engineers, imported academics, etc. would intentionally corrupt the SERPs just because some newly minted MBA who got hired because there was a scoring mistake on his IQ test said "Wall Street wants more spam in the search results"? Those brainiacs can write their own tickets; they aren't unskilled minimum-wage workers at Wal-Mart.
Let's get real. The simplest explanation is usually the correct one, and in this case, the simplest explanation is that Google's engineers are still trying to clean up a mess without causing an even bigger mess.
I've thought about that, but, and that's a big but: What would that bigger mess be, then?
- fewer than eight billion URLs listed on the Google front page?
- fewer listings with wrong URLs?
- fewer URL-only listings?
- fewer "penalized" sites in the index?
- less "spam" and fewer "scraper sites"?
- a different distribution of PR?
- a need to update the index?
- having to keep URLs in a database separate from pages?
- some results showing splash pages instead of the page behind them?
(the latter being the case if they also solve the meta refresh problem)
I just can't imagine that "bigger mess". None of the things I can personally think of sounds even remotely like a problem to me. At most, it's "business as usual" or even "desirable".
The number on the front page being the only exception, but then they don't really have to change that. If anything they can just change "pages" to "URLs". In time, the number of real pages will reach that watermark as well.
I just can't imagine that "bigger mess". None of the things I can personally think of sounds even remotely like a problem to me. At most, it's "business as usual" or even "desirable".
I searched yesterday:
"firstname lastname" keyword keyword
I know a blog on that subject exists with my name in it, yet Google couldn't find it. It's nice that the results have less spam on the spammed terms, but what use is that if I can't find things!? If I wanted a spam-free subset of the web, I'd go to the Yahoo directory!
Put it this way:
Scenario A:
Google gives me 20 results, 15 spam, 4 match but are not what I'm looking for, 1 is the perfect site.
Scenario B:
Google gives me 5 results, all matches but not what I'm looking for.
You might think that B is better; after all, it's spam free! Yet with result A I have 1 in 20 results to look at, while with result B I still have 1 in 8,000,000,000 to look at.
It's more important to find stuff than to produce a spam-free result set!
That's being charitable to Google; actually, my blog did appear twice in the results, in 'news' scraper sites. I could read it in the excerpt, but there was no cache, and when I clicked the link it was just a search engine doorway page.
I think there is some mistake in this spam filter, but we're reduced to guessing what it might be.
Regardless of what rankings might be, what it has taught me to do is to go to other search engines to perform my searches.
Why? Because I can't stand searching through irrelevant results!
If my website never shows up on a search, then how many other good websites is that happening to also?
Also...
Could any of these penalties (or failures to show up in Google results) result from use of the Google API tool?
However, all but one page were already excluded by robots.txt and always have been.
(Your search shows "1 - 1 of about 850 000" at present but this search: [google.com...] returns nothing at all)
As soon as you post them, GoogleGuy removes them from Google :). This one shows nothing too.
The whole 302 thread assumes that the problem is the 302 redirect. What if it isn't? What if it's the duplication and nothing else? Here's a variation on my Bayesian theory (see my Cliff top algo), applied when the results are served up rather than when they are crawled:
What if G filters down to a result set of, say, 3000 items, then sorts them by relevance, then applies a Bayesian filter to that result set to remove duplicates, keeping the highest-ranking item in any duplication?
The change is to apply a mechanical filter to attempt to remove affiliates/duplicates etc.
There will be lots of times when the ranking is not perfect, where a copycat site or a directory outranks the site it copies, just because the algorithm is complicated and imperfect and based on many variables.
So instead of a copycat at say position 10 and the real site at 15, the site at 15 is filtered leaving only the copycat.
The more important a site, the more likely it is to have scrapers copying it so the more opportunities it will have to be filtered.
Only with enough PR could you reasonably expect to be above other sites and relatively safe from filtering.
Of course competing sites selling basically the same product would find themselves mostly filtered by that algo, so 50 sites selling "crunchy chocolate caramel cookies" would find 49 of them filtered, with sites further down being about "caramel bananas" only mentioning cookies in passing, or with a domain "crunchy-gravel-stone-chippings.com" that happens to be available in chocolate or caramel colours. If you see what I mean.
Also single purpose sites with lots of pages on red widgets, green widgets, flanged widgets with lots of the same products with small variations between the products would knock themselves out.
That might explain the shallow results, I just did a search on "red widgets" and noticed that every single entry in the top ten says "red widgets" in a different way, sometimes on page, sometimes at the end of the title, sometimes spread in the title, sometimes in the domain.
It might also explain why the serps are dominated by diverse sites, sites with lots of completely unrelated pages on different subjects. They don't knock themselves out, whereas the sites that focus on single subjects in depth do.
I was assuming the filtering was done on the crawl but filtering when the results are dished up might explain these results.
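The mechanism described above can be sketched roughly like this. To be clear, this is my own toy reconstruction: the similarity measure (word-shingle Jaccard) and the 0.8 threshold are guesses for illustration, not anything Google has documented.

```python
# Sketch of a post-ranking duplicate filter: walk the ranked results
# best-first and drop any result too similar to one already kept, so
# the highest-ranked copy of a duplicate survives.

def shingles(text, k=3):
    """Set of k-word shingles for a crude text fingerprint."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def filter_duplicates(ranked_results, threshold=0.8):
    """ranked_results: list of (url, text) pairs, best-ranked first.
    Returns the list with near-duplicates removed."""
    kept = []
    for url, text in ranked_results:
        sig = shingles(text)
        if all(jaccard(sig, shingles(t)) < threshold for _, t in kept):
            kept.append((url, text))
    return kept
```

On this model, a scraper that outranks the original would survive the filter while the original is dropped, which matches the "copycat at 10, real site at 15" scenario above.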
It doesn't explain everything, why for example does Google not index the text of some sites, only the titles? But it would explain quite a bit.
Yes, the end state is a duplicate content problem.
One possible cause is the incorrect assignment of a 302 as a page within the target site under a different URL, i.e. it shows up in a site: view (and where have we seen this before: DMOZ, and other places).
If Google gets that one little item fixed, one duplicate-content cause no longer exists. This doesn't mean that the penalty (filter, same net effect) would automatically go away. Hopefully, at the very least, the problem would age out (and this is what I'm seeing).
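The mis-attribution described above can be shown with a tiny simulated crawl. The URLs and page content here are invented for the example; the point is only the bug: the crawler follows the 302 but files the final content under the URL it *started* from.

```python
# Illustration of 302 mis-attribution: following a redirect but indexing
# the target's content under the redirecting URL creates the same page
# under two URLs -- a duplicate-content problem.
# All URLs and pages below are made up.

WEB = {
    "http://scraper.example/out?id=7": ("302", "http://victim.example/page"),
    "http://victim.example/page": ("200", "The victim's real content"),
}

def fetch(url):
    status, payload = WEB[url]
    while status == "302":            # follow the redirect chain
        status, payload = WEB[payload]
    return payload

def buggy_index(urls):
    # Bug: content is keyed by the starting URL, not the URL that
    # actually served it after redirects were followed.
    return {url: fetch(url) for url in urls}

index = buggy_index(WEB.keys())
# The scraper's redirect URL now carries the victim's content verbatim.
```

A fixed crawler would instead record the content only under the final (post-redirect) URL, and the duplicate would never enter the index.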
Left as an exercise for the interested reader: list the other possible outcomes.
This can be a very vicious circle.
Please note the use of weasel words. YMMV, IANAGE APYAE (I am not a Google expert and probably you aren't either.)
Within just a few days, both of those searches suddenly reported zero matches (Google merely filtered the hijacks from the visible results), but site:dmoz.org and site:dmoz.com searches continued to report massively inflated numbers (compared to the real number of pages on the site).
The numbers are now gradually falling but are still wildly wrong.
Yes I know exactly what was showing in the dmoz site view several weeks ago, as well as what was showing in the site views of our sites as well as many other WebmasterWorld members.
I also know that the 302s suddenly started disappearing from the site views of dmoz, DRUDGE, and several hundred sites (including some of those I work on) where I had seen them previously.
I'm delighted to hear that the page counts are now heading in the correct direction for a number of sites; it could be that Google is correcting its attribution of pages.
This will take time: computers are fast, but massive updates to multiple related distributed databases take time.
(I do believe that I raised this point about time to ciml in a sticky.)
Several of these folks actually got rid of the 302 pages that were affecting them without killing their site.
There has been positive motion in such cases.
Some of the sites had also gotten to the point where they were classified as spam before they realised what the cause was and took action to correct it. Then, after having taken action to correct the situation, they discovered it was going nowhere and had to request reinclusion.
In short, if the duplicate content (from whatever cause) gets removed (and Google hasn't shut you off completely), then chances are your pages will rise like the mythical Phoenix to once again place in the SERPs.
I see the decreasing counts as a good sign, just like I do when 301s have their proper effect on split sites.
I know I felt a lot better when our pages once again were showing in the serps, and I'm pretty sure that the boss sleeps better.
So you can have a rest from my yammering, I'll go silent on this issue for at least a couple of weeks.
YMMV, IANAGEAPYAE
"Is google updating? "