Forum Moderators: Robert Charlton & goodroi
Google victim of redirect too ;):
Search for "Google" and [desktop.google.com...] shows first. If you click, [desktop.google.com...] redirects to Google.com
[edited by: ciml at 4:35 pm (utc) on May 9, 2005]
As of a few days ago it had dropped to 9 million, and today it is just under 7 million. It looks like "something" is working its way through the index correcting the "true" counts; maybe.
1) It forces publishers to use AdWords more aggressively.
2) It makes people search longer, and the longer they search the more AdSense/AdWords they encounter.
The 302 bug is a double winner for them! I'm sure they understand it all too well and are dragging their feet as much as possible.
The only solution I can think of is healthy competition from a rival SE.
That makes no sense at all. If a page ranks at #1 the day before it is hijacked, then gets hijacked, then gets unhijacked and appears at #794, it is ludicrous to think the drop to #794 is caused by sun spots or *anything* else besides the hijacking and/or "cure".
search yahoo for: desktop.google.com [search.yahoo.com]
GoogleGuy, please fix this, we're hurting, badly. You guys can do it. Think of what other challenges have faced Google, this is probably nothing.
I doubt that's the case, but even if the site is down for 15-30 minutes, it shouldn't lose its rankings. Many things can go wrong: cut cables, server reboots, congestion, etc.
I think eventually Yahoo and MSN, and whoever comes along next, will stop showing results that have AdWords on them, because it will slowly eat up the space reserved for their own ads. Then G will suffer the most, I think. A big hit is coming, I am sure of it. Scraper sites are a source of income for Google, a big one. It's not advertising; it's polluting other SEs with their brand name, and they stick to it.
I don't believe that my webstore software, which I wrote myself, is not good enough to be in the top 1000 on Google, yet good enough to be in the top 10 on Yahoo for 50 keyword phrases and the top 5 for 50 keyword phrases on MSN. There is a glitch, and it's not on my side of the wire, sorry.
QUERY 1 for Banned_IP Table:
SELECT COUNT(DISTINCT banned_date) AS Expr1
FROM tbl_ban_ipaddr
WHERE (user_agent LIKE '%grub-client%')
OUTPUT: 1289
QUERY 2 for Banned_IP:
SELECT COUNT(DISTINCT banned_ip) AS Expr1
FROM tbl_ban_ipaddr
WHERE (user_agent LIKE '%grub-client%')
OUTPUT: 217
QUERY 3 for Banned_IP:
SELECT COUNT(DISTINCT banned_date) AS Expr1
FROM tbl_ban_ipaddr
OUTPUT: 2069
QUERY 4 from tbl_unique_visitor
SELECT COUNT(DISTINCT ip) AS Expr1
FROM tbl_unique_visitor
OUTPUT: 11217
I don't know a lot about grub-client, but those are sad stats for me and for the IPs that got banned.
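The four counts above can be reproduced end to end; here is a small sqlite3 sketch using the same table and column names as the queries above. The rows are invented for illustration, so the counts naturally differ from the real outputs.

```python
# Toy reproduction of "QUERY 2 for Banned_IP" above.
# Table/column names follow the forum post; the data is made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tbl_ban_ipaddr (banned_ip TEXT, banned_date TEXT, user_agent TEXT)"
)
rows = [
    ("10.0.0.1", "2005-05-01", "grub-client/4.3"),
    ("10.0.0.1", "2005-05-02", "grub-client/4.3"),   # same IP, second ban date
    ("10.0.0.2", "2005-05-02", "grub-client/4.3"),
    ("10.0.0.3", "2005-05-02", "SomeOtherBot/1.0"),  # not grub-client
]
con.executemany("INSERT INTO tbl_ban_ipaddr VALUES (?, ?, ?)", rows)

# Distinct IPs banned with a grub-client user agent
(n_ips,) = con.execute(
    "SELECT COUNT(DISTINCT banned_ip) FROM tbl_ban_ipaddr "
    "WHERE user_agent LIKE '%grub-client%'"
).fetchone()
print(n_ips)  # 2 distinct grub-client IPs in this toy data
```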
And that's what puzzles me, because there's no apparent reason that these errors and malfunctions should be desirable.
With perfect, non-spammy results there would be less temptation to click on ads.
In one area that I follow, where there are fewer than 10 real vendors worldwide, the most likely search term yields over 8 MILLION results. That has grown from under 1 million results in a year. The same search term on Yahoo and MSN yields most of the vendors in the top 10.
This is a business search where the searcher will click on any link that he thinks will solve his problem. If all the results are spam, he's going to click the ads. The ads run for at least the first 5 pages of results.
Bottom line: spammy SERPs are profitable SERPs, and Google is now a bottom-line-driven business.
With perfect, non-spammy results there would be less temptation to click on ads.
Sorry, but that logic fails to address the fact that search quality (or the lack thereof) ultimately has an effect on market share. Besides, do you seriously think that Google's search engineers, imported academics, etc. would intentionally corrupt the SERPs just because some newly minted MBA who got hired because there was a scoring mistake on his IQ test said "Wall Street wants more spam in the search results"? Those brainiacs can write their own tickets; they aren't unskilled minimum-wage workers at Wal-Mart.
Let's get real. The simplest explanation is usually the correct one, and in this case, the simplest explanation is that Google's engineers are still trying to clean up a mess without causing an even bigger mess.
I've thought about that, but, and that's a big but: What would that bigger mess be, then?
- fewer than eight billion URLs listed on the Google front page?
- fewer listings with wrong URLs?
- fewer URL-only listings?
- fewer "penalized" sites in the index?
- less "spam" and fewer "scraper sites"?
- a different distribution of PR?
- a need to update the index?
- having to keep URLs in a database separate from pages?
- some results showing splash pages instead of the page behind them?
(the latter being the case if they also solve the meta refresh problem)
I just can't imagine that "bigger mess". None of the things I can personally think of sounds even remotely like a problem to me. At most, it's "business as usual" or even "desirable".
The number on the front page being the only exception, but then they don't really have to change that. If anything they can just change "pages" to "URLs". In time, the number of real pages will reach that watermark as well.
I just can't imagine that "bigger mess". None of the things I can personally think of sounds even remotely like a problem to me. At most, it's "business as usual" or even "desirable".
I searched yesterday:
"firstname lastname" keyword keyword
I know a blog on that subject exists with my name in it, yet Google couldn't find it. It's nice that the results have less spam on the spammed terms, but what use is that if I can't find things!? If I wanted a spam-free subset of the web, I'd go to the Yahoo directory!
Put it this way:
Scenario A:
Google gives me 20 results, 15 spam, 4 match but are not what I'm looking for, 1 is the perfect site.
Scenario B:
Google gives me 5 results, all matches but not what I'm looking for.
You might think that B is better; after all, it's spam free! Yet with result A I have 1 in 20 results to look at, while with result B I still have 1 in 8,000,000,000 to look at.
It's more important to find stuff than to produce a spam-free result set!
That's being charitable to Google; actually, my blog did appear twice in the results, in 'news' scraper sites. I could read it in the excerpt, but there was no cache, and when I clicked the link it was just a search engine doorway page.
I think there is some mistake in this spam filter, but we're reduced to guessing what it might be.
Regardless of what rankings might be, what it has taught me to do is to go to other search engines to perform my searches.
Why? Because I can't stand searching through irrelevant results!
If my website never shows up on a search, then how many other good websites is that happening to also?
Also...
Could any of these penalties (or failures to show up in Google results) result from use of the Google API tool?
However, all but one page were already excluded by robots.txt and always have been.
(Your search shows "1 - 1 of about 850 000" at present but this search: [google.com...] returns nothing at all)
As soon as you post them, GoogleGuy removes them from Google :). This one shows nothing too.
The whole 302 thread assumes that the problem is the 302 redirect. What if it isn't? What if it's the duplication and nothing else? Here's a variation on my Bayesian theory (see my Cliff top algo), applied when the results are served up rather than when they are crawled:
What if G filters down to a result set of, say, 3000 items, then sorts them by relevance, then applies a Bayesian filter to that result set to remove duplicates, keeping the highest-ranking item in any duplication?
The change is to apply a mechanical filter to attempt to remove affiliates/duplicates etc.
There will be lots of times when the ranking is not perfect, where a copycat site or a directory outranks the site it copies, just because the algorithm is complicated and imperfect and based on many variables.
So instead of a copycat at say position 10 and the real site at 15, the site at 15 is filtered leaving only the copycat.
The more important a site, the more likely it is to have scrapers copying it so the more opportunities it will have to be filtered.
Only with enough PR could you reasonably expect to be above other sites and relatively safe from filtering.
Of course competing sites selling basically the same product would find themselves mostly filtered by that algo, so 50 sites selling "crunchy chocolate caramel cookies" would find 49 of them filtered, with sites further down being about "caramel bananas" only mentioning cookies in passing, or with a domain "crunchy-gravel-stone-chippings.com" that happens to be available in chocolate or caramel colours. If you see what I mean.
Also single purpose sites with lots of pages on red widgets, green widgets, flanged widgets with lots of the same products with small variations between the products would knock themselves out.
That might explain the shallow results, I just did a search on "red widgets" and noticed that every single entry in the top ten says "red widgets" in a different way, sometimes on page, sometimes at the end of the title, sometimes spread in the title, sometimes in the domain.
It might also explain why the serps are dominated by diverse sites, sites with lots of completely unrelated pages on different subjects. They don't knock themselves out, whereas the sites that focus on single subjects in depth do.
I was assuming the filtering was done on the crawl but filtering when the results are dished up might explain these results.
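The mechanism described above can be sketched roughly like this. To be clear, this is my own toy reconstruction: the similarity measure (word-shingle Jaccard) and the 0.8 threshold are guesses for illustration, not anything Google has documented.

```python
# Sketch of a post-ranking duplicate filter: walk the ranked results
# best-first and drop any result too similar to one already kept, so
# the highest-ranked copy of a duplicate survives.

def shingles(text, k=3):
    """Set of k-word shingles for a crude text fingerprint."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def filter_duplicates(ranked_results, threshold=0.8):
    """ranked_results: list of (url, text) pairs, best-ranked first.
    Returns the list with near-duplicates removed."""
    kept = []
    for url, text in ranked_results:
        sig = shingles(text)
        if all(jaccard(sig, shingles(t)) < threshold for _, t in kept):
            kept.append((url, text))
    return kept
```

On this model, a scraper that outranks the original would survive the filter while the original is dropped, which matches the "copycat at 10, real site at 15" scenario above.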
It doesn't explain everything, why for example does Google not index the text of some sites, only the titles? But it would explain quite a bit.
Yes, the end state is a duplicate content problem.
One possible cause is the incorrect assignment of a 302 as a page within the target site under a different URL, i.e. it shows up in a site: view (and where have we seen this before: DMOZ, and other places).
If Google gets that one little item fixed, one duplicate-content cause no longer exists. This doesn't mean that the penalty (filter, same net effect) would automatically go away. Hopefully, at the very least, the problem would age out (and this is what I'm seeing).
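The mis-attribution described above can be shown with a tiny simulated crawl. The URLs and page content here are invented for the example; the point is only the bug: the crawler follows the 302 but files the final content under the URL it *started* from.

```python
# Illustration of 302 mis-attribution: following a redirect but indexing
# the target's content under the redirecting URL creates the same page
# under two URLs -- a duplicate-content problem.
# All URLs and pages below are made up.

WEB = {
    "http://scraper.example/out?id=7": ("302", "http://victim.example/page"),
    "http://victim.example/page": ("200", "The victim's real content"),
}

def fetch(url):
    status, payload = WEB[url]
    while status == "302":            # follow the redirect chain
        status, payload = WEB[payload]
    return payload

def buggy_index(urls):
    # Bug: content is keyed by the starting URL, not the URL that
    # actually served it after redirects were followed.
    return {url: fetch(url) for url in urls}

index = buggy_index(WEB.keys())
# The scraper's redirect URL now carries the victim's content verbatim.
```

A fixed crawler would instead record the content only under the final (post-redirect) URL, and the duplicate would never enter the index.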
Left as an exercise for the interested reader: list the other possible outcomes.
This can be a very vicious circle.
Please note the use of weasel words. YMMV, IANAGE APYAE (I am not a Google expert and probably you aren't either.)
Within just a few days, both of those searches suddenly reported zero matches (Google merely filtered the hijacks from the visible results), but site:dmoz.org and site:dmoz.com searches continued to report massively inflated numbers (compared to the real number of pages on the site).
The numbers are now gradually falling but are still wildly wrong.
Yes I know exactly what was showing in the dmoz site view several weeks ago, as well as what was showing in the site views of our sites as well as many other WebmasterWorld members.
I also know that the 302s suddenly started disappearing from the site views of dmoz, DRUDGE, and several hundred sites (including some of those I work on) where I had seen them previously.
I'm delighted to hear that the page counts are now heading in the correct direction for a number of sites; it could be that Google is correcting its attribution of pages.
This will take time: computers are fast, but massive updates to multiple related distributed databases take time.
(I do believe that I raised this point about time to ciml in a sticky.)
Several of these folks actually got rid of the 302 pages that were affecting them without killing their site.
There has been positive motion in such cases.
Some of the sites had also gotten to the point where they were classified as spam before they realised what the cause was and took action to correct it. Then, after having taken action to correct the situation, they discovered it was going nowhere and had to request reinclusion.
In short, if the duplicate content (from whatever cause) gets removed (and Google hasn't shut you off completely), then chances are your pages will rise like the mythical Phoenix to once again place in the SERPs.
I see the decreasing counts as a good sign, just like I do when 301s have their proper effect on split sites.
I know I felt a lot better when our pages once again were showing in the serps, and I'm pretty sure that the boss sleeps better.
So you can have a rest from my yammering, I'll go silent on this issue for at least a couple of weeks.
YMMV, IANAGEAPYAE
"Is google updating? "