Strange Behavior of site: operator
|
reseller
#:3617014
| 11:24 am on April 2, 2008 (utc 0) |
I have noticed recently that the site: operator is showing very different results on different Google data centers. Allow me to illustrate the problem (a possible bug?). Let's take a look at two data centers: http://64.233.161.104 and http://72.14.207.104. For example, for the query site:nytimes.com, http://64.233.161.104 shows 29,700,000 results, while http://72.14.207.104 shows 36,000,000 results. I have tested several other sites and saw more or less the same behavior. What could be the reason for such strange behavior of the site: operator? Thanks.
[edited by: tedster at 3:41 pm (utc) on April 2, 2008] [edit reason] fixed link [/edit]
|
Quadrille
#:3617035
| 12:08 pm on April 2, 2008 (utc 0) |
The site: operator, like most webmaster searches, is not and never has been very reliable. Use Webmaster Tools. Or Yahoo! ;)
|
BillyS
#:3617071
| 12:48 pm on April 2, 2008 (utc 0) |
I don't think it's a "bug", only proof that Google is constantly testing and tweaking their index and/or their estimating logic for the site: command.
|
reseller
#:3617255
| 3:38 pm on April 2, 2008 (utc 0) |
BillyS, of course the other possibility is that the site: operator is functioning well, but the two data centers I mentioned contain different volumes of data. In that case we might expect http://72.14.207.104 to contain around 20% more data than http://64.233.161.104. Having said that, I'm aware of what Matt Cutts wrote once in 2006:
"In the middle of that session, I talked about the frustration that modern data center watchers will encounter these days (because there are often slightly different things at different places) and I mentioned a slide from Boston Pubcon... Can you imagine trying to monitor that, especially when the same IP address can query different data centers for different people? It wouldn’t be my preferred hobby."
[edited by: reseller at 3:51 pm (utc) on April 2, 2008]
|
tedster
#:3617261
| 3:47 pm on April 2, 2008 (utc 0) |
I don't think 72.14.207.104 really contains more data. I did a search for "the", one of the most common English words:
64.233.161.104 - 12.27 billion
72.14.207.104 - 12.63 billion
In other words, they're just about the same size. I think BillyS has a good idea when he mentions "tweaking their...estimating logic for the site: command." With the current "flux" in Google, many webmasters have commented that those estimates, which had improved, have recently become less accurate.
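To put the two comparisons side by side, here is a small Python sketch that computes how much larger each estimate is on 72.14.207.104 than on 64.233.161.104. The counts are the ones quoted in this thread; the names `ESTIMATES` and `relative_difference` are illustrative only and not part of any Google tool.

```python
# Result-count estimates as quoted in this thread (April 2008).
ESTIMATES = {
    "site:nytimes.com": {"64.233.161.104": 29_700_000, "72.14.207.104": 36_000_000},
    '"the"': {"64.233.161.104": 12_270_000_000, "72.14.207.104": 12_630_000_000},
}


def relative_difference(low: int, high: int) -> float:
    """Return the percentage by which `high` exceeds `low`."""
    return (high - low) / low * 100


for query, counts in ESTIMATES.items():
    pct = relative_difference(counts["64.233.161.104"], counts["72.14.207.104"])
    # Prints 21.2% for site:nytimes.com and 2.9% for "the".
    print(f"{query}: {pct:.1f}% more on 72.14.207.104")
```

With these figures the site:nytimes.com gap is about 21%, while the "the" gap is under 3%, which fits the idea that the two data centers hold roughly the same amount of data and only the site: estimate differs.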
|
reseller
#:3617285
| 4:06 pm on April 2, 2008 (utc 0) |
tedster and BillyS, but that leaves us with a question: on which of the two DCs are the folks at the plex doing the tweaking? I can't imagine they are tweaking all over the place. I'd say http://72.14.207.104 in that case. However, we have witnessed the high site: results problem before. And I wish to recall another interesting 2006 post by Matt Cutts, where he mentioned the high site: results estimates: "I believe that more accurate site: results estimates are live everywhere now."
|
BillyS
#:3617638
| 10:51 pm on April 2, 2008 (utc 0) |
reseller, I do remember the problem back in 2006 because I believed it affected my site directly. Back then Google would show my site as having 10,000 pages when it really only had around 1,000. I felt, perhaps wrongly, that Google thought I was spamming their engine because I went from 980 pages to over 10,000 overnight. I believe that some of the dialog that took place here caused Google to rethink the accuracy of the site: command. I'm also from the camp that these small observations, especially on such a relatively obscure query (one used a lot by webmasters...), are sometimes fallout from larger changes behind the scenes. In other words, an unintentional change occurring from an intentional change. I also think this is why Matt is interested in these observations.
|
reseller
#:3617653
| 11:02 pm on April 2, 2008 (utc 0) |
BillyS wrote: "I'm also from the camp that these small observations, especially on such a relatively obscure query (one used a lot by webmasters...), are sometimes fallout from larger changes behind the scenes. In other words, an unintentional change occurring from an intentional change. I also think this is why Matt is interested in these observations."
Agreed. Power to you! I'm beginning to think that a software update (an infrastructure update) similar to BigDaddy might have been taking place during the last two weeks or so.
|
reseller
#:3617822
| 6:55 am on April 3, 2008 (utc 0) |
I just wish to mention that I have reported the current site: operator behavior to the Google Webspam Team, as Matt Cutts requested.
|