What's the slowest search possible on Google?

"the a and of to in an" takes 6.20 seconds uncached

         

phaze

10:45 pm on Nov 11, 2004 (gmt 0)

10+ Year Member



Stopwords (commonly occurring words) are both the delight and bane of any SE designer's existence. They're good because dropping them makes your index smaller. They're bad because users want to do phrase searches and expect the stopwords to be included.

So I'm curious: what, in your experience, is the slowest Google search you've seen? When searching for a phrase of stopwords (above), I've managed to get consistently slow responses from Google for the first uncached search. If you hit reload, it's fast, and I'm sure the above phrase will be fast once a few of us hit the SE. But if you wait a while and the results expire from cache, you'll notice it's slow as heck.
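The slow-then-fast-then-slow pattern described above is what a result cache with an expiry time would produce. Here is a minimal sketch of that idea; the class name `TTLCache`, the method `get_or_compute`, and the expiry value are all made up for illustration, and nothing here is known about how Google actually caches results.

```python
import time


class TTLCache:
    """Toy query-result cache whose entries expire after ttl_seconds.

    Purely illustrative: the names and the expiry policy are assumptions,
    not a description of any real search engine.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the expiry can be tested
        self._store = {}        # query -> (time stored, result)

    def get_or_compute(self, query, compute):
        """Return (result, was_cached). Calls compute(query) on a miss."""
        now = self.clock()
        entry = self._store.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1], True        # fast path: still fresh
        result = compute(query)          # slow path: run the real search
        self._store[query] = (now, result)
        return result, False
```

The first lookup for a query pays the full cost, a reload within the expiry window is served from the cache, and a lookup after the window pays the full cost again.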

My theory is that stopwords are kept in a separate index so they don't clog the main index, and are only used for phrase searches. What's interesting is that Yahoo's search simply drops stopwords from your phrase and considers them a wildcard: "ban a bomb" returns lots of "ban the bomb" sites.
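To make the wildcard behaviour concrete, here is a rough sketch of phrase matching where each stopword in the query matches any single word in the document. The stopword list and the function names are invented for this example; it only mimics the observed "ban a bomb" result and is not a claim about how Yahoo implements it.

```python
import re

# Hypothetical stopword list, just for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match_with_wildcards(phrase, document):
    """True if the phrase occurs in the document, where every stopword
    position in the phrase is allowed to match any single word."""
    query = tokenize(phrase)
    words = tokenize(document)
    if not query:
        return False
    # Slide the query over the document one word at a time.
    for start in range(len(words) - len(query) + 1):
        if all(q in STOPWORDS or q == words[start + i]
               for i, q in enumerate(query)):
            return True
    return False
```

With this, `phrase_match_with_wildcards("ban a bomb", "Ban the bomb now")` is true because the "a" slot accepts "the", while a document that only says "ban bomb" does not match, since the slot still has to be filled by one word.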

[For the record, I don't have any affiliation with the two sites that come up for the above search]

DerekH

11:44 pm on Nov 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What a fascinating thread!

Without the quotes, I got
======================
The following words are very common and were not included in your search: the a of to in an. [details]
The "AND" operator is unnecessary -- we include all search terms by default. [details]

 Web 

Did you mean: the a and off to in an  

News results for the a and of to in an - View today's top stories

The successors - Daily Times - 8 minutes ago
Jets Defensive Coordinator Faces Old Team - Kansas City Star (subscription) - 9 minutes ago

No standard web pages containing all your search terms were found.
=====================

In quotes....
Tip: Try removing quotes from your search to get more results.

Your search - " the a of to in an" - did not match any documents.
=====================
No times on either...
DerekH

phaze

12:13 am on Nov 12, 2004 (gmt 0)

10+ Year Member



[google.com...]

Looks like you left the 'and' out in your phrase search.

graywolf

6:47 am on Nov 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



[For the record, I don't have any affiliation with the two sites that come up for the above search]

Are you a competitor? Look around, you might learn something ...

phaze

7:32 am on Nov 12, 2004 (gmt 0)

10+ Year Member



hmmm... the egos around here are definitely nauseating at times.

moltar

8:00 am on Nov 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I got 0.44 for the first time and 0.14 for the second. Maybe they cache results server side?..

phaze

9:11 am on Nov 12, 2004 (gmt 0)

10+ Year Member



They do. See: [decweb.ethz.ch...] for details. In particular, look at the 'proxy cache' they mention as a future development, and the cached query response times next to 'Conclusions'. FYI, this doc is years old now, but it's still very much an abbreviated bible for SE designers. It should probably be required reading for anyone who wants to get into Google's guts.

ThomasB

5:26 pm on Nov 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



phaze, very good link. I will most probably read it on the flight to Vegas.

If you want to stop the caching, you could also try adding nonsense exclusion words, like this: "blue widgets -jffjakd". That should work, I guess.
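For anyone who wants to repeat the timing test a few times, the nonsense-exclusion trick can be scripted roughly like this. The function name `cache_busting_query` is made up, and whether a random excluded term really forces an uncached search is only a guess; the sketch just builds the query string.

```python
import random
import string


def cache_busting_query(query, length=8, rng=random):
    """Append a random nonsense exclusion term to the query,
    e.g. 'blue widgets -qzkxwvjp', so each call gives a new query string."""
    nonsense = "".join(rng.choice(string.ascii_lowercase)
                       for _ in range(length))
    return f"{query} -{nonsense}"
```

Each call returns the same search with a different excluded word, which should not change the result set as long as the nonsense word appears on no real page.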

Matt Probert

5:32 pm on Nov 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




My theory is that stopwords are kept in a separate index so they don't clog the main index, and are only used for phrase searches. What's interesting is that Yahoo's search simply drops stopwords from your phrase and considers them a wildcard: "ban a bomb" returns lots of "ban the bomb" sites.

We use a search system for a complex site. We index only the non-stopwords, storing for each word the names of the pages that include it, and then search those pages for the actual phrase.

So, if you were searching for "ban the bomb", we would first locate pages containing "ban" and pages containing "bomb", and then search them for "ban the bomb".

It's not the greatest solution, but it works, and we are not indexing 8 million pages <g>

Matt
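The two-step approach described in this post can be sketched as follows. This is only a guess at the general shape, not Matt's actual system: the function names, the stopword list, and the choice to intersect the page sets for each indexed word are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, just for illustration.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each non-stopword to the set of page names containing it.
    pages is a dict of page name -> page text."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in tokenize(text):
            if word not in STOPWORDS:
                index[word].add(name)
    return index


def phrase_search(phrase, pages, index):
    """Step 1: use the index to find pages containing every non-stopword.
    Step 2: scan only those pages for the exact phrase, stopwords included."""
    terms = [w for w in tokenize(phrase) if w not in STOPWORDS]
    if not terms:
        return []  # a phrase made only of stopwords cannot be narrowed down
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    # Pad with spaces so "ban" does not match inside "urban".
    needle = " " + " ".join(tokenize(phrase)) + " "
    return sorted(name for name in candidates
                  if needle in " " + " ".join(tokenize(pages[name])) + " ")
```

Note that a phrase made up entirely of stopwords returns nothing in this sketch, because there is no indexed word to narrow the candidate pages; a real system would need a fallback, such as scanning every page.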

phaze

8:23 pm on Nov 12, 2004 (gmt 0)

10+ Year Member



ThomasB - ah, Vegas. *sigh* - maybe next year. The way things are looking now, I'm going to be stuck in this little room coding till after the skiing season.

moltar

8:57 pm on Nov 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Heh, I looked for
"the a and of to in an" -"aaa" -"asdadsasd"
, and the result was 6.66 seconds ;)

Andre

9:05 pm on Nov 12, 2004 (gmt 0)

10+ Year Member



I have seen some Google Groups searches take quite a long time - just take a look at [img52.exs.cx...] (17.62 seconds). Has anyone seen Google taking even more time to prepare the search results - web, groups or other?

phaze

9:34 pm on Nov 12, 2004 (gmt 0)

10+ Year Member



I got 6.89 seconds on "the a and of to in an" -"who" -"sdfsd"
Just can't seem to break that 7 second barrier. ;)

I can see it already. Some rapper calls himself "the a and of to in an" and Google suffers a denial of service attack from crazed teens looking to download mp3s.

Rick_M

6:01 pm on Nov 14, 2004 (gmt 0)

10+ Year Member



"the a and of to"

Slowest I found at least - 8.45 seconds. And this thread is the first result. And who said Google was stale?

Powdork

6:43 pm on Nov 14, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And who said Google was stale?
I did. I own www.theaofandto.com which I launched in June. I have over 500 backlinks with theaofandto as anchor text pointing at my site. In addition my 'to' subdirectory has hundreds of links pointing at it as well. The same goes for the 'a', 'and', 'of', and 'the' subdirectories. Though Google has thousands of pages about the subject indexed from my domain, I still can't be found on the first twenty pages of results.

Now returning to the original subject matter. I haven't been able to get anything over .44 seconds using any combination of stopwords. Not sure why. Surely I hit upon something that wasn't cached.