I have recently been shopping for an MP3 player. I would like to see more pages listed (not fewer) when I search for specific items. If I put in a particular brand and model, the search engine should show me information pages about that brand and model and places where I can buy it.
I really don't get how you are thinking. Pages from a site with lots of good product pages with information about each product and a way to buy them are very valid search results.
There is absolutely no reason for Google to cache every page of every site when that could be thousands of pages. You mention you have thousands of products. What if Google crawled every page of amazon.com and buy.com? Do you really think you would have a chance to compete?
If you cannot get Google to know what your site is about in a couple thousand pages, the extra 90K pages are not going to help.
If I had a site of public records, do you think it is Google's duty to spider every record?
Sure it's all unique content but not really ;)
But public records are valuable information. If I am researching obscure stuff, I hope to find it there. I believe Google intends to be comprehensive. Isn't that why they call it Google?
You can't really be arguing that no one source can have more than "a couple thousand" pages of useful information.
I am not saying at all that the site does not have useful information. What I am saying is that if Google cached every page of the public records that are on file for local governments, how could a site that has the same records compete? Do a search for "Windows Information" on Google. Now what if Google cached every page of Microsoft? Who do you think would have the first 1000 pages of SERPs?
Would you really like to compete with this on a grand scale? Just because somebody dumps a database into a website does not mean that Google should crawl every page.
Each of my records is about a specific person, group, or corporation. Many, many searchers put the name of a specific person, group, or corporation into the Google search box. Google might even spit out a white pages listing when it responds, if it detects that it's a candidate for a white pages scan. If you put in a ten-digit U.S. telephone number, it will do a reverse lookup in the U.S. white pages. They aren't using the best collection of white pages data, but it's still useful.
Have you ever heard the term, "I googled him and found out..."? I think it was in a New York Times article about two people on a first date, and it turned out that each had already "googled" the other. It was a lightweight piece, and Google loved the publicity.
Name searches are powerful on Google if you know how to search, particularly when the name is not so common. I think this is one of the more significant contributions search engines have made to our society. For investigative journalists, Google is the first port-of-call. Over 95 percent of my Google referrals are zeroing in on a specific name, so I try to optimize that page (one page per name) for the name itself.
Google has even bragged about their ability in this respect. I believe it was Sergey who said that the first thing any employer might want to do when looking at a promising resume, is to "google" the person.
Google becomes a verb, as happened many years ago with Xerox, as in "I'll xerox a copy for you." This is like living in heaven for any public relations department in an aggressive company.
It seems to me that Doofus has the point correct. His point, if I may re-state it, is that Google's method of determining what pages to crawl and index is counter to their goal of indexing all quality relevant content because it sometimes causes large sites with good content to be under-indexed. To me, this seems to be a good and interesting point.
You countered with assertions that large sites should not be indexed simply because they have lots of pages. And your posts carry a strong suggestion that you think that any site having a large number of pages must be spam. That, it seems to me, misses the point.
I think the suggestion that any site with many pages must be spam is very wrong and ill-conceived.
Google and the search public both benefit from an algorithm that indexes all quality and relevant content, irrespective of whether it comes from a large or small site.
For the moment, this does not seem to be the case.
If you were Google, how would you choose what to spider frequently and index deeply and what not? Does favoring higher PageRank plus new unindexed sites not make sense?
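To make the question concrete, here is a rough sketch of what a "higher PageRank plus new unindexed sites" policy could look like with a fixed crawl budget. This is only an illustration, not a description of how Google actually schedules crawls: the function name `plan_crawl`, the site fields, the `NEW_SITE_BOOST` value, and the example numbers are all made up for the sake of argument.

```python
import heapq

# Assumed bonus so brand-new, unindexed sites are not starved by having no PageRank yet.
NEW_SITE_BOOST = 5.0


def plan_crawl(sites, budget):
    """Return (site_name, pages_to_fetch) pairs, highest priority first.

    sites:  list of dicts with hypothetical keys 'name', 'pagerank' (0-10),
            'indexed' (bool) and 'pages' (total page count on the site).
    budget: total number of pages the crawler may fetch in this cycle.
    """
    heap = []
    for site in sites:
        score = site["pagerank"] + (0.0 if site["indexed"] else NEW_SITE_BOOST)
        # heapq is a min-heap, so push the negated score to pop the best site first.
        heapq.heappush(heap, (-score, site["name"], site["pages"]))

    plan = []
    while heap and budget > 0:
        _, name, pages = heapq.heappop(heap)
        take = min(pages, budget)  # never fetch more than the remaining budget
        plan.append((name, take))
        budget -= take
    return plan


# Made-up example: a big low-PageRank database site competes for a 10,000-page budget.
sites = [
    {"name": "big-records", "pagerank": 4, "indexed": True, "pages": 90000},
    {"name": "popular", "pagerank": 8, "indexed": True, "pages": 2000},
    {"name": "newcomer", "pagerank": 0, "indexed": False, "pages": 50},
]
for name, pages in plan_crawl(sites, budget=10000):
    print(name, pages)
```

Under these made-up numbers the large site is crawled last and only partially, which is the under-indexing being complained about earlier in the thread. Whether that trade-off is reasonable is exactly what is being argued here.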