"Can this explain the Amazon deluge?"
I don't know about "deluge", but Amazon is a clear example of a site that should benefit from the status quo. A jillion pages with zero pagerank but with anchor text pointing at other pages.
Google gets better at crawling long URLs; Google devalues PageRank; Google considers anchor text to be gold; Amazon does well. All that is pretty easy to understand. Even if people don't agree that all these things are occurring, if they were, the Amazon effect makes perfect sense.
What doesn't make sense is why Google is not treating the Amazon mirror sites as spam, per their guidelines. Frankly one Amazon result on most any search would be reasonable. Two results each from five mirrors should never be seen.
On the other hand, the search in post 61 isn't about anchor text, but the phrase is in the title and that page is one of the featured links on related pages. Do a search on Amazon for car parts, then click around the pages, and you'll see that page linked all over the place. It could have thousands of links. If those links had anchor text, that #7/8 result would likely be #1.
If you're still having trouble finding "clinchers" for irrelevant search results and spam, how can I send you this report for the phrase "widget store"?
Ranking #2 is a CNN article describing a legal battle about a widget store suing another company. Why is it relevant enough to earn 2nd place?
Ranking #3 seems to show one URL having hijacked another. The listing shows one domain (which is irrelevant to the search term) but the cache shows metatags and content for a completely different domain.
Ranking #5 seems to be an example of cloaking, because the cached page contains 100% keyword stuffing whereas the actual link leads to a completely different page.
Ranking #8 is another news article, about a widget retailer changing careers.
Meanwhile, many commercial widget sites (which you'd expect to find if searching for "widget store") have either fallen in rankings or disappeared altogether.
The phenomena of missing index pages and irrelevant search results are real; we're not making this up!
What a Joke
Searched in Google and got 5 PDF files, and not one site offering hotel accommodation.
But hey, everything seems to be normal :(
"But hey, everything seems to be normal :( "
you are actually wrong with that example. You complain that there are 5 pointless PDFs in it, when in reality there are SIX PDFs in the top 10.
does anyone else feel like this is all getting surreal? how long till a google spokesperson tells us they had just three spam reports? results are bad, right? getting worse daily, right? and yet still no acknowledgement from google themselves.
>SIX pdf's in the top 10
>in Google and got 5 PDF files
Can't find even one PDF for your query example, not even within the first 20 results. I'm connecting from Europe, but I also searched using US proxies and various datacenters. So couldn't the PDF phenomenon just be a flux effect that sorts itself out after a while!?
>and yet still no acknowledgement from google themselves
soapystar, you might want to read GoogleGuy's recent statements about the pdf thingy: no acknowledgement, no denial. He did clearly state that he's gonna talk with people inside the plex and investigate. That's pretty good feedback, imho.
People even said he'd use his multiple answers about the pdf problem as a trick to divert attention from the ebay, amazon, etc. problem.
I'm a bit confused now ...
">SIX pdf's in the top 10
>in Google and got 5 PDF files
Can't find even one PDF for your query example, not even within the first 20 results. I'm connecting from Europe, but I also searched using US proxies and various datacenters. So couldn't the PDF phenomenon just be a flux effect that sorts itself out after a while!?"
I can confirm 6 pdfs in top 10 from a US and UK location search.
>Can't find even one PDF for your query example ...
>I can confirm 6 pdfs in top 10
Again, what if the PDF phenomenon is just a flux effect that sorts itself out after a while? That's an optimist's pov, though.
btw, didn't say you were lying.
"Again, what if the PDF phenomenon is just a flux effect that sorts itself out after a while?"
Believe me, I hope that is the case, but I've only seen it get worse. Besides, I think we should offer feedback when there seems to be a problem. As I've mentioned before, it has NOT affected me as far as clicks go, and the examples of problems I submitted were BROAD searches that rarely result in leads for me. The point I'm trying to make is that the search quality in some areas is simply not what it used to be. When I was talking to my significant other last night about the Internet in general and she mentioned that she can't find things in Google anymore, I knew there was a problem. She is the classic casual user.
Back in post 27 I talked about a certain Metallica album as an example of Amazon dominance: good title, lots of backlinks, therefore it was #1 in a search. I just checked again for the exact same album; the title has disappeared from the listing and it now ranks #10 (9 datacenters checked). Curiously enough, the listing with the full title still exists but doesn't figure in the results; it can be identified by the same URL but has a space i.e.
However, doing searches for other albums, I can say that this change is not across the board; things are still fluid. I hope they are changing back to how they used to be.
Is it worth pointing out here that the other country Amazons are not 'mirrors' of Amazon.com? The UK site has its own reviews, its own customer reviews, and quite often different ASINs (Amazon product IDs).
yes, googleguy did say he had spoken to some gizmoid and had been assured that the spidering/indexing of pdfs hadn't changed. The point is not how many are in the index but how many now dominate first-page serps. And no, they are not highly relevant to most searches; they simply seem to have one or two occurrences of the keywords without actually being about the keyword.
Tried out lasco's query and it was worse than he had led me to think!
The top 5 results were PDFs. Only 2 of the ten were general interest and 1 was about Bali; the rest, seven out of 10, dealt with the needs of a very, very small group of people!
Take a look at a search for data storage.
1. A large phone company's PDF on the subject (they are not a large competitor in the data storage space).
2. Site on portable data storage
3. A doc from the same site as #2
4. The big book store's listing
5. The big book store's listing
6. A company that should be here, but by no means the largest.
7. A company that should be here, but even smaller than #6.
8. I have no idea what this is, but it has no place in the top 100. It mentions data storage once.
9. A link to buy MP3 players from a company mentioned in the subject.
10. A company that should be in the top 3.
The companies that should be here are not even in the top 20. I gave up after that and went to another SE.
This is not a technical search by any stretch of the imagination.
if for a specific German query there are on
SERP1 : 3/10 amazon redirects (with different domain name)
SERP2 : 7/10 amazon redirects
SERP3 : 6/10 amazon redirects
SERP4 : 2/10 amazon redirects
total: 18/40, i.e. nearly half
there is a real problem with your SERPs. I sent these things in via the spam report. Let's see.
best wishes for hopefully future relevant SERPs,
how relevant is the number one for this? :-)
Either Google will have to introduce some clustering options, or the results we now see from Amazon will eventually be viewed as pollution. For years now, Brin and Page have been telling the story of someone who saved himself by calling an ambulance, because in a search for "heart attack" the symptoms he was experiencing came to the top of Google's search.
There are still relevant results for heart attack victims on top of the SERPs, but the landscape is changing. There are two Amazon results in the top ten (out of between one and two million results, depending on whether you use quotation marks). The first Amazon result is pushing a rock CD by the group "Queen" for 7.99 pounds sterling.
There has to be a joke in here somewhere about how a "Grateful Dead" listing would be better, but then Google's chef would get mad at me....
Can't find a problem with the serps? That's hard to understand:
Try these search terms:
"adidas golf shoes"
#3 Relevant Site
#4 Adidas - #4 for adidas golf shoes?
#5 MSN search results
#6 MSN search results
#7 - #10 look relevant
"sony mp3 player"
"cheap cd player"
#3 Relevant Site
I can find dozens and dozens more searches that get the same results. Does this look like good, relevant serps to you? A few companies dominating markets with pages absolutely stuffed with irrelevant content.
What's happening to Google?
One way to see the thing is:
With Googlebot's improvement in crawling dynamically created / database-driven sites, big, established sites that cover a wide range of topics (offer a wide range of products - recently discussed: amazon) can naturally fill many top slots, even for non-commercial searches.
Another way is:
Could be that the strong amazon listings are based on the fresh bonus. Kackle, the heart attack example search is a really interesting one. It reminds me of the fresh listings I had in the past with some pages: #1 for many, many searches, for a month or even two. I guess it's not totally impossible that after a while (update, pr and backlink recalculation etc.) these strong amazon results will fall in position. We all know this phenomenon from our own fresh listings, don't we?
I originally planned to ask Sergey [webmasterworld.com], what his idea is how to best deal with this in future keeping the goal of using a neutral algo in mind that works without human intervention.
But I won't ask it, since I have the strong feeling that it's a flux / fresh-listing thingy.
If it's not, then, facing the recently discussed amazon phenomenon, even Apple could gain top positions for e.g. gnocchi recipes and influence the Google search experience if they one day started publishing stuff other than hardware- and software-related info.
I doubt this'll be the new algo.
[edited by: Yidaki at 6:01 pm (utc) on Aug. 12, 2003]
At least one search looks OK to me.
Tried searching for
and the top 8 results were HTML pages! (Same with a search restricted to PDFs only.)
Who says SERPS are full of PDF files? ;)
Forget technical searches. The problem I have at the moment is the product/shopping related searches.
#6 Panasonic page (yeah! But it's a "product does not exist" page)
#7 Epinions.com (not bad)
#9 Amazon UK
Page two is even messier ;)
Like c1bernaught said, there are plenty more examples like this....
Geez, if you put a shopping term into Google then you will probably get a shopping site as a result. That isn't a bad result.
People use google to find things to buy, and if google excludes shopping sites then shoppers will use another search engine.
The real problem is the spammy sites that use very long addresses to get results.
Yeah, all good examples. But they all share the same similarity: newly crawled, dynamically created pages (we didn't see URLs with "=" before). So what about my guess about the fresh bonus for new dynamic pages that haven't been indexed/weighted before? Any opinion on that?
Yes, the serps are poor, in many cases useless, not only because of the amazon links but also because of poor relevancy in general. It would be naive to think that google are not aware of the problems of the amazon flood; the question is how they deal with it. If it stays like this, the accusations that google have allowed these entries will continue. At that point google continues the steep slope down into paid inclusion, for effectively that's what it means: if you are a spammer you can have your site removed, and if 7/10 results came from a small site they certainly would be. I for one will move on to using other engines. Unfortunately, while my clients and their clients still use google I will have to continue to promote sites on it. I don't, however, have to recommend that others use it, and I started advising against it a few months ago.
[edited by: Marcia at 7:43 pm (utc) on Aug. 12, 2003]
I do think that google should limit results to one per domain per search results page and give the remaining slots to other sites.
[So if there are 10 results on a page, 4 shouldn't be from amazon.com and 4 from kelco.]
It should be ten different sites, all with different domains.
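The one-result-per-domain idea above is easy to sketch. This is a minimal illustration in Python, not anything Google has published; the URLs are made up (reusing amazon.com and kelco from the post), and the "last two host labels" domain guess is a deliberate simplification:

```python
from urllib.parse import urlparse

def crowd_limit(results, per_domain=1):
    """Keep at most `per_domain` results from any one domain,
    preserving rank order; surplus results from a domain are dropped."""
    seen = {}
    kept = []
    for url in results:
        host = urlparse(url).netloc.lower()
        # crude registered-domain guess: the last two host labels
        domain = ".".join(host.split(".")[-2:])
        if seen.get(domain, 0) < per_domain:
            kept.append(url)
            seen[domain] = seen.get(domain, 0) + 1
    return kept

serp = [
    "http://www.amazon.com/exec/obidos/ASIN/1",
    "http://www.amazon.com/exec/obidos/ASIN/2",
    "http://www.kelco.com/widgets",
    "http://amazon.com/exec/obidos/ASIN/3",
    "http://www.example.com/widget-store",
]
print(crowd_limit(serp))
```

Calling `crowd_limit(serp, per_domain=2)` instead would roughly match the two-results-per-host clustering people associate with Google's serps at the time.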
One of the theories was that the amazon rankings could be explained by anchor text. Try paper towels: they are number 7 (behind 3 pdfs) but don't even show up in the top 562 for allinanchor: paper towels. I tried this on some other terms and found the same thing: less correlation between anchor text and serps than I can remember seeing before...
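One way to put a number on "less correlation" is a Spearman rank correlation over the URLs that the normal SERP and the allinanchor: ordering have in common. This is a toy sketch with no real SERP data; +1 means the two orderings agree, -1 means they are reversed:

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation over the URLs common to both rankings.
    rank_a / rank_b are lists of URLs in ranked order."""
    common = [u for u in rank_a if u in rank_b]
    n = len(common)
    if n < 2:
        return None  # not enough overlap to compare
    pos_a = {u: i for i, u in enumerate(rank_a)}
    pos_b = {u: i for i, u in enumerate(rank_b)}
    # re-rank the common subset within each ordering
    ra = {u: i for i, u in enumerate(sorted(common, key=lambda u: pos_a[u]))}
    rb = {u: i for i, u in enumerate(sorted(common, key=lambda u: pos_b[u]))}
    d2 = sum((ra[u] - rb[u]) ** 2 for u in common)
    return 1 - 6 * d2 / (n * (n * n - 1))

serp = ["a.com", "b.com", "c.com", "d.com"]
anchor_order = ["d.com", "c.com", "b.com", "a.com"]
print(spearman(serp, anchor_order))
```

With a scraped top-100 for a query and its allinanchor: counterpart, a value near zero would back up the "less correlation" observation quantitatively.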
GG doesn't speak for the company, as anyone with the sense to do a forum search would know. He is a person who works for Google and posts on his own.
You clue-seekers think he posts at midnight California time when he is on the Google clock?
[edited by: Marcia at 7:50 pm (utc) on Aug. 12, 2003]
Right on steveb, we all post as individuals.
Speaking of which, discussion of *individuals* or *individual members* is off-topic for the board and this forum. The concern here is about the quality of the search which is certainly of concern so let's all confine it to that and stay on topic - and play nice and be courteous, while we're at it. We're all in this together, remember.
There's also a matter of simple Internet 101 technology. I personally hate it when, all of a sudden, another application on the computer starts to open when I click on a search result, without my checking first. That isn't user-friendly at all, considering some people may not have the resources available on their computers at the time.
The fact is that we all use browsers to search, and we should expect that what we find will open up in the browser, the software we're choosing to use, not require another application to open. If I wanted PDFs I could search with Acrobat Reader, right? Ludicrous. Accessing files that require an application other than the BROWSER should be governed by search preferences, set deliberately by those who know enough to make that choice, and normal browser-viewable files alone should be the default. I can't see anything else being logical from a user's point of view.
[edited by: Marcia at 8:03 pm (utc) on Aug. 12, 2003]
Hmm.. it must simply be that these new pages have not been weighed and measured. I can't see any other reason for these pages to be showing up where they are.
Anybody have a solid technical reason these pages are dominating?
The more I think about it, the more convinced I become that Google will have to start offering a clustering option that provides a drill-down list of two or more general categories for most searches. Rather like Teoma, or Alltheweb, or Altavista. Or for something really fancy, like Vivisimo.
It's not going to work to give a ranking bonus for freshness. Look at what one blogger was complaining about [kottke.org] -- and this was before this latest "Amazing" update. Bloggers are ever-fresh, and a lot of observers have noticed blogging noise in the SERPs for over a year now. It's not good enough to say that Google will "work it out" by the end of the cycle.
One pro-Google critic attacked this blogger on the grounds that his searches are insufficiently refined. Yes, with any search engine I can get better results by using better and more specific multiple search terms. But that's not going to cut it. By the time Google educates a few more people on how to do searches, folks will be drilling-down on Teoma, Alltheweb, and Altavista. It would be nice if all Internet users had search-engine smarts, but they don't, and they don't particularly care to learn if they don't have to.
One good question for Sergey would be whether Google has any plans to introduce clustering anytime soon. This is not a trivial thing for Google. The reason I say that is because if the clustering is introduced on the back end as a filter, the CPU cycles go through the roof. More server farms will be required. Google is so popular that they have to think about the load implications of anything they do.
No, I think Google would prefer to install some categorization on the crawl end (off-line), just like the way they compute PageRank. Doing it off-line means you do it once per crawl, not once per search, and you can deliver search results many times faster that way.
But doing it on the front (crawl) end implies more system-wide software architecture changes than doing it on the back (search) end. Perhaps Google is caught between a rock and a hard place. They won't tell us, of course, because it's none of our business. So we're free to speculate.
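The crawl-end versus search-end tradeoff described above can be sketched in a few lines. Under the hypothetical crawl-end approach, each URL gets a category label once per crawl, so query-time "clustering" collapses to a cheap group-by over precomputed labels instead of an expensive text-similarity pass per search. All names and data below are invented for illustration:

```python
from collections import defaultdict

# Labels assigned offline, once per crawl (toy data, not a real index).
category_of = {
    "http://a.example/pdf-spec": "technical papers",
    "http://b.example/buy-widgets": "shopping",
    "http://c.example/widget-history": "reference",
    "http://d.example/widget-deals": "shopping",
}

def drill_down(results):
    """Group ranked results into drill-down clusters by precomputed
    category: O(n) dict lookups at query time, no text analysis."""
    clusters = defaultdict(list)
    for url in results:
        clusters[category_of.get(url, "other")].append(url)
    return dict(clusters)

print(drill_down(list(category_of)))
```

The query-time alternative would compare result documents against each other on every search, which is where the "CPU cycles go through the roof" concern comes from.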
Great point, Marcia.
I would like to see Google index PDF files differently.
Sometimes, because a PDF is so big, my computer has crashed trying to open the program and load the file.
In general the results are not good, and it's not about stats either; google may have the same number of PDF files, or even Amazon pages, as before, but the fact remains that searching for hotel accommodation brings you nothing but PDFs.
There are too many different search results to list, and it would be pointless to list more, as we all agree on the problem (well, nearly all of us).
I also think the spidering of dynamic URLs will cause more problems. I do PHP programming, and it would not be difficult for me to generate loads of links, forcing Google to crawl and index them: all the same content, just at different locations, with some different picture files etc.
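The trick described above, one script serving identical content under endless query-string variants, is what URL canonicalization on the crawler side is meant to defuse. A hedged sketch (the parameter names and URLs are invented, and real crawlers use far more careful rules):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Hypothetical set of parameters that never change page content.
TRACKING_PARAMS = {"session", "ref", "src"}

def canonicalize(url):
    """Fold query-string variants of a URL into one canonical form:
    lowercase the host, sort the parameters, drop tracking noise."""
    parts = urlparse(url)
    q = [(k, v) for k, v in sorted(parse_qsl(parts.query))
         if k not in TRACKING_PARAMS]
    return urlunparse((parts.scheme, parts.netloc.lower(), parts.path,
                       "", urlencode(q), ""))

variants = [
    "http://shop.example/item.php?id=7&session=abc",
    "http://SHOP.example/item.php?session=xyz&id=7",
    "http://shop.example/item.php?id=7&ref=home",
]
print({canonicalize(u) for u in variants})
```

All three variants fold into a single canonical URL, so a crawler comparing canonical forms can spot the duplication before indexing each variant as a separate page.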
So what's the answer?
Go back to basics, create great sites, get plenty of links and content.
If it ain't broke, don't fix it!
That's the solution. Google, you are trying too hard, and you will ruin the quality and your perfect reputation.