It means you should check for duplicate URL problems, potentially infinite dynamic URLs that create a spider trap for crawling, and things like that.
For example, if your site offers many different kinds of "sorts" and each one adds a parameter to the query string, that can make crawling and indexing your content problematic. Even worse, if you append user tracking parameters to the query string, then every crawl will create a new URL for every page.
Usually this kind of message is only generated if the URLs Googlebot is finding do not have unique content - so it is good to understand what is happening on your site and address it as best you can. Your truly unique content will be indexed better and rank better if you do.
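To make the tracking-parameter problem concrete, here's a minimal sketch of collapsing those URL variants back to one canonical URL before you emit internal links (or in a rewrite layer). The parameter names in `TRACKING_PARAMS` are hypothetical examples - substitute whatever your own site actually appends:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical tracking/sort parameters - adjust for your own site.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "sort"}

def canonicalize(url):
    """Strip tracking and sort parameters so many URL variants
    collapse to a single crawlable URL."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k not in TRACKING_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))

print(canonicalize(
    "http://example.com/widgets?color=blue&utm_source=feed&sessionid=abc123"))
# -> http://example.com/widgets?color=blue
```

The same idea is what a `rel="canonical"` link element expresses to Google declaratively: however many parameter variants exist, they all point at one preferred URL.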
countries/states/cities, and then I have put a max of 30 latest pages under each city. And then there are categories like sport/cricket with 30 pages under these.. so many sports and so many other activities..
It normally displays the users and their profiles and other things under these categories..
yes.. there can be the same users under some categories, but this is how we make browsing easy..
Other sites in the same niche as mine have many, many more links than I have. It is not at all spamming - for example, if you run a phpBB forum or another social networking site, would you hide the profile links displayed on the topic page?
I can see that other sites have far more links and profiles than mine.. mine is very small compared to them.
>>>So.. if I consider Panda update, Facebook is in danger?
Panda only applies to independently owned sites and/or sites that aren't in the tech industry with friends working at Google.
I wish I was being sarcastic.
That said, another thing to watch out for with that High Number of URLs message: I got it and found that Google was somehow indexing phantom URLs from my site which didn't exist at all. Unfortunately, I didn't get the problem Google itself caused fixed before Panda, so I've been penalized.
Or at least that's one of my theories for why I've been penalized. Who knows anymore! No one.
Having tons of really bad pages did *NOT* hurt sites in panda. I run multi-million page web sites. None of them were hit by the Panda update. They all have really good pages, but also tons of pages that are *REALLY* poor. User profiles with no content, inane forum topics, pages about small towns with no content, etc, etc, etc.
Having tons of poor pages by itself is not going to hurt you. Not having good pages, or passing too much internal PageRank to poor pages, would seem to be a more worrying signal for Panda.
Our sites always have that message in webmaster tools as well. We've been ignoring it for years.
It really depends on the situation. My advice is to understand it, in your particular case. In some cases ignoring it is fine - because you know why it's there and your important pages are getting crawled frequently and thoroughly. In other cases, you may find improvements if you fix the issue.
Google is getting good at identifying and avoiding common spider traps. That doesn't mean other search engines are doing it as well as Google, so sometimes finding the problematic cause of this complaint can give you a side benefit on other search engines.
Merely ignoring without understanding could mean you are missing an opportunity.
>>>They all have really good pages, but also tons of pages that are *REALLY* poor. User profiles with no content, inane forum topics, pages about small towns with no content, etc, etc, etc.
Unless, of course, part of your definition of quality is "real" and not "simulated" - then all your pages could be considered higher quality, because a real forum or large website will probably have those characteristics, but a "simulated" one won't. Everyone seems to think "quality" is an easily definable characteristic, but IMO Google has more of a big-picture view than most give them credit for, and contrary to popular belief they really, really try not to throw the baby out with the bathwater while also not getting gamed.
The key to this, IMO, is longer-term patterns, not the "today's view" we're all so used to seeing and many try to manipulate. By looking at the longer-term picture of a site as a whole, you can more reliably differentiate between sites built to rank and sites built for visitors.
Hey, I didn't say ignore it. Clearly Google thinks you have a lot of URLs. But Google is not saying that is a bad thing. Google is saying it is unusual. So maybe you have a lot of URLs when you shouldn't, like lots of dup URLs, pagination issues, or whatever.
>>>Having tons of really bad pages did *NOT* hurt sites in panda. I run multi-million page web sites. None of them were hit by the Panda update. They all have really good pages, but also tons of pages that are *REALLY* poor. User profiles with no content, inane forum topics, pages about small towns with no content, etc, etc, etc.
Not true. Maybe that happened in your case, but spend 5 minutes in the Google Webmaster forum and you'll see that 90% of the sites complaining there have tons of really good content... and no one has any idea why they were penalized.
People really need to stop regurgitating this "it's your fault, you suck" message. It's not true or helpful.
Yes, they can. By measuring click-throughs, bounce rates, and repeat click-throughs in a search session, for example. Doing so, they can quickly get a fairly good metric on how happy a searcher is with the link Google provided to your site in their search results. See all those www.google.com/url?cd=1&q=fuzzy%20blue%20widgets referrers? That's what they're all about.
Oh, and it's only "cd=1" if you're lucky... :)
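For anyone who wants to mine those referrers from their own logs, here's a quick sketch of pulling the rank and query out of one. The `cd` and `q` parameter names are just what Google happened to send in referrers of that era, not a stable, documented API - treat them as assumptions:

```python
from urllib.parse import urlsplit, parse_qs

# Referrer of the style described above; "cd" was the result position,
# "q" the search query. Neither is guaranteed to be present.
referrer = "http://www.google.com/url?cd=1&q=fuzzy%20blue%20widgets"

params = parse_qs(urlsplit(referrer).query)
rank = int(params["cd"][0])   # position of your result in the SERP
query = params["q"][0]        # what the visitor searched for

print(rank, query)
# -> 1 fuzzy blue widgets
```

Aggregate rank and query per landing page over a few weeks and you have a rough picture of the same click-through data Google is measuring on its side.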
Sometimes the issue is content quality. Sometimes it's the site's technical quality - duplicate content and infinite URL spaces, as hinted at here. Sometimes it's many factors. As Tedster said above, the important thing is to *know* what that message might mean in the context of your site and its unique URL space.
>>Sometimes the issue is content quality. Sometimes it's the site's technical quality -- duplicate-content and infinite URL-spaces as hinted-at here.
What are you basing this analysis on? Many, many of the sites penalized by Panda do not fit into this mold at all. Panda would seem to suggest that you are in fact, very wrong. And that if Google can measure quality, they aren't doing it.
This thread is about a specific message in Webmaster Tools: "Googlebot found an extremely high number of URLs on your site". We don't need another thread discussing Panda - we already have so many. So dwelling on the Panda update in this context is not on-topic.
That particular Webmaster Tools message has been around long before the recent update. Please, let's not hijack this thread - for the sake of everyone who comes here for help on this topic, now or in the future.