
Google SEO News and Discussion Forum

    
Googlebot found an extremely high number of URLs on your site
tabish




msg:4284900
 5:19 am on Mar 21, 2011 (gmt 0)

Hi,

I keep getting this message in my Webmaster Tools account. My site is dynamic, with many users and content updated daily.

What is wrong with my site having so many categories and users?

Regards

 

tedster




msg:4284901
 5:37 am on Mar 21, 2011 (gmt 0)

It means you should check for duplicate URL problems, potentially infinite dynamic URLs that create a spider trap for crawling, and things like that.

For example, if your site has many different kinds of "sorts" and each one assigns a parameter in the query string, that makes crawling and indexing your content problematic. Even more so if you assign user tracking parameters in the query string: then every crawl will create a new URL for every page.

Usually this kind of message is only generated if the URLs Googlebot is finding do not each have unique content - so it is good to understand what is happening on your site and address it as best you can. Your truly unique content will be indexed better and rank better if you do.
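
To make the sort/tracking parameter problem concrete, here is a minimal Python sketch of how one might collapse URL variants into a single canonical form and see which crawled URLs are really the same page. The parameter names in `IGNORED_PARAMS` are hypothetical examples, not anything from the original poster's site; substitute whatever your own application appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names that change the URL but not the content.
# Replace these with the sort/tracking parameters your own site uses.
IGNORED_PARAMS = {"sort", "order", "sessionid", "utm_source", "utm_medium", "utm_campaign"}

def canonical_url(url):
    """Drop ignored parameters, sort the rest, and lowercase the host."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), ""))

def group_duplicates(urls):
    """Map each canonical URL to the list of crawled variants that collapse to it."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return groups
```

Running `group_duplicates` over a list of crawled URLs (from a log or a crawl export, for instance) shows how many distinct addresses point at each underlying page. A canonical URL with hundreds of variants is a likely source of the warning.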

tabish




msg:4284902
 5:48 am on Mar 21, 2011 (gmt 0)

Thanks @tedster

Actually.. the content is like this

countries/states/cities, and then I have put a maximum of the 30 latest pages under each city. Then there are categories like sport/cricket, with 30 pages under each of these.. so many sports and so many other activities..

It normally displays the users and their profiles and other things under these categories..

Yes.. the same users can appear under several categories, but this is how we make browsing easy..

What is your advice for dealing with it?

Regards

tabish




msg:4284903
 5:50 am on Mar 21, 2011 (gmt 0)

And also.. I have about a million users, and the links to their profiles are also being indexed by Google.

walkman




msg:4284904
 5:55 am on Mar 21, 2011 (gmt 0)

"and also.. i have about a Million users and the link of their profiles are also being indexed by google "

You are playing with fire post-Panda with that many profile pages and links.

tabish




msg:4284905
 6:00 am on Mar 21, 2011 (gmt 0)

Dear walkman

Other sites in my niche have many more links than I have. It is not spamming at all. For example, if you have a phpBB forum or other social networking site, will you hide the profile links displayed on the topic page?

I can see that other sites have many more links and profiles than mine.. mine is very small next to them.

tabish




msg:4284906
 6:08 am on Mar 21, 2011 (gmt 0)

When you do:

site:facebook.com you get: About 2,080,000,000 results (0.05 seconds)

So.. if I consider Panda update, Facebook is in danger?

Shatner




msg:4284927
 9:25 am on Mar 21, 2011 (gmt 0)

>>>So.. if I consider Panda update, Facebook is in danger?

Panda only applies to independently owned sites and/or sites that aren't in the tech industry with friends working at Google.

I wish I was being sarcastic.

That said, another thing to watch out for with that High Number of URLs message: I got it and found that Google was somehow indexing phantom URLs from my site which didn't exist at all. Unfortunately I didn't get that problem, which Google caused itself, fixed before Panda, so I've been penalized.

Or at least that's one of my theories for why I've been penalized. Who knows anymore! No one.

tabish




msg:4284940
 10:07 am on Mar 21, 2011 (gmt 0)

Thanks Shatner

For helping me understand the Panda update.

I am working on my site and trying to discard old content. Maybe this will help further.

Regards

rustybrick




msg:4284966
 11:19 am on Mar 21, 2011 (gmt 0)

Actually, at SMX, Maile from Google said sometimes it just means you have a really large site and you can ignore the warning. It doesn't always mean you need to do something.

tabish




msg:4284970
 11:27 am on Mar 21, 2011 (gmt 0)

Yes rustybrick, it is actually an 8-year-old site with lots of users.

helpnow




msg:4285002
 12:21 pm on Mar 21, 2011 (gmt 0)

Ignore the message at your own peril.

deadsea




msg:4285020
 12:59 pm on Mar 21, 2011 (gmt 0)

Having tons of really bad pages did *NOT* hurt sites in panda. I run multi-million page web sites. None of them were hit by the Panda update. They all have really good pages, but also tons of pages that are *REALLY* poor. User profiles with no content, inane forum topics, pages about small towns with no content, etc, etc, etc.

Having tons of poor pages itself is not going to hurt you. Not having good pages, or passing too much internal pagerank to poor pages would seem to be a more worrying signal for panda.

Our sites always have that message in webmaster tools as well. We've been ignoring it for years.

tedster




msg:4285026
 1:18 pm on Mar 21, 2011 (gmt 0)

It really depends on the situation. My advice is to understand it, in your particular case. In some cases ignoring it is fine - because you know why it's there and your important pages are getting crawled frequently and thoroughly. In other cases, you may find improvements if you fix the issue.

Google is getting good at identifying and avoiding common spider traps. That doesn't mean other search engines are doing it as well as Google, so sometimes finding a problematic cause for this complaint can give you a side benefit on other search engines.

Merely ignoring without understanding could mean you are missing an opportunity.
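
One rough way to check whether important pages are being crawled, and where Googlebot is spending its requests, is to summarize your server logs. The sketch below assumes the Apache/Nginx "combined" log format and simply trusts the user-agent string (which can be spoofed), so treat it as a starting point rather than a verified method. The function name `googlebot_crawl_summary` is made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumes the "combined" access log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_crawl_summary(lines):
    """Return {path: (hits, distinct_query_strings)} for requests claiming to be Googlebot."""
    hits = Counter()
    variants = defaultdict(set)
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        # Split the requested URL into its path and query string.
        path, _, query = match.group("path").partition("?")
        hits[path] += 1
        variants[path].add(query)
    return {path: (hits[path], len(variants[path])) for path in hits}
```

A path with a large count of distinct query strings relative to its real content is a candidate spider trap. A key page with very few hits suggests the crawl is being spent elsewhere.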

tabish




msg:4285028
 1:21 pm on Mar 21, 2011 (gmt 0)

@deadsea Thank you

I have also been ignoring it for almost a year.. but today I thought about asking the masters.. and people here make you shiver :)

TheMadScientist




msg:4285029
 1:28 pm on Mar 21, 2011 (gmt 0)

>>>They all have really good pages, but also tons of pages that are *REALLY* poor. User profiles with no content, inane forum topics, pages about small towns with no content, etc, etc, etc.

Unless of course, part of your definition of quality is 'real' and not 'simulated'; then all your pages could be considered higher quality, because a real forum or large website will probably have those characteristics, but a 'simulated' one won't ... Everyone seems to think 'quality' is an 'easily definable characteristic', and imo Google has more of a 'big picture' view than most give them credit for and, contrary to popular belief, really really tries not to throw the baby out with the bathwater, without getting gamed at the same time.

The key to this imo is longer-term patterns, not the 'today's view' we're all so used to seeing and many try to manipulate. By looking at the longer-term picture of a site as a whole you can more reliably differentiate between sites 'built to rank' and sites that are 'built for visitors'.

rustybrick




msg:4285241
 7:23 pm on Mar 21, 2011 (gmt 0)

Hey, I didn't say ignore it. Clearly Google thinks you have a lot of URLs. But Google is not saying that is a bad thing. Google is saying it is unusual. So maybe you have a lot of URLs when you shouldn't, like lots of dup URLs, pagination issues, or whatever.

If not, I wouldn't worry.
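
For anyone wanting to check the "lots of dup URLs" case directly, a simple approach is to fingerprint page bodies and group URLs whose content is identical. This is only a sketch: it assumes you already have the page bodies in a dict keyed by URL (how you fetch them is up to you), and it only catches exact duplicates after whitespace and case normalization, not near-duplicates.

```python
import hashlib
from collections import defaultdict

def fingerprint(body):
    """Hash the body after collapsing whitespace and lowercasing."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def duplicate_groups(pages):
    """pages maps URL -> body text; return lists of URLs that share identical content."""
    by_hash = defaultdict(list)
    for url, body in pages.items():
        by_hash[fingerprint(body)].append(url)
    # Keep only fingerprints shared by more than one URL.
    return [sorted(urls) for urls in by_hash.values() if len(urls) > 1]
```

If this returns large groups, the URL count is inflated by duplicates and worth fixing. If it returns nothing, the site may simply be big.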

Shatner




msg:4285272
 8:41 pm on Mar 21, 2011 (gmt 0)

>>>Having tons of really bad pages did *NOT* hurt sites in panda. I run multi-million page web sites. None of them were hit by the Panda update. They all have really good pages, but also tons of pages that are *REALLY* poor. User profiles with no content, inane forum topics, pages about small towns with no content, etc, etc, etc.

Not true. Maybe that happened in your case, but spend 5 minutes in the Google Webmaster forum and you'll see that 90% of the sites complaining there have tons of really good content... and no one has any idea why they were penalized.

People really need to stop regurgitating this "it's your fault, you suck" message. It's not true or helpful.

Shatner




msg:4285274
 8:43 pm on Mar 21, 2011 (gmt 0)

>>By looking at the longer-term picture of a site on a whole you can more reliably differentiate between sites 'built to rank' and sites that are 'built for visitors'.

But can Google differentiate? Sites built to rank have done very, very well with Panda. eHow for instance.

jdMorgan




msg:4285394
 12:31 am on Mar 22, 2011 (gmt 0)

> But can Google differentiate?

Yes, they can. By measuring click-throughs, bounce rates, and repeat click-throughs in a search session, for example. Doing so, they can quickly get a fairly good metric on how happy a searcher is with the link Google provided to your site in their search results. See all those www.google.com/url?cd=1&q=fuzzy%20blue%20widgets referrers? That's what they're all about.

Oh, and it's only "cd=1" if you're lucky... :)
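
As a rough illustration, those referrers can be pulled apart with a few lines of Python. This sketch assumes the referrer arrives with a scheme (for example `http://`), that `q` holds the search phrase, and that `cd` is the position of the clicked result, as the remark above implies. The function name is made up for the example.

```python
from urllib.parse import urlsplit, parse_qs

def parse_google_referrer(referrer):
    """Return (search_query, position) for a google.*/url referrer, else None."""
    parts = urlsplit(referrer)
    if "google." not in parts.netloc or parts.path != "/url":
        return None
    params = parse_qs(parts.query)
    query = params.get("q", [""])[0]
    cd = params.get("cd", [None])[0]
    # Treat a missing or non-numeric cd as an unknown position.
    position = int(cd) if cd and cd.isdigit() else None
    return query, position

print(parse_google_referrer("http://www.google.com/url?cd=1&q=fuzzy%20blue%20widgets"))
# prints ('fuzzy blue widgets', 1)
```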

Sometimes the issue is content quality. Sometimes it's the site's technical quality -- duplicate-content and infinite URL-spaces as hinted-at here. Sometimes it's many factors. As Tedster said above, the important thing is to *know* what that message might mean in the context of your site and its unique URL-space.

Jim

Shatner




msg:4285397
 12:41 am on Mar 22, 2011 (gmt 0)

>>Sometimes the issue is content quality. Sometimes it's the site's technical quality -- duplicate-content and infinite URL-spaces as hinted-at here.

What are you basing this analysis on? Many, many of the sites penalized by Panda do not fit into this mold at all. Panda would seem to suggest that you are in fact, very wrong. And that if Google can measure quality, they aren't doing it.

tedster




msg:4285407
 1:01 am on Mar 22, 2011 (gmt 0)

This thread is about a specific message in Webmaster Tools: "Googlebot found an extremely high number of URLs on your site". We don't need another thread discussing Panda - we already have so many. So dwelling on the Panda update in this context is not on-topic.

That particular Webmaster Tools message has been around since long before the recent update. Please, let's not hijack this thread - for the sake of everyone who comes here for help on this topic, now or in the future.

Shatner




msg:4285408
 1:06 am on Mar 22, 2011 (gmt 0)

Sorry Ted, you're right. It does have some relevance though because that message could have some impact with Panda that it did not have before. Just something to keep in mind.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved