Forum Moderators: Robert Charlton & goodroi

Message from Google Webmaster Tools -- 'HIGH' number of URLs

         

c41lum

3:46 pm on Aug 21, 2008 (gmt 0)

10+ Year Member



Hi guys, I just had this message show up in my Webmaster Tools account.

'Googlebot found an extremely high number of URLs on your site

Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.'

Could this be the reason for me yo-yoing in and out of the SERPs? Has anybody else seen this message?

tedster

5:11 pm on Aug 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a relatively new error message for Webmaster Tools. Here's a snippet from Google's Help Page on the topic:

Unnecessarily high numbers of URLs can be caused by a variety of issues. These include:
  • Additive filtering of a set of items
  • Dynamic generation of documents. This can result in small changes because of counters, timestamps, or advertisements.
  • Problematic parameters in the URL. Session IDs, for example.
  • Sorting parameters. Some large shopping sites provide multiple ways to sort the same items
  • Irrelevant parameters in the URL, such as referral parameters.
  • A dynamically generated calendar might generate links to future and previous dates with no restrictions on start or end dates.
  • Broken relative links.

[google.com...]
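
Several of the causes in that list (session IDs, sorting parameters, referral parameters) come down to many URLs pointing at one page. As a rough way to gauge how much of a crawl is duplication, here is a minimal Python sketch that strips such parameters and counts what is left. The parameter names in IGNORED_PARAMS are assumptions for illustration, not anything Google publishes; substitute the ones your own site uses.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; replace with the ones your site really uses.
IGNORED_PARAMS = {"sessionid", "sid", "sort", "order", "ref", "referrer"}


def canonical_url(url: str) -> str:
    """Drop parameters that do not change the content and sort the rest."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    kept.sort()  # the same parameters in a different order give one URL
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def count_distinct(urls):
    """Return (distinct raw URLs, distinct URLs after normalization)."""
    raw = set(urls)
    return len(raw), len({canonical_url(u) for u in raw})
```

Running count_distinct over the URLs in a server log or a crawl export gives a quick feel for the gap: a large difference between the two numbers suggests the kind of URL inflation the warning describes.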

So is this a reason for a Yo-Yo problem in the SERPs? Only Google can say for sure, but I would definitely encourage you to fix the issue and find out.

SEOPTI

5:48 pm on Aug 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



c41lum, how many distinct URLs do you feed them?

c41lum

6:56 pm on Aug 21, 2008 (gmt 0)

10+ Year Member



They have given me an example of the URLs. Looking at it, there are up to 10,000 filter pages that Google would have had to crawl.

I have put the nofollow tag on all those pages and have also added them to my robots.txt disallow list. Fingers crossed, this should fix my problems.
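
A disallow rule like this can be checked offline before relying on it. The following is a minimal Python sketch using the standard library's robots.txt parser. The /filter/ path and the example.com URLs are assumptions for illustration, since the thread does not show the real URL pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: assumes every filter page lives under /filter/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /filter/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

As far as I know, this parser does simple path-prefix matching only, so it will not reproduce any wildcard patterns that Googlebot itself understands. Treat it as a sanity check, not a simulation of Googlebot.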

"Would this cause the yo-yoing that I have had since xmas?"

Reno

7:20 pm on Aug 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Where should we look for this message in GWT? Is it in:

Diagnostics => Content analysis

Thanks...

.................................

pageoneresults

8:18 pm on Aug 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nice find! Crawler technology has improved by leaps and bounds. So have web platforms. Between the two of them, it could be a recipe for disaster.

We're blocking the indexing of navigation and all that other stuff with new releases. Got a neat little setup going that I think is going to prevent all of the above. We will see as we move forward with a recent launch.

SEOPTI

2:07 am on Aug 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Only 10,000 distinct URLs? I feed them 1,000,000+ and never got this message.

tedster

2:20 am on Aug 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This error message comes when their spidering detects a potentially infinite URL space. If you feed 1 million URLs that are obviously distinct, then you'll never see this message.
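
The calendar case is the easiest way to picture an infinite URL space: every month page links to the next one, forever. A minimal Python sketch of one way to close it off, assuming a hypothetical /calendar/YYYY/MM/ URL pattern and an arbitrary allowed range, is simply to stop emitting links past the limits:

```python
from datetime import date

# Hypothetical limits and URL pattern, chosen only for illustration.
FIRST_MONTH = date(2008, 1, 1)
LAST_MONTH = date(2008, 12, 1)


def shift_month(d: date, delta: int) -> date:
    """Return the first day of the month `delta` months away from `d`."""
    index = d.year * 12 + (d.month - 1) + delta
    return date(index // 12, index % 12 + 1, 1)


def calendar_links(current: date) -> dict:
    """Build prev/next links, omitting any that leave the allowed range."""
    links = {}
    prev_month = shift_month(current, -1)
    next_month = shift_month(current, 1)
    if prev_month >= FIRST_MONTH:
        links["prev"] = f"/calendar/{prev_month:%Y/%m}/"
    if next_month <= LAST_MONTH:
        links["next"] = f"/calendar/{next_month:%Y/%m}/"
    return links
```

With the range bounded, a spider following "next" runs out of links at the last month instead of walking on indefinitely.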

subhendu

4:17 am on Aug 22, 2008 (gmt 0)

10+ Year Member



Good point raised.
I have had no such warning, but I do have some calendar code, sort-by-column links and paging scripts with these issues. I have now added rel="nofollow" to them.
This should also help with my duplicate description tag issue.

c41lum

10:09 am on Aug 22, 2008 (gmt 0)

10+ Year Member



The strange thing, guys, is that all these pages already used the noindex, follow tag.

Seems they're not happy with me taking up Googlebot's time.

pageoneresults

10:23 am on Aug 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The strange thing, guys, is that all these pages already used the noindex, follow tag.

Prior to the warning? If so, can you show us the code you are using? Just the metadata element and the order you have the directives in. I know you said noindex, follow and if that is the case, I think something may be broken with comma-separated directives. I too have seen stuff getting indexed, being followed, yada, yada, yada, even after having various metadata elements to noindex, or nofollow, or none, you name it. I use that element judiciously.

c41lum

10:45 am on Aug 22, 2008 (gmt 0)

10+ Year Member



Pre-Changes Meta Data:

<title></title>
<meta name="description" content="" />
<meta name="keywords" content=""/>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1" />
<meta name="robots" content="noindex,follow" />
<meta name="Language" content="en" />
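
For anyone wanting to confirm what directives a page actually serves, with or without a space after the comma, here is a minimal Python sketch that pulls them out of the robots meta element. It only shows how the element can be read. It makes no claim about how Googlebot itself parses or prioritizes these directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> elements."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots directives declared in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Feeding it the page source as fetched from the live server (not the template) shows whether the element really reaches the crawler in the form you expect.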

pageoneresults

11:36 am on Aug 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Googlebot found an extremely high number of URLs on your site.

Seems they're not happy with me taking up Googlebot's time.

<meta name="robots" content="noindex, follow" />

You know, I was thinking something else originally. < That happens more frequently these days. After reading through the topic again, I might agree with your assessment. Maybe there is a disproportionate number of pages that are allowed to be crawled but carry noindex along with follow.

Maybe some sort of duplication occurring in the follow routine? Looping? This would be a dynamic site, yes? Is there a rewrite involved? Have you double checked everything? I've seen Google Webmaster Tools flag things that have actually been problems for Webmasters. It's a nice feature to have available to you when things are accurate. ;)

subhendu

12:45 pm on Aug 22, 2008 (gmt 0)

10+ Year Member



<meta name="robots" content="noindex, follow" />

Is this tag still used? If I use it on a page and the body of that page has some links to pages that I want indexed, will that work?