Google WMT Indexing Gone Wild?

Komodo_Tale

8:41 pm on Jul 7, 2010 (gmt 0)

10+ Year Member



Google has gone nuts. It is combining URLs and trying to crawl them. The index went from 10k to 126k/79k uniques. GWMT is also displaying a warning:

Googlebot found an extremely high number of URLs on your site: http://www._____.com/ July 7, 2010

Googlebot encountered problems while crawling your site http://www._____.com/.

Googlebot encountered extremely large numbers of links on your site. This may indicate a problem with your site's URL structure. Googlebot may unnecessarily be crawling a large number of distinct URLs that point to identical or similar content, or crawling parts of your site that are not intended to be crawled by Googlebot. As a result Googlebot may consume much more bandwidth than necessary, or may be unable to completely index all of the content on your site.


I have checked the site, sitemaps and RSS feed. Everything seems okay.

Has anyone seen something like this?

tedster

9:21 pm on Jul 7, 2010 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



There are a number of recent posts here about WMT data being off - but none have been about a "too many URLs" problem. That has historically been a very real warning to the site owner.

I would say check out some of these URLs and see how your server responds. Google will not crawl only the URLs in your sitemap or your site's internal links; they will naturally also crawl URLs that they find in external backlinks. If an incorrectly configured backlink ends up resolving 200 OK on your server, that can start a cascade of "bad URLs".
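A minimal sketch of that check in Python, assuming only the standard library. `fetch_status()` reports the first status code the server sends, with redirects deliberately not followed so that a 301/302 to an error page shows up as such. `soft_404_probe()` requests a random path that cannot exist; a healthy server should answer 404 or 410, and a 200 suggests the kind of "bad URL" cascade described above. The function names, user-agent string and probe path are illustrative assumptions, not part of any Google or WMT tool.

```python
import sys
import urllib.error
import urllib.request
import uuid
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so we see the server's first answer."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # returning None makes urllib raise HTTPError carrying the 3xx code


_opener = urllib.request.build_opener(_NoRedirect)


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code sent for `url` (redirects are not followed)."""
    req = urllib.request.Request(url, headers={"User-Agent": "status-check/0.1"})
    try:
        with _opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 3xx (redirects disabled), 4xx and 5xx responses all arrive here
        return err.code


def soft_404_probe(base_url: str) -> tuple[str, int]:
    """Request a path that cannot exist and return (url, status)."""
    bogus = urljoin(base_url, f"/{uuid.uuid4().hex}/does-not-exist.html")
    return bogus, fetch_status(bogus)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_status.py http://www.example.com/ [URL ...]
    base, *urls = sys.argv[1:]
    probe_url, probe_status = soft_404_probe(base)
    verdict = "ok" if probe_status in (404, 410) else "SUSPECT: soft 404?"
    print(f"{probe_status}  {probe_url}  {verdict}")
    for url in urls:
        print(f"{fetch_status(url)}  {url}")
```

Passing a handful of the URLs listed in WMT as extra arguments shows at a glance which of them answer 200 when they should not.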

Komodo_Tale

9:42 pm on Jul 7, 2010 (gmt 0)

10+ Year Member



I don't know where the links are coming from, but you nailed one thing. They resolve to a custom error page with a 200 server response. That's not good. I was so fixated on finding the source that I did not look at the server response. #duh
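The usual fix is to keep the friendly error page but send it with a real 404 status. The thread does not say what server the site runs, so this is a hypothetical sketch using Python's built-in `http.server`, with a made-up `PAGES` table standing in for the real content. (On Apache, one common way to end up with a 200 is an `ErrorDocument 404` that points at a full `http://` URL: Apache then redirects to the error page, and that page itself returns 200.)

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical site content (path -> HTML); a real site would use its CMS or filesystem.
PAGES = {
    "/": b"<h1>Home</h1>",
    "/about.html": b"<h1>About</h1>",
}

# The friendly "not found" page visitors see.
NOT_FOUND_PAGE = b"<h1>Sorry, we could not find that page.</h1><p><a href='/'>Home</a></p>"


class SiteHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        path = self.path.split("?", 1)[0]  # ignore any query string for the lookup
        body = PAGES.get(path)
        if body is None:
            # Same friendly page, but with a real 404 status so crawlers
            # learn that this URL does not exist instead of indexing it.
            self._send(404, NOT_FOUND_PAGE)
        else:
            self._send(200, body)

    def _send(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet
```

To try it locally, run `ThreadingHTTPServer(("127.0.0.1", 8000), SiteHandler).serve_forever()` and request both a listed page and a made-up path; only the listed page should come back 200.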