Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

"Extremely high number of urls" report includes robots.txt urls
achean

Msg#: 4362947 posted 5:27 pm on Sep 15, 2011 (gmt 0)

Got a GWMT warning, "Googlebot found an extremely high number of URLs on your site," which is obviously cause for concern. What's puzzling is that the list of examples includes lots of URLs that are either excluded via our robots.txt file or use parameters that should be ignored based on our parameter handling settings. Any thoughts as to what's going on and how to address the problem?

 

deadsea
WebmasterWorld Senior Member 10+ Year Member

Msg#: 4362947 posted 7:23 pm on Sep 15, 2011 (gmt 0)

It's usually not a problem. I tend to work on huge sites; they all get this message but do great at ranking pages in the SERPs.

achean

Msg#: 4362947 posted 7:24 pm on Sep 15, 2011 (gmt 0)

When you've gotten the warnings, have they referenced URLs that you've restricted?

GlobalMax

Msg#: 4362947 posted 12:10 pm on Sep 16, 2011 (gmt 0)

Can you use pattern matching to cut down on the number of entries in your robots.txt file?

The patterns that Google supports in robots.txt are described here [google.com]. (Look under "Pattern matching," after expanding "Manually create a robots.txt file.")
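For illustration, here's a sketch of Google-style pattern matching in robots.txt (the paths and parameter names are hypothetical); `*` matches any sequence of characters and `$` anchors the end of the URL:

```
User-agent: Googlebot
# Block any URL containing a session-ID parameter (hypothetical name)
Disallow: /*?sessionid=
# Block all URLs ending in .pdf
Disallow: /*.pdf$
# Block everything under /search/ (plain prefix rule, no wildcard needed)
Disallow: /search/
```

Collapsing many per-URL Disallow lines into a few wildcard rules like these keeps the file short without changing what is blocked.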

indyank
WebmasterWorld Senior Member

Msg#: 4362947 posted 1:20 pm on Sep 16, 2011 (gmt 0)

What's puzzling is that the list of examples includes lots of URLs that are either excluded via our Robots.txt file or use parameters that should be ignored based on our parameter handling settings.


They should be obeying your robots.txt unless you have misconfigured it. But I have never seen their bots respect those "parameter handling settings" in Google Webmaster Tools. I don't even know why they provide that feature when their bots attempt to crawl most of those URLs.

The only reliable way to block them is robots.txt, but they find workarounds for that these days. For example, make sure that you don't have a +1 button on those pages; otherwise, they might not obey robots.txt.
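One quick way to sanity-check that a robots.txt file is written correctly is to run sample URLs through a parser offline, e.g. with Python's standard `urllib.robotparser` (the rules and URLs below are made up for illustration; note the stdlib parser only understands plain prefix rules, not Google's `*`/`$` wildcard extensions):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules to validate, as they would appear in robots.txt
rules = """\
User-agent: *
Disallow: /private/
Disallow: /search
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# A blocked path should come back False, an open one True
print(rp.can_fetch("Googlebot", "http://example.com/private/page.html"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/products/42"))        # True
```

If a URL you expected to be excluded comes back as fetchable here, the problem is in the file itself rather than in Googlebot's behavior.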

achean

Msg#: 4362947 posted 3:09 pm on Sep 16, 2011 (gmt 0)

I did recently add +1 buttons to some of the pages with "ignored" parameters, which may be why this just cropped up for those. But the robots.txt blocks have been in place forever, are fairly limited in number, and are definitely implemented correctly.

GlobalMax

Msg#: 4362947 posted 3:26 pm on Sep 16, 2011 (gmt 0)

If Google considers the presence of a +1 button as grounds to ignore robots.txt exclusions, then perhaps it's ignoring your directives about which parameters on those pages should be ignored too, and the combinatorics are blowing out the URL count.
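The blow-up is easy to see with a toy calculation: if Google stops ignoring k optional parameters, every subset of them becomes a distinct crawlable URL, so one page fans out into 2^k variants. A minimal sketch (the parameter names here are invented):

```python
from itertools import combinations

def url_variants(base, params):
    """Enumerate one URL per subset of optional query parameters."""
    urls = []
    for r in range(len(params) + 1):
        for combo in combinations(params, r):
            query = "&".join(f"{name}={value}" for name, value in combo)
            urls.append(base + ("?" + query if query else ""))
    return urls

# Four hypothetical parameters that "should be ignored"
params = [("sort", "price"), ("color", "red"), ("page", "2"), ("ref", "home")]
variants = url_variants("http://example.com/widgets", params)
print(len(variants))  # 2**4 = 16 distinct URLs where the site intended 1
```

And that count is a floor: if the crawler also sees the same parameters in different orders, permutations multiply the total further.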

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved