Forum Moderators: Robert Charlton & goodroi


One reason Google may not find your Sitemap


tedster

5:00 pm on Oct 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is a highly technical server issue, and right now I'm not following it well. However, Google's JohnMu has commented about it on the Google Webmaster Groups and it appears that the easiest fix is to "Let Google determine my crawl rate" rather than control it. Here's a very condensed version of the high points - click through the links if you'd like to read more detail.

Sometimes in Google Webmaster Tools -> Sitemaps you see error messages like: "We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit."

  • You realize that googlebot makes multiple (we counted up to 11) GET requests in a single TCP/IP connection
  • You realize that if one of these GET requests has a major time lag (is much slower than the other GET requests), Google cuts the TCP/IP connection.
  • You see an error in Google Webmaster Tools, without a trace in your logfiles.

    [blogen.tupalo.com...]

    More coverage from SERoundtable [seroundtable.com]
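The behavior those bullet points describe can be sketched with a small, self-contained simulation. This is a hypothetical illustration, not Googlebot's actual code: a local test server with one deliberately slow script, and a client that reuses a single keep-alive (HTTP/1.1) TCP connection with a per-response timeout. Once one response overruns the timeout, the client drops the whole connection, so anything queued behind it (like the sitemap) is never fetched.

```python
# Hypothetical sketch (not Googlebot's actual code): several GET requests
# reusing one TCP connection, with a per-response timeout. If any single
# response is much slower than the rest, the whole connection is dropped.
import http.client
import socket
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # keep-alive: one connection serves many GETs

    def do_GET(self):
        if self.path == "/slow":
            time.sleep(2)           # simulate one slow script on the server
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# One connection, several GET requests, 0.5 s tolerance per response.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=0.5)
results = []
for path in ["/a", "/b", "/slow", "/sitemap.xml"]:
    try:
        conn.request("GET", path)
        conn.getresponse().read()
        results.append((path, "fetched"))
    except socket.timeout:
        conn.close()                # the crawler gives up on the whole connection
        results.append((path, "timed out - connection dropped"))
        break                       # later requests (the sitemap) never happen

for path, outcome in results:
    print(path, outcome)
server.shutdown()
```

Note that the sitemap request itself never reaches the server, which matches the observation above: an error in Webmaster Tools with no trace in the logfiles.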
    tedster

    6:06 pm on Oct 11, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    The more I ponder this, the more I think this issue has tripped up at least one client of mine. Time to go digging!

    aakk9999

    1:49 am on Oct 12, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    My understanding is that it was only luck that this issue surfaced via the sitemap (because of the WMT report).

    It is not just that the sitemap was not fetched; anything queued after the long response will not be fetched either. So if you are wondering why so few pages are crawled, this is perhaps something to check.

    For instance, having just one script that takes too long to execute (too long from Google's standpoint) could potentially reduce the number of pages crawled quite seriously, depending on how many pages that script generates, how often its pages are requested, and in what order relative to other requests (and you have no way of knowing unless you go down to the TCP/IP level and examine the communication).

    It also seems the problem only occurs when the crawl rate is set manually in WMT, because then the wait delay cannot be made long enough. John Mu implies that if you let Google determine the crawl rate, Google may decide to wait longer than the maximum delay of the manual setting, and therefore will (may?) wait for the slow script(s) to respond.
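One practical way to look for the kind of slow script aakk9999 describes is to time each page yourself before Google does. Here is a hypothetical helper (the URLs and the one-second threshold are placeholders, not anything Google publishes) that fetches a list of URLs and flags any response slow enough that it could stall a shared keep-alive connection:

```python
# Hypothetical helper: time each page and flag slow responders.
# The threshold is a placeholder; Google does not publish its timeout.
import time
import urllib.request

def slow_pages(urls, threshold_seconds=1.0):
    """Return (url, elapsed_seconds) for every page slower than the threshold."""
    flagged = []
    for url in urls:
        start = time.monotonic()
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()                       # time the full response body
        elapsed = time.monotonic() - start
        if elapsed > threshold_seconds:
            flagged.append((url, elapsed))
    return flagged
```

Feeding it the URLs from your sitemap would point you at the script(s) worth profiling, without having to capture traffic at the TCP/IP level first.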

    windy138

    3:15 am on Oct 12, 2010 (gmt 0)



    So what do we have to do in this case? Can you give me the solution? Thanks in advance :X

    tedster

    5:15 am on Oct 12, 2010 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Welcome to the forums, windy138.

    The solution is: don't try to pick your own crawl rate unless you really, REALLY know what you're doing. Just go with the recommended default and let Google decide how fast to crawl your server.