Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Google Search Results Froze in Time.
Can XML sitemap submission lead Google to stop crawling?
cssatsc




msg:3934463
 4:33 pm on Jun 16, 2009 (gmt 0)

Google used to crawl my web site very diligently, sometimes several times a day, before I opened a webmaster account with Google and submitted my first XML sitemap around May 24th (not sure about the exact date, it could be March 20).

Since then, new pages that I added to my web site simply don't show up in Google Search (even when I force site:<mydomain>.com).

Old pages (prior to May 24th) show up fine, just as they used to.

What could I have done wrong?

Is it possible that an XML sitemap submission to Google actually hurts?

Thanks!

BTW, my robots.txt contains only the following:

User-Agent: *
Allow: /
Sitemap: [<mydomain>.com...]

 

Propools




msg:3934480
 5:18 pm on Jun 16, 2009 (gmt 0)

Do your sitemap and file location comply with: [sitemaps.org ] ?

[edited by: Propools at 5:19 pm (utc) on June 16, 2009]

cssatsc




msg:3934492
 5:46 pm on Jun 16, 2009 (gmt 0)

Do your sitemap and file location comply with: [sitemaps.org...] ?

Absolutely. I know that the path [<mydomain>.com...] looks problematic but I (think that I) made sure that everything complies as follows:
  1. In ~/public_html's .htaccess there is a 301 redirect to [<mydomain>.com...]
  2. Thus everything in my website is currently located under [<mydomain>.com...] - including the newly added pages that Google ignores for some reason.
  3. In addition, I made sure that robots.txt (located in ~/public_html, i.e. [<mydomain>.com...]) is world-readable and contains a line that tells where the site's XML sitemap file is located (see the message that started this thread).

Did I miss anything?

Thanks!
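
One thing worth checking against the sitemaps.org protocol is its location rule: a sitemap file may only list URLs on the same host that sit at or below the sitemap's own directory. Below is a minimal Python sketch of that check. The example.com URLs and the function name in_sitemap_scope are placeholders for illustration, not the poster's actual paths.

```python
from urllib.parse import urlsplit
import posixpath


def in_sitemap_scope(sitemap_url: str, page_url: str) -> bool:
    """Return True if page_url may be listed in the sitemap at sitemap_url,
    following the sitemaps.org location rule (same scheme and host,
    page path at or below the sitemap's directory)."""
    s, p = urlsplit(sitemap_url), urlsplit(page_url)
    # Scheme and host (including any port) must match.
    if (s.scheme.lower(), s.netloc.lower()) != (p.scheme.lower(), p.netloc.lower()):
        return False
    # Directory that contains the sitemap file, with a trailing slash.
    base_dir = posixpath.dirname(s.path)
    if not base_dir.endswith("/"):
        base_dir += "/"
    return p.path.startswith(base_dir)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    sitemap = "http://example.com/shop/sitemap.xml"
    for page in (
        "http://example.com/shop/index.php?main_page=index",
        "http://example.com/index.html",
    ):
        print(page, in_sitemap_scope(sitemap, page))
```

If any URL in the sitemap fails this check, moving the sitemap file higher up the directory tree is the usual remedy.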

cssatsc




msg:3934629
 9:13 pm on Jun 16, 2009 (gmt 0)

Also, it seems that the problem is not with Google not crawling my site anymore but rather not "completing" the crawl. In fact, I just "caught" it crawling:

00:06:48 0 •Spider 66.249.71.38 16:00:54 16:00:54
Time Since Clicked: 00:06:48 ago
Session ID:
Host: crawl-66-249-71-38.googlebot.com
User Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
/shop/index.php?main_page=product_free_shipping_info&products_id=34&language=en

The problem seems to be that it somehow doesn't reach products_id=73 ...

Any idea what could have gone wrong?
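
A quick sanity check here is whether the sitemap itself lists the page that Googlebot is not reaching. The following is a rough Python sketch, assuming a standard sitemaps.org urlset file. The SAMPLE document, the example.com URLs, and the helper names are made up for illustration. In practice the XML text would be read from the real sitemap file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

# Namespace used by sitemaps.org urlset documents.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content; note that "&" must be written as "&amp;" in XML.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/shop/index.php?main_page=product_free_shipping_info&amp;products_id=34</loc>
  </url>
</urlset>"""


def sitemap_locs(xml_text):
    """Return every <loc> value found in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]


def lists_product(locs, products_id):
    """Return True if any URL carries the given products_id query parameter."""
    wanted = str(products_id)
    return any(
        wanted in parse_qs(urlsplit(url).query).get("products_id", [])
        for url in locs
    )


if __name__ == "__main__":
    locs = sitemap_locs(SAMPLE)
    for pid in (34, 73):
        print(pid, lists_product(locs, pid))
```

If the new product's URL is missing from the sitemap, or the "&" characters in it are not escaped, the file needs regenerating before anything else is investigated.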

g1smd




msg:3934640
 9:31 pm on Jun 16, 2009 (gmt 0)

I have seen intermittent reports like this ever since XML sitemaps first came into use.

Be aware that the Allow: syntax is not universally supported; I wouldn't use it with User-agent: * here.

I would use Disallow: <blank> or else not specify anything at all there.
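
To compare these variants, here is a small Python sketch using the standard library's urllib.robotparser. It only shows how that particular parser reads each file, which is not necessarily how Googlebot or any other crawler reads it. The example.com URL and the allows helper are placeholders.

```python
from urllib.robotparser import RobotFileParser


def allows(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical product URL, for illustration only.
URL = "http://example.com/shop/index.php?main_page=product_info&products_id=73"

VARIANTS = {
    "Allow: /": "User-agent: *\nAllow: /\n",
    "blank Disallow": "User-agent: *\nDisallow:\n",
    "no rules at all": "",
}

if __name__ == "__main__":
    for name, text in VARIANTS.items():
        print(f"{name}: {allows(text, URL)}")
```

With Python's parser all three variants leave the URL fetchable, so the difference between them only matters for crawlers that do not understand Allow:.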

cssatsc




msg:3934731
 12:00 am on Jun 17, 2009 (gmt 0)

Be aware that the Allow: syntax is not universally supported; I wouldn't use it with User-agent: * here.

I would use Disallow: <blank> or else not specify anything at all there.

Thanks for the tip. I wasn't aware of this. I just changed my robots.txt to include only the following line:

Sitemap: [<mydomain>.com...]

I will track this and see if this helps.

cssatsc




msg:3936881
 5:00 pm on Jun 19, 2009 (gmt 0)

Hmmm... even after implementing the above suggested change in robots.txt, it seems that Google is insisting on ignoring the last product description page that I added to my web site on June 8, 2009, 21:05.

Interestingly enough, a product that I added on June 4, 2009, 06:58 is found by Google search.

Is there a minimum 2-week delay for Google to add website pages to its results?

I did set <changefreq> to daily for product pages, but I read somewhere that this value is treated only as a suggestion. I have no explanation for this weird behavior.

Additional insights would be appreciated.
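
For reference, the sitemaps.org protocol does describe <changefreq> as a hint rather than a command, and it also defines an optional <lastmod> element for the date a page last changed. Below is a minimal Python sketch of building such an entry. The example.com URL, the date, and the build_sitemap helper are illustrative assumptions, not the poster's real data.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a urlset document from (loc, lastmod, changefreq) tuples.

    lastmod is a W3C date string such as "2009-06-08"; changefreq is one of
    the values the protocol defines (for example "daily").
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # "&" is escaped on output
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    print(build_sitemap([
        ("http://example.com/shop/index.php?main_page=product_info&products_id=73",
         "2009-06-08",
         "daily"),
    ]))
```

Neither element obliges a search engine to recrawl on any schedule, so a correct entry can still take a while to show up in results.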


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved