Forum Moderators: Robert Charlton & goodroi

Google not crawling Sitemap

         

tigger

4:33 pm on Nov 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is anyone having a problem with G crawling their SM? I've looked at 3 sites and none of them has cached information for the SM, but all the other pages within each site are fine.

None of the sites has great PR, but the thing I'm finding odd is that the SM is linked from every page, and in some cases pages that are 3 links deep are cached but the SM isn't!

The sites are a couple of years old, so it's not like they are new, but I can't think of a reason why G would be ignoring the SMs considering every other page is cached.

Anyone having similar problems?

tedster

10:02 pm on Nov 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



the SM is linked from every page

I assume this is an html sitemap, rather than an xml sitemap - right? Does the sitemap include more than a set of links and anchor text - descriptive blurbs and so on?

[edited by: tedster at 4:08 am (utc) on Nov 14, 2010]

indyank

4:06 am on Nov 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am seeing Google having problems detecting robots.txt. If that happens often, it will be a disaster for anything that you block using robots.txt.
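Since robots.txt has come up, it may be worth ruling out that the sitemap page itself is blocked by a rule. The following is only a rough sketch using Python's standard `urllib.robotparser`; the rules in `SAMPLE_ROBOTS` and the example.com URLs are made-up placeholders, not taken from any of the sites discussed here, and the standard-library parser is not guaranteed to interpret every rule exactly as Googlebot does.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if the given robots.txt text disallows url for user_agent."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("http://www.example.com/sitemap.html",
                "http://www.example.com/private/page.html"):
        print(url, "blocked" if is_blocked(SAMPLE_ROBOTS, "Googlebot", url) else "allowed")
```

To check a real site, paste the contents of its robots.txt into the first argument and use the actual sitemap URL.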

tigger

7:19 am on Nov 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I assume this is an html sitemap


Hi Ted

Yes, this is a good old html site map with links to all the pages within the site. Isn't that what a SM should be?

Also, I've got other SMs built the same way that are getting crawled. The SMs do have a brief note telling people about the page, just to place a "bit" of text, but it's probably no more than 50 words.

But what I don't understand is why one site's SM would be cached fine whilst others aren't - has something changed regarding the thinking on SMs and what webmasters are supposed to do? I was always under the impression a SM was just a collection of links within the site, nothing more.

tigger

7:09 pm on Nov 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anyone have any feedback on this?

tedster

9:56 pm on Nov 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sometimes public cache pages do go missing, or back up to an earlier date - and there doesn't seem to be any pattern to it. But visibility in the public cache is not equivalent to actually being cached and indexed on Google's back end.

So I guess one question is whether the URL shows up in the index.

tigger

7:25 am on Nov 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do you mean the URL of the SM? If so, searching for the full URL doesn't show the page listed - as for the rest of the site, every page is listed! Odd!

Any ideas why G would be ignoring it, considering it's linked on every page?

tedster

4:23 pm on Nov 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It begins to sound like Google decided this sitemap page would not provide a good landing page for a search user and so moved it out of the main index. It's not clear what all their criteria are for this kind of action, but I have been seeing more of it recently.

However, there is one thing I want to be clear about. Originally you mentioned that Google is not "crawling" the page, but then you discussed not "caching" the page, and now we're discussing not "indexing" the page. These are really three different actions.

So, here's the question - from your own server logs, do you see googlebot requesting the sitemap URL?
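One rough way to answer that from the raw access log is sketched below. It assumes the server writes the common Apache/Nginx "combined" log format; the sample lines, the IP addresses, and the `/sitemap.html` path are invented for illustration, so swap in your own log file and sitemap URL. It also matches on the user-agent string only, which can be spoofed, so a reverse DNS lookup on the requesting IP is still needed to confirm a hit really came from Google.

```python
import re

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, path):
    """Return (timestamp, status) for each request to path whose
    user-agent string mentions Googlebot."""
    hits = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if match.group("path") == path and "googlebot" in match.group("agent").lower():
            hits.append((match.group("time"), int(match.group("status"))))
    return hits


# Made-up sample entries, for illustration only.
SAMPLE_LOG = [
    '66.249.66.1 - - [15/Nov/2010:06:12:01 +0000] "GET /sitemap.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [15/Nov/2010:06:13:44 +0000] "GET /sitemap.html HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [15/Nov/2010:06:15:30 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

if __name__ == "__main__":
    for timestamp, status in googlebot_hits(SAMPLE_LOG, "/sitemap.html"):
        print(timestamp, status)
```

If this turns up requests with a 200 status, the page is being crawled and the question becomes one of caching or indexing; if it turns up nothing over several weeks, crawling itself is the problem.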