Forum Moderators: Robert Charlton & goodroi
Googlebot is coming to the site, looking at a couple of pages, then disappearing. I've tried sitemaps but they don't seem to be helping.
Any advice out there? The client is getting a bit pissed.
Cheers
Sid
Mozilla Googlebot is now trying to find pages on my site that have been gone for over 2 years ...
Even stranger yet ... The last Googlebot to come by & even attempt to find these same pages was over 1 1/2 years ago ... It's almost like Google is trying to spider the site based on a map that is 2 years old :-/
Has he returned for others who were reporting his absence recently?
No loss of traffic so far, Serps remain about the same, just no Gbot visits.
Back to watching
WW_Watcher
We should call this other index the "just in case" index. I think even Google knows that search will always be tough, so it's keeping those older pages backed up for a while.
I updated a few redirects to permanent (301) redirects for some directories I renamed a while back...
Killed the Apache mod_speling check; I think it gives off too many 302 redirects ... not sure on this yet...
I also went in and deleted a bunch of leftover pages after noticing errors in the error logs for pages that weren't there, and I made a better 404 error page....
Now it's the wait-and-see game.
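For anyone doing similar cleanup, the directory renames above amount to a mapping from old paths to new ones served with a 301. A minimal sketch in Python — the directory names here are hypothetical examples, not the poster's actual paths:

```python
# Minimal sketch of mapping renamed directories to permanent redirects.
# The old/new directory names below are hypothetical examples.
RENAMED_DIRS = {
    "/old-articles/": "/articles/",
    "/casestudies/": "/html/",
}

def redirect_for(path):
    """Return (status, new_path) for a renamed directory, else None."""
    for old, new in RENAMED_DIRS.items():
        if path.startswith(old):
            # 301 = permanent redirect, which search engines treat as a
            # real move; a 302 (temporary) can leave the old URL indexed.
            return 301, new + path[len(old):]
    return None

print(redirect_for("/old-articles/page1.htm"))  # (301, '/articles/page1.htm')
print(redirect_for("/html/index.htm"))          # None
```

The point of 301 over 302 is exactly the mod_speling complaint above: temporary redirects give the crawler no reason to drop the old URL.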
Since then - nothing. It's not even picking up robots.txt and sitemap.xml, which it used to do daily.
The site is plain hand-written HTML. Absolutely no tricks whatsoever. It's still indexed, but very low, though you can force the home page to #1 by using highly specific (pointless) keywords.
A site: search gives me all the pages.
A very few pages change pretty much every day and the whole site is regularly spidered by all the others: MSN, Yahoo, Ask, etc.
It has a sitemap in .xml format with lastmod and priorities correctly coded to reflect individual pages.
At 16:26 it downloaded my robots.txt and sitemap in that order.
At 20:58 it came back and downloaded /html/Case_Studies.htm
At 22:12 it came back again and downloaded /html/Philosophy.htm and
/html/Experience.htm
These pages are the only three on the site that are _NOT_ in the sitemap.
BUT - I _HAD_ changed pages with identical names (except for lower case) that reside in the root - and I'd updated the sitemap. Is the spidering/sitemap interface at Google confusing names across directory levels?
Each of the old names, BTW, does a "noindex, follow" HTML redirect to the new part of the site.
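An "HTML redirect" of that kind is typically a meta refresh combined with a robots meta tag. A sketch of what such an old page might look like (the target filename is just taken from the example above):

```html
<!-- Old page redirecting to its new location; filename is an example only. -->
<html>
<head>
  <!-- keep the old page out of the index, but let its links be followed -->
  <meta name="robots" content="noindex, follow">
  <!-- meta refresh with a zero delay, pointing at the new location -->
  <meta http-equiv="refresh" content="0; url=/html/Case_Studies.htm">
</head>
<body>
  <p>This page has moved to <a href="/html/Case_Studies.htm">its new location</a>.</p>
</body>
</html>
```

A server-side 301 is generally the cleaner signal to search engines where the hosting allows it; the meta approach is the usual fallback for plain hand-written HTML sites like this one.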
Just after midnight it downloaded the sitemap again, and again at 09:14 this morning.
Since the Googlebot last came, the home page has been updated a number of times and a date of yesterday was clearly indicated in the sitemap that the Googlebot downloaded. But this page has not been spidered.
It looks to me, reviewing this log, that Google is only spidering pages that are NOT in the sitemap - close to the reverse of what I would expect.
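That observation is easy to test mechanically: extract the `<loc>` URLs from the sitemap and diff them against the URLs seen in the crawl log. A sketch, with hypothetical log data:

```python
# Sketch of checking which crawled URLs are (or are not) in the sitemap.
# The sitemap fragment and crawled-URL set below are hypothetical examples.
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_locs(sitemap_xml):
    """Return the set of <loc> URLs in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {el.text for el in root.iter(NS + "loc")}

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.co.uk/index.html</loc></url>
</urlset>"""

crawled = {
    "http://www.example.co.uk/html/Case_Studies.htm",
    "http://www.example.co.uk/index.html",
}

in_map = sitemap_locs(sitemap)
print("crawled but not in sitemap:", crawled - in_map)
print("in sitemap but not crawled:", in_map - crawled)
```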
Results of the GSiteCrawler Server-Test
Tested at 4/16/2006 11:13:38 AM / from 62.255.32.16:
URL=http://www.mydomain.co.uk/index.html
Result code: 200 (OK / OK)
Server: Microsoft-IIS/5.0
Date: Sun, 16 Apr 2006 11:07:27 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Sat, 15 Apr 2006 08:57:30 GMT
ETag: "168f95a06a60c61:c2f"
Content-Length: 4169
And from the sitemap.xml file:
<url>
<loc>http://www.mydomain.co.uk/index.html</loc>
<lastmod>2006-04-15</lastmod>
<changefreq>MONTHLY</changefreq>
<priority>1.0</priority>
</url>
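One thing that may be worth double-checking in that fragment: the sitemap protocol defines the `changefreq` values in lowercase (always, hourly, daily, weekly, monthly, yearly, never), so an uppercase `MONTHLY` might not validate. A quick sketch of a sanity check:

```python
# Quick sanity check of a sitemap entry: changefreq must be one of the
# protocol's lowercase values, and lastmod a W3C YYYY-MM-DD date.
import re

VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}

def check_entry(lastmod, changefreq):
    """Return a list of problems found (empty list = looks fine)."""
    problems = []
    if changefreq not in VALID_CHANGEFREQ:
        problems.append("changefreq %r is not a valid value "
                        "(the protocol's values are lowercase)" % changefreq)
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", lastmod):
        problems.append("lastmod %r is not in YYYY-MM-DD form" % lastmod)
    return problems

print(check_entry("2006-04-15", "MONTHLY"))  # flags the uppercase value
print(check_entry("2006-04-15", "monthly"))  # []
```

Whether that alone would make Googlebot ignore the entry is speculation, but it is a cheap thing to rule out.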
But no Googlebot interest. It will cheerfully download ancient history, though.