Forum Moderators: Robert Charlton & goodroi
Googlebot is coming to the site, looking at a couple of pages, then disappearing. I've tried sitemaps but they don't seem to be helping.
Any advice out there? The client is getting a bit pissed.
Cheers
Sid
Mozilla Googlebot is now trying to find pages on my site that have been gone for over 2 years ...
Even stranger yet ... The last Googlebot to come by & even attempt to find these same pages was over 1 1/2 years ago ... It's almost like Google is trying to spider the site based on a map that is 2 years old :-/
Has he returned for others who were reporting his absence recently?
No loss of traffic so far, Serps remain about the same, just no Gbot visits.
Back to watching
WW_Watcher
We should call this other index the "just in case" index. I think even Google knows that search will always be tough, so it's keeping those older pages backed up for a while.
I updated a few redirects to permanent (301) redirects for some directories I renamed a while back...
Killed the Apache mod_speling check; I think it gives off too many 302 redirects ... not sure on this yet...
I also went in and deleted a bunch of leftover pages after noticing errors in the error logs for pages that weren't there, and I made a better 404 error page....
Now it's the wait-and-see game.
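For anyone doing similar cleanup, the directory renames above amount to a mapping from old paths to new ones served with a 301. A minimal sketch in Python — the directory names here are hypothetical examples, not the poster's actual paths:

```python
# Minimal sketch of mapping renamed directories to permanent redirects.
# The old/new directory names below are hypothetical examples.
RENAMED_DIRS = {
    "/old-articles/": "/articles/",
    "/casestudies/": "/html/",
}

def redirect_for(path):
    """Return (status, new_path) for a renamed directory, else None."""
    for old, new in RENAMED_DIRS.items():
        if path.startswith(old):
            # 301 = permanent redirect, which search engines treat as a
            # real move; a 302 (temporary) can leave the old URL indexed.
            return 301, new + path[len(old):]
    return None

print(redirect_for("/old-articles/page1.htm"))  # (301, '/articles/page1.htm')
print(redirect_for("/html/index.htm"))          # None
```

The point of 301 over 302 is exactly the mod_speling complaint above: temporary redirects give the crawler no reason to drop the old URL.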
Since then - nothing. It's not even picking up robots.txt and sitemap.xml, which it used to do daily.
The site is plain hand-written HTML. Absolutely no tricks whatsoever. It's still indexed, but very low, though you can force the home page to #1 by using highly specific (pointless) keywords.
A site: search gives me all the pages.
A very few pages change pretty much every day and the whole site is regularly spidered by all the others: MSN, Yahoo, Ask, etc.
It has a sitemap in .xml format with lastmod and priorities correctly coded to reflect individual pages.
At 16:26 it downloaded my robots.txt and sitemap in that order.
At 20:58 it came back and downloaded /html/Case_Studies.htm
At 22:12 it came back again and downloaded /html/Philosophy.htm and
/html/Experience.htm
These pages are the only three on the site that are _NOT_ in the sitemap.
BUT - I _HAD_ changed pages with identical names (except for lower case) that reside in the root - and I'd updated the sitemap. Is the spidering/sitemap interface at Google confusing names across directory levels?
Each of the old names, BTW, does a "noindex, follow" HTML redirect to the new part of the site.
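An "HTML redirect" of that kind is typically a meta refresh combined with a robots meta tag. A sketch of what such an old page might look like (the target filename is just taken from the example above):

```html
<!-- Old page redirecting to its new location; filename is an example only. -->
<html>
<head>
  <!-- keep the old page out of the index, but let its links be followed -->
  <meta name="robots" content="noindex, follow">
  <!-- meta refresh with a zero delay, pointing at the new location -->
  <meta http-equiv="refresh" content="0; url=/html/Case_Studies.htm">
</head>
<body>
  <p>This page has moved to <a href="/html/Case_Studies.htm">its new location</a>.</p>
</body>
</html>
```

A server-side 301 is generally the cleaner signal to search engines where the hosting allows it; the meta approach is the usual fallback for plain hand-written HTML sites like this one.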
Just after midnight it downloaded the sitemap again, and again at 09:14 this morning.
Since the Googlebot last came, the home page has been updated a number of times and a date of yesterday was clearly indicated in the sitemap that the Googlebot downloaded. But this page has not been spidered.
It looks to me, reviewing this log, that Google is only spidering pages that are NOT in the sitemap - close to the reverse of what I would expect.
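That observation is easy to test mechanically: extract the `<loc>` URLs from the sitemap and diff them against the URLs seen in the crawl log. A sketch, with hypothetical log data:

```python
# Sketch of checking which crawled URLs are (or are not) in the sitemap.
# The sitemap fragment and crawled-URL set below are hypothetical examples.
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_locs(sitemap_xml):
    """Return the set of <loc> URLs in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {el.text for el in root.iter(NS + "loc")}

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.co.uk/index.html</loc></url>
</urlset>"""

crawled = {
    "http://www.example.co.uk/html/Case_Studies.htm",
    "http://www.example.co.uk/index.html",
}

in_map = sitemap_locs(sitemap)
print("crawled but not in sitemap:", crawled - in_map)
print("in sitemap but not crawled:", in_map - crawled)
```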
Results of the GSiteCrawler Server-Test
Tested at 4/16/2006 11:13:38 AM / from 62.255.32.16:
URL=http://www.mydomain.co.uk/index.html
Result code: 200 (OK / OK)
Server: Microsoft-IIS/5.0
Date: Sun, 16 Apr 2006 11:07:27 GMT
Content-Type: text/html
Accept-Ranges: bytes
Last-Modified: Sat, 15 Apr 2006 08:57:30 GMT
ETag: "168f95a06a60c61:c2f"
Content-Length: 4169
And from the sitemap.xml file:
<url>
<loc>http://www.mydomain.co.uk/index.html</loc>
<lastmod>2006-04-15</lastmod>
<changefreq>MONTHLY</changefreq>
<priority>1.0</priority>
</url>
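One thing that may be worth double-checking in that fragment: the sitemap protocol defines the `changefreq` values in lowercase (always, hourly, daily, weekly, monthly, yearly, never), so an uppercase `MONTHLY` might not validate. A quick sketch of a sanity check:

```python
# Quick sanity check of a sitemap entry: changefreq must be one of the
# protocol's lowercase values, and lastmod a W3C YYYY-MM-DD date.
import re

VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly",
                    "monthly", "yearly", "never"}

def check_entry(lastmod, changefreq):
    """Return a list of problems found (empty list = looks fine)."""
    problems = []
    if changefreq not in VALID_CHANGEFREQ:
        problems.append("changefreq %r is not a valid value "
                        "(the protocol's values are lowercase)" % changefreq)
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", lastmod):
        problems.append("lastmod %r is not in YYYY-MM-DD form" % lastmod)
    return problems

print(check_entry("2006-04-15", "MONTHLY"))  # flags the uppercase value
print(check_entry("2006-04-15", "monthly"))  # []
```

Whether that alone would make Googlebot ignore the entry is speculation, but it is a cheap thing to rule out.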
But no Googlebot interest. It will cheerfully download ancient history, though.