This is the "normal" fresh-bot. He comes and gets pages, and shoves them into the index within a day or two.
This is the crawler I do not understand. He is acting like the old "deepbot", requesting approximately 2X the pages of the "freshbot". Also, the content he has crawled, at least on a few of my sites, has not shown up in the live index.
Are we going back to regular updates from "deep" crawls?
64.68.81.* appeared for me on the 10.8, and as of today the pages are appearing in the index, but only with the title.
I have to admit I'm a little all over the place, as I am still trying to recover from a server problem a month or two ago when I gave Gbot a lot of 500 status codes, and if I am really honest I don't know whether these pages were ever in the index in the first place. :0
64.68.82.* came by on the 17.8 and I'm waiting for those to appear. Unlike what you saw, though, 82 did quite an extensive crawl, whereas 81 crawled just a small fraction of the pages.
I added 10 new pages last month. 4 of them got picked up, 6 didn't. The newest of the pages added was one of the first to be added to the index. Go figure! It's anyone's guess how this stuff works. I have no clue!
Has it taken longer with anyone else lately?
It's been variable. Just went through a re-design on one site, and seeing it take well over 20 days to get some of the new pages in. Very similar to what Liane is describing.
Some sites are popping in under 48 hours, though. I haven't figured out the pattern yet.
One thing that I've noticed, however, is odd spider behavior. Sometimes a spider will request 20 different pages from 20 different IP addresses, requesting robots.txt after every page request. Sometimes it'll request one random page, then won't come back for 30 hours.
It's very erratic right now.
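For anyone who wants to check their own logs for this pattern, here is a rough Python sketch. It assumes Apache common/combined log format, and the sample lines are made up for illustration (they are not from anyone's real logs). For each of the two ranges mentioned in this thread it counts how many distinct IPs showed up, how many pages were requested, and how many of the hits were robots.txt.

```python
import re

# Assumed format: Apache common/combined log. Adjust the pattern for other servers.
# Captures the /24 prefix, the last octet, and the requested path.
LINE_RE = re.compile(r'^(\d+\.\d+\.\d+)\.(\d+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" \d{3}')

def summarize_ranges(lines, prefixes=("64.68.81", "64.68.82")):
    """Per /24 prefix: distinct IPs seen, page requests, robots.txt requests."""
    stats = {p: {"ips": set(), "pages": 0, "robots": 0} for p in prefixes}
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group(1) not in stats:
            continue  # unparseable line, or an IP outside the ranges of interest
        prefix, host, path = m.groups()
        entry = stats[prefix]
        entry["ips"].add(host)
        if path == "/robots.txt":
            entry["robots"] += 1
        else:
            entry["pages"] += 1
    return stats

# Made-up sample lines, for illustration only.
sample = [
    '64.68.82.10 - - [17/Aug/2004:07:11:45 +0000] "GET /page1.html HTTP/1.0" 200 8042',
    '64.68.82.10 - - [17/Aug/2004:07:11:46 +0000] "GET /robots.txt HTTP/1.0" 200 120',
    '64.68.82.27 - - [17/Aug/2004:07:11:50 +0000] "GET /page2.html HTTP/1.0" 200 7311',
    '64.68.82.27 - - [17/Aug/2004:07:11:51 +0000] "GET /robots.txt HTTP/1.0" 200 120',
    '64.68.81.5 - - [10/Aug/2004:06:01:02 +0000] "GET /a/b/c.html HTTP/1.0" 200 5120',
]

for prefix, s in summarize_ranges(sample).items():
    print(prefix, len(s["ips"]), "IPs,", s["pages"], "pages,", s["robots"], "robots.txt")
```

A robots.txt count close to the page count, spread over many IPs, would match the "robots.txt after every page" behavior described above.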
There seems to be a problem, at least from what I am observing.
[edited by: webdude at 5:56 pm (utc) on Aug. 19, 2004]
Mmm...I'm not sure either.
The pages that 64.68.81.* crawled were very deep (3 levels from the homepage) and there is a reasonable chance they were not crawled before.
There has been some discussion about Google potentially running out of DocIDs. I'm no DB wizard like some on here, but storing a page with its title and little else seems to me a very efficient way to store or allocate them.
I also noticed from the recent upheaval and discussions that some felt large directory-like sites had been hit by the August update.
It would be great to hear from others who have been visited by 64.68.81.* and the result of the crawl.
I'm still not sure if I want to see 64.68.81.* again. One more point: 64.68.82.* never went anywhere near the pages that 64.68.81.* had crawled seven days earlier.
I think various aspects of the bot are broken.
I agree various aspects of the bot may be broken. In mid-May, Google started displaying ALL pages of my two websites with old titles (from 6 months ago) despite showing a current cache. In other words, Googlebot seems to have difficulty detecting the current title tag. Furthermore, on my larger site (around 100 pages), the bot is not performing a complete crawl. All of my pages are displayed in Google with URL only, and the index page is no longer showing in the SERPs (after 4 years). I hope the old bot is coming back around so that some of these problems may be resolved (assuming they are being caused by a broken crawler).
It could be that pages that have had a 301 in the past are being checked again to see if they will get the same response. I'll check through the logs tomorrow and see if that's the case.
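A quick way to run that kind of check is sketched below in Python. It assumes Apache common/combined log format; the function name, the sample lines and the paths in them are hypothetical. It lists every path that a given crawler range requested and that came back as a 301 or 404.

```python
import re
from collections import defaultdict

# Assumed format: Apache common/combined log.
# Captures client IP, requested path, and response status.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*" (\d{3})')

def statuses_by_path(lines, ip_prefix="64.68.81.", watch=("301", "404")):
    """Map each path requested from ip_prefix to the watched statuses it returned."""
    hits = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not look like log entries
        ip, path, status = m.groups()
        if ip.startswith(ip_prefix) and status in watch:
            hits[path].append(status)
    return dict(hits)

# Made-up sample lines, for illustration only.
sample = [
    '64.68.81.7 - - [10/Aug/2004:06:00:00 +0000] "GET /old-page.html HTTP/1.0" 301 230',
    '64.68.81.9 - - [10/Aug/2004:06:00:05 +0000] "GET /gone.html HTTP/1.0" 404 210',
    '64.68.81.7 - - [10/Aug/2004:06:00:09 +0000] "GET /index.html HTTP/1.0" 200 8042',
    '64.68.82.3 - - [17/Aug/2004:07:00:00 +0000] "GET /old-page.html HTTP/1.0" 301 230',
]

for path, codes in statuses_by_path(sample).items():
    print(path, codes)
```

If most of what the 64.68.81.* range fetched turns out to be old redirects and dead URLs, that would support the re-check theory; if it is mostly 200s on deep pages, probably not.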
1) Cleaner 64.68.81.****
18.104.22.168, 22.214.171.124, 126.96.36.199, 188.8.131.52 and so on (64.68.81.xxx), going after my 301 (also a 404) and hitting robots.txt a lot.
This naturally looks to me like some kind of cleaner bot from Google checking for this kind of stuff.
2) Mediapartners-Google/2.1 184.108.40.206 seems to be some AdSense bot here :-)
Servers for different tasks are mixed up here in the same IP range (I've seen this mix in other very big farms too, sometimes for random reasons, from rack to rack).