Forum Moderators: open

64.x.x.x bot finds new pages?

I thought only 216.x.x.x spidered new pages

5:40 am on Feb 20, 2003 (gmt 0)

New User

10+ Year Member

joined:Feb 6, 2003
votes: 0

Yesterday and today I found that

crawler12.googlebot.com (64.68.82.x)

was spidering pages that no googlebot has ever touched before. I understand 216.x.x.x to be the DeepCrawlBot (for spidering whole sites, including new pages) while 64.x.x.x is the FreshBot (for spidering only pages previously spidered by DeepCrawlBot).

What gives?
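
For anyone who wants to check their own logs the same way, here is a rough sketch of tallying Googlebot hits by the two IP ranges mentioned above. The prefix-to-bot mapping is only the working assumption from this thread, not anything Google has published, and the function names are made up for illustration.

```python
# Assumed mapping, taken from the thread's folklore (not an official list):
# 216.x.x.x = deep crawl bot, 64.x.x.x = fresh bot.
BOT_LABELS = {"216.": "deepbot", "64.": "freshbot"}


def classify_googlebot(ip):
    """Return the assumed bot label for an IP address string."""
    for prefix, label in BOT_LABELS.items():
        if ip.startswith(prefix):
            return label
    return "unknown"


def count_by_bot(ips):
    """Count requests per assumed bot label for a list of IP strings."""
    counts = {}
    for ip in ips:
        label = classify_googlebot(ip)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Feed it the client IPs pulled from your access log and compare which range requested the pages that were never crawled before.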

5:59 am on Feb 20, 2003 (gmt 0)

Preferred Member

10+ Year Member

joined:Jan 7, 2003
votes: 0


I've noticed Freshbot doing some things that it doesn't normally do, like making 700 page requests in one day on my 400 page site. It seems that they've changed something.

6:40 am on Feb 20, 2003 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 1, 2003
votes: 0


not sure if related or just my limited understanding:

fresh-bot visited plenty of pages yesterday that are not in the index yet, but that deep-bot saw during the walk that will end up in the february update.

I thought that fresh-bot would get its 'todo list' from the current index that was based on the 'deep-bot results' from _last_ month.

Apparently not.

The weird and fuzzy science of gbot-guessing ...

[google often makes me feel like a caveman observing a thunderstorm]

8:59 am on Feb 20, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 20, 2002
votes: 0

Freshbot has always crawled new pages.

9:00 am on Feb 20, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 23, 2002
votes: 0

Welcome to WebmasterWorld, WileE

It's _very_ common that new pages are first found by freshbot. For further information about bots read this:


12:38 pm on Feb 20, 2003 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 13, 2002
votes: 0

Freshbot's JOB is to find the "new stuff". There are three types of pages freshbot crawls.

1) Your Seed Page: Usually, this is your home page, but not always. When the freshbot FIRST finds your site, it's usually from a link from a fairly authoritative site. If that site links to say, your "news" page, then THAT will be freshbot's seed page. This can change over time and will usually settle at your home page.

2) Map Pages: There are certain pages on your site that freshbot will deem as your "map" page(s). This can be a site map, a "what's new" page, or even your homepage. It will often take freshbot a month or two to settle in on what it wants to use as your map page(s). I don't know the limit on the number of "map" pages it will use. On my site, it's 6 + my homepage, which is also my seed.

3) The actually fresh and/or new pages. These, obviously, are the pages that it finds that have changed since its last visit or are completely new. Completely new pages are golden. Updated pages will get picked up next if there's still room under that mysterious "page cap" - i.e. the max number of pages freshbot is going to crawl on your site.
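
To make the ordering in the three types above concrete, here is a toy model of how I picture the crawl list being filled. This is purely a guess pieced together from watching logs, not how Google actually does it; `plan_crawl` and its parameters are hypothetical, and putting the seed and map pages ahead of everything else is my own assumption.

```python
def plan_crawl(seed, map_pages, new_pages, updated_pages, page_cap):
    """Toy model: fill a crawl list up to page_cap in priority order.

    Assumed order: seed page, map pages, completely new pages,
    then updated pages only if there is still room under the cap.
    """
    plan = []
    seen = set()
    for group in ([seed], map_pages, new_pages, updated_pages):
        for url in group:
            if len(plan) >= page_cap:
                return plan  # cap reached, remaining pages are skipped
            if url not in seen:  # a homepage can be both seed and map
                seen.add(url)
                plan.append(url)
    return plan
```

With a cap of 4, a seed of "/", map pages "/" and "/whats-new", new pages "/a" and "/b", and one updated page "/c", the updated page gets squeezed out. Raise the cap to 5 and it makes the list.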

The mystery I have here is that once the freshbot has a handle on your site, it is a very cunning and wise little bugger. For example, I have about 5% of my pages in Google (there are millions of pages). I can take a page that Google has never ever seen before and update it which then brings it to the top of my "What's New and Updated" list. Then I can add a brand new page. That brand new page will make it in there every time (unless it's during the occasional "Where's that freshbot?" period). That updated page may or may not make it in there, despite the fact that, as far as freshbot knows, it's a brand spanking new page - it's never been indexed, so it's definitely new to it.

How does it know? Who knows.

If freshbot is new to your site (less than a couple of months) then it will act a little weird from time to time as it "learns" about your site. It'll venture into weird corners from time to time looking for a new or better map page. It'll hammer a page 3-4 times in a couple of minutes (I presume, to check out if there are elements that change with every hit like a random quote or even banners?). If it hits a page and then finds a bunch of new pages linked off that page, then, obviously, it's going to like that first page a lot as a potential map page, so it'll keep checking that page.
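
If you want to spot that "hammering" in your own logs, something like the sketch below will flag URLs hit several times within a short window. The thresholds (3 hits in 120 seconds) just mirror what I described above, and the input format, a list of (timestamp in seconds, URL) pairs already filtered to freshbot requests, is an assumption about how you have parsed your log.

```python
from collections import defaultdict


def find_bursts(hits, window_seconds=120, min_hits=3):
    """Return the set of URLs requested min_hits or more times
    within any span of window_seconds.

    hits: iterable of (timestamp_seconds, url) pairs.
    """
    by_url = defaultdict(list)
    for ts, url in hits:
        by_url[url].append(ts)

    bursts = set()
    for url, times in by_url.items():
        times.sort()
        start = 0
        # Sliding window over the sorted timestamps for this URL.
        for end in range(len(times)):
            while times[end] - times[start] > window_seconds:
                start += 1
            if end - start + 1 >= min_hits:
                bursts.add(url)
                break
    return bursts
```

Pages that show up here repeatedly are good candidates for the ones freshbot is sizing up as map pages.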

Finally, it'll have everything figured out. Freshbot's been on my site for six months or so now and it's got it down to a science. I'd imagine if you are changing your linking structure within your site, or if you have an ambiguous linking structure (i.e. things just kinda link from wherever it's convenient) the odd behaviors will continue longer or even forever. If your site is laid out in such a way that it can easily identify the #1 and #2 types of pages above, then the #3 types will all get in there.

Hope this helps!