The 64.x.x.x IPs were spidering pages that no googlebot has ever touched before. I understand 216.x.x.x to be the DeepCrawlBot (for spidering whole sites, including new pages) while 64.x.x.x is the FreshBot (for spidering only pages previously spidered by DeepCrawlBot).
not sure if related or just my limited understanding:
fresh-bot visited plenty of pages yesterday that are not in the index yet, but that deep-bot saw during the walk whose results will end up in the February update.
I thought that fresh-bot would get its 'todo list' from the current index that was based on the 'deep-bot results' from _last_ month.
The weird and fuzzy science of gbot-guessing ...
[google often makes me feel like a caveman observing a thunderstorm]
1) Your Seed Page: Usually this is your home page, but not always. When the freshbot FIRST finds your site, it's usually from a link on a fairly authoritative site. If that site links to, say, your "news" page, then THAT will be freshbot's seed page. This can change over time and will usually settle at your home page.
2) Map Pages: There are certain pages on your site that freshbot will deem as your "map" page(s). This can be a site map, a "what's new" page, or even your homepage. It will often take freshbot a month or two to settle in on what it wants to use as your map page(s). I don't know what the limit of number of "map" pages it will use is. On my site, it's 6 + my homepage which is also my seed.
3) The Actual Fresh and/or New Pages: These, obviously, are the pages it finds that have changed since its last visit or are completely new. Completely new pages are golden. Updated pages will get picked up next if there's still room under that mysterious "page cap" - i.e. the maximum number of pages freshbot is going to crawl on your site.
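To make the seed/map/fresh theory above concrete, here is a toy sketch of how a crawler *might* prioritize pages under a page cap. This is purely illustrative of the model described in this post - the function name, the "top 3 map pages" heuristic, and the priority order are all assumptions, not Google's actual algorithm.

```python
# Toy model of the seed / map / fresh-page theory described above.
# Everything here (names, the map-page heuristic, the priority rules)
# is an assumption for illustration only.

def plan_fresh_crawl(seed, site_graph, known_pages, changed, page_cap):
    """Return the ordered list of URLs a hypothetical 'freshbot' might fetch.

    seed        -- the entry URL (often, but not always, the home page)
    site_graph  -- dict: url -> list of outgoing link URLs
    known_pages -- set of URLs seen on a previous crawl
    changed     -- set of known URLs whose content has changed
    page_cap    -- max number of pages crawled on this visit
    """
    # Guess at "map" pages: pages that link out to many others
    # (a site map, a "what's new" page, or the home page itself).
    map_pages = sorted(site_graph,
                       key=lambda u: len(site_graph.get(u, [])),
                       reverse=True)[:3]

    crawl = [seed]
    seen = {seed}
    for page in map_pages:
        if page not in seen and len(crawl) < page_cap:
            crawl.append(page)
            seen.add(page)

    # Brand-new pages first ("golden"), then updated pages if room remains
    # under the page cap.
    discovered = [u for p in crawl for u in site_graph.get(p, [])]
    new_pages = [u for u in discovered if u not in known_pages]
    updated = [u for u in discovered if u in changed]
    for url in new_pages + updated:
        if url not in seen and len(crawl) < page_cap:
            crawl.append(url)
            seen.add(url)
    return crawl
```

With a tight page cap, this sketch reproduces the behavior described above: a brand-new page always makes it in, while an updated page may get squeezed out.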
The mystery I have here is that once the freshbot has a handle on your site, it is a very cunning and wise little bugger. For example, I have about 5% of my pages in Google (there are millions of pages). I can take a page that Google has never ever seen before and update it which then brings it to the top of my "What's New and Updated" list. Then I can add a brand new page. That brand new page will make it in there every time (unless it's during the occasional "Where's that freshbot?" period). That updated page may or may not make it in there, despite the fact that, as far as freshbot knows, it's a brand spanking new page - it's never been indexed, so it's definitely new to it.
How does it know? Who knows.
If freshbot is new to your site (less than a couple of months), then it will act a little weird from time to time as it "learns" about your site. It'll venture into weird corners from time to time looking for a new or better map page. It'll hammer a page 3-4 times in a couple of minutes (I presume to check whether there are elements that change with every hit, like a random quote or even banners?). If it hits a page and then finds a bunch of new pages linked off that page, then, obviously, it's going to like that first page a lot as a potential map page, so it'll keep checking that page.
Finally, it'll have everything figured out. Freshbot's been on my site for six months or so now and it's got it down to a science. I'd imagine that if you are changing your linking structure within your site, or if you have an ambiguous linking structure (i.e. things just kinda link from wherever it's convenient), the odd behaviors will continue longer or even forever. If your site is laid out in such a way that it can easily identify the #1 and #2 types of pages above, then the #3 types will all get in there.
Hope this helps!