Behavioral and functional differences between crawlerxx and crawlxx

curious and trying to sort it out

         

Trisha

1:06 am on Jul 11, 2003 (gmt 0)

10+ Year Member



I don't usually follow googlebot crawling behavior very closely but I'm curious about what I've been seeing in my logs lately.

I uploaded two new sites at roughly the same time, in early April. One has been indexed well and is getting recrawled, with new pages getting indexed somewhat frequently. The other one Google is pretty much ignoring. One difference I have noticed is that the one that is well indexed is getting regular visits from both crawlerxx and crawlxx. Please correct me if I'm wrong, but I believe crawlerxx is freshdeepbot.

The one Google is pretty much ignoring is getting visits from only crawlxx (crawl31, specifically), and that may only be recently, since I wasn't paying much attention until now. It's not going any deeper than the index page either. It's showing a TB PR of 0, with 2 backlinks, both PR5, but I doubt that it could have any sort of penalty. TB PR doesn't seem to be very accurate or make much sense now anyway, so I'm not concerned about that. It gets absolutely no referrals from Google either. All internal pages are greyed out.

Does anyone know what crawlxx's function is? Is it maybe just finding new sites/pages just to then tell freshdeepbot to come and visit longer and deeper? Or am I reading too much into this? The ignored site doesn't have as many links yet, maybe that's the only reason crawlerxx hasn't visited and why it's not been crawled any deeper.

I hope this isn't sounding too much like a 'why isn't google indexing my site' post. I'm really more interested in the differences between the two googlebots.

rfgdxm1

12:52 am on Jul 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



With Google apparently moving into a new era of continuous updating, exactly what the function of these specific bots now is, is something I wouldn't even hazard a guess at. The reason you got no answer before may be that nobody is sure they know.

projectphp

6:19 am on Jul 18, 2003 (gmt 0)

10+ Year Member



With Google apparently moving into a new era of continuous updating, exactly what the function of these specific bots now is, is something I wouldn't even hazard a guess at. The reason you got no answer before may be that nobody is sure they know.

I don't want to be rude, but isn't that just YOUR idea? The "apparently" essentially comes from a post that YOU started. There is no "apparently" about it. We just don't know. So, in answer to the original question, "I don't know", not because of any major changes at Google, just because I don't have an answer :)

dgdclynx

10:27 am on Jul 18, 2003 (gmt 0)

10+ Year Member



crawlXX seems to be Deepbot, and crawlerXX was Freshbot, which is supposed to be turning into Freshdeepbot, so this might be Deepbot's last appearance. Deepbot was around at the end of March for the previous deep crawl, which resulted in the recent dance, and now he has returned again in the last week or two, so we will be due another dance. But GoogleGuy implied that this would be the last dance ever, as continual updating was being brought in with Freshdeepbot, so Deepbot's days may be numbered. Still, I am pleased to see him back cos I have about 100 files at the fourth level to be indexed, and that is far beyond Freshy's reach. Twenty were indexed this week.

HitProf

11:20 am on Jul 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



crawlerxx (IP range 64.68.82.xx) still seems to behave like Freshbot, including freshtags in the SERPs.

Pages crawled with crawlxx (64.68.82.xx) show up later without a freshtag.

My guess is that the update as we know it has been divided into more independent processes:
- freshbot as usual
- deepbot pages added/updated more often (weekly?)
- backlink/PR recalculation as usual (every so many weeks)
- algo tweaks and spam filter runs when they feel like it

All these processes seem to occur independently nowadays.

We could still call the recalculation of backlinks and algo changes 'updates', but I don't think just adding and refreshing pages deserves the name anymore :)

rfgdxm1

12:58 pm on Jul 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>I don't think that fact is in question and has been confirmed by G ...?

Note I have in there "apparently." Also, I have seen some major moves since Esmeralda of the sort that never happened between updates. This wasn't the usual everflux.

borisbaloney

2:46 pm on Jul 18, 2003 (gmt 0)

10+ Year Member



I agree with HitProf.

Since the last update was about the 15th, I think, I am expecting a PR recalculation and backlink update any day now.

To back up my authority on the matter - I am claiming the first post regarding continuous updates way back on April 28: [webmasterworld.com...]

Sometimes even newbies can get it right.

/stupid gloating

dgdclynx

3:21 pm on Jul 18, 2003 (gmt 0)

10+ Year Member



I have had daily Freshbot indexes of my unchanged Home Page (cos of the rubberstamped date) but the twenty additions to the index seem to have come from Deepbot (no datestamp). So I am still waiting to be convinced that Freshbot has become FreshDeepbot.

Regarding updates I did notice a little dance yesterday of one of my files which moved position and back again. I have seen no changes in PR but monthly updates seem a sensible conjecture.

Trisha

4:12 pm on Jul 18, 2003 (gmt 0)

10+ Year Member



Thanks for the replies!

Is there anyone else who has only had either crawlerxx or crawlxx at their site? That may help to narrow down what each of them does. Crawlxx doesn't seem to be living up to its name (or former name) of deepbot, as it's not going deep into mine at all.
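For anyone wanting to compare the two from their own logs, here is a minimal sketch of one way to tally hits and crawl depth per hostname family. It assumes the access log is in Common Log Format with the resolved hostname in the first field, and that the bots appear as crawlerNN.googlebot.com and crawlNN.googlebot.com as discussed in this thread. The function names (bot_family, path_depth, summarize) are made up for illustration and are not from any log tool.

```python
import re
from collections import defaultdict

# Assumed hostname shapes, taken from the names used in this thread.
HOST_RE = re.compile(r"^(crawler|crawl)(\d+)\.googlebot\.com$")
# Assumed Common Log Format, with the resolved hostname as the first field.
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)')


def bot_family(host):
    """Return 'crawler' or 'crawl' for a Googlebot hostname, else None."""
    match = HOST_RE.match(host.lower())
    return match.group(1) if match else None


def path_depth(path):
    """Count path segments: '/' is 0, '/a/b/page.html' is 3."""
    path = path.split("?", 1)[0]
    return len([segment for segment in path.split("/") if segment])


def summarize(lines):
    """Tally hits, distinct paths, and deepest path per hostname family."""
    stats = defaultdict(lambda: {"hits": 0, "max_depth": 0, "paths": set()})
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not look like the assumed format
        family = bot_family(match.group(1))
        if family is None:
            continue  # not one of the two hostname families
        path = match.group(2)
        entry = stats[family]
        entry["hits"] += 1
        entry["paths"].add(path)
        entry["max_depth"] = max(entry["max_depth"], path_depth(path))
    return dict(stats)
```

Feeding it the lines of a log file (for example, summarize(open("access.log")), where the file name is only a placeholder) would show at a glance whether one family ever gets past the index page.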

As HitProf says, crawlerxx still seems to behave like Freshbot though, but going deeper and living up to its new name of Freshdeepbot.

I guess I'll just have to be patient and see what happens.

chris_f

1:37 pm on Jul 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It could just be that a temp forgot the "er" in the user-agent and DNS record ;)

Seriously though! I haven't noticed a difference in their behaviour. What I have noticed is that "crawl" is rarely visiting me. It seems "crawler" is replacing "crawl".

ATOB
c

Trisha

7:48 pm on Jul 21, 2003 (gmt 0)

10+ Year Member



What I have noticed is that "crawl" is rarely visiting me. It seems "crawler" is replacing "crawl".

I wish I could get crawlerxx to come by the one site of mine, still just getting crawlxx, and it's not going any deeper than the index page.

I'm convinced that it's not been penalized now though. I did a search using a unique phrase from the site, and it did show up in Google. I'm just not getting any regular referrals from Google because the index page is just too general; the internal pages would need to be indexed for people to find them in searches.

Key_Master

8:53 pm on Jul 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sometimes even newbies can get it right.

I disagree with the assumption that crawlerxx is freshbot and crawlxx is deepbot. I go by the http_accept header Googlebot sends: "text/html,text/plain" for freshbot, "text/html,text/plain,application/*" for deepbot. It doesn't always add up to your theory.

Trisha

10:31 pm on Jul 21, 2003 (gmt 0)

10+ Year Member



I use Faststats and they list the Googlebots as either crawlerxx or crawlxx, so that is what I go by.

Key_Master - you said you don't think that crawlerxx is freshbot and crawlxx is deepbot. Do the http_accept headers you mentioned not always match up consistently with either crawlerxx or crawlxx?

Key_Master

11:33 pm on Jul 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Even though crawlerxx is primarily used to fresh listings, it is often used to deep crawl pages as well. In other words, if crawlerxx.googlebot.com hits a page, that doesn't necessarily mean it's going to be freshed, even if that page contains fresh content. It may be just as likely that it's being indexed for the Google database. I wish it were otherwise; I'd have thousands of additional Googlefreshed pages for my sites.

>>Do the http_accept headers you mentioned not always match up consistently with either crawlerxx or crawlxx?

Nothing is absolute with Googlebot. crawlxx primarily uses the "text/html,text/plain,application/*" accept header (it very rarely uses the "text/html,text/plain" accept header) but crawlerxx is more mixed and may use either header.
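As a rough illustration of the Accept-header heuristic described above, here is a minimal sketch that labels a request as a likely fresh or deep crawl. The two header values come from this thread; the function names and the "unknown" fallback are assumptions for illustration, and since nothing is absolute with Googlebot, the result is only a guess.

```python
# Header values as reported in this thread (2003 observations, not a spec).
FRESH_ACCEPT = "text/html,text/plain"
DEEP_ACCEPT = "text/html,text/plain,application/*"


def normalize_accept(value):
    """Lowercase the header and drop whitespace around each media type."""
    parts = [part.strip().lower() for part in value.split(",")]
    return ",".join(part for part in parts if part)


def guess_crawl_type(accept_header):
    """Return 'fresh', 'deep', or 'unknown' based on the Accept header alone."""
    normalized = normalize_accept(accept_header)
    if normalized == DEEP_ACCEPT:
        return "deep"
    if normalized == FRESH_ACCEPT:
        return "fresh"
    return "unknown"
```

Logging the result next to the requesting hostname would show how often crawlerxx sends each header, which is the mixed behavior described above.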

RoadRash

6:16 am on Jul 22, 2003 (gmt 0)

10+ Year Member



I have a "high" PR6 site. CrawlerXX will follow links from the index page only. CrawlXX will continue down the site (aka deepbot) and pick up hundreds of deep content pages.

CrawlerXX has picked up 47 pages, CrawlXX has picked up 18,650 as of last night.

Hope that helps somebody who is keeping track!

Giacomo

7:44 pm on Aug 5, 2003 (gmt 0)

10+ Year Member Top Contributors Of The Month



On July 18 one of our sites was deep-crawled by bots in the 64.68.82.* IP range (crawler*.googlebot.com).

They spidered every single page on the site, which had around 60 pages, some of which were 3 levels down in the directory structure. We do have a comprehensive site map linked from the home page, though. Oh, and all of our site is spider-friendly (no querystrings or session IDs, just plain old .html files, even though the site is all PHP+MySQL).

I guess this stands against the speculation that crawler* might be freshbot and crawl* deepbot or deepfreshbot, or whatever. I think Google's new generation crawlers are completely interchangeable: freshbot can become deepbot and vice versa, as needed.

In the following weeks I have been only seeing bots in the same IP range (crawler*.googlebot.com), requesting just the URLs linked to from the home page. The pages showed up in the index with a date tag by the next day or two.

Today I got my first hit from a 64.68.85.* bot (crawl*.googlebot.com), which requested /robots.txt and the home page at around 19:20 UTC.

The beginning of a deep crawl? I don't think so, even though I really hope our site gets deep-crawled again before the next update (we added about 450 new pages lately). ;-)
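For logs that only record IP addresses, a minimal sketch of mapping an address to a hostname family might look like the following. The two ranges are just the ones observed in this post (64.68.82.* for crawler* and 64.68.85.* for crawl*), the /24 masks are an assumption read off the wildcards, and the names OBSERVED_RANGES and family_for_ip are hypothetical.

```python
import ipaddress

# Ranges as observed in this post in 2003; treat them as assumptions,
# not as an authoritative list of Googlebot addresses.
OBSERVED_RANGES = {
    "crawler": ipaddress.ip_network("64.68.82.0/24"),
    "crawl": ipaddress.ip_network("64.68.85.0/24"),
}


def family_for_ip(ip_text):
    """Return 'crawler' or 'crawl' if the IP is in an observed range, else None."""
    try:
        address = ipaddress.ip_address(ip_text)
    except ValueError:
        return None  # not a valid IP address
    for family, network in OBSERVED_RANGES.items():
        if address in network:
            return family
    return None
```

A reverse DNS lookup on the address would be a stronger check than a hard-coded range, since the ranges can change.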

Trisha

3:10 pm on Aug 6, 2003 (gmt 0)

10+ Year Member



I'm starting to believe that crawlxx and crawlerxx are basically interchangeable too. The one site of mine has now had one crawlerxx and 2 of the crawlxx, but the index page is still all that has been added to the index, and still no Google referrals. It seems that a page now needs a higher PR/more links for a whole site to even be spidered.