Is Freshbot now Deepbot?

The line is getting drawn ever thinner

trillianjedi

4:18 pm on May 22, 2003 (gmt 0)

I've seen several postings about this now in the last few days, although this is my first actual experience of it.

I'm being hit very hard by Google's freshbot at the moment, and it's going deep too. At first glance, given what is currently going on with the little guys, I had to check and double-check that the IPs were 64.... (they are).
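For anyone who wants to repeat this check on their own logs, here is a minimal Python sketch. It assumes Apache "combined" format logs and takes the rule of thumb from this thread that freshbot requests come from 64.* addresses; every other Googlebot request is simply labelled "other". That prefix rule is forum lore, not something Google has documented, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Apache "combined" format: the client IP is the first field and the
# user agent is the last quoted field on the line.
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def classify_googlebot(lines):
    """Count Googlebot requests by source IP prefix (hypothetical helper).

    Assumption (forum rule of thumb, not documented by Google):
    64.* addresses are freshbot; any other Googlebot IP is "other".
    """
    counts = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        ip, agent = match.groups()
        if "Googlebot" not in agent:
            continue  # not a Googlebot request
        label = "freshbot (64.*)" if ip.startswith("64.") else "other"
        counts[label] += 1
    return counts
```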

Its behaviour, in terms of hard hitting and depth of crawl (it's going through the entire site), is more like the character of the old deepbot.

In fact, it's identical behaviour to deepbot the last time it crawled this site back in April.

I'm interested in hearing from others who are seeing the same.

trillianjedi

10:22 pm on May 22, 2003 (gmt 0)

Critter,

Sorry, I'm not quite with you - can you explain a bit more?

I think you're saying that freshie is crawling pages on a site that are not linked from other pages on that site?

I don't quite follow how deepbot would have found them in April?

<EDIT: Re-read your post and understand now. It hasn't followed either old-index links or other fresh links to get there, so the only source could be the April deepcrawl links. OK, that's interesting...>

Critter

10:26 pm on May 22, 2003 (gmt 0)

Hi TJ:

No, I mean that the pages fresh is getting right now can be found by crawling my site, but fresh has only got 100 or so pages, starting today. I've never seen the freshbot before.

When fresh first started crawling today, she was getting pages that are 'deep' in the site; she didn't start with my home page (well, she did, but the subsequent pages can't be reached from the home page).

So fresh must be crawling pages that deep got in April.

Peter

trillianjedi

10:32 pm on May 22, 2003 (gmt 0)

Critter,

What you said made me revisit the logs with a slightly different view and you're absolutely right.

There are some pages for which my site is in the same boat as yours - freshie simply could not have got there by following existing fresh links.
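A rough way to do the same log check programmatically: walk the bot's requests in order and flag any page that no earlier-fetched page links to. This is a hypothetical helper, and it assumes you can supply your own internal link map (for example from your own crawl of the site) plus any "seed" URLs the bot could already know about. External links can also lead the bot to a page, so a flagged page is a hint, not proof, that the URL came from an older crawl list.

```python
def unexplained_fetches(fetch_order, links, seeds=()):
    """Return fetched pages that no earlier-fetched page (or seed URL) links to.

    fetch_order: page paths in the order the bot requested them.
    links: hypothetical dict mapping each page to the pages it links to,
           built from your own crawl of the site.
    seeds: pages the bot could know without following links in this crawl
           (e.g. the home page).
    """
    known = set(seeds)
    unexplained = []
    for page in fetch_order:
        if page not in known:
            unexplained.append(page)
        # After a fetch, the page itself and its outgoing links are discoverable.
        known.add(page)
        known.update(links.get(page, ()))
    return unexplained
```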

Well spotted.

So, this raises another "what is going on here then" type question.

Is it possible, I wonder, that Google is using the freshbot "method" as a means of getting the April deepcrawl data into the "new" index?

Critter

10:36 pm on May 22, 2003 (gmt 0)

Let's just hope those results get in the index in 24 hours or so...like fresh usually does.

Peter

WebGuerrilla

10:41 pm on May 22, 2003 (gmt 0)

Yes, it does appear that FB is recrawling April DP.

I'm looking at a log that contains April data, and FB is requesting the pages in the identical order as last month.
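One way to put a number on "identical order" is to restrict both months' request sequences to URLs that appear in both, then measure how many adjacent pairs in the new crawl keep their April relative order. The sketch below assumes you have already pulled the Googlebot request paths out of each month's log in time order; the function names are hypothetical, and this adjacent-pair score is just one simple choice (a full rank correlation such as Kendall's tau would be stricter).

```python
def first_seen_order(paths):
    """Return paths in order of first request, dropping repeat fetches."""
    seen = set()
    ordered = []
    for path in paths:
        if path not in seen:
            seen.add(path)
            ordered.append(path)
    return ordered

def order_agreement(april, may):
    """Fraction of adjacent pairs in the May crawl (restricted to URLs seen in
    both crawls) that were fetched in the same relative order in April.

    Returns 1.0 for identical order, or None if fewer than two URLs overlap.
    """
    april = first_seen_order(april)
    may = first_seen_order(may)
    rank = {path: i for i, path in enumerate(april)}
    common = [path for path in may if path in rank]
    if len(common) < 2:
        return None  # not enough overlap to compare
    pairs = list(zip(common, common[1:]))
    same = sum(1 for a, b in pairs if rank[a] < rank[b])
    return same / len(pairs)
```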

But what isn't clear is whether or not it really is FB. It could simply be DP running from IPs that have been FB in the past.

I think things will be much clearer tomorrow. If this crawl is really part of the fresh system, we should all end up pretty happy by this time tomorrow.

trillianjedi

10:49 pm on May 22, 2003 (gmt 0)

It could simply be DP running from IPs that have been FB in the past.

Wouldn't that mean a lot of PCs having software switched from freshbot to deepbot? It would explain the timeframe, I guess, but it seems like a lot of work, and I don't really see why they'd do it that way around.

As you say, we'll perhaps know more tomorrow.

My instinct is that Google is now using freshie as a means of introducing the April deepcrawl pages into the index, rather than having to update all of the indexes to a single new build.

But heck, it's all speculation until there's a fat lady somewhere singing her heart out!

parabola

11:12 pm on May 22, 2003 (gmt 0)

All my pages are already picked up by the freshbot, so my concern is really getting April backlink data factored into the ranking of the index. Anyone think freshie would do that?

BigDave

11:18 pm on May 22, 2003 (gmt 0)

Wouldn't that mean a lot of PCs having software switched from freshbot to deepbot?

These machines would most likely do network boots. Change an entry in your DHCP server, send a remote reboot, and the bot machine comes up as a totally different beast.

I know for a fact that Google uses PXE to network boot many of their systems. I cannot say for certain that they do it with their Googlebot machines, but if they don't, they should.

r3ved

11:31 pm on May 22, 2003 (gmt 0)

The pages that were hit today were new as of 2 weeks ago. It is the first time the bot has ever seen them.

Felina

11:32 pm on May 22, 2003 (gmt 0)

This really makes me wonder if we will ever see Deepbot again.
I'm getting pages crawled that were new 3 days ago, and they are linked from internal pages, not from the main page.

trillianjedi

3:39 pm on May 23, 2003 (gmt 0)

Is anyone seeing their freshbotted pages from the April deepcrawl appearing in the index yet?

Very curious to know whether they go in the index "proper" or as regular freshbot pages.

Pricey

3:45 pm on May 23, 2003 (gmt 0)

I've still not had any new (week-old) pages crawled at all :/ My site is well ranked (PR 4 for my keywords), but Googlebot rarely appears in my logs. Tbh, I don't even think any of my sites have seen a deepcrawl yet.

BigDave

4:02 pm on May 23, 2003 (gmt 0)

TJ,

As I mentioned in another thread, I have seen several of my April deep crawl pages in the index. I had an error in my navigation bar that was only there for a very short time during the deep crawl. I have found four of those pages "cached" in Google. One of these was in before freshbot hit it yesterday.

It has not yet changed over to the freshbotted version from yesterday.

About 1/3 of the pages I have looked at are versions that were freshbotted since the last deep crawl and have been moved in permanently. I made a change in late April that I have been looking for by viewing the source of the cached copies.

Napoleon

4:14 pm on May 23, 2003 (gmt 0)

>> All my pages are already picked up by the freshbot, so my concern is really getting April backlink data factored into the ranking of the index. Anyone think freshie would do that? <<

That's the moot point... the missing data is probably the cause of the diminished quality of the Google dbase.

Just slapping the missing pages back in the index anywhere is not the issue. It's getting the quality pages ranked properly again, with a 'proper' PR and relevancy calculation.

Of course, as speculated, FB could now have the facility to achieve all that. In other words, re-ranking on the fly, without an entire (monthly) recalculation for the whole database.

Sadly, that's all it is... speculation. However, something has to give soon with this index in terms of the backlink data, to align with GG's comments (weeks, not months, and we are weeks into this thing already).

parabola

4:24 pm on May 23, 2003 (gmt 0)

Napoleon, that is what I was saying; thanks for making the sad truth a bit clearer. I have my doubts as to whether they are going to re-score the index based on new backlinks anytime soon. It's easy to get new pages in...

GoogleGuy

4:32 pm on May 23, 2003 (gmt 0)

Some fresh data might be incorporated next-day, but I wouldn't expect everything freshbot found yesterday to make it in a day. There's a lot of data that needs to be fetched and cross-checked. I would expect the full data to show up more on the timeframe of what you would expect from a crawl/index. Step 1 is done (index is at all data centers). I know that one subtle spam filter is going in soon, but Napoleon, I would start counting the weeks-but-not-months comment from the time that the index switchover finished. Again, just trying to give webmasters information so they have the right expectations: more than weeks, less than months.

But I'm glad that people have noticed that freshie has been crawling deeply in addition to normal freshie duties.

ogletree

4:51 pm on May 23, 2003 (gmt 0)

I don't think there is any consistency. I have seen sites with new stuff and some with old stuff. Right now there is just no way to know what is going on. I have 51 links today and I had 40 yesterday. Of course, they are all from the March crawl, as I stated earlier. I have a friend who had 1 link a month ago and now has 21. We both made the same change to our CMS so that Google would crawl it at the same time. He has new links; I have old ones. I guess we all just have to wait.

parabola

4:53 pm on May 23, 2003 (gmt 0)

So, as I anticipated, we will have to wait a "normal" period of time to see new backlinks (i.e., a month from now).

Napoleon

4:53 pm on May 23, 2003 (gmt 0)

Thanks GG. The major point I think everyone is choking on though is when will the 'new' data (including April deepcrawl) embed itself into the proper CALCULATIONS for ranking.

I'm not talking the fresh stuff here, which is (or at least was) relatively superficial and short term in the ranking system (come and go). I'm talking stable ranking.

It's this data that has surely caused most of those more recent, but high quality, sites to go/fall. I assume that this will NOT be tomorrow, but "more than weeks, less than months".

That timescale actually sounds like the next mid-June cycle to me (assuming there is one anymore), with the new algo implementation replacing the mid-May cycle.

GoogleGuy

4:57 pm on May 23, 2003 (gmt 0)

I think we're on the same wavelength, Napoleon.

P.S. About the questions you raised about country language/TLD/redirection, I think we've got some changes scheduled soon (next week) that should bring it in line with what many users expect again. Just wanted to let you know that that was coming too.

eflouret

5:01 pm on May 23, 2003 (gmt 0)

Sorry, but I don't get it.

GoogleGuy says that the index is in all servers.

What does that mean?

I've been continuously checking Google by searching for 'homeandoutdoors', which is the name of one of my sites. Since it only returns results from that site, it is easier for me to check the number of pages indexed.

When I perform that search in different datacenters I get different results:
www -> 2440 pages
www2 -> 272 pages
www3 -> 272 pages
www-cw -> 2360 pages
www-fi -> 2200 pages
www-sj -> 272 pages
www-ex -> 316 pages

So it seems that "the index is the same in all servers" doesn't mean what I understood it to mean. There is something I don't know about. Perhaps these are not the servers GoogleGuy means, or perhaps the meaning of 'index' is different from what I supposed.

Thanks,

Enrique

Napoleon

5:03 pm on May 23, 2003 (gmt 0)

Top man.... appreciated.

I think someone mentioned the other day that predictability is always helpful for business. This at least gives us some general direction.

crobb305

5:07 pm on May 23, 2003 (gmt 0)

GG,

You said before that backlinks would be factored in "gradually" and "over time". That implied continual updating of backlinks over a long period. Now it sounds like you are saying that the new data (backlinks) will be factored in all at once, in a few weeks (more than weeks, less than months). Thus, we are stuck with the current backlink count (and missing anchor text) until around mid-June. Is this a correct interpretation?

parabola

5:13 pm on May 23, 2003 (gmt 0)

Not quite sure where everyone got the idea that we were going to have some sort of continual update process. Seems the one they have in place makes more sense, technology wise.

Either way, we can look for a recent index in mid-June.

kevinpate

5:20 pm on May 23, 2003 (gmt 0)

I've read it differently. It's hardly gradual (in my opinion) if weeks go by and then badabing, badaboom, a new switch is flipped and a whole lot happens in 24 hours.
Maybe I'm misreading GG, but my take on the "watch the calendar, not the clock" acknowledgements is that a little happens every few days, and it will take a month or more for everything planned to be fully online, including old and new filters, backlinks since the Feb/March era, expanded freshie, etc.

Again, just my opinion. Right or wrong, a pizza place isn't gonna swap me a 12-inch pepperoni for it.

crobb305

5:34 pm on May 23, 2003 (gmt 0)

Not quite sure where everyone got the idea that we were going to have some sort of continual update process. Seems the one they have in place makes more sense, technology wise

No one knows what Google is doing. Everyone has had different interpretations and opinions. Hence, 9 update threads with thousands of posts. Regardless, GoogleGuy's use of the words "gradually" and "over time" implied to a number of people a slow, continual update of backlinks, not just one dump of new backlinks 6 weeks later (i.e., a regular monthly update). Besides, if Inktomi and AlltheWeb are able to do continual updates, who is to say Google can't (technologically speaking)?

GoogleGuy

5:51 pm on May 23, 2003 (gmt 0)

crobb305, some things will be filtering in sooner (I know a few more autospam filters will happen earlier), and I wouldn't be surprised to see fluctuations in backlinks and pages, but I would hold on to the idea of an update that brings in more data for a little while longer. In time, I do think things will be more gradual. However, we're still in the transition period for this system, so I wouldn't be surprised to see a traditional update for a little while longer. Hope that helps.

Critter

6:09 pm on May 23, 2003 (gmt 0)

Boy is that just my luck or what :) I had all my pages optimized and ready to go for the April crawl/May update, and then Google basically skips the May update.

Reminds me of the old school saw: "The day you decide to turn over a new leaf is the day the teacher decides to make an example out of you."

Peter

parabola

6:17 pm on May 23, 2003 (gmt 0)

Crobb,

You are right that Google is surely capable of doing many things.

IMO, if anything, they have shown how difficult it is to bring in backlinks, rather than any shift towards more fluidity. If not, why else would we be seeing Feb data?

Thanks GoogleGuy for clearing everything up and letting us know to expect a regular update at some point with recent backlinks. I think this is the info we have been waiting for.

GoogleGuy

6:23 pm on May 23, 2003 (gmt 0)

Happy to help. I'll try to post if something changes so people know what to expect.

This 211-message thread spans 8 pages.