Next Google Deep Crawl?

What Can We Expect.

JoeHouse

4:14 pm on Jun 7, 2003 (gmt 0)

Can anyone give me an educated guess as to when we will see the next Google Deep Crawl for June and what can we expect to see?

Stefan

4:18 pm on Jun 7, 2003 (gmt 0)

No one knows, JoeHouse. Let's hope that we see a deepcrawl in June; there's no guarantee that we will.

johnser

4:34 pm on Jun 7, 2003 (gmt 0)

I think fairly experienced guesstimates elsewhere on WW have put it towards the end of June.

CygnusX1

4:35 pm on Jun 7, 2003 (gmt 0)

When it comes to deep crawls from Google, I just make sure that the new pages I want to add have a link on the home page. You only have to do this until the page has been added to Google; you don't have to wait until Google deep crawls your website. You can use this time as a marketing ploy for the new pages, and a link on the home page helps. Keep in mind this is just to get the pages listed on Google. They don't have to stay there forever.

Macguru

4:45 pm on Jun 7, 2003 (gmt 0)

Recent related thread : [webmasterworld.com...]

Slade

5:02 pm on Jun 7, 2003 (gmt 0)

Keep in mind this is just to get the pages listed on Google. They don't have to stay there forever.

Just make sure they're still on your sitemap or other directory pages.

bobmark

6:11 pm on Jun 7, 2003 (gmt 0)

Does anyone have a feel for how freshbot handles dynamic pages in comparison to deepcrawl bot?
I have a theory that Google is reworking the deepcrawl algorithm to try and better handle dynamic pages. Remember until quite recently, these pages hardly appeared in Google at all.
Think of the problem. Regularly on here you see people saying things like "the Google index has 90,000 of my 110,000 pages." The problem is what you are talking about is content that might represent a few thousand static pages that, because of dynamic creation, ends up with ridiculous numbers and Google's index gets clogged with essentially duplicate content that - whether deliberate or not - is awfully close to spam.
If Freshbot started with a relatively clean page, it may handle some of these things better, so I am curious about people's experience with how freshbot handles their dynamic content.
Could it be Google declared a moratorium on deepbot crawls until they have an algorithm they like better?

skipfactor

6:47 pm on Jun 7, 2003 (gmt 0)

The problem is what you are talking about is content that might represent a few thousand static pages that, because of dynamic creation, ends up with ridiculous numbers and Google's index gets clogged

A recent observation I made that may support your theory:

Session IDs Freshbot & Deepbot tend to favor [webmasterworld.com]

nakulgoyal

7:04 pm on Jun 7, 2003 (gmt 0)

Somewhere on Google.com I read that Google updates its database (what we call the Google dance) once every month.

Now somewhere on Webmasterworld.com, I read this as 1.5 months.

What I feel is that it's almost unknown. Can GoogleGuy comment on this? It would really help SEO consultants if we knew what is happening.

We just keep giving vague ideas to our clients. Does anybody else think the same?

bobmark

10:39 pm on Jun 7, 2003 (gmt 0)

Thanks, skipfactor,
as you said, maybe co-incidence but maybe not.
I thought of it because, even in the recent absence of deep crawls, freshbot crawls my new static pages but ignores the dynamic pages I have.

wasmith

10:46 pm on Jun 7, 2003 (gmt 0)

Good question. I am wondering too, which is how this thread got me. PR reported by the toolbar seems to be changing on a day-to-day basis, newly listed sites seem to arrive via freshbot, and toolbar PR seems to like URLs with newly discovered links.

I seem to think that there is something new coming, but I think that every month.

johnser

12:56 am on Jun 8, 2003 (gmt 0)

We've just created a site with links to 11,000 pages. 10,500 of these pages are dynamic: (X.asp?id=1>10,500)

The idea behind the site was that with 11,000 pages of fairly unique content, we'd blow the competition away.

Are you saying that this strategy might be dead? So far we've had about 20 static pages showing temporarily in Google and the site went live 2 weeks ago.

bobmark

6:45 pm on Jun 8, 2003 (gmt 0)

As I am sure you know, awfully early to tell, johnser.
I am really basing my guess on the sheer magnitude of the index Google would have to maintain (not to mention the crawl time) if they are to index all the created pages a dynamic site can generate.
I think it is why they didn't include them for a long time after dynamic sites became common and I am guessing that once they started they got into unanticipated problems.
Time will tell what solution they come up with (or if I am totally wrong and the current stop in deep crawl activity has nothing to do with reworking the algo to handle dynamic pages differently). Only God and GoogleGuy know.

johnser

7:03 pm on Jun 8, 2003 (gmt 0)

You might be right, bobmark - I'm seeing incredibly stale pages at the moment on a whole range of sites. Content months old. It's really bad.

Even if G just picks up my 500 static pages I'll be very happy. Someone said elsewhere that it took 3-4 months to have all their dynamic pages spidered, so I've passed that nugget on to the client.

It's hard to give good advice right now 'cos as you say - Only God & GG (& maybe not even GG!) know :)

trillianjedi

7:36 pm on Jun 8, 2003 (gmt 0)

Johnser - you should be OK as long as you keep any variables down to less than two.

The deepbot doesn't like the "?" though.

I would suggest you look into using mod_rewrite to create static-looking URLs.
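
To illustrate what such a rewrite amounts to, here is a minimal sketch in Python rather than actual mod_rewrite configuration. The /product/<id>.html pattern and the function names are assumptions made up for the example; the idea is only that the server accepts a static-looking path and maps it internally to the dynamic URL with the query string, so the spider never sees the "?".

```python
import re

# Hypothetical pattern: /product/123.html is served by /product.asp?id=123.
STATIC_PATTERN = re.compile(r"^/product/(\d+)\.html$")

def to_dynamic(path):
    """Map a static-looking path to the dynamic URL behind it, or None if no match."""
    match = STATIC_PATTERN.match(path)
    if match is None:
        return None
    return "/product.asp?id=" + match.group(1)

def to_static(product_id):
    """Build the static-looking link to publish on pages instead of the '?' URL."""
    return "/product/%d.html" % product_id

print(to_dynamic("/product/123.html"))  # /product.asp?id=123
print(to_static(123))                   # /product/123.html
```

Links on the site would use the to_static form, and the server-side rewrite would apply the to_dynamic mapping on each incoming request.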

johnser

10:11 pm on Jun 8, 2003 (gmt 0)

Thanks TJ - I understood that once every dynamic page (in the format of product.asp?id=1) is linked to from a static page then it'll be spidered?

(This way the spider avoids looping on the page)

Granted that advice was only for SEs in general as opposed to just Deepbot. Is this advice also good for Deep?

I'm using a cheap Windows host for this so no mod_rewrite. There's the q.asp component, but is there an easy way of achieving the same thing if you don't have IIS/server access? E.g. ASP code I can run on my dynamic template page of .com/product.asp?id=123 to turn it into ../product/123.html

TIA
J

trillianjedi

10:38 pm on Jun 8, 2003 (gmt 0)

Out of my league now Johnser - I'm a designer not a coder - have to leave that one to someone else I'm afraid!

johnser

10:46 pm on Jun 8, 2003 (gmt 0)

oh well! Guess I'll just have to stick to marketing & forget the techie stuff for the moment :)
J

DuhKid

6:46 pm on Jun 6, 2003 (gmt 0)

The good news is the deepcrawler visited me last night. Hadn't seen her for a long time, so I greeted her with open arms.

216.239.45.4 - - [05/Jun/2003:23:19:54 -0400] "GET XXXX
htm HTTP/1.0" 200 30034 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

216.239.45.4 - - [05/Jun/2003:23:19:56 -0400] "GET XXXX
htm HTTP/1.0" 200 30034 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

216.239.45.4 - - [05/Jun/2003:23:20:00 -0400] "GET XXXX
htm HTTP/1.0" 200 30034 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"

And the bad news. She didn't stay very long.

Napoleon

6:51 pm on Jun 6, 2003 (gmt 0)

Anyone else?

entropy

7:04 pm on Jun 6, 2003 (gmt 0)

I've only got freshbot here. By the look of the content length, it's grabbing the same page. They could be stopping and re-starting.

May I ask, what PR is the page that got hit?

EDIT:
That's not a normal deep crawler IP. It's in the .45 range.
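
For anyone who wants to check their own logs the same way, here is a rough Python sketch that parses combined-format log lines and counts hits claiming a Googlebot user agent by IP, path and response size. Repeated hits with an identical size suggest the same page is being fetched again. The sample lines mirror the ones posted above, with /XXXX as a placeholder path; the regular expression and function names are assumptions for illustration, not a complete log parser.

```python
import re
from collections import Counter

# Combined log format: ip ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

def parse(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None

def googlebot_hits(lines):
    """Count hits whose user agent claims Googlebot, keyed by (ip, path, size)."""
    counts = Counter()
    for line in lines:
        hit = parse(line)
        if hit and "Googlebot" in hit["agent"]:
            counts[(hit["ip"], hit["path"], hit["size"])] += 1
    return counts

# Placeholder sample data modelled on the lines quoted in this thread.
AGENT = "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"
SAMPLE = [
    '216.239.45.4 - - [05/Jun/2003:23:19:54 -0400] "GET /XXXX HTTP/1.0" 200 30034 "-" "%s"' % AGENT,
    '216.239.45.4 - - [05/Jun/2003:23:19:56 -0400] "GET /XXXX HTTP/1.0" 200 30034 "-" "%s"' % AGENT,
    '216.239.45.4 - - [05/Jun/2003:23:20:00 -0400] "GET /XXXX HTTP/1.0" 200 30034 "-" "%s"' % AGENT,
]

for key, count in googlebot_hits(SAMPLE).items():
    print(key, count)
```

Note that this only looks at what the user agent string claims; it says nothing about whether the visitor really is a Google crawler.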

DuhKid

7:39 pm on Jun 6, 2003 (gmt 0)

"That's not a normal deep crawler IP. It's in the .45 range."

It's a 2nd level page. A PR0 casualty, probably a real PR4.

You're right that it was the same page (URL replaced with XXXX as a forum courtesy)

I looked for the deepcrawler ip range with site search, but couldn't find it.

Been so long since I've seen anything starting with 216.239.
...did I jump the gun with undue excitement? :)

If it's not a deepie, then what is it?

farside847

8:10 pm on Jun 6, 2003 (gmt 0)

that IP was not used last month for deepcrawl... looks like it might be tied to the froogle site. Here is a hit from that IP from last month:

216.239.45.4 - - [03/Apr/2003:11:18:13 -0800] "GET /XXXXXX HTTP/1.1" 200 22811 "http://www.corp.google.com/cgi-bin/asolovyova/feeds.cgi?id=FROOGLEID&user=dscotton&ticket=1943517&prompt=true&submit=Submit" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.2.1) Gecko/20021130"

Critter

8:23 pm on Jun 6, 2003 (gmt 0)

I've seen 216.239.45.4 in my logs, however it's been a regular Mozilla user agent, not Googlebot.

Perhaps they shuffled IPs over at Google. If this IP is being shown with a user agent of Googlebot I would think it would be deepcrawl related.

Peter

entropy

8:35 pm on Jun 6, 2003 (gmt 0)

My thoughts are that it's a Google employee testing for cloaking by using the googlebot UA.

skipfactor

9:20 pm on Jun 6, 2003 (gmt 0)

[webmasterworld.com...]

Kackle

12:11 am on Jun 9, 2003 (gmt 0)

The IP number 216.239.45.4 reverse-resolves to this: 216-239-45-4.google.com

It is not the googlebot, and I don't care what the user-agent says. For over a year this IP has been visiting my site about once a week, picking up tidbits that a bot never does: GIFs, a Java applet, etc. This is a live human. They even key characters into forms to do searches. If Google's robots were this smart, their index would be up to date by now.

If you're cloaking for Google based on the user-agent, you'd be better off reverse-resolving and cloaking for googlebot.com, but not for google.com.

Another one that behaves the same way is 216.239.44.189

All of the deep bot IPs I've ever seen have been 216.239.46.X, and I haven't seen it for weeks now.
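
The reverse-resolve check described above could be sketched roughly like this in Python. The function names and return labels are made up for the example, and classify_ip needs network access and a working resolver, so treat it as an outline under those assumptions rather than a tested detection method.

```python
import socket

def classify_hostname(hostname):
    """Classify a reverse-DNS name as 'googlebot', 'google', or 'other'."""
    host = hostname.lower().rstrip(".")
    # Match the domain itself or a true subdomain, not lookalikes such as evilgooglebot.com.
    if host == "googlebot.com" or host.endswith(".googlebot.com"):
        return "googlebot"
    if host == "google.com" or host.endswith(".google.com"):
        return "google"
    return "other"

def classify_ip(ip):
    """Reverse-resolve an IP address and classify the resulting hostname."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return "unresolved"
    return classify_hostname(hostname)

# The hostname reported above for 216.239.45.4:
print(classify_hostname("216-239-45-4.google.com"))  # google
```

The point of splitting it this way is that the decision is made on the resolved hostname, not on the user agent, which anyone can fake.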

Dolemite

1:39 am on Jun 9, 2003 (gmt 0)

If Google's robots were this smart, their index would be up to date by now.

Hehe...

captainhannes

10:15 am on Jun 9, 2003 (gmt 0)

No deepcrawler here in June so far, but she visited quite often last month:
Days in May: 2,3,7,8,9,10,11,12,13!,14!,15!,16!,20!,23,24,25,28
(! indicates a day with heavy activity)

And finally freshbot started to crawl pages I have up for about a year now :-)))
Backlinks are still the same as 3/4 year ago.
(It is a very, very specific site)

*bigfriendlyletters* DON'T PANIC */bigfriendlyletters* ;-)

73,
Captain / Austria

olwen

11:01 am on Jun 9, 2003 (gmt 0)

I've got Googlebot/2.1 (+http://www.googlebot.com/bot.html) from
64.68.82.54 and 64.68.82.65 on my latest visitors
I don't know if that's the freshbot or the deepbot.

But it has found a new level of pages I completed yesterday, so I'm happy that it's around
