Google News Archive Forum

    
Google's spidering pattern lately?
stcrim




msg:174292
 3:10 am on Aug 15, 2003 (gmt 0)

I have noticed that Google will quickly grab the home page of a new site and index it - sometimes in just a few days. Then it does not visit again for an extended period of time.

Does anyone have a handle on what that extended period of time is?

It is nice to see new home pages going up quickly, but ours rarely contain our real meat-and-potatoes content...

-s-

 

ciml




msg:174293
 11:31 am on Aug 16, 2003 (gmt 0)

I think you'll see quick, deep spidering only if you have high PR links to you. How much PR you need, I don't know.

kelvinhui




msg:174294
 11:39 am on Aug 16, 2003 (gmt 0)

Yes. When a high PR site links to yours, Googlebot may find you and do a quick crawl.

whizkiddo




msg:174295
 2:42 pm on Aug 18, 2003 (gmt 0)

Yeah, the same thing happened to me; the main page got indexed pretty quickly, but there was no deep spidering. I wondered if that was because of dynamic pages. But I have found that a higher PR on the main page (already indexed) will go a long way toward full indexing of the site.

BlueSky




msg:174296
 3:15 pm on Aug 18, 2003 (gmt 0)

I have a new PR0 site that I put up last month. I used mod_rewrite to make the URLs look static. Googlebot visited and grabbed the robots.txt. Two days later, he came back with his relatives and deep crawled the whole site. He comes by almost every day and deep crawls about every 7-10 days. On the other days, he grabs any new or changed pages I put up.

If you go out and get some links, I think he'll do the same for you. When I put up new pages, I make the info available to visitors on some other same-topic sites, which gives me direct links back. Google follows them back and grabs my new stuff within 24 hours. I've pretty much got enough links now that he's looking on his own for new pages.

stinky




msg:174297
 7:15 pm on Aug 18, 2003 (gmt 0)

"mod_rewrite to make the urls look static"
What is mod_rewrite, and what do you mean by static (working link)?
Also, what is "robots.txt"? I have seen it in the HTML of other web sites but did not know whether I should put it in my site. Should I? Below is an example of what I have seen in a <head> tag:

<META NAME="Robots" CONTENT="index,follow">
<META NAME="GOOGLEBOT" CONTENT="INDEX, FOLLOW">
<META NAME="Robots" CONTENT="index,follow">

If anyone has input on whether it's good to use, please let me know.
Thanks

scareduck




msg:174298
 7:25 pm on Aug 18, 2003 (gmt 0)


"mod_rewrite to make the urls look static"
What is mod_rewrite, and what do you mean by static (working link)?

mod_rewrite is an Apache webserver module that allows a URL to be rewritten to a different form. One use of this module is to convert forms like

[foo.bar...]

to

[foo.bar...]

Googlebot refuses to crawl URLs containing CGI forms because of looping issues. However, it will crawl URLs that appear to be subdirectories. It's unfair, it's wrong, but it just is.
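
To make the idea concrete, here is a minimal Python sketch of the kind of mapping such a rewrite rule performs. It is not mod_rewrite syntax, and it does not reproduce the URLs elided above: the path layout `/catalog/<category>/<id>`, the script name `/catalog.cgi`, and the parameter names `cat` and `id` are all hypothetical, chosen only for illustration.

```python
import re
from urllib.parse import urlencode

# Hypothetical layout: /catalog/<category>/<item id> is what visitors and
# crawlers see; /catalog.cgi?cat=...&id=... is what the script really serves.
STATIC_PATTERN = re.compile(r"^/catalog/([a-z0-9-]+)/(\d+)/?$")


def rewrite(path):
    """Map a static-looking path to the dynamic URL, or return it unchanged."""
    match = STATIC_PATTERN.match(path)
    if match is None:
        return path  # no rule applies; leave the request alone
    category, item_id = match.groups()
    return "/catalog.cgi?" + urlencode({"cat": category, "id": item_id})


print(rewrite("/catalog/widgets/42"))
print(rewrite("/about.html"))
```

The crawler only ever sees the subdirectory-style path; the translation to the query-string form happens on the server side.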

As to robots.txt -- it tells Googlebot whether it should NOT crawl the site. There's a debate on whether its absence will also prevent Googlebot from crawling your site, but absence is supposed to grant permission to crawl. You can read more about robots.txt at [robotstxt.org....]
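
Note that robots.txt is a plain text file served from the root of the site; it is separate from the META robots tags that go inside a page's <head>. As a small sketch of how its rules are read, the Python snippet below uses the standard library's `urllib.robotparser`. The rules, the `/private/` directory, and the example.com URLs are hypothetical, and an empty file stands in here for the "no restrictions" case.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: keep every crawler out of /private/ only.
RULES = """\
User-agent: *
Disallow: /private/
"""

print(can_crawl(RULES, "Googlebot", "http://www.example.com/index.html"))
print(can_crawl(RULES, "Googlebot", "http://www.example.com/private/notes.html"))
# An empty file has no Disallow lines, so nothing is blocked.
print(can_crawl("", "Googlebot", "http://www.example.com/index.html"))
```

This only shows how a well-behaved parser interprets the file; how any particular crawler actually behaves is up to that crawler.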

beezee




msg:174299
 8:09 pm on Aug 18, 2003 (gmt 0)

On a related note, what percent of a website's traffic would you expect to come from Googlebot? I don't recall this being discussed anywhere.

claus




msg:174300
 9:24 pm on Aug 18, 2003 (gmt 0)

>> what percent of a website's traffic would you expect to come from Googlebot?

Googlebot's traffic pattern of late can have a large impact - especially for smaller sites. It can be double-digit percentages. I'm not speaking about the visitors being sent from the SE here, but about the traffic the bot generates by itself - false pageviews that is.

With regard to visitors (humans), it depends a whole lot on the site. SE traffic in general is most important for new sites where a critical mass of new users has not been found yet, imho. As the site grows larger and more established, the percentage of new users (SE's, links) relative to repeat users (Bookmarks, addressbar) will decline.

/claus

killroy




msg:174301
 10:14 pm on Aug 18, 2003 (gmt 0)

I have some large sites with medium traffic (i.e. more pages than daily visitor numbers), and general bot traffic can make up around 60% or more of total traffic.

SN

PS: I always exclude all bots when doing my stats.
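
One rough way to do that exclusion, sketched in Python: classify each request by its user-agent string and count the rest. The marker list and the sample user agents below are illustrative assumptions, not a complete bot list; real logs will need their own.

```python
# Substrings that mark a user agent as a crawler. This list is illustrative,
# not exhaustive; extend it with whatever shows up in your own logs.
BOT_MARKERS = ("googlebot", "slurp", "msnbot", "crawler", "spider")


def is_bot(user_agent):
    """Guess whether a user-agent string belongs to a crawler."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def bot_share(user_agents):
    """Return the fraction of requests (0.0 to 1.0) made by crawlers."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_bot(agent) for agent in agents) / len(agents)


# Hypothetical sample: one user-agent string per logged page request.
sample = [
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624",
    "Mozilla/5.0 (compatible; Yahoo! Slurp)",
]

human_requests = [agent for agent in sample if not is_bot(agent)]
print(f"bot share: {bot_share(sample):.0%}")
print(f"human page views: {len(human_requests)}")
```

Matching on substrings will miss bots that disguise their user agent, so treat the result as a lower bound on bot traffic.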

austtr




msg:174302
 10:45 pm on Aug 18, 2003 (gmt 0)

Regarding msg #1 & 2... the following experience involving 3 sites, all in the same 30 day time frame in May-June, may throw some light on what happens.

One was an existing site with PR4 that had a major update. It was spidered within 2 days and continues to be spidered across all pages.

Second was a new site with a reasonable startup group of PR6/5/4 inbounds. It was spidered within a few days, the index page appeared after about 2 weeks and the other pages after about a month. The site is spidered regularly every few days. The index page has just started showing PR4.

Third was a site with basically the same group of startup links, but for some reason Google isn't seeing all of them. This site shows a PR2 and gets nothing like the same number of spider visits.

It may be that PR4 and above get regular spidering, and below PR4 gets less frequent spidering... it's too small a sample to be making statements of fact.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved