Forum Moderators: open
crawls the entire site. If the spider crawls the entire site, then the hypothesis that "[u]only 101k of your site is being crawled[/u]" is incorrect.
?
Tejas
[edited by: heini at 5:49 pm (utc) on April 24, 2003]
[edit reason] No tools please per TOS / thanks! [/edit]
Basically, the hypothesis that "Google crawls only 101k of your site" is still just a hypothesis, huh?
Hypothesize this:
Look for any keyword combination on Google.
In general, unless the site has requested that Google not cache it, the search result will show the file size.
Find one that is bigger than 101K.
For instance, certain forums *cough* slashdot *cough* have huge audiences and their comment threads can generate many thousands of comments, yet if you search for this forum, you will see there isn't a single result larger than 101K.
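The arithmetic behind that test can be sketched quickly. This is just a back-of-envelope helper, assuming the "101K" cutoff means 101 * 1024 bytes; the function name is my own:

```python
# Sketch: how much of a page would fall past the hypothesized 101K cutoff.
# The 101K figure comes from this thread; 101 * 1024 bytes is an assumption
# about what "101K" means exactly.

GOOGLE_LIMIT_BYTES = 101 * 1024


def truncated_portion(page_bytes: bytes) -> int:
    """Return how many bytes lie past the cutoff (0 if the page fits)."""
    return max(0, len(page_bytes) - GOOGLE_LIMIT_BYTES)
```

So a 216,615-byte page (like the largest Googlebot hit in the logs later in this thread) would lose over 110K of content if the hypothesis holds.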
---
edit:
by "this forum", i meant slashdot. sorry it was unclear.
[edited by: PatrickDeese at 4:59 pm (utc) on April 24, 2003]
PD>>yet if you search for this forum, you will see there isn't a single one larger that 101K.
But he was saying it's possible Googlebot CRAWLS further than 101k but will only cache the first 101k. By the test above, it seems to stop cold.
Why is that link deleted, yet stickysauce.com can stay even though it has banner advertising and links out to many other pages on the site? Isn't my tool just as valid as theirs? Especially since it performs a function that is not available elsewhere!
mysql> select timeserved, useragent, remoteip, bytes from requests where useragent like '%ooglebot%' order by bytes desc limit 10;
+---------------------+----------------------------------------------------+-------------+--------+
| timeserved          | useragent                                          | remoteip    | bytes  |
+---------------------+----------------------------------------------------+-------------+--------+
| 2003-03-18 17:25:12 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.65 | 216615 |
| 2003-03-28 12:54:02 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.48 | 214929 |
| 2003-03-18 17:35:37 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.37 | 213792 |
| 2003-03-21 12:18:41 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.79 | 211522 |
| 2003-04-13 04:15:03 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.65 | 210099 |
| 2003-04-11 08:37:30 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.65 | 208899 |
| 2003-04-11 10:06:27 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.48 | 207742 |
| 2003-03-28 12:30:40 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.41 | 202404 |
| 2003-04-06 13:18:42 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.55 | 202393 |
| 2003-04-11 09:47:30 | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 64.68.82.52 | 201193 |
+---------------------+----------------------------------------------------+-------------+--------+
10 rows in set (4.16 sec)
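For anyone not logging requests into MySQL, here is a rough equivalent of that query against a plain Apache "combined" access log. The log format and field positions are assumptions about a typical setup; adjust the regex if yours differs:

```python
import re

# Rough equivalent of the SQL query above, run against an Apache
# "combined" access log: find the largest responses served to Googlebot.
# Assumes the standard combined format:
#   ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"[^"]*" \d+ (?P<bytes>\d+) "[^"]*" "(?P<ua>[^"]*)"'
)


def top_googlebot_hits(lines, limit=10):
    """Return (bytes, ip, time, ua) for the largest Googlebot responses."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "ooglebot" in m.group("ua"):
            hits.append((int(m.group("bytes")), m.group("ip"),
                         m.group("time"), m.group("ua")))
    hits.sort(reverse=True)
    return hits[:limit]
```

Usage would be something like `top_googlebot_hits(open("access.log"))`; any result above 103,424 bytes shows Googlebot downloading past the supposed 101K mark.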
I wonder if that means that you can put stuff at the bottom of large pages that Google dislikes, but some of the other bots still like?
Rob
<emphasis added>
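One way to actually test that would be to publish a page with unique nonsense tokens placed on either side of the 101K boundary, then later search each engine for the tokens and see which ones got indexed. This is only a sketch of that idea; the token strings and padding scheme here are made up for illustration:

```python
# Sketch: build a test page with unique marker tokens before and after
# the 101K mark. Searching an engine for each token later reveals how
# far its indexer read. Token names and sizes are illustrative only.

LIMIT = 101 * 1024  # the 101K cutoff discussed in this thread


def build_test_page(token_before: str, token_after: str) -> bytes:
    head = b"<html><body>\n"
    before = token_before.encode() + b"\n"
    # Pad so the second token starts about 1K past the 101K boundary.
    pad_len = LIMIT - len(head) - len(before) + 1024
    padding = (b"filler text " * (pad_len // 12 + 1))[:pad_len]
    after = b"\n" + token_after.encode() + b"\n</body></html>\n"
    return head + before + padding + after
```

If "zqx-marker-alpha" turns up in a search but "zqx-marker-omega" never does, that bot stopped short of the second token.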