Can you explain this FreshBot behavior?

         

jamesa

1:52 am on Dec 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry if this is a newbie-ish question. I searched quite a bit and am not seeing an answer that fully explains this.

Freshbot is hitting one page, and pretty much only that one page, every few minutes, and has been doing so continuously for as long as I've been watching (the last four days at least). It will occasionally grab robots.txt or some other page, but it's just going crazy on that one page of the site. I know it's not because of a session ID in the URL or anything related; there's none of that going on. It's just a straight HTML page with no scripting at all. The other curious thing is that each time it visits, the byte count (as shown in the logs) is different, and those byte counts are only a portion of the total size (the page *is* ridiculously huge, btw).

Not complaining, just wondering if this is normal behavior and what it means.
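
For anyone wanting to check for the same pattern in their own logs, here is a minimal Python sketch. It assumes the server writes Apache-style combined log format and that the crawler's user-agent string contains "Googlebot"; the function name `googlebot_hits` is made up for illustration, and nothing here comes from the original poster's setup.

```python
import re
from collections import defaultdict

# Assumed layout (combined log format):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines):
    """Map each requested path to the list of byte counts served to Googlebot."""
    hits = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        # Skip lines that do not parse or that come from other user agents.
        if m is None or "Googlebot" not in m.group("agent"):
            continue
        size = m.group("bytes")
        # A "-" in the bytes field means no body was sent; count it as 0.
        hits[m.group("path")].append(0 if size == "-" else int(size))
    return dict(hits)
```

Feeding it the lines of an access log (for example `googlebot_hits(open("access.log"))`, where the file name is a placeholder) gives one list of byte counts per URL, so a page that is fetched repeatedly with a different partial size each time stands out right away.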

Oaf357

2:54 pm on Dec 13, 2003 (gmt 0)

10+ Year Member



Since the page is huge (over 100k?) this makes some sense.

It would appear that Google is trying desperately to index the entire page (and having a hard time doing it, based on the changing byte counts).

It might be a good idea to break the page up into a few smaller pages so that all of the data is indexed and represented well.

Good luck.

jamesa

10:35 pm on Dec 13, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Very interesting. Thanks, Oaf357.

Stefan

2:13 am on Dec 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> (the page *is* ridiculously huge, btw).

How ridiculously huge? Google will usually take the first ~100k of an HTML page. I have one at 411K that Google only has part of. It doesn't faze the bot at all, it just doesn't index the whole thing.

If it's hitting one page every few minutes, over and over, non-stop, it almost sounds like googlebot has decided it wants to mate with the page or something (maybe that particular googlebot is in heat). It's sure not normal. Try chopping it up as was suggested and see if that helps.

How big is the html page?

jamesa

3:28 am on Dec 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> How big is the html page?

hehe, almost a meg (don't shoot me, I'm only the piano player :)). The best I could do is get it down between 100k and 200K. Did that several hours ago and it's still going strong. But now it's eating the whole page in one bite.

>> it almost sounds like googlebot has decided it wants to mate with the page or something

Kinda what I'm hoping... would be a good addition to the family. Seriously though, I don't know if I should be worried or happy.

Stefan

3:33 am on Dec 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Start knitting little sweaters and socks and stuff, there are babybots on the way... :-)

ADDED: Sorry for my absurdness. If at all possible, keep pages to 100kB or less; that way Google will be able to find and index all the text. Almost a meg? What's on it, a volume of the Encyclopedia Britannica? :-) Sheesh, chop it into chapters.
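
A rough Python sketch of the "chop it into chapters" idea, for illustration only. It assumes the page content is already available as a list of paragraph strings and takes the ~100k figure mentioned in this thread as the budget; that figure is the posters' observation, not a documented limit, and `split_into_pages` is a hypothetical helper name.

```python
def split_into_pages(paragraphs, max_bytes=100_000):
    """Greedily group paragraphs into pages whose UTF-8 size stays within max_bytes.

    A single paragraph larger than max_bytes still gets a page of its own,
    so that one page can exceed the budget.
    """
    pages, current, size = [], [], 0
    for para in paragraphs:
        para_size = len(para.encode("utf-8")) + 1  # +1 for the joining newline
        # Start a new page when adding this paragraph would exceed the budget.
        if current and size + para_size > max_bytes:
            pages.append("\n".join(current))
            current, size = [], 0
        current.append(para)
        size += para_size
    if current:
        pages.append("\n".join(current))
    return pages
```

The budget counts only the text itself; real pages also carry markup, navigation and headers, so a lower `max_bytes` leaves some headroom.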

Oaf357

4:10 am on Dec 14, 2003 (gmt 0)

10+ Year Member



I definitely wouldn't eliminate content, just break it up into a few pages.

I would say it's a good thing if Google is trying to pull down the entire page.

jamesa

6:03 am on Dec 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> Sheesh

Yeah, I'm tempted to put "best viewed with an OC-48" on there, heh. Did get the file size down to the low 100s. Now it seems to be grabbing other pages more often; don't know if there's a correlation. Still loving that one page, however. Going to sleep, will see what it does overnight. Thanks again for all the input so far.