Freshbot is hitting one page, and pretty much only that one page, every few minutes, and has been doing so continuously for as long as I've been watching (the last four days at least). It occasionally grabs robots.txt or some other page, but it's just going crazy on that one page of the site. I know it's not because of a session ID in the URL or anything related - there's none of that going on. It's just a straight HTML page - no scripting at all. The other curious thing is that each time it visits, the byte count (as shown in the logs) is different, and those byte counts are only a portion of the total size (the page *is* ridiculously huge, btw).
Not complaining, just wondering if this is normal behavior and what it means.
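If anyone wants to check the same thing in their own logs, here is a rough Python sketch that tallies the byte counts per URL for one crawler. It assumes the Apache/NCSA "combined" log format and matches on a substring of the user agent; the function name `summarize_bot_hits` and the sample lines are made up for illustration, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import defaultdict

# Assumption: Apache/NCSA "combined" log format. Other formats need a different pattern.
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_bot_hits(lines, agent_substring="Googlebot"):
    """Return {path: [byte counts]} for requests whose user agent matches."""
    hits = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not look like combined-format entries
        if agent_substring.lower() not in m.group("agent").lower():
            continue  # not the crawler we are interested in
        size = m.group("bytes")
        # A "-" in the bytes field means no body was sent; count it as 0.
        hits[m.group("path")].append(0 if size == "-" else int(size))
    return dict(hits)

# Hypothetical sample lines, not taken from any real log.
sample = [
    '192.0.2.1 - - [01/Jan/2000:00:00:00 +0000] "GET /big.html HTTP/1.0" 200 104857 "-" "Googlebot/2.1"',
    '192.0.2.1 - - [01/Jan/2000:00:05:00 +0000] "GET /big.html HTTP/1.0" 200 98304 "-" "Googlebot/2.1"',
    '192.0.2.7 - - [01/Jan/2000:00:06:00 +0000] "GET /big.html HTTP/1.0" 200 950000 "-" "Mozilla/4.0"',
]

for path, sizes in summarize_bot_hits(sample).items():
    print(path, len(sizes), "hits, sizes:", sizes)
```

If the list of sizes for one path keeps changing and never reaches the file's real size, that matches the partial-fetch pattern described above.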
It would appear that Google is trying desperately to index the entire page (and having a hard time doing it, based on the changing byte counts).
It might be a good idea to break the page up into a few smaller pages; that way all of the data is indexed and represented well.
Good luck.
>> (the page *is* ridiculously huge, btw).
How ridiculously huge? Google will usually take the first ~100k of an HTML page. I have one at 411K that Google only has part of. It doesn't faze the bot at all, it just doesn't index the whole thing.
If it's hitting one page every few minutes, over and over, non-stop, it almost sounds like googlebot has decided it wants to mate with the page or something (maybe that particular googlebot is in heat). It's sure not normal. Try chopping it up as was suggested and see if that helps.
How big is the html page?
hehe, almost a meg (don't shoot me, I'm only the piano player :)). The best I could do is get it down between 100k and 200K. Did that several hours ago and it's still going strong. But now it's eating the whole page in one bite.
>> it almost sounds like googlebot has decided it wants to mate with the page or something
Kinda what I'm hoping... would be a good addition to the family. Seriously though, I don't know if I should be worried or happy.
ADDED: Sorry for my absurdness. If at all possible, keep pages to 100kB or less; that way Google will be able to find and index all the text. Almost a meg? What's on it, a volume of the Encyclopaedia Britannica? :-) Sheesh, chop it into chapters.
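For the "chop it into chapters" part, a minimal Python sketch of one way to do it: greedily pack sections into pages so each page stays under a byte budget. The 100,000-byte default is just the rule of thumb from this thread, not a documented limit, and `pack_sections` is a made-up helper name. It assumes you have already split the content into sections (say, at headings) yourself.

```python
def pack_sections(sections, max_bytes=100_000):
    """Greedily group sections into pages of at most max_bytes (UTF-8) each.

    A single section larger than max_bytes still gets a page of its own,
    so it would need to be split further by hand.
    """
    pages, current, current_size = [], [], 0
    for section in sections:
        size = len(section.encode("utf-8"))
        # Start a new page if adding this section would exceed the budget.
        if current and current_size + size > max_bytes:
            pages.append(current)
            current, current_size = [], 0
        current.append(section)
        current_size += size
    if current:
        pages.append(current)
    return pages

# Hypothetical sections of 60k, 60k and 30k bytes.
demo = ["a" * 60_000, "b" * 60_000, "c" * 30_000]
for i, page in enumerate(pack_sections(demo), start=1):
    print("page", i, "has", sum(len(s.encode("utf-8")) for s in page), "bytes")
```

This keeps sections in their original order, which matters if the pages are meant to read as chapters.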
Yeah, I'm tempted to put "best viewed with an OC-48" on there, heh. Did get the file size down to the low 100s. Now it seems to be grabbing other pages more often; don't know if there's a correlation. Still loving that one page, however. Going to sleep, will see what it does overnight. Thanks again for all the input so far.