Forum Moderators: Robert Charlton & goodroi

Any ideas why only half a page is cached?

jaimes

1:40 pm on Aug 17, 2006 (gmt 0)

10+ Year Member



Does anyone have thoughts on why only the top half of the homepage is cached by Google, while the rest below does not show up on the cache page? It's the first time in 4 years I've seen this on the site, gotobaby.

We are currently switching DNS servers, but the cache is from several days ago, before that was going on. Rankings have also dropped on our main keyword, and I'm not sure if this is related.

Much appreciated

Brett_Tabke

1:48 pm on Aug 17, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



We really don't care for URL examples here, as it leads to direct spam. However, it sure looks like you have found a problem we can all learn from.

The site is on your profile, right?

This is the third time in a few weeks I have seen a thread like this come through. I don't get why your page isn't cached completely [64.233.187.104]. I am almost ready to call it an HTML-stripping bug on Google's part.

The only other thing I could think of would be a CMS error on your server side or a code error. Are you doing any bot detection or cloaking? The weird part is that the non-cached part starts right where there is an equiv content break and a code error.

jaimes

1:57 pm on Aug 17, 2006 (gmt 0)

10+ Year Member



I didn't mean to post the name of the site, but since I couldn't post a link, I thought you should be able to see it so that what I explained made sense.

The server I have been on until today has had too many interruptions lately, which is why I am switching to a new CFMX server, but the stoppage is so specific about where the cache ends that I'm not sure that played a role.

There is no cloaking or anything, so I'm still searching for thoughts or similar situations people have had.

Brett_Tabke

2:07 pm on Aug 17, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



It looks like that is coming out of a CMS, though, right?

Has this been a consistent issue with Google for a while?

encyclo

2:13 pm on Aug 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The precise cut-off point for the cached page is the start of a nesting error in the markup. From the validator:

# Error Line 1539 column 134: end tag for "SPAN" omitted, but its declaration does not permit this.

... located in Manhattan.</span></P></TD>

* You forgot to close a tag, or
* you used something inside this tag that was not allowed, and the validator is complaining that the tag should be closed before such content can be allowed.

The next message, "start tag was here", points to the particular instance of the tag in question; the positional indicator points to where the validator expected you to close the tag.

# Info Line 1518 column 32: start tag was here.

<TD><SPAN class="style2"><span class="V11Purple">A B

I wonder if the bug would be fixed if you closed the span element properly.

tedster posted about a similar problem a few months back which involved the span element:

[webmasterworld.com...]
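This kind of nesting error is easy to catch before a spider trips over it. A minimal sketch (not the W3C validator itself) using Python's standard-library html.parser to track open tags and flag the same sort of mismatch the validator reported -- the tag names and class attributes are taken from the snippet quoted above:

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they stay off the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "param", "source", "track", "wbr"}

class NestingChecker(HTMLParser):
    """Track open tags and report mismatched or unclosed elements."""
    def __init__(self):
        super().__init__()
        self.stack = []    # (tag, line, column) of currently open tags
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, *self.getpos()))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            # End tag arrived while a different element was still open.
            self.errors.append(f"unexpected </{tag}> at line {self.getpos()[0]}")

    def close(self):
        super().close()
        for tag, line, col in self.stack:
            self.errors.append(f"<{tag}> opened at line {line} never closed")

# Reproduces the quoted markup: two spans opened, only one closed
# before </td> arrives.
checker = NestingChecker()
checker.feed('<td><span class="style2"><span class="V11Purple">A B'
             ' located in Manhattan.</span></td>')
checker.close()
print(checker.errors)
```

A rough check like this run over a CMS template would flag the unclosed span long before anyone has to diff a Google cache page against the live site.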

jaimes

2:32 pm on Aug 17, 2006 (gmt 0)

10+ Year Member



Thanks, I'll take a look at that and get back to you. Not sure how it could have changed; it's been the same for a long time.

Let me know if you think of anything else in the meantime.

g1smd

7:28 pm on Aug 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My guess would have been:

- the page is more than 100KB long (but I have seen cache sizes up to 1220KB here and there sometimes, yes 1.2MB).
- an HTML code problem that stops the bot spidering the rest of the page.
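The first guess is simple to rule in or out by measuring the page's byte size against that roughly-100KB figure. A minimal sketch -- the exact cutoff Googlebot used was never documented, and the stand-in HTML string below would be a saved copy of the real page in practice:

```python
# ~100KB is the commonly cited (undocumented) cache cutoff of the era.
LIMIT = 100 * 1024

# Stand-in for a saved copy of the homepage; in practice, read the real file.
html = "<html>" + ("x" * 200_000) + "</html>"
size = len(html.encode("utf-8"))

print(f"{size} bytes -> {'over' if size > LIMIT else 'under'} the limit")
```

If the page comes in well under the limit, as in this thread, the markup-error explanation becomes the stronger candidate.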

Your example puts to rest the age-old question:

"Does non-valid code harm my site in Google?"

YES, it does.

tedster

8:11 pm on Aug 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



not sure how it could have changed, been the same for a long time.

Your page may have been the same for a long time, but that's not so for Google. For example, Google recently moved to a whole new infrastructure (Big Daddy). As part of that move, they re-wrote their spider code, and I assume that includes error-recovery routines.

Although being totally "W3C valid" is not a magic pill of some kind (Matt Cutts recently confirmed it is not), finding and removing real errors can be essential -- especially things like partial or missing tags, nesting errors and the like. Just because a browser renders the page (each browser has its own error recovery) doesn't mean Google will handle the errors the same way. After all, Google is not even trying to display the page, only to take apart the pieces and put them through a very complex ranking algo.

encyclo

8:50 pm on Aug 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It is also important to check your server logs, especially if you have access to the access log for the date of the cached page. Do you have any "206 Partial Content" responses, or were there any database errors? It could simply be that only a partial page was served to Googlebot due to system resources or errors on your server.
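If the access log is in the common combined format, a short script can pull out exactly those hits. A minimal sketch -- the sample log lines, IP address, and byte counts are made up for illustration; in practice you would read the real log file:

```python
import re

# Combined Log Format: the status code is the first 3-digit field
# after the quoted request line.
LOG_LINE = re.compile(r'"[^"]*" (?P<status>\d{3}) ')

# Hypothetical sample lines standing in for a real access log.
sample = [
    '66.249.66.1 - - [14/Aug/2006:07:02:11 +0000] "GET / HTTP/1.1" 206 48213 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [14/Aug/2006:07:05:40 +0000] "GET / HTTP/1.1" 200 96110 "-" "Googlebot/2.1"',
]

# Keep only Googlebot requests that got a 206 Partial Content response.
partial = [line for line in sample
           if "Googlebot" in line
           and (m := LOG_LINE.search(line))
           and m.group("status") == "206"]

print(len(partial))  # one partial-content response in this sample
```

A 206 to Googlebot on the date of the cache snapshot would point at the server rather than at Google's parser.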

jaimes

8:54 pm on Aug 17, 2006 (gmt 0)

10+ Year Member



I'll make some changes, look through the logs to see if the bots have a smoother path, and give an update.

Keep the thoughts coming, though, in case this is not the problem. Thanks.

trinorthlighting

10:12 pm on Aug 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Being W3C-compliant means your site is 100% viewable in a text browser. Something Matt did mention....

My take: I validate all my pages; it helps me catch the simple errors that can tank a site.

jadebox

10:21 pm on Aug 17, 2006 (gmt 0)

10+ Year Member



"Does non-valid code harm my site in Google?"

YES, it does.

The question has always been whether there is any benefit to having pages which validate. I don't think there is. But that's not the same as saying that invalid code isn't harmful. :-)

-- Roger

jaimes

5:10 pm on Aug 21, 2006 (gmt 0)

10+ Year Member



We cleaned up the span error - do you see any invalid code left? Also, I looked in the log files for "206 Partial Content", but I'm not sure what to look for, so I haven't found anything yet that stood out.

It still shows an Aug 14th cache, so I'm not sure if it has completed caching the whole page. Like I said, nothing has changed for a long time on our end, and our rankings have been solid in the top 4 for over 2 years on keywords with over 200 million competing results, like baby gifts.

If you have any other thoughts or comments on the partial cache, please let me know.
Thanks very much in advance, as always.

J