|Pages Updating Daily - An Issue with G and other SE's?|
Is there such a thing as "too many updates" per site?
My users enter the site from any direction; the site is heavy on user contribution, and thus the content in some sections is constantly changing.
Now, showing a list of current changes in the left hand navigation that is present on every page would mean: all (complete) pages change all the time (at least for this section).
Is there a potential problem with this with G?
I'm asking because G only crawls about 10% as heavily as all the others.
That's a good question. I'd like to know the answer to this one too.
Ahh heck! No I wouldn't.
I'll just focus on Yahoo and MSN for now. I'll let G do what it wants to do.
On one of his videos discussing A/B split testing [video.google.com], Matt Cutts made a comment that "anytime we go to a page and see different content, or reload a page and see different content every time, that does look a little strange."
Now, saying something looks "a little strange" is not the same as saying it "will get you in trouble", but it does give me pause.
In our news site we have a related links section underneath each article in our database, the related links change every time our page is loaded. Is this a problem? We are looking for answers because although our entire site is indexed, the majority of our articles are in supplemental results. Could this be the reason?
If Matt wouldn't say it "definitely is a problem" then I'm not about to say it either. Maybe it's more of a problem for lower PR pages and becomes more trusted and less problematic if there are all kinds of trusted inbound links and higher PR. That sounds reasonable to me and would fit with other Google patterns that I notice.
But I will offer some food for thought. If the page is never the same, and especially if its outbound links are always changing, then how can the algorithm score it dependably? If in doubt, I might try loading those "related links" in an iframe so they are not in the same html document. It's worth a test, especially if the page is already having troubles.
On the other hand, Tedster, constantly / regularly changing related links on news story pages is recommended practice editorially for news websites. Present the most relevant and latest related stories to the reader.
I would hate it if something that has shown me over 6-7 years to be a good editorial practice has to be modified to suit G algos.
Wanderingmind, do those related story links change EVERY time a page is loaded? That is the concern in this thread. Of course frequent changes on news related home pages are to be expected -- but frequent and "every time" are not the same thing.
Still, if someone did decide to use an iframe to offer this kind of information, the user experience would not be affected at all.
Are iframes kosher with Google? I haven't kept up with these things for a while. It seems borderline cheating, since a different page is shown to the users and a different one to Google -- if they cannot see or index the iframe.
Sure -- and it's a way of not "contaminating" the content of the main html document. Even though the iframe content looks like it's on the "same page", as far as a spider is concerned it's really on its own url.
I need to make an apology. I just registered that the title of the thread is about "daily" updating, and not "constant" updating. I based my early comments on misunderstanding what "all the time" meant in this case. I hope what I said is clear enough to help people looking at this issue.
I still think that I would lean toward an iframe approach so the algorithm has some firm meat to sink its teeth into, but there's no doubt that Google can handle regularly updating Home Pages in news sites -- especially if that Home Page has lots of trusted inbound links and PR.
I believe that Google is experiencing some technical problem, because so many bad or irrelevant results are coming up in the SERPs.
Apart from this, Google is showing Supplemental Results for all new and fresh copyrighted content on new websites which have no similarity to any other. There is also a problem with the site: search command.
So if you are scared of the results, stay cool for some time, watch the updates and hope for the best.
The more I think about it, the more questions pop up.
If a site is designed to refresh parts of the content regularly, what would be considered a healthy portion of the served page to be changing frequently? 10%, 30%, 50%?
Also, how many pages within a site (if not all, as in my case, since about 2% of the total page content is flexible/updating) are considered ok to behave that way? Again, a percentage if possible?
On top: Is there any SE perceived difference in refreshing content via Java, PHP, ASP, RSS? How?
And there would certainly be a difference with respect to the page's PR and linkage (external vs. internal) -- news pages with external links being treated differently than internal links (like in my case, where I show current updates, very much like WebmasterWorld's homepage or category overview pages).
And finally: SE's loving and supporting fresh content, how does that fit with this whole discussion here?
|Is there a potential problem with [a constantly changing page] with G? |
Every page on my site is like this, and my experience has been "Yes". I also think that I've found the means to resolve it.
My site's situation:
An inclusion of a "Top Ten Pages Today" panel on each page, which also includes a per-page hit-counter plus a site-wide hit-counter. Also, the inclusion of a "Latest Additions" panel.
There are 2 issues that affect SEs:
- On-page content changing between bot-accesses.
- Last-Modified headers changing.
I would contend that the last is more important than the first.
The type of pages that we are talking about are dynamically-produced (PHP, ASP, etc) pages. By default, static HTML pages will produce a reliable--which is to say stable--Last-Modified header. In that situation, the SE bot will receive a 304-status header (page not changed) and will go away perfectly content. That is not the case with a dynamic page. By default, a dynamic page will never produce a 304-header even if the page-content is unchanged. This single fact is the cause of many of the WebMaster complaints recorded on this board, such as "Google has downloaded my 100-page site 10,000 times this month" (I exaggerate, but not much).
The scenario that has led to this thread is so common that the HTTP 1.1 authors introduced a construct specifically to account for it: Weak ETags [w3.org]. A Strong ETag says "This page is byte-by-byte the same as the previous page", and a Weak ETag says "This page is essentially the same as the previous page". The Last-Modified headers from HTTP 1.0 are to be considered as structurally equivalent to Weak ETags.
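To make that strong/weak distinction concrete, here is a minimal sketch of the two validator comparisons as HTTP 1.1 describes them. (Python standing in here; the thread's subject is PHP, but the logic is language-independent.)

```python
def weak_compare(etag_a: str, etag_b: str) -> bool:
    """Weak comparison: two ETags match if their opaque values are
    equal, ignoring any W/ (weak) prefix."""
    strip = lambda tag: tag[2:] if tag.startswith('W/') else tag
    return strip(etag_a) == strip(etag_b)

def strong_compare(etag_a: str, etag_b: str) -> bool:
    """Strong comparison: both ETags must be strong (no W/ prefix)
    and byte-for-byte equal."""
    if etag_a.startswith('W/') or etag_b.startswith('W/'):
        return False
    return etag_a == etag_b

# A weak tag matches its strong counterpart weakly, but never strongly:
weak_compare('W/"v1"', '"v1"')    # True  - "essentially the same"
strong_compare('W/"v1"', '"v1"')  # False - not byte-by-byte identical
```

The weak comparison is enough for an ordinary conditional GET (answering with a 304); the strong comparison is what sub-range requests (206) require.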
To keep this post reasonably-short, the bottom line is:
If you produce static HTML pages, your web-server software will take care of the Content-Negotiation (production of 304-headers, etc) for you. If you produce pages dynamically, then it is your responsibility to take care of Content-Negotiation.
A Content-Negotiation Class for PHP is here [webmasterworld.com]. The latest version can be downloaded here [modem-help.freeserve.co.uk] (currently at v0.12.2).
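For readers who don't want the full Class, the core of what a dynamic script must do by hand is small. This is an illustrative sketch in Python (not the PHP Class linked above) of the minimum: emit a Last-Modified header, and answer an If-Modified-Since request with a 304 when the content is unchanged:

```python
from email.utils import formatdate, parsedate_to_datetime

def conditional_get(if_modified_since, last_modified_ts):
    """Decide between 200 (full page) and 304 (headers only) for a
    dynamic page. last_modified_ts is the Unix time the *content*
    last changed - not the time the script ran, which is the trap
    that makes default dynamic pages uncacheable."""
    headers = {'Last-Modified': formatdate(last_modified_ts, usegmt=True)}
    if if_modified_since:
        try:
            ims = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            ims = None
        if ims is not None and last_modified_ts <= ims:
            return 304, headers   # client's copy is current: no body
    return 200, headers           # send the full page

# First visit: no conditional header, so a full 200.
status, headers = conditional_get(None, 1_000_000.0)
# Revisit with the date we were given: 304, nothing to re-download.
status2, _ = conditional_get(headers['Last-Modified'], 1_000_000.0)
```

The point the thread keeps returning to is the first comment: the date must track the content, not the script execution.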
|If you produce static HTML pages, your web-server software will take care of the Content-Negotiation (production of 304-headers, etc) for you. If you produce pages dynamically, then it is your responsibility to take care of Content-Negotiation. |
Yes, your web-server should take care of the production of 304 responses, etc. correctly for static pages. However, it might be a good idea to check how your server is really responding. If you are using server-side includes on pages (e.g. .shtml), you may not be responding with a 304.
I will check the response. But first I took off the scripts and will observe any change in g-bot behavior, reporting back to this thread in a couple of weeks.
AlexK, when you say you had problems, what do you mean? Ranking problems or just bandwidth? Look at those pages with "Script executed in #*$! seconds" or date and time... I would hate it if Google went that way. I could see a problem if links changed every time Google accessed it, and if this was done sitewide Google could have a problem with the PR.
|AlexK, when you say you had problems, what do you mean? |
How much time do you have?!
How about pages going supplemental? There have been occasions in the past when virtually the whole of my site was MIA from the index.
|Look at those pages with "Script executed in #*$! seconds" or date and time... |
Both feature on my site.
A clear view on this picture re: my site is muddied by past canonical issues, dupe content, etc etc and does not really help illuminate this thread's topic. In this thread, I'm promoting just one issue, which is neatly highlighted by the "my site has only 100 pages, and this month G took 1,000 hits on it" type threads. Again:
A web-server with default settings will correctly handle 304's and other such important Content-Negotiation issues, but only for static pages. With dynamic sites (PHP, ASP, SHTML, whatever) the web-programmer has to handle all those issues within the web-scripts.
The OP has not yet indicated whether the above is an issue for them or not.
The interesting follow-on, I would think, would be whether a dynamic site which does not implement Content-Negotiation would be penalised by the SEs. The impact on bandwidth would (I hope) be self-evident.
"...the web-programmer has to handle all those issues within the web-scripts..."
Which would be your favorite methods then?
Alex, do you see anything wrong here?
#1 Server Response: [myurl.com-page...]
HTTP Status Code: HTTP/1.1 200 OK
Date: Wed, 27 Sep 2006 18:39:09 GMT
Set-Cookie: csuv=visitor; expires=Thu, 28 Sep 2006 18:39:09 GMT
Set-Cookie: PHPSESSID=af43******************4; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
|Which would be your favorite methods then? |
The absolute minimum that a web-server needs to handle is to provide 304s (which means responding to If-Modified-Since, plus sending Expires headers) for unchanged content. As walkman first said, that is good for bandwidth reasons, if no other.
Next is probably Cache-Control, although it is also important that the pages report the correct Charset and Language (headers as well as HTML).
I do not know of any shortcuts for this, although for PHP pages the link that I gave will be a godsend. If you want some idea of the full menu, have a look at the PHP Class [modem-help.freeserve.co.uk] and weep!
Content-Negotiation is a complexity which the operators of static (HTML) page websites are shielded from by their web-server software. My experience of running a dynamic site is that it is a complexity which those website operators need to take on-board, else suffer the consequences.
|Alex, do you see anything wrong here? |
I do not see anything "wrong", but it produces a page which will never be cached, nor ever return a 304 or 206.
|Expires: Thu, 19 Nov 1981 08:52:00 GMT |
- date in the past
|Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0 |
- explicit do not cache instruction, which will affect both proxies and browser
It's the sort of header that my site returns on the admin pages, which explicitly do not want to be cached. I assume that the reason your page is like that is the "PHPSESSID" in the cookie (it occurs by default, if my memory serves me correctly).
If I were to get really anal about it:
|Vary: Accept-Encoding |
- yet is not gzipped
- better that there is also a charset declaration at that point
- no Content-Length header
- no Content-Language header
- no Last-Modified header (understandable, since no-cache, but a good habit to get into)
[edited by: AlexK at 10:45 pm (utc) on Sep. 27, 2006]
It is actually gzipped. Tested it and I got this:
"Original Size: 37 K
Gzipped Size: 8 K
Data Savings: 78.38%"
Plus, virtually all the pages that Google gets are about 7-8k. As far as caching goes, I don't mind since I have plenty of bandwidth. As for the 304: I guess I could tell the programmer to modify it to check whether the page has been updated or not, and then issue the appropriate code.
The thing is that now I have removed the code that changed each time the page was reloaded, so at most the pages change once a day. I think Google treats the block that changes as part of the template and sort of discounts it.
304 response: can I test that myself and how?
Your general pages may well be, but that page, with those headers, is not (unless the Content-Encoding header was missed off).
For example, here are the headers for this forum page when I looked at it:
(the headers that refer to gzip are in italics; Vary is there for the sake of Proxies, and Content-Encoding is there for the browser) (the example headers that you provided included the Vary header, but not the Content-Encoding header, which is why I made that comment).
|Response Headers - [webmasterworld.com...] |
Date: Thu, 28 Sep 2006 12:08:03 GMT
X-Powered-By: BestBBS v4.00
Content-Type: text/html; charset=ISO-8859-1
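The relationship between those two headers can be sketched as follows -- Vary tells proxies that the response differs per client, Content-Encoding tells the browser what it actually got. (`encode_response` below is a hypothetical helper for illustration, not a real API, and Python again stands in for the server-side language.)

```python
import gzip

def encode_response(body: bytes, accept_encoding: str):
    """Gzip the body only when the client advertises support, and set
    the headers discussed above: Vary for proxies, Content-Encoding
    for the browser, plus a Content-Length in either case."""
    headers = {'Vary': 'Accept-Encoding'}   # always, so proxies cache per-encoding
    if 'gzip' in accept_encoding.lower():
        body = gzip.compress(body)
        headers['Content-Encoding'] = 'gzip'
    headers['Content-Length'] = str(len(body))
    return body, headers

# A gzip-capable client gets a compressed body plus both headers:
body, hdrs = encode_response(b'<html>' + b'x' * 1000 + b'</html>',
                             'gzip, deflate')
# A client with no Accept-Encoding gets the plain body - but still Vary:
plain, hdrs2 = encode_response(b'<html>hello</html>', '')
```

Sending Vary without Content-Encoding (as in the headers quoted earlier in the thread) is exactly the mismatch that suggests the page was not actually gzipped.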
|As far the 304: I guess I could tell the programer to modify it... |
Not on that page you cannot - it will never return a 304, due to the presence of a PHP-Session [uk.php.net] (remember the PHPSESSID?).
Here is a brief 304 tutorial:
The server sends a Last-Modified Response header together with the page. Also included is an Expires header to say how long that Last-Modified is valid for. This is an example of what it could look like:
|Last-Modified: Sat, 20 May 2006 23:00:00 GMT |
Expires: Thu, 28 Sep 2006 12:08:03 GMT
A browser, proxy or SE simply will not re-request this page until the Expires time passes, unless other headers tell it otherwise.
After the Expiry date, if the page is re-requested, an If-Modified-Since Request header (same date as Last-Modified) will be included with the request for the page. If the page is unchanged, the server will respond with a 304 status header and no content - just headers, including a new Expires header, etc. The browser, proxy or SE will re-use its cached copy, and the whole process starts from scratch.
Now, look at the headers for this page (above) and you will see neither Last-Modified nor Expires headers - Brett does not want this page to be cached, nor to return a 304.
Look now back at the headers you supplied in post#:3099422 - Last-Modified is missing, and the Expires is in the past. That is a belt 'n' braces way of making sure of no-cache and also no 304.
Phew! Long post. HTH.
[edited by: AlexK at 1:07 pm (utc) on Sep. 28, 2006]
|304 response: can I test that myself and how? |
Get yourself a browser which is designed to show them. I use Mozilla, and Firefox will also work. Then, install the extensions that display them. Web Developer is the all-purpose essential, and Live HTTP Headers specifically for the headers.
Then, request a page. Then, re-request a page. Examine the headers in both cases.
There is also the obvious look at the (server) logfiles. No 304s means no Content-Negotiation.
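For a scripted alternative to browser extensions, the check can be automated. The following sketch uses only the Python standard library: it serves a file from a throwaway local server (the built-in static handler answers If-Modified-Since correctly from Python 3.8 onward), fetches the page, then re-requests it with If-Modified-Since set to the Last-Modified value it received. A correctly negotiating server answers the second request with a 304 and no body; against your own dynamic pages, a 200 on the re-request means no Content-Negotiation.

```python
import http.client
import http.server
import os
import tempfile
import threading
from functools import partial

# Stand-in server: a temp directory served by the stdlib static
# handler, which implements If-Modified-Since / 304 (Python 3.8+).
tmpdir = tempfile.mkdtemp()
with open(os.path.join(tmpdir, 'page.html'), 'w') as f:
    f.write('<html><body>hello</body></html>')

handler = partial(http.server.SimpleHTTPRequestHandler, directory=tmpdir)
server = http.server.HTTPServer(('127.0.0.1', 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# 1st request: a plain GET should give 200 plus a Last-Modified header.
conn = http.client.HTTPConnection('127.0.0.1', port)
conn.request('GET', '/page.html')
resp = conn.getresponse()
last_modified = resp.getheader('Last-Modified')
resp.read()

# 2nd request: conditional. A negotiating server answers 304, no body.
conn2 = http.client.HTTPConnection('127.0.0.1', port)
conn2.request('GET', '/page.html',
              headers={'If-Modified-Since': last_modified})
resp2 = conn2.getresponse()
body2 = resp2.read()

print(resp.status, resp2.status, len(body2))   # expect: 200 304 0
server.shutdown()
```

Point the same two requests at a live URL instead and the second status tells you immediately whether your scripts are doing their job.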
[edited by: AlexK at 1:09 pm (utc) on Sep. 28, 2006]
Thanks bunches AlexK!