Forum Moderators: Robert Charlton & goodroi
Now, showing a list of current changes in the left-hand navigation that is present on every page would mean that all (complete) pages change all the time (at least for this section).
Is there a potential problem with this with G?
Asking because G only crawls about 10% as heavily as all the other engines.
Thanks!
Now, saying something looks "a little strange" is not the same as saying it "will get you in trouble", but it does give me pause.
I would hate it if something that 6-7 years of experience has shown me to be good editorial practice had to be modified to suit G's algorithms.
Still, if someone did decide to use an iframe to offer this kind of information, the user experience would not be affected at all.
I still think that I would lean toward an iframe approach so the algorithm has some firm meat to sink its teeth into, but there's no doubt that Google can handle regularly updating Home Pages in news sites -- especially if that Home Page has lots of trusted inbound links and PR.
Apart from this, Google is showing supplemental results for completely new, fresh, copyrighted content on a new website that has no similarity to any other. There is also a problem with the site: search command.
So if you are worried about the results, stay calm for a while, watch the updates, and hope for the best.
Thanks
If a site is designed to refresh parts of the content regularly, what would be considered a healthy portion of the served page to be changing frequently? 10%, 30%, 50%?
Also, how many pages within a site (if not all, as in my case, because of the 2% flexible/updating portion of the total page content) are considered OK to behave that way? Again, a percentage if possible?
On top of that: is there any SE-perceived difference in refreshing content via JavaScript, PHP, ASP, or RSS? How?
And there would certainly be a difference with respect to the page's PR and linkage (external vs. internal), with news carrying external links being treated differently than internal links (as in my case, where I show current updates, much like WebmasterWorld's homepage or category overview pages).
And finally: SEs love and support fresh content, so how does that fit with this whole discussion here?
Is there a potential problem with [a constantly changing page] with G?
My site's situation:
The inclusion of a "Top Ten Pages Today" panel on each page, which also includes a per-page hit counter plus a site-wide hit counter. Also, the inclusion of a "Latest Additions" panel.
There are 2 issues that affect SEs:
The type of pages that we are talking about are dynamically-produced (PHP, ASP, etc) pages. By default, static HTML pages will produce a reliable--which is to say stable--Last-Modified header. In that situation, the SE bot will receive a 304-status header (page not changed) and will go away perfectly content. That is not the case with a dynamic page. By default, a dynamic page will never produce a 304-header even if the page-content is unchanged. This single fact is the cause of many of the WebMaster complaints recorded on this board, such as "Google has downloaded my 100-page site 10,000 times this month" (I exaggerate, but not much).
The scenario that has led to this thread is so common that the HTTP 1.1 authors introduced a construct specifically to account for it: Weak ETags [w3.org]. A Strong ETag says "This page is byte-by-byte the same as the previous page", and a Weak ETag says "This page is essentially the same as the previous page". The Last-Modified headers from HTTP 1.0 are to be considered as structurally equivalent to Weak ETags.
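To make the Strong/Weak distinction concrete, here is a minimal sketch (Python rather than the PHP discussed here; names are illustrative), assuming you can separate the stable editorial content from the rotating panels:

```python
import hashlib

def strong_etag(full_body: bytes) -> str:
    # Strong ETag: changes whenever ANY byte of the served page changes,
    # including a rotating "Top Ten" panel or a hit-counter.
    return '"%s"' % hashlib.md5(full_body).hexdigest()

def weak_etag(stable_content: bytes) -> str:
    # Weak ETag (W/ prefix): computed only over the editorially-meaningful
    # content, so the rotating panels do not change the validator and the
    # page can still be treated as "essentially the same".
    return 'W/"%s"' % hashlib.md5(stable_content).hexdigest()
```

Two renders of the same article that differ only in the panels would share a Weak ETag but have different Strong ETags.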
To keep this post reasonably short, the bottom line is:
If you produce static HTML pages, your web-server software will take care of the Content-Negotiation (production of 304-headers, etc) for you. If you produce pages dynamically, then it is your responsibility to take care of Content-Negotiation.
A Content-Negotiation Class for PHP is here [webmasterworld.com]. The latest version can be downloaded here [modem-help.freeserve.co.uk] (currently at v0.12.2).
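For anyone not on PHP, the core of what such a class has to do can be sketched in a few lines of Python (function and variable names here are illustrative, not the linked class's API):

```python
from email.utils import formatdate, parsedate_to_datetime

def negotiate(last_modified_ts, if_modified_since=None):
    # Minimal Content-Negotiation for a dynamic page: emit Last-Modified,
    # and answer a conditional request with 304 when nothing has changed.
    headers = {"Last-Modified": formatdate(last_modified_ts, usegmt=True)}
    if if_modified_since is not None:
        try:
            ims = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            ims = None  # malformed client date: fall through to a full 200
        if ims is not None and int(last_modified_ts) <= int(ims):
            # Page unchanged since the client's copy: headers only, no body.
            return 304, headers
    # Changed page (or first visit): full 200 response with content.
    return 200, headers
```

A static-file server does exactly this for you; a dynamic script that never does it is why a 100-page site can soak up thousands of bot fetches.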
If you produce static HTML pages, your web-server software will take care of the Content-Negotiation (production of 304-headers, etc) for you. If you produce pages dynamically, then it is your responsibility to take care of Content-Negotiation.
Yes, your web-server should take care of the production of 304 responses, etc. correctly for static pages. However, it might be a good idea to check how your server is really responding. If you are using server-side includes on pages, i.e. .shtml, you may not be responding with a 304.
AlexK, when you say you had problems, what do you mean?
How about pages going supplemental? There have been occasions in the past when virtually the whole of my site was MIA from the index.
Look at those pages with "Script executed in #*$! seconds" or date and time...
A clear view of this picture re: my site is muddied by past canonical issues, duplicate content, etc., and does not really help illuminate this thread's topic. In this thread I'm promoting just one issue, which is neatly highlighted by the "my site has only 100 pages, and this month G took 1,000 hits on it" type of threads. Again:
A web-server with default settings will correctly handle 304's and other such important Content-Negotiation issues, but only for static pages. With dynamic sites (PHP, ASP, SHTML, whatever) the web-programmer has to handle all those issues within the web-scripts.
The OP has not yet indicated whether the above is an issue for them or not.
The interesting follow-on, I would think, would be whether a dynamic site which does not implement Content-Negotiation would be penalised by the SEs. The impact on bandwidth would (I hope) be self-evident.
#1 Server Response: [myurl.com-page...]
HTTP Status Code: HTTP/1.1 200 OK
Date: Wed, 27 Sep 2006 18:39:09 GMT
Server: Apache
Set-Cookie: csuv=visitor; expires=Thu, 28 Sep 2006 18:39:09 GMT
Set-Cookie: PHPSESSID=af43******************4; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html
Which would be your favorite methods then?
Next is probably Cache-Control, although it is also important that the pages report the correct Charset and Language (headers as well as HTML).
I do not know of any shortcuts for this, although for PHP pages the link that I gave will be a godsend. If you want some idea of the full menu, have a look at the PHP Class [modem-help.freeserve.co.uk] and weep!
Content-Negotiation is a complexity which the operators of static (HTML) page websites are shielded from by their web-server software. My experience of running a dynamic site is that it is a complexity which those website operators need to take on-board, else suffer the consequences.
Alex, do you see anything wrong here?
Specifically:
Expires: Thu, 19 Nov 1981 08:52:00 GMT
- date in the past
.
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
- explicit do not cache instruction, which will affect both proxies and browser
If I were to get really anal about it:
Vary: Accept-Encoding
- yet is not gzipped
.
Content-Type: text/html
- better that there is also a charset declaration at that point
.
- no Content-Length header
- no Content-Language header
- no Last-Modified header (understandable, since no-cache, but a good habit to get into)
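Pulling those points together, a header set for a cacheable dynamic page might look something like this (a Python sketch; the values are illustrative, not a recommendation for any particular site):

```python
import time
from email.utils import formatdate

def cacheable_headers(last_modified_ts, body):
    # Addresses the critique above: a real Last-Modified, an Expires in
    # the future (not 1981), an explicit Cache-Control lifetime, and
    # charset, language, and length all declared.
    return {
        "Last-Modified": formatdate(last_modified_ts, usegmt=True),
        "Expires": formatdate(time.time() + 86400, usegmt=True),  # one day ahead
        "Cache-Control": "max-age=86400",
        "Content-Type": "text/html; charset=UTF-8",
        "Content-Language": "en",
        "Content-Length": str(len(body)),
    }
```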
Plus, virtually all the pages that Google fetches are about 7-8k. As for caching, I don't mind, since I have plenty of bandwidth. As for the 304: I guess I could tell the programmer to modify the script to check whether the page has been updated or not, and then issue the appropriate code.
The thing is that now I have removed the code that changed on each reload, so at most the pages change once a day. I think Google treats the block that changes as part of the template and sort of discounts it.
It is actually gzipped.
As an example, here are the headers for this forum page when I looked at it:
Response Headers - [webmasterworld.com...] (the headers that refer to gzip are in italics; Vary is there for the sake of proxies, and Content-Encoding is there for the browser). The example headers that you provided included the Vary header but not the Content-Encoding header, which is why I made that comment.
.
Date: Thu, 28 Sep 2006 12:08:03 GMT
Server: Apache/2.0.52
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Cache-Control: max-age=0
Pragma: no-cache
X-Powered-By: BestBBS v4.00
Content-Length: 4300
Connection: close
Content-Type: text/html; charset=ISO-8859-1
As for the 304: I guess I could tell the programmer to modify it...
.
Here is a brief 304 tutorial:
The server sends a Last-Modified Response header together with the page. Also included is an Expires header to say how long that Last-Modified is valid for. This is an example of what it could look like:
Last-Modified: Sat, 20 May 2006 23:00:00 GMT
Expires: Thu, 28 Sep 2006 12:08:03 GMT
After the expiry date, if the page is re-requested, an If-Modified-Since request header (same date as Last-Modified) will be included with the request for the page. If the page is unchanged, the server will respond with a 304 status header and no content - just headers, including a new Expires header, etc. The browser, proxy or SE will re-use its cached copy, and the whole process starts from scratch.
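The handshake just described can be simulated in a few lines (a Python sketch; the timestamp matches the example date above, and the serving logic is illustrative):

```python
from email.utils import formatdate

PAGE_MTIME = 1148166000  # Sat, 20 May 2006 23:00:00 GMT, as in the example

def serve(if_modified_since=None):
    # First request: 200 plus body plus the Last-Modified validator.
    # Revalidation with a matching If-Modified-Since: 304, headers only.
    last_modified = formatdate(PAGE_MTIME, usegmt=True)
    if if_modified_since == last_modified:
        return 304, {"Last-Modified": last_modified}, b""
    return 200, {"Last-Modified": last_modified}, b"<html>...</html>"
```

The browser's first visit gets a 200 with content; after the Expires date passes, it re-requests with If-Modified-Since and gets back just the 304 and headers.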
.
Now, look at the headers for this page (above) and you will see neither Last-Modified nor Expires headers - Brett does not want this page to be cached, nor to return a 304.
Now look back at the headers you supplied in post#:3099422 - Last-Modified is missing, and the Expires is in the past. That is a belt 'n' braces way of making sure of no-cache and also no 304.
Phew! Long post. HTH.
304 response: can I test that myself and how?
Request a page; then re-request the same page. Examine the response headers in both cases.
Added:
There is also the obvious look at the (server) logfiles. No 304s means no Content-Negotiation.
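A quick way to do that log check, assuming a common/combined-format access log (a Python sketch; the format assumption matters, so adjust the pattern to your own log format):

```python
import re

# In common/combined log format the status code is the field right after
# the quoted request, e.g.:  "GET /page.php HTTP/1.1" 304 0
STATUS_RE = re.compile(r'" (\d{3}) ')

def count_304s(log_lines):
    # Returns (number of 304 responses, total matched requests).
    statuses = [m.group(1) for line in log_lines
                if (m := STATUS_RE.search(line))]
    return statuses.count("304"), len(statuses)
```

Zero 304s across a month of bot traffic is a strong hint that no Content-Negotiation is happening.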