I've been bothered by this quite a bit recently - I'm pretty sure it's a bug on Google's end and not something that websites are intentionally exploiting. I see it when I'm using the date range options too - some results are being incorrectly EXCLUDED based on a buggy date.
Yes, this is really bothersome. Google is showing page dates incorrectly. For example, I see a certain page dated as, say, Sep 2009 in the snippet, while the page was actually created in May 2009. It also now ranks lower than some more recent pages. If age is a ranking factor, then this is an issue Google has to sort out quickly.
Are pages simply being updated with new information? That could account for the freshness date on Google, even while the displayed published date on the website is different. The page updates could be anything: adding or removing links, text, images... anything really.
If a page is updated by the author/webmaster, shouldn't it be considered fresh?
That seems to be the problem that Google is having. The page has not really been updated, but the server is giving a recent time/date stamp. For example, I see this on some forums where the thread has not had a reply in two years, but the date on Google is showing as "6 hours ago".
In many cases the date and time are accurate, but in others they are very wrong. If this is affecting how the page ranks - for example, on a "query deserves freshness" term - then the quality of the search result suffers.
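To illustrate the header side of this, here's a minimal sketch (all headers and timestamps are made up): if a forum regenerates its HTML on every request and sends no real Last-Modified, the only timestamp a crawler sees is the response time itself, so a two-year-old thread can look hours old.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

def apparent_age(headers, now):
    """How old the page *looks* to a crawler reading response headers.
    Falls back to the Date header when Last-Modified is missing."""
    stamp = headers.get("Last-Modified") or headers["Date"]
    return now - parsedate_to_datetime(stamp)

now = datetime(2010, 4, 18, 21, 0, tzinfo=timezone.utc)

# A forum that regenerates its HTML on every request and sends no
# Last-Modified: the response happens to carry a timestamp from 6 hours
# ago (a cache, say), so the thread "looks" 6 hours old no matter how
# stale the actual discussion is.
headers = {"Date": format_datetime(now - timedelta(hours=6))}
print(apparent_age(headers, now))  # 6:00:00
```

Nothing in those headers distinguishes "regenerated just now" from "genuinely updated just now", which is exactly the stale-forum symptom described above.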
[edited by: tedster at 9:14 pm (utc) on Apr 18, 2010]
Couldn't it be enough if anything on that page changes? Maybe the webmaster of the forum added a new link to the footer? Would that be enough to count as a page update and qualify for inclusion in the "x hours ago" listing?
It would be really telling if Google is using the Last-Modified header date to determine freshness. Does this occur only in conjunction with new external or internal inbound links?
|It would be really telling if Google is using the Last-Modified header date to determine freshness. Does this occur only in conjunction with new external or internal inbound links? |
How could inbound external links change the Last-Modified date of the header?
This isn't an area that I have looked into, but my first thought about a date more recent than the actual update is that the webmaster simply uploaded a batch of pages, including some that hadn't changed.
It would be nice if this boost worked. All we'd have to do is put today's date on each page and we'd jump up the rankings. Can't see it, though.
|How could inbound external links change the Last-Modified date of the header? |
It couldn't, but IMO it's a really interesting question, because 'freshness' in the ranking system is not necessarily the last modified time of the page. It also depends on links, and in some ways 'cascades' like PR (and everything else they do), so it could be some 'goofy glitch' where the discovery time of a new inbound link is being mishandled and credited to the 'modified date' of the page in the results...
Of course, in 'Google Speak' I should qualify what I'm saying with something like: 'In certain situations' which in translation could mean where x+y+z===a && a!=b || c<=d+y*r but not when (x+y+z===a && a!=b) || c<=d+y*r which could range from 'often' to 'only at certain times and in very specific situations'.
Agreed that these "edge cases" are probably coming from a combination of factors. Dynamic pages that are never cached on the server? Incorrect configuration for responses to If-Modified-Since? Some kind of AJAX chaos?
Just to add my 2 cents to this discussion:
This is not really new. I support many forum sites and the tags like "Last post: May 3, 2009" have been appearing for more than a year in most SERPs I'm looking at.
#1 - a fresher result DOES NOT guarantee a higher spot in SERPs
#2 - the date is ALWAYS, 100% of the time wrong
It bothered me in the beginning - I tried to ascertain how in the world they come up with the date that's always wrong - but nothing conclusive came out and I just learned to ignore it (possibly at my own peril).
I can't tell you how they do come up with the date but, based on what I looked at, I can list what they DO NOT use for the calculation.
They DO NOT use
- HTTP headers
- Dates mentioned in the text of the page
- if there is an RSS feed listing the page, they do not use <pubDate> field
- if there is a sitemap, they do not use <lastmod> field
I guess I'll have to rephrase the above: if they do use the data, they come up with a date that doesn't equal any of the above-mentioned dates. In many cases they get even the year wrong.
I could not see a clear indication that a fresher page based on the "Last post: #*$!" tag means higher SERPs. That was a year ago. If it changed since, I'd be interested to hear about it of course, but I have to tell you: it will be really hard to game this because you never know how they come up with the date and why it is always wrong.
|if they use the data, they come up with the date that does not equal to any of the above mentioned dates. |
And, to rephrase your rephrase, just for the fun of it: if they use the data, they come up with a display date that does not equal any of the above-mentioned dates for use as one or more portions of the visitor-accessible values related to a site or page.
I wonder if CAFFEINE is really like a TBPR lottery style drawing? The higher your TBPR is the more entries you get, and the more entries you get the better the chance you have of being selected. The dates they're showing could be the dates of the drawing so you'll know if you're a recent winner or not... Those dates you're seeing could be very accurate!
TheMadScientist, tedster and 1script, appreciate your thoughts on it. When I saw this, it was showing a result which said "updated 6 hours ago", and clicking through to the top result in question revealed the article was from 2007 and the latest ping or comment was from 2009.
I get it that you have observed them not using certain dates in the cases which you have looked at.
Where else can a date / time be extracted in relation to a page? Of all the places I can think of only these:
Page Created timestamp
Page Modified timestamp
Sitemaps Created date
On-Page published timestamp
JSON/XML oembed dates
Discovery timestamp of inbound link
Aside from these can you think of any other places they could extract timestamp signals?
IMO it's most likely a mis-association of a parameter somewhere in the process between spidering and publishing the information.
It could be any of a large number of things not having to do with the extraction of a date, including something inserting incorrectly from spidering on certain occasions, and it could be difficult to find, because it could be as simple as the difference in the two 'small' equations earlier... One is 'the first and either the second or third' and the other is 'the first and second together or the third singularly'.
I would not worry too much about it 'being extracted from the page incorrectly', unless it's a page that's not serving correct information in the first place and it's not being handled very well. It could even be what tedster said about caching or not serving a modified date, or the opposite in the caching situation and they're picking up on a cached date rather than a date from the site or page itself.
There are all kinds of reasons it could be happening and different places an incorrect date could be coming from, and unless you work there and are on the team working on it guessing what it is will be difficult IMO.
There are times when finding a slight error in a script is difficult, even if you're the one writing it, and I know personally there are some that look obvious if you're looking at the screen and many might think, 'Well can't you blah to fix it?' which always leaves me thinking, 'If it was so easy to find and I could just do blah to fix it you wouldn't be looking at it... duh?'
Anyway, the point is there are a huge number of processes and associations going in to making the results pages people see happen and it seems somewhere in the process there is a date association error...
As my account name here suggests, I know exactly what you are talking about :)
There are times when finding a slight error in a script is difficult, even if you're the one writing it
However, in most cases, while it may be hard to debug the software, it's usually a piece of cake to remove / comment out the part of the script/software that creates the HTML code you're not happy about. If this phrase is still on the page, it means that they see it as valuable for something. We just have to keep guessing what that something is.
I may be wrong, but it looks to me that the very fact that G* uses the time tag implies to the user that fresher is better, yet in most areas outside news delivery that's a wrong assumption. Add to that the fact that the time tag itself is incorrect, and you have enough reason to simply remove the part of the code responsible for displaying the tag.
|We just have to keep guessing what that something is. |
Why not leave the errors and know exactly when the issue is fixed since only a webmaster will know it's wrong?
An interesting 'hmmmm... could be thought' I had a few minutes ago is it might not even be Google. It could be a cache malfunctioning and serving the wrong date information to GBot which would mean Google is handling it correctly on their end, and there's nothing they can do even though it's messing with their results so they just have to shrug and wait or offer to fix the malfunctioning cache...
|I may be wrong but it looks to me that the very fact that G* uses the time tag implies to the user that the fresher the better yet in most areas outside news delivery it's a wrong assumption. |
If you read the patent application the method is a situational application and they compare the use of stale or fresh results to user clicks and then build the 'freshness factor' from queries from there. Also, 'freshness' is partially determined by the links to a page, the 'freshness' of those links, and the 'freshness' of the page containing each link, so it's possible to have a page that hasn't been updated for a year still be determined to be fresh.
If people think 'fresh' means 'new' or 'recently updated' that's not exactly the case.
Edited: Adjusted my guess upon re-reading prior posts.
@ 1script... LOL on the 'knowing about how tough it is to find an issue some times'. Last night what prompted me to take a break and post for a bit was the 15 minutes it took to find the issue in a 150 line script that checks and sets information. (It's basically a big if elseif.) You know one of those ones you use to do something AJAXY on a page and you have to constantly 'refresh' or 'reset' the page to get the error so you can try to figure out where it is in the PHP, and you only get it once in a while, and it worked a few minutes ago, but you found a faster, shorter, better way to do the same thing, so you 'fixed' it... LMAO. Always Fun!
A possibility in SOME circumstances: date formats in different countries are often confused by software algorithms.
Typically, 8/9/10 could be 8th Sep 2010 (UK), 9th Aug 2010 (US), or even the 10th of something.
It all depends on where the date is found: in headers it's almost certainly in a predefined format and is correct, but within the text it could be anything.
And likely it has nothing to do with this at all. :)