Google News now filtering out duplicate stories

     
9:12 am on May 3, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure if others have seen this before, but it's news to me, if you'll excuse the pun.

Search Google News (this is the UK version) and, at the top right, the sort options offer not only the usual:

* sort by date, or

* sort by relevance

but now includes:

* sort by date with duplicates included.

So, duplicate news stories are now being filtered out automatically. Initially it would appear that they are determining duplicates based solely on headline and standfirst! I've just posted a news item and seen it on Google News 10 minutes later, and directly above me in G-News is the same story. The only difference - headlines and standfirsts.

I'm hoping the duplicate filters remain at that level!
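To make that guess concrete, here is a minimal sketch of what a filter keyed only on headline and standfirst could look like. It is purely illustrative and is not Google's actual method, which is unknown; the names `Story`, `fingerprint` and `filter_duplicates` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Story:
    source: str
    headline: str
    standfirst: str
    body: str


def fingerprint(story: Story) -> str:
    """Normalise headline + standfirst only (assumption: the body is ignored)."""
    text = f"{story.headline} {story.standfirst}".lower()
    # Keep only alphanumeric words so case and punctuation do not matter.
    return " ".join(re.findall(r"[a-z0-9]+", text))


def filter_duplicates(stories: list[Story]) -> list[Story]:
    """Keep the first story seen for each headline/standfirst fingerprint."""
    seen: set[str] = set()
    kept: list[Story] = []
    for story in stories:
        key = fingerprint(story)
        if key not in seen:
            seen.add(key)
            kept.append(story)
    return kept
```

Under this assumption, two sites running the same press release with their own headlines and standfirsts would both survive the filter, which matches what I saw.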

Syzygy

3:10 am on May 6, 2007 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I never noticed this before, but it's also showing in the US. I think it's extremely useful to me as a user.
4:03 am on May 6, 2007 (gmt 0)



It may be something they're testing and only available to a few IPs (I used to get test stuff but have probably pissed Google off of late). Would you mind posting a screenshot?
5:40 am on May 6, 2007 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It's too simple to bother with a screenshot - that's why I missed it. Top right of the News Search Results, right under the line where it says how many results there are, there are three options:

Sorted by relevance -- Sort by date -- Sort by date with duplicates included

Sorted by relevance is the default.

2:12 am on May 7, 2007 (gmt 0)



Nah, not seeing it.

It's in test mode, I think, tedster, and for me a screenshot would be useful.

3:09 am on May 7, 2007 (gmt 0)
12:34 pm on May 7, 2007 (gmt 0)

5+ Year Member



I haven't noticed that new filter, to be honest. A very handy tool.
Syzygy, thanks a bunch for your share,

J.

4:18 pm on May 8, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



After posting a news story this morning I decided to monitor it on Google News (UK) to see what happened (using a search term unique to the article).

After ten minutes:

* Sort by date:

1st story - Is mine.
2nd story - There is an earlier story in second slot, posted 3 hours previously.
3rd story - Is the original press release as it appears on a specific newswire.
Pic - Is from the second story.

(Note: at about 15 minutes, Story 3 is no longer showing.)

* Sort by relevance:

1st - My story.
2nd - The earlier story.
Below that is the link for ‘omitted results’. The original press release is part of the omitted results.

* Sort by date with duplicates included.

This shows all three stories, unsurprisingly in date/time order.

After 30 minutes:

* Sort by date:

Now only the earlier story is showing. Both my story and the original press release have disappeared. The pic from the earlier story is still showing.

* Sort by relevance:

1st - My story is no longer on top - the earlier story is first.
2nd - My story shows, but only as a headline under Story 1.
The pic shown is from Story 1.
There is a link to "all 3 news articles". Click on this and my story is now on top. The listing for the original press release shows.

* Sort by date with duplicates included.

Again, all three stories show in date order.

After five hours:

* Sort by date:

Still only the earlier story shows. All other versions of the story are excluded.

* Sort by relevance:

1st - A new version of the story from another site is now in first place.
2nd - My story is here.
The earlier story no longer shows, but its pic does.
There is now a link to "all 7 news articles".

A few minutes later and the pic from my story now shows. Refresh the page a few times and it becomes apparent that Google is alternating the pics that accompany news items.

Another version of the same story shows but for some reason shows as a separate news item.

* Sort by date with duplicates included:

As more versions of the story are added, so mine drops down the page lower and lower.

It would seem, then, that various factors come into play in determining the running order of "relevancy". My little test shows that the relevancy of a story can change with time. Fresher stories appear to be deemed more relevant for an initial period. Possibly stories are rotated to determine which performs better? Likewise, it would also appear that accompanying pics are rotated, possibly for the same purpose.

(On the other hand, use "sort by date" after some time has passed and seemingly only one story shows. There is not even a link for "omitted results". In this instance the original news wire source for the story loses out to the first site to post it "publicly".)

Possibly then Google News has become that much more dynamic. Stories can shift up and down the news listings in the course of minutes dependent on unknown factors. Perhaps this is directly related to recent Google patents (discussed here some while ago if I recall correctly) focused on identifying and factoring duplicate content?

Syzygy

[edited by: Syzygy at 4:22 pm (utc) on May 8, 2007]

4:53 pm on May 8, 2007 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



This gives us insight into how well Google is able to find the original content writer. I bet this info will work into regular serps soon.
5:18 pm on May 8, 2007 (gmt 0)

5+ Year Member



"I bet this info will work into regular serps soon. "

Maybe it already is. I'm seeing a lot of deindexing of articles written for syndication submission.

6:02 pm on May 8, 2007 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I have a feeling it has been or is being experimented with. It makes sense for the original writer of the content to be in the number one position.
 
