Forum Moderators: Robert Charlton & goodroi
Here are some little tidbits I've added to my reference library while researching inclusion in the various services Google has to offer. This one is specific to Google News, as the title implies.
Google News (publishers) Help
[google.com...]
Google News (publishers) Help > Technical Requirements
[google.com...]
The links above will take you to the starting points for the snippets of information below...
Google News (publishers) Help > Technical Requirements: Article URLs
[google.com...]
Display a three-digit number. The URL for each article must contain a unique number consisting of at least three digits. For example, we can't crawl an article with this URL: http://www.example.com/news/article23.html. We can, however, crawl an article with this URL: http://www.example.com/news/article234.html. Keep in mind that if the only number in the article consists of an isolated four-digit number that resembles a year, such as http://www.example.com/news/article2006.html, we won't be able to crawl it.
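That quoted rule is mechanical enough to sketch as a quick check. This is just my interpretation of the wording above, not Google's actual crawler logic; in particular, the 1900-2099 "looks like a year" range is my assumption, since the docs only say the number "resembles a year":

```python
import re

def news_crawlable(url):
    """Rough sketch of the quoted article-URL rule: the URL must contain
    a number of at least three digits, and a lone four-digit number that
    resembles a year doesn't count by itself."""
    numbers = re.findall(r"\d+", url)
    qualifying = [n for n in numbers if len(n) >= 3]
    if not qualifying:
        return False  # no number of at least three digits
    # If every qualifying number is a 4-digit year-like value,
    # the docs say the article won't be crawled.
    if all(len(n) == 4 and 1900 <= int(n) <= 2099 for n in qualifying):
        return False
    return True

print(news_crawlable("http://www.example.com/news/article23.html"))    # False
print(news_crawlable("http://www.example.com/news/article234.html"))   # True
print(news_crawlable("http://www.example.com/news/article2006.html"))  # False
```

The three example URLs from the quote come out exactly as the docs describe.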
^ Did you know that? Of course you did! ;)
Google News (publishers) Help > Technical Requirements: Dynamic content
[google.com...]
Google News indexes dynamically generated webpages, including .asp, .php, and pages with question marks in their URLs. However, these pages can cause problems with our crawler, and may be ignored.
^ Ya, even after all the advances in technology, there are still challenges in this area.
Google News (publishers) Help > Technical Requirements: Forum URLs
[google.com...]
Google News is unable to include articles that are set up as posts or threads. For example, if a URL specifically contains one of the following substrings, it will not be crawled.
And...
Please keep in mind that we're unable to include sites that don't have a formal editorial review process.
^ I didn't know that certain URI strings are off limits to the Google News Crawler. I wonder how this translates over to search?
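The actual substring list isn't quoted above, so purely as an illustration of how a substring-based filter would behave, here is a sketch with hypothetical placeholders. The substrings below are my assumptions, not Google's real list:

```python
# Hypothetical illustration only: Google's actual substring list is not
# quoted above. These example substrings are placeholder assumptions.
BLOCKED_SUBSTRINGS = ["forum", "thread"]

def looks_like_forum_url(url):
    """Return True if the URL contains any (assumed) blocked substring."""
    return any(s in url.lower() for s in BLOCKED_SUBSTRINGS)

print(looks_like_forum_url("http://www.example.com/forum/showthread.php?t=5"))
print(looks_like_forum_url("http://www.example.com/news/article234.html"))
```

A filter this simple would indeed be sidestepped by renaming a path segment, which is what the rename-to-"community" question later in the thread is getting at.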
Oh, here is an interesting one...
Google News (publishers) Help > Technical Requirements: Links to your articles
In order for our crawler to correctly gather your content, each article needs to link to a page dedicated solely to that article. We're unable to index articles from news sections which consist of one long page rather than a series of links that lead to articles on individual pages.
And...
Keep in mind that our automated system is currently best able to crawl headlines or anchors (text links such as "Full story" or "read more") that have 22 words or less.
^ 22 words or less. There are more references to that 22 word limit and another that states 2 to 22 words as a minimum/maximum for headlines and page titles!
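A quick sketch of that 2-to-22-word headline/anchor range as a check. How Google actually tokenizes words is unknown; this just splits on whitespace:

```python
def headline_ok(text, min_words=2, max_words=22):
    """Check an anchor or headline against the quoted 2-22 word range.
    Splitting on whitespace is an assumption about how words are counted."""
    n = len(text.split())
    return min_words <= n <= max_words

print(headline_ok("Full story"))  # True  (2 words)
print(headline_ok("More"))        # False (1 word)
```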
Those are just a few of the things I added to my library. There are all sorts of interesting tidbits within the documentation for each of their products and/or services. There is consistency across the board on some suggestions, and then there are specifics if you are targeting certain Google services such as Google News.
Do you think the written suggestions for other Google Products and Services apply to search in general? I was surprised to see min/max for headlines and titles. 22 words? Whew, that is one healthy title. We all know that Google does not stop at character 67 which is usually the max point of truncation. 22 words? What is the average word length?
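On the average-word-length question, a back-of-envelope answer, assuming roughly 5 letters per English word (a common rule-of-thumb estimate, not from the Google docs) plus one space between words:

```python
AVG_WORD_LEN = 5  # rough average English word length (assumption)
words = 22
chars = words * AVG_WORD_LEN + (words - 1)  # letters plus separating spaces
print(chars)  # 131
```

That's about double the ~67-character point where titles usually truncate in the SERPs, which is what makes the 22-word ceiling look so roomy.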
eg:
Please keep in mind that we're unable to include sites that don't have a formal editorial review process
I mean, a lot of forums have their new thread submissions moderated before they're accepted, i.e. editorial review, and the first post is considered the "article" with replies just comments on it.
also, does that mean that if we simply rename a sub-folder named "forums" to "community" (or whatever)...
Voila! your threads are open for news crawls? (assuming no issue with the showthread or ?forumid)
interesting, one of my sites' front page (CMS) jumped in PR and yet the forums for that site dropped to ZERO (previously 3). hmmm...
[edited by: GrendelKhan_TSU at 4:06 am (utc) on Feb. 16, 2009]
Display a three-digit number. The URL for each article must contain a unique number consisting of at least three digits. For example, we can't crawl an article with this URL: http://www.example.com/news/article23.html. We can, however, crawl an article with this URL: http://www.example.com/news/article234.html. Keep in mind that if the only number in the article consists of an isolated four-digit number that resembles a year, such as http://www.example.com/news/article2006.html, we won't be able to crawl it.
Anyone know why this is so?
Syzygy
Interesting to note: "this rule is waived with News sitemaps." reference [google.com] - but in a News Sitemap, you only include articles published in the past three days, so I guess that's part of the difference.
The isolated 4 digit number looks like a year but is very generic. I believe that is why you'll see this type of practice /2009/02/19 as a standard archiving string.
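A /YYYY/MM/DD archive path like that carries an unambiguous date rather than a lone year-like number. As a sketch (the pattern is mine, not anything Google publishes), pulling the date back out of such a URL is trivial:

```python
import re

# A dated archive path like /2009/02/19/ encodes the publication date
# unambiguously, unlike an isolated four-digit number in a filename.
DATE_PATH = re.compile(r"/(\d{4})/(\d{2})/(\d{2})/")

m = DATE_PATH.search("http://www.example.com/2009/02/19/some-story.html")
print(m.groups())  # ('2009', '02', '19')
```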
I'm interested to know the exact reason but I doubt we will ever get it in writing. It probably has to do with something top secret in the index. ;)
While following the link for Sitemaps, I caught this, which is totally unrelated, but I don't recall seeing a fixed number for Sitemaps before.
A News sitemap can contain no more than 1,000 URLs. Your sitemap index file shouldn't list more than 1,000 sitemaps.
1,000 x 1,000 = 1,000,000 URIs
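The two caps are from the quoted docs; wrapping them in a sanity check one might run against one's own sitemap setup is my own addition:

```python
MAX_URLS_PER_NEWS_SITEMAP = 1000  # cap quoted from the docs above
MAX_SITEMAPS_PER_INDEX = 1000     # cap quoted from the docs above

def within_limits(urls_per_sitemap, sitemap_count):
    """Return True if a News sitemap setup stays inside both quoted caps."""
    return (urls_per_sitemap <= MAX_URLS_PER_NEWS_SITEMAP
            and sitemap_count <= MAX_SITEMAPS_PER_INDEX)

# Theoretical ceiling across a full sitemap index:
print(MAX_URLS_PER_NEWS_SITEMAP * MAX_SITEMAPS_PER_INDEX)  # 1000000
print(within_limits(1000, 1000))  # True
print(within_limits(1001, 1))     # False
```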
So, two things about the number in the URL. First, because many sites have both news and non-news items, the number helps GNews distinguish which are news stories and which aren't (they also require such sites to designate one or a few main news pages that link only to news stories, to make crawling and indexing easier). If a site works with GNews to submit a news-specific sitemap, these requirements aren't necessary.
The other thing the date does (it can come from a dateline rather than the number in the URL, but they need to determine the date one way or another) is help with grouping. If they group stories, it's logical to suspect the same topic will be grouped on more than one occasion. So if they're grouping stories on Obama's stimulus bill today, they want to show the more recent stories and weed out stories on the same topic from a month ago, or even a week ago. Having a numerical date just helps them group things better and provide more relevant results.
Obviously, webmasters can run into trouble if they aren't aware of these caveats, but there are rules and specifics in web search that require knowledge and forethought too.
1,000,000 new urls in three days seems like a very generous limit
indeed; some of the biggest sites i've dealt with publish about 150 articles a day, if the estimate i was given was correct.
this is also interesting to note; although google has said it has been able to read javascript for quite some time, they state the following:
Google News doesn't accept articles embedded in JavaScript because they sometimes display different content for users (who see the JavaScript-based text) than for search engines (which see the non-JavaScript-based text).
[google.com...]
and
Google News does not recognize or follow Flash, graphic/image or JavaScript links which link to articles. Our automated crawler is best able to crawl plain text HTML links.
[google.com...]
they aren't breakthrough discoveries, but g did give us another peek at their hand.