Google News has Strict Requirements
I’ve been poking around Google News, trying to figure out how to register the new web site as a legitimate news source. There is a long list of requirements to qualify for a listing in Google News, and the new web site meets all of them except one.
Here’s a list of the technical requirements, for those who are interested. The only qualification that our new web site doesn’t pass is under Article URLs:
Technical Requirements: Article URLs
In order to be included in Google News, your article URLs should meet the following guidelines:
Be unique. Each of your pages that display an article’s full text needs to have a unique URL. We can’t include sites in Google News that display multiple articles under one URL, or that do not have links to pages dedicated solely to each article.
Be permanent. Our system is unable to crawl sites that use a single URL for multiple articles. For example, we wouldn’t be able to crawl the page www.yoursite.com/news1.html if it displayed a different story every day. In order to ensure that our links to articles function properly, each article on a news site needs to be associated with one unique URL, and that URL must be permanent (i.e., it can’t be recycled).
Display a three-digit number. The URL for each article must contain a unique number consisting of at least three digits. For example, we can’t crawl an article with this URL: http://www.google.com/news/article23.html. We can, however, crawl an article with this URL: http://www.google.com/news/article234.html
Keep in mind that if the only number in the article consists of an isolated four-digit number that resembles a year, such as http://www.google.com/news/article2006.html, we won’t be able to crawl it.
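To make the three-digit rule concrete, here is a rough sketch of a checker in Python. It is only my reading of the guideline quoted above, not Google's actual crawler logic. The function name is made up, and the range I use to decide what "resembles a year" (1900 to 2099) is purely an assumption.

```python
import re
from urllib.parse import urlparse

# Assumption: a four-digit number in this range "resembles a year".
# The guideline quoted above does not define the range.
YEAR_MIN, YEAR_MAX = 1900, 2099


def has_qualifying_number(url: str) -> bool:
    """Return True if the URL path contains a run of at least three
    digits that is not just an isolated year-like four-digit number."""
    path = urlparse(url).path
    for digits in re.findall(r"\d+", path):
        if len(digits) < 3:
            continue  # too short, e.g. article23.html
        if len(digits) == 4 and YEAR_MIN <= int(digits) <= YEAR_MAX:
            continue  # looks like a year, e.g. article2006.html
        return True
    return False


if __name__ == "__main__":
    for candidate in (
        "http://www.google.com/news/article23.html",
        "http://www.google.com/news/article234.html",
        "http://www.google.com/news/article2006.html",
    ):
        print(candidate, has_qualifying_number(candidate))
```

Run against the three example URLs from the guideline, this sketch accepts only the article234.html one.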
The URLs on www.koreaittimes.com are automatically generated aliases built from most of the words in a story's title. For instance, for the December story A Wave Too High? by Chun Go-eun, the URL consists of the web site name, then the word story, then the article alias, which is wave-too-high. I think this is really convenient for our users because it’s easy to remember, and it looks nice too. However, it doesn’t contain a three-digit number, so it doesn’t qualify as an article under Google’s guidelines. I am going to have to think of a solution to this next week.
One thing I have been pondering is putting the node ID of each story into its URL. It could look something like www.koreaittimes.com/story/5534/this-story. However, that three- or four-digit number in the middle would turn the human-friendly URLs into something that is impossible for casual readers to remember. I’m not sure how to resolve this yet, but I do think we need the traffic that Google News can generate. I’ll have to think of something.
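Here is roughly what the node-ID idea would look like, sketched in Python rather than as real Drupal configuration. The function names and the stop-word list are hypothetical. I am only guessing at how the alias drops small words like "A", and pairing node ID 5534 with this particular story is just for illustration.

```python
import re

# Assumption: the alias generator drops a few small words such as "a".
STOP_WORDS = {"a", "an", "the"}


def slugify(title: str) -> str:
    """Lowercase the title, keep only letters and digits, drop stop words,
    and join the remaining words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(w for w in words if w not in STOP_WORDS)


def story_url(node_id: int, title: str, base: str = "www.koreaittimes.com") -> str:
    """Build a URL of the form base/story/<node id>/<alias>."""
    return f"{base}/story/{node_id}/{slugify(title)}"


if __name__ == "__main__":
    print(story_url(5534, "A Wave Too High?"))
```

One catch with this approach: a node ID below 100 would still fall short of the three-digit requirement, so very early stories would need separate handling.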
I found a note from Google that seems to indicate if you use the News Sitemap feature you can use your existing URLs:
http://groups.google.com/group/news-HelpPublishers/msg/87efba8b275c761c
http://www.google.com/support/news_pub/bin/topic.py?topic=11666
I’m not sure it would be easy to generate through Drupal, though, and I would probably send Google a note to confirm the exception.
Futurize Korea
January 8, 2009 at 6:03 pm
Thanks for those links, that sounds like the best alternative. Doing a bit of searching through Drupal.org, I’ve been able to find a site map module at http://drupal.org/project/xmlsitemap. I’ll have to see if I can’t put these two together.
Matthew Weigand
January 9, 2009 at 1:15 pm
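For anyone curious what a News Sitemap might contain, here is a minimal Python sketch that builds one by hand instead of through the Drupal module. The element names follow the sitemap-news 0.9 schema as I understand it, so check them against Google's current documentation before relying on this. The function name, the publication name "Korea IT Times", and the publication date are placeholders I made up for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles, publication_name, language="en"):
    """Return a News Sitemap XML string for a list of article dicts.
    Each dict needs the keys 'loc', 'title', and 'date' (YYYY-MM-DD)."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # The existing alias URL is used as-is, with no number added.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["date"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    example = [{
        "loc": "http://www.koreaittimes.com/story/wave-too-high",
        "title": "A Wave Too High?",
        "date": "2008-12-01",  # placeholder date, not the real one
    }]
    print(build_news_sitemap(example, "Korea IT Times"))
```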