
Mighty Atom: Really Similar Syndication?

…quick, then, before they hit the lights, a run-down on Atom Syndication.

Update, circa 2008: And hit the lights on Webmonkey they did. For sentimentality’s sake, I’ve reproduced the text of the original article below.

This one was a lot of fun to write:

Mighty Atom: Really Similar Syndication?

Last time around, we took a look at the increasingly popular RSS button graphic, which developers use to signpost links to RSS (Really Simple Syndication) files. Well, RSS has a new competitor now: the upstart Atom Syndication Format.

Atom files can be recognized by their own cute-as-a-button button. And whether you click the RSS button or the Atom button, you still get the same raw-looking XML markup – not quite fit for human consumption. All these cute buttons have a real and practical purpose, though: Site Syndication.

Those corresponding XML files are designed for audiences using the “news aggregators” — sort of like mini Web browsers — much favored by high-volume websurfers. Using a news aggregator, you can browse through the latest updates from a customized set of your favorite websites in just a fraction of the time it takes to do the same thing using Internet Explorer. Also, because syndication files follow a standardized XML-based format, they allow other sites to integrate your latest links and headlines into their pages, automatically.

So, what’s the difference between Atom Syndication and RSS?

It’s kinda like the difference between “newfangled” Phillips screwdrivers and “old-fashioned” slotted screwdrivers. A few fire-eatin’ types may spend their days trolling the rec.arts.woodworking forums, debating the differences in torque capacity thresholds, drill bit cam-out, patent history, and manufacturing-cost issues between these incompatible rivals. But most of us are more or less resigned to keeping both types of screwdrivers in our toolkits: we’ll use whichever one is handy and fits our needs.


Similarly, Atom Syndication and RSS are both tools designed to do the same basic job: advertise and distribute website content by creating machine-readable XML newsfeeds.

Chances are, your choice of syndication format will be influenced largely by your choice of Content Management System. Google’s Blogger, for example, pushes the nascent Atom format and includes a pre-built starter template for Atom feeds and Atom feeds only. (All this, even though Atom is still a work-in-progress, only at version 0.3.) Tripod’s Blog Builder tool, on the other hand, offers an RSS 2.0 generator.

Whether you’re invested in a CMS already or you’re still shopping around, it’s a good idea to have a working understanding of both technologies. We went over all the whys and hows of RSS in “Sharing Your Site With RSS,” so in the pages that follow, we’ll be focusing on Atom, starting with why people bothered to build another site syndication format in the first place.

Wherefore Need I Atom?

The best testament to the intelligent design of RSS is its popularity. According to Syndic8.com, the RSS 0.9x and 2.0 formats account for about 65 percent of the 25,000+ feeds they track. Mind you, that’s a market that RSS helped pioneer.
Plus RSS (the 0.9x and 2.0 versions evangelized by Dave Winer) has weathered competition before. The unfortunately named RSS 1.0 (a separate RDF-based format) emerged as a standard for developers seeking a more sophisticated, avowedly non-commercial alternative to the RSS variants controlled by Dave Winer’s Userland Software. It’s been a few years, however, and the market still overwhelmingly favors the Fisher Price-simple RSS of versions 0.9x and 2.x.

Meanwhile, quality newsreaders and aggregators have become increasingly adept at handling almost any “flavor” of syndication. Given that, the best way to improve the usefulness of your site is to focus on creating high-quality content, not change the feed format.
So then, why do we need Atom, again?

Dave Winer’s RSS spec is penned in a succinct, conversational style that’s both easy to read and easy to learn. Yet while brevity and informality work great when it comes to getting the masses to adopt new ideas, the spec had some notable gaps and gray areas. And late efforts to re-write or repair the RSS spec were hampered by a variety of issues, such as RSS’s emphasis on backwards compatibility and its adoption of a Creative Commons license.

Perhaps the best anecdotal example of RSS’s “under-specification” is its failure to address how a developer should deal with markup code (like HTML) when it’s mixed up with content. It’s like bubble gum ice cream: are you supposed to chew the gumballs while eating the ice cream, or spit them out, one by one, into a soggy, mottled napkin? Every kid, it seems, has a slightly different take. So it goes with RSS.

Consider the title of a news story. What if certain words in it just have to be italicized, or the article is called, “An Introduction to the <blockquote> Tag”. In both these cases, you’ll need to introduce ruckus-causin’ <i> or <blockquote> tags inside the <title> element of an RSS feed. Now the RSS spec, as written, doesn’t say this is allowed. But then, it doesn’t explicitly prohibit this, either.

So what’s a well-meaning newsreader app supposed to do when it comes across this markup? If it chose to always display the HTML, brackets and all, the second example would work just fine. But then the first headline would look pretty <i>funny</i>. Conversely, newsreaders that attempt to act on (or hide) HTML tags would make the first headline look nice, but would transform the second headline into “An Introduction to the Tag”. Hi, tag!
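To make the dilemma concrete, here’s a rough Python sketch of the two strategies a newsreader might pick. The function names (show_brackets, strip_tags) are made up for illustration and aren’t taken from any real aggregator, and the regular expression is a deliberately naive tag-stripper, not a proper HTML parser.

```python
import html
import re


def show_brackets(title: str) -> str:
    """Strategy 1: treat the title as plain text, so every bracket is displayed.

    Escaping is what an HTML-based reader would do to show the characters literally.
    """
    return html.escape(title, quote=False)


def strip_tags(title: str) -> str:
    """Strategy 2: treat the title as markup and hide anything that looks like a tag."""
    without_tags = re.sub(r"<[^>]+>", "", title)
    return " ".join(without_tags.split())  # tidy up leftover double spaces


for headline in ("A pretty <i>funny</i> headline",
                 "An Introduction to the <blockquote> Tag"):
    print(show_brackets(headline))
    print(strip_tags(headline))
```

Neither function gets both headlines right, which is the whole problem: without a rule in the spec, the reader has to guess what the author meant.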

Those aren’t the only issues developers have come across. How do you put relative URLs in a feed? What are proper uses of the <link> tag? Sure, these are oddball cases, and they certainly aren’t the biggest deals in the world, but they’re just enough of a problem to make certain programs get goofy – not the kind of foundation you want for archiving or API purposes.

Despite these small but potentially pesky foibles of RSS, nobody wants to switch technologies unless there’s an option with wild new features, or at least a couple of major bugfixes. Atom heavily favors the latter. With its lengthier and rather “lawyerly” documentation, the Atom spec nails down many of the quirks, ambiguities, and surprises that became inadvertently enshrined in the now-forever-frozen RSS. And while Atom may not be burgeoning with never-before-seen features, it does add a few goodies: new tags like <summary> and <content>, and proposed attributes like content rel=”fragment”, make it easier to programmatically distinguish excerpts from entries.

Although Atom’s technical improvements over RSS are more evolutionary than revolutionary, don’t mistake this for timidity: the Atom project harbors a fairly aggressive agenda. Right now, the loosely knit group that fostered the Atom Syndication Format is simultaneously working on the “Atom API”, an additional set of programming definitions that allows software agents to communicate about basic weblogging actions such as posting, editing, requesting feeds, etc. If all goes according to the Big Plan, Atom-formatted data will be used for far more than newsfeeds. Soon, Atom-enabled programs could help you edit or archive your website without opening a browser, or allow you to painlessly switch between hosting/CMS providers like TypePad or Radio Userland.

Of course it’s hard to tell how all this syndication stuff will shake out in the long run. Right now, Atom’s inchoate “version 0.3” state isn’t perfect or free from glitches, but its momentum towards that ultimate goal has wooed early adopters. To see more technical cases of why key players such as Google and Movable Type are lining up behind Atom, check out Why We Need Echo (Atom’s former name), or these slides. As for the rest of us, let’s get building.

The Building Blocks of Atom

Atom Syndication was created to accomplish the very same task as RSS, so it’s little surprise that Atom code wound up looking like RSS. Atom is newer, and wisely builds off the design successes of its predecessor. If you know how to build a feed with RSS, it’s easy to get started with Atom.
The old “view source” technique probably remains the best pedagogical tool on the Web, so let’s get started on our own Atom feed by looking at one. Here’s a minimal example, lifted straight outta the 0.3 spec:

<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#">
<title>dive into mark</title>
<link rel="alternate" type="text/html"
href="http://diveintomark.org/"/>
<modified>2003-12-13T18:30:02Z</modified>
<author>
<name>Mark Pilgrim</name>
</author>
<entry>
<title>Atom 0.3 snapshot</title>
<link rel="alternate" type="text/html"
href="http://diveintomark.org/2003/12/13/atom03"/>
<id>tag:diveintomark.org,2003:3.2397</id>
<issued>2003-12-13T08:29:29-04:00</issued>
<modified>2003-12-13T18:30:02Z</modified>
</entry>
</feed>

Not so scary, right? It looks a lot like HTML, but since Atom is an application of XML, it’s a tad stricter: Every Atom feed must be well-formed, so when you open a tag, remember to close and nest it properly. XML rigamarole also demands the first two lines, which define the XML version and namespace. Confused? Just cut and paste the opening two lines; they’re the same for every Atom feed.
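If you’d like a machine to do the nagging about well-formedness, here’s a minimal sketch using Python’s standard xml.etree.ElementTree parser. The helper name is_well_formed and the two tiny sample strings are ours, invented for illustration. Note that this only checks that tags are closed and nested properly; it says nothing about whether the feed is valid Atom.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False if the parser chokes on it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical snippets: the second one never closes its <title> tag.
GOOD = ('<feed version="0.3" xmlns="http://purl.org/atom/ns#">'
        '<title>dive into mark</title></feed>')
BAD = ('<feed version="0.3" xmlns="http://purl.org/atom/ns#">'
       '<title>dive into mark</feed>')

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```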
Now, before we jump into our tag-by-tag, play-by-play breakdown, there’s one design concept that’s worth pointing out. It’s Atom’s idea of “constructs”, which are elements that share a common structure. Don’t worry, it isn’t as fancy as it sounds. It’s just that certain groups of tags always follow the same core rules, wherever and whenever they’re used.

The Person Construct requires a nested element called <name> to be present when describing persons. (Makes sense, right?) Other “child” elements of the Person Construct, <email> and <url>, are considered optional. So whether you’re talking about the author of a feed as a whole, or the author of an individual entry, these basic rules apply: <name> is a must-have tag, <email> and <url> aren’t.

The Date Construct specifies that any time you mention a date, it has to follow a very specific format. This avoids problems like confusing MM/DD/YYYY with the Euro-style DD/MM/YYYY. The rules, in this case, are to follow the delightful, whimsical guidance of “W3C.NOTE-datetime-19980827”, and write your dates just like so: YYYY-MM-DDThh:mmTZD.
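Here’s one way you might produce such a timestamp in Python, offered as a sketch rather than the official recipe. The function name atom_date is our own invention. It writes the seconds too, matching the timestamps in the sample feed above, and it assumes you hand it a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def atom_date(moment: datetime) -> str:
    """Format a timezone-aware datetime in the W3C date-time style.

    UTC is written with a trailing "Z"; other zones get a +hh:mm or -hh:mm offset.
    """
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    if moment.utcoffset() == timedelta(0):
        return moment.strftime("%Y-%m-%dT%H:%M:%S") + "Z"
    return moment.isoformat(timespec="seconds")


# The two timestamps from the sample feed, rebuilt by hand.
modified = datetime(2003, 12, 13, 18, 30, 2, tzinfo=timezone.utc)
issued = datetime(2003, 12, 13, 8, 29, 29, tzinfo=timezone(timedelta(hours=-4)))

print(atom_date(modified))
print(atom_date(issued))
```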

The Content Construct lays down the law on dealing with markup and other weird stuff. In short, it tells you to include only plain-text (no markup) inside Content tags, unless you explicitly define another registered media type.

The Link Construct explains how to format all those URLs and links in your feed. The big news here is the mandated inclusion of a “rel” and “type” attribute for every link. The “rel” attribute describes the type of relationship a link represents, while “type” indicates an advisory media type.

Alrighty then. Now that we all understand the Construct-ion of Atom, let’s take a closer look at the code sample above:
<title> If your feed is a duplicate of a Web page, they should both share the same <title>. Simple, that. The <title> element is a Content Construct, so it’s presumed to be plain-text by default.
<modified> A simple timestamp, indicating when your feed was last modified. It’s a Date Construct, so mind your M’s and Y’s.
<author> This refers to the entity that is responsible for the feed as a whole – be it a single person or corporation. Unsurprisingly, <author> is a “Person” construct, which is why you see the <name> tags nested in there, as child elements of <author>.

OK. The elements we’ve covered so far are roughly akin to the <head> of an HTML file. They’re the basic data that describes your feed as a whole. That information won’t change as your feed is updated. Now onto the <entry>s: the dynamic headlines, links, and content you’ll be syndicating. Every time you update your site and add new stories, new entries get added to the Atom feed.

<entry> Each entry is wrapped with an <entry> tag.
<title> When it’s inside an <entry>, the <title> is a Content Construct that conveys a title for that particular entry. One <title> is mandatory for each entry.

<link> At least one <link> element is also required for each entry. A <link> can represent a “permalink” for the entry itself, or it can refer to a site that you are commenting on within the entry. To clarify what kind of relationship a given <link> implies, use the required “rel” (relationship) attribute. For each entry, you must have at least one <link> with a “rel” attribute of “alternate”, as shown here. (The link with rel=”alternate” is your permalink.)

<issued> indicates the time an entry was issued. It’s a Date Construct, of course.

<id> elements convey a permanent, globally unique identifier for the entry (akin to the “guid” in RSS 2.0). The <id> element should never change, even if the entry is edited – its purpose is to help newsreaders and other software understand that your entry isn’t suddenly “new” just because you’ve moved some punctuation around. A unique <id> is required for each <entry>.
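The must-haves described above (a <title>, a <link> with rel="alternate", and an <id> for every entry) are easy to spot-check with a few lines of Python. This is only a sketch: the function names entry_problems and check_feed are hypothetical, and it covers just the three rules called out here, not everything in the spec.

```python
import xml.etree.ElementTree as ET

# The Atom 0.3 namespace from the sample feed.
NS = {"atom": "http://purl.org/atom/ns#"}


def entry_problems(entry):
    """List the required pieces that a single <entry> element is missing."""
    problems = []
    if entry.find("atom:title", NS) is None:
        problems.append("missing <title>")
    links = entry.findall("atom:link", NS)
    if not any(link.get("rel") == "alternate" for link in links):
        problems.append('missing <link rel="alternate">')
    id_element = entry.find("atom:id", NS)
    if id_element is None or not (id_element.text or "").strip():
        problems.append("missing <id>")
    return problems


def check_feed(xml_text):
    """Run entry_problems over every entry in a feed and label each complaint."""
    root = ET.fromstring(xml_text)
    report = []
    for number, entry in enumerate(root.findall("atom:entry", NS), start=1):
        for problem in entry_problems(entry):
            report.append(f"entry {number}: {problem}")
    return report
```

An empty list back from check_feed means every entry carries the three required pieces; anything else tells you which entry to go fix.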

Got that? Now close them tags out, and … congratulations, you’ve just built a bare-bones Atom feed! You are now ready for the final steps: validating and advertising your feed.
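If typing angle brackets by hand isn’t your idea of fun, the same bare-bones feed can be assembled in code. What follows is a minimal sketch with Python’s standard xml.etree.ElementTree module; the helper names (add, build_feed) and the dictionary keys used for entries are our own assumptions, not part of any Atom tool.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # the Atom 0.3 namespace


def add(parent, tag, text=None, **attributes):
    """Append a child element, optionally with text and attributes."""
    element = ET.SubElement(parent, tag, attributes)
    if text is not None:
        element.text = text
    return element


def build_feed(title, site_url, modified, author_name, entries):
    """Return a minimal Atom 0.3 feed as a string (no XML declaration)."""
    feed = ET.Element("feed", {"version": "0.3", "xmlns": ATOM_NS})
    add(feed, "title", title)
    add(feed, "link", rel="alternate", type="text/html", href=site_url)
    add(feed, "modified", modified)
    add(add(feed, "author"), "name", author_name)  # Person Construct: <name> is required
    for item in entries:
        entry = add(feed, "entry")
        add(entry, "title", item["title"])
        add(entry, "link", rel="alternate", type="text/html", href=item["url"])
        add(entry, "id", item["id"])
        add(entry, "issued", item["issued"])
        add(entry, "modified", item["modified"])
    return ET.tostring(feed, encoding="unicode")


sample = build_feed(
    "dive into mark", "http://diveintomark.org/", "2003-12-13T18:30:02Z", "Mark Pilgrim",
    [{"title": "Atom 0.3 snapshot",
      "url": "http://diveintomark.org/2003/12/13/atom03",
      "id": "tag:diveintomark.org,2003:3.2397",
      "issued": "2003-12-13T08:29:29-04:00",
      "modified": "2003-12-13T18:30:02Z"}],
)
print(sample)
```

Building the feed through an XML library rather than string-pasting means the closing and nesting of tags is handled for you, and characters like & in a title get escaped automatically.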

Feed the World

Now it’s time to validate your work against the universal Feed Validator. Of course, the Atom Syndication file you’ve created needs to be updated each and every time you add content to your site. To avoid the dull and error-prone task of updating an Atom feed manually, most Web builders rely on automated tools to update their feeds. Content management systems like Blogger, MovableType, and TypePad all feature starter Atom templates and tools. And you can expect to see a slew of new freeware server-side scripts to process and create Atom feeds soon.

After uploading your Atom file to your server, you still need to inform people of its existence. When you create a new feed, running through these few steps will debut your feed in style.

  1. Advertise to Web surfers! Put an Atom button or text link on your page, linking to the feed file. Go ahead, right-click, Save as …
  2. Advertise to the machines! Some newsreader / aggregator applications will identify your feed’s location if you put the following “autodiscovery” code in the <head> section of your homepage: <link rel="alternate" type="application/atom+xml" title="MySite’s Atom feed" href="http://www.YourSiteHere.com/xml/index.atom" />
  3. Get listed by the major feed directories! Syndic8.com and News Is Free are two of the biggest collections of RSS and Atom feeds. Before announcing yourself to these sites, however, run a final test against the Atom feed validator. While Web browsers will render many poorly coded Web pages, Atom (and RSS) parsers can be less forgiving, and may require a well-formed XML file.
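Curious how the machines in step 2 might actually find that autodiscovery tag? Here’s a rough sketch of the idea using Python’s standard html.parser module. The class and function names (FeedLinkFinder, find_atom_feeds) are invented for this example, and a real aggregator would surely handle more edge cases than this does.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collect the href of every <link> that advertises an Atom feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()
        media_type = (attributes.get("type") or "").lower()
        if rel == "alternate" and media_type == "application/atom+xml":
            self.feeds.append(attributes.get("href"))


def find_atom_feeds(page_source: str):
    """Return the feed URLs advertised in a page's autodiscovery links."""
    finder = FeedLinkFinder()
    finder.feed(page_source)
    return finder.feeds


page = ('<html><head><title>MySite</title>'
        '<link rel="stylesheet" type="text/css" href="/style.css" />'
        '<link rel="alternate" type="application/atom+xml" title="Atom feed" '
        'href="http://www.YourSiteHere.com/xml/index.atom" />'
        '</head><body></body></html>')
print(find_atom_feeds(page))
```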

There. That should be plenty enough to get you started. Note that this was just an introduction to Atom, which means we’ve covered only the basic required elements. For early adopters, there are plenty of additional, optional tags you can integrate into your feed to make it more robust. Good places to look for ways to bulk up your feed include AtomEnabled.org, the official Atom Syndication Format specification, Cover Pages, and the Atom Wiki. Good luck, have fun, and stay on target!

Thanks to Brent Simmons, Graham Parks, Jeff Barr, Sam Ruby, and Mark Pilgrim for pointers and clarifications when researching this article. This article was originally published on Webmonkey.com.
