Cambridge Days

Kettle’s Yard

Just north of the Cam, past Magdalene College, is Kettle’s Yard. From four 17th- and 18th-century cottages, Jim and Helen Ede built the foundations for a single house, and a singular home.

It’s an art museum, now – though I suspect Mr. Ede (himself once a curator at the Tate in London) would agree that the scale and setting make it something altogether different. Above all, Kettle’s Yard remains a home: you tug on the doorbell to enter, and once inside, you can grab a seat anywhere, even pull a book from the shelves. And that’s how it went for us, yesterday afternoon.

Of course, art is the big draw: when you’re a networked fellow like Mr. Ede, I suppose it was easy to gather up pieces from sundry ‘artist friends’ like Ben Nicholson, Gaudier-Brzeska, even Miro and Brancusi. However, the personal authenticity of the collection is what impresses the most – knowing that behind every piece was afternoon tea or a handshake, the ties of friendship and patronage.

You can feel how the house was slowly assembled, built – not simply bought at Sotheby’s. In that sense, Kettle’s Yard reminds me of Jim Thompson’s house in Bangkok: an organically-produced shell of an extraordinary life. That, and then there’s the fun of traipsing through the tiny bedrooms, hallways, and winding staircases, so unlike the squared halls of most museums.

Best of all? At Kettle’s Yard, it’s the arrangement and selection of every piece which matters, not cash value; some of the most important features are pebbles, plates, and lemons (just ask), each item placed properly, and just so.

Oh – and it’s free. Next time it really rains, I’m heading back.

Cambridge Days

Gown and Town: suiting up for b-school

I bought my gown last week. I couldn’t help but grin, trying it on: the long robes are probably one of the more peculiar and quintessential images associated with Oxbridge colleges.

Actually, the first time I glimpsed a formal Cambridge robe was in Berzerkley, of all places; one of my undergrad professors was a Cambridge (and Oxford) don, and at graduation he’d ambled onto the stage wearing colorful garments which looked like a cross between a rodeo clown’s outfit and the Vatican Guard uniform. Amidst all our cookie-cutter rental-quality black robes, and the tattered business-class upgrades worn by most Berkeley profs, his outfit was… brilliant.

Americans generally associate gowns only with graduation; here, it was a more important part of your daily outfit, once upon a time. I needed to purchase mine before school starts (T-minus 2 weeks, ack) because it’s still mandated for nightly dinner at Magdalene.

Thankfully, I won’t need to strut about town always looking like Zorro, or a wayward Renaissance Faire vendor – graduate-level gowns are simple, uniform black affairs – and anyhow, I gather it’s a thing to keep stashed in a locker or backpack right until you walk into Formal Hall. Perfect compromise, in my book.

Cambridge Days

Grantchester, The Orchard

All it takes is a little Murphy’s law: the day after local papers led with “WETTEST SUMMER IN 50 YEARS”, this place starts feeling like California. In a sunshine-y sense, that is.

We took a most civilized stroll out of town yesterday, and walked alongside the river Cam towards Grantchester. The footpath dips and rises through hyperpastoral meadows, and it offers exactly the sort of scenery you’d hope for: grazing livestock, starry-eyed punters, and rolling farmland afar.  It’s quiet, verdant, and all feels (relatively) isolated, especially for a route that starts just twenty minutes’ walk from the city center.

An hour later, we stumbled across Grantchester, and its tea-room of some repute: The Orchard. As the name implies, the outdoor grounds are sprinkled with apple and pear trees; Az and I entered from an adjacent meadow by first squeezing past some cows and then climbing a cattle-fence. I’d hoped to congratulate myself on my little discovery, but it turns out this is a place Cambridge students have flocked to for 100 years; The Orchard even offers a glossy brochure listing its famous tea-takers, beginning with Virginia Woolf and ending with John Cleese.

Closer to my own heart, they claim Alan Turing ‘first conceived’ the idea of Artificial Intelligence whilst strolling from Cambridge to The Orchard. I don’t entirely buy it: I’m no genius, but I do spend an inordinate amount of time daydreaming about computers, sci-fi, and othersuch nerdworthy nonsense, and I can say that bits, bytes, and computer cognizance were the last thing on my mind during that pleasant walk. To me, it’s like arguing that Thoreau penned Walden whilst riding the London Underground. Doesn’t jibe, somehow – but, then again, I’m no genius.


orchard sign with paint peeling

Cambridge Days

Shades of grey with sunny highlights

The local weather report is lately prefaced with so many apologies, and so thoroughly riddled with qualifiers, that it’s sometimes hard to tell just what the day’s weather is actually supposed to be.

“Not nearly as nice as it ought to be,” is how the weatherman sheepishly started his routine Sunday. By the end, he was preaching stridently about how things could really be “much, much worse”. The end result, I found, was one of those days where it’s too brisk for a T-shirt, but you’d sweat when wearing a jacket.

I suppose this can’t all be the meteorologist’s fault; the weather’s just uneven, recently. It’s like George Lucas was chosen to direct England’s summer – not original-Star-Wars-George-Lucas, but bad-new-trilogy-George-Lucas – giving the viewer just a few moments of brilliance in a show mostly mediocre, like an hour of warm sunshine shimmering on the Cam, bracketed by a day full of grey dross and rain showers.  But I’ll take it.

cambridge buildings

Cambridge Days

Back to the blogstone

Hey! You know the one restaurant in town that never seems to make it? The name changes, the menus get shuffled somewhat, day-glo vinyl banners proclaim Grand Reopenings, Under New Management, and so forth… just for a few months, until everything goes dark, again?

Yep, and so it goes with this oft-mothballed blog. Still the same management, alas, but other items are moving around – us, namely – as we’re translocating to England.

It’s a year move, at least, and we’re allowed just four suitcases. All empty, at the moment, save for a single frying pan that’s nestled in one. (Experience has taught us that most furnishings can be cheaply IKEA’d across Europe; obtaining well-seasoned, hard-anodized kitchenware is another story.)

Our saucepan might be a nice addition, too, but unlikely. I briefly considered wearing one, Johnny-Appleseed-style, right onto the plane. (“What ma’am? This? Oh, it’s my Calphalon Safety Helmet. They say Chairman Kaga wears one made of gold, you know.”) But then, what if the overhead luggage spot is full? It’s hard enough to sleep in Coach Class, as is… could be uncomfortable.

Technology Webmonkey Days

Mighty Atom: Really Similar Syndication?

…quick, then, before they hit the lights, a run-down on Atom Syndication.

Update, circa 2008: And hit the lights on Webmonkey they did. For sentimentality’s sake, I’ve reproduced the text of the original article below.

This one was a lot of fun to write:

Mighty Atom: Really Similar Syndication?

Last time around, we took a look at the increasingly popular RSS button, the little graphic developers use to signpost links to RSS (Really Simple Syndication) files. Well, RSS has a new competitor now: the upstart Atom Syndication Format.

Atom files can be recognized by their own cute-as-a-button button. And whether you click an RSS button or an Atom one, you still get the same raw-looking XML markup – not quite fit for human consumption. All these cute buttons have a real and practical purpose, though: Site Syndication.

Those corresponding XML files are designed for audiences using “news aggregators” — sort of like mini Web browsers — much favored by high-volume websurfers. Using a news aggregator, you can browse through the latest updates from a customized set of your favorite websites in just a fraction of the time it takes to do the same thing using Internet Explorer. Also, because syndication files follow a standardized XML-based format, they allow other sites to integrate your latest links and headlines into their pages, automatically.

So, what’s the difference between Atom Syndication and RSS?

It’s kinda like the difference between “newfangled” Phillips screwdrivers and “old-fashioned” slotted screwdrivers. A few fire-eatin’ types may spend their days trolling the rec.arts.woodworking forums, debating the differences in torque capacity thresholds, drill bit cam-out, patent history, and manufacturing-cost issues between these incompatible rivals. But most of us are more or less resigned to keeping both types of screwdrivers in our toolkits — we’ll use whichever one is handy and fits our needs.


Similarly, Atom Syndication and RSS are both tools designed to do the same basic job: advertise and distribute website content by creating machine-readable XML newsfeeds.

Chances are, your choice of syndication format will be influenced largely by your choice of Content Management System. Google’s Blogger, for example, pushes the nascent Atom format and includes a pre-built starter template for Atom feeds and Atom feeds only. (All this, even though Atom is still a work-in-progress, only at version 0.3.) Tripod’s Blog Builder tool, on the other hand, offers an RSS 2.0 generator.

Whether you’re invested in a CMS already or you’re still shopping around, it’s a good idea to have a working understanding of both technologies. We went over all the whys and hows of RSS in “Sharing Your Site With RSS,” so in the pages that follow, we’ll be focusing on Atom, starting with why people bothered to build another site syndication format in the first place.

Wherefore Need I Atom?

The best testament to the intelligent design of RSS is its popularity: the RSS 0.9x and 2.0 formats account for about 65 percent of the 25,000+ feeds tracked by the big feed directories. Mind you, that’s a market that RSS helped pioneer.

Plus, RSS (the 0.9x and 2.0 versions evangelized by Dave Winer) has weathered competition before. The unfortunately named RSS 1.0 (a separate, RDF-based format) emerged as a standard for developers seeking a more sophisticated, avowedly non-commercial alternative to the RSS variants controlled by Dave Winer’s Userland Software. It’s been a few years, however, and the market still overwhelmingly favors the Fisher-Price-simple RSS of versions 0.9x and 2.x.

Meanwhile, quality newsreaders and aggregators have become increasingly adept at handling almost any “flavor” of syndication. Given that, the best way to improve the usefulness of your site is to focus on creating high-quality content, not changing the feed format.

So then, why do we need Atom, again?

Dave Winer’s RSS spec is penned in a succinct, conversational style that’s both easy to read and easy to learn. Yet while brevity and informality work great when it comes to getting the masses to adopt new ideas, the spec had some notable gaps and gray areas. And late efforts to re-write or repair the RSS spec were hampered by a variety of issues, such as RSS’s emphasis on backwards compatibility and its adoption of a Creative Commons license.

Perhaps the best anecdotal example of RSS’s “under-specification” is its failure to address how a developer should deal with markup code (like HTML) when it’s mixed up with content. It’s like bubble gum ice cream: are you supposed to chew the gumballs while eating the ice cream, or spit them out, one by one, into a soggy, mottled napkin? Every kid, it seems, has a slightly different take. So it goes with RSS.

Consider the title of a news story. What if certain words in it just have to be italicized, or the article is called, “An Introduction to the <blockquote> Tag”. In both these cases, you’ll need to introduce ruckus-causin’ <i> or <blockquote> tags inside the <title> element of an RSS feed. Now the RSS spec, as written, doesn’t say this is allowed. But then, it doesn’t explicitly prohibit this, either.

So what’s a well-meaning newsreader app supposed to do when it comes across this markup? If it chose to always display the HTML, brackets and all, the second example would work just fine. But then the first headline would look pretty <i>funny</i>. Conversely, newsreaders that attempt to act on (or hide) HTML tags would make the first headline look nice, but would transform the second headline into “An Introduction to the Tag”. Hi, tag!
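To make the gumball problem concrete, here’s a toy Python sketch (mine, not anything a real newsreader shipped) of the naive tag-stripping that tag-hiding newsreaders effectively performed on RSS titles:

```python
import re

def strip_tags(text):
    """Naively delete anything that looks like an HTML tag -
    roughly what tag-hiding newsreaders did to RSS titles."""
    return re.sub(r"<[^>]+>", "", text)

# A title that *uses* markup comes out fine...
print(strip_tags("A <i>funny</i> headline"))
# A funny headline

# ...but a title that *mentions* markup gets mangled
# (note the doubled space where <blockquote> vanished):
print(strip_tags("An Introduction to the <blockquote> Tag"))
# An Introduction to the  Tag
```

The stripper simply can’t tell markup-as-formatting from markup-as-subject-matter, and the RSS spec gives it no way to find out.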

Those aren’t the only issues developers have come across. How do you put relative URLs in a feed? What are proper uses of the <link> tag? Sure these are oddball cases, certainly they aren’t the biggest deals in the world, but they’re just enough of a problem to make certain programs get goofy – not the kind of foundation you want for archiving or API purposes.

Despite these small but potentially pesky foibles of RSS, nobody wants to switch technologies unless there’s an option with wild new features, or at least a couple of major bugfixes. Atom heavily favors the latter. With its lengthier and rather “lawyerly” documentation, the Atom spec nails down many of the quirks, ambiguities, and surprises that became inadvertently enshrined in the now-forever-frozen RSS. And while Atom may not be burgeoning with never-before-seen features, it does add a few goodies: new tags like <summary> and <content>, and proposed attributes like content rel="fragment", make it easier to programmatically distinguish excerpts from entries.

Although Atom’s technical improvements over RSS are more evolutionary than revolutionary, don’t mistake this for timidity: the Atom project harbors a fairly aggressive agenda. Right now, the loosely knit group that fostered the Atom Syndication Format is simultaneously working on the “Atom API”, an additional set of programming definitions that allows software agents to communicate about basic weblogging actions such as posting, editing, requesting feeds, etc. If all goes according to the Big Plan, Atom-formatted data will be used for far more than newsfeeds. Soon, Atom-enabled programs could help you edit or archive your website without opening a browser, or allow you to painlessly switch between hosting/CMS providers like TypePad or Radio Userland.

Of course it’s hard to tell how all this syndication stuff will shake out in the long run. Right now, Atom’s inchoate “version 0.3” state isn’t perfect or free from glitches, but its momentum towards that ultimate goal has wooed early adopters. To see more technical cases of why key players such as Google and Movable Type are lining up behind Atom, check out Why We Need Echo (Atom’s former name), or these slides. As for the rest of us, let’s get building.

The Building Blocks of Atom

Atom Syndication was created to accomplish the very same task as RSS, so it’s little surprise that Atom code wound up looking like RSS. Atom is newer, and wisely builds off the design successes of its predecessor. If you know how to build a feed with RSS, it’s easy to get started with Atom.
The old “view source” technique probably remains the best pedagogical tool on the Web, so let’s get started building a minimal Atom feed by looking at one, lifted straight outta the 0.3 spec:

<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#">
<title>dive into mark</title>
<link rel="alternate" type="text/html" href="http://diveintomark.org/"/>
<modified>2003-12-13T18:30:02Z</modified>
<author><name>Mark Pilgrim</name></author>
<entry>
<title>Atom 0.3 snapshot</title>
<link rel="alternate" type="text/html" href="http://diveintomark.org/2003/12/13/atom03"/>
<modified>2003-12-13T18:30:02Z</modified>
<issued>2003-12-13T08:29:29-04:00</issued>
<id>tag:diveintomark.org,2003:3.2397</id>
</entry>
</feed>

Not so scary, right? It looks a lot like HTML, but since Atom is an application of XML, it’s a tad stricter: every Atom feed must be well-formed, so when you open a tag, remember to close and nest it properly. XML rigamarole also demands the first two lines, which define the XML version and namespace. Confused? Just cut and paste the opening two lines; they’re the same for every Atom feed.

Now, before we jump into our tag-by-tag, play-by-play breakdown, there’s one design concept that’s worth pointing out. It’s Atom’s idea of “constructs”, which are elements that share a common structure. Don’t worry, it isn’t as fancy as it sounds. It’s just that certain groups of tags always follow the same core rules, wherever and whenever they’re used.

The Person Construct requires a nested element called <name> to be present when describing persons. (Makes sense, right?) Other “child” elements of the Person Construct, <email> and <url>, are considered optional. So whether you’re talking about the author of a feed as a whole, or the author of an individual entry, these basic rules apply: <name> is a must-have tag, <email> and <url> aren’t.

The Date Construct specifies that any time you mention a date, it has to follow a very specific format. This avoids problems like confusing MM/DD/YYYY with the Euro-style DD/MM/YYYY. The rules, in this case, are to follow the delightful, whimsical guidance of “W3C.NOTE-datetime-19980827”, and write your dates just like so: YYYY-MM-DDThh:mmTZD.

The Content Construct lays down the law on dealing with markup and other weird stuff. In short, it tells you to include only plain-text (no markup) inside Content tags, unless you explicitly define another registered media type.

The Link Construct explains how to format all those URLs and links in your feed. The big news here is the mandated inclusion of a “rel” and “type” attribute for every link. The “rel” attribute describes the type of relationship a link represents, while “type” indicates an advisory media type.
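Of the four constructs, the Date Construct is the easiest to fumble in practice. Here’s a small Python sketch (my own illustration, not part of the original article) that emits a W3C-style timestamp of the sort Atom expects:

```python
from datetime import datetime, timezone

def w3c_datetime(dt):
    """Format a timezone-aware datetime as a W3C-style timestamp,
    down to the second and normalized to UTC,
    e.g. 2003-12-13T18:30:02Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(w3c_datetime(datetime(2003, 12, 13, 18, 30, 2, tzinfo=timezone.utc)))
# 2003-12-13T18:30:02Z
```

No MM/DD versus DD/MM guessing games: year first, then month, then day, every time.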

Alrighty then. Now that we all understand the Construct-ion of Atom, let’s take a closer look at the code sample above:
<title> If your feed is a duplicate of a Web page, they should both share the same <title>. Simple, that. The <title> element is a Content Construct, so it’s presumed to be plain-text by default.
<modified> A simple timestamp, indicating when your feed was last modified. It’s a Date Construct, so mind your M’s and Y’s.
<author> This refers to the entity that is responsible for the feed as a whole – be it a single person or corporation. Unsurprisingly, <author> is a “Person” construct, which is why you see the <name> tags nested in there, as child elements of <author>.

OK. The elements we’ve covered so far are roughly akin to the <head> of an HTML file. They’re the basic data that describes your feed as a whole. That information won’t change as your feed is updated. Now onto the <entry>s: the dynamic headlines, links, and content you’ll be syndicating. Every time you update your site and add new stories, new entries get added to the Atom feed.

<entry> Each entry is wrapped with an <entry> tag.
<title> When it’s inside an <entry>, the <title> is a Content Construct that conveys a title for that particular entry. One <title> is mandatory for each entry.

<link> At least one <link> element is also required for each entry. A <link> can represent a “permalink” for the entry itself, or it can refer to a site that you are commenting on within the entry. To clarify what kind of relationship a given <link> implies, use the required “rel” (relationship) attribute. For each entry, you must have at least one <link> with a “rel” attribute of “alternate”, as shown here. (The link with rel=”alternate” is your permalink.)

<issued> indicates the time an entry was issued. It’s a Date Construct, of course.

<id> elements convey a permanent, globally unique identifier for the entry (akin to the “guid” in RSS 2.0). The <id> element should never change, even if the entry is edited – its purpose is to help newsreaders and other software understand that your entry isn’t suddenly “new” just because you’ve moved some punctuation around. A unique <id> is required for each <entry>.

Got that? Now close them tags out, and … congratulations, you’ve just built a bare-bones Atom feed! You are now ready for the final steps: validating and advertising your feed.
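If you’d rather not type angle brackets by hand, feed assembly is easy to script. Here’s a hypothetical Python sketch using the standard library’s ElementTree; the function name, sample titles, and example.com URLs are all my own invention, not anything from the Atom project:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # the Atom 0.3 namespace
ET.register_namespace("", ATOM_NS)    # serialize it as the default xmlns

def tag(name):
    return f"{{{ATOM_NS}}}{name}"

def minimal_feed(title, url, author, entries):
    """Build a bare-bones Atom 0.3 feed; each entry is a
    (title, permalink, issued, entry_id) tuple."""
    feed = ET.Element(tag("feed"), {"version": "0.3"})
    ET.SubElement(feed, tag("title")).text = title
    ET.SubElement(feed, tag("link"),
                  {"rel": "alternate", "type": "text/html", "href": url})
    ET.SubElement(feed, tag("modified")).text = "2004-01-01T00:00:00Z"
    ET.SubElement(ET.SubElement(feed, tag("author")), tag("name")).text = author
    for e_title, e_link, e_issued, e_id in entries:
        entry = ET.SubElement(feed, tag("entry"))
        ET.SubElement(entry, tag("title")).text = e_title
        ET.SubElement(entry, tag("link"),
                      {"rel": "alternate", "type": "text/html", "href": e_link})
        ET.SubElement(entry, tag("issued")).text = e_issued
        ET.SubElement(entry, tag("id")).text = e_id
    return ET.tostring(feed, encoding="unicode")

feed_xml = minimal_feed("My Weblog", "http://www.example.com/", "A. Blogger",
                        [("First post", "http://www.example.com/first.html",
                          "2004-01-01T00:00:00Z", "tag:example.com,2004:1")])
print(feed_xml)
```

Run anything a generator produces through the Feed Validator before you trust it, homemade generators doubly so.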

Feed the World

Now it’s time to validate your work against the universal Feed Validator. Of course, the Atom Syndication file you’ve created needs to be updated each and every time you add content to your site. To avoid the dull and error-prone task of updating an Atom feed manually, most Web builders rely on automated tools to update their feeds. Content management systems like Blogger, MovableType, and TypePad all feature starter Atom templates and tools. And you can expect to see a slew of new freeware server-side scripts to process and create Atom feeds soon.

After uploading your Atom file to your server, you still need to inform people of its existence. When you create a new feed, running through these few steps will debut your feed in style.

  1. Advertise to Web surfers! Put an Atom button or text link on your page, linking to the feed file. Go ahead, right-click, Save as …
  2. Advertise to the machines! Some newsreader / aggregator applications will identify your feed’s location if you put the following “autodiscovery” code in the <head> section of your homepage: <link rel="alternate" type="application/atom+xml" title="MySite's Atom feed" href="" />
  3. Get listed by the major feed directories! News Is Free is among the biggest collections of RSS and Atom feeds. Before announcing yourself to these sites, however, run a final test against the Atom feed validator. While Web browsers will render many poorly coded Web pages, Atom (and RSS) parsers can be less forgiving, and may require a well-formed XML file.
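That step-2 “autodiscovery” trick is simple enough to demonstrate. Below is a toy Python sketch (the FeedFinder class and example.com URL are my own, purely hypothetical) of how an aggregator might sniff a page’s <head> for an Atom feed:

```python
from html.parser import HTMLParser

class FeedFinder(HTMLParser):
    """Scan a page for Atom autodiscovery <link> tags -
    roughly what feed-aware aggregators do."""
    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") == "application/atom+xml"):
            self.feeds.append(a.get("href"))

page = """<html><head>
<link rel="alternate" type="application/atom+xml"
      title="MySite's Atom feed" href="http://www.example.com/atom.xml" />
</head><body>hello</body></html>"""

finder = FeedFinder()
finder.feed(page)
print(finder.feeds)
# ['http://www.example.com/atom.xml']
```

Point a newsreader at the page itself, and code like this lets it quietly find the feed for you.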

There. That should be plenty enough to get you started. Note that this was just an introduction to Atom, which means we’ve covered only the basic required elements. For early adopters, there are plenty of additional, optional tags you can integrate into your feed to make it more robust. Good places to look for ways to bulk up your feed include the official Atom Syndication Format specification, Cover Pages, and the Atom Wiki. Good luck, have fun, and stay on target!

Thanks to Brent Simmons, Graham Parks, Jeff Barr, Sam Ruby, and Mark Pilgrim for pointers and clarifications when researching this article. This article was originally published on Webmonkey.

Technology Webmonkey Days

Metadata: FOAF, RDF and geourl

This blog now seems to be officially shuttered for the summer. ‘Cause it’s sunny out.

Elsewhere, though, there’s this: Metadata, Mark II, an overview of some nifty metadata technologies.

Update, 2008:  Webmonkey shuttered its doors not too long after this article was published. While outdated already, I’ve pasted the original text of the article below – ’twas one of the last freelance writing bits I did while living in Rome.

Metadata, Mark II: FOAF, RDF, GeoURL, and SMBmeta

Remember META tags? Once upon a time, a finely crafted META keyword tag would get you the bourgeois treatment from search engines. You could specify exactly which search words should be associated with your site and, best of all, META tags were invisible to users, allowing webmasters a touch of the ol’ “editorial liberty.”

Yeah. *That* didn’t last. Almost instantly, META tags were abused and misused by pageview-hungry Web developers, who crammed all sorts of irrelevant and naughty keywords into their pages, trying to shunt the flow of Web traffic their way. Today, Google and other search engines essentially ignore META keyword tags.

(Of course, if you’re absolutely adamant that your page be promoted in response to specific search terms, Google, Yahoo, HotBot and the gang are happy to help, but with an improved targeted-placement technique far less attractive to spammers: It’s called Advertising, and it costs cash-money.)

End of story? That’d be sad, indeed, because META keyword tags were a rather sweet idea, at least on paper: short, sensible descriptions of your site, tailored so that machines could quickly read and index it, and subsequently help people find it.

Well, META’s not dead.

In the pages that follow, I’ll be giving you a bird’s eye view of a few independent technologies, each aspiring to get useful *metadata* back into the Web. Some are homegrown, some corporate, and some academic, but all of them let you enhance your site with useful information and improve the ways your site is associated with other sites. Sound interesting? Good, then here’s the game plan:

1. We’ll start with an explanation of that *metadata* word (so we can finally quit italicizing it).

2. Next comes a tour of the platitudes and latitudes of GeoURL, a fun, on-your-site-in-just-ten-minutes META tag that pinpoints your webpage’s real-world location with GPS-style accuracy.

3. Then we’ll check out SMBmeta, a newly launched metadata framework designed to give small businesses their fair share of the Web limelight.

4. We’ll finish up with a macro look at some of the “Semantic Web” standards favored by the W3C: Dublin Core and RDF — and we’ll show them off a bit with FOAF (Friend of a Friend), an application which leverages both those high-minded efforts.

OK then, let’s get started!

Metadata Background

A lot of smart people (like Tim Berners-Lee, who merely *invented* the Web) are still laboring to make the big dream behind the old “META keyword” come true. That concept is Metadata, which, strictly speaking, means “data about data”, but in our context means “stuff describing your Web page as a whole”: who wrote it, what it’s about, related concepts or categories, the date it was written or updated, the language it’s written in, who controls the copyright, physical locations it describes, if there’s a Table of Contents, etc., etc.

The point is, nobody necessarily wants to *see* all those details cluttering every single Web page. But if that data were invisible, machine-readable, and used to describe both the contents and *context* of Web pages, that would open a lot of possibilities, allowing Things of Great Niftiness to ensue. The W3C calls this ambitious idea the “Semantic Web”:

“The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.”
Scientific American, May 2001

So having a metadata-rich Web wouldn’t just improve our user experience as we search and surf the Web, but it would also augment the ability for “robots” or software agents to collect and process information on our behalf.

When people talk about the “Semantic” Web adding *meaning* to the Web, it’s not really for you and me — you and I generally understand whatever we’re reading, and know which links we need to click to get certain tasks done — it’s about adding *meaning* that machines can process and navigate.

Whoooo! Robots! Software Agents!

Indeed. Which brings us to this rather key caveat: In this tutorial, we’ll be looking at a mishmash of different technologies that are not yet widely adopted, and may well never be. (That includes the aforementioned Robot Agents.) As of press time, *not a one* of these technologies will deliver the slightest boost to your Google PageRank or your listing on HotBot. And there’s absolutely no guarantee they will in the future.

Still, ‘tis better to lead than follow, and more fun to fiddle with emergent tech than to wait for the “critical masses” to show up and master it before you, right?

Let’s start, then, with a metadata application already adopted by thousands of webloggers, because it’s fun and entertaining but keeps its feet firmly anchored in the real world: GeoURL.

Getting on the Map with GeoURL

With just two lines of code, GeoURL maps documents in cyberspace to real-world locations. Once you’ve added your site to GeoURL’s database, you can immediately see who else has registered Web pages in (or about) your neighborhood.

Here’s what the code looks like: The first line contains your Latitude and Longitude, and the second line contains your site’s name.

<meta name="geo.position" content="41.8833; 12.500" />
<meta name="DC.title" content="Professor Falken's weblog" />

(here’s a fill-in-the-blanks example:)

<meta name="geo.position" content="xx.xxxx; yy.yyyy" />
<meta name="DC.title" content="your site's name here" />

In old hacker lingo, these coordinates are called an “ICBM Address.” (Like, Inter-Continental Ballistic Missile.) The Cold War humor might now be passé, but you may still want to use discretion here: If you aren’t comfortable publishing your street address to the Web, you can use broader coordinates, like those corresponding to your postal ZIP, City Hall, etc.

So, where do you get these coordinates, anyways? Your own latitude and longitude can be grabbed from a GPS device, but otherwise, free Web services make it an easy lookup. For sites in the U.S., GeoURL’s Geocoder page is the easiest method; it quickly converts street addresses into regional coordinates. (Nifty, no? A crafty search spider which sniffed out postal addresses in web pages and indexed them by location with this technique won Google’s 2002 Programming Contest.)

For those outside the U.S., handy lookup services include the Getty Thesaurus of Geographic Names, GeoNET, and MultiMap, though you’ll need to jump through a few additional hoops. Additional lookup resources are available here, along with GeoURL’s own documentation.

When you’ve figured out your coordinates, edit the Latitude, Longitude, and Your-Site-Name portions of the GeoURL tags, then drop them into the <head> of your document. You’ll then need to request a visit from the GeoURL crawler, and in a few minutes, you’re listed: a twinkling blue pinpoint on their global map of sites!
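If you maintain more than one page, you can stamp out those two tags programmatically. A minimal Python sketch follows; the helper name and sample inputs are mine, purely illustrative:

```python
def geourl_tags(lat, lon, title):
    """Render the two GeoURL meta tags for a page
    located at (lat, lon) and named title."""
    return (f'<meta name="geo.position" content="{lat:.4f}; {lon:.4f}" />\n'
            f'<meta name="DC.title" content="{title}" />')

print(geourl_tags(41.8833, 12.5, "Professor Falken's weblog"))
# <meta name="geo.position" content="41.8833; 12.5000" />
# <meta name="DC.title" content="Professor Falken's weblog" />
```

Paste the output into your page’s <head>, and you’re done.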

Now, I chose GeoURL as our first example because it’s not only fun (been poking around your neighbors’ sites already?) and a fast setup, but because it nicely exemplifies how a “controlled vocabulary” makes these locations machine-understandable and easier to search for in a database. In this case, the “controlled vocabulary” is the numerical latitude and longitude coordinates of the ICBM address.

After all, you might advertise your store’s address like this for newspaper readers:

“we’re two doors down from Starbucks, over by the mall.”

But a machine isn’t going to understand this:

<meta name="geo.position" content="we're two doors down from Starbucks, over by the mall." />

I know what you’re thinking: Most folks don’t search for, say, local restaurants by keying in latitudes and longitudes. And that’s why controlled vocabs often include a thesaurus that maps equivalent relationships to one another — for instance, a list of City Names and GPS coordinates. Ideally, producers like you and I take care of machine-readable data, while search engines would build the thesauri.
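A thesaurus of that sort can be as humble as a lookup table. Here’s a toy Python sketch, with a made-up mini-thesaurus (the coordinates and helper are illustrative only):

```python
# Hypothetical mini-thesaurus: human place names mapped to the
# controlled vocabulary (decimal lat/lon) that machines index.
CITY_COORDS = {
    "rome": (41.9000, 12.5000),
    "cambridge, uk": (52.2053, 0.1218),
}

def to_geo_position(city):
    """Translate a city name into a geo.position content string."""
    lat, lon = CITY_COORDS[city.lower()]
    return f"{lat:.4f}; {lon:.4f}"

print(to_geo_position("Rome"))
# 41.9000; 12.5000
```

A real search engine’s thesaurus would be vastly bigger, of course, but the equivalence-mapping idea is the same.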

There’s one more educational nugget in the GeoURL example: that “DC.title” name used to identify your site’s name. “DC.title” isn’t an arbitrary term; it’s a smart use of the Dublin Core vocabulary, which we’ll cover in detail later.

While Dublin Core simmers on the backburner, let’s take a gander at SMBmeta.

Getting Busy with SMBmeta

SMBmeta knows its clientele: (S)mall and (M)edium-sized (B)usinesses. These entrepreneurial enterprises — like florists and lumberyards, candy stores and dentists — are the powerhouse of the U.S. economy. On their own, however, these local shops can’t afford an in-house Web team, or compete with Fortune-500-style advertising budgets. For many of them, simply being found on the Web is a challenge.

SMBmeta helps them by providing a “virtual Rolodex card” of sorts — a limited set of data fields which, when used like a fill-in-the-blanks template, can describe any business. The data fields cover all the small-biz essentials: name, description, address, parking, store hours, etc. To make sure it’s all easily machine-readable (and easier to search against), SMBmeta information is stored as XML, in which the tag attributes come from a controlled vocabulary but you can freely add your own descriptive content. Check out this code sample (lifted and condensed from SMBmeta’s docs), especially the <publicTransportation> line:

<?xml version="1.0" encoding="UTF-8"?>
<smbmeta version="0.9" xmlns="">
  <business domain="">
    <name>Concord Eggplant Restaurant</name>
    <description>Innovative vegetarian food for young and old.</description>
    <type naics="722110">Vegetarian restaurant</type>
    <location country="us" postalCode="01742">
      <address href="">300 Baker Avenue</address>
      <languageSpoken language="en-us">English</languageSpoken>
      <hours day="all" open="1130" close="2130" timezone="local" />
      <parking type="on-street">Lots of metered on-street parking</parking>
      <publicTransportation type="train" blocksAway="3">Five minute walk from the commuter rail</publicTransportation>
    </location>
  </business>
</smbmeta>

The <publicTransportation> element showcases a nice balance between natural-language descriptions and a controlled vocabulary of tag attributes. With transportation types strictly limited to a fixed selection of *bus, train, subway, trolley, cable-car,* or *other*, and the blocksAway attribute limited to integer numbers, this data can easily be parsed, indexed, and searched against. (It’s also immune from poetic hyperbole.) The machine-friendly info is supplemented by the element’s contents, “Five minute walk from the commuter rail,” a bit of information tailored for the human reader.
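To see how that balance pays off for software, here’s a hypothetical snippet of Python (not part of any SMBmeta toolkit) that parses the element above, checks the controlled attributes, and leaves the human-oriented text alone:

```python
import xml.etree.ElementTree as ET

# The controlled vocabulary of transportation types, per the spec.
TRANSPORT_TYPES = {"bus", "train", "subway", "trolley", "cable-car", "other"}

snippet = ('<publicTransportation type="train" blocksAway="3">'
           'Five minute walk from the commuter rail</publicTransportation>')

elem = ET.fromstring(snippet)
assert elem.get("type") in TRANSPORT_TYPES   # machine-checkable attribute
blocks = int(elem.get("blocksAway"))         # guaranteed to parse as an integer
print(blocks, "-", elem.text)                # the free text is for humans
```

The attributes can be validated and indexed with a handful of lines, while the prose passes straight through to the reader.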

Having seen some code, you might wonder what’s particularly novel, here. After all, SMBmeta’s XML structure is rudimentary, simple enough to be read (and hand-edited) by humans. The rather meager collection of tags, with names like <location> and <languageSpoken>, can be understood at face value. Furthermore, you don’t need to register your SMBmeta file with any Web service or registry — just upload it to your domain. For a brand-new technology (SMBmeta was launched in 2003) this looks like code five years outta fashion. There’s no complicated RDF syntax, no small-biz ontology, no Dublin Core vocab — almost no fancy footwork at all.

And that’s exactly the point.

SMBmeta’s creator, Dan Bricklin, hasn’t been shy about citing the RSS (Really Simple Syndication) format as an inspiration; he’s hoping a similar grassroots, bottom-up adoption of SMBmeta will lend critical mass to the format. We Webmonkeys may be in the tutorial business, but we’ll readily admit that “View Source” can be the most edifying tutorial of all; SMBmeta knows this, too, and is engineered so that middling HTML hackers can cut, paste, and tweak our way into this technology without any tools other than a text editor and the (mercifully short) spec sheet.

But even though SMBmeta emulates Fisher-Price’s focus on user-friendly design, it’s not entirely painless to implement, and purposely so. Recalling the ignominious fate suffered by the “META keyword” tag at the hands of spammers, SMBmeta developers tossed a tiny hurdle into the setup process: the SMBmeta XML file must live at the very top (root) level of the domain, and you’re allowed just one SMBmeta file per domain. This restriction discourages spammers from flooding or shotgunning the system, since registering a multitude of domain names costs a fair chunk of change. There are other spam-proofing features built into the foundations of SMBmeta, like pointers to third-party “affirmation” authorities, who certify that your descriptions match your website and real-world offerings, and who blacklist offending entries (and their entire domains, for that matter). If the cat-and-mouse arms race between spammers and searchers intrigues you, this essay outlines the anti-spam groundwork performed by the SMBmeta folks.
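The root-level rule also makes an SMBmeta file trivially easy for a crawler to locate. Here’s a hedged sketch in Python — assuming the spec’s smbmeta.xml filename, with a made-up florist’s domain for illustration:

```python
from urllib.parse import urlsplit

def smbmeta_url(page_url):
    """The single root-level spot where a crawler looks for a domain's SMBmeta file."""
    host = urlsplit(page_url).netloc
    return f"http://{host}/smbmeta.xml"

# No matter how deep the page, the metadata lives at the domain's root.
print(smbmeta_url("http://www.example-florist.com/shop/hours.html"))
```

One domain, one file, one predictable address — which is precisely what keeps the shotgunning cheapskates out.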

Obviously, SMBmeta is a well-thought-out format, and one that addresses a real need in the Small Biz community. It’s new, though, so it’s still too early to know if it will succeed. SMBmeta faces the same chicken-and-egg conundrum as most other metadata efforts: It’s unlikely to gain search-engine support before it’s widely adopted, but unlikely to be widely adopted before it gains search-engine support.
But c’mon, why not take the leap and build a file for your business? (When it does take off, you’ll be leading the pack.) Instead of a tag-by-tag tutorial to get you started, we’ll point you at SMBmeta’s Web-based form, which will spit out a skeletal SMBmeta file for you in just a few minutes. Using the spec, you can quickly add any accoutrements and double-check your work. Then, after uploading your file to your own domain, we’d suggest you register your site with a fledgling SMBmeta directory and search engine.

Now, on to Dublin Core, the super-flexible, way-modular, librarian-friendly metadata vocabulary that only *sounds* like an Irish heavy metal band.

Dublin Core Curriculum

Back in 1995, a motley crew of 100-odd software engineers, librarians, and Web architects held a workshop in Dublin, Ohio, to tackle a familiar problem: the difficulty of finding stuff on the Web. (And this was back when the Web contained a cute half-a-million documents; today it’s in the billions.)
The result of the brainstorming session was a core set of 15 metadata elements, designed to describe any resource (like Web pages, images, music) available via the Web or other networks. The collection was dubbed “The Dublin Core.” These are the elements:

  1. Title — A name given to the resource.
  2. Creator — Who/what is primarily responsible for making the content.
  3. Subject — The topic of the content.
  4. Description — Gives an account of the resource’s content.
  5. Publisher — An entity responsible for making the resource available.
  6. Date — Typically, the creation or availability date.
  7. Language — The language of the content.
  8. Contributor — An entity that’s made contributions to the content.
  9. Source — A reference to a resource from which the present resource is derived.
  10. Format — The physical or digital manifestation of the resource.
  11. Type — The nature or genre of the content.
  12. Resource identifier — An unambiguous reference to the resource, like a URL, URI, or ISSN#.
  13. Relation — A reference to a related resource.
  14. Coverage — The extent or scope of the content (e.g., a place or time).
  15. Rights management — Information on Intellectual Property Rights, Copyright, etc.

You can see bibliographic influences in this vocabulary, but it’s still simpler than the heavyweight record-keeping languages used by professional catalogers for libraries and museums. With an eye towards flexibility and ease-of-use, Dublin Core allows everything to be optional — you can use as few or as many of the elements as you wish. Additionally, the Dublin Core team avoided wasting time talking about syntax and other implementation details, at least at first. (Encoding Dublin Core information in HTML META tags is detailed here, though this more modern working draft may be a better guide.) In short, tags look like this:

<meta name="DC.element" content="Value" />

As in:

<meta name="DC.title" content="Webmonkey" />
<meta name="DC.date" content="2003-06-20" />
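To make the machine’s-eye view concrete, here’s a small, hypothetical Python sketch of what a Dublin-Core-savvy spider might do: scan a page’s META tags and keep only the DC ones. (The class name and page snippet are ours, purely for illustration.)

```python
from html.parser import HTMLParser

class DublinCoreExtractor(HTMLParser):
    """Collect the name/content pairs of Dublin Core <meta> tags."""
    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags ("/>") through here.
        if tag == "meta":
            d = dict(attrs)
            name = d.get("name") or ""
            if name.lower().startswith("dc"):
                self.found[name] = d.get("content")

page = """<head>
<meta name="DC.title" content="Webmonkey" />
<meta name="DC.date" content="2003-06-20" />
<meta name="keywords" content="monkeys, metadata" />
</head>"""

extractor = DublinCoreExtractor()
extractor.feed(page)
print(extractor.found)
```

The “keywords” tag is ignored; only the shared DC vocabulary survives the trip into the index.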

Fast-forward to 2003, and Dublin Core hasn’t impacted the life of the average websurfer much. Mainstream search engines don’t give much weight to DC META tags, nor did commercial sites or homepages ever really bother to include Dublin Core markup in their pages. Despite their ignorance of “resource description languages” (or, arguably, because of it), brute-force, free-text search applications like Google reign unthreatened as the kings of Web research. It’s tempting, therefore, to write off the Dublin Core effort.

Of course, it’s also tempting to write off Carrot Top as a comedian. Yet between 1-800-CALL-ATT commercials and Hollywood Squares spots, it’s evident that the foul humorist’s career doth liveth still, finding new life on radio and TV airwaves. And so it is with Dublin Core.

See, Dublin Core elements have a surprising habit of making cameo appearances in other metadata frameworks. In fact, it’s in this other context that you’re most likely to find tell-tale terms like “DC.title” and “DC.subject” today. Dublin Core elements are regularly used as building blocks within richer and more specific metadata frameworks. This way, even if a spider doesn’t understand the entirety of a metadata language, it can still recognize the lowest-common-denominator DC objects, making the Dublin Core a sort of lingua franca among different metadata languages. Big organizations (like university library systems) especially rely on Dublin Core to enable searching across heterogeneous databases.

And you? Remember that code needed to get listed in GeoURL?

<meta name="geo.position" content="41.8833; 12.500" />
<meta name="DC.title" content="Professor Falken's weblog" />

That first line is specific to GeoURL’s crawler, but the second line is a generic expression of the Dublin Core “Title” element. By adding GeoURL code to your page, you’ve also made it possible for any Dublin-Core-savvy spider or agent to identify the title of your work. (It also makes it easier for human programmers to recognize the meaning of that metadata, even if they don’t fully understand the description model being used.)

So what other sorts of metadata frameworks are there? Let’s take a look at the big one, a general-use framework intended to describe anything and everything on the Web, RDF.

Resource Description Framework

RDF stands for “Resource Description Framework,” and like other metadata we’ve examined so far, it’s just another way of describing resources on the Web. RDF, however, is an official initiative of the W3C, the same folks who wrote the specifications for HTML, XML, and CSS. (*Nobody* breeds prize-winning acronyms like the W3C.)
Apart from its esteemed pedigree, RDF is remarkable for its large scope: It was designed to be a super-encompassing framework providing interoperability between different types of metadata. RDF can describe a single Web page, but also inter-relationships between a page and other resources on the Web. Likewise, RDF-crawling applications can do more than just parse one page’s worth of metadata — they can independently follow links to other metadata resources, placing things within a larger context. Even if an RDF agent wasn’t originally designed to handle the kind of metadata on your page, it may be able to automatically “learn” enough to process it meaningfully anyhow.

RDF has a rep for being academic and hard to understand. In truth, advanced RDF gets almost philosophically abstract, not to mention technically tricky. But the basics aren’t bad at all.
At heart, RDF is just a list of sentence-like assertions, or “Statements.” Like this here:

(This article) (is authored by) (Jason Cook)
(subject) (predicate) (object)

Happily, that’s as complicated as any single RDF statement gets. Every statement *must* follow that simple, three-part structure of Subject, Predicate, and Object; because of this, RDF statements are often referred to as “triples.” Witty, neh?

You’ll notice that statements always describe the relationship (the predicate) between the subject and object, like in this triplet o’ triples:

(This article) (is authored by) (Jason Cook)
(Jason Cook) (has email) ( … )
(Jason Cook) (has homepage) ( … )
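If you squint, a pile of triples is just a list of three-part tuples, which makes them almost embarrassingly easy to store and query. Here’s a hypothetical Python sketch (the email and homepage values are placeholders, not anyone’s real details):

```python
# A toy in-memory triple store: each statement is one
# (subject, predicate, object) tuple. Email and homepage are placeholders.
triples = [
    ("ThisArticle", "isAuthoredBy", "Jason Cook"),
    ("Jason Cook", "hasEmail", "jason@example.com"),
    ("Jason Cook", "hasHomepage", "http://example.com/jason"),
]

def objects_of(subject, predicate):
    """Follow every arrow labelled `predicate` leading out of `subject`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# Hop two arrows: find the article's author, then the author's email.
author = objects_of("ThisArticle", "isAuthoredBy")[0]
print(objects_of(author, "hasEmail"))
```

Chaining lookups like this — statement to statement, arrow to arrow — is exactly the graph traversal we sketch below.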

On a cocktail napkin, we’d graph those relationships out like so:

[image lost to time]

Well, turns out my sketch above isn’t just cute-as-a-button; it actually represents a directed graph, a type of mathematical model that’s easily traversed by computer algorithms, and which scales well to millions of nodes. That’s good news for agents like search-and-retrieval spiders.
Apart from illustrating the flow of direction from Subject to Object, you’ll notice that every arrow in my sketch is also attached to a URI. That’s important. In fact, it’s downright key. Because while you and I already have a shared notion of what the “Is Authored By” relationship implies, a computer doesn’t.

Having a URI associated with each arrow (each predicate) allows software agents to follow links when they need more information about the properties of a relationship. For example, you could provide guidelines saying, “watch out, neither ‘January 23rd, 1971’ nor ‘Sinatra_MyWay.MP3’ is a plausible Author”.

One common way of doing this is by putting a “schema,” in the form of an XML namespace document, at the URI. Obviously, a schema can’t give sci-fi-type Artificial Intelligence to any crawler that visits it, but it can list useful rules like “Authors sometimes have email addresses” to aid data-gathering.
Thankfully, you don’t have to code all this by yourself! A big benefit of using URI-based schemas is that you can piggyback off of previous work on the Web, and refer to terms already defined by others. For instance, if you include concepts like “Author” and “Title” in your metadata, you might as well link to a common schema like Dublin Core to define those terms for you.

Another benefit of tacking URIs onto every relationship expressed within an RDF file is that it eases those awkward moments when people insist on using different vocabularies to describe the same stuff.
Let’s say my metadata uses:

( ThisArticle )( isAuthoredBy )( Jason )

…while most folks prefer…

( ThisArticle )( DC:Creator )( Jason )

With RDF, I can append machine-readable instructions to the (isAuthoredBy) URI explaining, “(isAuthoredBy) is equivalent to (DC:Creator).” Theoretically, that’s enough for a clever Dublin-Core-aware agent to translate and process my metadata.
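In code, that kind of translation can be as humble as a lookup table. Here’s a hypothetical Python sketch of the idea — real RDF agents learn these equivalences from schema documents, not hardcoded dictionaries:

```python
# Hypothetical equivalence table: my homegrown predicates mapped onto
# their better-known Dublin Core counterparts.
EQUIVALENT_TO = {
    "isAuthoredBy": "DC:Creator",
}

def normalize(statement):
    """Rewrite a statement's predicate into the shared vocabulary, when we can."""
    s, p, o = statement
    return (s, EQUIVALENT_TO.get(p, p), o)

# My private vocabulary becomes something a DC-aware agent recognizes.
print(normalize(("ThisArticle", "isAuthoredBy", "Jason")))
```

Predicates with no known equivalent pass through untouched, which is the polite thing to do with vocabulary you don’t understand.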

Before we get too abstract, let’s see some code. Here’s our sketch example, in RDF-XML syntax:

  1. <rdf:RDF
  2. xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:foaf="http://xmlns.com/foaf/0.1/" >
  3. <rdf:Description rdf:about="">
  4. <dc:title>Metadata Redux</dc:title>
  5. <dc:creator rdf:resource="#Jason" />
     </rdf:Description>
  6. <foaf:Person rdf:ID="Jason">
     <foaf:name>Jason Cook</foaf:name>
     <foaf:mbox rdf:resource="" />
     <foaf:homepage rdf:resource="" />
     </foaf:Person>
     </rdf:RDF>

Here’s a play-by-play commentary of what’s going on in the code above:

  1. Get the party started with the opening <rdf:RDF> tag.
  2. Inside the opening tag, we list URIs for three schemas used in this document. The first, the RDF schema, is always required. The second is the Dublin Core vocabulary, ideal for describing publications. The third is the Friend Of A Friend vocabulary, a set of terms that describe people.
  3. We start describing our resource, whose URI (Webmonkey/thisarticle) identifies this very article.
  4. We say the resource has a title, “Metadata Redux.” Recall that big to-do about needing a URI associated with every relationship? By using the <dc:title> tag, not native to RDF but added via the Dublin Core namespace, we’ve implicitly named the Dublin Core schema as the URI containing details of what the “dc:title” relationship means.
  5. Ditto with <dc:creator>, though instead of giving a value, we reference a more detailed description (“#Jason”) a few lines down.
  6. Using the FOAF vocabulary, we describe the resource “Jason,” identifying it as a Person, and then give him an Email and Homepage. Again, the semantics of all these relationships are in the FOAF schema listed up top.

By the way, a nice thing about coding RDF’s relational model is that it doesn’t much matter what order stuff gets listed in — it’s the (metaphorical) bubbles, boxes, and arrows of relationships that count.
Phew! If that seemed intimidating, know this: RDF-XML code has a reputation for looking gnarly. Some RDF proponents argue that end-users shouldn’t worry about code or syntax, because one day, RDF will be baked into tools like Dreamweaver, Word, or MovableType. That’s a debatable defense, but we can already point you to one such no-brainer tool that paints your biographical portrait in RDF, using the Friend Of A Friend vocabulary. It’s kinda fun, and you won’t need to type a line of code, promise.

Friend Of A Friend does for humans what SMBmeta does for small businesses: It provides a metadata vocabulary that’s excellent for describing a specific thing — in this case, People.
Another benefit of FOAF is that it’s an application of RDF. Leveraging RDF’s proclivity for expressing relationships, FOAF links your profile in with that of your coworkers, your friends, and other on-line communities. (And, yes, this technology can be utilized to share pictures of cats.)

Like we said earlier, you don’t need to code anything to build a basic FOAF file. A quick visit to the FOAF-A-Matic website can generate one for you, Jetsons-style. After that, just upload it to your website, optionally adding a self-discovery link in your site’s <head>. Browsers like FOAF Explorer and FOAFnaut will help you visualize and hop around the FOAF universe, and they’re handy for validating your code, too.
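That self-discovery link, by the way, is a one-liner in your page’s <head>. Assuming you named your file foaf.rdf and parked it next to your homepage, it looks something like this:

```html
<link rel="meta" type="application/rdf+xml" title="FOAF" href="foaf.rdf" />
```

FOAF-hunting spiders check for exactly this pattern, so they can find your profile without crawling your whole site.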

‘Course, the terminally-curious among you (anybody still sticking around) probably won’t be satisfied without knowing a smidgeon more about what you can do with FOAF. For starters, you can slather on as much metadata from other vocabularies as you wish.

For instance, here’s a FOAFy file which talks about somebody’s location, but in two different ways: First, it uses a property from the FOAF vocabulary called ‘based_near’, which itself uses part of a geographic vocabulary called ‘geo’. Second, it tacks on ‘NearestAirport’ data from a completely different vocabulary called ‘contact’, which in turn relies on a more-specific ‘Airport’ vocabulary. (I point to the *xmlns:geo*, *xmlns:contact*, and *xmlns:airport* namespaces up top, so that RDF crawlers can understand what I’m talking about when I specify latitude and longitude coordinates, or the three-character airport codes.)

<foaf:Person xmlns:geo=""
             xmlns:contact=""
             xmlns:airport="" >
  <foaf:name>Samuel Clemens</foaf:name>
  <foaf:nick>Mark Twain</foaf:nick>
  <foaf:based_near>
    <geo:Point geo:lat="41.8833" geo:long="12.5" />
  </foaf:based_near>
  <!-- nearestAirport data from the 'contact' vocabulary, trimmed for space -->
</foaf:Person>

RDF’s core extensibility adds a touch of free-market flair: As new schemas and vocabularies become popular, they can be easily added to your FOAF without breaking the file.

But really, what’s the point of all this? Is this not just the mortal sin of vanity, marked up in XML?
Tough call. At the moment, FOAF suffers the same Adoption vs. Support catch-22 that hampers most new technologies. (And most new technologies bite the dust.) Like the early hypertext Web, however, a community of individuals is convinced there’s something intrinsically nifty about this technology, and they’re determinedly tinkering away on it, releasing homespun apps like FOAFexplorer, FOAF: web view, FOAFbot, and others.

Some have proposed “Web of trust,” community profiling, and anti-spam applications for FOAF, but these are, by and large, still experimental.

As is the Semantic Web itself.

We’d suggest you hedge your bets, watch this space, and in the meanwhiles, have fun building!

*Originally published on, August 2003. Page updated Sept. 2008.*