Tuesday, April 11, 2006

Code, and other laws... (part 2)

In part 1 I talked about the ideal world where feeds were all clearly licensed. So now I'll turn to the real world, and I'll be very US-centric because this article is quite long enough as it is. You might want to skip to the happy fun summary at the bottom.

Millions of feeds aren't explicitly licensed.  Some can't be because their generators don't allow for it.  For others, the owner doesn't know or care about licensing.  For unlicensed feeds, it's not reasonable to make the default assumption "nothing more than fair use" because there are millions of feeds out there whose owners want their content syndicated as-is (headline feeds with links back to content, for example).  On the other hand, if you assume anything more than fair use, you also need to be prepared to handle exceptions.  So how do we do both of these in a way that minimizes overhead and lets aggregation happen without lawyers while respecting copyright?

My take is that a reasonable default is to assume the Creative Commons Attribution license whenever the feed owner hasn't specified otherwise.  This means that by default, we'd assume that copying of feed content is allowed as long as attribution is given through an appropriate hyperlink.  Then, provide easy ways for feed owners to declare a different license explicitly.

If a feed owner is happy with the default, they need to do nothing.  My sense is that this covers 98% of unlicensed feeds.  For the remainder, a feed owner could go to individual aggregators and tell them explicitly what license they prefer.  They can always choose a completely restrictive license that allows only fair use for the general public.  Or, they can choose a noncommercial license.  My take is that something equivalent to the current Creative Commons license chooser is sufficient.

Of course, what we'd all really prefer is for feed owners to put the licenses in their feeds directly.  That way, our AOL proxies and caches would simply pass the information along to clients, which would make appropriate decisions about what to do based on the particular license.  If we're dealing with a small number of well understood licenses, this is the easy part.

How should the feed licenses work?  There's a pretty good page with reasonable recommendations at Creative Commons on the subject.  James Snell's Feed License Link Relation works well for Atom and is pretty flexible:
<link rel="license" href="http://creativecommons.org/licenses/by/2.5/"/>
The Creative Commons RSS Module works for RSS 2.0:
<creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5</creativeCommons:license>
Both of these work with CC and other licenses and have been deployed in real implementations.  There's an RDF version for RSS 1.0 as well (cc:license).

Finally there's the RSS 2.0 <copyright> element, which is just plain text.  But, given that some tools might allow people to put text in this field but not embed the other types of licenses, I think it's reasonable to look for a known license URL in the copyright text as well:
<copyright>The contents of this feed are licensed to the public under http://creativecommons.org/licenses/by-nc-sa/1.0/</copyright>
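To make the three lookups above concrete, here is a minimal Python sketch of how a processor might collect embedded license URLs from a feed.  The function name find_embedded_licenses and the regular expression are my own illustration, not part of any spec; the namespace URIs are the ones I believe the Atom format and the Creative Commons RSS module use, so treat them as assumptions to verify.  The RDF cc:license form for RSS 1.0 is left out.

```python
import re
import xml.etree.ElementTree as ET

# Assumed namespace URIs for Atom and the Creative Commons RSS module.
ATOM_NS = "http://www.w3.org/2005/Atom"
CC_RSS_NS = "http://backend.userland.com/creativeCommonsRssModule"

# Hypothetical pattern for spotting a Creative Commons license URL in plain text.
CC_URL_RE = re.compile(
    r"https?://(?:www\.)?creativecommons\.org/licenses/[a-z-]+/\d+\.\d+/?"
)


def find_embedded_licenses(feed_xml: str) -> list[str]:
    """Return license URLs embedded in a feed, in the order they are found."""
    root = ET.fromstring(feed_xml)
    found = []

    # Atom: <link rel="license" href="..."/>
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "license" and link.get("href"):
            found.append(link.get("href"))

    # RSS 2.0: <creativeCommons:license>...</creativeCommons:license>
    for element in root.iter(f"{{{CC_RSS_NS}}}license"):
        if element.text and element.text.strip():
            found.append(element.text.strip())

    # RSS 2.0: a known license URL inside the plain-text <copyright> element
    for element in root.iter("copyright"):
        found.extend(CC_URL_RE.findall(element.text or ""))

    return found
```

An empty result means the feed carries no license that this sketch recognizes, which is the case the next paragraph deals with.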
If a processor can't find any of the above licenses, I'm proposing that AOL feed consumers fall back to a license based on an explicit list that AOL maintains by feed owner request.  This would be part of our feed infrastructure.  I see this working two ways.  First, we would add metadata to feeds which are requested via our feed proxies.  For Atom and RSS 2.0, the two output formats we support, this would be a namespaced extension, aol:declared-license:
<aol:declared-license>
      <link rel="license" href="http://creativecommons.org/licenses/by/2.5/"/>
</aol:declared-license>
It would contain a Feed License Link Relation indicating which license the owner specified to AOL.  It could potentially contain multiple license links.  It could contain other namespaced elements in the future as well, but feed consumers can ignore ones they don't understand.

A client might also want to inquire about a feed's declared license without retrieving it.  For this, we could provide a simple REST API:
GET http://example.aol.com/declared-license/example.org/feed/atom.xml
which returns a simple XML document:
<?xml version="1.0" encoding="utf-8" ?>
<declared-license xmlns="http://example.aol.com/2006/aolfeeds">
    <link rel="license" href="http://creativecommons.org/licenses/by/2.5/"/>
</declared-license>
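A client would then need to pull the license links out of that response.  Here is a rough Python sketch that parses a document shaped like the sample above.  The function name parse_declared_license is hypothetical, the namespace is the placeholder one from the example, and fetching the document over HTTP is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace taken from the sample response; not a real endpoint.
AOL_FEEDS_NS = "http://example.aol.com/2006/aolfeeds"


def parse_declared_license(document: str) -> list[str]:
    """Return the license URLs listed in a declared-license document."""
    root = ET.fromstring(document)
    if root.tag != f"{{{AOL_FEEDS_NS}}}declared-license":
        raise ValueError("not a declared-license document")

    licenses = []
    for child in root:
        # Match on the local name so the link is found whichever namespace
        # it ends up in; unknown child elements are simply ignored.
        local_name = child.tag.rsplit("}", 1)[-1]
        if local_name == "link" and child.get("rel") == "license":
            href = child.get("href")
            if href:
                licenses.append(href)
    return licenses
```

Returning a list rather than a single URL leaves room for a feed owner declaring more than one license.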
Note that non-AOL clients could potentially make use of this; you'd just have to believe that AOL is maintaining a good declared license list (the licenses themselves are the ones the feed owners want to provide to the general public, not to AOL specifically).  We could even potentially share these lists between feed aggregators.  An embedded (original) license would always override any declared license; this would let feed owners easily start embedding their own licenses in the future.  (Should we eliminate any declared license as soon as the source feed starts licensing itself?  I think so, but our legal team would need to weigh in on that.)
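The precedence I have in mind (embedded license first, then a declared license, then the default) fits in a few lines of Python.  This is only a sketch: the function name effective_licenses is made up for illustration, and pinning the default to version 2.5 of the Attribution license is my assumption, chosen because that is the version used in the examples here.

```python
# Assumed default: Creative Commons Attribution (the version is an assumption).
DEFAULT_LICENSE = "http://creativecommons.org/licenses/by/2.5/"


def effective_licenses(embedded, declared, default=DEFAULT_LICENSE):
    """Pick the licenses that apply to a feed.

    embedded: license URLs found in the source feed itself
    declared: license URLs the owner registered with the aggregator
    """
    if embedded:
        # An embedded (original) license always overrides a declared one.
        return list(embedded)
    if declared:
        return list(declared)
    # Nothing specified anywhere: fall back to the default assumption.
    return [default]
```

For example, a feed with no embedded license but a declared by-nc license resolves to by-nc, and a feed with neither resolves to plain Attribution.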

Finally, we'd advertise a variety of ways for feed owners to contact us and declare their licenses.  There does need to be some sort of validation step to ensure they really own the feed.  As part of the hopefully painless process we'd ask them to pick from one of the existing Creative Commons licenses.  If these aren't sufficient we can add other licenses but it's easier all around if people can agree on a small set.

How about a real world example?  Brian Alvey of Weblogs Inc. recently announced support for excerpt feeds, for example Engadget full vs. Engadget headlines.  The full Engadget feed has the copyright statement:
<copyright>Copyright 2006 Weblogs, Inc. The contents of this feed are available for non-commercial use only.</copyright>
Translating into license-speak, we'd get an Attribution-NonCommercial-NoDerivs license for the full feed, meaning no commercial exploitation, links back are required, and editing of the material is not allowed beyond fair use:
<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/2.5/</creativeCommons:license>
The excerpt Engadget feed has the copyright statement:
<copyright>Copyright 2006 Blogsmith, LLC. The contents of this headlines and excerpts feed are available for limited commercial distribution. You may repost this feed to your site provided you link back to the original story, do not edit the material, and do not remove this copyright notice.</copyright>
Translating into license-speak, we'd get Attribution-NoDerivs for the excerpt feed, meaning that commercial use is OK but links back are required and the material may not be edited:
<creativeCommons:license>http://creativecommons.org/licenses/by-nd/2.5/</creativeCommons:license>
(I'm assuming here that the restriction on editing applies to the individual articles, not the feed document as a whole, since feed documents are not intended to be kept intact in any case.  This minor ambiguity goes away with Atom's Feed License Link Relation.)
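If a license chooser boils the owner's wishes down to a couple of yes/no answers, the translation to a license URL is mechanical.  Here is a small Python sketch of that mapping; the function name cc_license_url and its parameters are hypothetical, and the default version of 2.5 is an assumption matching the examples above.

```python
def cc_license_url(commercial_use, derivatives, share_alike=False, version="2.5"):
    """Build a Creative Commons license URL from a few yes/no choices."""
    parts = ["by"]  # attribution is part of every license considered here
    if not commercial_use:
        parts.append("nc")
    if not derivatives:
        parts.append("nd")
    elif share_alike:
        # Share-alike only makes sense when derivatives are allowed.
        parts.append("sa")
    return f"http://creativecommons.org/licenses/{'-'.join(parts)}/{version}/"


# The two Engadget feeds as interpreted above:
full_feed = cc_license_url(commercial_use=False, derivatives=False)
excerpt_feed = cc_license_url(commercial_use=True, derivatives=False)
```

Reading a free-text copyright statement and deciding what those answers should be is still a human judgment call; the sketch only covers the last step.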

So far, so good.  Having multiple versions does raise the question of how automated processors are supposed to find these feeds.  I think that's going to have to be a followup post.

That's about it.
In summary:
None of this is black or white.  I should also mention that I'm completely conflicted here, in that my company both syndicates and aggregates content and I'm directly involved on both sides.  I'm coming at this from the viewpoint of someone trying to provide online feed aggregation services where the end users subscribe to the feeds; the feeds are not being selected or screened by editors.  In other situations other rules about default licenses might be better.  Explicit licenses are definitely best to avoid problems down the road.

Here are some other links I've stumbled across:  A basic practical primer on copyright and RSS.  One re-aggregator's viewpoint (Palfrey).  Producers' viewpoints: Shelley Powers, Om Malik (here and here).  Some legal discussion (with Wendy Seltzer, previously of the EFF, weighing in).

(Feedburner already does CC licensing following the methods outlined above, except that they're using the creativeCommons namespace extension for Atom as well as RSS 2.0; consumers should look for either one in Atom feeds.)

Tags: Creative Commons, RSS, Atom, syndication

1 comment:

Anonymous said...

John,
   This is great; however, I think it raises the question of what AOL's plans are for the content of people's blogs.  Yes, AOL (in part, I assume) owns the content of any feeds/blogs posted; however, for the majority of our users, who have no clue, this would raise questions and probably lose users...

  As for "what we do" with it, that is of no consequence; however, this could be perceived as "duping" members in order to acquire their content.  I would just be cautious about what we lead our members to believe, namely the concept of owning their journals and the potential profit vs. creating the content and leaving it at that.