Friday, December 15, 2006

Why AOL Should Go OpenID

I've argued before that identity is a building block -- an essential amino acid, if you will -- for social networks.  It's far from the only thing you need, but without stable, persistent, verifiable identity, it's very hard to build relationships.  It's so important that there are specialized subnets in the human brain that recognize voices and human faces to help you remember people.

The digital world doesn't work like that.  Identifying someone online is hard.  Even solving the more limited problem of verifying that this person is the same one you were socializing with online yesterday is not trivial.  All social software has some mechanism for letting people verify some online identity -- usually a user name and password.  Of course that just means that you have different user names for different services.  In the new "Web 2.0" world, though, a primary rule is for services to be open, to interoperate, and to play together.  That's difficult if people have to remember that you're leetjedi67 on service A and urtha52 on service B.  It's fine if you want to do that, but most people want to be themselves most of the time.  And our infrastructures don't allow for that.

Well, at least they didn't.  There's a remarkable convergence of user centric identity systems happening right now.  At the lightweight end, basically everyone has converged on the OpenID standard.  This lets you be leetjedi.net everywhere if you want.  Or at least everywhere that supports OpenID.  The first, most practical benefit is that you won't need to fill out another registration screen on most new services.  The longer-term benefit is that you get to keep your identity and your reputation with you as you move between services.
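Under the hood, OpenID 1.x keeps this lightweight: the site you are logging in to fetches the page at your identity URL and looks for a `<link rel="openid.server">` element (and optionally `<link rel="openid.delegate">`) to find out where to send you to authenticate. Here is a minimal discovery sketch using Python's standard HTML parser. The URLs are hypothetical, and OpenID 2.0's Yadis/XRDS discovery and newer rel values are not handled.

```python
from html.parser import HTMLParser


class OpenIDLinkParser(HTMLParser):
    """Collects OpenID 1.x discovery <link> elements from an identity page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # A rel attribute may hold several space-separated values.
        for rel in (attrs.get("rel") or "").split():
            if rel in ("openid.server", "openid.delegate") and attrs.get("href"):
                self.links.setdefault(rel, attrs["href"])


def discover(html):
    """Return (server_url, delegate_url_or_None), or raise ValueError."""
    parser = OpenIDLinkParser()
    parser.feed(html)
    parser.close()
    if "openid.server" not in parser.links:
        raise ValueError("no openid.server link found")
    return parser.links["openid.server"], parser.links.get("openid.delegate")


# Hypothetical identity page for a user who owns leetjedi.example.com.
page = """<html><head>
<link rel="openid.server" href="https://openid.example.com/server">
<link rel="openid.delegate" href="https://leetjedi.example.com/">
</head><body>Hello</body></html>"""

print(discover(page))
```

The delegate link is what lets you keep one identity URL while changing which provider actually vouches for you.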

Of course none of this matters if companies don't adopt it, so what's the benefit for them?  Well, if their service involves a social network, it gains immediate access to both a network and an ecosystem of services which work with it.  The value of a social network grows quadratically with the number of users; the value increases linearly as the difficulty in connecting two users drops.  Connecting two OpenID users is a lot easier than convincing one or both of them to acquire a new identity.
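The quadratic part is easiest to see by counting the distinct pairs of users who could connect, n(n-1)/2. This is a common Metcalfe's-law-style proxy for network value, not a measurement of it. A tiny sketch:

```python
def potential_connections(n):
    """Number of distinct user pairs: n choose 2 = n * (n - 1) / 2."""
    return n * (n - 1) // 2


for users in (10, 20, 40):
    print(users, potential_connections(users))
```

Doubling from 20 to 40 users roughly quadruples the possible pairs (190 to 780), which is why joining a shared identity network is worth more than growing a private one.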

This is the big value in promoting and leveraging a common standard.  Even Microsoft is adopting open standards for their CardSpace identity system (and CardSpace and OpenID are talking cordially to each other, by the way).  So embracing the open network, leveraging the quadratic multiplier in network value, and competing on value-added services is really the way to go.  Of course this means that you are opening up your own services to more competition as well as cooperation.  Since AOL has already committed to open web services, this is a logical next step.  Just playing around with ideas:  What would happen if every AIM user name were OpenID enabled?  What if you didn't even need to register to use UnCut Video, AIM Pages, or AOL Journals?


Tuesday, December 12, 2006

Atom API for AOL Journals

Journals exposes a very complete API for creating and managing blogs, entries, and comments.  I'm working on getting the API documentation up on dev.aol.com sometime soon.  But it's very easy to get started with basic blog posts.  Here's an example using curl that would post to this blog if my password were MYPASSWORD:

curl -k -sS --include --location-trusted --request POST --url 'https://journals.aol.com/_atom/journal/panzerjohn/abstractioneer' --data @entry.xml --header 'Content-Type: application/atom+xml; charset=utf-8' --user panzerjohn:MYPASSWORD

where entry.xml is the Atom entry to be created, like this:
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:aj="http://journals.aol.com/_atom/aj#">
<title>Blog entry title</title>
<published></published>
<content type="html">
   Hello World!
</content>
</entry>
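For comparison, here is a rough Python equivalent of the curl request, using only the standard library. It assumes the endpoint accepts HTTP Basic credentials, which is what curl's --user option sends by default. It omits the empty published element; judging by the response, the server appears to fill in the publication time itself. The helper names are illustrative.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def build_entry(title, html_body):
    """Serialize a minimal Atom entry; ElementTree escapes the HTML for us."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="html")
    content.text = html_body
    return ET.tostring(entry, encoding="utf-8")


def build_post_request(collection_url, user, password, entry_bytes):
    """Prepare (but do not send) the same POST the curl command performs."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        collection_url,
        data=entry_bytes,
        method="POST",
        headers={
            "Content-Type": "application/atom+xml; charset=utf-8",
            "Authorization": f"Basic {token}",  # what curl --user sends by default
        },
    )


req = build_post_request(
    "https://journals.aol.com/_atom/journal/panzerjohn/abstractioneer",
    "panzerjohn",
    "MYPASSWORD",
    build_entry("Blog entry title", "Hello World!"),
)
# urllib.request.urlopen(req) would send it; a 201 Created means the entry exists.
```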
On success, you'll see something like this in response:
HTTP/1.1 201 Created
Date: Tue, 12 Dec 2006 18:21:57 GMT
Server: Apache/2.0.54 (Unix) mod_ssl/2.0.54 OpenSSL/0.9.7e mod_jk/1.2.14 mod_rsp20/RSP_Apache2_v6_2.05-08-11:mod_rsp20.so.rhe_x86-3.v8_r1.44
Set-Cookie: RSP_DAEMON=1ceaffc0a8b18da03cfaaea9b70f236f; path=/; domain=journals.aol.com; HttpOnly
Set-Cookie: MC_UNAUTH=1; path=/; domain=journals.aol.com
Location: http://journals.aol.com/_atom/journal/panzerjohn/abstractioneer/entryid=168
Transfer-Encoding: chunked
Content-Type: application/atom+xml;charset=UTF-8

<?xml version="1.0" encoding="utf-8" ?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:aj="http://journals.aol.com/_atom/aj#">
<link rel="alternate" type="text/html" href="http://journals.aol.com/panzerjohn/abstractioneer/entries/2006/12/12/blog-entry-title/168" />
<link rel="http://journals.aol.com/service.edit" type="application/atom+xml"
href="http://journals.aol.com/_atom/journal/panzerjohn/abstractioneer/entryid=168" />
<link rel="http://journals.aol.com/comments" type="application/atom+xml" title="Comments feed for this entry"
href="http://journals.aol.com/panzerjohn/abstractioneer/entries/2006/12/12/blog-entry-title/168/atom.xml" />
<id>tag:journals.aol.com,2003:/panzerjohn/abstractioneer/168</id>
<title type="text"><![CDATA[Blog entry title]]></title>
<updated>2006-12-12T18:21:00Z</updated>
<published>2006-12-12T18:21:00Z</published>
<author>
<name>panzerjohn</name>
</author>
<aj:entrySource>AtomAPI</aj:entrySource>
<aj:mood>0</aj:mood>
<aj:commentCount>0</aj:commentCount>
<content type="html"><![CDATA[   Hello World!]]></content>
</entry>
 
There are a lot of other parts of the API, but they're best left for a full document rather than a blog post.  There's also at least one known bug, where our servers don't accept the 'xhtml' content type.  That should be fixed on beta.journals.aol.com this Wednesday.
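A client that wants to edit or delete the entry later needs its edit URI. Judging from the 201 response, that is the link with rel="http://journals.aol.com/service.edit", which matches the Location header. Here is a minimal sketch that maps each link's rel to its href. entry_links is a hypothetical helper, and the response body is a trimmed copy of the one shown earlier.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entry_links(entry_xml):
    """Map each link's rel value to its href in an Atom entry document."""
    root = ET.fromstring(entry_xml)
    return {link.get("rel"): link.get("href") for link in root.iter(f"{ATOM}link")}


# Trimmed copy of the 201 response body shown earlier.
response_body = b"""<?xml version="1.0" encoding="utf-8" ?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:aj="http://journals.aol.com/_atom/aj#">
<link rel="alternate" type="text/html" href="http://journals.aol.com/panzerjohn/abstractioneer/entries/2006/12/12/blog-entry-title/168" />
<link rel="http://journals.aol.com/service.edit" type="application/atom+xml"
href="http://journals.aol.com/_atom/journal/panzerjohn/abstractioneer/entryid=168" />
<title type="text"><![CDATA[Blog entry title]]></title>
</entry>"""

links = entry_links(response_body)
print(links["http://journals.aol.com/service.edit"])
```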

Monday, December 4, 2006

At IIW2006b

I'm at the Internet Identity Workshop (part B), listening to a bunch of smart people like Dick Hardt, Johannes Ernst, Kim Cameron, and of course Kaliya.  Looking forward to hearing a lot of exciting developments.  Already people are announcing open source libraries supporting OpenID.

Dec 5, 11:45am: There's a good article just put up at ZDNet: "The case for OpenID".  It's been Slashdotted already.  At IIW, I've been sitting in on the basic OpenID discussions, finding out what's new with 2.0, and listening in on the user experience/microformats discussion.  The latter is potentially interesting; at least there are specific short-term obvious next steps, like supporting XFN, that would help enable potential applications down the road.  This is a very difficult thing to sell to business people, though.  Maybe there's a session on that -- evangelizing to the business?



Monday, November 20, 2006

Caching for AOL Journals

We're continuing to work on improving the scalability of the AOL Journals servers.  Our major problem recently has been traffic spikes to particular blog pages or entries caused by links on the AOL Welcome page.  Most of this peak traffic is from AOL clients using AOL connections, meaning they're using AOL caches, known as Traffic Servers.  Unfortunately, there was a small compatibility issue that prevented the Traffic Servers from caching pages served with only ETags.  That was resolved last week in a staged Traffic Server rollout.  Everything we've tested looks good so far; the Traffic Servers are correctly caching pages that can be cached, and not the ones that can't.  We're continuing to monitor things, and we'll see what happens during the next real traffic spike.

The good thing about this type of caching is that our servers are still notified about every request through validation requests, so we'll be able to track things fairly closely, and we're able to add caching without major changes to the pages.

The downside is of course that our servers can still get hammered if the volume of validation requests itself climbs high enough.  This is actually true even for time-based caching if you get enough traffic: you still have to scale up your origin server farm to accommodate peak traffic, because caches don't guarantee that they won't pass traffic peaks through.
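Here is a toy model of that trade-off: a shared cache that keeps one copy of a page and revalidates it with the origin on every request. This is not how Traffic Server is configured or implemented, and Origin and RevalidatingCache are hypothetical names. What matters is the counters: the origin still sees every request, but it renders the full page only when the ETag changes.

```python
class Origin:
    """Toy origin server: one page whose ETag changes with its version."""

    def __init__(self):
        self.version = 1
        self.full_renders = 0  # expensive page builds
        self.validations = 0   # cheap 304 answers

    def etag(self):
        return f'"v{self.version}"'

    def handle(self, if_none_match=None):
        if if_none_match == self.etag():
            self.validations += 1
            return 304, self.etag(), None
        self.full_renders += 1
        return 200, self.etag(), f"page version {self.version}"


class RevalidatingCache:
    """Stores one copy and revalidates it with the origin on every request."""

    def __init__(self, origin):
        self.origin = origin
        self.stored = None  # (etag, body) once something has been fetched

    def get(self):
        tag = self.stored[0] if self.stored else None
        status, etag, body = self.origin.handle(if_none_match=tag)
        if status == 200:
            self.stored = (etag, body)
        return self.stored[1]


origin = Origin()
cache = RevalidatingCache(origin)
for _ in range(1000):
    cache.get()
print(origin.full_renders, origin.validations)  # 1 999
```

A thousand requests still mean a thousand hits on the origin, which is why a large enough spike can hurt even when almost all of them are cheap 304s.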

We're continuing to carefully add more caching to the system while monitoring things.  We're currently evaluating using a Squid reverse caching proxy.  It's open source and has been used in other situations -- and it would give us a lot of flexibility in our caching strategy.  We're also making modifications to our databases to both distribute load and take advantage of in-memory caching, of course.  But we can scale much higher and more cheaply by pushing caching out to the front end and avoiding any work at all in the 80% case where it's not really needed.


Friday, November 17, 2006

Jonathan Miller: 破釜沉舟 (Break the Woks and Sink the Boats)

Jonathan Miller wasn't a charismatic leader.  But he recognized the need to fundamentally change AOL's strategy, mapped out a new direction, and got people moving the way that he was pointing.  Since last year, especially, he had been executing well and really seemed to "get it".  Giving AOL's services away for free this summer forced the organization to focus on the new world rather than the old; a Rubicon crossing, or as Jon's Sifu might say, "Break the woks and sink the boats" (破釜沉舟).  And as Ted notes, the strategy that Jon architected is starting to show good results.

Given all of this, Jon's departure was a shock.  Neither the communication to the rank and file nor the communication to Jon himself was handled well.  There are plenty of rumors and speculation flying around.  I hope that the Time Warner leadership team handles the situation going forward with the openness and honesty that are due to the people who have worked so hard to turn AOL around.

Jon, you'll be missed.

Wednesday, November 15, 2006

The AIM Network: AIM6, AIMPages, Buddy Feeds

I love launches.  This week, we're launching AIM 6 along with a major upgrade to the AIMPages beta.  And they work together!

The new AIM is a big improvement; I've been running the various betas over the past several months and they've been both rising in quality and slimming down in footprint.  And the UI is finally reasonable: I can now once again edit my buddy list right there in the main window.  And buddies now have a little (i) icon that tells you when your buddy has published something -- anywhere.  Like blog entries!  Profile updates!  Or, if they've set things up, new Flickr photos, Diggs, MySpace or Blogger or Xanga updates, or any custom Atom or RSS feeds you care to add.

There are some problems:  It keeps telling me about their away message status, and I don't really care that Kevin Lawver is away at lunch.  And I see that Kevin posted a photo but there's no thumbnail in the feed...  but this is a first release, we can fix these nits.  (The feed data is available at http://buddyfeed.oscar.aol.com/rss-push/aol:buddy_feed?request=user&sn=<screenname>, and as an AIMPage module named "What's New" which I demoed at Widgets Live last week.)
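Building that buddy feed URL for an arbitrary screen name is mostly a matter of escaping the query string. Here is a sketch. buddy_feed_url is a hypothetical helper, and it assumes the endpoint decodes form-style escaping, so a screen name containing a space is sent with a '+'.

```python
from urllib.parse import urlencode

# Base URL as given above; the query parameters are added per screen name.
BUDDY_FEED_BASE = "http://buddyfeed.oscar.aol.com/rss-push/aol:buddy_feed"


def buddy_feed_url(screen_name):
    """Return the buddy feed URL for one screen name, with the query escaped."""
    query = urlencode({"request": "user", "sn": screen_name})  # space -> '+'
    return f"{BUDDY_FEED_BASE}?{query}"


print(buddy_feed_url("panzerjohn"))
```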

There's a good review of all of this on GigaOM by Liz Gannes.

The only downside of all this is that I'm mostly on a Mac these days, and there's no Mac AIM 6 client right now.  On the other hand, Adium plus a feed reader works pretty well too.

AIMPages has also added AIM Pages Buddies.  I don't think this is the best name, but the concept is good.  It's two way, meaning that both people have to opt in to it.  And by default your AIM Pages Buddies are shown on your profile (so you really don't want your whole Buddy List showing up there).  The invitation mechanism is easy:  When you add someone, the system sends that buddy an IM asking whether they agree.  A lot better than email, if they're online.  If not, they'll get reminded with a little status link at the top of their AIM Page: "Buddy Requests (42)".
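The two-way opt-in can be modeled as a set of pending requests plus a set of accepted pairs. Nobody counts as a buddy until the invitee says yes. This is a toy sketch with hypothetical names, not the AIM Pages implementation. pending_for returns the kind of list that a "Buddy Requests (N)" link would count.

```python
class MutualBuddies:
    """Toy model of a two-way (mutual opt-in) buddy relationship."""

    def __init__(self):
        self.pending = set()   # (requester, invitee) pairs awaiting a yes
        self.accepted = set()  # frozenset({a, b}) once both have agreed

    def request(self, requester, invitee):
        self.pending.add((requester, invitee))

    def accept(self, invitee, requester):
        if (requester, invitee) not in self.pending:
            raise KeyError("no pending request from that user")
        self.pending.remove((requester, invitee))
        self.accepted.add(frozenset((requester, invitee)))

    def are_buddies(self, a, b):
        # frozenset makes the relationship symmetric: (a, b) == (b, a).
        return frozenset((a, b)) in self.accepted

    def pending_for(self, invitee):
        """Requests still waiting on this user."""
        return sorted(r for r, i in self.pending if i == invitee)
```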

Aside:  By default, for newly created profiles, only your AIM Pages Buddies can post comments on your profile.  You can change this in the settings to open it up if you want to.  On the other hand, it's also a way to get more buddy requests.


Tuesday, November 14, 2006

Gold stars for good feed readers!

Over the past few weeks, we've been rolling out support for proper cache control on the Journals blogging platform, which will help reduce both bandwidth and database load, and also make things faster in general.  Last week we added ETag-based cache control for both Atom and RSS feeds.  Today I took a quick peek at our logs to see how things are going...

The good news is that a lot of feed readers are being great citizens.  Around 33% of our feed requests get satisfied with 304 responses, meaning that clients only need to do a quick validation that they have the latest content, rather than fetching everything all over again.  Here's a quick list of feed readers, in no particular order, which are doing the right thing with our servers.  Gold stars for everybody!
  • Bloglines (http://www.bloglines.com)
  • FeedDemon/2.0
  • NewsGatorOnline/2.0
  • NetNewsWire/2.0
  • LiteFeeds/2.0
  • http://www.squeet.com
  • Firefox/1.5.0.7
  • Thunderbird/1.5.0.8
  • FacebookFeedParser/1.0 (UniversalFeedParser/4.1;) +http://facebook.com/
  • Windows-RSS-Platform/1.0
  • LiveJournal.com
  • Planet GBT +http://planetgbt.priyadi.net Planet/1.0~pre1-terasi +http://www.planetplanet.org UniversalFeedParser/3.3 +http://feedparser.org/
  • Google Desktop
Pretty clearly, the UniversalFeedParser library should get a lot of the credit.  It's great that the Windows RSS Platform is doing the right thing, considering the likely amount of traffic it's going to generate.

Surprisingly, we actually have a lower cache hit ratio for Atom feeds than RSS ones... mostly due to one major crawler that seems to prefer Atom feeds and never gets a 304, presumably because it's never sending an If-None-Match header:
  • Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
Hey Google: It would really help the stats if you guys could support ETags and If-None-Match.
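On the client side, what is being asked of Feedfetcher is small: remember each feed's ETag and send it back in If-None-Match. Here is a minimal standard-library sketch. The in-memory dict cache and the fetch_feed name are assumptions, and a real reader would persist the cache between runs.

```python
import urllib.error
import urllib.request


def fetch_feed(url, cache):
    """Fetch url, revalidating any cached copy; cache maps url -> (etag, body)."""
    cached = cache.get(url)
    req = urllib.request.Request(url)
    if cached and cached[0]:
        req.add_header("If-None-Match", cached[0])  # "do I still have the latest?"
    try:
        with urllib.request.urlopen(req) as resp:
            body = resp.read()
            cache[url] = (resp.headers.get("ETag"), body)
            return body
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; reuse the cached body.
        if err.code == 304 and cached:
            return cached[1]
        raise
```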


Thursday, November 9, 2006

Web 2.0: 2 Quick Takes

  • Digg Labs' Swarm demo is hypnotic.
  • Marissa Mayer is all about the data.  Also, she has hypnotic sparkly things hanging from her belt.

Launching AOL Developer Network

We've been talking about the launch of our new dev.aol.com site all week.  The goal is to provide a place to talk about our open services in one standard place.  (We've tried the strategy of hiding our APIs in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the leopard', but it didn't work out too well.)

We really want to get feedback on both the site itself and the APIs that we're exposing.  Both are evolving rapidly and I anticipate that we'll be adding some new APIs there in the very near future -- there are a couple I'm going to be pushing for.  So please, give us feedback, or just link to dev.aol.com and we'll pick it up.

Wednesday, November 8, 2006

Web 2.0, Sudoku, and EC2

I'm at Web 2.0 now -- it took secret ninja moves to actually get a pass, even though AOL is a triple iridium sponsor.  The place is so thick with VPs you can barely reach the free food.

In breaking news, our own Michael Chowla just won HCL's Sudoku contest and will be taking home a very nice trophy.  I'll add the picture... as soon as the network actually lets my camera phone upload it... oh darn.  Apparently the Ning demo suffered from some network issues earlier today too.  Infrastructure!

Speaking of infrastructure:  Amazon's Elastic Compute Cloud looks very cool.  You could run a startup now with nothing more than a laptop and a table at a WiFi-enabled cafe.


Monday, November 6, 2006

AIM Pages Blog Widgets

Just finished the Blog sidebar widgets panel at Widgets Live!.  I was the only one foolish enough to try a live demo... which of course crashed and burned.  I'm so glad I made a backup animation to show instead.  I'm also told that I was nearly inaudible during most of the presentation.  Note to self: Lean into the mike! Also, I forgot my VGA converter cable and had to jump off the platform right as we started to go grab it. 

On the other hand, the panel discussion was fun.  We're certainly in early stages; we had four people on the panel, and four completely separate terms for essentially the same things (widgets, gadgets, modules, parts).

Link: Widgets Live! AIM Pages Blog Widgets Presentation (html).

At Widgets Live!

It's pretty packed and I'm in the back next to the power outlets.  It was great to see Alex Russell mention AOL's support for Dojo this morning.  I've been struggling a bit with the WiFi -- it's better than at some conferences, but something's definitely wrong when it takes 2 seconds to ping the router.  It's still a little slow even now but not horrible:

PING 172.17.0.1 (172.17.0.1): 56 data bytes
64 bytes from 172.17.0.1: icmp_seq=0 ttl=64 time=188.881 ms
64 bytes from 172.17.0.1: icmp_seq=1 ttl=64 time=135.392 ms
64 bytes from 172.17.0.1: icmp_seq=2 ttl=64 time=74.55 ms
64 bytes from 172.17.0.1: icmp_seq=3 ttl=64 time=63.111 ms
64 bytes from 172.17.0.1: icmp_seq=4 ttl=64 time=237.2 ms



Friday, November 3, 2006

REST Web Services: The Book

Something to look for on Amazon soon: REST Web Services.  The outline looks good; I would love to have a book to hand to people who want to find out about REST.

Especially Chapter 8: Resource-Oriented versus Service-Oriented Architectures:
The main event. We single-handedly take on the SOA, WSDL, and the WS-* stack: an enormous, multi-billion-dollar project that has already had about 25 books written about it. We believe this conglomerate will ultimately end up like CORBA, the OSI protocol suite, ATM, and other overengineered big-money flops; and that simpler, more flexible solutions will swallow anything good to come out of it.

In the spirit of comity and friendship, following the smackdown will be a section describing what REST services can learn from the SOA and take from the WS-* technologies.


Thursday, November 2, 2006

Widgets Live!: Blog sidebar widgets

I'll be on a Blog sidebar widgets panel at Widgets Live! next week, doing a short demo and discussing the nuts and bolts of how to more easily hook up widgets and blogs.  Widgets are to web pages what feeds are to HTTP -- a technical shift that opens up new possibilities and is a potential game changer.  It should be fun, barring demo gremlins.



Journals R9 Update: Shiny!

Stephanie just blogged about our latest Journals R9 update, which has escaped from beta and is currently rampaging throughout our production servers on journals.aol.com.  The most visible change is the new, less blue color scheme.  There are a lot of less visible changes.  One of them is that we now support HTTP authentication for both RSS and Atom feeds -- so if you are a reader of someone's private Journal and use a feed reader that supports it, you can get their daily thoughts delivered to your computer automatically, just like you can for public Journals.  Does your feed reader support it?  If it's web based, probably not.  Otherwise, it probably does.
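For feed reader authors, the client side takes only a few lines. Here is a sketch using Python's urllib. It assumes the feeds use HTTP Basic authentication, since the post doesn't say which scheme, and the feed URL in the usage note is hypothetical. urllib's HTTPBasicAuthHandler sends the credentials only after the server answers with a 401 challenge.

```python
import urllib.request


def make_password_manager(feed_url, user, password):
    """Register credentials for feed_url under any realm the server names."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, feed_url, user, password)  # None = default realm
    return mgr


def private_feed_opener(mgr):
    """An opener that retries with Basic credentials after a 401 challenge."""
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))
```

Usage would look like `private_feed_opener(make_password_manager(url, "reader", "secret")).open(url).read()`, where the URL and credentials are placeholders.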

Here's a bonus Easter Egg, just for fun.  You can apply a theme to any Journals page by adding an argument "?skin=css url" to the page URL.  Like this.  Or this, or this... or anything over at the AIMPages themes directory.  It shows a direction we're headed; not the end of a journey, but the start of a trip.

We've also updated our whitelist to allow more cool widgets to be put in your posts and About Me box.  I just added one of the neater ones -- MyBlogLog's Recent Readers widget.  I've put it on my sidebar as an experiment; is it cool, or just weird, to see your visits being tracked and displayed?  (Hi Frank!)




Tuesday, October 31, 2006

Journal Beta update

We just pushed out some new updates to beta.journals.aol.com, so please go play with it.  A lot of the changes are for International Journals (beta.blogs.aol.fr, beta.tagebuch.aol.de, etc) but there's also one fairly visible change:  We've modernized the look and feel of the "owner" pages, mostly by changing colors and buttons.  Newly created Journals will also get a different "Silver" default color theme.  It's not a huge functional change but it helps spruce up the look of the place.

Saturday, October 28, 2006

Call me =john.panzer

Several Internet eons ago, I registered for a newfangled xri:=john.panzer universal identifier after listening to a conference presentation.  Didn't know if this system was going to go anywhere or not, but I did know that it was free to stake a claim in this potentially interesting namespace.

Well, after a while I did get a free i-name, a contact page, and now it turns out I can actually use it to log in to something useful, to wit, the new OpenID Wiki.  Turns out that the library used for the Wiki authentication automatically supports user IDs like =john.panzer; it just works.  Which is of course the goal of the technology, and a good demonstration that user centric identity can become ubiquitous.

Thursday, September 28, 2006

On Magic

We discovered an interesting IE6 feature when we pushed out caching changes earlier this week for Journals.  For posterity, here are the technical details.  Our R8 code started using "conditional GET" caching, meaning that we supported both If-Modified-Since: and If-None-Match: HTTP headers.  The way this works is that, if a client has a version of a page in its cache, it can send one or both of these headers to our servers.  Like this:
If-Modified-Since: Tue, 26 Sep 2006 21:47:18 GMT
If-None-Match: "1159307238000-ow:c=2303"
If-None-Match, which passes an "entity tag" or ETag, is the better choice; it was added in HTTP/1.1 as a more precise alternative to If-Modified-Since.  (If-Modified-Since has only one-second granularity, and can't be used to indicate changes that aren't time-based.)  In our case we actually have two versions of our pages which can be served up, one for viewers and another one for owners.  We really only want to cache the viewers' page.

When our server sees a request like the one above, it first does a quick check (in this case it'll ignore the If-Modified-Since and use the ETag) to see if the client already has the latest version; if it does, it returns a 304 Not Modified result.  The big win is that this can be done very very quickly and efficiently, while building a 200KB web page takes lots of work.  If the client doesn't have the right version, though, the server returns a 200 and sends new headers, like these:
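The server-side check can be sketched as follows. conditional_get and render are hypothetical names. This sketch handles only If-None-Match and never looks at If-Modified-Since, which matches the ETag-first behavior described above. The expensive render callback runs only when the client's copy is stale.

```python
def conditional_get(request_headers, current_etag, render):
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Naive comma split; fine for ETags without embedded commas.
        tags = [t.strip() for t in if_none_match.split(",")]
        if "*" in tags or current_etag in tags:
            return 304, {"ETag": current_etag}, b""  # cheap: nothing rendered
    body = render()  # the expensive part, e.g. building a 200KB page
    return 200, {"ETag": current_etag}, body
```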

Last-Modified: Tue, 26 Sep 2006 21:47:18 GMT
Etag: "1159307238000-c=2303"

If you're obsessive with details you might notice that the modification date is the same as before, but the ETag has changed (the -ow:c has changed to a -c).  When the second request was made, it sent cookies that told the server that the user was the owner of the blog.  So the page is different and therefore the ETag is different, but the last date modified is the same.  We're expecting browsers and caches to detect the change and refresh the page.

This all works fine... except for IE6 (and the AOL client, which uses IE6 under the hood).  IE6 seems to see the Last-Modified: timestamp above and simply stop, ignoring the Etag: header and the fact that we're returning a 200 response with new content.  I've sat and watched the data flow in and out of my Internet connection and verified that IE just drops the 60K or so of content on the floor, as well as the new ETag, and re-uses its old version.  The only way to prevent it is to force a reload using ctrl-Reload, or clearing your Temporary Internet Files.

What this means is that if you change "who you are" by logging in or out, and nothing else changes, you will get a stale, cached version of your own blog's page.  Which is certainly not good.

As of this morning, we're running with caching turned off on journals.aol.com but with a bug fix on beta.journals.aol.com.  The bug fix is simple:  Don't send Last-Modified: headers.  So we only send back the Etag:
Etag: "1159307238000-c=2303"
Which forces IE6 to pay attention to it, fixing the problem with IE6.  IE7, by the way, works either way; go Microsoft!
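For the curious, the ETags above appear to consist of the Last-Modified time in milliseconds (1159307238000 ms is exactly Tue, 26 Sep 2006 21:47:18 GMT), a role marker ("ow:" for the owner), and a "c=" value. Here is a sketch of generating the post-fix headers under that reading. Treating 2303 as an opaque counter is an assumption, and page_etag and response_headers are hypothetical names.

```python
from email.utils import formatdate


def page_etag(last_modified_ms, counter, is_owner):
    """Hypothetical reconstruction of the ETag format shown above."""
    role = "ow:" if is_owner else ""
    return f'"{last_modified_ms}-{role}c={counter}"'


def response_headers(last_modified_ms, counter, is_owner, send_last_modified=False):
    """Validator headers for a 200; Last-Modified is off by default (the IE6 fix)."""
    headers = {"ETag": page_etag(last_modified_ms, counter, is_owner)}
    if send_last_modified:
        headers["Last-Modified"] = formatdate(last_modified_ms / 1000, usegmt=True)
    return headers


print(response_headers(1159307238000, 2303, is_owner=False))
```

Because owner and viewer pages share a timestamp but differ in ETag, dropping Last-Modified leaves the ETag as the only validator, so a client cannot mistake one variant for the other.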

This all means that we're not going to try to enable caching for non-Etag-aware clients and caches.  Since non-Etag-aware seems to pretty much equate to old or buggy, and not having caching is just a minor performance hit, this seems to be a pretty reasonable approach in theory.  The question now is, will practice accord with theory?  We really need people to hammer on http://beta.journals.aol.com/yourscreenname over the next few days and give us feedback.  See Stephanie's post: Like Magic, We're Back Where We Began... and please leave us feedback!

Thursday, August 31, 2006

Another use for feed licenses: Splogicide

Doc Searls just changed his blog license to Attribution-NonCommercial-ShareAlike 2.5... in order to clearly deny splogs reblogging rights to his content.  Interesting, though I think there may be some unintended fallout.  But there are some cool applications for this.  What if someone built a tool to make it easy to find such copyright violators (academics use similar tools to find plagiarism), with an accompanying service to aggregate complaints and, when they reach a sufficiently remunerative level, send attack lawyers after sploggers?

Update: The collective intelligence of the blogosphere is a mighty thing.  In a comment below, Doc points to an open source plagiarism detector from UCSB (my alma mater) that already does Internet searches.  Hmmm....

AOL's Open Source Contributions to Dojo

It's nice to see something your team has worked on being put to good use.  In addition to other things, we open sourced the JavaScript Compiler tool my group has used internally for several years.  (It's actually more of a linker and compressor than a compiler since of course the output is still JavaScript.)  We're also contributing new code to Dojo, including a pretty cool cross domain XML request mechanism.

Monday, August 28, 2006

Sam Ruby presents "Teenage Mutant CyberSurfers"

This afternoon, Sam Ruby was nice enough to reprise a version of his recent talk at AOL Silicon Valley.  There's a real independent convergence between Sam's observations and predictions and the fictional world described in Vernor Vinge's Rainbows End, which, as I understand it, Sam is just now reading :).

Update:  The audio is up on our podcast feed already!

Thursday, August 10, 2006

AOL Pictures Beta and woohoo

In conjunction with the new Pictures Beta, there's a cool tool called woohoo, which displays an AIMPages picture gallery in a widget that can be embedded anywhere.  Fun stuff.

Wednesday, August 2, 2006

Feeds Best Practices

The slides from my RSS/Atom Feeds Best Practices talk from lunch today are up now.  There was a good discussion afterwards and hopefully we'll put up a podcast. I'm more convinced than ever that the key to a successful presentation is: free food.

Friday, July 28, 2006

URL format change for Journals

One of the recent changes this week in Journals is an update to the format of our entry URLs.  We're essentially adding the entry date and title to the URL, for example:

.../entries/2006/07/13/rest-and-the-authorization-header/1354

So anyone who may be parsing our URLs -- be aware that the format has changed :).  We will, however, do a permanent redirect from the old URLs to the new ones.
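For anyone who does parse these URLs, here is a sketch of both directions. The slug rule (lowercase, with runs of non-alphanumerics collapsed to a single hyphen) is a guess that reproduces the example above, not the documented Journals algorithm. A parser that keys on the trailing numeric entry ID rather than the slug is probably the safer bet.

```python
import re


def slugify(title):
    """Guessed slug rule: lowercase, non-alphanumeric runs -> single hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def entry_path(year, month, day, title, entry_id):
    return f"/entries/{year:04d}/{month:02d}/{day:02d}/{slugify(title)}/{entry_id}"


ENTRY_RE = re.compile(r"/entries/(\d{4})/(\d{2})/(\d{2})/([^/]+)/(\d+)$")


def parse_entry_path(path):
    """Return (year, month, day, slug, entry_id) or None if it doesn't match."""
    m = ENTRY_RE.search(path)
    if m is None:
        return None
    year, month, day, slug, entry_id = m.groups()
    return int(year), int(month), int(day), slug, int(entry_id)


print(entry_path(2006, 7, 13, "REST and the Authorization: Header", 1354))
```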

Thursday, July 13, 2006

REST and the Authorization: Header

Talking to lots of people about identity, mashups, web services, and sustainability of the mashup ecology today at Mashup Camp.  I'm wondering why LID apparently is using a new X- header for passing pointers to authentication information rather than re-using the existing extensible Authorization: header.  Both GData and Amazon Web Services allow Authorization: as at least one option in their REST interfaces:

Authorization: GoogleLogin auth ...
Authorization: AWS ...

I know that GData uses 401 Unauthorized and WWW-Authenticate: challenge headers and I'm going to assume that AWS does too:

WWW-Authenticate: AuthSub realm="https://www.google.com/accounts/AuthSubRequest" 

So, existing services are using the RFC 2617 framework; it's working for them; why not build on top of that instead of inventing new headers?
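These headers all share the RFC 2617 shape: an auth scheme name followed by either comma-separated auth-params or a single opaque credential. The minimal parser below handles that shape. split_scheme and parse_auth_params are hypothetical helpers, and the token grammar is simplified relative to the RFC. For schemes whose credential is one opaque token, use the raw remainder from split_scheme instead of parsing params.

```python
import re

# Simplified auth-param: name "=" ( quoted-string / bare token ).
_AUTH_PARAM = re.compile(r'([A-Za-z0-9_.-]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^,\s]*)')


def split_scheme(header_value):
    """'Scheme rest...' -> ('Scheme', 'rest...'); works for any auth scheme."""
    scheme, _, rest = header_value.strip().partition(" ")
    return scheme, rest.strip()


def parse_auth_params(rest):
    """Parse comma-separated auth-params into a dict with lower-cased names."""
    params = {}
    for name, value in _AUTH_PARAM.findall(rest):
        if value.startswith('"'):
            value = re.sub(r"\\(.)", r"\1", value[1:-1])  # unquote, unescape
        params[name.lower()] = value
    return params


scheme, rest = split_scheme('AuthSub realm="https://www.google.com/accounts/AuthSubRequest"')
print(scheme, parse_auth_params(rest))
```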

Wednesday, July 12, 2006

Mashup Camp: Identity and Access Control in Mashups

Some notes I just took for the AccessControl session (my first session of the camp).  Here's a shot of the campers organizing the schedule an hour ago:

Tuesday, June 6, 2006

Yet Another Post on Feed Licensing

Back in April, I blogged about copyright and licensing standards for feeds.  People were happy with most of it, with one big exception:  Nobody can agree on what to do with feeds lacking an explicit license.  People clearly felt I was nuts to suggest that the implicit permissions involved in this situation could be captured in any flavor of Creative Commons license.  Obviously the owner holds copyright, and obviously fair use applies in the US, and unfortunately nobody can specify what that means in a way that software can understand.

So, here's my new summary, which is shorter but more complicated:
You'll need to consult your legal department on what fair use is in each case, and figure out how to deal with international jurisdictions too.

All of which makes embedding licenses in feeds even more important.
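On the consuming side, Atom already has a natural slot for this: a link with rel="license" (the Atom License Extension, later published as RFC 4946). The minimal sketch below reads feed-level license links. An empty result is exactly the ambiguous no-explicit-license case discussed above. feed_licenses is a hypothetical helper, and RSS feeds would need a different lookup, such as the Creative Commons RSS module.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def feed_licenses(feed_xml):
    """Return license URIs declared directly on an Atom feed element."""
    root = ET.fromstring(feed_xml)
    return [link.get("href") for link in root.findall(f"{ATOM}link")
            if link.get("rel") == "license"]


feed = b"""<feed xmlns="http://www.w3.org/2005/Atom">
<title>Example</title>
<link rel="alternate" href="http://example.com/"/>
<link rel="license" href="http://creativecommons.org/licenses/by-nc-sa/2.5/"/>
</feed>"""

print(feed_licenses(feed))
```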

Tags: Creative Commons, RSS, Atom, syndication

Monday, June 5, 2006

User Centric Identity

I just gave a talk at AOL about user centric, distributed identity, titled "Identity 2.Open" just to draw the punters. This came out of various thoughts I've had regarding openness and authentication, and from the recent Internet Identity Workshop 2006(a).  Fair warning: There may be a podcast up sometime soon, in which anyone who cares to do so can listen to me butcher explanations about identity, security, and game theory.

This was also my first foray into the S5 slide show system. I gave up on PowerPoint when it refused to accept .PNG images.  I mean, come on guys, it's only been about 10 years since PNG was introduced.  Sheesh.  S5 works great, though I would love to see a few more generic themes that I could snag and use without working on visual designs.


Thursday, June 1, 2006

Searching Structured Data Using Microformats

In the well-of-course-he-did department, Tantek just announced the beta of Technorati's Microformats Search.  Hopefully this will help accelerate deployment of structured data on the public web.  Their beta main page adds on some special sections if a keyword search gets results containing microformats.
And you can do domain-specific searches as well, for example for events matching "San Francisco".  So far you can't do much with the data except jump to the source page.  The obvious next step is to offer "add to contacts" and "add to calendar" options.
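As a hint of what "add to calendar" would involve, the minimal sketch below pulls hCalendar events out of a page. It assumes well-formed XHTML, since real pages usually need a forgiving HTML parser, and it reads only summary, dtstart, and location. extract_events is a hypothetical name, not part of any Technorati API, and the sample event is made up.

```python
import xml.etree.ElementTree as ET


def _has_class(el, name):
    return name in (el.get("class") or "").split()


def _first(root, name):
    """First element (including root) carrying the given class, or None."""
    for el in root.iter():
        if _has_class(el, name):
            return el
    return None


def _value(el):
    # hCalendar convention: <abbr title="..."> carries the machine-readable value.
    local_tag = el.tag.rsplit("}", 1)[-1]  # tolerate a namespaced XHTML tag
    if local_tag == "abbr" and el.get("title"):
        return el.get("title")
    return "".join(el.itertext()).strip()


def extract_events(xhtml):
    """Return one dict per class="vevent" element in well-formed XHTML."""
    root = ET.fromstring(xhtml)
    events = []
    for ev in [el for el in root.iter() if _has_class(el, "vevent")]:
        event = {}
        for prop in ("summary", "dtstart", "location"):
            el = _first(ev, prop)
            if el is not None:
                event[prop] = _value(el)
        events.append(event)
    return events


page = """<div>
  <div class="vevent">
    <abbr class="dtstart" title="2006-06-10T19:00">June 10, 7pm</abbr>
    <span class="summary">Example meetup</span> in
    <span class="location">San Francisco</span>
  </div>
</div>"""

print(extract_events(page))
```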

There's a companion service, pingerati.net, to enable real time indexing; it's also a ping hub.  The idea here is that if you want to get updates about structured data from around the web, you can register with pingerati and get their feed of pings as well, without needing to go around and convince everyone to ping your service.  For example, evdb will get auto-notified if you ping pingerati.net.  Naturally, any existing Technorati feed update pings also get fed in.
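On the ping side, blog software conventionally notifies ping services with the weblogUpdates.ping XML-RPC call, passing the blog name and blog URL. Assuming pingerati.net accepts that same call, Python's standard xmlrpc module can build the request body. The endpoint URL is deliberately left out here rather than guessed, and build_ping is a hypothetical helper.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """XML-RPC request body for the conventional weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


body = build_ping("Abstractioneer", "http://journals.aol.com/panzerjohn/abstractioneer")
print(body)
# To send it: xmlrpc.client.ServerProxy(PING_ENDPOINT).weblogUpdates.ping(name, url),
# where PING_ENDPOINT is whatever URL the ping hub documents (not given here).
```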

Being a hub is nice.


Thursday, May 25, 2006

Meta-tagging in plain English

"Tag Bundles" in del.icio.us are quite possibly the Right Way to do hierarchy for tags:
This feature allows you to combine several related tags into a logical grouping. So, for instance, you might combine the tags “hitter”, “pitcher”, and “fielder” into a bundle and call it “Baseball”. It doesn’t change anything about the existing tags, but does allow you to create another level of hierarchy. When looking at your bookmarks, del.icio.us will show this bundle and all the tags grouped under it as a separate section.
This is really meta-tagging, but explained in plain English with a perfectly sensible, immediate end-user benefit, plus bigger benefits down the line if these bundles are shared.
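As a data structure, a bundle is just a named set of tags layered over the existing ones.  That is why it leaves the tags themselves untouched.  Here is a minimal sketch of grouping a user's tags into bundle sections for display.  The function and section names are hypothetical, and this is not how del.icio.us implements it.

```python
def bundle_sections(tag_counts, bundles):
    """Group a user's tags into display sections, one per bundle.

    tag_counts: dict mapping tag -> number of bookmarks using it
    bundles:    dict mapping bundle name -> set of member tags
    Tags that belong to no bundle land in an "(unbundled)" section.
    Neither input is modified; a tag may appear in several bundles.
    """
    sections = {}
    bundled = set()
    for name, members in bundles.items():
        # Only show bundle members the user has actually used.
        sections[name] = sorted(tag for tag in members if tag in tag_counts)
        bundled |= members
    sections["(unbundled)"] = sorted(tag for tag in tag_counts if tag not in bundled)
    return sections
```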

And the daily meetings will continue until productivity improves!

We're doing a lot of daily meetings these days.  Often they're a waste of time; sometimes they're a lifesaver.  I think they're primarily insurance: you pay an up-front daily cost to mitigate the risks of missed communications, forgetfulness, lack of shared understanding, and lack of commitments.

Perhaps there is an optimal strategy for daily meetings that treats them like insurance and adjusts them according to your risk forecast.  If your project is bright red, maybe you need a one-hour video conference every day with the full team.  If you're green, maybe an optional 10-minute conference call is enough.  The vast majority of projects would land somewhere in between.  The goal would be to minimize your expected wasted time in a rational way.
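A toy model makes the insurance analogy concrete.  The meeting's cost is the premium, measured as minutes times attendees.  Each format is assumed to prevent some fraction of the losses from a communication failure.  Every number below is made up purely for illustration, including the formats, the mitigation fractions, the failure probabilities, and the 5000 person-minute loss.

```python
from dataclasses import dataclass


@dataclass
class MeetingFormat:
    name: str
    minutes: int       # daily meeting length
    attendees: int     # people who must attend
    mitigation: float  # assumed fraction of failure losses prevented (0..1)


def expected_daily_cost(fmt, p_failure, loss):
    """Expected person-minutes spent per day: the premium plus unmitigated risk."""
    premium = fmt.minutes * fmt.attendees
    residual_risk = p_failure * (1 - fmt.mitigation) * loss
    return premium + residual_risk


def best_format(formats, p_failure, loss):
    """Pick the format that minimizes expected wasted time."""
    return min(formats, key=lambda f: expected_daily_cost(f, p_failure, loss))


# Illustrative (invented) options, from cheapest to most thorough.
FORMATS = [
    MeetingFormat("optional 10-minute call", 10, 3, 0.3),
    MeetingFormat("15-minute standup", 15, 8, 0.6),
    MeetingFormat("one-hour full-team video conference", 60, 10, 0.9),
]
```

Assume a loss of 5000 person-minutes per failure.  A green project with a 2% daily failure risk comes out best with the optional call.  A middling project at 15% does best with the standup.  A red project at 50% does best with the full-team hour.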

Note: We of course do our level best to ensure that engineers are pulled into daily meetings only when they absolutely need to be there!  

Friday, May 12, 2006

AOL Greenhouse

A long-awaited site has been unwrapped this week as well -- AOL Greenhouse. Things are happening fast and furious (there's another soft launch this week that I can't talk about yet... hopefully next week we'll make it public).

Greenhouse is particularly special because it's all about getting some sunlight onto some of the cool things that we come up with.  Also, it has a monkey.  Can't go wrong with a monkey.

At the moment, the blog aggregator seems to be hosed -- it's showing Yoel's post about 23 times. Stan?

Wednesday, May 10, 2006

AIM Pages Lives!

Our AIM Pages beta is now up:  www.aimpages.com.  Kevin just blogged about it.  Check it out, have fun, let us know what you think!

Thursday, May 4, 2006

Memories of IIW2006

Some quick visuals from the Internet Identity Workshop (it might help to quietly hum "Memories" to yourself). The pictures are mostly Phil Windley's.

Kaliya and others reviewing the Identity Timeline at the start of the workshop:


The lunch that AOL (partially) sponsored, from DeeDee's.  Best meal of the workshop:


Dave Winer discussing OPML 2.0 and identity contact URLs with a bunch of very smart people.  I am off to the side putting on a wise expression.


This slide is a cool vision statement of how identity URLs enable an entire open ecosystem:


Yan Cheng talks about the dimensions of identity.  He convinces everybody that we're good guys.  Maybe we should adopt a Googleian corporate motto: "Do good."


Circle time!  I'm trying to learn the lyrics.  Or maybe falling off my seat.  Hard to be sure.


And last but not least, the AOL logo up on the wall:


Tuesday, May 2, 2006

Internet Identity Workshop 2006

I'm at IIW2006 in Mountain View today (and tomorrow as well).  I'm a highly interested observer who just wants stable identity and authentication system(s) to build useful things upon.  From my point of view, the most useful thing happening at this conference is the interaction among lots of very smart people who are motivated to interoperate and provide genuinely useful identity services for real people.

We had a good discussion about interoperability, with AOL's Yan Cheng talking about different dimensions of functionality which are, at least to a first approximation, orthogonal.  For example, exactly how authentication is handled is mostly orthogonal to how public identity and reputation are handled.  I do think that we need to talk about these things in the context of real-world examples.  It's the minor little things that trip up these simple scenarios.  For example, how do we auto-discover a user's authentication capabilities without adding even more login steps?
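OpenID 1.x HTML discovery is one existing answer to the auto-discovery question.  It is offered here only as an example, not as what the workshop concluded.  The page at a user's identity URL carries `<link rel="openid.server">` elements in its head, and optionally `openid.delegate`.  A relying party can then find the user's authentication server from nothing more than the URL they typed.  Below is a minimal parser with hypothetical class and function names.  It covers only the HTML form of discovery.

```python
from html.parser import HTMLParser

OPENID_RELS = ("openid.server", "openid.delegate")


class OpenIDLinkFinder(HTMLParser):
    """Record the first href for each OpenID 1.x rel value found in <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may hold several space-separated values
        for rel in (attrs.get("rel") or "").split():
            if rel in OPENID_RELS:
                self.links.setdefault(rel, href)


def discover_openid(html):
    """Return {"openid.server": ..., "openid.delegate": ...} for whatever is present."""
    finder = OpenIDLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.links
```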

It seems like this space is going into a consolidation/cooperation phase, where everybody agrees to work together using a few very basic building blocks in an extensible framework.

Wednesday, April 26, 2006

Atom feed updates: Pagination

One of the hidden changes in last week's release is support for Atom pagination.  This potentially lets tools browse through the entries in a blog, copy them, archive them, search them, etc.  Technically, this means we're supporting the link@rel="first", "last", "next", and "previous" relations.  Get the current feed and follow the "previous" links until you run out of data; then you've got all of the entries in a blog, in standard Atom format.  And we're valid according to http://feedvalidator.org.  Let me know if you see any problems.
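The "follow previous until you run out" loop might look like the sketch below.  The `fetch` callable is an assumption: it stands in for whatever HTTP client a tool uses, so the loop can be tested offline.  Relative `href` values are resolved against the page they came from.  A seen-set guards against link cycles.  Function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def link_href(feed, rel):
    """Return the href of the first feed-level <link> with the given rel, or None."""
    for link in feed.findall(ATOM + "link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


def all_entries(start_url, fetch):
    """Collect every entry by following rel="previous" links from start_url.

    fetch(url) must return the Atom document at url as text.
    """
    entries, seen = [], set()
    url = start_url
    while url and url not in seen:
        seen.add(url)
        feed = ET.fromstring(fetch(url))
        entries.extend(feed.findall(ATOM + "entry"))
        href = link_href(feed, "previous")
        url = urljoin(url, href) if href else None
    return entries
```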

Friday, April 14, 2006

Tags: Web Bumper Stickers

Our new entry tagging secret beta stealth feature might be a little difficult to see since it doesn't work on IE yet, though Joe did a great job with screen shots.  (Joe: It's not much of a stealth feature if you tell everybody about it, is it?)

Tags are just labels that you can apply to your entries; since they're public, they're kind of like electronic bumper stickers.  If you use Firefox or Mozilla, you can play with them on beta.journals.aol.com/<your screen name>.  Otherwise, well, here's a little animation:

Picture from Hometown

...and you can see the results below.  I have no idea what "stealth" is going to link to, since right now it just does a general web-wide tag search.  I think that's kind of fun, actually, but your mileage may vary.  We're looking at various ideas, including having the links go to a blog-specific search page (perhaps with links off to the general web search to see who else has chosen the same bumper stickers).  Also, we'll leverage the results to provide better categorization tools for your entries and blogs.  It's pretty wide open at the moment, so if you have opinions, let us know.

Whoops. Seriously.

This week, AOL accidentally started spam-blocking email with "dearaol.com" URLs embedded in the text.  And then we fixed it.  Never ascribe to malice that which is adequately explained by... um... never mind.  I know of no conspiracy, and the people running our spam filters are good folks; it's just that sometimes the software they wrangle gets a little obstreperous.

(techdirt)

Tuesday, April 11, 2006

Code, and other laws... (part 2)

In part 1 I talked about the ideal world where feeds were all clearly licensed. So now I'll turn to the real world, and I'll be very US-centric because this article is quite long enough as it is. You might want to skip to the happy fun summary at the bottom.

Millions of feeds aren't explicitly licensed.  Some can't be because their generators don't allow for it.  For others, the owner doesn't know or care about licensing.  For unlicensed feeds, it's not reasonable to make the default assumption of "nothing more than fair use", because there are millions of feeds out there whose owners want their content syndicated as-is (headline feeds with links back to content, for example).  On the other hand, if you assume anything more than fair use, you also need to be prepared to handle exceptions.  So how do we do both of these in a way that minimizes overhead and lets aggregation happen without lawyers, while still respecting copyright?

My take is that when the feed owner hasn't specified otherwise, a reasonable default is to assume the Creative Commons Attribution license.
This means that by default, we'd assume that copying of feed content is allowed as long as attribution is given through an appropriate hyperlink.  Then we'd provide easy ways for feed owners to specify a different license whenever they want to declare one explicitly.

If a feed owner is happy with the default, they need to do nothing.  My sense is that this covers 98% of unlicensed feeds.  For the remainder, a feed owner could go to individual aggregators and tell them explicitly what license they prefer.  They can always choose a completely restrictive license that allows only fair use for the general public.  Or, they can choose a noncommercial license.  My take is that something equivalent to the current Creative Commons license chooser is sufficient.

Of course, what we'd all really prefer is for feed owners to put the licenses in their feeds directly.  That way, our AOL proxies and caches would simply pass the information along to clients, which would make appropriate decisions about what to do based on the particular license.  If we're dealing with a small number of well understood licenses, this is the easy part.

How should the feed licenses work?  There's a pretty good page with reasonable recommendations at Creative Commons on the subject.  James Snell's Feed License Link Relation works well for Atom and is pretty flexible:
<link rel="license" href="http://creativecommons.org/licenses/by/2.5/"/>
The Creative Commons RSS Module works for RSS 2.0:
<creativeCommons:license>http://www.creativecommons.org/licenses/by-nc/2.5</creativeCommons:license>
Both of these work with CC and other licenses and have been deployed in real implementations.  There's an RDF version for RSS 1.0 as well (cc:license).

Finally there's the RSS 2.0 <copyright> element, which is just plain text.  But, given that some tools might allow people to put text in this field but not embed the other types of licenses, I think it's reasonable to look for a known license URL in the copyright text as well:
<copyright>The contents of this feed are licensed to the public under http://creativecommons.org/licenses/by-nc-sa/1.0/</copyright>
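A consumer could check for these embedded forms in turn, as in the sketch below.  It rests on several assumptions.  The Creative Commons RSS Module's namespace URI is taken to be `http://backend.userland.com/creativeCommonsRssModule`.  It matches only creativecommons.org license URLs in copyright text, which is a crude heuristic.  It does not handle the RSS 1.0 RDF form.  It also accepts `creativeCommons:license` in Atom feeds, since some publishers use it there.  Names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
CC_RSS = "{http://backend.userland.com/creativeCommonsRssModule}"
# Crude heuristic: a Creative Commons license URL somewhere in free text.
CC_URL = re.compile(r"https?://(?:www\.)?creativecommons\.org/licenses/[\w./-]+")


def _text(elem):
    return elem.text.strip() if elem is not None and elem.text else None


def embedded_license(feed_xml):
    """Return the first license URL embedded in an Atom or RSS 2.0 feed, or None."""
    root = ET.fromstring(feed_xml)
    if root.tag == ATOM + "feed":
        # Feed License Link Relation: <link rel="license" href="..."/>
        for link in root.findall(ATOM + "link"):
            if link.get("rel") == "license" and link.get("href"):
                return link.get("href")
        # creativeCommons:license used inside an Atom feed
        return _text(root.find(CC_RSS + "license"))
    channel = root.find("channel")
    if channel is None:
        return None
    # Creative Commons RSS Module element
    cc = _text(channel.find(CC_RSS + "license"))
    if cc:
        return cc
    # Fall back to a license URL mentioned in the plain-text <copyright>
    match = CC_URL.search(channel.findtext("copyright") or "")
    return match.group(0).rstrip(".") if match else None
```

A `None` result is the point where the declared-license fallback would take over.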
If a processor can't find any of the above licenses, I'm proposing that AOL feed consumers fall back to a license based on an explicit list that AOL maintains by feed owner request.  This would be part of our feed infrastructure.  I see this working two ways.  First, we would add metadata to feeds which are requested via our feed proxies.  For Atom and RSS 2.0, the two output formats we support, this would be a namespaced extension, aol:declared-license:
<aol:declared-license>
      <link rel="license" href="http://creativecommons.org/licenses/by/2.5/"/>
</aol:declared-license>
It would contain a Feed License Link Relation indicating which license the owner specified to AOL.  It could potentially contain multiple license links.  It could contain other namespaced elements in the future as well, but feed consumers can ignore ones they don't understand.

A client might also want to inquire about a feed's declared license without retrieving it.  For this, we could provide a simple REST API:
GET http://example.aol.com/declared-license/example.org/feed/atom.xml
which returns a simple XML document:
<?xml version="1.0" encoding="utf-8" ?>
<declared-license xmlns="http://example.aol.com/2006/aolfeeds">
    <link rel="license" href="http://creativecommons.org/licenses/by/2.5/"/>
</declared-license>
Note that non-AOL clients could potentially make use of this; you'd just have to believe that AOL is maintaining a good declared license list (the licenses themselves are the ones the feed owners want to provide to the general public, not to AOL specifically).  We could even potentially share these lists between feed aggregators.  An embedded (original) license would always override any declared license; this would let feed owners easily start embedding their own licenses in the future.  (Should we eliminate any declared license as soon as the source feed starts licensing itself?  I think so, but our legal team would need to weigh in on that.)
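The fallback order is: embedded license first, then the owner's declared license(s), then the CC Attribution default.  A client of the proposed lookup service might apply it as in this sketch.  The service URL and namespace are the post's own `example.aol.com` placeholders.  The path scheme (host plus path of the feed URL, with any query string ignored) is inferred from the single example request.  The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SERVICE = "http://example.aol.com/declared-license/"             # placeholder service
AOLFEEDS = "{http://example.aol.com/2006/aolfeeds}"                # placeholder namespace
DEFAULT_LICENSE = "http://creativecommons.org/licenses/by/2.5/"  # proposed default


def lookup_url(feed_url):
    """Map a feed URL to the lookup URL (host + path; query string ignored)."""
    parts = urlsplit(feed_url)
    return SERVICE + parts.netloc + parts.path


def parse_declared(doc_xml):
    """Extract license hrefs from a <declared-license> response document."""
    root = ET.fromstring(doc_xml)
    return [link.get("href") for link in root.findall(AOLFEEDS + "link")
            if link.get("rel") == "license" and link.get("href")]


def effective_licenses(embedded, declared):
    """An embedded license always wins; then declared ones; then the default."""
    if embedded:
        return [embedded]
    if declared:
        return list(declared)
    return [DEFAULT_LICENSE]
```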

Finally, we'd advertise a variety of ways for feed owners to contact us and declare their licenses.  There does need to be some sort of validation step to ensure they really own the feed.  As part of the (hopefully painless) process, we'd ask them to pick one of the existing Creative Commons licenses.  If these aren't sufficient, we can add other licenses, but it's easier all around if people can agree on a small set.

How about a real world example?  Brian Alvey of Weblogs Inc. recently announced support for excerpt feeds, for example Engadget full vs. Engadget headlines.  The full Engadget feed has the copyright statement:
<copyright>Copyright 2006 Weblogs, Inc. The contents of this feed are available for non-commercial use only.</copyright>
Translating into license-speak, we'd get an Attribution-NonCommercial-NoDerivs license for the full feed, meaning no commercial exploitation, links back are required, and editing of the material is not allowed beyond fair use:
<creativeCommons:license>http://creativecommons.org/licenses/by-nc-nd/2.5/</creativeCommons:license>
The excerpt Engadget feed has the copyright statement:
<copyright>Copyright 2006 Blogsmith, LLC. The contents of this headlines and excerpts feed are available for limited commercial distribution. You may repost this feed to your site provided you link back to the original story, do not edit the material, and do not remove this copyright notice.</copyright>
Translating into license-speak, we'd get Attribution-NoDerivs for the excerpt feed, meaning that commercial use is OK but links back are required and the material may not be edited:
<creativeCommons:license>http://creativecommons.org/licenses/by-nd/2.5/</creativeCommons:license>
(I'm assuming here that the restriction on editing applies to the individual articles, not the feed document as a whole, since feed documents are not intended to be kept intact in any case.  This minor ambiguity goes away with Atom's Feed License Link Relation.)
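The "license-speak" translations follow a pattern: a Creative Commons license URL spells out its conditions in its path (`by`, `nc`, `nd`, `sa`).  This small sketch recovers those conditions.  It assumes URLs of the form `.../licenses/<codes>/<version>/` on creativecommons.org and deliberately rejects anything else.  The function name is hypothetical.

```python
from urllib.parse import urlsplit

# Meaning of each Creative Commons condition code.
CONDITIONS = {
    "by": "attribution (link back) required",
    "nc": "non-commercial use only",
    "nd": "no derivative works (no editing)",
    "sa": "derivatives must be shared alike",
}


def cc_conditions(license_url):
    """Return the set of condition codes in a CC license URL, e.g. {"by", "nd"}."""
    parts = urlsplit(license_url)
    if parts.hostname not in ("creativecommons.org", "www.creativecommons.org"):
        raise ValueError(f"not a Creative Commons URL: {license_url}")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "licenses":
        raise ValueError(f"not a Creative Commons license URL: {license_url}")
    codes = set(segments[1].split("-"))
    unknown = codes - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unrecognized condition codes: {sorted(unknown)}")
    return codes
```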

So far, so good.  Having multiple versions does raise the question of how automated processors are supposed to find these feeds.  I think that's going to have to be a followup post.

That's about it.
In summary:
None of this is black or white.  I should also mention that I'm completely conflicted here, in that my company both syndicates and aggregates content and I'm directly involved on both sides.  I'm coming at this from the viewpoint of someone trying to provide online feed aggregation services where the end users subscribe to the feeds; they're not being selected or screened by editors.  In other situations, other rules about default licenses might be better.  Explicit licenses are definitely best to avoid problems down the road.

Here are some other links I've stumbled across: a basic practical primer on copyright and RSS; one re-aggregator's viewpoint (Palfrey); producers' viewpoints from Shelley Powers and Om Malik (here and here); and some legal discussion (with Wendy Seltzer, previously of the EFF, weighing in).

(FeedBurner already does CC licensing following the methods outlined above, except that they're using the creativeCommons namespace extension for Atom as well as RSS 2.0; consumers should look for either one in Atom feeds.)

Tags: Creative Commons, RSS, Atom, syndication

Monday, April 10, 2006

Buddy Updates for Blog Entries

Greg of aiminfo blogs about AIM Triton release 1.2.37.2:

"Buddy Updates allow you to view changes or additions your buddies make to their away messages, message boards and profiles.  You will see a new icon next to the buddy in the buddy list when an update has happened: "

You can grab the latest AIM Triton here.  What Greg doesn't mention is that this also works for blog entries made through Journals.  So if you use the latest AIM client, you'll be notified about your buddies' latest blog posts.  If you try it out, please let me (or Susan or Joe or John) know what you think.  This only works for public blogs, the ones that you can find through AOL or Google search in any case, but it does give you an up-to-the-minute picture of what's going on with your buddies.

Oh, and we have an update for Journals going out tomorrow morning.  After it's complete, one nonobvious change is that you'll be able to see the list of Journals someone publishes by going to their screen name on Journals (for example, http://beta.journals.aol.com/panzerjohn/ will give you a list of mostly test blogs).  The page lists public blogs, plus any private blogs that you're a reader of.  (Others are invisible.)  Also, the page has a nifty search box where you can type in screen names to try to find their Journals if they have any.  Again, let us know what you think.  It's sort of a hidden feature right now in that you have to know to type in the right URL.  So feedback is welcomed!

Monday, April 3, 2006

Danah Boyd at AOL Mountain View

Danah Boyd just wrapped up a great talk about online social spaces here at AOL Mountain View (the podcast is up already).  She delivered information via firehose. Some random notes...

There were several reasons why Friendster faded, and some lessons.
  • Conflict between the user community and the space creators (they wanted a dating site, the users wanted to do a lot of other things).  Lesson: Listen to the community; be flexible; adjust the business plan when needed.
  • Servers buckled under load when it got too popular.  Lesson: The technology has to work or people will lose patience and go to the competition.
  • When Friendster started to try to go mainstream beyond the early adopter clusters, new users couldn't find any friends on the site so it wasn't useful to them.  Lesson:  Network effects work in reverse too.  Start with small clusters and grow organically.
MySpace did a big thing right: When people started 'hacking' HTML in their own spaces, the creators let it happen, then made it easier by adding the features that people actually wanted to use, like sound files (for indie bands) and videos from YouTube. This means the community is designing the service as much as the creators are: not just posting content, but guiding much of the design direction, along with highly visible and passionate designers engaged with the community.  Danah calls this Embedded Community Design.

Best quote: "[Teens are] immune to bouncy visual overload." They've been immunized to this by mass media. (What does this mean for advertising as a business model?)

Last week, teens used MySpace to organize mass school walkouts to protest HR 4437. That's impressive regardless of your political views.


Tuesday, March 21, 2006

Code, and other laws... (part 1)

There are tens of millions of RSS and Atom feeds published on the Web.  And nearly all of them are copyrighted.

Unless its author explicitly gives up all rights to it, which can be a bit tricky, a work is automatically copyrighted in the United States and most other countries.  Of course the same is true of web pages.  But web pages are mostly intended to be viewed in a browser.  Feeds are generally intended to be syndicated, which means that their content is going to be sliced and diced in various and unforeseeable ways.  This makes a difference.

In what ways is an application allowed to copy and present a given feed's content?  To start with, it can do things covered by fair use (*).  There are some interesting issues around what exactly fair use means in the context of web feeds, but ignore those for the moment.  What about copying beyond what fair use allows? 

It would be awfully helpful if every feed simply included a machine readable license.  For example, a <link rel="license" href="..."/> element (http://www1.tools.ietf.org/wg/atompub/draft-snell-atompub-feed-license-00.txt).  We could then write code that follows the author's license for things beyond fair use.

Specifically, if a feed author wanted to put their feed content in the public domain, they would simply link to the Creative Commons public domain license, which includes the following RDF code:
<rdf:RDF xmlns="http://web.resource.org/cc/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<License rdf:about="http://web.resource.org/cc/PublicDomain">
<permits rdf:resource="http://web.resource.org/cc/Reproduction"/>
<permits rdf:resource="http://web.resource.org/cc/Distribution"/>
<permits rdf:resource="http://web.resource.org/cc/DerivativeWorks"/></License></rdf:RDF>
The code here is a machine-readable approximation of "put this in the public domain". 
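A consumer can read that RDF directly.  This sketch uses only the standard library's ElementTree and collects `permits`, `requires`, and `prohibits` for each `License` element.  The function name is hypothetical, and the sketch ignores the rest of RDF's syntax.

```python
import xml.etree.ElementTree as ET

CC = "{http://web.resource.org/cc/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def license_terms(rdf_xml):
    """Map each cc:License URI to the short names of what it permits/requires/prohibits."""
    root = ET.fromstring(rdf_xml)
    terms = {}
    for lic in root.iter(CC + "License"):
        entry = {}
        for kind in ("permits", "requires", "prohibits"):
            # Keep the last path segment, e.g. ".../cc/Reproduction" -> "Reproduction".
            entry[kind] = [
                el.get(RDF + "resource").rsplit("/", 1)[-1]
                for el in lic.findall(CC + kind)
                if el.get(RDF + "resource")
            ]
        terms[lic.get(RDF + "about")] = entry
    return terms
```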

Alternatively, if a feed author just wanted to require attribution, they'd instead use http://creativecommons.org/licenses/by/2.5/.  To allow copying for non-commercial use only, they'd use the popular http://creativecommons.org/licenses/by-nc/2.5/.  This license means that the content must be attributed, and may be freely copied only for non-commercial uses.   Plus, of course, fair uses.

According to the Creative Commons proposed best practice guidelines, the non-commercial license would mean that a web site re-syndicating the feed would not in general be able to display advertisements next to the feed content.  Such an application could fall back to fair use only for that feed (perhaps showing only headlines), or it could suppress ads for that one feed.  The main point is that it would know what it needed to do.
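The decision described here is small enough to write down.  This sketch assumes the license's conditions have already been reduced to a set of CC codes, with `None` when no license is known.  The function name and the (content, ads) return shape are hypothetical.  Treating headlines alone as fair use follows the post's tentative suggestion; it is not settled law.

```python
def display_plan(conditions, page_has_ads, prefer_ads=True):
    """Decide how a re-syndicating page should show one feed.

    conditions:   set of CC codes such as {"by", "nc"}, or None if unlicensed
    page_has_ads: whether the page would normally show advertising
    prefer_ads:   for non-commercial feeds, keep the ads and fall back to
                  headlines (True), or show full content without ads (False)
    Returns (content_mode, show_ads).
    """
    if conditions is None:
        # No license known: stay within fair use.
        return ("headlines", page_has_ads)
    if "nc" in conditions and page_has_ads:
        return ("headlines", True) if prefer_ads else ("full", False)
    return ("full", page_has_ads)
```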

So, in this perfect world where everything is clearly licensed, I think life is fairly simple.  Let me know if you think I'm missing something.

In part 2 I intend to return to the messy real world and start complicating things.
--
(*) ...or other applicable national legal codes, since fair use applies only in the U.S. as Paul pointed out.


Monday, March 20, 2006

"Yet another departure from AOL's infamous 'walled garden' days."

I think there's some API momentum building: The I Am Alpha site is officially announced and getting good reactions.  Greg just linked to the IAmAlpha blog.  I'd like to note that the developers have been working from the start with the microformats community to make everything as open as possible.


Monday, March 6, 2006

AIM SDK

The AIM guys just announced the Open AIM SDK.  Very cool; you can do AIM Triton plugins or write your own AIM network client.

Friday, March 3, 2006

New Feature: Common Feeds Icon

Another update to Journals this week was the switchover to the Common Feeds Icon.  This is the same icon used by Firefox, and soon by Internet Explorer and Opera, as well as by AOL's Favorites Plus.  It's also being adopted by web sites at an astonishing rate.

(Why a new icon to indicate feeds?  Because "RSS" doesn't exactly scream "dynamic feed of updates for this web page".  And lots of people in our testing thought the tiny icon said "R55", which is even more useless.)

Rogers Cadenhead has a nice discussion here (see the comment thread too).

There's a debate over whether this icon should indicate an action (subscribe to this feed) or be a link to the feed resource (see the feed, and maybe subscribe).  I personally don't think this is a huge issue, as long as a user isn't left staring at XML source.  If an application only lets you do one thing with a feed, jumping directly to subscribing seems like a good idea.  If you can do multiple things, give a menu of some kind (like Journals does) or a preview with options... humans can figure it out given reasonable feedback.  Machines can't, but then they're not looking at the icon, they're looking at the <link rel="alternate" type="application/atom+xml" ...> markup in the page header.
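On the machine side, feed autodiscovery is just a scan of the page's `<link>` elements for `rel="alternate"` with a feed media type.  Below is a minimal sketch using Python's standard HTML parser.  The class and function names are hypothetical, and it recognizes only the Atom and RSS media types.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect (title, absolute URL) for each feed advertised in <link> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        media_type = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rels and media_type in FEED_TYPES and href:
            self.feeds.append((attrs.get("title"), urljoin(self.base_url, href)))


def find_feeds(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds
```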

Also, they're called "feeds".  It doesn't matter whether they're RSS 0.91, RSS 0.92, RSS 1.0, RSS 1.1, RSS 2.0, RSS 3.0, or Atom.  And it really doesn't matter that they're in XML (well, except for RSS 3.0 :) ).  What matters is that you just look for the feed icon if you want to keep track of what's new.  Simple.

Wednesday, March 1, 2006

New Feature Demo: Flickr Photos

One of the little things we did for our latest update was to open things up a little bit to allow some types of iframes inside entries -- basically iframes from trusted sites.  We'll be adding to the list of allowed sites as we go forward.  Right now Flickr makes for a great demo:
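The core of a "trusted sites" rule for iframes is an exact host check on the `src` URL.  Here is a minimal sketch.  The allowlist contents and the function name are hypothetical.  A real sanitizer would also have to control the iframe's other attributes.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; exact host names, not suffixes.
TRUSTED_IFRAME_HOSTS = {"flickr.com", "www.flickr.com"}


def iframe_src_allowed(src):
    """Allow an iframe only if its src is http(s) on an exactly matching trusted host.

    Exact matching rejects look-alikes such as "www.flickr.com.evil.example",
    and urlsplit's hostname ignores any "user@" prefix, so
    "http://www.flickr.com@evil.example/" is checked as "evil.example".
    """
    parts = urlsplit(src.strip())
    if parts.scheme.lower() not in ("http", "https"):
        return False
    return (parts.hostname or "") in TRUSTED_IFRAME_HOSTS
```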


Tuesday, February 21, 2006

Mashup Camp Concludes

Mashup Camp just concluded in the Computer History Museum.  A very nice, direct, simple mash-up -- PodBot -- won 'best mashup'; second prize went to the everything-you-ever-wanted-to-know-about-Chicago-crime-statistics mashup ChicagoCrime.org.  PodBot does just one simple thing but does it for data around the world; ChicagoCrime does lots of things for just one city and subject.  Also, PodBot is all about having fun and ChicagoCrime is about not getting killed.  Sort of a yin/yang thing.

The unconference itself was good.  A lot of good sessions conflicted, so I had some quandaries; but I think I made the best locally optimal choices possible.  No regrets.  The WiFi was good (better than most conferences with 250 people in a single room) but still not perfect.  But it wasn't too bad since there was little need for a backchannel.

Friday, February 17, 2006

Software Development's Evolution towards Product Design

Software Development's Evolution towards Product Design -- Danc Redmond writes a great article from the perspective of a product designer.  I agree with nearly everything he says, especially the need for small, unified, cross-functional teams.  A few minor caveats:
  • Programming, done properly, is not a production activity that can easily be separated from product design.  If it could be, it's basically rote work that can and should be automated.  The non-rote work that programmers should focus on is all about figuring out how to hack the universe in order to deliver superior benefits to the customer.  Which is part and parcel of product design, which is why those small, cross-functional teams are so valuable.
  • A big factor in game development and web design companies' successes was sheer volume and high competition.  There are plenty of terrible user experiences in both camps (books have been written), but the industries have thus had a chance to learn from lots of successes and failures and to improve iteratively.
  • Nit: The benefits summary is great -- 98% success vs. 18% success is a great statistic that can get business people to sit up and listen.  But y'know, this needs to be backed up with specific references to really pack the necessary punch.
In the past, most of the software industry has been busy creating dancing bears.  It's not how well the bear dances, but that it dances at all.  That is changing.  There will be some isolated refugia for internal corporate software, niche verticals, and bleeding-edge hardware interfaces where those clumsy dancers will survive and even thrive.  But the rest of the industry is going to start competing on more than bare functionality, and that will quickly shake out the companies which won't or can't adapt.

Tuesday, January 31, 2006

AOL and Dojo

Several engineers(*) at AOL had a very interesting meeting with the Dojo guys last week.  One of the results is the announcement that AOL is hosting the Dojo toolkit on our content distribution network.  This is great because the major barrier to adoption of DHTML/Ajax/etc. UIs is, honestly, the download time for the JavaScript code; you only pay it once, but it's a major concern.  The CDN helps enormously with this since it does automatic compression, caching, intelligent routing, proper browser bug workarounds, etc.  If enough people adopt this, it would be a win for everyone (only the first application to require a library module pays anything; the rest get it for 'free').

I hope we'll be able to do some more interesting things and help contribute to Dojo as well.

(*) OK, technically I'm a manager, but they let me wear the engineer hat sometimes.

Thursday, January 19, 2006

Another One on Tagging: Data on Folksonomies

This folksonomies article is good for the questions it raises, but also for the data it collects in one place -- lots of good statistics on del.icio.us and Flickr usage of tags in this paper:

Folksonomies: Tidying Up Tags?
"This article looks at what makes folksonomies work. The authors agree with the premise that tags are no replacement for formal systems, but they see this as being the core quality that makes folksonomy tagging so useful. The authors begin by looking at the issue of "sloppy tags", a problem to which critics of folksonomies are keen to allude, and ask if there are ways the folksonomy community could offset such problems and create systems that are conducive to searching, sorting and classifying. They then go on to question this "tidying up" approach and its underlying assumptions, highlighting issues surrounding removal of low-quality, redundant or nonsense metadata, and the potential risks of tidying too neatly and thereby losing the very openness that has made folksonomies so popular." Commentary by Marieke Guy and Emma Tonkin, UKOLN. [D-Lib Magazine]

(You gotta hand it to the old-school Digital Library people.)

Wednesday, January 18, 2006

Why Tag?

One of the questions that keeps coming up in discussions about tagging is whether private tagging is useful and if so, how?  Is public tagging really the important application to keep in mind and if so, why?  By private tagging, I mean someone applying tags but not sharing them with anyone -- so they're useful for personal organization but not for sharing with others.

Empirical evidence suggests that tagging is most useful when public and shared.  But why, exactly?  Caterina Fake, in a panel at Syndicate, noted that people on Flickr get to "ride free" on top of compulsive categorizers.  I think this is certainly part of it, and maybe tagging is good occupational therapy too, but I have a gut feel there's more to the story.

My fifteen-month-old son is an inveterate tagger.  His tag cloud looks something like this at the moment (somewhat elided):
airplane água ana bird book bulldozer bus bye choo-choo-train dada dog down mama phone tractor truck up wow
...which I know because he tags things repeatedly and excitedly, especially when someone else is around.  And I think this is the key point -- this is a natural behavior, and a social one.  (He'll talk to himself, but it's really second best -- he wants to share his view of the world with other people!)  And of course it's accompanied by pointing -- the original hyperlink.

That's as far as I've gotten.  Fortunately, Rashmi Sinha, in A social analysis of tagging, does a great job of analyzing exactly how tagging facilitates social interactions.  Go read it, and read her earlier cognitive analysis of tagging as well.  Both are great forays into the "whys" of public tagging.

I think this all suggests that private tagging might be useful in the same way that talking to yourself might be useful (yes, sometimes, but not a primary use case).  More interesting is social-but-private tagging, where you share with a limited number of people.  This is more difficult to do well than either totally private or totally public tagging.  Is it valuable?  How?  When?