Tuesday, August 7, 2007

RESTful partial updates: PATCH+Ranges

Over the past couple of months, there's been a lot of discussion about the problem of partial updates in REST-over-HTTP [1][2][3][4][5].  The problem is harder than it appears at first glance.  The canonical scenario is that you've just retrieved a complicated resource, like an address book entry, and you decide you want to update just one small part, like a phone number.  The canonical way to do this is to update your representation of the resource and then PUT the whole thing back, including all of the parts you didn't change.  If you want to avoid the lost update problem, you send back the ETag you got from the GET with your PUT inside an If-Match: header, so that you know that you're not overwriting somebody else's change.
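To make the If-Match dance concrete, here is a minimal in-memory sketch. The `Resource` class, its hash-based ETag scheme, and the sample address book data are all made up for illustration; a real server would do this inside its HTTP handlers.

```python
import hashlib


class Resource:
    """Toy in-memory resource that enforces If-Match on PUT."""

    def __init__(self, body: str):
        self.body = body

    @property
    def etag(self) -> str:
        # Hypothetical ETag scheme: a quoted hash of the current body.
        return '"' + hashlib.sha1(self.body.encode()).hexdigest() + '"'

    def get(self):
        return 200, {"ETag": self.etag}, self.body

    def put(self, body: str, if_match: str):
        # 412 Precondition Failed when the client's copy is stale.
        if if_match != self.etag:
            return 412, {}, ""
        self.body = body
        return 200, {"ETag": self.etag}, ""


entry = Resource("name=Joe;work_phone=555-0100")
_, headers_a, body_a = entry.get()  # client A reads
_, headers_b, body_b = entry.get()  # client B reads the same version

# A changes the phone number and PUTs the whole entry back.
status_a, _, _ = entry.put(body_a.replace("555-0100", "555-0199"), headers_a["ETag"])
# B's PUT carries the old ETag, so it is rejected instead of clobbering A's change.
status_b, _, _ = entry.put(body_b + ";nick=JJ", headers_b["ETag"])
print(status_a, status_b)  # 200 412
```

Note that both clients had to send the entire entry even though each wanted to touch a single field.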

This works, but it doesn't scale well to large resources or high update rates, where "large" and "high" are relative to your budget for bandwidth and tolerance for latency.  It also means that you can't simply and safely say "change field X, overwriting whatever is there, but leave everything else as-is".

I've seen the same thought process recapitulated a few times now on how to solve this problem in a RESTful way.  The first thing that springs to mind is to ask if PUT can be used to send just the part you want to change.  This can be made to work but has some major problems that make it a poor general choice. 
  • A PUT to a resource generally means "replace", not "update", so it's semantically surprising.
  • In theory it could break write-through caches.  (This is probably equivalent to endangering unicorns.)
  • It doesn't work for deleting optional fields or updating flexible lists such as Atom categories.
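The last point is easy to see in a small sketch. Assume a server that treats a partial PUT as "overlay the fields I sent onto what is stored"; `merge_put` and the sample fields are hypothetical, not taken from any real server.

```python
def merge_put(stored: dict, partial: dict) -> dict:
    """Hypothetical 'partial PUT': overlay the fields the client sent."""
    merged = dict(stored)
    merged.update(partial)
    return merged


stored = {"name": "Joe", "work_phone": "555-0100", "fax": "555-0101"}

# The client wants to change the phone number AND delete the optional fax
# field, so it sends a representation without "fax".
partial = {"work_phone": "555-0199"}

result = merge_put(stored, partial)
print(result)  # the fax field is still there
```

Under this reading, "field omitted" and "leave this field alone" look identical on the wire, so there is no way to express the deletion.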
The next idea is generally to simply use POST to update the resource.  This does work in many cases, but conflicts with the use of POST to add a resource to a collection.  That is, if you POST to a collection, are you trying to add an element to the collection, or perform some other update to the collection's metadata?  It's possible to disambiguate using MIME types but it feels fragile.  It also doesn't capture the fact that the operation is retryable; POST in general is not retryable.

A good solution to the partial update problem would be efficient, address the canonical scenario above, be applicable to a wide range of cases, not conflict with HTTP, extend basic HTTP as little as possible, deal with optimistic concurrency control, and deal with the lost update problem.  The method should be discoverable (clients should be able to tell if a server supports the method before trying it). It would also be nice if the solution would let us treat data symmetrically, both getting and putting sub-parts of resources as needed and using the same syntax.

There are three contenders for a general solution pattern:

Expose Parts as Resources.  Give a resource's sub-elements their own URIs, so that a client can PUT to a sub-resource directly.  This is in spirit what Web3S does.  However, it pushes the complexity elsewhere: into discovering the URIs of sub-elements, and into how ETags work across two resources that are internally related.  Web3S appears to handle only hierarchical sub-resources, not slicing or arbitrary selections.

Accept Ranges on PUTs.  Ranged PUT leverages and extends the existing HTTP Content-Range: header to allow a client to specify a sub-part of a resource, not necessarily just byte ranges but even things like XPath expressions.  Ranges are well understood in the case of GET but were rejected as problematic for PUT a while back by the HTTP working group.  The biggest concern was that it adds a problematic must-understand requirement.  If a server or intermediary accepts a PUT but doesn't understand that it's just for a sub-range of the target resource, it could destroy data.   But, this does allow for symmetry in reading and writing.  As an aside, the HTTP spec appears to contradict itself about whether range headers are extensible or are restricted to just byte ranges.  This method works fine with ETags; additional methods for discovery need to be specified but could be done easily.

Use PATCH.  PATCH is a method that's been talked about for a while but is the subject of some controversy.  James Snell has revived Lisa Dusseault's draft PATCH RFC[6] and updated it, and he's looking for comments on the new version.  I think this is a pretty good approach with a few caveats.  The PATCH method may not be supported by intermediaries, but if it fails it does fail safely.  It requires a new verb, which is slightly painful.  It allows for a variety of patching methods via MIME types.  It's unfortunately asymmetric in that it does not address the retrieval of sub-resources.  It works fine with ETags.  It's discoverable via HTTP headers (OPTIONS and Allow: PATCH).

The biggest issue with PATCH is the new verb.  It's possible that intermediaries may fail to support it, or actively block it.  This is not too bad, since PATCH is just an optimization -- if you can't use it, you can fall back to PUT.  Or use https, which effectively tunnels through most intermediaries.
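A client-side sketch of that discover-then-fall-back logic might look like the following. It assumes the client has already sent an OPTIONS request and has the Allow header value in hand; `supports_patch` and `choose_update` are illustrative names, not part of any library.

```python
def supports_patch(allow_header: str) -> bool:
    """Return True if an Allow header value lists PATCH."""
    methods = {m.strip().upper() for m in allow_header.split(",") if m.strip()}
    return "PATCH" in methods


def choose_update(allow_header: str, full_body: str, delta_body: str):
    """Pick PATCH with a delta when advertised, else PUT the whole thing."""
    if supports_patch(allow_header):
        return "PATCH", delta_body
    return "PUT", full_body


print(choose_update("GET, HEAD, PUT, PATCH", "<whole entry>", "<delta>"))
print(choose_update("GET, HEAD, PUT", "<whole entry>", "<delta>"))
```

This only covers servers that advertise PATCH honestly. An intermediary that blocks the verb would show up as a failed request, at which point the same PUT fallback applies.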

On balance, I like PATCH.  The controversy over the alternatives seems to justify the new verb.  It solves the problem and I'd be happy with it.  I would like there to be a couple of default delta formats defined with the RFC.

The only thing missing is symmetrical retrieval/update.  But, there's an interesting coda:  PATCH is defined so that Content-Range is must-understand on PATCH[6]:
The server MUST NOT ignore any Content-* (e.g.  Content-Range) 
headers that it does not understand or implement and MUST return
a 501 (Not Implemented) response in such cases.
So let's say a server wanted to be symmetric; it could advertise support for XPath-based ranges on both GET and PATCH. A client would use PATCH with a range to send back exactly the same data structure it retrieved earlier with GET.  An example:
GET /abook.xml
Range: xpath=/contacts/contact[name="Joe"]/work_phone
which retrieves the XML:

Updating the phone number is very symmetrical with PATCH+Ranges:
PATCH /abook.xml
Content-Range: xpath=/contacts/contact[name="Joe"]/work_phone
The nice thing about this is that no new MIME types need to be invented; the Content-Range header alerts the server that the stuff you're sending is just a fragment; intermediaries will either understand this or fail cleanly; and the retrievals and updates are symmetrical. 
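As a rough sketch of what a server-side handler for this might do, here is an in-memory version using Python's ElementTree. Everything here is an assumption for illustration: the sample document and phone numbers, the 206/204/416/400 status choices, the set of "understood" headers, and the simplification that the XPath must start at the document's root element. Only the 501 for Content-* headers the server doesn't understand comes from the draft quoted above.

```python
import xml.etree.ElementTree as ET

# Made-up address book; element names follow the XPath used in the post.
ABOOK = ET.fromstring(
    "<contacts>"
    "<contact><name>Joe</name><work_phone>555-0100</work_phone></contact>"
    "<contact><name>Ann</name><work_phone>555-0200</work_phone></contact>"
    "</contacts>"
)

# Content-* headers this toy server claims to understand (an assumption).
UNDERSTOOD = {"content-range", "content-type", "content-length"}


def select(root, range_value):
    """Resolve 'xpath=/root/...' against root; None if unsupported or missing."""
    unit, _, path = range_value.partition("=")
    prefix = "/" + root.tag + "/"
    if unit.strip() != "xpath" or not path.startswith(prefix):
        return None
    # ElementTree paths are relative to the element, so drop the root step.
    return root.find(path[len(prefix):])


def get(root, headers):
    node = select(root, headers["Range"])
    if node is None:
        return 416, ""
    return 206, ET.tostring(node, encoding="unicode")


def patch(root, headers, body):
    # Per the draft: unknown Content-* headers must not be ignored.
    for name in headers:
        lowered = name.lower()
        if lowered.startswith("content-") and lowered not in UNDERSTOOD:
            return 501
    node = select(root, headers["Content-Range"])
    if node is None:
        return 416
    replacement = ET.fromstring(body)
    if replacement.tag != node.tag:
        return 400
    # Overwrite the selected element in place, keeping its position.
    tail = node.tail
    node.clear()
    node.tail = tail
    node.text = replacement.text
    node.attrib.update(replacement.attrib)
    node.extend(list(replacement))
    return 204


joe = 'xpath=/contacts/contact[name="Joe"]/work_phone'
status, fragment = get(ABOOK, {"Range": joe})
patch_status = patch(ABOOK, {"Content-Range": joe},
                     fragment.replace("555-0100", "555-0199"))
_, updated = get(ABOOK, {"Range": joe})
print(status, patch_status, updated)
```

The client sends back exactly the shape it got from the ranged GET, and the rest of the document is never on the wire.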

[1] http://www.snellspace.com/wp/?p=683
[2] http://www.25hoursaday.com/weblog/2007/06/09/WhyGDataAPPFailsAsAGeneralPurposeEditingProtocolForTheWeb.aspx
[3] http://www.dehora.net/journal/2007/06/app_on_the_web_has_failed_miserably_utterly_and_completely.html
[4] http://tech.groups.yahoo.com/group/rest-discuss/message/8412
[5] http://tech.groups.yahoo.com/group/rest-discuss/message/9118
[6] http://www.ietf.org/internet-drafts/draft-dusseault-http-patch-08.txt
