Sunday, March 28, 2010

URI Fragments

I learned something interesting when reading the Architecture of the World Wide Web, Volume One. It turns out that URI fragments (the part of a URI after a '#' character) play no role in retrieving the resource; they are interpreted only afterwards, by the agent that did the retrieving:

Note that the HTML implementation in Emma's browser did not need to understand the syntax or semantics of the SVG fragment (nor does the SVG implementation have to understand HTML, WebCGM, RDF ... fragment syntax or semantics; it merely had to recognize the # delimiter from the URI syntax [URI] and remove the fragment when accessing the resource). This orthogonality (§5.1) is an important feature of Web architecture; it is what enabled Emma's browser to provide a useful service without requiring an upgrade.

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution are therefore dependent on the type of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced. If no such representation exists, then the semantics of the fragment are considered unknown and, effectively, unconstrained. Fragment identifier semantics are orthogonal to URI schemes and thus cannot be redefined by URI scheme specifications.

Interpretation of the fragment identifier is performed solely by the agent that dereferences a URI; the fragment identifier is not passed to other systems during the process of retrieval. This means that some intermediaries in Web architecture (such as proxies) have no interaction with fragment identifiers and that redirection (in HTTP [RFC2616], for example) does not account for fragments.

While I had an intuitive understanding of how browsers work with URI fragments in HTML documents (retrieve the document, find the fragment, display the document starting at the fragment), I hadn't considered the semantic split, nor the implications of URI fragments being applied to other kinds of representations.
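To make that split concrete, here is a minimal sketch in Python of what a user agent does before retrieval: it separates the fragment from the rest of the URI, requests only the latter, and keeps the fragment for itself. The function name and the example.com URI are made up for illustration; `urldefrag` is from the standard library.

```python
from urllib.parse import urldefrag


def split_for_retrieval(uri):
    """Return (uri_to_request, fragment_kept_by_the_client).

    Hypothetical helper: the first value is what would be dereferenced,
    the second never leaves the user agent.
    """
    base, fragment = urldefrag(uri)
    return base, fragment


# Illustrative URI only; nothing is actually fetched here.
request_uri, fragment = split_for_retrieval("http://example.com/drawing.svg#layer2")
print(request_uri)  # the part used for retrieval
print(fragment)     # the part interpreted locally, after retrieval
```

How the fragment is then interpreted depends entirely on the media type of whatever representation comes back.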

I find the handling of fragments interesting in a number of ways. For one thing, it means that as new content types become part of the web, the creators of those content types are free to define how URI fragments map onto that content type. So in HTML a URI fragment is generally a textual name that appears in the HTML. But in a 3D modeling format, it might take the form of [position,orientation,scale] to define the location from which the model is being viewed, the direction the camera is facing, and the scaling factor. That's nice because it allows URI fragments to be structured in the manner most appropriate for the kind of representation being retrieved.
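As a rough sketch of the 3D idea, suppose a modeling format defined its fragments as three ';'-separated fields: a comma-separated position, a comma-separated orientation, and a scale. That syntax is invented here purely for illustration, as are the `View` class, the function name, and the URI; no real format is being described.

```python
from dataclasses import dataclass
from urllib.parse import urldefrag


@dataclass
class View:
    position: tuple      # (x, y, z) of the camera
    orientation: tuple   # direction the camera is facing, as three angles
    scale: float         # scaling factor


def parse_view_fragment(uri):
    """Parse a made-up fragment syntax: 'x,y,z;a,b,c;scale'."""
    _, fragment = urldefrag(uri)
    position, orientation, scale = fragment.split(";")
    return View(
        position=tuple(float(v) for v in position.split(",")),
        orientation=tuple(float(v) for v in orientation.split(",")),
        scale=float(scale),
    )


# Hypothetical model URI; only the client-side viewer would run this parse.
view = parse_view_fragment("http://example.com/model#0,1.5,10;0,180,0;2.0")
print(view)
```

The server hosting the model would never see any of this; only the viewer that understands the model's media type needs to know the fragment syntax.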

One possibly surprising consequence of this split is that URI fragments are not considered in URI resolution activities such as interacting with proxies or redirection. It also means that you shouldn't try to use URIs with fragments as if they identified separately retrievable resources: caches and other intermediaries never see the fragment, so they cannot store individual fragments or attach any meaning to what follows the '#' character.
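One way to picture the caching consequence is a toy cache keyed on the URI with its fragment removed. This is only a sketch under that assumption, not how any particular proxy or browser is implemented; the class, the fake fetch function, and the URIs are all hypothetical.

```python
from urllib.parse import urldefrag


class FragmentBlindCache:
    """Toy cache that, like an intermediary, never sees the fragment."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable taking a fragment-free URI
        self._store = {}
        self.fetch_count = 0

    def get(self, uri):
        key, _fragment = urldefrag(uri)   # the fragment plays no part in the key
        if key not in self._store:
            self._store[key] = self._fetch(key)
            self.fetch_count += 1
        return self._store[key]


def fake_fetch(uri):
    # Stand-in for a real retrieval; returns a placeholder representation.
    return f"<representation of {uri}>"


cache = FragmentBlindCache(fake_fetch)
a = cache.get("http://example.com/guide.html#intro")
b = cache.get("http://example.com/guide.html#summary")
print(a == b, cache.fetch_count)
```

Both lookups resolve to the same stored representation after a single fetch, because the two URIs differ only in their fragments.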

5 comments:

Anonymous said...

yea - browsers simply remove the fragment part from the url when they form the http requests.

Unknown said...

You suggest that a fragment might be used for something other than a label or anchor, with the example of position, orientation info for a 3D view. Parameterized arguments are more commonly handled in a query string, as key-value pairs after a ? rather than a #.

Brian Gilstrap said...

Simon suggests that query parameters are a common way of handling 'parameterized arguments'.

It might seem at first glance that the difference between query parameters and a URI fragment is just a matter of syntax: a '?' versus a '#'. But semantically, there is a key distinction.

A URI with a set of query parameters identifies a conceptually different resource (the base URI plus all the distinctions specified by the query parameters). But a URI fragment after a '#' essentially represents a 'view' into a single resource.

In a RESTful system, that distinction is crucial.
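Here is a small Python sketch of that distinction, assuming an HTTP-style request in which the path and query are sent to the server and the fragment is not. The function name and URIs are made up for illustration.

```python
from urllib.parse import urlsplit


def request_target(uri):
    """Return the path-plus-query portion that would be sent to the server."""
    parts = urlsplit(uri)
    path = parts.path or "/"
    # The query is part of what identifies the resource, so it is sent;
    # parts.fragment is deliberately left out.
    return f"{path}?{parts.query}" if parts.query else path


print(request_target("http://example.com/model?id=42#0,0,10"))
print(request_target("http://example.com/model?id=43#0,0,10"))
print(request_target("http://example.com/model#0,0,10"))
```

Changing the query changes what is asked of the server; changing the fragment changes nothing in the request.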

Bret Stateham said...

I know I'm a little late to the party, but another key distinction between query parameters and fragments is that query parameters are processed by the resource server, whereas the fragment is processed by the user-agent (browser) after the resource is retrieved from the server.

Bret Stateham said...

I know I'm a little late to the party, but another key distinction between query parameters and fragments is that query parameters are processed by the resource server, whereas the fragment identifier is processed by the user-agent (browser).