The WG has not approved URI as a separate draft.

This is a First Public Working Draft and shows the working group's current thinking and direction. Comments and new use cases are particularly welcome via public-dwbp-comments@w3.org (subscribe, archives).

Many outstanding issues associated with the use cases presented here are being addressed. Where those issues relate to a specific use case or requirement, they are highlighted in the body of the document below.

Conventions of this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

The objective of this Best Practice for Web Data URI is to address plain data resources as opposed to web sites with rich aesthetics and dynamic data. The proposed techniques must work for both networked and local web data, and for both dynamic and static web data.

Web data should be as static as possible: long-term preservation of dynamic data is harder, as one has to keep both the unmodified data and the programs that generate the modified data.

One URI can have multiple variants of the same information. For example, http://example.com/rome can have the Treaty of Rome in 24 languages and each language in HTML, PDF and plain text; in addition to metadata and other useful information (Linked Data).

Introduction

Scope

The objective of Web Data URI (DURI) is to address URI for plain data resources as opposed to URI for web sites with rich aesthetics and dynamic data.

Being a best practice, it cannot break existing standards and it should follow current practices. Additional conventions might be needed, such as a per-resource variant XHTML list (foo.var.html).

Aspects such as data packing and preservation are considered from the URI point of view: how to pack the data and how to preserve the data itself are out of scope.

Design goals and considerations

Examples

Get a resource

 http://example.com/foo
 

Direct variant addressing using file extensions

 http://example.com/1122.fr         # French variant
 http://example.com/1122.xhtml      # XHTML variant   
 http://example.com/1122.es.pdf     # Spanish PDF variant 
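
The extension convention above can be read mechanically. The following is a minimal Python sketch, not a normative algorithm: the function name split_variant and the two lookup tables are assumptions made for illustration, and a real deployment would rely on fuller language and media type registries.

```python
from urllib.parse import urlsplit

# Hypothetical lookup tables (assumptions, not part of this document).
LANGUAGES = {"en", "es", "fr", "de"}
FORMATS = {"html", "xhtml", "pdf", "txt"}


def split_variant(uri):
    """Return (base, language, format) for a URI such as .../1122.es.pdf."""
    path = urlsplit(uri).path
    name = path.rsplit("/", 1)[-1]
    base, *suffixes = name.split(".")
    language = None
    fmt = None
    for suffix in suffixes:
        if suffix in LANGUAGES:
            language = suffix
        elif suffix in FORMATS:
            fmt = suffix
        # Unknown suffixes are ignored in this sketch.
    return base, language, fmt


print(split_variant("http://example.com/1122.es.pdf"))  # ('1122', 'es', 'pdf')
print(split_variant("http://example.com/1122.fr"))      # ('1122', 'fr', None)
```

A URI without any suffix, such as http://example.com/foo, yields no language and no format, which leaves the choice of variant to the server.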
 

Get the variant XHTML list

http://example.com/foo.var.html

Terminology

The original terminology is copied into this document as these concepts are needed to comprehend the subject at hand. The alternatives of just making references or writing the original texts are less efficient.

Explicit definitions (from terminology sections or similar) in the original documents are copied verbatim. When not available, definitions are written transcribing the meaning in the original documents.

New terminology might be in the section below or as definitions in the appropriate section.

New terminology

DURI - duri
Abbreviation of Web Data URI. A noun that follows the appropriate morphology of the language. For example, Durip, durip, durips.
Variant XHTML list
A human and machine readable XHTML resource with all the available variants.
Language neutral URI
A URI with little or no natural language content.

Terminology from URI

Uniform Resource Identifier (URI)
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.
Resource
This specification does not limit the scope of what might be a resource; rather, the term "resource" is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., "today's weather report for Los Angeles"), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; e.g., human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).

Terminology from Linked Data

Useful information
Metadata and other auxiliary data.

Terminology from HTTP

resource
A network data object or service that can be identified by a URI. Resources may be available in multiple representations (e.g. multiple languages, data formats, size, resolutions) or vary in other ways.
variant
A resource may have one, or more than one, representation(s) associated with it at any given instant. Each of these representations is termed a variant. Use of the term variant does not necessarily imply that the resource is subject to content negotiation.

Terminology from TCN

variant list
A list containing variant descriptions, which can be bound to a transparently negotiable resource.
variant description
A machine-readable description of a variant resource, usually found in a variant list. A variant description contains the variant resource URI and various attributes which describe properties of the variant.
variant resource
A resource from which a variant of a negotiable resource can be retrieved with a normal HTTP/1.x GET request, i.e. a GET request which does not use transparent content negotiation.
list response
A list response returns the variant list of the negotiable resource, but no variant data. It can be generated when the server does not want to, or is not allowed to, return a particular best variant for the request.

Web data

Web friendly data: Data adapted to web technologies.
Synonym: Web data.

Dimensions of web friendly data:

Networked and local web data

Networked web data: Data accessed with the http(s) schemes.
Synonym: On the web data.

Local web data: Data accessed with the file scheme.
Synonym: Off the web data.

Web data techniques are required that work for both networked and local data. There are two approaches:

The simplest is to have the same data structure for both. The drawback is that one has to restrict oneself to what the file scheme supports; for example, for http://example.com a server would usually send the file index.html, whereas the file scheme would produce a directory listing. Hence anchors have to be in the explicit form http://example.com/index.html.
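
As an illustration of the explicit-anchor form, here is a small Python sketch that appends an index file name to URIs ending in a slash. The function name explicit_index and the default name index.html are assumptions; the sketch cannot tell whether a path without a trailing slash is a directory, so such URIs are left untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def explicit_index(uri, index_name="index.html"):
    """Make the index file explicit so the anchor also works with the file scheme."""
    parts = urlsplit(uri)
    path = parts.path
    if path == "" or path.endswith("/"):
        path = (path or "/") + index_name
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(explicit_index("http://example.com"))        # http://example.com/index.html
print(explicit_index("http://example.com/data/"))  # http://example.com/data/index.html
print(explicit_index("http://example.com/foo"))    # http://example.com/foo
```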

One also requires web data packing techniques. Further details in Xdossier.

Networked web data can also use the HTTP mechanisms to request variants. For example, TCN, header fields and server configuration.

If other techniques are used, the URI should take priority. For example, if the appropriate header field requests the German variant of the resource and the URI requests the Spanish variant, the server should send the Spanish variant.
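
The priority rule can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function name choose_language, its parameters and the fallback default are hypothetical, and the Accept-Language parsing is deliberately simplified (header order only, q-values ignored).

```python
def choose_language(uri_language, accept_language, available, default="en"):
    """Pick a language: the URI suffix wins over the Accept-Language header."""
    if uri_language in available:
        return uri_language
    # Simplified header parsing: order only, q-values are ignored.
    for item in accept_language.split(","):
        tag = item.split(";")[0].strip().lower()
        if tag in available:
            return tag
    return default


available = {"de", "en", "es"}
print(choose_language("es", "de, en;q=0.5", available))  # es
print(choose_language(None, "de, en;q=0.5", available))  # de
```

The first call mirrors the case in the text: the header asks for German, the URI asks for Spanish, and Spanish is served.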

Dynamic and static web data

Dynamic web data: Data resulting from the output of a program. The program might have input data that it modifies. For example, the CGI_to_upper_case program, which transforms its input into upper case.

Dynamic server web data: Dynamic web data modified by a server.

Dynamic client web data: Dynamic web data modified by a client; for example, with JavaScript.

DURI examples:

local web data
  file:///foo                    # latest
  file:///foo3                   # version 3

network web data
  http://example.com/foo
  http://example.com/foo3

Static data makes it simpler to have the same web data structure on the network and locally. It is also safer for data preservation; for example, it might not be possible to run some programs in the future if the appropriate environment is not available. The drawback is that the generated data might be large and some functionality might be lost.

Addressing techniques

Examples:

http://en.example.com/foo
http://example.com/foo.en

Variant XHTML list

A per-resource file with the reserved name foo.var.html and the functionality of the variant list. It is human readable and machine processable using the technique of XHTML with IDs.
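
A minimal Python sketch of how the reserved name could be derived from a resource URI. The function name variant_list_uri is an assumption, and the sketch assumes the input URI carries no variant suffix, query or fragment.

```python
from urllib.parse import urlsplit, urlunsplit


def variant_list_uri(resource_uri):
    """Derive the URI of the variant XHTML list for a resource."""
    parts = urlsplit(resource_uri)
    return urlunsplit((parts.scheme, parts.netloc, parts.path + ".var.html", "", ""))


print(variant_list_uri("http://example.com/foo"))  # http://example.com/foo.var.html
```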

Big and small data

One must be able to address both a whole database (e.g., 1 TB) and a single record (e.g., 1 kB).

{TODO: expand}

Multilingual data

The particular case of multilingual data must be resolved in the wide context of web data: it must be treated as a variant in the same fashion as the format (media type, see http://www.iana.org/assignments/media-types/media-types.xhtml).

Language neutral URI

It is a URI with little or no natural language content. Language neutral URI is important to multilingual web data.

The language neutral aspect mostly concerns the path component. The ultimate language neutral URI uses numbers; the drawback is that numbers are poor mnemonics.

http://example.com/1234        # example of numeric URI

Example of the problem with language dependent strings

http://example.com/london      # URI using the English word london
http://example.com/london.es   # Spanish variant
http://example.com/londres     # BAD Spanish variant

Human readable and machine processable

Slogan: Structure the data: the how is secondary.

Humans must be able to read the data and machines must be able to process it. It is recommended to have one format valid for both, such as XHTML with IDs. Other formats might be more specialized for humans or machines.

XHTML with IDs

In most browsers, the code below will be readable by humans, and with proper markup it is machine processable, though it is better if the server can supply the format desired by the requester. See also Microformats (http://microformats.org).

<html>
 <head>
  <title>example</title>
 </head>
 <body>
  <div>
   <span name="key">date</span>
   <span name="value">2011-08-24</span>
  </div>
  <div>
   <span name="key">creator</span>
   <span name="value">M.T. Carrasco Benitez</span>
  </div>
  <div>
   <span name="key">version</span>
   <span name="value">1</span>
  </div>
 </body>
</html>

Example of transformation to XML

<example>
 <date>2011-08-24</date>
 <creator>M.T. Carrasco Benitez</creator>
 <version>1</version>
</example>

Example of transformation to key-value

date=2011-08-24
creator=M.T. Carrasco Benitez
version=1

Example of transformation to JSON

{
  "date"    : "2011-08-24",
  "creator" : "M.T. Carrasco Benitez",
  "version" : 1
}
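
The transformations above can be automated. The following Python sketch, using only the standard library, extracts the key/value spans from a well-formed copy of the XHTML example and emits the key-value and JSON forms. The function names extract_pairs and to_key_value are assumptions for illustration, and every value is kept as a string, so the version comes out as "1" rather than the number 1.

```python
import json
import xml.etree.ElementTree as ET

# A well-formed copy of the XHTML example, embedded so the sketch is self-contained.
XHTML = """<html>
 <body>
  <div><span name="key">date</span><span name="value">2011-08-24</span></div>
  <div><span name="key">creator</span><span name="value">M.T. Carrasco Benitez</span></div>
  <div><span name="key">version</span><span name="value">1</span></div>
 </body>
</html>"""


def extract_pairs(xhtml_text):
    """Collect key/value pairs from div elements holding two marked spans."""
    root = ET.fromstring(xhtml_text)
    pairs = {}
    for div in root.iter("div"):
        key = div.find("span[@name='key']")
        value = div.find("span[@name='value']")
        if key is not None and value is not None:
            pairs[key.text] = value.text
    return pairs


def to_key_value(pairs):
    """Render the pairs as key=value lines."""
    return "\n".join(f"{k}={v}" for k, v in pairs.items())


pairs = extract_pairs(XHTML)
print(to_key_value(pairs))
print(json.dumps(pairs, indent=2))
```

Because the parser is a strict XML parser, this only works when the XHTML is well formed; a tag left unclosed makes ET.fromstring raise a ParseError.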

URI pattern for metadata information of dataset

A specific query string format may be used to address a resource holding metadata information about the dataset.

For instance, showing some meta information about the dataset in JSON format:

http://example.com/dataset?about
{
  "dataset": {
    "name"        : "Dataset name",
    "description" : "Data collected via scrapers",
    "author"      : "Data owner",
    "version"     : 1.3,
    "created"     : "2012-03-12",
    "modified"    : "2012-08-10",
    "structure"   : {
      "website"  : "http://website",
      "title"    : "Web Site",
      "accessed" : "2013-01-09 23h12"
    }
  }
}
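
A client could build the metadata URI and read the response along these lines. This is a Python sketch under assumptions: the function name about_uri is hypothetical, any existing query string is replaced, and the sample response is a shortened, strictly valid JSON rendering of the example rather than output from a real server.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def about_uri(dataset_uri):
    """Return the metadata URI for a dataset by setting the query string to 'about'."""
    parts = urlsplit(dataset_uri)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "about", ""))


# Hypothetical response body, shortened from the example.
SAMPLE_RESPONSE = '{"dataset": {"name": "Dataset name", "version": 1.3, "modified": "2012-08-10"}}'

print(about_uri("http://example.com/dataset"))  # http://example.com/dataset?about
metadata = json.loads(SAMPLE_RESPONSE)["dataset"]
print(metadata["name"], metadata["version"])    # Dataset name 1.3
```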

More on variants

Existing specifications

Feature

"Feature negotiation intends to provide for all areas of negotiation not covered by the type, charset, and language dimensions. Examples are negotiation on

References