This was part of Comuri.
The nature of the resources must be taken into consideration. This section partially illustrates some of the resources.
URI creation has to take into account all identification scenarios: original site, archival sites, and offline data; they might exist in parallel. One has to consider some relations to the data, such as data organisation and long-term data preservation [[DOCP]] [[RFC4810]], though data itself is out of scope.
Scheme processing capability is the property of a scheme to process data before delivery. The processing capability could be in the server, the client, or both. For example, a browser might or might not have a JavaScript engine.
Examples:
http
: server and browser side
file
: browser side
Official data server is a web server maintained by an organisation, such as a government, for the publication of authoritative datasets.
Often the second or the third level domain is called data, such as data.gov or data.gov.uk.
http://data.example.com
It is fuzzy what should go into an official data server and what should go into a separate domain server, though clearly not everything can go into official data servers.
For example, the significant dataset (or group, collection, concept, type, etc.) foo could be in the official data server or in a separate domain server.
http://data.example.com/foo    # in official data server - longer URI
http://foo.example.com         # separate domain server - shorter URI
The centralised governance rules must exist only for interoperability. Delegate as much as possible and let each party follow the Comuri philosophy, similar to subsidiarity or Auftragstaktik.
The DNS administrator assigns the appropriate level domain, and the web site and other administrators manage paths: it is up to the administrators at the proper level to follow reasonable mnemonics and rules that comply with the Comuri philosophy. For example, a web site could delegate every path starting with 10 to a department, and the departmental administrator must follow the Comuri philosophy.
First segment administrator is the entity with the mandate to allocate first segments of the path. The same applies to the second and subsequent segments. The role is similar to that of a DNS administrator.
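The path delegation described above can be sketched as a prefix lookup. This is only an illustration: the `DELEGATIONS` table, the administrator labels and the function `administrator_for` are hypothetical names and are not defined by Comuri.

```python
from urllib.parse import urlsplit

# Hypothetical delegation table: path prefix -> responsible administrator.
# The "/10" entry mirrors the example of delegating every path starting with 10.
DELEGATIONS = {
    "/10": "department administrator",
}
DEFAULT_ADMINISTRATOR = "web site administrator"


def administrator_for(uri: str) -> str:
    """Return the administrator responsible for the path of `uri`.

    The longest matching prefix wins, so a deeper delegation overrides
    a shallower one.
    """
    path = urlsplit(uri).path
    matches = [prefix for prefix in DELEGATIONS if path.startswith(prefix)]
    if not matches:
        return DEFAULT_ADMINISTRATOR
    return DELEGATIONS[max(matches, key=len)]
```

With this table, `administrator_for("http://example.com/1042")` falls under the department, while any other path stays with the web site administrator.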
URIs are forever: when a resource moves, the old URI should be redirected to the new URI. Redirection services are out of scope.
The appropriate status codes, such as 301 Moved Permanently, should be considered.
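As a minimal sketch of a permanent redirect, the function below returns the status and header fields a server might send. The `MOVED` table, its single entry and the function `respond` are assumptions made for illustration; a real redirection service is out of scope here.

```python
from http import HTTPStatus

# Hypothetical table of moved resources: old path -> new URI.
MOVED = {
    "/foo": "http://example.com/bar",
}


def respond(path: str):
    """Return (status, headers) for `path`, redirecting moved resources."""
    if path in MOVED:
        # 301 tells clients that the new URI should be used from now on.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}
    return HTTPStatus.OK, {}
```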
The following data archiving techniques are considered:
http://example.com                    # original site
http://example.org/example.com        # online archival
file:///example.com                   # offline archival
http://example.org/example.com.zip    # online pack
file:///example.com.zip               # offline pack
http://example.com/foo                # original site
http://example.org/example.com/foo    # online archival
file:///example.com/foo               # offline archival
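The mapping from an original URI to its archival forms can be sketched as follows. The function name `archival_uris` and the default archival host `example.org` are assumptions taken from the examples above, not a prescribed layout.

```python
from urllib.parse import urlsplit


def archival_uris(original: str, archive_host: str = "example.org") -> dict:
    """Derive the archival forms of an original http URI.

    `archive_host` is an assumed archival site; the layout mirrors the
    examples listed above.
    """
    parts = urlsplit(original)
    site = parts.netloc  # e.g. "example.com"
    path = parts.path    # e.g. "/foo", or "" for the site itself
    return {
        "original": original,
        "online archival": f"http://{archive_host}/{site}{path}",
        "offline archival": f"file:///{site}{path}",
        "online pack": f"http://{archive_host}/{site}.zip",
        "offline pack": f"file:///{site}.zip",
    }
```

The pack forms are derived from the site name only, since the examples pack the whole site rather than a single resource.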
The appropriate status codes, such as 301 Moved Permanently, should be considered.
Data packing techniques that work with both schemes (file and http) should be considered [[XDOSSIER]].
One must be able to directly identify both a whole database (e.g., 1 TB) and a single record (e.g., 1 kB).
Online data (http) and offline data (file) refer to the schemes used to access the data.
Online data can be processed by the server and the client; offline data only by the client.
One requires techniques that work for both.
The simplest and safest is to use static data.
There are two approaches:
The simplest is to have the same data structure for both.
The negative side is that one has to dumb down to the file scheme; for example, for http://example.com servers would usually send the file index.html, whereas the file scheme would do a directory listing, hence anchors have to be in the form of http://example.com/index.html.
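A minimal sketch of making anchors explicit so that they resolve the same way under both schemes. The function name `explicit_anchor` and the default document name `index.html` are assumptions; a path that is empty or ends with a slash is treated as a directory, which is a simplification.

```python
from urllib.parse import urlsplit, urlunsplit


def explicit_anchor(uri: str, index_name: str = "index.html") -> str:
    """Make the default document explicit when the path names a directory.

    A path that is empty or ends with "/" is treated as a directory; a
    server may also treat other paths as directories, which this sketch
    does not detect.
    """
    parts = urlsplit(uri)
    path = parts.path
    if path == "" or path.endswith("/"):
        path = path.rstrip("/") + "/" + index_name
    return urlunsplit(parts._replace(path=path))
```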
Online data is data accessed with the http scheme.
Online data can also use HTTP mechanisms to request variants, for example TCN, header fields and server configuration.
If other techniques are used, the URI should take priority. For example, if the appropriate header field requests the German variant of the resource and the URI requests the Spanish variant, the server should send the Spanish variant.
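The priority rule can be sketched as follows, assuming the language variant is requested in the URI by a dot extension such as `.es` and in the header by `Accept-Language`. The set `KNOWN_LANGUAGES`, the function names and the default language are hypothetical, and the header parsing deliberately ignores q-values.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed set of language tags that this sketch recognises as extensions.
KNOWN_LANGUAGES = {"de", "en", "es"}


def language_from_uri(uri: str) -> Optional[str]:
    """Return the language requested by a dot extension, if any."""
    last_segment = urlsplit(uri).path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return None
    extension = last_segment.rsplit(".", 1)[-1]
    return extension if extension in KNOWN_LANGUAGES else None


def select_language(uri: str, accept_language: Optional[str], default: str = "en") -> str:
    """Choose the variant language; the URI takes priority over the header."""
    from_uri = language_from_uri(uri)
    if from_uri is not None:
        return from_uri
    if accept_language:
        # Simplification: take the first listed tag and ignore q-values.
        first = accept_language.split(",")[0].split(";")[0].strip().lower()
        primary = first.split("-")[0]
        if primary in KNOWN_LANGUAGES:
            return primary
    return default
```

A request for `http://example.com/london.es` carrying `Accept-Language: de` selects the Spanish variant, because the header is consulted only when the URI does not name a language.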
Offline data is data accessed with the file scheme.
It is much easier to use static data to have the same data structure in the network and offline. It is also safer for data preservation; for example, it might not be possible to run some programs in the future if the appropriate environment is not available. There are also negative aspects; for example, the generated static data might be big and some functionalities might be lost.
Static data is data delivered without modification.
file:///foo     # latest
file:///foo3    # version 3
Dynamic data is data resulting from the output of a program. It could be server side, client side or both. For example, from a CGI, JavaScript or a combination of both.
http://example.com/foo
http://example.com/foo3
The particular case of multilingual data must be resolved in the wide context of web data: it must be treated as a variant in the same fashion as the format [[!MEDIA-TYPES]].
A language neutral URI is a URI with little or no natural language content. Language neutral URIs are important to multilingual web data.
The language neutral aspect mostly concerns the path component. The ultimate language neutral URI uses numbers; the negative side is poor mnemonics.
http://example.com/1234 # language neutral and opaque
http://example.com/london # URI using the English word london
http://example.com/london.es # Spanish variant
http://example.com/londres # BAD Spanish variant
Language variants can be identified in URIs as:
http://example.com/foo.en    # dot extension
http://en.example.com/foo    # domain - à la Wikipedia
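Both forms can be derived mechanically from the language neutral URI. The sketch below is illustrative only; the function names `dot_extension_variant` and `domain_variant` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def dot_extension_variant(uri: str, language: str) -> str:
    """Append the language as a dot extension to the path."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(path=f"{parts.path}.{language}"))


def domain_variant(uri: str, language: str) -> str:
    """Prefix the host with the language as an extra domain level."""
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(netloc=f"{language}.{parts.netloc}"))
```

The dot extension keeps all variants under one host, whereas the domain form requires the DNS administrator to assign a domain per language.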
A mnemonic is a technique to assist memory.
Mnemonics depend heavily on many aspects: language, culture, profession, age, group, etc.
Something might be very mnemonic for one person and totally unmnemonic for another.
There are some general rules: short strings are usually more mnemonic; for example, most people should be able to remember 3.
One can tailor mnemonics for a certain group at the cost of penalising larger audiences.
Targeting larger audiences usually means lowering the mnemonics for everybody.
It is recommended to use standard abbreviations;
this might be challenging in a multilingual context.
URI guessing is along the same lines.
For a multilingual audience, mnemonics should be as language neutral as possible.
Numbers are arguably the most language neutral characters: 3 should be acceptable to Arabic speakers, though some people might insist on ٣.
In countries with a less widely known character repertoire, there is a tendency to use numbers for things such as Wi-Fi passwords; for example, in Armenia.
Numbers can backfire in other ways; for example, http://example.com/mon (for Monday) could be transformed into http://example.com/1 or http://example.com/2, depending on whether the week is counted from Monday or from Sunday.
Base 36 (the digits 0-9 and the letters a-z) should be a good compromise for a large number of people around the world.
Using base 36 words from widely spoken languages should be acceptable, particularly if the words are spelled the same across languages; for example, exit.
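A minimal sketch of converting between integers and base 36 tokens, assuming lowercase letters. The names `ALPHABET`, `to_base36` and `from_base36` are illustrative; decoding relies on Python's built-in `int(token, 36)`.

```python
import string

# The 36 symbols in order of value: "0"-"9" followed by "a"-"z".
ALPHABET = string.digits + string.ascii_lowercase


def to_base36(number: int) -> str:
    """Encode a non-negative integer as a lowercase base 36 token."""
    if number < 0:
        raise ValueError("only non-negative integers are supported")
    if number == 0:
        return "0"
    digits = []
    while number:
        number, remainder = divmod(number, 36)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def from_base36(token: str) -> int:
    """Decode a base 36 token; int() accepts upper and lower case letters."""
    return int(token, 36)
```

A word such as exit is itself a valid base 36 token, so `to_base36(from_base36("exit"))` returns the word unchanged.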
One can always use Internationalized Resource Identifiers (IRIs) [[!RFC3987]] with a richer character repertoire, at the price of poorer mnemonics for people who do not know the script.
http://example.com/中国     # For Chinese speaker: good mnemonic - non-Chinese speakers: bad mnemonic
http://example.com/china    # For Chinese speaker: acceptable - non-Chinese speakers: better
Mnemonic tokens should be composed of short pronounceable syllables combined into short words that are pronounceable in most languages; having many consonants together tends to be a hindrance. Numbers with letters acting as separators work.
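The consonant rule can be approximated with a rough heuristic. Everything in this sketch is an assumption made for illustration: the ASCII vowel set, the thresholds `max_run` and `max_length`, and the function names. It treats y as a consonant and does not model real pronunciation in any language.

```python
# Assumed vowel set for ASCII tokens; "y" is counted as a consonant here.
VOWELS = set("aeiou")


def longest_consonant_run(token: str) -> int:
    """Return the length of the longest run of consecutive consonants."""
    longest = current = 0
    for character in token.lower():
        if character.isalpha() and character not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            # Vowels and digits both break a run of consonants.
            current = 0
    return longest


def looks_pronounceable(token: str, max_run: int = 2, max_length: int = 8) -> bool:
    """Heuristic check: short token without long consonant clusters."""
    return len(token) <= max_length and longest_consonant_run(token) <= max_run
```

Digits reset the run in this sketch, so a token that mixes numbers and letters is judged only on its letter groups.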