Date: Sun, 12 Mar 2000 21:25:54 +0100

I was thinking about language handling and was provoked by TBL's text:

http://www.w3.org/DesignIssues/InterpretationProperties.html

I would like this to be considered for inclusion in future RDF and RDFS specs.

The content of literals or other resources
------------------------------------------

RDF models consist of triples {predicate, subject, object}. The object can be a Resource or a Literal. The literal can consist of any data: an image, sound, algorithm, text, etc. Each of these literal types is an object which can have internal metadata. An mp3 object has the title of the song embedded in its own format. The language of a text is considered to be a part of the text object.

Section 6 of the M&S states:

   The xml:lang attribute may be used as defined by [XML] to associate a language with the property value. There is no specific data model representation for xml:lang (i.e., it adds no triples to the data model); the language of a literal is considered by RDF to be a part of the literal.

The type of literals or other resources
---------------------------------------

The RDF Schema spec lets you describe the range of properties, like this:

That means that the object should be of that type, regardless of whether it is a literal or another type of resource.

Age as a literal:

98

Age as a resource:

I will later argue that http://www.datatypes.org/useful_types#Integer should be a subClassOf http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal.

Referring to properties in literals or other resources
------------------------------------------------------

In an application based on RDF, you want to represent everything about a specific object as triples. Let's say that you want to know the number of colours in a gif image. To get that information, the application has to know (1) that the data is in gif format and (2) how to get the information out of the gif data.
The gif data could be stored somewhere on the web, retrievable by a URL, or placed as a literal in the model. RDF has the rdf:type property for describing the type of an object. This property can be applied universally to describe the type of data in RDF. For gif images and related formats, the MIME type can be used.

Gif as a literal:

gif89followdbyalotof...binarydata

Gif as a resource:

Saving the extracted information from a literal or other resource
-----------------------------------------------------------------

Provided that the application both knows that the data is an image/gif and knows how to extract data from that gif, it can act as a proxy for other applications / methods / algorithms that want to consider / reason about / find / present data based on the properties of the gif. The image has to have a URI to be represented in the RDF model.

Metadata about gif, the literal:

gif89followdbyalotof...binarydata
32

Metadata about gif, the resource:

32

Handling of language of literals or other resources
---------------------------------------------------

The language of a string is taken to be a part of the literal. This can work for dedicated RDF applications. But a generalized RDF application wants to represent the language internally in the same manner as all other data. If the language of an object is represented as a property, that statement can be reified in the same way as all other statements, and the language can be used in RDF reasoning, querying, or anything else. The XML representation can still use xml:lang, but we would also like to represent that information in the model.

As with images, text is just another data format. The application parses the text and extracts the wanted information, in this case the language information.
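A minimal sketch of that idea in Python, under stated assumptions: the property URI `http://example.org/terms#language`, the generated `#literal-N` identifiers, and the sample document are all hypothetical, and using rdf:value to hold the text is one possible choice, not something the specs prescribe for this purpose. The syntax keeps xml:lang, but the parser also emits the language as an ordinary triple.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
EX_LANGUAGE = "http://example.org/terms#language"  # hypothetical property

# Hypothetical input: one description element with one property element.
DOC = """<d xmlns:s="http://example.org/schema#">
  <s:title xml:lang="en">Ora's Home Page</s:title>
</d>"""


def describe(xml_text, subject):
    """Return (predicate, subject, object) triples for each property element.

    Every inline literal gets its own generated identifier, and its
    language (if stated directly on the element) becomes a normal triple.
    Inheritance of xml:lang from parent elements is ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    triples = []
    for n, prop in enumerate(root, start=1):
        literal = f"{subject}#literal-{n}"  # generated ID (assumption)
        # prop.tag is in ElementTree's "{namespace}localname" form.
        triples.append((prop.tag, subject, literal))
        triples.append((RDF_VALUE, literal, prop.text or ""))
        lang = prop.get(XML_LANG)
        if lang is not None:
            triples.append((EX_LANGUAGE, literal, lang))
    return triples


for triple in describe(DOC, "http://example.org/doc"):
    print(triple)
```

Because the language is now a statement of its own, it can be queried or reified like any other triple.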
Language of literal: (The type declaration here is redundant)

Hej alla barn
se

Language of resource: (language info will come from the HTTP header)

se

Updating the literal/resource or updating the statement
-------------------------------------------------------

When you represent a literal as above, it gets its own resource URI. Every statement you make is about that particular resource. The resource does not represent every literal or object with the same content; it represents only the specific literal/resource at hand. Any RDF document could give an ID to the literal or resource it describes. That means that you could have multiple resources with the same content.

Let's say we have an application that lets you update a short description of your interests. This description could be a literal or a resource. What should the application do if you update that description? Should it

(a) create a new text object and change the description statement to point to the new object, or
(b) change the value of the text object?

The answer is up to the application. Some applications want to handle version history. Other applications want to make sure the value doesn't change. But many applications probably wouldn't care and would just change the text value, without having to create a whole new object.

Literals as resources
---------------------

I will try to explain why literals should be regarded as resources and how this will work with RDF M&S and RDF Schema.

Literals are data. Most of the time it is textual data. If it is a short piece of textual data, you would like to inline it in an XML representation. But if it is a large text, you will often prefer to refer to the text as an external object. Why should the model distinguish between data inlined in the XML syntax and data stored apart from the XML document? If all literals were modeled as resources having a value property, we wouldn't have to make any distinction between literals and other resources.
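As a rough illustration in Python, assuming a hypothetical `DataResource` class and a caller-supplied `fetch` function (neither comes from any RDF spec): the application asks for the value of a resource in one way, whether the data arrived inlined in the XML or is stored somewhere else.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataResource:
    """A resource that stands for a piece of data (hypothetical model)."""
    uri: str
    # Set when the data was inlined in the XML document; None otherwise.
    inline_value: Optional[bytes] = None


def value_of(resource: DataResource, fetch: Callable[[str], bytes]) -> bytes:
    """Return the data for a resource, wherever it is stored.

    `fetch` is whatever the application uses to retrieve external data
    (HTTP, a database, LDAP, ...); it is passed in so this sketch does
    not depend on any particular storage.
    """
    if resource.inline_value is not None:
        return resource.inline_value
    return fetch(resource.uri)


# Usage with an in-memory stand-in for "the web".
fake_web = {"http://example.org/long-text": b"a large external text"}
inline = DataResource("http://example.org/doc#literal-1", b"Hej alla barn")
external = DataResource("http://example.org/long-text")

print(value_of(inline, fake_web.__getitem__))
print(value_of(external, fake_web.__getitem__))
```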
The application could regard all literals as resources. This would mean that the application would be able to access the value of any resource. It wouldn't matter if the resource is an image, text or XML. It wouldn't matter if the value originally came inlined in the XML representation of an RDF model, or if it is lying at a URL on the web, sitting in a database, hiding inside an LDAP system, or anything else.

How does the application know which resources represent data?
------------------------------------------------------------

The application will have to decide how it will handle each of the known resources. When presenting a resource, should it only list the known attributes and relations to other resources, or should it also present the content of the resource? If the resource is an image, a representation in HTML could be . A text resource could be represented as

Hej
alla barn

How the application decides which resources to "resolve" and which resources represent "unfetchable" objects is its own problem. It could use all sorts of heuristics to make the decision. But I think that the obvious way would be to use the rdf:type property. If the type is http://the.standard.org/text/plain or http://the.standard.org/image/gif or http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal, the resource should be fetched. Most literals would of course already have been fetched, and would be waiting in the cache.

What about resources of unknown types?
--------------------------------------

This would be most useful if there were a standard way to know which types represent values and which types represent other objects. I suggest that this should be done by considering all data objects to be of type Literal. That means that Text/plain, Image/gif, etc. would all be subClassOf MimeType. And Integer, Float, Boolean and even MimeType would be subClassOf Literal. When an application comes across a resource of an unknown type, it can still tell whether it is a value by parsing the schema and checking whether the type is a subClassOf Literal.

Handling of classic inline literals
-----------------------------------

An RDF document parser will read the literals inlined in the document. It will return triples with the object marked as a literal or a resource. The parser could achieve the same thing by instead generating an extra triple stating that this resource is indeed a Literal.

The current RDF/XML syntax permits undefined URIs for subjects. These URIs must be found or generated by the parser. The same would go for all the inline literals: the parser would have to give each of them a URI.

Suggested additions to the RDF XML syntax
-----------------------------------------

I suggest that the syntax should make it possible to state all the URIs involved. This could be done by a literalID for the literals. Like this:

Ora Lassila
Ora's Home Page

That would complete the equivalence between literals and other resources.
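A small Python sketch of the two steps proposed above, under stated assumptions: the MimeType URI, the `#literal-N` naming, the example subject and property, and the use of rdf:value are hypothetical choices made for illustration. The parser gives each inline literal a URI and an explicit type triple, and the application decides whether an unknown type is a value by following subClassOf up to Literal.

```python
import itertools

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
LITERAL = "http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal"
MIME_TYPE = "http://the.standard.org/MimeType"  # hypothetical URI

# subClassOf statements taken from a schema: class -> set of superclasses.
SUBCLASS_OF = {
    "http://the.standard.org/text/plain": {MIME_TYPE},
    "http://the.standard.org/image/gif": {MIME_TYPE},
    MIME_TYPE: {LITERAL},
    "http://www.datatypes.org/useful_types#Integer": {LITERAL},
}


def is_value_type(type_uri, subclass_of=SUBCLASS_OF):
    """True if type_uri is Literal or a (transitive) subClassOf Literal."""
    seen, stack = set(), [type_uri]
    while stack:
        current = stack.pop()
        if current == LITERAL:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return False


def literal_triples(predicate, subject, text, counter):
    """Triples a parser could emit for one inline literal.

    Triples are (predicate, subject, object). The literal gets a
    generated URI and an extra triple stating that it is a Literal.
    """
    literal_uri = f"{subject}#literal-{next(counter)}"  # generated ID
    return [
        (predicate, subject, literal_uri),
        (RDF_TYPE, literal_uri, LITERAL),
        (RDF_VALUE, literal_uri, text),
    ]


counter = itertools.count(1)
for triple in literal_triples("http://example.org/schema#Creator",
                              "http://example.org/page", "Ora Lassila", counter):
    print(triple)
print(is_value_type("http://the.standard.org/image/gif"))
print(is_value_type("http://example.org/schema#Person"))
```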
On the nature of the reified statement
--------------------------------------

Considering the example in section 4.2 of the RDF M&S spec, I realize that every statement should indeed have a unique URI. There could exist several statements with identical {p,s,o} but with different URIs. They must be treated as individual statements, because:

(1) the attributes of one statement must not be mixed up with those of another.
(2) changing or deleting a statement from one source must not affect a statement from another source.

I have an example of the first point, described under "The nature of the statement" in this message:

http://lists.w3.org/Archives/Public/www-rdf-interest/2000Feb/0118.html
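To make the two points concrete, here is a minimal Python sketch, assuming a hypothetical `StatementStore` class and a made-up `urn:x-statement:` naming scheme: statements are keyed by their own URI, so two statements with the same {p,s,o} keep separate attributes and can be removed independently.

```python
import itertools


class StatementStore:
    """Keeps every statement under its own URI (illustrative only)."""

    def __init__(self, base="urn:x-statement:"):  # hypothetical scheme
        self._base = base
        self._ids = itertools.count(1)
        self.statements = {}  # statement URI -> (p, s, o)
        self.attributes = {}  # statement URI -> attributes of that statement

    def add(self, p, s, o, **attrs):
        """Store a statement and return its newly generated URI."""
        uri = f"{self._base}{next(self._ids)}"
        self.statements[uri] = (p, s, o)
        self.attributes[uri] = dict(attrs)
        return uri

    def remove(self, uri):
        """Delete one statement; identical triples elsewhere are untouched."""
        del self.statements[uri]
        del self.attributes[uri]


store = StatementStore()
triple = ("http://example.org/schema#Creator", "http://example.org/page", "Ora Lassila")
first = store.add(*triple, source="http://example.org/a.rdf")
second = store.add(*triple, source="http://example.org/b.rdf")

store.remove(first)
print(second in store.statements)        # the other source's statement remains
print(store.attributes[second]["source"])
```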