Subject: Arguments against digest URIs
Resent-Date: Sun, 2 Jan 2000 15:25:47 -0500 (EST)
Resent-From: www-rdf-interest@w3.org
Date: Sun, 02 Jan 2000 21:25:11 +0100
From: Jonas Liljegren
To: RDF Intrest Group
CC: Sergey Melnik

I haven't read the list since December 18th. Some holidays got in the way. But more than that: I didn't want to write this text. Sergey Melnik has done so much work...

It's about digest URIs.

A number of considerations have come up against the use of digest URIs. And not only digest URIs, but any kind of algorithm for common URIs. That includes the XPointer suggestion.

I was going to write a long summary of the issues and arguments for and against digest URIs. But I'm not up to it. So I will just list the things that come to mind.

The three things that need a calculated URI are:

 * model URI
 * statement URI
 * anonymous resource URI

I think that digest URIs are not a complete solution to the problems they try to address. A complete solution still has to incorporate more layers of metadata. It's better to not have digest URIs at all.

Higher threshold for implementation
-----------------------------------

There will hopefully be many implementations of RDF. Some will only be able to read a specific form of the XML serialization. Others will be more generic. There is a point in not requiring too much from an implementation. MD5 or SHA-1 is maybe not that hard to use (both are supported by Perl modules), but requiring them does limit the ways to implement RDF for a specific purpose. And you can't depend on digest URIs if not everyone is using them.

URI aliases
-----------

What about URI aliases? Two URIs could be used to denote the same thing. A person often has a different identifier in every membership register. There will have to be ways to express the relationships between resources, regardless of whether it's about the same sort of statement, the same model or the same thing.
It's not enough to have a common algorithm that gives unique identifiers to anonymous resources. You will still have to be able to say that two URIs are aliases for the same resource. So why not use this handling of aliases for the other cases where you want to say that one URI for, say, a model is an alias for another URI?

Value equivalence
-----------------

The digest URI for a triple is calculated on the actual string of bytes of the literal part. But the literal could be encoded in different formats: Unicode, Latin-1, or others.

The object of a statement could be considered the same even if it differs byte for byte. If the object is a person, it could be a literal with the name. But the name could be written in a couple of different forms. It could also be a URI for the person, or an anonymous URI for a resource that specifies the person by describing the first and last name separately, and maybe giving each of them a type arc.

You will have to be able to specify their equivalence. If you have a rule for digest URIs, you would have that way, and on top of that the more complete way to express equivalence. So why not skip the digest way and go for the complete solution? (The complete solution would be to introduce more statements, containing metadata about the resources.)

Not really unique
-----------------

A digest is not guaranteed to be unique. There is a theoretical chance that two different things will get the same URI. There would still have to be an extra layer for determining URI equivalence.

The nature of the statement
---------------------------

In a reification of a statement, every reification should be handled separately, as a separate event. Reifications have properties like source, time, probability and context of the statement. Even if the statement in itself had a unique URI, there would have to be separate URIs for every stating event.
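Two of the points above (a digest depends on the literal's exact bytes, and a digest cannot distinguish separate stating events) can be illustrated with a minimal Python sketch. It assumes a made-up digest scheme: SHA-1 over subject, predicate and literal bytes, under a hypothetical `urn:x-digest:sha1:` prefix. This is not the algorithm from Sergey Melnik's proposal, and the example.org URIs and the name are invented for illustration.

```python
import hashlib

PREFIX = "urn:x-digest:sha1:"  # hypothetical URI scheme, for illustration only


def digest_uri(subject: str, predicate: str, literal_bytes: bytes) -> str:
    """SHA-1 over the triple's bytes; NUL separators keep the fields apart."""
    data = b"\x00".join(
        [subject.encode("utf-8"), predicate.encode("utf-8"), literal_bytes]
    )
    return PREFIX + hashlib.sha1(data).hexdigest()


subject = "http://example.org/report"          # invented resource
predicate = "http://example.org/terms#author"  # invented property
name = "Björn Åkesson"                         # invented literal

# Same name, two encodings of the literal: the digests differ.
uri_utf8 = digest_uri(subject, predicate, name.encode("utf-8"))
uri_latin1 = digest_uri(subject, predicate, name.encode("latin-1"))
print(uri_utf8 == uri_latin1)  # False

# Same triple stated on two occasions: the digests are identical, so the
# digest alone cannot tell the two stating events apart.
stated_monday = digest_uri(subject, predicate, name.encode("utf-8"))
stated_friday = digest_uri(subject, predicate, name.encode("utf-8"))
print(stated_monday == stated_friday)  # True
```

A real scheme could fix one canonical encoding to remove the first difference, but that would only cover byte-level variation, not different written forms of the same name.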
So why not add a few more data about the statement, and use those data for handling equivalence between two statements? Equivalence could in general be determined by examining the subject, predicate and object properties, regardless of the URI representing the statement.

URIs can be unknown
-------------------

Many things could have official URIs. There would be cases where those are not used in the XML serialization. This would result in the parser or serializer generating an unofficial URI for that anonymous/unknown resource. That would lead to two URIs for the same thing.

There will often be temporary URIs. They could also be used in queries to denote an unknown entity that you would like to find a more proper URI for. An application with the ability to handle this will also be able to handle URIs generated from XML serializations with anonymous resources. Thus, there is no need for a special algorithm for generating the URIs for anonymous resources.

Version handling
----------------

Statements, resources, literals and models will come in different versions. Some versions will be chronological. Others will be variations of the content, like different languages or different target groups.

There are many ways to handle new versions. Many applications would like to keep a statement URI even if the object part of it changes. They would often like to keep the URI of a resource even if its content changes. They would like to keep the URI of the model even if new statements are added.

Some applications would like to handle a history of versions, of statements at different times. Others would only concern themselves with the present. The use of digest URIs for statements and models will force every application to deal with history, and to deal with it in a way that could be incompatible with what is needed.
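The effect on a model URI can be sketched in a few lines of Python. The digest scheme here is an assumption made up for the example (SHA-1 over the sorted triples, with a hypothetical `urn:x-digest:sha1:` prefix), not the algorithm actually proposed on the list, and the example.org URIs are invented.

```python
import hashlib

PREFIX = "urn:x-digest:sha1:"  # hypothetical URI scheme, for illustration only


def model_digest_uri(statements) -> str:
    """SHA-1 over the sorted (subject, predicate, object) triples."""
    h = hashlib.sha1()
    for triple in sorted(statements):
        # NUL between fields and newline between triples keep them apart.
        h.update("\x00".join(triple).encode("utf-8"))
        h.update(b"\n")
    return PREFIX + h.hexdigest()


# A model with a single invented statement.
model = {
    ("http://example.org/doc", "http://example.org/terms#title", "Draft"),
}
uri_before = model_digest_uri(model)

# Adding one statement yields a different model URI.
model.add(("http://example.org/doc", "http://example.org/terms#language", "sv"))
uri_after = model_digest_uri(model)

print(uri_before == uri_after)  # False
```

An application that wants one stable identifier across both versions would have to record the link between `uri_before` and `uri_after` itself, which is exactly the kind of history handling described above.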
I think that it would be better to let version handling be a separate layer, one that could be included or excluded, and that could evolve by itself to meet the needs.

Open / closed models
--------------------

How will you maintain metadata about a model with digest URIs? The metadata would have to be linked to the model. But every change in the model would modify the model URI. The metadata would point to a nonexistent resource.

It would be even harder to embed the metadata in the model itself. The metadata would depend on the model URI and the model URI would depend on the metadata.

Statements as models
--------------------

A model is a group of statements. You could reify a single statement, but you would maybe more often like to say something about a group of statements. This group could be given an explicit URI. That would be the same thing as giving an explicit URI to a model. The grouping of the statements could be done on one site and used on other sites.

The handling of those things belongs on a higher level. It's not something to be handled with digest URIs.

---

I have not summarized this in the way I intended from the start. The feeling of destroying the work of Sergey Melnik made me lose my spirit...

But... I suggest that we just skip this unique URI concern. The problems of aliases and version handling are a topic for another day, not something that should go into either the core API or the schema layer. The generated URIs should all be based on your own namespace, guaranteed to be unique.

(And now a hundred new emails await to be read... :)

--
/ Jonas  -  http://paranormal.o.se/perl/proj/rdf/schema_editor/