Vincent Massol wrote:
Hi,
In the new rendering code I need to call some code that transforms
[[[wiki:][Space.]][Doc]] into a link. I'm proposing to introduce 2 new
classes/components in Core:
* DocumentName: Represents a Document's Name. It'll have 3 properties:
- String wiki
- String space
- String page
See below.
* DocumentNameFactory: Create a DocumentName from a string
representing a Document's name. Transforms [[[wiki:][Space.]][Doc]]
into a DocumentName object.
See below.
* The DocumentNameFactory would depend on the Execution component so
that it can use the current wiki, current space and current document
if these are not specified.
+1
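To make the proposal concrete, here is a rough sketch of what the two classes could look like. It is only an illustration: the method name createDocumentName() is made up, the defaults are passed to the constructor instead of being read from the Execution component, and escaping of ':' and '.' inside names is ignored.

```java
import java.util.Objects;

/** Immutable value object for the three parts of a document name. */
final class DocumentName {
    private final String wiki;
    private final String space;
    private final String page;

    DocumentName(String wiki, String space, String page) {
        this.wiki = Objects.requireNonNull(wiki, "wiki");
        this.space = Objects.requireNonNull(space, "space");
        this.page = Objects.requireNonNull(page, "page");
    }

    String getWiki() { return wiki; }
    String getSpace() { return space; }
    String getPage() { return page; }

    @Override
    public String toString() {
        return wiki + ":" + space + "." + page;
    }
}

/** Parses "wiki:Space.Doc" style strings; the wiki and space parts are optional. */
final class DocumentNameFactory {
    // Assumption: in the real proposal these would come from the Execution
    // component; here they are plain constructor arguments.
    private final String currentWiki;
    private final String currentSpace;
    private final String currentPage;

    DocumentNameFactory(String currentWiki, String currentSpace, String currentPage) {
        this.currentWiki = currentWiki;
        this.currentSpace = currentSpace;
        this.currentPage = currentPage;
    }

    // Hypothetical method name, not an existing API.
    DocumentName createDocumentName(String reference) {
        String rest = reference == null ? "" : reference.trim();
        String wiki = currentWiki;
        String space = currentSpace;

        // Optional "wiki:" prefix.
        int colon = rest.indexOf(':');
        if (colon >= 0) {
            if (colon > 0) {
                wiki = rest.substring(0, colon);
            }
            rest = rest.substring(colon + 1);
        }

        // Optional "Space." prefix (the first dot separates space and page).
        int dot = rest.indexOf('.');
        if (dot >= 0) {
            if (dot > 0) {
                space = rest.substring(0, dot);
            }
            rest = rest.substring(dot + 1);
        }

        // Fall back to the current document when no page is given.
        String page = rest.isEmpty() ? currentPage : rest;
        return new DocumentName(wiki, space, page);
    }
}
```

With the current context set to wiki "xwiki", space "Main" and page "WebHome", a bare "Doc" resolves against the current wiki and space, while "other:Sandbox.Doc" overrides both.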
* This raises the question as to whether we should continue passing a
String representing a document name in our APIs in the future or
instead pass a DocumentName. I'm not yet sure what is the best answer
to this...
DocumentName whenever possible, but also allow Strings for backwards
compatibility, and for easy access from scripts. At least for the moment;
maybe later we can drop Strings, if we see that working with DocumentName-s
is good, simple and easy.
* Other question: In the Document object do we store the DocumentName
object or do we store instead only Space and Wiki objects? If it's the
latter then we need to fetch them from the DB which takes time. We
could also decide to only fetch them when requested with getSpace()
and getWiki() (i.e. lazy loading).
I don't know why we need to store wiki objects. As far as things are now,
wikis don't share the same database/schema. Sure, the document should be
able to access the wiki it belongs to, but I see no need for a persistent
relationship between the two. A reference to the wiki object can be added
when creating the Document object.
As for spaces, right now I think that first we must define what a space is
(or will be), and then see if it makes sense to make the link between
documents and spaces, and if this link should be persisted in the Document
object.
* BTW this also raises the question as to whether we want to have a
representation for space and wiki or not and instead only use tags, in
which case a document name is simply a String like "mypage". But then
it should be unique. So it could also be made of a list of identity
tags as in: "space=sp1,sp2:wiki:wiki1:language=fr:mypage". Or we could
standardize it as "wiki1:sp1,sp2:fr:mypage" and have the
DocumentNameFactory transform it into tags. In that case the
DocumentName object would be a Map of tags + the document name
("mypage"). I think we need to decide ASAP if we want to keep the
strict and hardcoded notion of Wiki>Space>Document>Object>Property or
instead go full tags since this changes completely the v2 interfaces
and code we're writing.
There have been many posts on the folksonomy vs ontology, tags vs
hierarchies, loose semantics vs rigid semantics debates. So far, neither is
winning (at least not on all points, and not for everybody).
My take is that tags have the advantage that they are much more flexible and
sometimes better at organizing data, but hierarchies are needed, too. We
cannot get rid of spaces. A lot of users require them (and require even
deeper hierarchies). A lot of our features and strong points come from here,
although these features could be mostly reworked to be based on tags.
So, I think that we should put more power in tags. And we should keep
spaces. And add hierarchical spaces, too. But we should change the way
spaces and documents work.
The major problem with the current way spaces are implemented is that there
is a strong link between spaces, document IDs, URLs and the whole platform
code. This is wrong. URLs should be a way to access the wiki, and not a
strict, unique reference to documents. Spaces should be a way to organize
documents, not a major part of the document definition. As in a FS, a file
is NOT defined by the directory it resides in. You can move files around
without changing the way they work or the data they contain. We should do
the same. To go further, documents are not at all dependent on the document
name, either. In a modern FS, a document can have several names in several
places, as _hard_ and symbolic links.
One of the best things about XWiki (and in general of wikis as opposed to
CMSs) is that documents have names, and not just numbers. But XWiki went too
far with this, by using the names as internal identifiers. Confluence got it
right, and internal IDs are unique identifiers, but pretty names are
displayed/used by the users.
- Spaces should not be a part of documents, but a "feature", or
"property" of them.
- A document should have _at least_ one access name
- The way URLs identify documents should be pluggable. We kind of have this
in one direction, with the URL factories, but we don't have it the other
way around. The giant XWiki class has only one method for finding a
document, given a URL.
- We should also have pluggable document identifier components. For example,
the language field should not be hardcoded in the Document class, but
another optional feature of documents. The space feature as well.
- When retrieving documents from the database, retrieve not an exactly
identified document, but one that best matches a set of criteria. For
example, retrieve a document that has the "name" "WebPreferences" (matches
172 documents); and which has the "space" "Documents.Media.Music" (matches
3 documents); and which has the "language" "en" (matches 1 document).
All these should be optional document properties. The only required
document property is the unique ID.
- All these document identification features could (should?) be components
(as in Plexus/IoC components).
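To illustrate the "best match" retrieval from the list above, here is a minimal in-memory sketch. StoredDocument and DocumentFinder are made-up names, and a real implementation would run a query against the database instead of filtering a collection; the point is only that each extra criterion narrows the result and that the unique ID is the only mandatory field.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical document: a mandatory unique ID plus optional features. */
final class StoredDocument {
    final long id;
    final Map<String, String> features;

    StoredDocument(long id, Map<String, String> features) {
        this.id = id;
        this.features = new LinkedHashMap<>(features);
    }
}

/** Hypothetical finder: keeps the documents that satisfy every criterion. */
final class DocumentFinder {
    static List<StoredDocument> find(Collection<StoredDocument> documents,
                                     Map<String, String> criteria) {
        return documents.stream()
            .filter(doc -> criteria.entrySet().stream()
                // A document without the feature never matches that criterion.
                .allMatch(c -> c.getValue().equals(doc.features.get(c.getKey()))))
            .collect(Collectors.toList());
    }
}
```

Calling find() with only {"name": "WebPreferences"} returns every document with that name; adding "space" and then "language" to the criteria map narrows the list step by step, as in the example in the list.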
Now, back to the DocumentName, it should not have a strong fixed type (wiki,
name, space and language), but a loose collection of features. It should be
able to have a constructor that interprets one map-like string (like Vincent
proposed), a constructor that can interpret old-style document names, a
constructor that receives a map (string -> string, feature name -> value),
and a plain constructor, with the features being set later with
identifier.set("property", "value").
Since these are loose features, we can allow a Document object to have two
names, or three spaces.
Do we want to link features between them? Like, 2 names x 3 spaces = 6
classic identifiers, or should we be able to say the name N1 is only valid
in space S1, and N2 is valid in S2 and S3?
And it should not be called DocumentName. Maybe DocumentIdentifier is
better? I'm too sleepy right now to come up with a good name.
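A rough sketch of such a loose identifier, using the tentative DocumentIdentifier name. Everything here is an assumption made for illustration: the "key=value;key=value" syntax for the map-like string, the feature names "wiki", "space" and "name", and one value per feature (so it does not yet answer the two-names or three-spaces question).

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical identifier: a loose, ordered collection of named features. */
final class DocumentIdentifier {
    private final Map<String, String> features = new LinkedHashMap<>();

    /** Plain constructor; features are added later with set(). */
    DocumentIdentifier() {
    }

    /** Constructor receiving a feature name -> value map. */
    DocumentIdentifier(Map<String, String> features) {
        this.features.putAll(features);
    }

    /** Assumed syntax: "key=value;key=value". */
    static DocumentIdentifier fromFeatureString(String text) {
        DocumentIdentifier id = new DocumentIdentifier();
        for (String pair : text.split(";")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            id.set(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return id;
    }

    /** Old-style names: "wiki:Space.Page", with wiki and space optional. */
    static DocumentIdentifier fromClassicName(String name) {
        DocumentIdentifier id = new DocumentIdentifier();
        String rest = name;
        int colon = rest.indexOf(':');
        if (colon > 0) {
            id.set("wiki", rest.substring(0, colon));
            rest = rest.substring(colon + 1);
        }
        int dot = rest.indexOf('.');
        if (dot > 0) {
            id.set("space", rest.substring(0, dot));
            rest = rest.substring(dot + 1);
        }
        return id.set("name", rest);
    }

    /** Returns this, so calls can be chained. */
    DocumentIdentifier set(String feature, String value) {
        features.put(feature, value);
        return this;
    }

    /** Returns null when the feature is not set. */
    String get(String feature) {
        return features.get(feature);
    }

    Map<String, String> asMap() {
        return Collections.unmodifiableMap(features);
    }
}
```

The two string-based variants are static factory methods rather than constructors because Java cannot overload two constructors that both take a single String.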
Back to the DocumentNameFactory, as stated above, we should have several
such factories, each one able to construct a document identifier from some
specific data.
ServletURLDocumentIdentifierFactory accepts URLs as used in a servlet-based
wiki, PortletURLDocumentIdentifierFactory accepts URLs as used in a portlet
wiki, HierarchicalServletURLDocumentIdentifierFactory also accepts
hierarchies, or extended spaces, XmlRpcDocumentIdentifierFactory works with
XmlRpc, and so on.
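As a sketch of how such pluggable factories might fit together: one small interface, with one implementation per kind of input. The interface, its createIdentifier() method and the use of a plain Map as the identifier are all assumptions for illustration, and the servlet variant simply takes the last two path segments as space and page, which is a simplification of the real URL layout.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical contract: build identifier features from one kind of source. */
interface DocumentIdentifierFactory<S> {
    Map<String, String> createIdentifier(S source);
}

/** Reads servlet-style URLs such as /xwiki/bin/view/Space/Page. */
final class ServletURLDocumentIdentifierFactory implements DocumentIdentifierFactory<URI> {
    @Override
    public Map<String, String> createIdentifier(URI url) {
        String path = url.getPath();
        if (path == null) {
            throw new IllegalArgumentException("URL has no path: " + url);
        }
        // A leading "/" yields an empty first segment, hence the length check of 3.
        String[] segments = path.split("/");
        if (segments.length < 3) {
            throw new IllegalArgumentException("Expected .../Space/Page but got: " + path);
        }
        Map<String, String> features = new LinkedHashMap<>();
        // Simplification: the last two segments are taken as space and page.
        features.put("space", segments[segments.length - 2]);
        features.put("name", segments[segments.length - 1]);
        return features;
    }
}

/** Reads old-style "Space.Page" strings, to show a second pluggable source. */
final class ClassicNameDocumentIdentifierFactory implements DocumentIdentifierFactory<String> {
    @Override
    public Map<String, String> createIdentifier(String name) {
        Map<String, String> features = new LinkedHashMap<>();
        int dot = name.indexOf('.');
        if (dot > 0) {
            features.put("space", name.substring(0, dot));
        }
        features.put("name", name.substring(dot + 1));
        return features;
    }
}
```

Both factories produce the same kind of result from different inputs, so the code that looks up documents would not need to know whether the request came from a URL, an XML-RPC call or a wiki link.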
One thing we must keep in mind is that XWiki also uses URLs to identify
attachments and files inside attachments, and skin files located in the FS.
Thus, I'm not sure whether these factories should return document
identifiers, or a more general resource identifier (this would allow
identifying objects and properties, too, or even the generic fragment
identifiers Stephane used in
http://arkub.net/xwiki/bin/Blog/Farewell_SMTP).
On the tags vs. document features, I see them as related, but different in
one essential point: features are typed tags. Now, do we want them to share
the same storage mechanism, and the same way to access them from the
Document object? Should they be stored together, with normal tags as
features with no type, or with features as tags with a special syntax?
--
Sergiu Dumitriu
http://purl.org/net/sergiu/