Hey Jim,
thanks for this long e-mail ! i see you did a lot of work and deep
thinking since our last conversation, and if you don't mind i'm going
to move this conversation to the developer's list, because i think
other people may have interesting points to raise into this
conversation ...
I need some time to reply to all the points you list, but several
points can be adressed right now :
- HQL is Hibernate query langage. Similar to SQL, but RDBMS
independant and allows you to directly use Plain Old Java Objects
(POJOs) and collections of POJO (one-to-many, many-to-many, etc ...).
You need to read the mapping (xwiki.hbm.xml) which maps those POJOs to
tables, columns, and relations to understand the queries there.
- because we use this framework, most of the benefit of stored
procedures (no sql in code, independence) is of no use for our
particular application. Regarding performance, i personally don't
think the models and problems of the xwiki data stores are complex
enough to be moved to the backend database. This might not be the case
for applications built on top of xwiki, and hibernate provides a very
limited support for calling stored procedures.
- about FK and PK constraints, it's true we might run into situations
where the data model would be corrupted by an untested commit and/or a
badly developped plugin. We need to balance the cost of enhancing the
data model to what you propose and its real benefit. My opinion is
that this would make the learning curve less steep for people starting
to develop on our platform. But the Document model is well
encapsulated at the very bottom of the Object model, and further
isolated by Hibernate, so i don't think this is that much of a
problem.
Again, your mileage may vary and this is the beauty of the OSS
development model :-)
Always happy to discuss this matter,
Erwan
On 11/23/05, Jim Stuttard <jastuttard(a)yahoo.co.uk> wrote:
  Hi Erwin,
 Thanks for the offer of some help in looking at normalisation.
 I've now managed to read the hibernate docs, run some queries against my
 mysql ISAM database
 and make a few notes.
 It would be very useful at least if you wouln't mind checking, correcting
 and refuting some of the things I'm trying to document. My aim would be to
 have some more sound assertions about xwiki to build on?
 Here are some preliminary topics we might consider:
 1. MySql | Hypersonic | HQL
 * remove all sql into stored procedures. I see mysql now supports them;
 don't know anything about hypersonic.
 * what language is this? It appears to be joining a property value
 classname to a StringProperty name property id.id?
 "select prop.value from BaseObject as obj, StringProperty as prop where
 obj.className='Blog.Categories' and prop.id.id = obj.id and
 prop.id.name='name'</sql>"
 A bit difficult to understand how this fits in with my mysql tables but
 I'll pursue this and a few other queries in the code. (PS. I've probably
 read it wrong anyway)
 *  I think I understand that the back-end store is not very important for
 referential integrity, more a data dump which doesn't necessarily have to
 be very organised or have any semantics. I think the smart idea used to be
 to dump as much processing into the back-end, Oracle, Siebel, SAP and even
 DB2 etc. because only their prefered architecture would ever run reliably,
 never mind fast enough.
 I think still agree with that. Stored procedures are not part of the data
 model; operations are part of some business logic controller. I can choose
 to partition my MVC that way and (for performance,SOC and maintenance
 reasons probably), I have never come across a better way. So I think I'd
 like to ask how much processing should go into the back end?
 To keep lightweight, portable and upgradeable I see that minimum
 interfaces are required. I don't think I would ever use an RDBMS which
 didn't support stored procs. SQL syntax translation isn't too difficult is
 it?
 * I note that InnoDB (vs ISAM) is needed in mysql to support explicit
 foreign key constraints.
         a. Is this digging in to techniques which will end up as a lock-in? Or
         b. Foreign key constraints are one's best handled by the DB? Or
         c. Use surrogate keys wherever possible, only natural keys where it makes
 performance sense and avoid composite
         keys. However there *are* still FK constraints between table columns.
 IMHO they should be enforced ASAP in                    processing, ie. before stuff
comes
 out of the database.
 * The XWiki documentation seems to refer to HSQL a lot. Are Xprtnet
 planning to stablise on that or HQL, both or other?
 2. Hibernate
 * I see Ludovic has followed the joined-table syntax with slistclasses and
 dblistclasses. This seems the best option for inheritance. I haven't quite
 managed to get my head round when to use the union-table syntax?
 * The Hibernate documentation says one thing and does another when it
 comes to natural and composite keys. I suppose it's the same with me if
 you talk about address field sets: very inefficient to normalise so use a
 surrogate key.
 * I can only see one-table-per class as the least risk first choice model.
 At a quick scan, per hierarchy and concrete class options seem to have too
 many drawbacks. But when it comes down to it the Hibs use natural
 composites, then add an OID.
 I thinks OIDs are irrelevant to an RDBMS except when we have multi-valued
 fields stored as indexed lists or arrays.
 Then there is a number column eg. listitem number and this is what
 hibernate persists.
 * Anyway I assume there is a stable consensus for the time being on
 following some hibernate OR architecture?
 2. Data Analysis
 * Anyway down to work. So we have something to work on, I've attached a
 dumped zip xls file of current xwiki PK data.
 It seems easier to read these things on a spreadsheet. Can you check that
 this is a fairly default set of data with a few additional pages? Because
 I have had to manually delete a few "question" and "answer" fields
from
 both strings table and properties table when failing to build a FAQ, or
 othe reasons, it might be corrupt.
 * If we had a reference data set we could possibly infer a lot more
 relationships?
 * I have loads of questions, notes and assumptions but this is enough for
 one email. I'll send the next one ASAP.
 3. Goals
 SPerhaps a specification of some AOP module for run-time data integrity
 checking? Might be useful and good AOP and annotation practice?
 Best Regards
 Jim
 --
 Jim Stuttard