Hey Jim,
thanks for this long e-mail ! i see you did a lot of work and deep
thinking since our last conversation, and if you don't mind i'm going
to move this conversation to the developer's list, because i think
other people may have interesting points to raise into this
conversation ...
I need some time to reply to all the points you list, but several
points can be adressed right now :
- HQL is Hibernate query langage. Similar to SQL, but RDBMS
independant and allows you to directly use Plain Old Java Objects
(POJOs) and collections of POJO (one-to-many, many-to-many, etc ...).
You need to read the mapping (xwiki.hbm.xml) which maps those POJOs to
tables, columns, and relations to understand the queries there.
- because we use this framework, most of the benefit of stored
procedures (no sql in code, independence) is of no use for our
particular application. Regarding performance, i personally don't
think the models and problems of the xwiki data stores are complex
enough to be moved to the backend database. This might not be the case
for applications built on top of xwiki, and hibernate provides a very
limited support for calling stored procedures.
- about FK and PK constraints, it's true we might run into situations
where the data model would be corrupted by an untested commit and/or a
badly developped plugin. We need to balance the cost of enhancing the
data model to what you propose and its real benefit. My opinion is
that this would make the learning curve less steep for people starting
to develop on our platform. But the Document model is well
encapsulated at the very bottom of the Object model, and further
isolated by Hibernate, so i don't think this is that much of a
problem.
Again, your mileage may vary and this is the beauty of the OSS
development model :-)
Always happy to discuss this matter,
Erwan
On 11/23/05, Jim Stuttard <jastuttard(a)yahoo.co.uk> wrote:
Hi Erwin,
Thanks for the offer of some help in looking at normalisation.
I've now managed to read the hibernate docs, run some queries against my
mysql ISAM database
and make a few notes.
It would be very useful at least if you wouln't mind checking, correcting
and refuting some of the things I'm trying to document. My aim would be to
have some more sound assertions about xwiki to build on?
Here are some preliminary topics we might consider:
1. MySql | Hypersonic | HQL
* remove all sql into stored procedures. I see mysql now supports them;
don't know anything about hypersonic.
* what language is this? It appears to be joining a property value
classname to a StringProperty name property id.id?
"select prop.value from BaseObject as obj, StringProperty as prop where
obj.className='Blog.Categories' and prop.id.id = obj.id and
prop.id.name='name'</sql>"
A bit difficult to understand how this fits in with my mysql tables but
I'll pursue this and a few other queries in the code. (PS. I've probably
read it wrong anyway)
* I think I understand that the back-end store is not very important for
referential integrity, more a data dump which doesn't necessarily have to
be very organised or have any semantics. I think the smart idea used to be
to dump as much processing into the back-end, Oracle, Siebel, SAP and even
DB2 etc. because only their prefered architecture would ever run reliably,
never mind fast enough.
I think still agree with that. Stored procedures are not part of the data
model; operations are part of some business logic controller. I can choose
to partition my MVC that way and (for performance,SOC and maintenance
reasons probably), I have never come across a better way. So I think I'd
like to ask how much processing should go into the back end?
To keep lightweight, portable and upgradeable I see that minimum
interfaces are required. I don't think I would ever use an RDBMS which
didn't support stored procs. SQL syntax translation isn't too difficult is
it?
* I note that InnoDB (vs ISAM) is needed in mysql to support explicit
foreign key constraints.
a. Is this digging in to techniques which will end up as a lock-in? Or
b. Foreign key constraints are one's best handled by the DB? Or
c. Use surrogate keys wherever possible, only natural keys where it makes
performance sense and avoid composite
keys. However there *are* still FK constraints between table columns.
IMHO they should be enforced ASAP in processing, ie. before stuff
comes
out of the database.
* The XWiki documentation seems to refer to HSQL a lot. Are Xprtnet
planning to stablise on that or HQL, both or other?
2. Hibernate
* I see Ludovic has followed the joined-table syntax with slistclasses and
dblistclasses. This seems the best option for inheritance. I haven't quite
managed to get my head round when to use the union-table syntax?
* The Hibernate documentation says one thing and does another when it
comes to natural and composite keys. I suppose it's the same with me if
you talk about address field sets: very inefficient to normalise so use a
surrogate key.
* I can only see one-table-per class as the least risk first choice model.
At a quick scan, per hierarchy and concrete class options seem to have too
many drawbacks. But when it comes down to it the Hibs use natural
composites, then add an OID.
I thinks OIDs are irrelevant to an RDBMS except when we have multi-valued
fields stored as indexed lists or arrays.
Then there is a number column eg. listitem number and this is what
hibernate persists.
* Anyway I assume there is a stable consensus for the time being on
following some hibernate OR architecture?
2. Data Analysis
* Anyway down to work. So we have something to work on, I've attached a
dumped zip xls file of current xwiki PK data.
It seems easier to read these things on a spreadsheet. Can you check that
this is a fairly default set of data with a few additional pages? Because
I have had to manually delete a few "question" and "answer" fields
from
both strings table and properties table when failing to build a FAQ, or
othe reasons, it might be corrupt.
* If we had a reference data set we could possibly infer a lot more
relationships?
* I have loads of questions, notes and assumptions but this is enough for
one email. I'll send the next one ASAP.
3. Goals
SPerhaps a specification of some AOP module for run-time data integrity
checking? Might be useful and good AOP and annotation practice?
Best Regards
Jim
--
Jim Stuttard