[xwiki-devs] [proposal] Add an xwiki-store module with a UUID user defined type.
Hi all,

In order to allow XWikiAttachment to reference data in multiple places, XWikiAttachment needs to know the location of its content. XWikiAttachmentContent uses an id which is identical to the XWikiAttachment id, meaning XWikiAttachmentContent has no id of its own, only a foreign key which points to the content.

To remedy this I propose the addition of a UUID Hibernate UserType. This type will store a UUID as a 16-byte VARBINARY entry (little-endian encoding) in order to minimize size and database load.

Note: I discovered that a UUID type was added in a later version of Hibernate, so when we upgrade we can decide whether to begin using their implementation.

I would like to add the UUID type as the first class in a new module, xwiki-store (in a submodule called xwiki-store-hibernate). I think the best approach is to add new database code to xwiki-store slowly until eventually the storage drivers in the core are no longer used, thus "moving a mountain one shovelful at a time".

The UUID implementation:
http://svn.xwiki.org/svnroot/xwiki/contrib/sandbox/xwiki-store/xwiki-store-h...

How it works:

Add this to the xwiki.hbm.xml where you want to add a UUID:

    <property name="contentUUID" type="org.xwiki.store.hibernate.types.UUIDToBinaryType">
      <column name="contentuuid" />
    </property>

(type can be mapped to a shorter name such as "UUID")

Add this to the bean class:

    public UUID getContentUUID()
    public void setContentUUID(final UUID contentUUID)

Hibernate takes care of the rest.

WDYT?

Caleb
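For illustration, the conversion such a type has to perform can be sketched as below. This is a minimal sketch and not the UUIDToBinaryType from the sandbox: the class name UuidBytesSketch and the exact byte layout (each 64-bit half written in little-endian order, most significant half first) are assumptions made only for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.UUID;

// Hypothetical helper for illustration only; the real UserType may lay out bytes differently.
final class UuidBytesSketch {
    private UuidBytesSketch() { }

    /** Packs a UUID into 16 bytes, writing each 64-bit half in little-endian order. */
    static byte[] toBytes(UUID uuid) {
        ByteBuffer buffer = ByteBuffer.allocate(16).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putLong(uuid.getMostSignificantBits());
        buffer.putLong(uuid.getLeastSignificantBits());
        return buffer.array();
    }

    /** Rebuilds the UUID from the 16 bytes produced by toBytes. */
    static UUID fromBytes(byte[] bytes) {
        if (bytes == null || bytes.length != 16) {
            throw new IllegalArgumentException("A binary UUID must be exactly 16 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        long mostSignificant = buffer.getLong();
        long leastSignificant = buffer.getLong();
        return new UUID(mostSignificant, leastSignificant);
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        byte[] stored = toBytes(original);
        UUID restored = fromBytes(stored);
        System.out.println(stored.length + " bytes, round trip ok: " + original.equals(restored));
    }
}
```

A Hibernate UserType would wrap conversions like these in its methods for reading the column from a result set and binding it to a prepared statement.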
No opinions on this whatsoever? Should the lack of support be taken as a general -1?

In that case I will have to rethink how I can improve attachment storage as an entirely separate drop-in module. This is unfortunate, since rewriting will mean delays, and the inability to alter the core will probably mean hacks.

Caleb

On 11/05/2010 04:27 AM, Caleb James DeLisle wrote:
> Hi all,
> [...]
_______________________________________________ devs mailing list [email protected] http://lists.xwiki.org/mailman/listinfo/devs
Hi Caleb,

On Tue, Nov 9, 2010 at 21:08, Caleb James DeLisle <[email protected]> wrote:
> No opinions on this whatsoever? Should the lack of support be taken as a general -1? In this case I will have to rethink how I can improve attachment storage as an entirely separate drop in module. This is unfortunate since rewriting will mean delays and inability to alter the core will probably mean hacks.

No, I rather think that the lack of answers comes from the high technical level of the overall work to be done in this area, combined with a general lack of expertise on the developers' side with regard to what's the best thing to do to improve the storage (else it would have been done already).

In short: you've got my support to make things better in this area by whatever means you deem fit, although I'm of no help about practical implementation details. Let's say this is my non-binding +1 ;-)

Guillaume
Caleb,

I share the thoughts of Guillaume: it's definitely something I wish to encourage, and big platforms such as Curriki and i2geo would enjoy it for scalability!

Is there any reason to use a varbinary instead of a simple string? At least a string would be easier to export and develop with.

paul

On 10 Nov. 2010, at 10:02, Guillaume Lerouge wrote:
> [...]
On 11/10/2010 10:02 AM, Paul Libbrecht wrote:
> Is there any reason to use a varbinary instead of a simple string? At least string would be easier to export and develop with.

The reason is that a varbinary will occupy 16 bytes while a string will occupy 36. This becomes important when the database has to find an entry: finding it requires comparing entries, and 16-byte entries will compare about twice as fast as 36-byte entries. If UUIDs are used for everything, then a document with 10 objects, each having 5 properties, could conceivably need 100 lookups per getDocument.

So it's a question of performance vs. ease of use (when accessing the database via SQL).

Caleb
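The 16 versus 36 figures can be checked with a short sketch. It assumes a single-byte character encoding for the text form and ignores any per-value length prefix or padding a particular database may add; the class name UuidSizeSketch is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper for illustration; real on-disk sizes depend on the database and charset.
final class UuidSizeSketch {
    private UuidSizeSketch() { }

    /** Size in bytes of the canonical text form (32 hex digits plus 4 dashes) in a single-byte charset. */
    static int textSize(UUID uuid) {
        return uuid.toString().getBytes(StandardCharsets.US_ASCII).length;
    }

    /** Size in bytes of the raw value: two 64-bit halves. */
    static int binarySize() {
        return 2 * Long.BYTES;
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        System.out.println("text form:   " + textSize(uuid) + " bytes");
        System.out.println("binary form: " + binarySize() + " bytes");
    }
}
```

Whether the smaller key translates into a measurable speedup depends on the database and its indexes, which this sketch does not attempt to measure.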
Caleb,

I think that any performance-oriented comparison of keys will use indices. In that case, comparing strings (even fairly long ones) and comparing binary values has the exact same performance.

paul

On 10 Nov. 2010, at 16:50, Caleb James DeLisle wrote:
> [...]
On 11/10/2010 04:25 PM, Paul Libbrecht wrote:
> I think that any performance-oriented comparison of keys will use indices. And in this case, the comparison between strings (even fairly long) and binary bits has the exact same performance.

I remember reading this article, which said that UUIDs should be stored as binary rather than varchar:
http://kekoav.com/blog/36-computers/58-uuids-as-primary-keys-in-mysql.html

It appears that values can be inserted as SET id=0x1e8ef774581c102cbcfef1ab81872213 and (in MySQL) returned in human-readable form using SELECT HEX(id) FROM...
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_hex

Regarding indices, my understanding is that an index makes the column read as if it were sorted, thus allowing a binary search. Still, a binary search means the number of load-and-compare operations is log2(length of table), so in a million-entry table about 20 of them are needed to find an entry (if the index is up to date). Perhaps I am splitting hairs on this; I know premature optimization is often a fool's errand.

Another issue, which is more difficult to measure, is memory bounding. If a primary key column can be loaded into the processor cache, then lookups will be orders of magnitude faster, because loading from RAM can take around 500 processor cycles. Unfortunately this kind of thing is rather magical unless one is programming in assembly.

Those are my points. You are a user, so I'm most interested to hear how you would benefit from storage as a string.

Caleb
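Two of the figures above can be reproduced with a small sketch: the estimate of about 20 comparisons for a million-entry table, and the 0x... literal form of a UUID. The class and method names are hypothetical, the step count uses the ceil(log2(n)) estimate from the message rather than modelling a real database index, and the literal is written in canonical (big-endian) digit order, not the little-endian layout proposed for storage.

```java
import java.util.UUID;

// Hypothetical helper for illustration only.
final class UuidLookupSketch {
    private UuidLookupSketch() { }

    /** Returns ceil(log2(n)), the estimated number of compare steps for a binary search over n keys. */
    static int binarySearchSteps(long n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // The bit length of (n - 1) equals ceil(log2(n)) for n >= 1.
        return 64 - Long.numberOfLeadingZeros(n - 1);
    }

    /** Formats a UUID as a hex literal such as 0x1e8ef774581c102cbcfef1ab81872213. */
    static String toHexLiteral(UUID uuid) {
        return String.format("0x%016x%016x",
            uuid.getMostSignificantBits(), uuid.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        System.out.println("steps for 1,000,000 rows: " + binarySearchSteps(1_000_000L));
        System.out.println(toHexLiteral(UUID.fromString("1e8ef774-581c-102c-bcfe-f1ab81872213")));
    }
}
```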
participants (3)
- Caleb James DeLisle
- Guillaume Lerouge
- Paul Libbrecht