On 08/09/2012 04:28 AM, Caleb James DeLisle wrote:
My changes are in a the
feature-attachment-outputstream branch
https://github.com/xwiki/xwiki-platform/commits/feature-attachment-outputst…
IMO the Hibernate blob handling is never really going to work in a cross-platform way.
Even if that is not true, it absolutely won't work with Cassandra.
For loading an attachment piecewise, OutputStreams are not the only possible solution
but they are (in my analysis) the best.
If we hold off on adding the API because there is no consumer in platform proper,
I'll
have to code around it and find another (inferior) solution and if piecewise attachment
loading ever does reach platform proper, everything will be developed and debugged
around
inferior solution.
That's my case for it, let me know if you have any specific concerns or if your veto
is on the general principle.
I want to get an idea of whether this should be cleaned up or killed quickly.
Thanks,
Caleb
After discussing with Caleb on IRC [1], I've come to the conclusion that
while the OutputStream method isn't mandatory, it would make things
easier for the NoSQL engines which don't actually have data streaming at
their core. So, working with byte array chunks is going to be a (the?)
way of supporting large attachments. And gathering a sequence of
BlobChunks and serializing them into an attachment works better with an
OutputStream where they can be written.
So, I'm re-changing my vote to +1 for the getContentOutputStream().
[1]
http://dev.xwiki.org/xwiki/bin/view/IRC/xwikiArchive20120810
On 08/07/2012 11:31 PM, Caleb James DeLisle wrote:
>
>
> On 08/07/2012 10:52 PM, Sergiu Dumitriu wrote:
>> On 08/07/2012 03:44 PM, Sergiu Dumitriu wrote:
>>> On 08/07/2012 03:43 PM, Denis Gervalle wrote:
>>>> From the current interface, I would use getContentOutputStream(),
since
>>>> this would be the opposite of getContentInputStream(). This seems to me
>>>> very descriptive compare to a setContent() returning an OutputStream,
>>>> since
>>>> a set is not supposed to return anything. I would use
>>>> setContent(OutputStream) if it was your goal, but this probably not what
>>>> you expect.
>>>>
>>>> So, +1 for getContentOutputStream()
>>>
>>> +1 as well.
>>
>> Actually, one thing I don't like is that the API is getting even bigger, with
lots of different ways of doing the same thing. Ideally, there should be only one way of
providing the content, and only one way of retrieving it. And IMO working with byte
streams is the right way, and I think that the current setContent(InputStream) method is
the one that's best.
>>
>> Returning an OutputStream where the caller can write the content is opening the
door to lots of potential problems. Providing such a method becomes API, and APIs should
be well designed. And an OutputStream in my head means some implicit constraints that must
be well documented:
>>
>
> I can answer for the implementation which I wrote as a result of this discussion.
>
>> - should the stream be closed at the end?
>
> Yes, if it is not closed then the content does not become "live".
>
>> - must the response be written before a transaction ends?
>
> It doesn't matter when it is written since it's just dealing with FileItems.
>
>> - what if nothing is written in the stream, does that mean that the old content
is preserved, or that the attachment is going to be truncated?
>
> The attachment becomes is made empty. It is legal to have a 0 byte file.
>
>>
>> Basically, this is a _push_ API, and I for one prefer _pull_ APIs, since the
backend will get the data when it needs it, it doesn't have to document how long
it's going to wait for something to be pushed.
>>
>>
>> I agree that our current way of dealing with Hibernate is not right. We should
use proper blob streams for big data, like attachment content. And Hibernate's
LobCreator expects a Blob object that offers access to its content via an InputStream that
can be read. Again, Hibernate expects an input stream from which it will read when it
needs to, it doesn't allow you to write the content into an OutputStream when you
want.
>
> There are a few issues here. When I investigated this, Hibernate's blob handling
was (still is?) just a wrapper around blob management in the database which basically
doesn't work.
> Oracle has a proprietary blob storage method which works with streams, I think
Hibernate does not support this though.
> As I remember, mysql and postgres have blob APIs but they buffer everything in ram
making it nonsensical.
> This was investigated to solve attachment memory consumption issues and it was
concluded that it does not work, leading to FS attachment store.
>
> A solution which will work is to piece out the attachment and store each piece
separately, flushing the PersistenceManager or Session between save/load operations.
> they can be run in discrete transactions as long as the metadata contains a pointer
to the pieces and the old pieces are not removed until after all of the new pieces are
saved.
> On the load, you are loading a list of attachment pieces one by one and their content
must be written out before it is removed from memory.
>
> The options for making this work are:
>
> * copy the content into a temp file then copy it again the InputStream from that
file. This double buffers the content.
> * use PipedOutputStream to bounce it between two threads. (which I am doing now but
need to debug a problem with it.)
> * extend XWikiAttachmentContent for this use case.
> * add a method to XWikiAttachmentContent which allows for writing to it using an
OutputStream.
>
> Since this pattern would be applicable to Hibernate and the technology which came
from Cassandra attachments
> could be ported into the main trunk, I wanted to do the latter.
> If you are not swayed then I will return to evaluating the former methods.
>
> Caleb
>
>
>>
>>
>> So, consider this a -1 until you can convince me that it's indeed something
we want to include in our APIs.
>>
>>>> +0 for getOutputStream()
>>>> -0 for other previous proposals.
>>>>
>>>> On Tue, Aug 7, 2012 at 9:01 PM, Caleb James DeLisle <
>>>> calebdelisle(a)lavabit.com> wrote:
>>>>
>>>>> getOutputStream() is not very descriptive although I suppose a good
>>>>> javadoc comment would alleviate
>>>>> the issue, I wrestled with the name myself and settled on
setContent()
>>>>> because it overloads the existing setContent() so it should be a bit
>>>>> easier to remember.
>>>>>
>>>>> If you guys like getOutputStream(), I'm happy enough with it.
>>>>>
>>>>> Caleb
>>>>>
>>>>>
>>>>> On 08/07/2012 03:24 AM, Thomas Mortagne wrote:
>>>>>> +1 for the idea in general but same comment than Marius
>>>>>>
>>>>>> On Tue, Aug 7, 2012 at 7:35 AM, Marius Dumitru Florea
>>>>>> <mariusdumitru.florea(a)xwiki.com> wrote:
>>>>>>> I understand the need and I'm +1 but I don't like the
method name
>>>>>>> (neither setContent() nor addContent()). WDYT about
getOutputStream()
>>>>>>> ?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Marius
>>>>>>>
>>>>>>> On Tue, Aug 7, 2012 at 12:06 AM, Caleb James DeLisle
>>>>>>> <calebdelisle(a)lavabit.com> wrote:
>>>>>>>> Hi,
>>>>>>>> In the development of Cassandra attachments, I found I
want to
>>>>>>>> load an
>>>>> attachment one chunk at a time and
>>>>>>>> write that chunk to a provided OutputStream, this is how
I envision
>>>>> next generation Hibernate attachments working too.
>>>>>>>>
>>>>>>>> I would like to add to XWikiAttachmentContent:
>>>>>>>>
>>>>>>>> public OutputStream addContent();
>>>>>>>>
>>>>>>>> which returns an OutputStream that allows writing the
attachment
>>>>> content and upon close,
>>>>>>>> sets the attachment content as dirty and resets the size
field.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> Caleb
>>
--
Sergiu Dumitriu
http://purl.org/net/sergiu/