On 08/07/2012 10:52 PM, Sergiu Dumitriu wrote:
On 08/07/2012 03:44 PM, Sergiu Dumitriu wrote:
On 08/07/2012 03:43 PM, Denis Gervalle wrote:
From the current interface, I would use getContentOutputStream(), since
this would be the opposite of getContentInputStream(). This seems to me
very descriptive compared to a setContent() returning an OutputStream, since
a set is not supposed to return anything. I would use
setContent(OutputStream) if it was your goal, but this is probably not what
you expect.
So, +1 for getContentOutputStream()
+1 as well.
Actually, one thing I don't like is that the API is getting even bigger, with lots of
different ways of doing the same thing. Ideally, there should be only one way of providing
the content, and only one way of retrieving it. And IMO working with byte streams is the
right way, and I think that the current setContent(InputStream) method is the one
that's best.
Returning an OutputStream where the caller can write the content is opening the door to
lots of potential problems. Providing such a method becomes API, and APIs should be well
designed. And an OutputStream in my head means some implicit constraints that must be well
documented:
I can answer for the implementation which I wrote as a result of this discussion.
- should the stream be closed at the end?
Yes, if it is not closed then the content does not become "live".
- must the response be written before a transaction ends?
It doesn't matter when it is written since it's just dealing with FileItems.
- what if nothing is written in the stream, does that mean that the old content is
preserved, or that the attachment is going to be truncated?
The attachment is made empty. It is legal to have a 0 byte file.
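To make the answers above concrete, here is a minimal sketch of a stream whose content only becomes "live" on close(). It is not the actual implementation: ContentSketch, getLiveContent() and isDirty() are hypothetical names, and it buffers in memory where the real code deals with FileItems.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for an attachment content holder (not the real XWikiAttachmentContent). */
class ContentSketch {
    private byte[] live = "old content".getBytes(StandardCharsets.UTF_8);
    private boolean dirty = false;

    byte[] getLiveContent() {
        return live.clone();
    }

    boolean isDirty() {
        return dirty;
    }

    /** Returns a stream whose bytes replace the live content only when close() is called. */
    OutputStream getContentOutputStream() {
        return new ByteArrayOutputStream() {
            private boolean closed = false;

            @Override
            public void close() throws IOException {
                if (closed) {
                    return; // closing twice must not commit twice
                }
                closed = true;
                live = toByteArray(); // zero bytes written means the content becomes empty
                dirty = true;
            }
        };
    }
}
```

With this shape, a caller who forgets close() leaves the old content in place, and a caller who closes without writing gets a 0 byte attachment, which matches the behaviour described above.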
Basically, this is a _push_ API, and I for one prefer _pull_ APIs, since the backend will
get the data when it needs it, it doesn't have to document how long it's going to
wait for something to be pushed.
I agree that our current way of dealing with Hibernate is not right. We should use proper
blob streams for big data, like attachment content. And Hibernate's LobCreator expects
a Blob object that offers access to its content via an InputStream that can be read.
Again, Hibernate expects an input stream from which it will read when it needs to, it
doesn't allow you to write the content into an OutputStream when you want.
There are a few issues here. When I investigated this, Hibernate's blob handling was
(still is?) just a wrapper around blob management in the database which basically
doesn't work.
Oracle has a proprietary blob storage method which works with streams, but I think
Hibernate does not support it.
As I remember, MySQL and PostgreSQL have blob APIs but they buffer everything in RAM,
making it nonsensical.
This was investigated to solve attachment memory consumption issues and it was concluded
that it does not work, leading to FS attachment store.
A solution which will work is to piece out the attachment and store each piece separately,
flushing the PersistenceManager or Session between save/load operations.
These operations can be run in discrete transactions as long as the metadata contains a pointer to the
pieces and the old pieces are not removed until after all of the new pieces are saved.
On the load, you are loading a list of attachment pieces one by one and their content must
be written out before it is removed from memory.
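As a rough illustration of the piece-by-piece idea, the sketch below splits content into pieces and writes them out one at a time. The class and method names are hypothetical, and the in-memory List only stands in for a backend where each piece would be saved or loaded in its own transaction.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

class ChunkedAttachmentSketch {
    /** Cuts the content into pieces of at most chunkSize bytes (the "save" side). */
    static List<byte[]> split(byte[] content, int chunkSize) {
        List<byte[]> pieces = new ArrayList<>();
        for (int off = 0; off < content.length; off += chunkSize) {
            int len = Math.min(chunkSize, content.length - off);
            byte[] piece = new byte[len];
            System.arraycopy(content, off, piece, 0, len);
            pieces.add(piece);
        }
        return pieces;
    }

    /** Writes each piece to the stream before moving on to the next (the "load" side). */
    static void writeAll(List<byte[]> pieces, OutputStream out) throws IOException {
        for (byte[] piece : pieces) {
            // In a real backend this piece would be loaded here and could be
            // garbage collected as soon as it has been written out.
            out.write(piece);
        }
        out.close();
    }
}
```

The point of the loop is that only one piece needs to be in memory at a time, which is why the load side wants an OutputStream to write into.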
The options for making this work are:
* copy the content into a temp file, then copy it again from the InputStream of that file.
This double buffers the content.
* use PipedOutputStream to bounce it between two threads. (which I am doing now but need
to debug a problem with it.)
* extend XWikiAttachmentContent for this use case.
* add a method to XWikiAttachmentContent which allows for writing to it using an
OutputStream.
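For reference, the PipedOutputStream option from the list above could look roughly like the sketch below. It is an illustration only, not the code being debugged: PipedBounceSketch and bounce() are hypothetical names, and error propagation from the writer thread is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.List;

class PipedBounceSketch {
    /** Pushes pieces from a writer thread and pulls them back through an InputStream. */
    static byte[] bounce(List<byte[]> pieces) throws IOException, InterruptedException {
        PipedOutputStream pipeOut = new PipedOutputStream();
        PipedInputStream pipeIn = new PipedInputStream(pipeOut);

        // Writer thread: plays the role of the store pushing pieces as it loads them.
        Thread writer = new Thread(() -> {
            try (OutputStream out = pipeOut) {
                for (byte[] piece : pieces) {
                    out.write(piece);
                }
            } catch (IOException e) {
                // A real implementation would have to report this to the reader.
            }
        });
        writer.start();

        // Reader side: plays the role of a consumer that expects an InputStream.
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = pipeIn) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                result.write(buf, 0, n);
            }
        }
        writer.join();
        return result.toByteArray();
    }
}
```

The extra thread and the need to close the write end reliably are the cost of adapting a push-style producer to a pull-style consumer this way.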
Since this pattern would be applicable to Hibernate, and the technology which came from
Cassandra attachments could be ported into the main trunk, I wanted to do the latter.
If you are not swayed then I will return to evaluating the former methods.
Caleb
So, consider this a -1 until you can convince me that it's indeed something we want
to include in our APIs.
> +0 for getOutputStream()
> -0 for other previous proposals.
>
> On Tue, Aug 7, 2012 at 9:01 PM, Caleb James DeLisle <
> calebdelisle(a)lavabit.com> wrote:
>
>> getOutputStream() is not very descriptive, although I suppose a good
>> javadoc comment would alleviate the issue. I wrestled with the name
>> myself and settled on setContent() because it overloads the existing
>> setContent(), so it should be a bit easier to remember.
>>
>> If you guys like getOutputStream(), I'm happy enough with it.
>>
>> Caleb
>>
>>
>> On 08/07/2012 03:24 AM, Thomas Mortagne wrote:
>>> +1 for the idea in general but same comment as Marius
>>>
>>> On Tue, Aug 7, 2012 at 7:35 AM, Marius Dumitru Florea
>>> <mariusdumitru.florea(a)xwiki.com> wrote:
>>>> I understand the need and I'm +1 but I don't like the method name
>>>> (neither setContent() nor addContent()). WDYT about getOutputStream()?
>>>>
>>>> Thanks,
>>>> Marius
>>>>
>>>> On Tue, Aug 7, 2012 at 12:06 AM, Caleb James DeLisle
>>>> <calebdelisle(a)lavabit.com> wrote:
>>>>> Hi,
>>>>> In the development of Cassandra attachments, I found I want to load an
>>>>> attachment one chunk at a time and write that chunk to a provided
>>>>> OutputStream. This is how I envision next generation Hibernate
>>>>> attachments working too.
>>>>>
>>>>> I would like to add to XWikiAttachmentContent:
>>>>>
>>>>> public OutputStream addContent();
>>>>>
>>>>> which returns an OutputStream that allows writing the attachment content
>>>>> and, upon close, sets the attachment content as dirty and resets the size field.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> Caleb