Hi Vincent - thanks for the reply. Are both the 7.x series and 6.4.x using Tika 1.7?
Garth
-----Original Message-----
Message: 5
Date: Wed, 18 Feb 2015 15:32:17 +0100
From: "=?utf-8?Q?vincent=40massol.net?=" <vincent(a)massol.net>
To: Marius Dumitru Florea <mariusdumitru.florea(a)xwiki.com>, XWiki
Users <users(a)xwiki.org>
Subject: Re: [xwiki-users] XWiki search/Solr support for additional
filetypes
Message-ID: <etPan.54e4a271.12200854.269a(a)vmassol.local>
Content-Type: text/plain; charset="utf-8"
On 16 Dec 2014 at 11:48:44, Marius Dumitru Florea
(mariusdumitru.florea@xwiki.com) wrote:
On Tue, Dec 16, 2014 at 2:11 AM, Arnold, Garth wrote:
Hello Marius - thank you for the detailed reply.
My goal is (2) - to find all documents with a .7z attachment, where those attachments
include file(s) containing "foo". If I read your email correctly, Tika 1.6 [5]
is the root cause of my failure to search successfully for text within the files contained in
a .7z attachment. I am successful with my search when using a .zip file as the attachment
- so we will instruct wiki users to avoid .7z attachments.
Yes, at least until we upgrade to Tika 1.7.
FTR we're now using Tika 1.7 in the latest versions of XWiki.
Thanks
-Vincent
Thanks,
Marius
>
> Garth
>
>> -----Original Message-----
>> Message: 2
>> Date: Thu, 11 Dec 2014 08:42:20 +0200
>> From: Marius Dumitru Florea
>> To: XWiki Users
>> Subject: Re: [xwiki-users] XWiki search/Solr support for additional
>> filetypes
>> Message-ID:
>> > >> AkCcNA(a)mail.gmail.com>
>> Content-Type: text/plain; charset=UTF-8
>>
>> It depends what you mean by "search attachments that are 7-Zip .7z
>> archives":
>>
>> (1) Give me all the documents that have an attachment of mime type
>> application/x-7z-compressed
>> (2) Give me all the documents that have a 7-Zip archive attached that
>> includes a file that contains the word "foo"
>>
>> If you use Solr, the default search engine for XWiki 6.2.4, then the
>> code that is responsible for indexing the attachments is
>> AttachmentSolrMetadataExtractor [1]. This is a component so it can be
>> overridden as per [2]. The current implementation uses Tika [3] to:
>>
>> (1) detect the mime type of the attachment
>> (2) extract indexable content from the attachment (whatever its mime
>> type may be)
>>
>> For (1) Tika supports detecting the 7-Zip mime type since version 1.2
>> [4]. For (2) judging by [5] it seems Tika also supports reading 7-Zip
>> archives but there were some issues in 1.6 that have been fixed in
>> 1.7. We are currently using Tika 1.6 in XWiki. We should probably
>> upgrade.
>>
>> Hope this helps,
>> Marius
>>
>> [1] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AttachmentSolrMetadataExtractor.java
>> [2] http://extensions.xwiki.org/xwiki/bin/view/Extension/Component+Module#HOverrides
>> [3] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L458
>> [4] https://issues.apache.org/jira/browse/TIKA-940
>> [5] https://issues.apache.org/jira/browse/TIKA-1411
>>
>> On Wed, Dec 10, 2014 at 9:20 PM, Arnold, Garth wrote:
>> > Hello - is it possible to enable searching of additional filetypes within XWiki
>> > 6.2.4? Specifically I would like to be able to search attachments that are 7-Zip
>> > .7z archives. It looks to me as though the underlying library (Commons
>> > Compress) supports this filetype, but I am a new XWiki user and non-Java
>> > programmer so I may be assuming too much.
>> >
>> > Thanks in advance for your thoughts on this -
>> >
>> > Garth Arnold
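
For readers comparing the two kinds of query discussed in this thread, here is a minimal Python sketch of query (2): finding the files inside an archive that contain the word "foo". It is only an illustration and not how XWiki or Tika implement attachment indexing. The function names archive_mentions and build_sample_archive are hypothetical, and the sketch uses a .zip archive because Python's standard library has no 7z reader.

```python
import io
import zipfile


def archive_mentions(archive_bytes: bytes, term: str) -> list[str]:
    """Return the names of archive members whose text contains `term`.

    Hypothetical helper for illustration only; matching is case-insensitive
    and every member is naively decoded as UTF-8 text.
    """
    hits = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories carry no searchable content
            text = archive.read(info).decode("utf-8", errors="ignore")
            if term.lower() in text.lower():
                hits.append(info.filename)
    return hits


def build_sample_archive() -> bytes:
    """Build a small in-memory .zip with one matching and one non-matching file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("notes.txt", "this file mentions foo once")
        archive.writestr("other.txt", "nothing relevant here")
    return buffer.getvalue()


if __name__ == "__main__":
    # Only notes.txt contains the search term.
    print(archive_mentions(build_sample_archive(), "foo"))
```

Query (1), by contrast, needs only the attachment's mime type and never opens the archive, which is why it can work even when content extraction for a given format fails.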