On 18 Feb 2015 at 17:41:29, Arnold, Garth (arnold.g@ghc.org) wrote:
 Hi Vincent, thanks for the reply. Are both the 7.x series and the 6.4.x series
 using Tika 1.7?
 Garth
 -----Original Message-----
 Message: 5
 Date: Wed, 18 Feb 2015 15:32:17 +0100
 From: vincent@massol.net
 To: Marius Dumitru Florea, XWiki Users
 Subject: Re: [xwiki-users] XWiki search/Solr support for additional
 filetypes
 Content-Type: text/plain; charset="utf-8"
 On 16 Dec 2014 at 11:48:44, Marius Dumitru Florea (mariusdumitru.florea@xwiki.com) wrote:
  On Tue, Dec 16, 2014 at 2:11 AM, Arnold, Garth wrote:
  Hello Marius, thank you for the detailed reply.
My goal is (2): to find all documents with a .7z attachment where those attachments
include file(s) containing "foo". If I read your email correctly, Tika 1.6 [5]
is the root cause of my failure to search for text within the files contained in
a .7z attachment. My search succeeds when I use a .zip file as the attachment,
so we will instruct wiki users to avoid .7z attachments.
 Yes, at least until we upgrade to Tika 1.7. 
 FTR we're now using Tika 1.7 in the latest versions of XWiki.
 Thanks
 -Vincent
 > Thanks,
 > Marius
 >
 > >
 > > Garth
 > >
 > >> -----Original Message-----
 > >> Message: 2
 > >> Date: Thu, 11 Dec 2014 08:42:20 +0200
 > >> From: Marius Dumitru Florea
 > >> To: XWiki Users
 > >> Subject: Re: [xwiki-users] XWiki search/Solr support for additional
 > >> filetypes
 > >> Content-Type: text/plain; charset=UTF-8
 > >>
 > >> It depends what you mean by "search attachments that are 7-Zip .7z
 > >> archives":
 > >>
 > >> (1) Give me all the documents that have an attachment of mime type
 > >> application/x-7z-compressed
 > >> (2) Give me all the documents that have a 7-Zip archive attached that
 > >> includes a file that contains the word "foo"
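Interpretation (2) means looking inside the archive, not just at its mime type. As a rough, hypothetical illustration of that idea (not XWiki's or Tika's code), the Java sketch below checks whether any file inside a .zip archive contains a given text. The JDK ships a zip reader but no 7-Zip reader, so .zip stands in here (Apache Commons Compress has a separate 7z reader). The class `ArchiveTextSearch` and its methods are made up for this example, and it naively decodes every entry as UTF-8 text, whereas Tika parses each embedded format properly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper illustrating interpretation (2):
// does any file inside the (zip) archive contain the given text?
public class ArchiveTextSearch {

    static boolean containsText(byte[] zipBytes, String needle) {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                // readAllBytes() (Java 9+) stops at the end of the current entry.
                // Naive assumption: every entry is UTF-8 text.
                String text = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                if (text.contains(needle)) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small in-memory .zip so the example runs without files on disk.
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            out.putNextEntry(new ZipEntry("docs/readme.txt"));
            out.write("this file mentions foo".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
            out.putNextEntry(new ZipEntry("docs/notes.txt"));
            out.write("nothing relevant here".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the zip's central directory has been written.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = sampleZip();
        System.out.println(containsText(zip, "foo")); // true
        System.out.println(containsText(zip, "bar")); // false
    }
}
```

An indexer such as the one described below would extract this text once at indexing time rather than scanning archives per query; the sketch only shows what "a file inside the archive contains foo" has to mean.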
 > >>
 > >> If you use Solr, the default search engine in XWiki 6.2.4, then the
 > >> code responsible for indexing attachments is
 > >> AttachmentSolrMetadataExtractor [1]. This is a component, so it can be
 > >> overridden as described in [2]. The current implementation uses Tika [3] to:
 > >>
 > >> (1) detect the mime type of the attachment
 > >> (2) extract indexable content from the attachment (whatever its mime
 > >> type may be)
 > >>
 > >> For (1), Tika has supported detecting the 7-Zip mime type since version
 > >> 1.2 [4]. For (2), judging by [5], Tika also seems to support reading
 > >> 7-Zip archives, but there were some issues in 1.6 that were fixed in
 > >> 1.7. We are currently using Tika 1.6 in XWiki. We should probably
 > >> upgrade.
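Mime type detection as in (1) relies, among other things, on "magic" byte signatures at the start of a file. The Java sketch below is a simplified, hypothetical stand-in for that idea, not Tika's API: the class `MagicSniffer` and its `detect` method are made up for illustration. It checks the 7-Zip signature (37 7A BC AF 27 1C) and the usual zip local-file-header signature (50 4B 03 04), and returns the mime type strings used in this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical, simplified stand-in for magic-byte mime type detection (not Tika's API).
public class MagicSniffer {

    // 7-Zip signature: '7' 'z' BC AF 27 1C
    private static final byte[] SEVEN_ZIP = {'7', 'z', (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};
    // Zip local file header signature: 'P' 'K' 03 04
    // (empty or spanned zip archives start with other signatures, ignored here)
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static String detect(byte[] header) {
        if (startsWith(header, SEVEN_ZIP)) {
            return "application/x-7z-compressed";
        }
        if (startsWith(header, ZIP)) {
            return "application/zip";
        }
        // Unknown signature: fall back to generic binary (Tika itself does much more).
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] sevenZ = {'7', 'z', (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C, 0x00, 0x04};
        byte[] zip = {'P', 'K', 0x03, 0x04, 0x14, 0x00};
        byte[] text = "hello foo".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(sevenZ)); // application/x-7z-compressed
        System.out.println(detect(zip));    // application/zip
        System.out.println(detect(text));   // application/octet-stream
    }
}
```

Detecting the type is the easy half; as noted above, actually reading the 7-Zip contents for indexing is where the Tika 1.6 issues lay.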
 > >>
 > >> Hope this helps,
 > >> Marius
 > >>
 > >> [1] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AttachmentSolrMetadataExtractor.java
 > >> [2] http://extensions.xwiki.org/xwiki/bin/view/Extension/Component+Module#HOverrides
 > >> [3] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L458
 > >> [4] https://issues.apache.org/jira/browse/TIKA-940
 > >> [5] https://issues.apache.org/jira/browse/TIKA-1411
 > >>
 > >> On Wed, Dec 10, 2014 at 9:20 PM, Arnold, Garth wrote:
 > >> > Hello, is it possible to enable searching of additional filetypes within
 > >> > XWiki 6.2.4? Specifically, I would like to be able to search attachments
 > >> > that are 7-Zip .7z archives. It looks to me as though the underlying
 > >> > library (Commons Compress) supports this filetype, but I am a new XWiki
 > >> > user and not a Java programmer, so I may be assuming too much.
 > >> >
 > >> > Thanks in advance for your thoughts on this -
 > >> >
 > >> > Garth Arnold