-----Original Message-----
Message: 2
Date: Thu, 11 Dec 2014 08:42:20 +0200
From: Marius Dumitru Florea <mariusdumitru.florea(a)xwiki.com>
To: XWiki Users <users(a)xwiki.org>
Subject: Re: [xwiki-users] XWiki search/Solr support for additional filetypes
Message-ID: <CALZcbBbprk=SJjhqGKKX1tx-TcMQbcq+qby6ZfnQqXZ-AkCcNA(a)mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

It depends on what you mean by "search attachments that are 7-Zip .7z
archives":

(1) Give me all the documents that have an attachment of mime type
    application/x-7z-compressed
(2) Give me all the documents that have a 7-Zip archive attached that
    includes a file that contains the word "foo"
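
To make the difference between the two readings concrete, here is a rough sketch of the Solr query strings each one might translate to. The field names `type`, `mimetype` and `attcontent` are assumptions about the XWiki Solr schema (check your own index before relying on them), and the class and method names are hypothetical helpers, not part of XWiki.

```java
// Hypothetical helper, not part of XWiki. The field names used below
// (type, mimetype, attcontent) are assumptions about the Solr schema.
final class AttachmentQueries {

    /** Wraps a value in double quotes, escaping backslashes and embedded quotes. */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Reading (1): attachments whose detected mime type matches exactly. */
    static String byMimeType(String mimeType) {
        return "type:ATTACHMENT AND mimetype:" + quote(mimeType);
    }

    /** Reading (2): same mime type, and the extracted content contains the word. */
    static String byMimeTypeAndContent(String mimeType, String word) {
        return byMimeType(mimeType) + " AND attcontent:" + quote(word);
    }

    public static void main(String[] args) {
        String sevenZip = "application/x-7z-compressed";
        System.out.println(byMimeType(sevenZip));
        System.out.println(byMimeTypeAndContent(sevenZip, "foo"));
    }
}
```

Reading (1) only needs the mime type to be detected at indexing time. Reading (2) can only match if the indexer managed to extract text from inside the archive.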

If you use Solr, the default search engine in XWiki 6.2.4, then the
code responsible for indexing attachments is
AttachmentSolrMetadataExtractor [1]. This is a component, so it can be
overridden as described in [2]. The current implementation uses Tika [3] to:

(1) detect the mime type of the attachment
(2) extract indexable content from the attachment (whatever its mime
    type may be)

For (1), Tika has supported detecting the 7-Zip mime type since version 1.2
[4]. For (2), judging by [5], Tika also seems to support reading 7-Zip
archives, but there were some issues in 1.6 that have been fixed in
1.7. We are currently using Tika 1.6 in XWiki, so we should probably
upgrade.
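
As an illustration of what mime type detection for step (1) involves, here is a minimal sketch of signature-based detection for 7-Zip files. It is not Tika's implementation, which is far more general; it only checks the six-byte signature that .7z files start with. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of signature ("magic bytes") detection.
// This is NOT how Tika is implemented; it only shows the idea.
final class SevenZipSniffer {

    // Signature at the start of a .7z file: '7', 'z', 0xBC, 0xAF, 0x27, 0x1C.
    private static final byte[] SEVEN_ZIP_MAGIC = {
        0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C
    };

    /** Returns the 7-Zip mime type if the header matches, else a generic fallback. */
    static String detectMimeType(byte[] header) {
        if (header != null
                && header.length >= SEVEN_ZIP_MAGIC.length
                && Arrays.equals(Arrays.copyOf(header, SEVEN_ZIP_MAGIC.length), SEVEN_ZIP_MAGIC)) {
            return "application/x-7z-compressed";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] sample = {0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C, 0x00, 0x04};
        System.out.println(detectMimeType(sample));
    }
}
```

Detecting the type this way says nothing about step (2): reading the files inside the archive needs an actual 7z parser, which is where the Tika version matters.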

Hope this helps,
Marius

[1] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AttachmentSolrMetadataExtractor.java
[2] http://extensions.xwiki.org/xwiki/bin/view/Extension/Component+Module#HOverrides
[3] https://github.com/xwiki/xwiki-platform/blob/master/xwiki-platform-core/xwiki-platform-search/xwiki-platform-search-solr/xwiki-platform-search-solr-api/src/main/java/org/xwiki/search/solr/internal/metadata/AbstractSolrMetadataExtractor.java#L458
[4] https://issues.apache.org/jira/browse/TIKA-940
[5] https://issues.apache.org/jira/browse/TIKA-1411

On Wed, Dec 10, 2014 at 9:20 PM, Arnold, Garth <arnold.g(a)ghc.org> wrote:
> Hello - is it possible to enable searching of additional filetypes within
> XWiki 6.2.4? Specifically, I would like to be able to search attachments
> that are 7-Zip .7z archives. It looks to me as though the underlying
> library (Commons Compress) supports this filetype, but I am a new XWiki
> user and a non-Java programmer, so I may be assuming too much.
>
> Thanks in advance for your thoughts on this -
>
> Garth Arnold