I was able to quickly integrate our lab environment’s Google Mini Appliance with XWiki today…
The set-up of the appliance was simple (after some experimentation on what to filter out to reduce redundancy and confusion):
1. List of URLs to crawl (e.g. http://hostname.domain:8080/xwiki)
2. List of patterns to follow (e.g. hostname.domain:8080/xwiki)
3. List of patterns to NOT crawl – I added the following to the default list:
   a. contains:?viewer=code
   b. contains:?format=rtf
   c. contains:?format=pdf
   d. contains:?tag=
   e. contains:?xpage=print
   f. contains:?rev=
and then let it start crawling…
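To illustrate what the exclusion list does, here is a minimal Python sketch of the filtering, assuming that a `contains:` pattern is a plain substring match against the full URL. The function name and the sample page URLs are hypothetical; the appliance applies its own matching internally.

```python
# Do-not-crawl substrings from the list above, with the "contains:" prefix stripped.
EXCLUDE_SUBSTRINGS = [
    "?viewer=code",
    "?format=rtf",
    "?format=pdf",
    "?tag=",
    "?xpage=print",
    "?rev=",
]


def is_excluded(url: str) -> bool:
    """Return True if the URL contains any of the excluded substrings.

    Assumption: "contains:" means a simple substring test on the whole URL.
    """
    return any(fragment in url for fragment in EXCLUDE_SUBSTRINGS)


if __name__ == "__main__":
    # Hypothetical sample URLs under the crawl root used in the post.
    base = "http://hostname.domain:8080/xwiki/bin/view/Main/WebHome"
    for candidate in (base, base + "?viewer=code", base + "?xpage=print"):
        print(candidate, "->", "skip" if is_excluded(candidate) else "crawl")
```

Under that assumption, each pattern only matches when the parameter comes first in the query string: a URL ending in `?foo=1&rev=2` would not be caught by `?rev=`.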
I added the following to a page:
{image:Google.gif}
----
<!-- Search Google Appliance -->
<form method="get" action="http://googleappliance.domain/search">
<table>
<tr>
<td>
<input type="text" name="q" size="25" maxlength="255" value=""/>
<input type="submit" name="btnG" value="Google Search"/>
<input type="hidden" name="site" value="default_collection"/>
<input type="hidden" name="client" value="default_frontend"/>
<input type="hidden" name="proxystylesheet" value="default_frontend"/>
<input type="hidden" name="output" value="xml_no_dtd"/>
</td>
</tr>
</table>
</form>
<!-- Search Google Appliance -->
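For reference, the request that this form submits can be reproduced outside the wiki. The following Python sketch only builds the GET URL from the same field names and values as the form; `googleappliance.domain` is the placeholder host from the form, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder host taken from the form's action attribute.
APPLIANCE_SEARCH_URL = "http://googleappliance.domain/search"


def build_search_url(query: str) -> str:
    """Build the GET URL the form would submit for the given query text."""
    params = {
        "q": query,                             # the visible text box
        "btnG": "Google Search",                # the submit button's value
        "site": "default_collection",           # hidden fields, copied from the form
        "client": "default_frontend",
        "proxystylesheet": "default_frontend",
        "output": "xml_no_dtd",
    }
    return f"{APPLIANCE_SEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("reputation engine"))
```

Pasting the resulting URL into a browser is a quick way to check that the collection and front end names are right before embedding the form in a page.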
… and it was up and running with an inline Google Search. It indexed the pages and attachments quickly, and it’s a pleasure to use.
The Lucene engine will still be useful to us because it permits filtering and understands the wiki structure (spaces, authors), and we have adapted it for our reputation engine.
--
Jim Dowson
CTO, Global Services, EMC Corporation
Linx: (617) 598-0505