I was able to quickly integrate our lab environment's Google Mini
Appliance with XWiki today...
Setting up the appliance was simple (after some experimentation with
what to filter out to reduce redundancy and confusion):
1. List of URLs to crawl (e.g.
http://hostname.domain:8080/xwiki)
2. List of patterns to follow (e.g. hostname.domain:8080/xwiki)
3. List of patterns to NOT crawl. I added the following to the default
list:
a. contains:?viewer=code
b. contains:?format=rtf
c. contains:?format=pdf
d. contains:?tag=
e. contains:?xpage=print
f. contains:?rev=
and then let it start crawling...
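To check which wiki URLs the do-not-crawl list above would skip before committing it to the appliance, something like the following rough Python sketch could be used. It assumes that a `contains:` pattern is a plain substring match against the full URL; the function name `is_excluded` and the sample XWiki paths are only illustrative and are not part of the appliance.

```python
# Rough local emulation of the "contains:" do-not-crawl patterns.
# Assumption: "contains:<text>" means "skip any URL that has <text> in it".
DO_NOT_CRAWL = [
    "contains:?viewer=code",
    "contains:?format=rtf",
    "contains:?format=pdf",
    "contains:?tag=",
    "contains:?xpage=print",
    "contains:?rev=",
]


def is_excluded(url, patterns=DO_NOT_CRAWL):
    """Return True if the URL contains the text of any 'contains:' pattern."""
    prefix = "contains:"
    for pattern in patterns:
        if pattern.startswith(prefix) and pattern[len(prefix):] in url:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical sample URLs under the crawl root from step 1.
    samples = [
        "http://hostname.domain:8080/xwiki/bin/view/Main/WebHome",
        "http://hostname.domain:8080/xwiki/bin/view/Main/WebHome?viewer=code",
        "http://hostname.domain:8080/xwiki/bin/view/Main/WebHome?rev=1.2",
    ]
    for url in samples:
        print("skip " if is_excluded(url) else "crawl", url)
```

Under this substring assumption, a parameter that is not first in the query string (for example `...?language=en&rev=1.2`) would not be caught by `?rev=`, so it may be worth spot-checking a few such URLs.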
I added the following to a page:
{image:Google.gif}
----
<!-- Search Google Appliance -->
<form method="get" action="http://googleappliance.domain/search">
  <table>
    <tr>
      <td>
        <input type="text" name="q" size="25" maxlength="255" value=""/>
        <input type="submit" name="btnG" value="Google Search"/>
        <input type="hidden" name="site" value="default_collection"/>
        <input type="hidden" name="client" value="default_frontend"/>
        <input type="hidden" name="proxystylesheet" value="default_frontend"/>
        <input type="hidden" name="output" value="xml_no_dtd"/>
      </td>
    </tr>
  </table>
</form>
<!-- Search Google Appliance-->
... and it was up and running with an inline Google Search. It indexed
the pages and attachments quickly, and it's a pleasure to use.
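For anyone who wants to link to a search directly rather than go through the form, here is a small Python sketch of the GET URL the form above should produce when submitted. The parameter names and values are copied from the form; the helper name `build_search_url` is made up for illustration, and the host is the same placeholder used in the form.

```python
from urllib.parse import urlencode

# Placeholder host, taken from the form's action attribute.
APPLIANCE_SEARCH_URL = "http://googleappliance.domain/search"


def build_search_url(query, base=APPLIANCE_SEARCH_URL):
    """Build the GET URL that the search form would submit for `query`."""
    params = {
        "q": query,                             # text typed into the search box
        "btnG": "Google Search",                # submit button name/value
        "site": "default_collection",           # hidden fields from the form
        "client": "default_frontend",
        "proxystylesheet": "default_frontend",
        "output": "xml_no_dtd",
    }
    # urlencode escapes the values and joins them as key=value&key=value.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("reputation engine"))
```

If the collection or front end on the appliance is renamed, the `site`, `client` and `proxystylesheet` values would need to change to match.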
The Lucene engine will still be useful to us, because it permits
filtering and understands the wiki structure (spaces, authors), and
because we have adapted it for our reputation engine.
--
Jim Dowson
CTO, Global Services, EMC Corporation
Linx: (617) 598-0505