Hi Jim, thanks for the feedback.
I was able to quickly integrate our lab environment's Google Mini Appliance with XWiki today.
Setting up the appliance was simple (after some experimentation on what to filter out to reduce redundancy and confusion):
1. List of URLs to crawl (e.g. http://hostname.domain:8080/xwiki )
2. List of patterns to follow (e.g. hostname.domain:8080/xwiki)
3. List of patterns to NOT crawl – I added the following to the default list:
a. contains:?viewer=code
b. contains:?format=rtf
c. contains:?format=pdf
d. contains:?tag=
e. contains:?xpage=print
f. contains:?rev=
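
For anyone who wants to sanity-check the exclusion list before loading it into the appliance, here is a minimal Python sketch. It assumes that a "contains:" pattern behaves as a plain substring match on the full URL, which is my reading of the setting and not something I have verified against the appliance's documentation. The function names and the sample XWiki page URLs are made up for illustration.

```python
# Substrings taken from the "contains:" patterns in item 3 above.
EXCLUDE_SUBSTRINGS = [
    "?viewer=code",
    "?format=rtf",
    "?format=pdf",
    "?tag=",
    "?xpage=print",
    "?rev=",
]


def is_excluded(url, patterns=EXCLUDE_SUBSTRINGS):
    """Return True if the URL contains any excluded substring.

    Assumption: "contains:" means a plain, case-sensitive substring test.
    """
    return any(pattern in url for pattern in patterns)


def filter_crawl_list(urls, patterns=EXCLUDE_SUBSTRINGS):
    """Keep only the URLs that would still be crawled under this assumption."""
    return [url for url in urls if not is_excluded(url, patterns)]


if __name__ == "__main__":
    # Hypothetical page URLs under the start URL from item 1.
    base = "http://hostname.domain:8080/xwiki/bin/view/Main/WebHome"
    candidates = [
        base,
        base + "?viewer=code",
        base + "?format=pdf",
        base + "?xpage=print",
        base + "?rev=1.2",
    ]
    for url in filter_crawl_list(candidates):
        print(url)
```

One thing the sketch makes visible: under plain substring matching, a pattern such as ?rev= only matches when the parameter comes first in the query string, so a URL ending in &rev=1.2 would still be crawled. If that matters, patterns without the leading "?" may be worth trying.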