Hi Jim, thanks for the feedback. I was able to quickly integrate our lab environment's Google Mini Appliance with XWiki today.
The setup of the appliance was simple (after some experimentation on what to filter out to reduce redundancy and confusion):
1. List of URLs to crawl (e.g. http://hostname.domain:8080/xwiki )
2. List of patterns to follow (e.g. hostname.domain:8080/xwiki)
3. List of patterns NOT to crawl (I added the following to the default list):
a. contains:?viewer=code
b. contains:?format=rtf
c. contains:?format=pdf
d. contains:?tag=
e. contains:?xpage=print
f. contains:?rev=
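To make the effect of the exclusion list easier to check, it can be modelled roughly as a substring test on each URL. The Python sketch below assumes that a `contains:` pattern simply means "the URL contains the text after the prefix"; the function name `is_excluded` and the sample page path `/xwiki/bin/view/Main/WebHome` are hypothetical, and the appliance's own matching rules may differ.

```python
# Hypothetical sketch: treats each "contains:X" exclusion pattern as a plain
# substring test on the URL. Assumption: the appliance behaves similarly;
# its real matching rules may differ.

EXCLUDE_PATTERNS = [
    "contains:?viewer=code",
    "contains:?format=rtf",
    "contains:?format=pdf",
    "contains:?tag=",
    "contains:?xpage=print",
    "contains:?rev=",
]

PREFIX = "contains:"


def is_excluded(url: str, patterns: list[str] = EXCLUDE_PATTERNS) -> bool:
    """Return True if the URL contains the text of any "contains:" pattern."""
    return any(
        p.startswith(PREFIX) and p[len(PREFIX):] in url
        for p in patterns
    )


if __name__ == "__main__":
    # Hypothetical page URL on the crawled host.
    base = "http://hostname.domain:8080/xwiki/bin/view/Main/WebHome"
    for url in [base, base + "?viewer=code", base + "?xpage=print"]:
        print(url, "->", "skip" if is_excluded(url) else "crawl")
```

Under this simplified model the match is literal, so a pattern such as `?rev=` only catches URLs where `rev` is the first query parameter; a URL ending in `?language=en&rev=1.1` would still be crawled.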
I don't think there's a risk, but you may want to add "contains:delete", "contains:edit", "contains:inline" and "contains:?editor=" to your list. At worst, it will make indexing faster.

Guillaume