I was able to quickly integrate our lab environment’s Google Mini Appliance with XWiki today…

 

Setting up the appliance was simple (after some experimentation with what to filter out to reduce redundancy and confusion):

 

1. List of URLs to crawl (e.g. http://hostname.domain:8080/xwiki)

2. List of patterns to follow (e.g. hostname.domain:8080/xwiki)

3. List of patterns to NOT crawl. I added the following to the default list:

   a. contains:?viewer=code

   b. contains:?format=rtf

   c. contains:?format=pdf

   d. contains:?tag=

   e. contains:?xpage=print

   f. contains:?rev=

 

and then let it start crawling…
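
To sanity-check which wiki URLs the exclusion list would skip before starting a crawl, something like the following rough sketch could help. It assumes that a "contains:" pattern is a plain substring test against the full URL; the function name is_excluded and the sample URLs are hypothetical, not part of the appliance.

```python
# Exclusion patterns copied from the "do not crawl" list above.
DO_NOT_CRAWL = [
    "contains:?viewer=code",
    "contains:?format=rtf",
    "contains:?format=pdf",
    "contains:?tag=",
    "contains:?xpage=print",
    "contains:?rev=",
]

PREFIX = "contains:"


def is_excluded(url, patterns=DO_NOT_CRAWL):
    """Return True if the URL contains the substring of any 'contains:' pattern.

    Assumption: 'contains:' means a plain substring match on the whole URL.
    Patterns without that prefix are ignored by this sketch.
    """
    for pattern in patterns:
        if pattern.startswith(PREFIX) and pattern[len(PREFIX):] in url:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical XWiki URLs, for illustration only.
    base = "http://hostname.domain:8080/xwiki/bin/view/Main/WebHome"
    for url in (base, base + "?viewer=code", base + "?xpage=print"):
        print(url, "-> skip" if is_excluded(url) else "-> crawl")
```

Because the sketch does a plain substring match, a pattern such as ?rev= only catches URLs where that parameter comes directly after the question mark.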

 

I added the following to a page:

 

{image:Google.gif}

----

<!-- Search Google Appliance -->

<form method="get" action="http://googleappliance.domain/search">

  <table>

    <tr>

      <td>

        <input type="text" name="q" size="25" maxlength="255" value=""/>

        <input type="submit" name="btnG" value="Google Search"/>

        <input type="hidden" name="site" value="default_collection"/>

        <input type="hidden" name="client" value="default_frontend"/>

        <input type="hidden" name="proxystylesheet" value="default_frontend"/>

        <input type="hidden" name="output" value="xml_no_dtd"/>

      </td>

    </tr>

  </table>

</form>

<!-- Search Google Appliance-->

 

 

… and it was up and running with an inline Google Search. It indexed the pages and attachments quickly, and it’s a pleasure to use.
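
For anyone who wants to link straight to a results page instead of using the form, here is a minimal sketch of the GET request the form above produces. The host name and parameter values are taken from the form; the function name build_search_url is hypothetical, and the sketch only builds the URL, it does not contact the appliance.

```python
from urllib.parse import urlencode

# Same action URL as the form on the wiki page.
SEARCH_URL = "http://googleappliance.domain/search"


def build_search_url(query):
    """Build the URL the search form would request for the given query text."""
    params = {
        "q": query,                              # the visible text field
        "btnG": "Google Search",                 # the submit button
        "site": "default_collection",            # hidden fields from the form
        "client": "default_frontend",
        "proxystylesheet": "default_frontend",
        "output": "xml_no_dtd",
    }
    # urlencode escapes each value and joins the pairs with "&".
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("reputation engine"))
```

Keeping the hidden fields in one dictionary makes it easy to point the same link at a different collection or front end later.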

 

The Lucene engine will still be useful to us because it permits filtering and understands the wiki structure (spaces, authors), and because we have adapted it for our reputation engine.

--

Jim Dowson

CTO, Global Services, EMC Corporation

Linx: (617) 598-0505