Re: [xwiki-devs] apache lucene mahout : for advanced xwiki "search" ?

7 Jan 2010

Niels,

do you already have any successful uses of it?
Mahout is certainly promising, but it still seems to be at a very early stage.

One thing that seemed almost ready to implement is "preference matching"
with Mahout Taste:
	http://lucene.apache.org/mahout/taste.html
but I have not found the time to try it.

paul

On 07 Jan 2010, at 23:10, Niels Mayer wrote:

...
  http://lucene.apache.org/mahout/

 Mahout's goal is to build scalable machine learning libraries. With
 scalable we mean:

   - Scalable to reasonably large data sets. Our core algorithms for
     clustering, classification and batch based collaborative filtering
     are implemented on top of Apache Hadoop using the map/reduce
     paradigm. However we do not restrict contributions to Hadoop based
     implementations: Contributions that run on a single node or on a
     non-Hadoop cluster are welcome as well. The core libraries are
     highly optimized to allow for good performance also for
     non-distributed algorithms.

 http://www.manning.com/owen/

 Mahout is a machine learning library. The algorithms it implements fall
 under the broad umbrella of “machine learning,” or “collective
 intelligence.” This can mean many things, but at the moment for Mahout
 it means primarily recommender engines, clustering, and classification.

 It is scalable. It attempts to provide implementations that use modern
 frameworks for splitting huge computations efficiently across many
 machines. Mahout aims to be the machine learning tool of choice when the
 data to be processed is far too big for a single machine. In its current
 incarnation, these scalable implementations are written in Java and
 built upon Apache's Hadoop project.

 It is a Java library. It does not provide a user interface, a
 pre-packaged server, or an installer. It is a framework of tools
 intended to be used and adapted by developers. Mahout can be deployed to
 solve problems if you are developing modern, intelligent applications or
 if you are leading a product team or startup that will leverage machine
 learning to create a competitive advantage.

 If you are a researcher in artificial intelligence, machine learning and
 related areas, your biggest obstacle is probably translating new
 algorithms into practice. Mahout provides a fertile framework for
 testing and deploying new large-scale algorithms.

 ...
 Some example uses:
 ...

  Recommender Engines

 Recommender engines are perhaps the most immediately recognizable
 machine learning technique in use today. We've all seen services or
 sites that attempt to recommend books or movies or articles based on our
 past actions. They try to infer tastes and preferences and identify
 unknown items that are of interest:
   • Amazon.com is perhaps the most famous commerce site to deploy
     recommendations. Based on purchases and site activity, Amazon
     recommends books and other items likely to be of interest. See
     figure 1.1.
   • Netflix similarly recommends DVDs that may be of interest, and
     famously offered a $1,000,000 prize to researchers who could
     improve the quality of their recommendations.
   • Social networking sites like Facebook use variants on recommender
     techniques to identify people most likely to be an as-yet-unconnected
     friend.
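
 To make the recommender idea (the "preference matching" mentioned above)
 a little more concrete, here is a toy user-based collaborative filter in
 plain Java. It is only a sketch under my own assumptions: it does not use
 Mahout's Taste classes, and the class name UserBasedRecommender, the
 choice of cosine similarity over co-rated items, and the sample ratings
 are all hypothetical.

```java
import java.util.*;

// Toy user-based collaborative filtering in plain Java. This is NOT
// Mahout's Taste API; class and method names here are hypothetical.
class UserBasedRecommender {
    // userId -> (itemId -> rating)
    private final Map<String, Map<String, Double>> prefs;

    UserBasedRecommender(Map<String, Map<String, Double>> prefs) {
        this.prefs = prefs;
    }

    // Cosine similarity computed only over the items both users rated.
    double similarity(String a, String b) {
        Map<String, Double> pa = prefs.get(a), pb = prefs.get(b);
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : pa.entrySet()) {
            Double rb = pb.get(e.getKey());
            if (rb == null) continue; // item not co-rated
            dot += e.getValue() * rb;
            na += e.getValue() * e.getValue();
            nb += rb * rb;
        }
        return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
    }

    // Ranks items the user has not rated by the similarity-weighted
    // average of other users' ratings and returns the top howMany.
    List<String> recommend(String user, int howMany) {
        Map<String, Double> mine = prefs.get(user);
        Map<String, Double> weighted = new HashMap<>();
        Map<String, Double> simSum = new HashMap<>();
        for (String other : prefs.keySet()) {
            if (other.equals(user)) continue;
            double sim = similarity(user, other);
            if (sim <= 0) continue; // skip users with no useful overlap
            for (Map.Entry<String, Double> e : prefs.get(other).entrySet()) {
                if (mine.containsKey(e.getKey())) continue; // already rated
                weighted.merge(e.getKey(), sim * e.getValue(), Double::sum);
                simSum.merge(e.getKey(), sim, Double::sum);
            }
        }
        List<String> items = new ArrayList<>(weighted.keySet());
        items.sort(Comparator.comparingDouble(
                (String i) -> weighted.get(i) / simSum.get(i)).reversed());
        return new ArrayList<>(items.subList(0, Math.min(howMany, items.size())));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> prefs = new HashMap<>();
        prefs.put("alice", Map.of("A", 5.0, "B", 3.0));
        prefs.put("bob", Map.of("A", 5.0, "B", 3.0, "C", 4.0));
        prefs.put("carol", Map.of("A", 1.0, "B", 5.0, "D", 2.0));
        System.out.println(new UserBasedRecommender(prefs).recommend("alice", 2)); // prints [C, D]
    }
}
```

 Cosine similarity is used here only because it is short; over
 all-positive ratings it rewards any overlap, so a real system would more
 likely use a mean-centred measure such as Pearson correlation.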

 ....

  Clustering

 Clustering turns up in less obvious but equally well-known contexts. As
 its name implies, clustering techniques attempt to group a large number
 of things together into clusters that share some similarity. It is a way
 to discover hierarchy and order in a large or hard-to-understand data
 set, and in that way reveal interesting patterns or make the data set
 easier to comprehend.

   • Google News groups news articles according to their topic using
     clustering techniques in order to present news grouped by logical
     story, rather than a raw listing of all articles. Figure 1.2 below
     illustrates this.
   • Search engines like Clusty group search results for similar reasons.
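
 A minimal k-means (Lloyd's algorithm) sketch shows the basic mechanics of
 grouping similar things. This is plain single-machine Java, not Mahout's
 Hadoop-based clustering; the class name KMeans and the rule "use the
 first k points as initial centroids" are my own assumptions, chosen for
 brevity and reproducibility.

```java
import java.util.Arrays;

// Toy k-means (Lloyd's algorithm) on double[] vectors; not Mahout's API.
class KMeans {
    // Returns the cluster index assigned to each point.
    static int[] cluster(double[][] points, int k, int maxIter) {
        if (k <= 0 || points.length < k) throw new IllegalArgumentException("need at least k points");
        // Hypothetical simple initialization: the first k points become the centroids.
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone();
        int dim = points[0].length;
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point joins its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = sqDist(points[p], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (labels[p] != best) { labels[p] = best; changed = true; }
            }
            if (!changed) break; // converged: no point moved
            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[labels[p]]++;
                for (int j = 0; j < dim; j++) sums[labels[p]][j] += points[p][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep an empty cluster's old centroid
                for (int j = 0; j < dim; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return labels;
    }

    // Squared Euclidean distance (the square root is not needed for comparisons).
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return s;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        System.out.println(Arrays.toString(cluster(pts, 2, 100))); // prints [0, 0, 1, 1]
    }
}
```

 Seeding from the first k points keeps the output deterministic but can
 converge to poor clusters; random or k-means++ seeding is the usual
 alternative.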

 ...

  Classification

 Classification techniques decide how much a thing is or isn't part of
 some type or category, or does or doesn't have some attribute.
 Classification is likewise ubiquitous, though even more behind the
 scenes. Often these systems “learn” by reviewing many instances of items
 of the categories in question in order to deduce classification rules.
 This general idea finds many applications:

   • Yahoo! Mail decides whether incoming messages are spam or not, based
     on prior emails and spam reports from users, as well as
     characteristics of the e-mail itself. A few messages classified as
     spam are shown in figure 1.3.
   • Picasa (http://picasa.google.com/) and other photo management
     applications can decide when a region of an image contains a human
     face.
   • Optical character recognition software classifies small regions of
     scanned text as individual characters.
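
 Spam filtering, the first example above, is often illustrated with a
 naive Bayes classifier. The following is a hedged, self-contained sketch
 of a multinomial naive Bayes text classifier with add-one (Laplace)
 smoothing; it is not Mahout's classifier API, and the class name
 NaiveBayes, the tokenizer, and the training sentences are hypothetical.

```java
import java.util.*;

// Toy multinomial naive Bayes text classifier with add-one smoothing.
// Not Mahout's API; all names here are hypothetical.
class NaiveBayes {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lower-cases the text and splits on runs of non-word characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) if (!t.isEmpty()) tokens.add(t);
        return tokens;
    }

    // Records one labelled training document.
    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, l -> new HashMap<>());
        for (String w : tokenize(text)) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    // Returns the label with the highest log posterior for the text.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // log prior: fraction of training documents with this label
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String w : tokenize(text)) {
                int c = counts.getOrDefault(w, 0);
                // add-one smoothing so unseen words do not zero out the score
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayes nb = new NaiveBayes();
        nb.train("spam", "win money now");
        nb.train("spam", "cheap money offer");
        nb.train("ham", "meeting agenda for monday");
        nb.train("ham", "project meeting notes");
        System.out.println(nb.classify("win cheap money")); // prints spam
        System.out.println(nb.classify("monday meeting"));  // prints ham
    }
}
```

 Scores are summed in log space so that long messages do not underflow to
 zero when many small probabilities are multiplied.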

 Niels
 http://nielsmayer.com
 _______________________________________________
 devs mailing list
 devs(a)xwiki.org
 http://lists.xwiki.org/mailman/listinfo/devs 
