Niels,
Do you have any successful uses already?
Mahout is certainly promising, but it still seems to be in its early days.
One thing that seemed almost implementable is "preference matching":
Mahout taste:
But I did not find the time to use it.
paul
On 7 Jan 2010, at 23:10, Niels Mayer wrote:
http://lucene.apache.org/mahout/
Mahout's goal is to build scalable machine learning libraries. With
scalable we mean:
- Scalable to reasonably large data sets. Our core algorithms for
  clustering, classification and batch-based collaborative filtering
  are implemented on top of Apache Hadoop using the map/reduce
  paradigm. However, we do not restrict contributions to Hadoop-based
  implementations: contributions that run on a single node or on a
  non-Hadoop cluster are welcome as well. The core libraries are highly
  optimized to allow for good performance also for non-distributed
  algorithms.
http://www.manning.com/owen/
Mahout is a machine learning library. The algorithms it implements fall
under the broad umbrella of “machine learning,” or “collective
intelligence.” This can mean many things, but at the moment for Mahout
it means primarily recommender engines, clustering, and classification.

It is scalable. It attempts to provide implementations that use modern
frameworks for splitting huge computations efficiently across many
machines. Mahout aims to be the machine learning tool of choice when
the data to be processed is far too big for a single machine. In its
current incarnation, these scalable implementations are written in Java
and built upon Apache's Hadoop project.

It is a Java library. It does not provide a user interface, a
pre-packaged server, or installer. It is a framework of tools intended
to be used and adapted by developers. Mahout can be deployed to solve
problems if you are developing modern, intelligent applications or if
you are leading a product team or startup that will leverage machine
learning to create a competitive advantage.

If you are a researcher in artificial intelligence, machine learning
and related areas, your biggest obstacle is probably translating new
algorithms into practice. Mahout provides a fertile framework for
testing and deploying new large-scale algorithms.
...
some example usage:
...
Recommender Engines
Recommender engines are perhaps the most immediately recognizable
machine learning technique in use today. We've all seen services or
sites that attempt to recommend books or movies or articles based on
our past actions. They try to infer tastes and preferences and identify
unknown items that are of interest:
• Amazon.com is perhaps the most famous commerce site to deploy
  recommendations. Based on purchases and site activity, Amazon
  recommends books and other items likely to be of interest. See
  figure 1.1.
• Netflix similarly recommends DVDs that may be of interest, and
  famously offered a $1,000,000 prize to researchers who could improve
  the quality of their recommendations.
• Social networking sites like Facebook use variants on recommender
  techniques to identify people most likely to be an as-yet-unconnected
  friend.
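To make the idea of inferring tastes from past actions a bit more
concrete, here is a minimal sketch of user-based collaborative
filtering in plain Python. It is not Mahout's API and not taken from
the book; the ratings data, the function names and the choice of cosine
similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. Invented for illustration.
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob":   {"book_a": 5.0, "book_b": 3.0, "book_d": 5.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_e": 4.0},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted mean rating."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0.0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only suggest items the user has not seen yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [item for _, item in ranked[:top_n]]

print(recommend("alice", RATINGS))
```

A real deployment would need far more data and a more careful
similarity measure; the sketch only shows the shape of the computation.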
....
Clustering
Clustering turns up in less obvious but equally well-known contexts. As
its name implies, clustering techniques attempt to group a large number
of things together into clusters that share some similarity. It is a
way to discover hierarchy and order in a large or hard-to-understand
data set, and in that way reveal interesting patterns or make the data
set easier to comprehend.
• Google News groups news articles according to their topic using
  clustering techniques in order to present news grouped by logical
  story, rather than a raw listing of all articles. Figure 1.2 below
  illustrates this.
• Search engines like Clusty group search results for similar reasons.
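As a rough illustration of grouping things that share some similarity,
the following is a small k-means sketch in plain Python. It is only one
of many clustering techniques, it is not what Google News or Clusty are
known to use, and it is not Mahout code; the 2-D points and the choice
of seeding the centroids with the first k points are assumptions.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, clusters

# Hypothetical data: two visually obvious groups of points.
POINTS = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]

centroids, clusters = kmeans(POINTS, k=2)
print(clusters)
```

For documents such as news articles, the points would typically be
high-dimensional term vectors rather than 2-D coordinates.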
...
Classification
Classification techniques decide how much a thing is or isn't part of
some type or category, or whether it does or doesn't have some
attribute. Classification is likewise ubiquitous, though even more
behind-the-scenes. Often these systems “learn” by reviewing many
instances of items of the categories in question in order to deduce
classification rules. This general idea finds many applications:
• Yahoo! Mail decides whether incoming messages are spam, or not, based
  on prior emails and spam reports from users, as well as
  characteristics of the e-mail itself. A few messages classified as
  spam are shown in figure 1.3.
• Picasa (http://picasa.google.com/) and other photo management
  applications can decide when a region of an image contains a human
  face.
• Optical character recognition software turns scanned text into
  individual characters by classifying small regions of the scan as
  individual characters.
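The "learn from many labelled instances" idea can be sketched with a
tiny naive Bayes spam filter in plain Python. This is an assumption
about one possible technique, not a description of how Yahoo! Mail
works and not Mahout's API; the training messages and function names
are invented for illustration.

```python
from collections import Counter
from math import log

# Hypothetical training messages, invented for illustration only.
TRAINING = [
    ("spam", "win money now"),
    ("spam", "cheap money offer"),
    ("ham", "meeting agenda for monday"),
    ("ham", "lunch on monday"),
]

def train(examples):
    """Count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for label, text in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability (add-one smoothing)."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, counts in word_counts.items():
        score = log(label_counts[label] / total)  # prior for this label
        denominator = sum(counts.values()) + len(vocabulary)
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += log((counts[word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = train(TRAINING)
print(classify("cheap money", word_counts, label_counts))
```

With so few training messages the result is only indicative; the point
is that the rules are deduced from examples rather than written by hand.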
Niels
http://nielsmayer.com
_______________________________________________
devs mailing list
devs(a)xwiki.org
http://lists.xwiki.org/mailman/listinfo/devs