What is it?
Apache Mahout is a machine learning library built on Apache Hadoop. Many of its algorithms are implemented using the MapReduce programming model, but Mahout is not limited to it: the project also contains a set of core libraries optimized for non-distributed (single-machine) execution.
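To illustrate the MapReduce model, here is a minimal single-process sketch in Python. It is not Hadoop or Mahout code: the functions `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical, and the three phases simply run one after another in memory, whereas Hadoop would distribute them across a cluster. The job counts how often each word appears in a small set of documents.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(doc: str) -> Iterator[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key, as the framework does between phases.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def map_reduce(docs: list[str]) -> dict[str, int]:
    pairs = (pair for doc in docs for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data", "big information", "Data science"])
print(counts)  # {'big': 2, 'data': 2, 'information': 1, 'science': 1}
```

Because each map call and each reduce call only sees its own record or key, a framework such as Hadoop can run many of them in parallel on different machines.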
The library relies largely on Apache Hadoop, which allows it to process large data sets. When data is already stored in HDFS (the Hadoop Distributed File System), Mahout is a natural choice for data science work on that data. It can be used to find patterns in those data sets automatically and to extract business value from them. According to its authors, the aim of the project is to provide an easy path from big data to big information.
Mahout supports four main use cases, although it is not limited to them:
Recommendations
Finding content that suits individual users.
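The recommendation use case can be sketched with user-based collaborative filtering and cosine similarity. This is plain Python, not Mahout's recommender API; the function names `cosine` and `recommend`, the rating data and the scoring rule (a similarity-weighted sum of other users' ratings) are illustrative assumptions.

```python
import math

# user -> {item -> rating}
Ratings = dict[str, dict[str, float]]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Cosine similarity of two sparse rating vectors (missing ratings count as 0).
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings: Ratings, user: str, top_n: int = 2) -> list[str]:
    # Score each item the user has not rated by similarity-weighted ratings of other users.
    mine = ratings[user]
    scores: dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.__getitem__, reverse=True)[:top_n]


# Made-up ratings for illustration only.
ratings: Ratings = {
    "alice": {"book_a": 5, "book_b": 3},
    "bob": {"book_a": 4, "book_b": 3, "book_c": 5},
    "carol": {"book_a": 1, "book_b": 5, "book_d": 4},
}
print(recommend(ratings, "alice"))  # ['book_c', 'book_d']
```

Alice's ratings are closer to Bob's than to Carol's, so the item only Bob rated (`book_c`) ranks first.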
Clustering
Grouping similar items, for instance documents that share a subject or attributes.
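Clustering can be sketched with k-means, one of the clustering algorithms Mahout provides. The version below is a plain Python, single-machine sketch, not Mahout code; it assumes each document has already been turned into a two-number feature vector (here, hypothetical shares of finance and biology vocabulary) and, for simplicity, uses the first k points as the initial centroids.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 10) -> list[int]:
    # Naive initialisation: use the first k points as starting centroids.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


# (finance-term share, biology-term share) per document; made-up values
docs = [(0.9, 0.1), (0.1, 0.8), (0.8, 0.2), (0.2, 0.9), (0.85, 0.15), (0.15, 0.85)]
print(kmeans(docs, k=2))  # [0, 1, 0, 1, 0, 1]
```

The finance-heavy documents end up in cluster 0 and the biology-heavy ones in cluster 1; real text clustering would use much higher-dimensional term vectors.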
Classification
The system learns from existing documents with assigned categories and is then able to assign appropriate categories to new documents.
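Classification can be illustrated with multinomial naive Bayes, a classic text classification technique (Mahout includes naive Bayes classifiers). This is a plain Python sketch, not Mahout's implementation; the training data, category names and the functions `train` and `classify` are hypothetical, and add-one (Laplace) smoothing is used so that a word never seen in a category does not reduce that category's score to zero.

```python
import math
from collections import Counter, defaultdict

Model = tuple[dict[str, Counter], Counter, set[str]]


def train(docs: list[tuple[str, str]]) -> Model:
    # Count words per category and documents per category.
    word_counts: dict[str, Counter] = defaultdict(Counter)
    doc_counts: Counter = Counter()
    for text, category in docs:
        doc_counts[category] += 1
        word_counts[category].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(model: Model, text: str) -> str:
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best, best_score = "", -math.inf
    for category, n_docs in doc_counts.items():
        counts = word_counts[category]
        denom = sum(counts.values()) + len(vocab)
        # Log prior plus log likelihood of each word, with add-one smoothing.
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best, best_score = category, score
    return best


model = train([
    ("interest rate loan bank", "finance"),
    ("bank account deposit", "finance"),
    ("network signal mobile tower", "telecom"),
    ("mobile roaming signal", "telecom"),
])
print(classify(model, "loan from bank"))  # finance
print(classify(model, "mobile signal"))   # telecom
```

Working in log space avoids numeric underflow when documents contain many words.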
Frequent itemset mining
Finding items that usually appear together. For example, in an online shop the algorithm can analyse shopping carts to find products that are often bought together in one session and suggest them to the customer.
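The shopping-cart example can be sketched by counting which pairs of products appear in the same session and keeping only pairs whose support (the number of sessions containing them) reaches a threshold. This is a deliberately simplified, pairs-only version of frequent itemset mining in plain Python; Mahout's own implementation was based on a parallel version of the FP-Growth algorithm, and the functions `frequent_pairs` and `suggest`, the basket data and the threshold here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets: list[set[str]], min_support: int) -> dict[tuple[str, str], int]:
    # Count every unordered pair of items that appears in the same session.
    counts: Counter = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    # Keep only pairs that appear in at least min_support sessions.
    return {pair: n for pair, n in counts.items() if n >= min_support}


def suggest(item: str, pairs: dict[tuple[str, str], int]) -> list[str]:
    # Items that frequently co-occur with `item`, most frequent first.
    partners = {(b if a == item else a): n for (a, b), n in pairs.items() if item in (a, b)}
    return sorted(partners, key=lambda p: (-partners[p], p))


# Made-up shopping sessions for illustration only.
baskets = [
    {"coffee", "milk", "sugar"},
    {"coffee", "milk"},
    {"coffee", "sugar"},
    {"tea", "milk"},
    {"coffee", "milk", "cookies"},
]
pairs = frequent_pairs(baskets, min_support=2)
print(pairs)                     # {('coffee', 'milk'): 3, ('coffee', 'sugar'): 2}
print(suggest("coffee", pairs))  # ['milk', 'sugar']
```

A customer who puts coffee in the cart would be offered milk first, then sugar.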
Mahout can be used alongside other data processing and machine learning tools (e.g. Apache Spark), and it provides connectors that make it easy to exchange data between these systems.
A clear advantage is that Mahout is released under the open source Apache License 2.0, so it can be used in commercial products.
BlueSoft successfully uses Apache Mahout in projects for clients in industries such as finance, telecommunications and life sciences, and our expertise allows us to make full use of its capabilities.
Our company has extensive experience in business analysis, which helps our clients identify the problems that machine learning algorithms can best address. Our team of experienced programmers then implements these solutions while keeping costs under control.