Apache Mahout
A Vector is instantiated and filled in for each object.

import org.apache.hadoop.io.Text;
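import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.clustering.kmeans.Cluster;
...
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// Open the clustering output as a Hadoop SequenceFile.
SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(
    "output/clusters-2/part-r-00000"), conf);
try {
    // The same key and value objects are refilled for every record.
    Text text = new Text();
    Cluster cluster = new Cluster();
    while (reader.next(text, cluster)) {
        System.out.println(text + " " + cluster);
    }
} finally {
    // Close the reader even if reading fails part-way through.
    reader.close();
}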
In the case above, each record read is a (text, cluster) pair. If we open the raw Hadoop output file in a text editor, we see the class names of the key (Text) and the value (Cluster) at the head of the file:
SEQorg.apache.hadoop.io.Text+org.apache.mahout.clustering.kmeans.Cluster...
In another case, where the value class is VectorWritable, the head of the file looks like this:
SEQorg.apache.hadoop.io.Text%org.apache.mahout.math.VectorWritable...
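The same information can be read programmatically without the Hadoop libraries. The following is a minimal sketch, not Hadoop's own reader: it assumes the header layout is the three bytes "SEQ", one version byte, and then the key and value class names, each preceded by a one-byte length (which holds only for names shorter than 128 bytes). The class name SeqHeaderPeek, the helper sampleHeader and the version value 6 are hypothetical and used only for illustration. Under this assumption, the "+" and "%" seen in a text editor are simply the length bytes 43 and 37 of the value class names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class SeqHeaderPeek {

    /** Returns {keyClassName, valueClassName} read from the start of the stream. */
    public static String[] readKeyValueClasses(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] magic = new byte[3];
        data.readFully(magic);
        if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
            throw new IOException("Not a SequenceFile: missing SEQ magic");
        }
        data.readByte(); // format version, not needed here
        return new String[] { readShortString(data), readShortString(data) };
    }

    // Assumption: the name is shorter than 128 bytes, so its length fits in one byte.
    private static String readShortString(DataInputStream data) throws IOException {
        int length = data.readByte();
        if (length < 0) {
            throw new IOException("Multi-byte length is not handled in this sketch");
        }
        byte[] bytes = new byte[length];
        data.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Builds an in-memory header for illustration (hypothetical helper). */
    public static byte[] sampleHeader(String keyClass, String valueClass) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write('S');
        out.write('E');
        out.write('Q');
        out.write(6); // version byte; the value 6 is an assumption
        for (String name : new String[] { keyClass, valueClass }) {
            byte[] bytes = name.getBytes(StandardCharsets.UTF_8);
            out.write(bytes.length);           // single-byte length prefix
            out.write(bytes, 0, bytes.length); // the class name itself
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] header = sampleHeader("org.apache.hadoop.io.Text",
                "org.apache.mahout.math.VectorWritable");
        String[] names = readKeyValueClasses(new ByteArrayInputStream(header));
        System.out.println("key class:   " + names[0]);
        System.out.println("value class: " + names[1]);
        // The length byte in front of the value class name, shown as a character.
        System.out.println("length byte: " + (char) header[5 + names[0].length()]);
    }
}
```

To inspect a real output file, readKeyValueClasses could be given a java.io.FileInputStream opened on a local copy of the file instead of the in-memory sample.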
The Mahout project uses the Apache Maven build and release system.
There are many Maven plug-ins for Eclipse. We need to install the correct one:
m2e - Maven Integration for Eclipse http://eclipse.org/m2e/
Eclipse install URL: http://download.eclipse.org/technology/m2e/releases/
Within Eclipse, we also need to install Subclipse.
Eclipse install URL: http://subclipse.tigris.org/update_1.8.x
RA layer request failed
Configuring the proxy settings in the Eclipse preferences does not solve this problem.
We have to edit the file named “servers” that is stored in the Subversion runtime configuration area.
On Windows: open the Run dialog, enter %APPDATA% and click OK, then open the Subversion folder.
On Linux:
$ cd ~/.subversion/
$ vim servers
Uncomment http-proxy-host and http-proxy-port under the [global] section.
Make sure that the edited lines contain no spaces (including any leading space left behind when the # is removed). Change
http-proxy-host = www-proxy.xxxx.se
http-proxy-port = 8080
to
http-proxy-host=www-proxy.xxxx.se
http-proxy-port=8080
Otherwise, Subversion reports an error:
Malformed file
svn: C:\Users\xxxxxx\AppData\Roaming\Subversion\servers:144: Option expected
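The manual edit described above can also be sketched in code. The following is only an illustration under stated assumptions, not a tool shipped with Subversion or Subclipse: the class name ServersProxyFix and the method enableProxy are hypothetical, and the sketch assumes the proxy options appear in the [global] section as commented lines such as "# http-proxy-host = ...". It rewrites those two lines without any spaces and leaves every other line untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ServersProxyFix {

    /** Returns a copy of the lines with the [global] proxy options uncommented and set. */
    public static List<String> enableProxy(List<String> lines, String host, int port) {
        List<String> result = new ArrayList<>();
        boolean inGlobal = false;
        for (String line : lines) {
            String trimmed = line.trim();
            // Track which section we are in; only [global] is modified.
            if (trimmed.startsWith("[")) {
                inGlobal = trimmed.equals("[global]");
            }
            // Drop one leading comment marker to see which option the line names.
            String option = trimmed.replaceFirst("^#\\s*", "");
            if (inGlobal && option.startsWith("http-proxy-host")) {
                result.add("http-proxy-host=" + host); // no leading or inner spaces
            } else if (inGlobal && option.startsWith("http-proxy-port")) {
                result.add("http-proxy-port=" + port);
            } else {
                result.add(line); // everything else is copied unchanged
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A shortened, made-up excerpt of a servers file, used only for the demo.
        List<String> servers = Arrays.asList(
                "[global]",
                "# http-proxy-host = defaulthost.whatever.com",
                "# http-proxy-port = 7000");
        for (String line : enableProxy(servers, "www-proxy.xxxx.se", 8080)) {
            System.out.println(line);
        }
    }
}
```

To apply this to the real file, the lines could be read and written back with java.nio.file.Files.readAllLines and Files.write, after making a backup copy of servers.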
Check out the latest Mahout source code