Commit 7039851

adding docs svm results

1 parent b06ab16

File tree

2 files changed: +50 additions, -22 deletions


docs/examples/ml/kmeans/kmeans.md

Lines changed: 26 additions & 22 deletions
@@ -16,7 +16,7 @@ The implementation details of k-means clustering in Twister2 is pictorially repr
 The constants which are used by the k-means algorithm to specify the number of workers, parallelism, dimension, size of datapoints,
 size of centroids, file system, number of iterations, datapoints and centroids directory.
 
-```text
+```java
 public static final String WORKERS = "workers";
 public static final String DIMENSIONS = "dim";
 public static final String PARALLELISM_VALUE = "parallelism";
@@ -38,7 +38,7 @@ parses the command line parameters submitted by the user for running the K-Means
 It first sets the submitted variables in the JobConfig object and put the JobConfig object into the
 Twister2Job Builder, set the worker class (KMeansWorker.java in this example) and submit the job.
 
-```text
+```java
 edu.iu.dsc.tws.examples.batch.kmeans.KMeansWorkerMain
 ```
 
@@ -58,7 +58,7 @@ The main functionality of the first task graph is to partition the data points,
 partitioned datapoints into two-dimensional array, and write the two-dimensional array into their
 respective task index values.
 
-```text
+```java
 /* First Graph to partition and read the partitioned data points **/
 DataObjectSource dataObjectSource = new DataObjectSource(Context.TWISTER2_DIRECT_EDGE,
 dataDirectory);
@@ -70,7 +70,7 @@ respective task index values.
 First, add the source, compute, and sink tasks to the task graph builder for the first task graph.
 Then, create the communication edges between the tasks for the first task graph.
 
-```text
+```java
 taskGraphBuilder.addSource("datapointsource", dataObjectSource, parallelismValue);
 ComputeConnection datapointComputeConnection = taskGraphBuilder.addCompute("datapointcompute",
 dataObjectCompute, parallelismValue);
@@ -87,7 +87,7 @@ Then, create the communication edges between the tasks for the first task graph.
 
 Finally, invoke the taskGraphBuilder to build the first task graph, get the task schedule plan and execution plan for the first task graph, and call the execute() method to execute the datapoints task graph. Once the execution is finished, the output values are retrieved in the "datapointsObject".
 
-```text
+```java
 //Build the first taskgraph
 DataFlowTaskGraph datapointsTaskGraph = taskGraphBuilder.build();
 //Get the execution plan for the first task graph
@@ -106,7 +106,7 @@ Finally, write the partitioned datapoints into their respective edges. The Local
 partition the datapoints based on the block whereas the LocalFixedInputPartitioner partition the
 datapoints based on the length of the file. For example, if the task parallelism is 4, if there are 16 data points each task will get 4 datapoints to process.
 
-```text
+```java
 @Override
 public void prepare(Config cfg, TaskContext context) {
 super.prepare(cfg, context);
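As an aside on the hunk above: the block-partitioning arithmetic it describes (parallelism 4, 16 data points, 4 points per task) can be sketched standalone. This is an illustrative Java sketch with invented names, not the Twister2 partitioner implementation:

```java
// Standalone sketch of even block partitioning: which index range of the
// data points a given task owns. Names here are illustrative only.
public class BlockPartitionSketch {
  // Returns the half-open range [start, end) of point indices for a task;
  // the last task absorbs any remainder.
  static int[] blockRange(int totalPoints, int parallelism, int taskIndex) {
    int blockSize = totalPoints / parallelism;   // points per task
    int start = taskIndex * blockSize;
    int end = (taskIndex == parallelism - 1) ? totalPoints : start + blockSize;
    return new int[]{start, end};
  }

  public static void main(String[] args) {
    // With 16 points and parallelism 4, task 2 owns points 8..12.
    int[] r = blockRange(16, 4, 2);
    System.out.println(r[0] + ".." + r[1]);
  }
}
```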
@@ -122,7 +122,7 @@ This class receives the partitioned datapoints as "IMessage" and convert those d
 two-dimensional for the k-means clustering process. The converted datapoints are send to the
 KMeansDataObjectDirectSink through "direct" edge.
 
-```text
+```java
 while (((Iterator) message.getContent()).hasNext()) {
 String val = String.valueOf(((Iterator) message.getContent()).next());
 String[] data = val.split(",");
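The conversion step documented in the hunk above (split each comma-separated record and parse it into a numeric row) can be sketched in isolation. The class and method names below are invented for illustration; this is not the KMeansDataObjectCompute code itself:

```java
import java.util.Arrays;

// Standalone sketch: parse one comma-separated record into a double[] row,
// the per-record step behind building the two-dimensional datapoint array.
public class CsvRowParseSketch {
  static double[] parseRow(String record) {
    String[] data = record.split(",");
    double[] row = new double[data.length];
    for (int i = 0; i < data.length; i++) {
      row[i] = Double.parseDouble(data[i].trim());
    }
    return row;
  }

  public static void main(String[] args) {
    // A two-feature point such as "0.25,0.75" becomes [0.25, 0.75].
    System.out.println(Arrays.toString(parseRow("0.25,0.75")));
  }
}
```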
@@ -140,7 +140,7 @@ This class receives the message object from the DataObjectCompute and write into
 task index values. First, it store the iterator values into the array list then it convert the array
 list values into double array values.
 
-```text
+```java
 @Override
 public boolean execute(IMessage message) {
 List<double[][]> values = new ArrayList<>();
@@ -158,7 +158,7 @@ list values into double array values.
 Finally, write the appropriate data points into their respective task index values with the entity
 partition values.
 
-```text
+```java
 @Override
 public DataPartition<double[][]> get() {
 return new EntityPartition<>(context.taskIndex(), dataPointsLocal);
@@ -175,7 +175,7 @@ but, with one major difference of read the complete file as one partition.
 2. KMeansDataObjectCompute, and
 3. KMeansDataObjectDirectSink
 
-```text
+```java
 DataFileReplicatedReadSource dataFileReplicatedReadSource = new DataFileReplicatedReadSource(
 Context.TWISTER2_DIRECT_EDGE, centroidDirectory);
 KMeansDataObjectCompute centroidObjectCompute = new KMeansDataObjectCompute(
@@ -185,7 +185,7 @@ but, with one major difference of read the complete file as one partition.
 
 Similar to the first task graph, it add the source, compute, and sink tasks to the task graph builder for the second task graph. Then, create the communication edges between the tasks for the second task graph.
 
-```text
+```java
 //Add source, compute, and sink tasks to the task graph builder for the second task graph
 taskGraphBuilder.addSource("centroidsource", dataFileReplicatedReadSource, parallelismValue);
 ComputeConnection centroidComputeConnection = taskGraphBuilder.addCompute("centroidcompute",
@@ -203,7 +203,7 @@ Similar to the first task graph, it add the source, compute, and sink tasks to t
 
 Finally, invoke the build() method to build the second task graph, get the task schedule plan and execution plan for the second task graph, and call the execute() method to execute the centroids task graph. Once the execution is finished, the output values are retrieved in the "centroidsDataObject".
 
-```text
+```java
 //Build the second taskgraph
 DataFlowTaskGraph centroidsTaskGraph = taskGraphBuilder.build();
 //Get the execution plan for the second task graph
@@ -221,7 +221,7 @@ This class uses the "LocalCompleteTextInputParitioner" to read the whole file fr
 directory and write into their task respective task index values using the "direct" task edge.
 For example, if the size of centroid value is 16, each task index receive 16 centroid values completely.
 
-```text
+```java
 public void prepare(Config cfg, TaskContext context) {
 super.prepare(cfg, context);
 ExecutionRuntime runtime = (ExecutionRuntime) cfg.get(ExecutorContext.TWISTER2_RUNTIME_OBJECT);
@@ -236,7 +236,7 @@ The third task graph has the following classes namely KMeansSource, KMeansAllRed
 CentroidAggregator. Similar to the first and second task graph, first we have to add the source,
 sink, and communication edges to the third task graph.
 
-```text
+```java
 /* Third Graph to do the actual calculation **/
 KMeansSourceTask kMeansSourceTask = new KMeansSourceTask();
 KMeansAllReduceTask kMeansAllReduceTask = new KMeansAllReduceTask();
@@ -259,7 +259,7 @@ The datapoint and centroid values are sent to the KMeansTaskGraph as "points" ob
 object as an input for further processing. Finally, it invokes the execute() method of the task
 executor to do the clustering process.
 
-```text
+```java
 //Perform the iterations from 0 to 'n' number of iterations
 for (int i = 0; i < iterations; i++) {
 ExecutionPlan plan = taskExecutor.plan(kmeansTaskGraph);
@@ -280,7 +280,7 @@ This process repeats for ‘n’ number of iterations as specified by the user.
 new centroid value is calculated and the calculated value is distributed across all the task instances.
 At the end of every iteration, the centroid value is updated and the iteration continues with the new centroid value.
 
-```text
+```java
 //retrieve the new centroid value for the next iterations
 centroidsDataObject = taskExecutor.getOutput(kmeansTaskGraph, plan, "kmeanssink");
 ```
@@ -289,7 +289,7 @@ At the end of every iteration, the centroid value is updated and the iteration c
 
 First, the execute method in KMeansJobSource retrieve the partitioned data points into their respective task index values and the complete centroid values into their respective task index values.
 
-```text
+```java
 @Override
 public void execute() {
 int dim = Integer.parseInt(config.getStringValue("dim"));
@@ -302,13 +302,13 @@ First, the execute method in KMeansJobSource retrieve the partitioned data point
 ```
 The retrieved data points and centroids are sent to the KMeansCalculator to perform the actual distance calculation using the Euclidean distance.
 
-```text
+```java
 kMeansCalculator = new KMeansCalculator(datapoints, centroid, dim);
 double[][] kMeansCenters = kMeansCalculator.calculate();
 ```
 
 Finally, each task instance write their calculated centroids value as given below:
-```text
+```java
 context.writeEnd("all-reduce", kMeansCenters);
 }
 ```
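The Euclidean-distance step mentioned in the hunk above can be sketched standalone: assign each point to the centroid with the smallest Euclidean distance. This is an illustrative sketch with invented names, not the Twister2 KMeansCalculator implementation:

```java
// Standalone sketch of nearest-centroid assignment by Euclidean distance.
public class NearestCentroidSketch {
  // Squared Euclidean distance; comparing squared values avoids the sqrt
  // without changing which centroid is nearest.
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0.0;
    for (int d = 0; d < a.length; d++) {
      double diff = a[d] - b[d];
      sum += diff * diff;
    }
    return sum;
  }

  // Index of the centroid closest to the given point.
  static int nearestCentroid(double[] point, double[][] centroids) {
    int best = 0;
    double bestDist = squaredDistance(point, centroids[0]);
    for (int k = 1; k < centroids.length; k++) {
      double dist = squaredDistance(point, centroids[k]);
      if (dist < bestDist) {
        bestDist = dist;
        best = k;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    double[][] centroids = {{0.0, 0.0}, {1.0, 1.0}};
    // The point (0.9, 0.8) is closer to centroid 1.
    System.out.println(nearestCentroid(new double[]{0.9, 0.8}, centroids));
  }
}
```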
@@ -317,7 +317,7 @@ Finally, each task instance write their calculated centroids value as given belo
 
 The KMeansAllReduceTask write the calculated centroid values of their partitioned datapoints into their respective task index values.
 
-```text
+```java
 @Override
 public boolean execute(IMessage message) {
 LOG.log(Level.FINE, "Received centroids: " + context.getWorkerId()
@@ -343,13 +343,13 @@ The KMeansAllReduceTask write the calculated centroid values of their partitione
 
 The CentroidAggregator implements the IFunction and the function OnMessage which accepts two objects as an argument.
 
-```text
+```java
 public Object onMessage(Object object1, Object object2)
 ```
 
 It sums the corresponding centroid values and return the same.
 
-```text
+```java
 ret.setCenters(newCentroids);
 ```
 
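The aggregation described in the hunk above (onMessage summing corresponding centroid values from two partial results) can be sketched standalone. The averaging step is an assumption based on standard k-means, where each summed coordinate is divided by its cluster's point count; all names below are invented for illustration and are not the CentroidAggregator API:

```java
// Standalone sketch of the all-reduce aggregation for k-means centroids.
public class CentroidAggregateSketch {
  // Element-wise sum of two partial centroid tables, mirroring what an
  // onMessage(object1, object2)-style reduce function computes.
  static double[][] sum(double[][] c1, double[][] c2) {
    double[][] out = new double[c1.length][c1[0].length];
    for (int k = 0; k < c1.length; k++) {
      for (int d = 0; d < c1[0].length; d++) {
        out[k][d] = c1[k][d] + c2[k][d];
      }
    }
    return out;
  }

  // Assumed finishing step (standard k-means): divide each summed centroid
  // by the number of points assigned to that cluster.
  static double[][] average(double[][] sums, int[] counts) {
    double[][] out = new double[sums.length][sums[0].length];
    for (int k = 0; k < sums.length; k++) {
      for (int d = 0; d < sums[0].length; d++) {
        out[k][d] = sums[k][d] / counts[k];
      }
    }
    return out;
  }
}
```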
@@ -371,6 +371,7 @@ K-Means clustering process.
 
 ### Sample Output
 
+```bash
 [2019-03-25 15:27:01 -0400] [INFO] [worker-0] [main] edu.iu.dsc.tws.examples.batch.kmeans.KMeansWorker:
 Final Centroids After 100 iterations [[0.2535406313735363, 0.25640515489554255],
 [0.7236140928643464, 0.7530306848028933], [0.7481226889281528, 0.24480221871888594],
@@ -389,3 +390,6 @@ Worker finished executing - 0
 
 [2019-03-25 15:27:01 -0400] [INFO] [-] [JM] edu.iu.dsc.tws.master.server.JobMaster: All 2 workers have completed.
 JobMaster is stopping.
+
+
+```

docs/examples/ml/svm/svm.md

Lines changed: 24 additions & 0 deletions
@@ -350,6 +350,23 @@ to convert data to the dense format.
 ./bin/twister2 submit standalone jar examples/libexamples-java.jar edu.iu.dsc.tws.examples.ml.svm.SVMRunner -ram_mb 4096 -disk_gb 2 -instances 1 -alpha 0.1 -C 1.0 -exp_name test-svm -features 22 -samples 35000 -iterations 10 -training_data_dir <path-to-training-csv> -testing_data_dir <path-to-testing-csv> -parallelism 8 -workers 1 -cpus 1 -threads 4
 ```
 
+#### Sample Output
+
+```bash
+======================================================================================
+SVM Task Summary : [test-svm]
+======================================================================================
+Training Dataset [/home/vibhatha/data/svm/w8a/training.csv]
+Testing Dataset [/home/vibhatha/data/svm/w8a/testing.csv]
+Data Loading Time (Training + Testing) = 1.943881115 s
+Training Time = 7.978291269 s
+Testing Time = 0.828260105 s
+Total Time (Data Loading Time + Training Time + Testing Time) = 10.750432489 s
+Accuracy of the Trained Model = 88.904494382 %
+======================================================================================
+
+
+```
 
 
 ##Distributed SVM Batch Model - Tset Example
@@ -582,4 +599,11 @@ For that a simple map function can be plugged into the TSetLink.
 
 ```
 
+#### Sample Output
+
+```bash
+[2019-03-28 16:40:31 -0400] [INFO] [worker-0] [main] edu.iu.dsc.tws.examples.ml.svm.job.SvmSgdTsetRunner: Training Accuracy : 88.049368
+
+```
+
 ###### Note make sure you have formatted the CSV files as instructed in the SVM Task Example.
