Save and Load Partially Trained Classifier
JBoost can store a partially trained classifier and continue training it on new data. This is useful in many cases, for example when the dataset is too large to fit in memory, or when one wants to extend a general classifier into a specialized one.
Multisession training works as follows. A classifier is first trained with the -serialTreeOutput option, which makes JBoost also write the classification tree to a file in Java Object format. That tree can then be passed to -serialTreeInput in later training sessions.
When a serialized tree is given at the command line, JBoost also lets users set -weightThreshold, the minimum boosting weight an example must have to be accepted. For instance, if -serialTreeInput example.tree and -weightThreshold 0.1 are given at the command line, JBoost skips every example whose boosting weight with respect to example.tree is less than 0.1.
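The filtering rule described above can be sketched in a few lines. This is only an illustration, not JBoost's code: it assumes the standard unnormalized AdaBoost weight exp(-y * F(x)) for labels y in {-1, +1}, and the names boosting_weight, filter_by_weight, and score_fn are hypothetical stand-ins (score_fn plays the role of the tree loaded with -serialTreeInput). JBoost may compute or normalize weights differently.

```python
import math


def boosting_weight(label, score):
    """Unnormalized AdaBoost weight exp(-y * F(x)); an assumption, see the text."""
    return math.exp(-label * score)


def filter_by_weight(examples, score_fn, threshold=0.0):
    """Split examples into (accepted, rejected) by boosting weight.

    examples  : iterable of (features, label) pairs, label in {-1, +1}
    score_fn  : hypothetical stand-in for the loaded tree; maps features to a score
    threshold : examples with weight strictly below this value are rejected
    """
    accepted, rejected = [], []
    for features, label in examples:
        weight = boosting_weight(label, score_fn(features))
        if weight < threshold:
            rejected.append((features, label))
        else:
            accepted.append((features, label))
    return accepted, rejected


# Toy data: each "feature" is a single number and the "tree" is the identity.
examples = [(2.5, +1), (0.3, +1), (-1.0, +1), (3.0, -1)]
accepted, rejected = filter_by_weight(examples, lambda x: x, threshold=0.1)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Under this assumption, only examples that the existing tree already classifies correctly with a large margin fall below the threshold. Because the weight is always positive, a threshold of 0 rejects nothing.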
Make sure that your current directory is /demo. The following command
runs 5 iterations of AdaBoost on the spambase dataset. Since the
-serialTreeOutput option is given, JBoost also writes the
Java Object classification tree to the file named after that option.
java jboost.controller.Controller -b AdaBoost -S spambase_0 -t spambase.train -T spambase.test -n spambase.spec -serialTreeOutput spambase_iter_5.tree -numRounds 5
Now we decide to continue training, starting from the tree saved by the previous command:
java jboost.controller.Controller -b AdaBoost -S spambase_1 -t spambase.train -T spambase.test -n spambase.spec -serialTreeInput spambase_iter_5.tree -numRounds 5 -weightThreshold 0.1
Here is an excerpt from the output stream.
Read 2300 training examples
Read 2383 training examples in 1.095 seconds.
Reject 1298 training examples
Monitor log level: 2
Reading test data
read 100 test examples
read 200 test examples
You can see that JBoost rejects 1298 training examples whose boosting weight is less than 0.1. If you do not want JBoost to reject any examples, omit the -weightThreshold option or, equivalently, use -weightThreshold 0. In this case, using -weightThreshold 0.1 does not increase the test error at all: the test columns of the two tables that follow are identical.
Output from .info using -weightThreshold 0.1
iter  bound   train   test
1     0.9341  0.1494  0.1065
2     0.8848  0.1318  0.1000
3     0.8575  0.1267  0.0957
4     0.8286  0.1125  0.0826
5     0.8007  0.1108  0.0804
Output from .info using -weightThreshold 0
iter  bound   train   test
1     0.9566  0.0967  0.1065
2     0.9263  0.0853  0.1000
3     0.9083  0.0820  0.0957
4     0.8912  0.0731  0.0826
5     0.8769  0.0717  0.0804
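The comparison of the two tables can also be checked mechanically. The sketch below assumes only the four-column layout shown in the excerpts above (iter, bound, train, test); a real .info file may contain additional lines, and parse_info is a hypothetical helper, not part of JBoost.

```python
def parse_info(text):
    """Parse whitespace-separated rows of: iter bound train test."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip the header and any line that does not start with an iteration number.
        if len(parts) != 4 or not parts[0].isdigit():
            continue
        rows.append({
            "iter": int(parts[0]),
            "bound": float(parts[1]),
            "train": float(parts[2]),
            "test": float(parts[3]),
        })
    return rows


# Values copied from the two excerpts above.
with_threshold = parse_info("""
iter bound train test
1 0.9341 0.1494 0.1065
2 0.8848 0.1318 0.1000
3 0.8575 0.1267 0.0957
4 0.8286 0.1125 0.0826
5 0.8007 0.1108 0.0804
""")
no_threshold = parse_info("""
iter bound train test
1 0.9566 0.0967 0.1065
2 0.9263 0.0853 0.1000
3 0.9083 0.0820 0.0957
4 0.8912 0.0731 0.0826
5 0.8769 0.0717 0.0804
""")

test_gaps = [abs(a["test"] - b["test"]) for a, b in zip(with_threshold, no_threshold)]
train_gaps = [round(a["train"] - b["train"], 4) for a, b in zip(with_threshold, no_threshold)]
print("largest test-error gap:", max(test_gaps))
print("train-error gaps:", train_gaps)
```

For these excerpts the test-error gap is zero at every iteration, while the bound and train columns differ between the two runs.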
This page last modified Thursday, 18-Jun-2009 03:10:41 UTC