|
Tutorial
This is a walk-through of the steps for using JBoost. Complete documentation
of its features can be found in the documentation
section. All links are internal to this site and point to
examples or documentation for the task in question.
-
Download and install JBoost.
-
Decide which features you are going to use for your dataset.
Boosting, unlike many other machine learning methods, is fairly robust
to "useless" (uninformative) features. In other words, include as many
features as time permits.
-
Create an input file. Start by
creating a single file that contains all examples. Alternatively, you can use
one of the example data files in the
demo directory.
-
Now that you have one file with all examples, you can do one of two
things: 1) perform cross validation or 2)
separate the data file into a training set and a test set. For
preliminary analysis, a single training/test split is faster to run and may
help you decide which features to add and how to tweak the boosting parameters.
-
To assist you in tweaking parameters, JBoost comes with a variety of
visualization tools. The two
most important things to look at are the error curves and the margin curves.
You should also look at the pdf/png of the ADTree as a
sanity check.
-
Eventually you will need to perform cross
validation. It is one of the best pieces of evidence
that your classifier works.
-
At this point you have completed the above steps and you have a robust
classification system, as demonstrated through cross validation. Classifying an
instance that already has a label (as is the case in test/train sets
and CV analysis) is rarely useful in itself. JBoost therefore provides a
mechanism for outputting the classifier
so that classification can be done independently of JBoost. This classifier can
be incorporated into your own web server or software so that it
can be used by others.
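
The two ways of partitioning the data mentioned in the steps above (a single training/test split, or cross validation) can be sketched in a few lines of Python. This is only an illustration of the idea, not JBoost's own tooling: the function names are hypothetical, and the sketch assumes that the file holds one example per line, which may not match your input file. Check the input file documentation for the actual format before using anything like this.

```python
import random
from pathlib import Path


def read_examples(path):
    """Return the non-empty lines of a file.

    Assumption: one example per line (not taken from the JBoost docs).
    """
    return [ln for ln in Path(path).read_text().splitlines() if ln.strip()]


def write_examples(path, examples):
    """Write one example per line."""
    Path(path).write_text("\n".join(examples) + "\n")


def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and return (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(examples, k=5, seed=0):
    """Yield k (train, test) pairs; each example is held out exactly once."""
    if not 2 <= k <= len(examples):
        raise ValueError("k must be between 2 and the number of examples")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, held_out in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, held_out
```

For a quick preliminary run, `split_examples(read_examples("all.data"))` would give one training list and one test list that can be saved with `write_examples`; here `all.data` is a placeholder name. For the final evaluation, `k_folds` yields one training/test pair per fold, and the classifier is trained and evaluated once on each pair.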
|