
Tips

Machine learning classifiers can be very sensitive to data sets and parameters. The boosting methods in JBoost have proven quite robust compared to many other methods, but even JBoost has limitations. To minimize their effect, we have compiled a set of useful tips.

  • Perform cross validation. A script, nfold.py, is provided in the ./scripts directory. It will give you a better estimate of the generalization error of the classifier on your dataset. (A conceptual sketch of n-fold splitting appears after this list.)
  • Boost until the test error asymptotes. It has been shown empirically and theoretically that boosting for more rounds is frequently better and does not result in overfitting. This is not universally true, so use the script error.py to see how many rounds seem appropriate. Note that you can do this while the program is running. To learn more, see the error visualization tools. (A simple stopping rule is sketched after this list.)
  • Look at the margin distribution. The script margin.py is provided for visualizing margin distributions. See visualizing the margin for more details. (A sketch of computing the margin distribution by hand follows this list.)
  • Consider asymmetric cost. If your dataset resembles needles in a haystack, consider using an asymmetric cost. That is, if it is acceptable to collect a lot of hay in order to find a single needle, then be sure to tell JBoost this. More sophisticated methods are currently being implemented. (One common way to express such a cost is sketched after this list.)
  • Margin is more important than classification. The margin can be viewed as the amount of confidence the boosting algorithm has in a prediction. Not all classifications are made with equal confidence, and you can take advantage of this knowledge, for example by acting only on high-confidence predictions. (See the confidence-threshold sketch after this list.)
  • Use BrownBoost for noisy data. If your dataset has known outliers, or the generalization error of AdaBoost is poor, then BrownBoost may give better results. For documentation, see the BrownBoost algorithm and using BrownBoost in JBoost.
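
As a rough illustration of what n-fold cross validation does (independent of nfold.py, whose actual options and file formats are described in the documentation), the Python sketch below partitions a dataset into n disjoint folds and averages the test error over them. The train_and_test callback is a hypothetical placeholder for training and evaluating a classifier, e.g. by invoking JBoost.

    import random

    def nfold_indices(num_examples, n=5, seed=0):
        """Randomly partition the example indices into n disjoint folds."""
        indices = list(range(num_examples))
        random.Random(seed).shuffle(indices)
        return [indices[i::n] for i in range(n)]

    def cross_validate(examples, labels, train_and_test, n=5):
        """Average test error over n folds.

        train_and_test(train_set, test_set) is a hypothetical callback that
        trains a classifier on train_set and returns its error on test_set.
        """
        folds = nfold_indices(len(examples), n)
        errors = []
        for k in range(n):
            held_out = set(folds[k])
            train = [(examples[i], labels[i]) for i in range(len(examples)) if i not in held_out]
            test = [(examples[i], labels[i]) for i in folds[k]]
            errors.append(train_and_test(train, test))
        return sum(errors) / n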
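
A simple way to decide when the test error has asymptoted is to stop once recent rounds no longer improve it by a meaningful amount. The helper below is a sketch of that rule under assumed defaults; error.py is what actually produces the per-round error numbers you would feed into it.

    def has_asymptoted(test_errors, window=20, min_improvement=1e-3):
        """True if test error improved by less than min_improvement over the last `window` rounds.

        test_errors: per-round test error, oldest first.
        window and min_improvement are illustrative values, not JBoost settings.
        """
        if len(test_errors) <= window:
            return False
        return test_errors[-window - 1] - min(test_errors[-window:]) < min_improvement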
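
If you have a signed score F(x) for each example, the margin distribution is easy to compute by hand; margin.py automates this and the plotting. The sketch below normalizes the margins y*F(x) to [-1, 1] and returns their cumulative distribution.

    def margin_distribution(scores, labels):
        """Return sorted normalized margins and the fraction of examples at or below each.

        scores: signed real-valued outputs F(x) of the combined classifier
        labels: the corresponding +1/-1 labels
        The margin of an example is y * F(x); dividing by max |F(x)| keeps the
        values in [-1, 1] so distributions are comparable across boosting rounds.
        """
        scale = max(abs(s) for s in scores) or 1.0
        margins = sorted(y * s / scale for s, y in zip(scores, labels))
        cumulative = [(i + 1) / len(margins) for i in range(len(margins))]
        return margins, cumulative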
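
One common way to express an asymmetric cost in boosting is to give the rare (needle) class a larger share of the initial example weight, so that misclassifying a needle costs more than misclassifying hay. The sketch below illustrates only the idea; it is not JBoost's configuration syntax, so consult the documentation for the actual option.

    def asymmetric_initial_weights(labels, positive_cost=5.0, negative_cost=1.0):
        """Initial example weights where positives (needles) cost more to misclassify.

        The cost values are illustrative; in practice they reflect how much hay
        you are willing to sift through per needle. The weights are normalized
        to sum to 1, as a boosting distribution requires.
        """
        raw = [positive_cost if y > 0 else negative_cost for y in labels]
        total = sum(raw)
        return [w / total for w in raw]

    print(asymmetric_initial_weights([+1, -1, -1, -1]))  # the single needle gets 5/8 of the weight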
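
To make use of the margin as a confidence, you can threshold on the magnitude of the score rather than just its sign: predictions that fall too close to the decision boundary are abstained on or deferred to a human. The function and its default threshold below are illustrative.

    def predict_with_confidence(score, threshold=0.5):
        """Turn a signed classifier score into +1, -1, or None (abstain).

        |score| acts as the confidence: only predictions at least `threshold`
        away from the decision boundary are returned. The threshold is an
        assumed value that should be tuned on held-out data.
        """
        if abs(score) < threshold:
            return None  # low confidence: abstain or defer to a human
        return 1 if score > 0 else -1

    for s in (1.7, 0.2, -0.9):
        print(s, predict_with_confidence(s))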
