yawl

Pros and cons of classifier algorithms

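At RailsConf I came across a book called 'Programming Collective Intelligence' at the O'Reilly booth. It is really a book about data mining, and it is much easier to follow than other textbook-style books. Below are some useful excerpts: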

=knn=

+ new data can be added at any time; this does not require any computation at all, the data is simply added to the set.

- it requires all the training data to be present in order to make predictions. In a dataset with millions of examples, this is not just a space issue but also a time issue.
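The trade-off above can be sketched in a few lines. This is a minimal illustration rather than the book's code: the class name `KNNClassifier`, the sample points, and the choice of Euclidean distance with majority voting are all assumptions made for the example.

```python
import math
from collections import Counter


class KNNClassifier:
    """Toy k-nearest-neighbours classifier (hypothetical helper for illustration)."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []  # every (vector, label) pair is kept in memory

    def add(self, vector, label):
        # "Training" is just storing the example: no computation is needed.
        self.examples.append((tuple(vector), label))

    def predict(self, vector):
        # Prediction scans the whole stored dataset, which is where the
        # time and space cost of kNN shows up.
        nearest = sorted(
            self.examples, key=lambda ex: math.dist(ex[0], vector)
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


knn = KNNClassifier(k=3)
for point, label in [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]:
    knn.add(point, label)

print(knn.predict((1.1, 1.0)))
```

Adding a new class later is only a matter of calling `add` again; nothing has to be rebuilt, but every call to `predict` still sorts all stored examples.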

=svm=

+ after training they are very fast to classify new observations.

- black box technique. An SVM may give great answers, but you will never really know why.

- requires retraining if the data changes
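To see why classification is fast once training is done, consider the simplest case of a linear SVM, where a prediction is a single dot product plus a bias. The sketch below assumes that linear case; the function name `linear_svm_classify` and the weight and bias values are made up for illustration and do not come from any trained model. A kernel SVM works similarly but sums kernel evaluations over the support vectors only.

```python
def linear_svm_classify(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


# Hypothetical parameters standing in for the result of a training run.
weights = [0.8, -0.5]
bias = -0.1

print(linear_svm_classify(weights, bias, [1.0, 0.0]))
print(linear_svm_classify(weights, bias, [0.0, 1.0]))
```

The cost per prediction depends on the number of features, not on the size of the training set. The flip side is that `weights` and `bias` are fixed: if the training data changes, they have to be computed again.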


=neural network=

+ allow incremental training and generally don't require a lot of space to store the trained models.

- black box technique
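Incremental training can be illustrated with a single perceptron, which is far simpler than the multilayer networks the book covers but updates its weights one example at a time in the same spirit. The class name `Perceptron`, the learning rate, and the OR-function data are assumptions chosen for this sketch.

```python
class Perceptron:
    """Single-layer perceptron trained online (illustrative sketch only)."""

    def __init__(self, n_inputs, learning_rate=0.1):
        # The entire trained model is just these numbers.
        self.weights = [0.0] * n_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        total = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if total > 0 else 0

    def train_one(self, x, target):
        # One example adjusts the existing weights; nothing is retrained
        # from scratch.
        error = target - self.predict(x)
        self.weights = [
            w + self.learning_rate * error * xi
            for w, xi in zip(self.weights, x)
        ]
        self.bias += self.learning_rate * error


# Logical OR, a linearly separable toy problem.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

perceptron = Perceptron(n_inputs=2)
for _ in range(10):  # a few passes over the data
    for x, target in data:
        perceptron.train_one(x, target)

print([perceptron.predict(x) for x, _ in data])
```

The stored model is two weights and a bias regardless of how many examples were seen, but looking at those numbers says little about why a particular input was classified the way it was.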

=decision tree=

+ the trained model is easy to interpret, and it brings important factors to the top of the tree.

- have to start from scratch each time (decision trees that support incremental training are an active area of research)

- the tree can become extremely large and complex, which makes classification slow.
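The reason important factors end up at the top of the tree is that, at each node, the builder picks the feature that separates the outcomes best. A rough sketch of that selection step using entropy-based information gain follows; the function names and the tiny dataset (referrer, read-FAQ flag, outcome) are hypothetical and chosen only to make the effect visible.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, feature_index):
    """Entropy reduction from splitting rows on one feature.

    Each row is a tuple whose last element is the class label.
    """
    labels = [row[-1] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[feature_index], []).append(row[-1])
    weighted = sum(
        len(group) / len(rows) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - weighted


def best_feature(rows):
    """Index of the feature that would be placed at the root of the tree."""
    n_features = len(rows[0]) - 1
    return max(range(n_features), key=lambda i: information_gain(rows, i))


# Hypothetical data: (referrer, read_faq, outcome)
rows = [
    ("google", "yes", "premium"),
    ("google", "no", "none"),
    ("digg", "yes", "premium"),
    ("digg", "no", "none"),
]

print(best_feature(rows))
```

Here the second feature predicts the outcome perfectly while the first carries no information, so the second would become the root. Because this choice depends on the whole dataset, adding new rows can change which feature wins, which is why the tree is normally rebuilt from scratch.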

=naive bayesian=



+ speed is good for training and querying, even with large data sets

+ incremental

+ easy to interpret what the classifier has actually learned

- unable to deal with outcomes that change based on combinations of features.
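The incremental and interpretable nature of naive Bayes comes from the fact that the model is nothing more than a set of counts. The sketch below is an assumption-laden illustration, not the book's implementation: the class name `NaiveBayes`, the add-one smoothing, the choice to score only the features present in the input, and the sample words are all made up for the example.

```python
import math
from collections import defaultdict


class NaiveBayes:
    """Count-based naive Bayes over sets of features (illustrative sketch)."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # label -> feature -> number of training items containing it
        self.feature_counts = defaultdict(lambda: defaultdict(int))

    def train(self, features, label):
        # Training only increments counters, so new examples can be
        # added at any time without revisiting old ones.
        self.class_counts[label] += 1
        for feature in set(features):
            self.feature_counts[label][feature] += 1

    def log_score(self, features, label):
        total = sum(self.class_counts.values())
        n = self.class_counts[label]
        score = math.log(n / total)  # class prior
        for feature in set(features):
            # Add-one smoothing keeps unseen features from zeroing the score.
            count = self.feature_counts[label][feature]
            score += math.log((count + 1) / (n + 2))
        return score

    def classify(self, features):
        return max(
            self.class_counts,
            key=lambda label: self.log_score(features, label),
        )


nb = NaiveBayes()
nb.train(["cheap", "pills"], "spam")
nb.train(["meeting", "tomorrow"], "ham")
nb.train(["lunch", "tomorrow"], "ham")  # added later, no retraining needed

print(nb.classify(["cheap"]), nb.classify(["tomorrow"]))
```

Inspecting `nb.feature_counts` shows directly which features the classifier associates with each class. Each feature contributes to the score independently, though, which is the source of the limitation above: the model has no term for what a combination of features means together.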
Comments
#4 yawl 2008-11-07
"SVM are some of the most accurate classifiers for text; no other kind of classifier has been known to outperform it across the board over a large number of document collections"

--Soumen Chakrabarti, 'Mining the web'
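#3 coderplay 2008-11-06
What are the advantages of SVM? I have never worked with it; I only know how the algorithm works. Going by the description, it finds the hyperplanes that separate the classes, finds the few features that are most useful for classification, and then applies a kernel function. So for classifying high-dimensional data an SVM is very fast, almost like a direct hash lookup, isn't it?
#2 yawl 2008-10-31
I am currently working on sentiment analysis, and most of the papers I read in this area use SVM. kNN is rarely mentioned for comparison, probably mainly because of the time cost of classification.
#1 coderplay 2008-10-27
It must be great to attend conferences! The book is indeed easy to follow.
One more point about kNN that was not mentioned: it is lazy, meaning the training effectively happens at classification time. Training time is zero, so the time cost of classification is higher, but in text classification it has the highest accuracy. It is also almost the simplest algorithm in machine learning.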
