1 Introduction
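Common decision tree algorithms: ID3, C4.5, CART
ID3: based on information entropy and information gain
C4.5: based on the information gain ratio; an extension of ID3
CART: based on the Gini index; usable for both regression and classification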
2 Implementation
1.1 ID3 decision tree
D: training set; a: attribute with V possible values
$input:a=\{a^{1},a^{2},...,a^{V}\}$
$output:Gain(D,a)$
model:
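$p_{i}$: the proportion of samples in D that belong to class i; |n| is the number of classes, and $D^{v}$ is the subset of D whose value of a is $a^{v}$
$Ent(D)=-\sum_{i=1}^{|n|}p_{i}\log_{2}p_{i}$
$Ent(D|a)=\sum^{V}_{v=1}\frac{|D^{v}|}{|D|}Ent(D^{v})$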
$Information\ Gain:Gain(D,a)=Ent(D)-Ent(D|a)$
Choose the attribute with the largest Gain as the splitting node.
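The entropy and information gain formulas can be sketched in a few lines of Python. This is a minimal illustration, not the post's own code: the function names (`entropy`, `information_gain`), the attribute names (`color`, `size`) and the four-row toy data set are all invented for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Ent(D) = -sum_i p_i * log2(p_i), p_i being the share of class i in D."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Gain(D, a) = Ent(D) - sum_v |D^v|/|D| * Ent(D^v)."""
    total = len(labels)
    # Group the labels into one subset D^v per value v of the attribute.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    conditional = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - conditional


# Toy data, invented for illustration only.
rows = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "big"},
    {"color": "blue", "size": "small"},
]
labels = ["yes", "yes", "no", "no"]

gains = {a: information_gain(rows, labels, a) for a in ("color", "size")}
best = max(gains, key=gains.get)  # ID3 splits on the attribute with the largest gain
print(gains, best)
```

In this toy set `color` separates the two classes perfectly, so it has the largest gain and would be chosen as the first split.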
1.2 C4.5 decision tree
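Defect of ID3: information gain favors attributes that take many distinct values, so when an attribute has too many values the resulting classification is less precise.
C4.5 corrects this by dividing the gain by the intrinsic value (IV) of the attribute.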
$Gain:Gain(D,a)=Ent(D)-Ent(D|a)$
$intrinsic\ value\ of\ a:$
$IV(a)=-\sum^{V}_{v=1}\frac{|D^{v}|}{|D|}log_{2}\frac{|D^{v}|}{|D|}$
$GainRatio(D,a)=\frac{Gain(D,a)}{IV(a)}$
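A short Python sketch of the gain ratio follows. It relies on the observation that IV(a) is the entropy of the attribute's own value distribution. The helper names and the toy data are assumptions made for illustration; returning 0.0 when IV(a) is zero is a choice of this sketch, not something the formulas specify.

```python
from collections import Counter
from math import log2


def entropy(values):
    """-sum_k p_k * log2(p_k) over the distinct items in `values`."""
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())


def gain_ratio(attr_values, labels):
    """GainRatio(D, a) = Gain(D, a) / IV(a)."""
    total = len(labels)
    # Group the labels into one subset D^v per attribute value v.
    subsets = {}
    for value, label in zip(attr_values, labels):
        subsets.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(s) / total * entropy(s) for s in subsets.values())
    # IV(a) is the entropy of the attribute's value distribution.
    iv = entropy(attr_values)
    # Sketch-level choice: an attribute with a single value gets ratio 0.
    return gain / iv if iv > 0 else 0.0


# Toy data, invented for illustration only.
labels = ["yes", "yes", "no", "no"]
row_id = ["r1", "r2", "r3", "r4"]   # unique per sample, like an ID column
binary = ["x", "x", "y", "y"]       # two values, aligned with the classes

ratio_id = gain_ratio(row_id, labels)
ratio_binary = gain_ratio(binary, labels)
print(ratio_id, ratio_binary)
```

Both attributes have the same information gain here, but the ID-like attribute has the larger intrinsic value and therefore the lower gain ratio, which is the correction C4.5 is after.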
1.3 CART decision tree
CART (Classification and Regression Tree) uses the Gini index to split the samples.
The decision tree models in Python's scikit-learn (sklearn) use the CART method.
-Classification tree: the target variable is discrete
-Regression tree: the target variable is continuous
$Gini(D)=\sum^{|n|}_{i=1}\sum_{i'\neq i}p_{i}p_{i'}=1-\sum^{|n|}_{i=1}p^{2}_{i}$
$GiniIndex(D,a)=\sum^{V}_{v=1}\frac{|D^{v}|}{|D|}Gini(D^{v})$
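The Gini computations can be sketched in Python as below. The function names and the toy data are assumptions for illustration only. CART picks the attribute with the smallest Gini index; the sketch shows just the scoring step, not a full tree builder.

```python
from collections import Counter


def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2, p_i being the share of class i in D."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def gini_index(attr_values, labels):
    """GiniIndex(D, a) = sum_v |D^v|/|D| * Gini(D^v)."""
    total = len(labels)
    # Group the labels into one subset D^v per attribute value v.
    subsets = {}
    for value, label in zip(attr_values, labels):
        subsets.setdefault(value, []).append(label)
    return sum(len(s) / total * gini(s) for s in subsets.values())


# Toy data, invented for illustration only.
labels = ["yes", "yes", "no", "no"]
attr_a = ["x", "x", "y", "y"]   # separates the classes perfectly
attr_b = ["x", "y", "x", "y"]   # carries no information about the class

scores = {"attr_a": gini_index(attr_a, labels), "attr_b": gini_index(attr_b, labels)}
best = min(scores, key=scores.get)  # smallest Gini index wins
print(gini(labels), scores, best)
```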
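3 Example
Catering company T is a large chain that produces many kinds of products, and its branches are numerous and spread across different locations. For senior management, knowing whether weekend and non-weekend sales differ substantially, and whether factors such as weather and promotions affect store sales, is very important for adopting sound marketing strategies and raising profit. Therefore, to give decision makers an accurate picture of the factors related to sales, a model needs to be built to analyze how weather, whether it is a weekend, and whether there is a promotion affect sales. The values of each attribute are as follows: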
4 Code
...
5 Problems
...
From: https://www.cnblogs.com/TangBao111/p/17659559.html