1 Description of the Logistic Regression Algorithm
How it works:
To build a Logistic regression classifier, multiply each feature by a regression coefficient, sum all of the products, and feed that sum into the Sigmoid function, which yields a value between 0 and 1. Any value above 0.5 is assigned to class 1, and any value below 0.5 is assigned to class 0. Logistic regression can therefore also be viewed as a form of probability estimation.
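A minimal sketch of this decision rule (the feature values and coefficients below are made up purely for illustration):

import numpy as np

def sigmoid(z):
    # maps any real-valued input into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical example: a constant term X0 = 1.0 plus two features, and three regression coefficients
x = np.array([1.0, 0.8, -1.2])     # feature vector
w = np.array([4.0, 0.5, -0.6])     # regression coefficients
prob = sigmoid(np.dot(x, w))       # weighted sum passed through the Sigmoid function
label = 1 if prob > 0.5 else 0     # threshold at 0.5
print(prob, label)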
2 Gradient Ascent Pseudocode
(1) Gradient ascent pseudocode:
Initialize every regression coefficient to 1
Repeat R times:
    compute the gradient over the entire data set
    update the coefficient vector by alpha * gradient
Return the regression coefficients
(2) Stochastic gradient ascent pseudocode:
Gradient ascent has to traverse the entire data set every time it updates the regression coefficients, which is too computationally expensive. An improvement is to update the coefficients with only one sample at a time; this method is called stochastic gradient ascent.
Because the classifier can be updated incrementally as new samples arrive, stochastic gradient ascent is an online learning algorithm. Processing all of the data at once is called "batch processing".
Initialize all regression coefficients to 1
For each sample in the data set:
    compute the gradient for that sample
    update the coefficients by alpha * gradient
Return the regression coefficients
3 Reference notes: least squares, maximum likelihood estimation, the Sigmoid function, gradient ascent, gradient descent, and handling missing values
(1) Least squares estimation (LSE)
Find an estimate (or a set of estimates) that minimizes the distance between the observed values and the fitted values. Ideally one would sum the absolute differences and minimize that, but absolute values are awkward to minimize analytically; the usual substitute is to find the estimate(s) that minimize the sum of the squared differences between the observed and fitted values, which is the least-squares criterion.
The English term is "least squares", literally "smallest squares". Taking the derivative of this sum of squared differences with respect to the parameters and setting the first derivative to zero yields the ordinary least squares estimate (OLSE).
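In symbols, with observed values y_i and fitted values ŷ_i(θ) (standard notation, not from the original notes):

$$\min_{\theta}\; S(\theta)=\sum_{i=1}^{m}\bigl(y_i-\hat{y}_i(\theta)\bigr)^{2},\qquad \frac{\partial S(\theta)}{\partial\theta}=0 .$$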
(2) Maximum likelihood estimation (MLE)
We already have many samples (all of the observed values of the dependent variable in the data set), and these values have been realized. Maximum likelihood estimation finds the parameter estimate (or set of estimates) that makes the samples we have already observed as probable as possible.
Since the samples in hand have already occurred, it is only logical to choose parameters under which their probability is greatest. This means maximizing the joint probability of all observations, which is a product; taking the logarithm turns it into a linear sum. Differentiating with respect to the parameters, setting the first derivative to zero, and solving the resulting equation(s) gives the maximum likelihood estimate.
In short: use the known sample outcomes to infer the parameter values most likely (with the highest probability) to have produced them.
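In symbols (standard notation): the likelihood is the joint probability of the observed samples, and taking the logarithm turns the product into a sum:

$$L(\theta)=\prod_{i=1}^{m} p(y_i\mid x_i;\theta),\qquad \ell(\theta)=\log L(\theta)=\sum_{i=1}^{m}\log p(y_i\mid x_i;\theta),\qquad \frac{\partial \ell(\theta)}{\partial\theta}=0 .$$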
(3) Sigmoid function
The Sigmoid function is σ(z) = 1 / (1 + e^(-z)). It maps any real-valued input into the interval (0, 1), which is what lets the weighted sum of the features be read as a probability.
(4) Gradient ascent
To maximize a likelihood function, use gradient ascent.
(5) Gradient descent
To minimize a loss function, use gradient descent.
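For logistic regression in particular, maximizing the log-likelihood by gradient ascent gives exactly the update used in the code in section 5. With learning rate α, design matrix X, label vector y, and predictions h = σ(Xw):

$$w \leftarrow w + \alpha\,\nabla_{w}\,\ell(w) = w + \alpha\, X^{\top}\bigl(y-\sigma(Xw)\bigr).$$

Gradient descent uses the same rule with a minus sign, stepping against the gradient of a loss function.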
(6) Handling missing values (a pandas sketch follows this list)
- Fill missing values with the mean of the available values for that feature
- Fill missing values with a special value, e.g. -1
- Ignore samples that contain missing values
- Fill missing values with the mean of similar samples
- Predict missing values with another machine learning algorithm
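A minimal sketch of the first three strategies using pandas (the DataFrame and column names below are hypothetical, not the horse-colic data):

import numpy as np
import pandas as pd

df = pd.DataFrame({'feature_a': [1.0, np.nan, 3.0],
                   'feature_b': [np.nan, 0.5, 1.5]})
# strategy 1: replace each missing entry with the mean of its column
df_mean_filled = df.fillna(df.mean())
# strategy 2: replace missing entries with a special value such as -1
df_special_filled = df.fillna(-1)
# strategy 3: drop samples (rows) that contain missing values
df_dropped = df.dropna()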
4 Pros and Cons of Logistic Regression
(1) Pros
Computationally inexpensive; easy to understand and implement.
(2) Cons
Prone to underfitting; classification accuracy may be low.
5 Python Implementation
(1) Logistic regression gradient ascent optimization algorithm
import numpy as np


def loadDataSet():
    """
    Open the text file and read it line by line.
    :return: the data matrix (with a constant X0 = 1.0 column) and the list of class labels
    """
    # feature data
    dataMat = []
    # class labels
    labelMat = []
    # open the test-set text file
    fr = open('testSet.txt')
    # read line by line
    for line in fr.readlines():
        # strip leading/trailing whitespace from the line and split on whitespace
        lineArr = line.strip().split()
        # X0, X1, X2
        dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])
        # class label
        labelMat.append(int(lineArr[2]))
    return dataMat, labelMat


def sigmoid(inX):
    """
    Sigmoid function.
    :param inX: a scalar, array, or matrix
    :return: value(s) in (0, 1)
    """
    return 1.0/(1 + np.exp(-inX))


def gradAscent(dataMatIn, classLabels):
    """
    Batch gradient ascent.
    :param dataMatIn: 2-D list of features, m samples by n features
    :param classLabels: list of class labels
    :return: n*1 matrix of regression coefficients
    """
    # convert the data set into a numpy matrix
    dataMatrix = np.mat(dataMatIn)
    # convert the label list into a numpy matrix and transpose it (row vector -> column vector)
    labelMat = np.mat(classLabels).transpose()
    # shape of the data matrix
    m, n = np.shape(dataMatrix)
    # step size toward the target
    alpha = 0.001
    # number of iterations
    maxCycles = 500
    # coefficient matrix (column vector), n*1
    weights = np.ones((n, 1))
    for k in range(maxCycles):
        # h is a column vector with one entry per sample
        # an m*n matrix times an n*1 matrix gives an m*1 matrix
        h = sigmoid(dataMatrix*weights)
        # difference between the true class and the predicted class
        # error is a column vector of residuals
        error = (labelMat - h)
        # adjust the coefficients in the direction of the error
        # dataMatrix.transpose()*error is an n*m matrix times an m*1 matrix, giving an n*1 matrix
        weights = weights + alpha*dataMatrix.transpose()*error
    return weights


# result_weight = gradAscent(loadDataSet()[0], loadDataSet()[1])
# print(result_weight)
# print(type(result_weight))
(2) Stochastic gradient ascent algorithm
def stocGradAscent0(dataMatrix, classLabels):
    """
    Stochastic gradient ascent: one pass over the data, updating on one sample at a time.
    dataMatrix should be a numpy array.
    """
    m, n = np.shape(dataMatrix)
    alpha = 0.01
    # 1*n array of weights, initialized to all ones
    weights = np.ones(n)
    for i in range(m):
        # dataMatrix[i]*weights is an element-wise product of two 1*n arrays
        # h is a single number
        h = sigmoid(sum(dataMatrix[i]*weights))
        error = classLabels[i] - h
        weights = weights + alpha * error * dataMatrix[i]
    return weights
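Usage sketch (assuming testSet.txt is present, as in loadDataSet above); the function expects a numpy array so the element-wise products work:

dataArr, labelList = loadDataSet()
weights0 = stocGradAscent0(np.array(dataArr), labelList)
print(weights0)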
(3) Improved stochastic gradient ascent algorithm
def stocGradAscent1(dataMatrix, classLabels, numIter=150):
    """
    Improved stochastic gradient ascent: samples are chosen at random and alpha decays over time.
    """
    import random
    dataMatrix = np.array(dataMatrix)
    m, n = np.shape(dataMatrix)
    # row vector of weights
    weights = np.ones(n)
    for j in range(numIter):
        dataIndex = list(range(m))
        for i in range(m):
            # alpha is adjusted on every iteration, which damps the oscillation caused by high-frequency fluctuations
            # it shrinks as the iterations progress but never reaches 0, so new data always keeps some influence
            # if the problem is dynamic, the constant term can be enlarged so that new values get larger updates
            # j is the iteration number, i is the index of the sample within this pass
            alpha = 4/(1.0 + j + i) + 0.01
            # pick a sample from the data set at random
            randIndex = int(random.uniform(0, len(dataIndex)))
            # scalar
            h = sigmoid(sum(dataMatrix[dataIndex[randIndex]]*weights))
            # scalar
            error = classLabels[dataIndex[randIndex]] - h
            weights = weights + alpha * error * dataMatrix[dataIndex[randIndex]]
            # remove the chosen index so each sample is used once per pass
            del(dataIndex[randIndex])
    return weights


# result_weight = stocGradAscent1(loadDataSet()[0], loadDataSet()[1])
# print(result_weight)
# plotBestFit(result_weight)
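plotBestFit is called above but never listed in these notes; a minimal sketch, assuming matplotlib and the two-feature testSet.txt layout used by loadDataSet, could look like this:

import matplotlib.pyplot as plt


def plotBestFit(weights):
    # accept either the 1-D array returned by stocGradAscent* or the n*1 matrix returned by gradAscent
    weights = np.asarray(weights).flatten()
    dataMat, labelMat = loadDataSet()
    dataArr = np.array(dataMat)
    n = np.shape(dataArr)[0]
    xcord1, ycord1 = [], []
    xcord0, ycord0 = [], []
    for i in range(n):
        if int(labelMat[i]) == 1:
            xcord1.append(dataArr[i, 1])
            ycord1.append(dataArr[i, 2])
        else:
            xcord0.append(dataArr[i, 1])
            ycord0.append(dataArr[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcord1, ycord1, s=30, c='red', marker='s')
    ax.scatter(xcord0, ycord0, s=30, c='green')
    # decision boundary: w0 + w1*x1 + w2*x2 = 0, i.e. where the Sigmoid argument is zero
    x = np.arange(-3.0, 3.0, 0.1)
    y = (-weights[0] - weights[1] * x) / weights[2]
    ax.plot(x, y)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()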
6 Example: Estimating the Mortality Rate of Sick Horses from Colic Symptoms
"""
Use Logistic regression to estimate the mortality rate of horses with colic.
1. Collect the data:
    the data files are provided
2. Prepare the data:
    parse the text with Python and fill in missing values
3. Analyze the data:
    visualize and inspect the data
4. Train the algorithm:
    use an optimization algorithm to find the best coefficients
5. Test the algorithm:
    to quantify how well the regression works, measure the error rate;
    based on the error rate, decide whether to go back to the training step and tune parameters such as the number of iterations and the step size to get better coefficients
6. Use the algorithm:
    it is straightforward to build a simple command-line program that collects a horse's symptoms and outputs a prediction
Notes:
    about 30% of the values in the data set are missing
Ways to handle the missing data:
    1. fill missing values with the mean of the available values for that feature
    2. fill missing values with a special value
    3. ignore samples that have missing values
    4. fill missing values with the mean of similar samples
    5. predict missing values with another machine learning algorithm
"""
from LogisticRegres import *
import numpy as np


def classifyVector(inX, weights):
    """
    :param inX: feature vector
    :param weights: regression coefficients
    :return: predicted class, 1.0 or 0.0
    """
    prob = sigmoid(sum(inX*weights))
    if prob > 0.5:
        return 1.0
    else:
        return 0.0
def colicTest():
    frTrain = open('horseColicTraining.txt')
    frTest = open('horseColicTest.txt')
    trainingSet = []
    trainingLabels = []
    for line in frTrain.readlines():
        currLine = line.strip().split()
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        trainingSet.append(lineArr)
        trainingLabels.append(float(currLine[21]))
    # train with the improved stochastic gradient ascent, 500 passes
    trainingWeights = stocGradAscent1(np.array(trainingSet), trainingLabels, 500)
    errorCount = 0
    numTestVec = 0.0
    for line in frTest.readlines():
        numTestVec += 1.0
        currLine = line.strip().split()
        lineArr = []
        for i in range(21):
            lineArr.append(float(currLine[i]))
        # float() first so labels written as either "1" or "1.0" parse correctly
        if int(classifyVector(np.array(lineArr), trainingWeights)) != int(float(currLine[21])):
            errorCount += 1
    errorRate = float(errorCount)/numTestVec
    print("the error rate is %f" % errorRate)
    return errorRate
def multiTest():
    numTests = 10
    errorSum = 0.0
    for k in range(numTests):
        errorSum += colicTest()
    print("after {} iterations the average error rate is: {}".format(numTests, errorSum/float(numTests)))


multiTest()
"""
Logistic 回归的目的是寻找一个非线性函数 Sigmoid 的最佳拟合参数,求解过程可以由最优化算法来完成
随机梯度上升算法 与 梯度上升算法 效果相当,但占用更少的计算资源
随机梯度上升算法是一个在线算法,可以在新数据到来时完成参数更新,而不需要重新读取整个数据集来进行运算
"""
7 Reproducing the Book's Example with pandas and scikit-learn
In [2]:
import pandas as pd
import numpy as np
from pandas import Series, DataFrame
np.set_printoptions(precision=4)
In [11]:
col_names = []
for i in range(21):
col_names.append('feature_{}'.format(i))
# print(col_names)
train_dataSet_df = pd.read_table('horseColicTraining.txt', names=col_names+['label'])
train_dataSet_df
Out[11]:
index | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | … | feature_12 | feature_13 | feature_14 | feature_15 | feature_16 | feature_17 | feature_18 | feature_19 | feature_20 | label
0 | 2.0 | 1.0 | 38.5 | 66.0 | 28.0 | 3.0 | 3.0 | 0.0 | 2.0 | 5.0 | … | 0.0 | 0.0 | 0.0 | 3.0 | 5.0 | 45.0 | 8.4 | 0.0 | 0.0 | 0.0 |
1 | 1.0 | 1.0 | 39.2 | 88.0 | 20.0 | 0.0 | 0.0 | 4.0 | 1.0 | 3.0 | … | 0.0 | 0.0 | 0.0 | 4.0 | 2.0 | 50.0 | 85.0 | 2.0 | 2.0 | 0.0 |
2 | 2.0 | 1.0 | 38.3 | 40.0 | 24.0 | 1.0 | 1.0 | 3.0 | 1.0 | 3.0 | … | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 33.0 | 6.7 | 0.0 | 0.0 | 1.0 |
3 | 1.0 | 9.0 | 39.1 | 164.0 | 84.0 | 4.0 | 1.0 | 6.0 | 2.0 | 2.0 | … | 1.0 | 2.0 | 5.0 | 3.0 | 0.0 | 48.0 | 7.2 | 3.0 | 5.3 | 0.0 |
4 | 2.0 | 1.0 | 37.3 | 104.0 | 35.0 | 0.0 | 0.0 | 6.0 | 2.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 74.0 | 7.4 | 0.0 | 0.0 | 0.0 |
5 | 2.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 1.0 | 3.0 | 1.0 | 2.0 | … | 2.0 | 1.0 | 0.0 | 3.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
6 | 1.0 | 1.0 | 37.9 | 48.0 | 16.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | … | 1.0 | 1.0 | 0.0 | 3.0 | 5.0 | 37.0 | 7.0 | 0.0 | 0.0 | 1.0 |
7 | 1.0 | 1.0 | 0.0 | 60.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | … | 2.0 | 1.0 | 0.0 | 3.0 | 4.0 | 44.0 | 8.3 | 0.0 | 0.0 | 0.0 |
8 | 2.0 | 1.0 | 0.0 | 80.0 | 36.0 | 3.0 | 4.0 | 3.0 | 1.0 | 4.0 | … | 2.0 | 1.0 | 0.0 | 3.0 | 5.0 | 38.0 | 6.2 | 0.0 | 0.0 | 0.0 |
9 | 2.0 | 9.0 | 38.3 | 90.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 5.0 | … | 2.0 | 1.0 | 0.0 | 3.0 | 0.0 | 40.0 | 6.2 | 1.0 | 2.2 | 1.0 |
10 | 1.0 | 1.0 | 38.1 | 66.0 | 12.0 | 3.0 | 3.0 | 5.0 | 1.0 | 3.0 | … | 2.0 | 1.0 | 3.0 | 2.0 | 5.0 | 44.0 | 6.0 | 2.0 | 3.6 | 1.0 |
11 | 2.0 | 1.0 | 39.1 | 72.0 | 52.0 | 2.0 | 0.0 | 2.0 | 1.0 | 2.0 | … | 1.0 | 1.0 | 0.0 | 4.0 | 4.0 | 50.0 | 7.8 | 0.0 | 0.0 | 1.0 |
12 | 1.0 | 1.0 | 37.2 | 42.0 | 12.0 | 2.0 | 1.0 | 1.0 | 1.0 | 3.0 | … | 3.0 | 1.0 | 0.0 | 4.0 | 5.0 | 0.0 | 7.0 | 0.0 | 0.0 | 1.0 |
13 | 2.0 | 9.0 | 38.0 | 92.0 | 28.0 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | … | 3.0 | 0.0 | 7.2 | 1.0 | 1.0 | 37.0 | 6.1 | 1.0 | 0.0 | 0.0 |
14 | 1.0 | 1.0 | 38.2 | 76.0 | 28.0 | 3.0 | 1.0 | 1.0 | 1.0 | 3.0 | … | 2.0 | 2.0 | 0.0 | 4.0 | 4.0 | 46.0 | 81.0 | 1.0 | 2.0 | 1.0 |
15 | 1.0 | 1.0 | 37.6 | 96.0 | 48.0 | 3.0 | 1.0 | 4.0 | 1.0 | 5.0 | … | 2.0 | 3.0 | 4.5 | 4.0 | 0.0 | 45.0 | 6.8 | 0.0 | 0.0 | 0.0 |
16 | 1.0 | 9.0 | 0.0 | 128.0 | 36.0 | 3.0 | 3.0 | 4.0 | 2.0 | 4.0 | … | 3.0 | 0.0 | 0.0 | 4.0 | 5.0 | 53.0 | 7.8 | 3.0 | 4.7 | 0.0 |
17 | 2.0 | 1.0 | 37.5 | 48.0 | 24.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
18 | 1.0 | 1.0 | 37.6 | 64.0 | 21.0 | 1.0 | 1.0 | 2.0 | 1.0 | 2.0 | … | 1.0 | 1.0 | 0.0 | 2.0 | 5.0 | 40.0 | 7.0 | 1.0 | 0.0 | 1.0 |
19 | 2.0 | 1.0 | 39.4 | 110.0 | 35.0 | 4.0 | 3.0 | 6.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 55.0 | 8.7 | 0.0 | 0.0 | 1.0 |
20 | 1.0 | 1.0 | 39.9 | 72.0 | 60.0 | 1.0 | 1.0 | 5.0 | 2.0 | 5.0 | … | 3.0 | 1.0 | 0.0 | 4.0 | 4.0 | 46.0 | 6.1 | 2.0 | 0.0 | 1.0 |
21 | 2.0 | 1.0 | 38.4 | 48.0 | 16.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | … | 2.0 | 3.0 | 5.5 | 4.0 | 3.0 | 49.0 | 6.8 | 0.0 | 0.0 | 1.0 |
22 | 1.0 | 1.0 | 38.6 | 42.0 | 34.0 | 2.0 | 1.0 | 4.0 | 0.0 | 2.0 | … | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 48.0 | 7.2 | 0.0 | 0.0 | 1.0 |
23 | 1.0 | 9.0 | 38.3 | 130.0 | 60.0 | 0.0 | 3.0 | 0.0 | 1.0 | 2.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 50.0 | 70.0 | 0.0 | 0.0 | 1.0 |
24 | 1.0 | 1.0 | 38.1 | 60.0 | 12.0 | 3.0 | 3.0 | 3.0 | 1.0 | 0.0 | … | 3.0 | 2.0 | 2.0 | 0.0 | 0.0 | 51.0 | 65.0 | 0.0 | 0.0 | 1.0 |
25 | 2.0 | 1.0 | 37.8 | 60.0 | 42.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
26 | 1.0 | 1.0 | 38.3 | 72.0 | 30.0 | 4.0 | 3.0 | 3.0 | 2.0 | 3.0 | … | 2.0 | 1.0 | 0.0 | 3.0 | 5.0 | 43.0 | 7.0 | 2.0 | 3.9 | 1.0 |
27 | 1.0 | 1.0 | 37.8 | 48.0 | 12.0 | 3.0 | 1.0 | 1.0 | 1.0 | 0.0 | … | 1.0 | 1.0 | 0.0 | 1.0 | 3.0 | 37.0 | 5.5 | 2.0 | 1.3 | 1.0 |
28 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
29 | 2.0 | 1.0 | 37.7 | 48.0 | 0.0 | 2.0 | 1.0 | 1.0 | 1.0 | 1.0 | … | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 45.0 | 76.0 | 0.0 | 0.0 | 1.0 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
269 | 1.0 | 1.0 | 37.5 | 60.0 | 50.0 | 3.0 | 3.0 | 1.0 | 1.0 | 3.0 | … | 2.0 | 2.0 | 3.5 | 3.0 | 4.0 | 35.0 | 6.5 | 0.0 | 0.0 | 0.0 |
270 | 1.0 | 1.0 | 37.7 | 80.0 | 0.0 | 3.0 | 3.0 | 6.0 | 1.0 | 5.0 | … | 2.0 | 3.0 | 0.0 | 3.0 | 1.0 | 50.0 | 55.0 | 3.0 | 2.0 | 1.0 |
271 | 1.0 | 1.0 | 0.0 | 100.0 | 30.0 | 3.0 | 3.0 | 4.0 | 2.0 | 5.0 | … | 3.0 | 3.0 | 0.0 | 4.0 | 4.0 | 52.0 | 6.6 | 0.0 | 0.0 | 1.0 |
272 | 1.0 | 1.0 | 37.7 | 120.0 | 28.0 | 3.0 | 3.0 | 3.0 | 1.0 | 5.0 | … | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 65.0 | 7.0 | 3.0 | 0.0 | 0.0 |
273 | 1.0 | 1.0 | 0.0 | 76.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
274 | 1.0 | 9.0 | 38.8 | 150.0 | 50.0 | 1.0 | 3.0 | 6.0 | 2.0 | 5.0 | … | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 50.0 | 6.2 | 0.0 | 0.0 | 0.0 |
275 | 1.0 | 1.0 | 38.0 | 36.0 | 16.0 | 3.0 | 1.0 | 1.0 | 1.0 | 4.0 | … | 3.0 | 3.0 | 2.0 | 3.0 | 0.0 | 37.0 | 75.0 | 2.0 | 1.0 | 0.0 |
276 | 2.0 | 1.0 | 36.9 | 50.0 | 40.0 | 2.0 | 3.0 | 3.0 | 1.0 | 1.0 | … | 3.0 | 1.0 | 7.0 | 0.0 | 0.0 | 37.5 | 6.5 | 0.0 | 0.0 | 1.0 |
277 | 2.0 | 1.0 | 37.8 | 40.0 | 16.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | … | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 37.0 | 6.8 | 0.0 | 0.0 | 1.0 |
278 | 2.0 | 1.0 | 38.2 | 56.0 | 40.0 | 4.0 | 3.0 | 1.0 | 1.0 | 2.0 | … | 2.0 | 2.0 | 7.5 | 0.0 | 0.0 | 47.0 | 7.2 | 1.0 | 2.5 | 1.0 |
279 | 1.0 | 1.0 | 38.6 | 48.0 | 12.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 36.0 | 67.0 | 0.0 | 0.0 | 1.0 |
280 | 2.0 | 1.0 | 40.0 | 78.0 | 0.0 | 3.0 | 3.0 | 5.0 | 1.0 | 2.0 | … | 1.0 | 1.0 | 0.0 | 4.0 | 1.0 | 66.0 | 6.5 | 0.0 | 0.0 | 0.0 |
281 | 1.0 | 1.0 | 0.0 | 70.0 | 16.0 | 3.0 | 4.0 | 5.0 | 2.0 | 2.0 | … | 2.0 | 1.0 | 0.0 | 4.0 | 5.0 | 60.0 | 7.5 | 0.0 | 0.0 | 0.0 |
282 | 1.0 | 1.0 | 38.2 | 72.0 | 18.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 35.0 | 6.4 | 0.0 | 0.0 | 1.0 |
283 | 2.0 | 1.0 | 38.5 | 54.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | … | 2.0 | 1.0 | 0.0 | 1.0 | 0.0 | 40.0 | 6.8 | 2.0 | 7.0 | 1.0 |
284 | 1.0 | 1.0 | 38.5 | 66.0 | 24.0 | 1.0 | 1.0 | 1.0 | 1.0 | 3.0 | … | 2.0 | 1.0 | 0.0 | 4.0 | 5.0 | 40.0 | 6.7 | 1.0 | 0.0 | 1.0 |
285 | 2.0 | 1.0 | 37.8 | 82.0 | 12.0 | 3.0 | 1.0 | 1.0 | 2.0 | 4.0 | … | 1.0 | 3.0 | 0.0 | 0.0 | 0.0 | 50.0 | 7.0 | 0.0 | 0.0 | 0.0 |
286 | 2.0 | 9.0 | 39.5 | 84.0 | 30.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 28.0 | 5.0 | 0.0 | 0.0 | 1.0 |
287 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
288 | 1.0 | 1.0 | 38.0 | 50.0 | 36.0 | 0.0 | 1.0 | 1.0 | 1.0 | 3.0 | … | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 39.0 | 6.6 | 1.0 | 5.3 | 1.0 |
289 | 2.0 | 1.0 | 38.6 | 45.0 | 16.0 | 2.0 | 1.0 | 2.0 | 1.0 | 1.0 | … | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 43.0 | 58.0 | 0.0 | 0.0 | 1.0 |
290 | 1.0 | 1.0 | 38.9 | 80.0 | 44.0 | 3.0 | 3.0 | 3.0 | 1.0 | 2.0 | … | 2.0 | 2.0 | 7.0 | 3.0 | 1.0 | 54.0 | 6.5 | 3.0 | 0.0 | 0.0 |
291 | 1.0 | 1.0 | 37.0 | 66.0 | 20.0 | 1.0 | 3.0 | 2.0 | 1.0 | 4.0 | … | 1.0 | 0.0 | 0.0 | 1.0 | 5.0 | 35.0 | 6.9 | 2.0 | 0.0 | 0.0 |
292 | 1.0 | 1.0 | 0.0 | 78.0 | 24.0 | 3.0 | 3.0 | 3.0 | 1.0 | 0.0 | … | 2.0 | 1.0 | 0.0 | 0.0 | 4.0 | 43.0 | 62.0 | 0.0 | 2.0 | 0.0 |
293 | 2.0 | 1.0 | 38.5 | 40.0 | 16.0 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | … | 0.0 | 0.0 | 0.0 | 3.0 | 2.0 | 37.0 | 67.0 | 0.0 | 0.0 | 1.0 |
294 | 1.0 | 1.0 | 0.0 | 120.0 | 70.0 | 4.0 | 0.0 | 4.0 | 2.0 | 2.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 55.0 | 65.0 | 0.0 | 0.0 | 0.0 |
295 | 2.0 | 1.0 | 37.2 | 72.0 | 24.0 | 3.0 | 2.0 | 4.0 | 2.0 | 4.0 | … | 3.0 | 1.0 | 0.0 | 4.0 | 4.0 | 44.0 | 0.0 | 3.0 | 3.3 | 0.0 |
296 | 1.0 | 1.0 | 37.5 | 72.0 | 30.0 | 4.0 | 3.0 | 4.0 | 1.0 | 4.0 | … | 2.0 | 1.0 | 0.0 | 3.0 | 5.0 | 60.0 | 6.8 | 0.0 | 0.0 | 0.0 |
297 | 1.0 | 1.0 | 36.5 | 100.0 | 24.0 | 3.0 | 3.0 | 3.0 | 1.0 | 3.0 | … | 3.0 | 1.0 | 0.0 | 4.0 | 4.0 | 50.0 | 6.0 | 3.0 | 3.4 | 1.0 |
298 | 1.0 | 1.0 | 37.2 | 40.0 | 20.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 4.0 | 1.0 | 36.0 | 62.0 | 1.0 | 1.0 | 0.0 |
299 rows × 22 columns
In [12]:
test_dataSet_df = pd.read_table('horseColicTest.txt', names=col_names+['label'])
test_dataSet_df
Out[12]:
index | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | feature_9 | … | feature_12 | feature_13 | feature_14 | feature_15 | feature_16 | feature_17 | feature_18 | feature_19 | feature_20 | label
0 | 2 | 1 | 38.5 | 54 | 20 | 0 | 1 | 2 | 2 | 3 | … | 2 | 2 | 5.9 | 0 | 2 | 42.0 | 6.3 | 0 | 0.0 | 1 |
1 | 2 | 1 | 37.6 | 48 | 36 | 0 | 0 | 1 | 1 | 0 | … | 0 | 0 | 0.0 | 0 | 0 | 44.0 | 6.3 | 1 | 5.0 | 1 |
2 | 1 | 1 | 37.7 | 44 | 28 | 0 | 4 | 3 | 2 | 5 | … | 1 | 1 | 0.0 | 3 | 5 | 45.0 | 70.0 | 3 | 2.0 | 1 |
3 | 1 | 1 | 37.0 | 56 | 24 | 3 | 1 | 4 | 2 | 4 | … | 1 | 1 | 0.0 | 0 | 0 | 35.0 | 61.0 | 3 | 2.0 | 0 |
4 | 2 | 1 | 38.0 | 42 | 12 | 3 | 0 | 3 | 1 | 1 | … | 0 | 0 | 0.0 | 0 | 2 | 37.0 | 5.8 | 0 | 0.0 | 1 |
5 | 1 | 1 | 0.0 | 60 | 40 | 3 | 0 | 1 | 1 | 0 | … | 3 | 2 | 0.0 | 0 | 5 | 42.0 | 72.0 | 0 | 0.0 | 1 |
6 | 2 | 1 | 38.4 | 80 | 60 | 3 | 2 | 2 | 1 | 3 | … | 2 | 2 | 0.0 | 1 | 1 | 54.0 | 6.9 | 0 | 0.0 | 1 |
7 | 2 | 1 | 37.8 | 48 | 12 | 2 | 1 | 2 | 1 | 3 | … | 2 | 0 | 0.0 | 2 | 0 | 48.0 | 7.3 | 1 | 0.0 | 1 |
8 | 2 | 1 | 37.9 | 45 | 36 | 3 | 3 | 3 | 2 | 2 | … | 2 | 1 | 0.0 | 3 | 0 | 33.0 | 5.7 | 3 | 0.0 | 1 |
9 | 2 | 1 | 39.0 | 84 | 12 | 3 | 1 | 5 | 1 | 2 | … | 1 | 2 | 7.0 | 0 | 4 | 62.0 | 5.9 | 2 | 2.2 | 0 |
10 | 2 | 1 | 38.2 | 60 | 24 | 3 | 1 | 3 | 2 | 3 | … | 3 | 3 | 0.0 | 4 | 4 | 53.0 | 7.5 | 2 | 1.4 | 1 |
11 | 1 | 1 | 0.0 | 140 | 0 | 0 | 0 | 4 | 2 | 5 | … | 1 | 1 | 0.0 | 0 | 5 | 30.0 | 69.0 | 0 | 0.0 | 0 |
12 | 1 | 1 | 37.9 | 120 | 60 | 3 | 3 | 3 | 1 | 5 | … | 2 | 2 | 7.5 | 4 | 5 | 52.0 | 6.6 | 3 | 1.8 | 0 |
13 | 2 | 1 | 38.0 | 72 | 36 | 1 | 1 | 3 | 1 | 3 | … | 2 | 1 | 0.0 | 3 | 5 | 38.0 | 6.8 | 2 | 2.0 | 1 |
14 | 2 | 9 | 38.0 | 92 | 28 | 1 | 1 | 2 | 1 | 1 | … | 3 | 0 | 7.2 | 0 | 0 | 37.0 | 6.1 | 1 | 1.1 | 1 |
15 | 1 | 1 | 38.3 | 66 | 30 | 2 | 3 | 1 | 1 | 2 | … | 3 | 2 | 8.5 | 4 | 5 | 37.0 | 6.0 | 0 | 0.0 | 1 |
16 | 2 | 1 | 37.5 | 48 | 24 | 3 | 1 | 1 | 1 | 2 | … | 1 | 1 | 0.0 | 3 | 2 | 43.0 | 6.0 | 1 | 2.8 | 1 |
17 | 1 | 1 | 37.5 | 88 | 20 | 2 | 3 | 3 | 1 | 4 | … | 0 | 0 | 0.0 | 0 | 0 | 35.0 | 6.4 | 1 | 0.0 | 0 |
18 | 2 | 9 | 0.0 | 150 | 60 | 4 | 4 | 4 | 2 | 5 | … | 0 | 0 | 0.0 | 0 | 0 | 0.0 | 0.0 | 0 | 0.0 | 0 |
19 | 1 | 1 | 39.7 | 100 | 30 | 0 | 0 | 6 | 2 | 4 | … | 1 | 0 | 0.0 | 4 | 5 | 65.0 | 75.0 | 0 | 0.0 | 0 |
20 | 1 | 1 | 38.3 | 80 | 0 | 3 | 3 | 4 | 2 | 5 | … | 2 | 1 | 0.0 | 4 | 4 | 45.0 | 7.5 | 2 | 4.6 | 1 |
21 | 2 | 1 | 37.5 | 40 | 32 | 3 | 1 | 3 | 1 | 3 | … | 2 | 1 | 0.0 | 0 | 5 | 32.0 | 6.4 | 1 | 1.1 | 1 |
22 | 1 | 1 | 38.4 | 84 | 30 | 3 | 1 | 5 | 2 | 4 | … | 2 | 3 | 6.5 | 4 | 4 | 47.0 | 7.5 | 3 | 0.0 | 0 |
23 | 1 | 1 | 38.1 | 84 | 44 | 4 | 0 | 4 | 2 | 5 | … | 1 | 3 | 5.0 | 0 | 4 | 60.0 | 6.8 | 0 | 5.7 | 0 |
24 | 2 | 1 | 38.7 | 52 | 0 | 1 | 1 | 1 | 1 | 1 | … | 0 | 0 | 0.0 | 1 | 3 | 4.0 | 74.0 | 0 | 0.0 | 1 |
25 | 2 | 1 | 38.1 | 44 | 40 | 2 | 1 | 3 | 1 | 3 | … | 0 | 0 | 0.0 | 1 | 3 | 35.0 | 6.8 | 0 | 0.0 | 1 |
26 | 2 | 1 | 38.4 | 52 | 20 | 2 | 1 | 3 | 1 | 1 | … | 2 | 1 | 0.0 | 3 | 5 | 41.0 | 63.0 | 1 | 1.0 | 1 |
27 | 1 | 1 | 38.2 | 60 | 0 | 1 | 0 | 3 | 1 | 2 | … | 1 | 1 | 0.0 | 4 | 4 | 43.0 | 6.2 | 2 | 3.9 | 1 |
28 | 2 | 1 | 37.7 | 40 | 18 | 1 | 1 | 1 | 0 | 3 | … | 1 | 1 | 0.0 | 3 | 3 | 36.0 | 3.5 | 0 | 0.0 | 1 |
29 | 1 | 1 | 39.1 | 60 | 10 | 0 | 1 | 1 | 0 | 2 | … | 0 | 0 | 0.0 | 4 | 4 | 0.0 | 0.0 | 0 | 0.0 | 1 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
37 | 2 | 1 | 37.5 | 44 | 0 | 1 | 1 | 1 | 1 | 3 | … | 0 | 0 | 0.0 | 0 | 0 | 45.0 | 5.8 | 2 | 1.4 | 1 |
38 | 2 | 1 | 38.2 | 42 | 16 | 1 | 1 | 3 | 1 | 1 | … | 0 | 0 | 0.0 | 1 | 0 | 35.0 | 60.0 | 1 | 1.0 | 1 |
39 | 2 | 1 | 38.0 | 56 | 44 | 3 | 3 | 3 | 0 | 0 | … | 2 | 1 | 0.0 | 4 | 0 | 47.0 | 70.0 | 2 | 1.0 | 1 |
40 | 2 | 1 | 38.3 | 45 | 20 | 3 | 3 | 2 | 2 | 2 | … | 2 | 0 | 0.0 | 4 | 0 | 0.0 | 0.0 | 0 | 0.0 | 1 |
41 | 1 | 1 | 0.0 | 48 | 96 | 1 | 1 | 3 | 1 | 0 | … | 2 | 1 | 0.0 | 1 | 4 | 42.0 | 8.0 | 1 | 0.0 | 1 |
42 | 1 | 1 | 37.7 | 55 | 28 | 2 | 1 | 2 | 1 | 2 | … | 0 | 3 | 5.0 | 4 | 5 | 0.0 | 0.0 | 0 | 0.0 | 1 |
43 | 2 | 1 | 36.0 | 100 | 20 | 4 | 3 | 6 | 2 | 2 | … | 1 | 1 | 0.0 | 4 | 5 | 74.0 | 5.7 | 2 | 2.5 | 0 |
44 | 1 | 1 | 37.1 | 60 | 20 | 2 | 0 | 4 | 1 | 3 | … | 0 | 2 | 5.0 | 3 | 4 | 64.0 | 8.5 | 2 | 0.0 | 1 |
45 | 2 | 1 | 37.1 | 114 | 40 | 3 | 0 | 3 | 2 | 2 | … | 0 | 0 | 0.0 | 0 | 3 | 32.0 | 0.0 | 3 | 6.5 | 1 |
46 | 1 | 1 | 38.1 | 72 | 30 | 3 | 3 | 3 | 1 | 4 | … | 2 | 1 | 0.0 | 3 | 5 | 37.0 | 56.0 | 3 | 1.0 | 1 |
47 | 1 | 1 | 37.0 | 44 | 12 | 3 | 1 | 1 | 2 | 1 | … | 0 | 0 | 0.0 | 4 | 2 | 40.0 | 6.7 | 3 | 8.0 | 1 |
48 | 1 | 1 | 38.6 | 48 | 20 | 3 | 1 | 1 | 1 | 4 | … | 0 | 0 | 0.0 | 3 | 0 | 37.0 | 75.0 | 0 | 0.0 | 1 |
49 | 1 | 1 | 0.0 | 82 | 72 | 3 | 1 | 4 | 1 | 2 | … | 0 | 3 | 0.0 | 4 | 4 | 53.0 | 65.0 | 3 | 2.0 | 0 |
50 | 1 | 9 | 38.2 | 78 | 60 | 4 | 4 | 6 | 0 | 3 | … | 0 | 0 | 0.0 | 1 | 0 | 59.0 | 5.8 | 3 | 3.1 | 0 |
51 | 2 | 1 | 37.8 | 60 | 16 | 1 | 1 | 3 | 1 | 2 | … | 1 | 2 | 0.0 | 3 | 0 | 41.0 | 73.0 | 0 | 0.0 | 0 |
52 | 1 | 1 | 38.7 | 34 | 30 | 2 | 0 | 3 | 1 | 2 | … | 0 | 0 | 0.0 | 0 | 0 | 33.0 | 69.0 | 0 | 2.0 | 0 |
53 | 1 | 1 | 0.0 | 36 | 12 | 1 | 1 | 1 | 1 | 1 | … | 1 | 1 | 0.0 | 1 | 5 | 44.0 | 0.0 | 0 | 0.0 | 1 |
54 | 2 | 1 | 38.3 | 44 | 60 | 0 | 0 | 1 | 1 | 0 | … | 0 | 0 | 0.0 | 0 | 0 | 6.4 | 36.0 | 0 | 0.0 | 1 |
55 | 2 | 1 | 37.4 | 54 | 18 | 3 | 0 | 1 | 1 | 3 | … | 2 | 2 | 0.0 | 4 | 5 | 30.0 | 7.1 | 2 | 0.0 | 1 |
56 | 1 | 1 | 0.0 | 0 | 0 | 4 | 3 | 0 | 2 | 2 | … | 0 | 0 | 0.0 | 0 | 0 | 54.0 | 76.0 | 3 | 2.0 | 1 |
57 | 1 | 1 | 36.6 | 48 | 16 | 3 | 1 | 3 | 1 | 4 | … | 1 | 1 | 0.0 | 0 | 0 | 27.0 | 56.0 | 0 | 0.0 | 0 |
58 | 1 | 1 | 38.5 | 90 | 0 | 1 | 1 | 3 | 1 | 3 | … | 2 | 3 | 2.0 | 4 | 5 | 47.0 | 79.0 | 0 | 0.0 | 1 |
59 | 1 | 1 | 0.0 | 75 | 12 | 1 | 1 | 4 | 1 | 5 | … | 0 | 3 | 5.8 | 0 | 0 | 58.0 | 8.5 | 1 | 0.0 | 1 |
60 | 2 | 1 | 38.2 | 42 | 0 | 3 | 1 | 1 | 1 | 1 | … | 2 | 1 | 0.0 | 3 | 2 | 35.0 | 5.9 | 2 | 0.0 | 1 |
61 | 1 | 9 | 38.2 | 78 | 60 | 4 | 4 | 6 | 0 | 3 | … | 0 | 0 | 0.0 | 1 | 0 | 59.0 | 5.8 | 3 | 3.1 | 0 |
62 | 2 | 1 | 38.6 | 60 | 30 | 1 | 1 | 3 | 1 | 4 | … | 1 | 1 | 0.0 | 0 | 0 | 40.0 | 6.0 | 1 | 0.0 | 1 |
63 | 2 | 1 | 37.8 | 42 | 40 | 1 | 1 | 1 | 1 | 1 | … | 0 | 0 | 0.0 | 3 | 3 | 36.0 | 6.2 | 0 | 0.0 | 1 |
64 | 1 | 1 | 38.0 | 60 | 12 | 1 | 1 | 2 | 1 | 2 | … | 1 | 1 | 0.0 | 1 | 4 | 44.0 | 65.0 | 3 | 2.0 | 0 |
65 | 2 | 1 | 38.0 | 42 | 12 | 3 | 0 | 3 | 1 | 1 | … | 0 | 0 | 0.0 | 0 | 1 | 37.0 | 5.8 | 0 | 0.0 | 1 |
66 | 2 | 1 | 37.6 | 88 | 36 | 3 | 1 | 1 | 1 | 3 | … | 1 | 3 | 1.5 | 0 | 0 | 44.0 | 6.0 | 0 | 0.0 | 0 |
67 rows × 22 columns
In [14]:
from sklearn.linear_model import LogisticRegression
logistic_reg = LogisticRegression()
In [16]:
logistic_reg.fit(train_dataSet_df.iloc[:, :-1], train_dataSet_df.iloc[:, -1])
Out[16]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False)
In [17]:
logistic_reg.predict(test_dataSet_df.iloc[:, :-1])
Out[17]:
array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 1., 0., 0.,
1., 1., 1., 1., 0., 0., 1., 0., 1., 0., 0., 1., 1.,
1., 0., 1., 1., 1., 0., 0., 0., 0., 0., 1., 1., 1.,
1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 0., 0., 1.,
1., 0., 1., 1., 1., 1., 1., 0., 1., 0., 1., 1., 1.,
1., 1.])
In [18]:
logistic_reg.predict_proba(test_dataSet_df.iloc[:, :-1])
Out[18]:
array([[ 0.175 , 0.825 ],
[ 0.088 , 0.912 ],
[ 0.3629, 0.6371],
[ 0.3702, 0.6298],
[ 0.4648, 0.5352],
[ 0.0989, 0.9011],
[ 0.2272, 0.7728],
[ 0.2162, 0.7838],
[ 0.082 , 0.918 ],
[ 0.7505, 0.2495],
[ 0.1831, 0.8169],
[ 0.8869, 0.1131],
[ 0.844 , 0.156 ],
[ 0.4586, 0.5414],
[ 0.1879, 0.8121],
[ 0.3058, 0.6942],
[ 0.2214, 0.7786],
[ 0.7102, 0.2898],
[ 0.9123, 0.0877],
[ 0.443 , 0.557 ],
[ 0.7109, 0.2891],
[ 0.3688, 0.6312],
[ 0.7929, 0.2071],
[ 0.9282, 0.0718],
[ 0.0656, 0.9344],
[ 0.2568, 0.7432],
[ 0.0797, 0.9203],
[ 0.5667, 0.4333],
[ 0.1443, 0.8557],
[ 0.188 , 0.812 ],
[ 0.0726, 0.9274],
[ 0.9156, 0.0844],
[ 0.8757, 0.1243],
[ 0.5044, 0.4956],
[ 0.8525, 0.1475],
[ 0.5457, 0.4543],
[ 0.3535, 0.6465],
[ 0.2268, 0.7732],
[ 0.0804, 0.9196],
[ 0.055 , 0.945 ],
[ 0.039 , 0.961 ],
[ 0.1589, 0.8411],
[ 0.5845, 0.4155],
[ 0.6948, 0.3052],
[ 0.9126, 0.0874],
[ 0.7312, 0.2688],
[ 0.3331, 0.6669],
[ 0.5374, 0.4626],
[ 0.1555, 0.8445],
[ 0.7095, 0.2905],
[ 0.8362, 0.1638],
[ 0.0734, 0.9266],
[ 0.1559, 0.8441],
[ 0.5269, 0.4731],
[ 0.0674, 0.9326],
[ 0.1093, 0.8907],
[ 0.227 , 0.773 ],
[ 0.4529, 0.5471],
[ 0.3762, 0.6238],
[ 0.9505, 0.0495],
[ 0.1314, 0.8686],
[ 0.8362, 0.1638],
[ 0.3192, 0.6808],
[ 0.0816, 0.9184],
[ 0.3924, 0.6076],
[ 0.3461, 0.6539],
[ 0.3246, 0.6754]])
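Each row of predict_proba gives the estimated probabilities of class 0.0 and class 1.0 (in the order of logistic_reg.classes_); predict returns the class whose probability is higher.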
In [20]:
logistic_reg.score(test_dataSet_df.iloc[:, :-1], test_dataSet_df.iloc[:, -1])
Out[20]:
0.73134328358208955
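score is the mean accuracy on the 67 test samples: roughly 73% of the test horses are classified correctly, i.e. an error rate of about 27%.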