Before we begin
Operating system: Windows 10
Python version: 3.6.3
Kettle version: 8.3
Dataset: Soda
Apriori algorithm: 【数据挖掘】笔记二-决策树 ([Data Mining] Notes 2: Decision Tree)
Data processing
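Original data format: (screenshot in the original post)
Target format: (screenshot in the original post). Each record ends up as one ';'-separated line with a weather field, PH, elect and the label, which is what the Python code reads.
Kettle transformation: (screenshot in the original post). The steps are described in 【数据挖掘】Kettle去除空记录&添加标记 ([Data Mining] Kettle: Removing Empty Records & Adding Labels).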
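Since the Kettle transformation only survives as screenshots, here is a rough Python sketch of the same idea, purely for orientation. It assumes that a record with any empty field is dropped and that the label is produced by binning a numeric value into the three intervals used later ([0,10), [10,60), [60,-)). Which raw column gets binned, the helper names `drop_empty` and `bin_label`, and the sample rows are all hypothetical; the real Kettle steps may differ.

```python
def drop_empty(records):
    """Keep only records in which every field is non-empty (assumed rule)."""
    return [r for r in records if all(field.strip() for field in r)]


def bin_label(value):
    """Map a numeric value onto the three label intervals used in this post.

    The bounds come from the label strings; which raw column is binned
    is an assumption.
    """
    if value < 0:
        raise ValueError("negative values are not covered by the labels")
    if value < 10:
        return "[0,10)"
    if value < 60:
        return "[10,60)"
    return "[60,-)"


# Made-up rows: weather, PH, elect, and a numeric value to be binned.
# '晴' (sunny) is just an illustrative non-rain value.
raw = [
    ["雨", "7.1", "320", "45"],
    ["晴", "", "280", "5"],      # has an empty field, so it is dropped
    ["晴", "6.8", "300", "75"],
]
for row in drop_empty(raw):
    print(";".join(row[:3] + [bin_label(float(row[3]))]))
```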
Python code

```python
import numpy as np
from sklearn import tree
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

data = []
labels = []
# Each line of file.txt (the processed data) is "weather;PH;elect;label"
with open("file.txt", encoding="utf-8") as ifile:
    for line in ifile:
        if not line.strip():
            continue  # skip blank lines, e.g. at the end of the file
        tokens = line.strip().split(';')
        # Encode the weather field: 1 for rain ('雨'), 0 otherwise
        bol = 1 if tokens[0] == '雨' else 0
        data_elem = [bol, float(tokens[1]), float(tokens[2])]
        data.append(data_elem)
        labels.append(tokens[3].rstrip())

x = np.array(data)
labels = np.array(labels)

# Map the interval labels to class indices 0, 1, 2
y = np.zeros(labels.shape)
y[labels == '[0,10)'] = 0
y[labels == '[10,60)'] = 1
y[labels == '[60,-)'] = 2

# Hold out 40% of the records as the test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4)

# Decision tree that picks splits by information gain (entropy)
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf.fit(x_train, y_train)

answer = clf.predict(x_test)
print(classification_report(y_test, answer))
```
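With `criterion='entropy'`, scikit-learn scores each candidate split by information gain, i.e. how much the Shannon entropy of the labels drops after the split. The stdlib-only sketch below computes that quantity for a single threshold on a single feature. The function names and the toy numbers are illustrative only, and scikit-learn's own implementation (searching all features and thresholds) is more involved.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on `value <= threshold`."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted entropy of the two children (empty sides contribute nothing)
    children = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - children


# Toy data: one feature (think PH) and the three classes 0, 1, 2
ph = [6.5, 6.8, 7.0, 7.4, 7.9, 8.1]
y = [0, 0, 1, 1, 2, 2]
print(round(entropy(y), 3))                    # 1.585
print(round(information_gain(ph, y, 6.9), 3))  # 0.918
```

Splitting at 6.9 isolates class 0 completely, so the left child has zero entropy and the gain is log2(3) - 2/3.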
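Results
Test set results: (screenshot in the original post)
The accuracy is not very high. Either the algorithm is not a good fit here, or the features are only weakly related to the label; I haven't found a way around it.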
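To make the per-class rows of `classification_report` concrete, here is a small stdlib sketch that computes precision, recall, F1 and support from true and predicted labels. The label lists are made up for illustration and are not the results of this post. `classification_report` additionally prints accuracy and macro/weighted averages, which are left out here.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Per-class (precision, recall, f1, support), as in classification_report."""
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        # Undefined ratios are reported as 0.0
        precision = tp[cls] / pred_count[cls] if pred_count[cls] else 0.0
        recall = tp[cls] / true_count[cls] if true_count[cls] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[cls] = (precision, recall, f1, true_count[cls])
    return report


y_true = [0, 0, 1, 1, 1, 2]  # made-up labels
y_pred = [0, 1, 1, 1, 2, 2]
for cls, (p, r, f, s) in per_class_report(y_true, y_pred).items():
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```

A class with high precision but low recall (class 0 here) is one the tree predicts rarely but correctly, which is worth checking before blaming the algorithm as a whole.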
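Visualization
Visualization requires running

pip install graphviz

and also installing the Graphviz software itself (linked in the original post), because the Python package only wraps Graphviz's `dot` executable.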
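```python
import graphviz
import os

from sklearn import tree

# Let the graphviz package find the Graphviz executables (adjust to your install path)
os.environ['PATH'] += os.pathsep + 'D:/Program Files (x86)/Graphviz/bin'

# x has three columns, so export_graphviz needs three feature names:
# the rain flag (named "rain" here), PH and elect
dot_data = tree.export_graphviz(clf, out_file=None,
                                feature_names=["rain", "PH", "elect"],
                                class_names=["[0,10)", "[10,60)", "[60,-)"],
                                filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph.format = 'png'
graph.render("water", view=True)  # writes water.png and opens it
```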