Output:
Prediction X_new:[0]
prediction X_new belong to ['setosa']
test score1:0.97
test score2:0.97
Test accuracy
The number of neighbors used by KNN affects test accuracy. For example:
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
# stratify keeps the benign/malignant ratio the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=66)

training_accuracy = []
test_accuracy = []
# try n_neighbors from 1 to 10
neighbors_settings = range(1, 11)
for n_neighbors in neighbors_settings:
    # build and fit the model
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(X_train, y_train)
    # record accuracy on the training set
    training_accuracy.append(clf.score(X_train, y_train))
    # record accuracy on the test set (generalization)
    test_accuracy.append(clf.score(X_test, y_test))

plt.plot(neighbors_settings, training_accuracy, label="training accuracy")
plt.plot(neighbors_settings, test_accuracy, label="test accuracy")
plt.xlabel("n_neighbors")
plt.ylabel("accuracy")
plt.legend()
plt.show()

The plot shows that n_neighbors=6 gives the best test accuracy.
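To see why the neighbor count changes the result, here is a minimal from-scratch sketch of the majority vote KNN performs, using NumPy and a hypothetical 1-D toy dataset that is not from the original example. The function name `knn_predict` is illustrative only; scikit-learn's `KNeighborsClassifier` performs conceptually the same uniform-weight vote, but with optimized neighbor search and its own tie-breaking.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k):
    """Hypothetical helper: label one query point x by majority vote
    of its k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance from x to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())   # count class labels among them
    return votes.most_common(1)[0][0]            # most frequent label wins

# Hypothetical toy data: the point closest to the query 0.9 is class 1,
# but the next three closest points are class 0.
X_train = np.array([[0.0], [1.0], [1.2], [1.4], [5.0]])
y_train = np.array([0, 1, 0, 0, 1])
query = np.array([0.9])

for k in (1, 3, 5):
    print(f"k={k}: predicted class {knn_predict(X_train, y_train, query, k)}")
# prints k=1: predicted class 1, then class 0 for k=3 and k=5
```

With k=1 the single nearest point decides on its own; with k=3 or k=5 the nearby class-0 points outvote it. A small k follows individual training points closely (high training accuracy), while a larger k smooths the decision, which is the trade-off the accuracy curve reflects.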
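Advantages of KNN: it is simple and highly interpretable.
Disadvantages:
- Prediction is slow when the training set is large.
- It works poorly when there are many features (hundreds or more).
- It is not suitable for sparse datasets.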
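The interpretability of KNN comes from the fact that every prediction can be traced back to concrete training samples. A minimal sketch, reusing the breast-cancer split from the earlier example and assuming n_neighbors=6, uses scikit-learn's `kneighbors` method to list the neighbors behind one test prediction. Inspecting only the first test sample is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=66)

# assumption: n_neighbors=6, the value the accuracy curve suggested
clf = KNeighborsClassifier(n_neighbors=6).fit(X_train, y_train)

# explain the prediction for the first test sample
query = X_test[:1]
distances, indices = clf.kneighbors(query)  # both arrays have shape (1, 6), sorted by distance
neighbor_labels = y_train[indices[0]]       # class labels of the 6 neighbors that voted

print("prediction:", cancer.target_names[clf.predict(query)[0]])
for d, i, label in zip(distances[0], indices[0], neighbor_labels):
    print(f"training sample #{i}: distance={d:.1f}, label={cancer.target_names[label]}")
```

Because the default weighting is uniform, the predicted class probabilities are simply the fraction of these six neighbors in each class, so the printed list fully explains the decision.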