How to Verify That a POS Machine Is Legitimate

News 2 | 2023-07-24 11:59 | Contributor: pos機之家



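In this article we discuss the following concepts, all of which are used to evaluate the performance of machine learning classification models: cross-validation, the confusion matrix, the ROC curve, and Cohen's κ score.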

Importing the Python libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings('ignore')

We first create a simple machine learning dataset with three features and a binary label. The Python code is as follows:

from sklearn.model_selection import train_test_split

# Creating the dataset
N = 1000  # number of samples
data = {'A': np.random.normal(100, 8, N),
        'B': np.random.normal(60, 5, N),
        'C': np.random.choice([1, 2, 3], size=N, p=[0.2, 0.3, 0.5])}
df = pd.DataFrame(data=data)

# Labeling: a point is positive if any of the three rules fires
def get_label(A, B, C):
    if A < 95:
        return 1
    elif C == 1:
        return 1
    elif B > 68 or B < 52:
        return 1
    return 0

df['label'] = df.apply(lambda row: get_label(row['A'], row['B'], row['C']), axis=1)

# Dividing to train and test set
X = np.asarray(df[['A', 'B', 'C']])
y = np.asarray(df['label'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

Let's use a simple logistic regression for the demonstration.

from sklearn import linear_model
from sklearn.model_selection import cross_val_score

clf = linear_model.LogisticRegression()
clf.fit(X_train, y_train)
# The score is computed on the held-out test set
print(">> Score of the classifier on the test set is: ", round(clf.score(X_test, y_test), 2))

>> Score of the classifier on the test set is: 0.74

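Cross-validation

The idea behind cross-validation is simple: we choose a number k, usually k = 5 or k = 10 (5 is the default in sklearn). We split the data into k parts of equal size, train the machine learning model on k − 1 of them, and test its performance on the remaining part. We repeat this k times, so that each part serves as the test set once, and average the scores to obtain a single CV score.

Advantages: cross-validation tells you how well the model performs, and unlike a single train-test split its estimate is robust, because it does not depend on one particular split. It can also be used for hyperparameter tuning: for a given parameter, the CV score is used to optimize its value in a robust way.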
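The splitting procedure described above can be sketched in a few lines of plain Python. This is only an illustration of the mechanics, not sklearn's implementation: the names `k_fold_indices`, `cross_validate`, and the `fit_and_score` callback are hypothetical, and the folds here are contiguous and unshuffled, whereas in practice the data is usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


def cross_validate(fit_and_score, n_samples, k=5):
    """Average a user-supplied score over k folds.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that trains
    on the first index list and returns a score measured on the second.
    """
    scores = [fit_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores), scores


folds = list(k_fold_indices(10, 3))
print([len(test) for _, test in folds])  # → [4, 3, 3]
```

Every sample lands in exactly one test fold, so each data point is used for evaluation once and for training k − 1 times.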

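Let's look at the CV score for our example:

scores = cross_val_score(clf, X_train, y_train, cv=10)
print('>> Mean CV score is: ', round(np.mean(scores), 3))
# sns.distplot is deprecated in recent seaborn releases; histplot with kde=True is the closest replacement
pltt = sns.histplot(pd.Series(scores, name='CV scores distribution'), color='r', kde=True)

>> Mean CV score is: 0.729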

The CV scores can also be used to derive a confidence interval, that is, a range that contains the actual score with high probability.
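One way such an interval might be computed is a normal approximation around the mean of the fold scores. The sketch below is an assumption about how to do this, not the method used for the figures in this article: the function name `cv_confidence_interval` and the example scores are made up, and because the folds share training data the scores are not truly independent, so the interval should be read as approximate. With few folds, a t-distribution would give a somewhat wider interval.

```python
from statistics import NormalDist, mean, stdev


def cv_confidence_interval(scores, confidence=0.95):
    """Normal-approximation interval for the mean of the fold scores."""
    if len(scores) < 2:
        raise ValueError("need at least two fold scores")
    center = mean(scores)
    # Standard error of the mean, using the sample standard deviation.
    std_err = stdev(scores) / len(scores) ** 0.5
    # Two-sided critical value, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * std_err, center + z * std_err


# Hypothetical fold scores, made up for illustration (not this article's results).
example_scores = [0.70, 0.74, 0.72, 0.76, 0.73]
low, high = cv_confidence_interval(example_scores)
print(f"mean={mean(example_scores):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```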

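Confusion matrix

The idea is simple. We want to display the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). When there are several labels, we display the number of data points that belong to label i but were classified as label j. This number is defined as the (i, j) entry of the confusion matrix.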
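The (i, j) definition above can be written out directly as a counting loop. This is a minimal sketch in plain Python with a hypothetical helper name, `confusion_counts`, and made-up labels; rows are true labels and columns are predicted labels, which is also the convention of `sklearn.metrics.confusion_matrix(y_true, y_pred)`.

```python
def confusion_counts(y_true, y_pred, n_labels=2):
    """Return an n_labels x n_labels matrix with true labels as rows."""
    matrix = [[0] * n_labels for _ in range(n_labels)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Tiny made-up example (hypothetical labels, not this article's data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
cm = confusion_counts(y_true, y_pred)
tn, fp = cm[0]  # row 0: points whose true label is 0
fn, tp = cm[1]  # row 1: points whose true label is 1
print(cm, dict(TP=tp, TN=tn, FP=fp, FN=fn))
```

In the binary case the four cells are exactly TN, FP, FN, and TP, read row by row.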

from sklearn.metrics import confusion_matrix

# confusion_matrix expects (y_true, y_pred): rows are true labels, columns are predictions
C = confusion_matrix(y_test, clf.predict(X_test))
df_cm = pd.DataFrame(C, range(2), range(2))
sns.set(font_scale=1.4)
pltt = sns.heatmap(df_cm, annot=True, annot_kws={"size": 16}, cmap="YlGnBu", fmt='g')

ROC curve

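Let's take a closer look at the confusion matrix. If we allow the FP rate to reach 1, for example by labelling every point as positive, then the TP rate also equals 1. In general, if the TP and FP rates are equal, our predictions are no better than random guessing.

The ROC curve is defined as the plot of the TP rate as a function of the FP rate. From the discussion above, the ROC curve of a classifier that does better than random guessing therefore lies above the line y = x.

The ROC curve is constructed from the probabilities our classifier assigns to each point: for every data point x_i predicted with label l_i ∈ {0, 1}, we have a probability p_i ∈ [0, 1] that y_i = l_i. If we allow the probability to be …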

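from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, roc_curve

# plot_ROC is the helper defined further down in this article; define it before running this call
pltt = plot_ROC(y_train, clf.predict_proba(X_train)[:, 1], y_test, clf.predict_proba(X_test)[:, 1])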
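To make the construction from probabilities concrete, here is a minimal sketch that lowers the decision threshold one score at a time and records the resulting (FP rate, TP rate) pairs, then integrates them with the trapezoidal rule. The helper names `roc_points` and `auc` and the sample data are hypothetical, and the code assumes labels are the integers 0 and 1; it is meant to show the mechanics rather than to replace `sklearn.metrics.roc_curve`.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs obtained by lowering the threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    # Visit points from the highest score down; tied scores share one threshold.
    ranked = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and predicted probabilities, made up for illustration.
labels = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, probs)
print(curve[0], curve[-1], round(auc(curve), 4))
```

The curve always starts at (0, 0), where nothing is predicted positive, and ends at (1, 1), where everything is. Reversing the scores (using 1 − p) mirrors the curve and turns an area of a into 1 − a.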

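A few concepts are important here:

(1) The area under the ROC curve (AUC) is an important measure of classifier quality, and ROC AUC is widely used in machine learning.

(2) The points marked in the figure are the TP and FP rates, as we saw in the confusion matrix.

(3) If the ROC curve lies below the line y = x, then inverting the classifier's output yields an informative classifier.

Below is the Python code for plotting the ROC curve.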

def plot_ROC(y_train_true, y_train_prob, y_test_true, y_test_prob):
    '''
    A function to plot the ROC curve for train labels and test labels.
    Use the best threshold found in the train set to classify items in the test set.
    '''
    fpr_train, tpr_train, thresholds_train = roc_curve(y_train_true, y_train_prob, pos_label=1)
    # Pick the threshold that maximises sensitivity + specificity on the train set
    sum_sensitivity_specificity_train = tpr_train + (1 - fpr_train)
    best_threshold_id_train = np.argmax(sum_sensitivity_specificity_train)
    best_threshold = thresholds_train[best_threshold_id_train]
    best_fpr_train = fpr_train[best_threshold_id_train]
    best_tpr_train = tpr_train[best_threshold_id_train]
    # roc_curve treats scores >= threshold as positive, so the same rule is used here
    y_train = (y_train_prob >= best_threshold).astype(int)
    cm_train = confusion_matrix(y_train_true, y_train)
    acc_train = accuracy_score(y_train_true, y_train)
    # AUC is computed from the probabilities so that it matches the plotted curve
    auc_train = roc_auc_score(y_train_true, y_train_prob)

    # Left panel: train ROC curve with the best-threshold point marked
    fig = plt.figure(figsize=(10, 5))
    ax = fig.add_subplot(121)
    curve1 = ax.plot(fpr_train, tpr_train)
    curve2 = ax.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax.plot(best_fpr_train, best_tpr_train, marker='o', color='black')
    ax.text(best_fpr_train, best_tpr_train, s='(%.3f,%.3f)' % (best_fpr_train, best_tpr_train))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Train), AUC = %.4f' % auc_train)

    fpr_test, tpr_test, thresholds_test = roc_curve(y_test_true, y_test_prob, pos_label=1)
    y_test = (y_test_prob >= best_threshold).astype(int)
    cm_test = confusion_matrix(y_test_true, y_test)
    acc_test = accuracy_score(y_test_true, y_test)
    auc_test = roc_auc_score(y_test_true, y_test_prob)
    # TP and FP rates on the test set at the threshold chosen on the train set
    tpr_score = float(cm_test[1][1]) / (cm_test[1][1] + cm_test[1][0])
    fpr_score = float(cm_test[0][1]) / (cm_test[0][0] + cm_test[0][1])

    # Right panel: test ROC curve with the operating point marked
    ax2 = fig.add_subplot(122)
    curve1 = ax2.plot(fpr_test, tpr_test)
    curve2 = ax2.plot([0, 1], [0, 1], color='navy', linestyle='--')
    dot = ax2.plot(fpr_score, tpr_score, marker='o', color='black')
    ax2.text(fpr_score, tpr_score, s='(%.3f,%.3f)' % (fpr_score, tpr_score))
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.0])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC curve (Test), AUC = %.4f' % auc_test)
    plt.savefig('ROC', dpi=500)
    plt.show()
    return best_threshold

Cohen's κ score

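Cohen's κ score measures the agreement between two classifiers on the same data. It is defined as κ = 1 − (1 − p_o) / (1 − p_e), where p_o is the observed probability of agreement and p_e is the probability of agreement by chance.

Let's look at an example. We need one more classifier.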

from sklearn import svm

clf2 = svm.SVC()
clf2.fit(X_train, y_train)
# Score the new SVM (clf2) on the held-out test set
print(">> Score of the classifier on the test set is: ", round(clf2.score(X_test, y_test), 2))

>> Score of the classifier on the test set is: 0.74

We compute κ on the test set.

y1 = clf.predict(X_test)   # logistic regression predictions
y2 = clf2.predict(X_test)  # SVM predictions
n = len(y1)
p_o = sum(y1 == y2) / n  # observed agreement
p_e = sum(y1) * sum(y2) / (n ** 2) + sum(1 - y1) * sum(1 - y2) / (n ** 2)  # random agreement: both 1 or both 0
kappa = 1 - (1 - p_o) / (1 - p_e)
print(">> Cohen's Kappa score is: ", round(kappa, 2))

>> Cohen's Kappa score is: 0.4

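This indicates some agreement between the two classifiers. κ = 0 means the agreement is no better than chance, κ = 1 means perfect agreement, and κ < 0 occurs when the two classifiers agree less often than chance would predict.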
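The different regimes of κ can be checked with a small sketch that applies the same formula, κ = 1 − (1 − p_o) / (1 − p_e), to hand-made predictions. The function name `cohen_kappa` and the label sequences are hypothetical and chosen only to hit the boundary cases; unlike the binary computation above, this version works for any number of labels.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty sequences of equal length")
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return 1 - (1 - p_o) / (1 - p_e)


# Hypothetical predictions, made up to show the three regimes.
identical = cohen_kappa([0, 1, 0, 1], [0, 1, 0, 1])
chance_level = cohen_kappa([0, 0, 1, 1], [0, 1, 0, 1])
opposite = cohen_kappa([0, 1, 0, 1], [1, 0, 1, 0])
print(identical, chance_level, opposite)  # → 1.0 0.0 -1.0
```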

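Conclusion

We discussed several basic strategies for evaluating a machine learning model and comparing it with other models. Keep these concepts in mind when applying machine learning algorithms and comparing their performance.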
