Machine Learning Evaluation Metrics (AUC): 14 Popular Evaluation Metrics in Machine Learning

Forum: 期权论坛
Posted by an anonymous user, 2021-6-2 18:54
<article style="font-size: 16px;">
<p>Machine learning evaluation metrics: AUC</p>
<div>
  <section>
   <div>
    <div>
     <p>An evaluation metric measures the performance of a machine learning model, and choosing the right one is essential. This article covers popular metrics used for classification and regression models.</p>
     <p>Evaluation Metrics discussed in the article:</p>
     <figure style="display:block;text-align:center;">
      <div>
       <div>
        <div>
         <div style="text-align: center;">
          <img alt="Image for post" height="425" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-05428710e12a3c95b137da51ac15f069.png" style="outline: none;" width="567">
         </div>
        </div>
       </div>
      </div>
     </figure>
     <blockquote>
      <p>Metrics used in Classification Models:</p>
     </blockquote>
     <p>A classification model can output either a target class label or a probability score. Different evaluation metrics are used for each kind of output.</p>
     <h2>Metrics used when the prediction of the ML model is a class label:</h2>
     <blockquote>
      <p><strong>Confusion Matrix:</strong></p>
     </blockquote>
     <p>A confusion matrix is one of the simplest ways to assess a classifier. It tabulates predictions against actual classes so that the model's performance can be inspected at a glance. For a k-class classification model, the matrix has size k × k; for binary classification, it is the standard 2 × 2 matrix.</p>
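     <p>As a rough illustration of the k × k layout, the sketch below builds a confusion matrix in plain Python. The function name <code>confusion_matrix</code>, the three class labels, and the sample data are all hypothetical, and the orientation (rows are actual classes, columns are predicted classes) is an assumption; some sources use the transpose.</p>

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a k x k matrix as a list of lists.

    Assumed orientation: rows = actual class, columns = predicted class.
    """
    index = {label: i for i, label in enumerate(labels)}
    k = len(labels)
    matrix = [[0] * k for _ in range(k)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical 3-class data, for illustration only.
labels = ["cat", "dog", "bird"]
y_true = ["cat", "dog", "bird", "cat", "dog", "bird"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "bird"]

cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)
```

     <p>Each correct prediction lands on the diagonal, and every cell sums to the total number of samples, which is a quick sanity check on any confusion matrix.</p>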
     <figure style="display:block;text-align:center;">
      <div>
       <div>
        <div>
         <div>
          <div style="text-align: center;">
           <img alt="Image for post" height="362" src="https://beijingoptbbs.oss-cn-beijing.aliyuncs.com/cs/5606289-3864b135e4eeacd9fd3e9b3e5bf7658e.png" style="outline: none;" width="825">
          </div>
         </div>
        </div>
       </div>
      </div>
      <figcaption>
        <a href="https://towardsdatascience.com/decoding-the-confusion-matrix-bb4801decbb" rel="noopener noreferrer" target="_blank">Source</a>, confusion matrix for binary classification
      </figcaption>
     </figure>
     <pre class="blockcode"><code class="blockcode"><strong>Notation:</strong>
<strong>TP (True Positive):</strong> number of points that are actually positive and predicted to be positive
<strong>FN (False Negative):</strong> number of points that are actually positive but predicted to be negative
<strong>FP (False Positive):</strong> number of points that are actually negative but predicted to be positive
<strong>TN (True Negative):</strong> number of points that are actually negative and predicted to be negative</code></pre>
     <p>An ML model is considered good when the counts on the principal diagonal are as large as possible and the off-diagonal counts are as small as possible. For a binary confusion matrix, <strong>TP and TN</strong> should be high and <strong>FN and FP</strong> should be low.</p>
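     <p>The four counts can be tallied directly from the labels. The following is a minimal sketch, assuming labels are encoded as 1 (positive) and 0 (negative); the helper name <code>binary_counts</code> and the sample data are hypothetical.</p>

```python
def binary_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, FP, TN) for a binary classification problem."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # actually positive, predicted positive
            else:
                fn += 1  # actually positive, predicted negative
        else:
            if predicted == positive:
                fp += 1  # actually negative, predicted positive
            else:
                tn += 1  # actually negative, predicted negative
    return tp, fn, fp, tn


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, fp, tn = binary_counts(y_true, y_pred)
correct = tp + tn      # counts on the principal diagonal
incorrect = fn + fp    # off-diagonal counts
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"correct={correct} incorrect={incorrect}")
```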
     <p>Which of these counts matters most depends on the problem:</p>
     <ul><li><p>For cancer diagnosis, TP should be high and <strong>FN should be very low</strong>, close to 0. A patient who has cancer should never be predicted as cancer-free, which is exactly what a false negative is.</p></li><li><p>For spam detection, <strong>FP should be very low</strong>. A legitimate email should not be predicted as spam.</p></li></ul>
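     <p>One way to make "FN should be low" and "FP should be low" measurable is to turn the raw counts into rates. The sketch below uses the standard definitions of false negative rate, FN / (TP + FN), and false positive rate, FP / (FP + TN); the function names and all counts are hypothetical and are not taken from the article.</p>

```python
def false_negative_rate(tp, fn):
    """Share of actual positives that the model missed: FN / (TP + FN)."""
    total = tp + fn
    return fn / total if total else 0.0


def false_positive_rate(fp, tn):
    """Share of actual negatives that the model flagged: FP / (FP + TN)."""
    total = fp + tn
    return fp / total if total else 0.0


# Hypothetical counts for a cancer-screening model: missed cases are the concern.
cancer_fnr = false_negative_rate(tp=90, fn=10)

# Hypothetical counts for a spam filter: wrongly flagged legitimate mail is the concern.
spam_fpr = false_positive_rate(fp=5, tn=95)

print(f"cancer model false negative rate: {cancer_fnr:.2f}")
print(f"spam filter false positive rate: {spam_fpr:.2f}")
```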