The Entropy Weight Method: Principles and a Python Implementation
Information Entropy
Information entropy is a measure of uncertainty and reflects how much information an outcome carries. The amount of information depends on the probability of the random event: the higher the probability, the lower the uncertainty and the less information the event contains, so the information carried by a random event decreases as its probability grows. Information entropy is defined as:
$$ H(X) = - \displaystyle\sum^n_{i = 1} p_i \ln(p_i) $$
where $x_i$ denotes the values taken by the random variable $X$, and $p_i$ is the probability that the event $x_i$ occurs.
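For example, a fair coin toss has entropy $H = -(0.5 \ln 0.5 + 0.5 \ln 0.5) = \ln 2$, whereas a coin that always lands heads has entropy $0$: a certain event carries no information.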
The Entropy Weight Method
Because of this property, information entropy can be used to measure how dispersed an indicator is: the greater an indicator's dispersion, the more it influences the overall evaluation and the larger the weight it should receive. The entropy weight method is an objective weighting scheme that relies only on the dispersion of the data itself; it combines multiple indicators into a composite score so that samples can be compared against one another.
Procedure
Suppose there are $n$ samples and $m$ indicators (dimensions), and denote the value of sample $i$ on indicator $j$ as:
$$ x_{ij}, \quad i = 1, \dots, n, \; j = 1, \dots, m $$
To eliminate the influence of differing units and scales, the indicators must first be normalized. Based on its meaning, each indicator is classified as either positive (larger values are better) or negative (smaller values are better), and the two kinds are normalized respectively as:
$$ x^{'}_{ij} = \frac{x_{ij} - \min\limits_i(x_{ij})}{\max\limits_i(x_{ij}) - \min\limits_i(x_{ij})} $$
$$ x^{'}_{ij} = \frac{\max\limits_i(x_{ij}) - x_{ij}}{\max\limits_i(x_{ij}) - \min\limits_i(x_{ij})} $$
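As an aside, both normalizations can be written in a few vectorized lines of NumPy. This is a minimal sketch; the function names and the array X are illustrative and separate from the loop-based implementation given later:

import numpy as np

def normalize_positive(X):
    # column-wise min-max scaling: (x - min) / (max - min); larger is better
    return (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

def normalize_negative(X):
    # reversed min-max scaling: (max - x) / (max - min); smaller is better
    return (X.max(axis=0) - X) / (X.max(axis=0) - X.min(axis=0))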
Next, compute the entropy of each indicator:
$$ E_j = -k \displaystyle\sum^n_{i = 1} p_{ij}\ln(p_{ij}) $$
where:
$$ p_{ij} = \frac{x^{'}_{ij}}{\displaystyle\sum^n_{i = 1}x^{'}_{ij}}, i=1,...,n, j=1,...,m $$
$$ k = \frac{1}{\ln(n)} > 0, \quad \text{which ensures } 0 \le E_j \le 1 $$
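A compact vectorized version of this step is sketched below (the names entropy and X_norm are illustrative); np.where keeps entries with $p_{ij} = 0$ from contributing, per the convention $0 \ln 0 = 0$:

import numpy as np

def entropy(X_norm):
    # X_norm: normalized n x m matrix; returns the entropy E_j of each column
    n = X_norm.shape[0]
    P = X_norm / X_norm.sum(axis=0)          # p_ij: normalize each column to sum to 1
    logP = np.log(np.where(P > 0, P, 1.0))   # ln(p_ij), with ln(1) = 0 standing in for p = 0
    return -(P * logP).sum(axis=0) / np.log(n)  # E_j = -k * sum_i p_ij ln(p_ij)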
Compute the redundancy:
$$ d_j = 1 - E_j $$
Compute the weights:
$$ w_j = \frac{d_j}{\displaystyle\sum_j d_j} $$
Compute the composite score of each sample:
$$ s_i = \displaystyle\sum_j w_j x^{'}_{ij} $$
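Chaining all of the steps together on a tiny made-up data set (three samples, two positive indicators; the numbers are purely illustrative):

import numpy as np

X = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 9.0]])                   # 3 samples x 2 positive indicators
Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # normalize
P = Xn / Xn.sum(axis=0)                      # p_ij
E = -(P * np.log(np.where(P > 0, P, 1.0))).sum(axis=0) / np.log(len(X))
d = 1 - E                                    # redundancy d_j
w = d / d.sum()                              # weights w_j
s = Xn @ w                                   # composite score s_i of each sample
print(w, s)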
Python Implementation
#!/usr/bin/env python
# coding=utf-8
import numpy as np

def positiveIndex(data):
    # min-max normalize positive indicators (larger is better)
    n, m = data.shape
    x = np.ones([n, m])
    for j in range(m):
        max_xj, min_xj = max(data.T[j]), min(data.T[j])
        for i in range(n):
            x[i, j] = (data[i, j] - min_xj) / (max_xj - min_xj)
    return x
def negativeIndex(data):
    # reversed min-max normalization for negative indicators (smaller is better)
    n, m = data.shape
    x = np.ones([n, m])
    for j in range(m):
        max_xj, min_xj = max(data.T[j]), min(data.T[j])
        for i in range(n):
            x[i, j] = (max_xj - data[i, j]) / (max_xj - min_xj)
    return x
def ln(x):
    # natural log with the convention 0 * ln(0) = 0: return 0 for x <= 0
    if x <= 0:
        return 0
    return np.log(x)
def calcP(data):
    # p_ij: each entry divided by its column sum
    n, m = data.shape
    p = np.ones([n, m])
    for j in range(m):
        col_sum = np.sum(data.T[j])
        for i in range(n):
            p[i, j] = data[i, j] / col_sum
    return p
def calcEntropy(data):
    # entropy E_j of each indicator, computed from the p_ij matrix
    p = calcP(data)
    n, m = p.shape
    k = 1.0 / ln(n)
    E = np.ones(m)
    for j in range(m):
        total = 0
        for i in range(n):
            total += p[i, j] * ln(p[i, j])
        E[j] = -k * total
    return E
def calcWeight(data):
    # weights from entropies: w_j = (1 - E_j) / sum_j (1 - E_j)
    total = 0
    weight = np.ones(len(data))
    for e in data:
        total += 1 - e
    for j in range(len(data)):
        weight[j] = (1 - data[j]) / total
    return weight
def calcScore(weight, data):
    # composite score of each sample, scaled to a 0-100 range
    n, m = data.shape
    s = np.zeros(n)
    for i in range(n):
        for j in range(m):
            s[i] += weight[j] * data[i, j] * 100
    return s
if __name__ == "__main__":
    data = np.loadtxt("./data/test.txt")  # read raw data, one sample per row
    pIndex = positiveIndex(data)          # normalize as positive indicators
    print("Positive Index:")
    print(pIndex)
    nIndex = negativeIndex(data)          # normalize as negative indicators
    print("Negative Index:")
    print(nIndex)
    pEntropy = calcEntropy(pIndex)        # entropy of each positive indicator
    nEntropy = calcEntropy(nIndex)        # entropy of each negative indicator
    pWeight = calcWeight(pEntropy)        # weights from positive-indicator entropies
    print("Positive Index Weight:")
    print(pWeight)
    nWeight = calcWeight(nEntropy)        # weights from negative-indicator entropies
    print("Negative Index Weight:")
    print(nWeight)
    pScore = calcScore(pWeight, pIndex)   # composite scores (positive indicators)
    print("Positive Index Score:")
    print(pScore)
    nScore = calcScore(nWeight, nIndex)   # composite scores (negative indicators)
    print("Negative Index Score:")
    print(nScore)
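For the script to run, ./data/test.txt must be a whitespace-separated numeric matrix that np.loadtxt can parse, with one sample per row and one indicator per column; the following contents are a hypothetical example:

1.0  4.0  7.0
2.0  5.0  8.0
3.0  9.0  6.0

Note that, as written, the script scores the same data twice: once treating every column as a positive indicator and once treating every column as a negative one. In practice each column would be normalized according to its own direction before the entropies, weights, and scores are computed.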
Attribution:
The theory section above is adapted from the article 综合评价之熵权法.