Matlab feature selection summary_涛声依旧e

http://blog.sina.com.cn/u/2182821854

Matlab feature selection summary

(2018-03-07 11:24:56)

Category: Research experience

Commonly used feature selection methods include ReliefF, the Fisher score, and the mutual information approach used in "prevent lunch attack". The code for each is summarized below.

1. ReliefF: MATLAB already ships with an implementation, so it can be called directly.
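As a minimal sketch of such a call, the snippet below uses the `relieff` function from the Statistics and Machine Learning Toolbox on synthetic data. The data, the number of neighbours `k`, and the variable names are my own assumptions for illustration. Note that `relieff` expects one observation per row, and that numeric labels are treated as a regression target unless classification is requested explicitly.

```matlab
% Sketch only: synthetic two-class data, not from the original post.
rng(0);                                   % reproducible random numbers
n = 200;                                  % number of observations (assumed)
y = [ones(n/2,1); 2*ones(n/2,1)];         % two classes, labelled 1 and 2
X = randn(n, 4);                          % rows = observations, columns = features
X(:,3) = X(:,3) + 2*(y == 2);             % shift feature 3 for class 2

k = 10;                                   % number of nearest neighbours (assumed value)
% 'method','classification' is needed because y is numeric;
% otherwise relieff would run its regression variant.
[ranks, weights] = relieff(X, y, k, 'method', 'classification');

disp(ranks);                              % feature indices, most important first
disp(weights);                            % one weight per feature, in original column order
```

`ranks` lists the feature indices ordered by importance, while `weights` stays in the original column order, so `weights(ranks)` gives the weights in ranked order.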

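2. Fisher: there is a toolbox available online, https://au.mathworks.com/matlabcentral/fileexchange/56937-feature-selection-library , which contains many methods (including ReliefF). However, its Fisher method does not return the Fisher score directly (fisher.m in lib can return it, but you have to modify some of the code yourself to get that return value). The following code can return the Fisher score directly: https://au.mathworks.com/matlabcentral/fileexchange/54906-feature-rank-input-labels-numindices-method-?focused=5868668&tab=function , but it contains a small error (see the comments on that page). The corrected code is as follows: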

% Function to compute Fisher Score or Discriminating Coefficient
% Inputs:  Input:      input data matrix where each row is a feature and
%                      each column corresponds to an instance or example
%          labels:     grouping variable that contains class labels, one
%                      per instance. It can be a cell array of strings,
%                      a numerical array or a logical array
%          numIndices: (optional)
%                      Number of significant features to be returned
%          method:     (optional)
%                      Method for feature ranking
%                      'Fisher_Score' (default) or
%                      'Discriminating_Coefficient'
% Outputs: index:        indices of features, ordered by decreasing
%                        feature score
%          featureScore: score of each feature according to the
%                        ranking criterion used
% References: Y. W. Chen and C. J. Lin,
%             "Combining SVMs with various feature selection strategies", Feature Extraction, Foundations and
%             Applications. New York, Springer-Verlag, 2006
%             T. Markiewicz and S. Osowski
%             "Data mining techniques for feature selection in blood cell recognition",
%             Proceedings of European Symposium on Artificial Neural Networks, April 2006
% Author: Vishnu Muralidharan
%         Department of Electrical and Computer Engineering
%         University of Alabama in Huntsville
function [index,featureScore] = feature_rank(Input,labels,numIndices,method)
%% if no specific number of significant features is requested, return all
if nargin < 3
    numIndices = NaN;
end
%% if the discriminating coefficient needs to be calculated
if nargin == 4
    criteria = method;
else
    criteria = 'Fisher_Score';
end
%% Check for number of instances being equal to number of labels
if size(Input,2) ~= numel(labels)
    error('The number of columns of Input must equal the number of labels');
end
%% Indices for class labels
groupLabels = unique(labels); % find the unique labels and hence the classes
% find indices of the respective classes according to the data type of the
% grouping variable; only the first two unique labels are used (two classes)
switch class(labels)
    case 'cell'
        idxA = find(strcmp(labels,groupLabels{1}));
        idxB = find(strcmp(labels,groupLabels{2}));
    case {'double','logical'}
        idxA = find(labels==groupLabels(1));
        idxB = find(labels==groupLabels(2));
    otherwise
        error('Grouping vector of undefined data type');
end
numClassA = length(idxA); % number of instances of class A
numClassB = length(idxB); % number of instances of class B
numFeat = size(Input,1);  % number of features in dataset
scoreIndex = zeros(numFeat,2); % column 1: score, column 2: feature index
%% Compute Discriminating Coefficient or Fisher Score
if strcmp(criteria,'Discriminating_Coefficient')
    % compute value of discriminating coefficient between two classes for
    % each feature
    for i = 1:numFeat
        vectorA = Input(i,idxA);
        vectorB = Input(i,idxB);
        muA = mean(vectorA);   % mean of given feature for class A
        muB = mean(vectorB);   % mean of given feature for class B
        sigmaA = std(vectorA); % standard deviation of given feature for class A
        sigmaB = std(vectorB); % standard deviation of given feature for class B
        scoreIndex(i,1) = abs(muA - muB)/(sigmaA + sigmaB); % Discriminating Coefficient
        scoreIndex(i,2) = i;   % store index of feature
    end
else
    % compute Fisher Score between two classes for each feature
    for i = 1:numFeat
        muFeat = mean(Input(i,:)); % mean of feature over both classes combined
        vectorA = Input(i,idxA);
        vectorB = Input(i,idxB);
        muA = mean(vectorA); % mean of given feature for class A
        muB = mean(vectorB); % mean of given feature for class B
        numer = (muA - muFeat)^2 + (muB - muFeat)^2; % numerator of Fisher Score equation
        term1 = sum((vectorA - muA).^2)/(numClassA - 1); % sample variance of class A
        term2 = sum((vectorB - muB).^2)/(numClassB - 1); % sample variance of class B
        denom = term1 + term2;         % denominator of Fisher Score equation
        scoreIndex(i,1) = numer/denom; % Fisher Score for the feature
        scoreIndex(i,2) = i;           % store index of feature
    end
end
%% Rank features according to score (highest score first)
[~,order] = sort(scoreIndex(:,1),'descend');
scoreIndex = scoreIndex(order,:);
%% Outputs
% Output scores of features
if isnan(numIndices)
    featureScore = scoreIndex(:,1);
    index = scoreIndex(:,2);
else
    featureScore = scoreIndex(1:numIndices,1);
    index = scoreIndex(1:numIndices,2);
end
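To make the Fisher score formula easier to check, here is a small vectorized sketch of the same two-class computation: the squared distances of the two class means from the overall mean, divided by the sum of the two sample variances. It is not part of the File Exchange code. The synthetic data and variable names are my own assumptions, and the layout (features in rows, instances in columns) follows feature_rank.

```matlab
% Sketch only: vectorized two-class Fisher score on synthetic data.
rng(1);                                   % reproducible random numbers
nA = 40; nB = 60;                         % class sizes (assumed)
labels = [zeros(nA,1); ones(nB,1)];       % one label per instance
Input = randn(3, nA + nB);                % 3 features (rows) x 100 instances (columns)
Input(2, labels == 1) = Input(2, labels == 1) + 3;   % make feature 2 discriminative

A = Input(:, labels == 0);                % instances of class A
B = Input(:, labels == 1);                % instances of class B
muAll = mean(Input, 2);                   % per-feature mean over both classes

% Numerator: squared distance of each class mean from the overall mean
numer = (mean(A, 2) - muAll).^2 + (mean(B, 2) - muAll).^2;
% Denominator: sum of the per-class sample variances (normalized by n-1)
denom = var(A, 0, 2) + var(B, 0, 2);

fisherScore = numer ./ denom;             % one score per feature
[sortedScore, order] = sort(fisherScore, 'descend');

disp(order');                             % feature indices, highest score first
```

A feature whose class means are far apart relative to its within-class spread gets a large score, which is why the shifted feature should come out on top here.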

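3. Mutual information: the toolbox above also includes it, but in practice I found that the toolbox above only handles binary classification and does not support multi-class problems, so I looked for another toolbox that does: http://users.spa.aalto.fi/jpohjala/featureselection/

It contains an implementation of mutual-information-based feature selection.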
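For intuition about what a mutual information ranking does with more than two classes, here is a minimal sketch that does not use the toolbox above and is not its algorithm. It assumes a simple histogram (plug-in) estimate: each feature is cut into equal-width bins and the mutual information with the class label is computed from the joint counts. The bin count, the synthetic data and all variable names are my own assumptions.

```matlab
% Sketch only: histogram-based mutual information between each feature and
% a multi-class label, on synthetic data.
rng(0);                                   % reproducible random numbers
n = 300;                                  % number of instances (assumed)
y = randi(3, n, 1);                       % three classes: 1, 2, 3
X = [y + 0.3*randn(n,1), randn(n,1)];     % column 1 informative, column 2 noise
numBins = 8;                              % number of equal-width bins (assumed)

[~, ~, yIdx] = unique(y);                 % map class labels to 1..C
numClasses = max(yIdx);
mi = zeros(1, size(X,2));                 % mutual information per feature, in bits

for j = 1:size(X,2)
    edges = linspace(min(X(:,j)), max(X(:,j)), numBins + 1);
    xIdx = discretize(X(:,j), edges);     % bin index in 1..numBins
    % joint probability table p(x,y) from co-occurrence counts
    pxy = accumarray([xIdx, yIdx], 1, [numBins, numClasses]) / n;
    px = sum(pxy, 2);                     % marginal p(x), column vector
    py = sum(pxy, 1);                     % marginal p(y), row vector
    pIndep = px * py;                     % product of marginals p(x)p(y)
    nz = pxy > 0;                         % empty cells contribute zero
    mi(j) = sum(pxy(nz) .* log2(pxy(nz) ./ pIndep(nz)));
end

[miSorted, order] = sort(mi, 'descend');  % rank features by mutual information
disp(order);
```

Because the estimate only needs a joint table of bins against classes, it works for any number of classes. It is sensitive to the bin count, though, so treat the numbers as a rough ranking rather than exact values.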
