- Timestamp:
- 05/23/12 13:52:16 (12 years ago)
- File:
-
- 1 edited
Legend:
- Unmodified
- Added
- Removed
-
distools/protselfd.m
r10 r30 5 5 % 6 6 % INPUT 7 % D Dataset, squaredissimilarity matrix7 % D Dataset, dissimilarity matrix 8 8 % K Integer, desired number of prototypes 9 % PAR 'LOO' - leave-one-out option. This should be used if 10 % the objects are related to themselves. If D is not square, 11 % it is assumed that the first sets of objects in columns and 12 % rows match. 13 % 'ALL' - use all objects (default). 9 % PAR 'SUPER' supervised selection using 1NN error on prototypes. 10 % 'LOO' - supervised selection using leave-one-out error estimation. 11 % 'MAXDIST' - unsupervised selection minimizing the maximum 12 % distance to the nearest prototype. 13 % 'MEANDIST' - unsupervised selection minimizing the average 14 % distance to the nearest prototype. 14 15 % 15 16 % OUTPUT 16 17 % W Selection mapping ('feature selection') 17 18 % E Error stimate as a function of number of selected prototypes 18 % (only reliable for prototype sizes >= class size) 19 % KOPT Estimate for best size in avoiding peaking 19 % (for supervised selection only reliable for prototype sizes >= class size) 20 % KOPT Estimate for best size in avoiding peaking 21 % (supervised selection only) 20 22 % 21 23 % DESCRIPTION 22 % This procedure for optimizing the representation set of a 23 % dissimilarity matrix is based on a greedy, forward selection of 24 % prototypes using the leave-one-out error estimate of the 1NN rule 25 % as a criterion. As this is computed on the given distances in 26 % D, the procedure is based on sorting and counting only and is 27 % thereby fast. In case K=1 just a single prototype has to be returned, 28 % but computing the 1NN error is not possible as all objects are assigned 29 % to the same class. In that case the centre object of the largest class 30 % will be returned. 31 % 32 % Note that the search continues untill K prototypes are found. 33 % This might be larger than desired due to peaking (curse of 34 % dimensionality, overtraining). Therefor an estimate for the 35 % optimal number of prototype is returned in KOPT. 36 % 37 % The prototype selection may be applied by C = B*W(:,1:KSEL), 38 % in which B is a dissimilarity matrix based on the same 39 % representation set as A (e.g. A itself) and C is a resulting 40 % dissimilarity matrix in which the KSEL (e.g. KOPT) best prototypes 41 % are selected. 24 % This procedure for optimizing the representation set of a dissimilarity 25 % matrix is based on a greedy, forward selection of prototypes. 26 % 27 % In case of supervised selection D should be a labeled dataset with 28 % prototype labels stored as feature labels. The 1NN error to the nearest 29 % prototype is used as a criterion. In case of leave-one-out error 30 % estimation it is assumed that the first objects in D correspond with the 31 % prototypes. 32 % 33 % In case K=1 just a single prototype has to be returned, but computing the 34 % 1NN error is not possible as all objects are assigned to the same class. 35 % In that case the centre object of the largest class will be returned. 36 % 37 % Note that the search continues untill K prototypes are found. This might 38 % be larger than desired due to peaking (overtraining). Therefor an 39 % estimate for the optimal number of prototype is returned in KOPT. 40 % 41 % The prototype selection may be applied by C = B*W(:,1:KSEL), in which B 42 % is a dissimilarity matrix based on the same representation set as A (e.g. 43 % A itself) and C is a resulting dissimilarity matrix in which the KSEL 44 % (e.g. KOPT) best prototypes are selected. 45 % 46 % In case of unsupervised selection the maximum or the mean distances to 47 % the nearest prototype are minimized. These criteria are the same as used 48 % in the KCENTRE and KMEDIOD cluster procedures. 42 49 % 43 50 % REFERENCE … … 55 62 % 56 63 57 function [R,e,D,J,nlab,clab] = protselfd(D,ksel,par,J,e,nlab,clab) 58 59 if nargin < 2, ksel = []; end 60 if nargin < 3 | isempty(par), par = 'all'; end 61 62 if nargin < 4 % user call 63 64 if nargin < 1 | isempty(D) % allow for D*protselfd([],pars) 65 R = mapping(mfilename,'untrained',{ksel,par}); 64 function [R,e,D] = protselfd(D,ksel,type) 65 66 if nargin < 2, ksel = []; end 67 if nargin < 3, type = []; end 68 69 if nargin < 1 || isempty(D) % allow for D*protselfd([],pars) 70 R = mapping(mfilename,'untrained',{ksel,type}); 66 71 R = setname(R,'Forward Prototype Sel'); 67 72 return 68 73 end 69 74 75 switch lower(type) 76 case {'loo','LOO','super','SUPER','',''} 77 [R,e,D,J,nlab,clab] = protselfd(D,ksel,type); 78 case {'maxdist','meandist'} 79 R = protselfd_unsuper(D,ksel,type); 80 otherwise 81 error('Unknown selection type') 82 end 83 84 return 85 86 87 function [R,e,D,J,nlab,clab] = protselfd_super_init(D,ksel,par) 88 % this routine takes care of the initialisation of supervised selection 89 90 isdataset(D); 70 91 [m,k,c] = getsize(D); 71 92 if isempty(ksel), ksel = k; end … … 99 120 % this will be a deep recursive call !!! 100 121 prwaitbar(ksel,'Forward prototype selection') 101 [R,e,D,J,nlab,clab] = protselfd (D,ksel,R,J,e,nlab,clab);122 [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,J,e,nlab,clab); 102 123 prwaitbar(0); 103 124 end … … 110 131 D = floor((Jopt(end)+Jopt(1))/2); 111 132 112 % done! 113 114 else % internal call, parameters may have another meaning! 115 116 R = par; % prototypes sofar 133 return 134 135 function [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,J,e,nlab,clab) 136 117 137 [m,k,c] = getsize(D); 118 138 d = +D; … … 135 155 de = sum(ds); 136 156 % if better, use it 137 if ee < emin | ((ee == emin)& (de < dmin))157 if ee < emin || ((ee == emin) && (de < dmin)) 138 158 emin = ee; 139 159 jmin = j; … … 145 165 end 146 166 147 if emin <= e(r) | 1 % we even continue if emin increases due to peaking167 if emin <= e(r) || 1 % we even continue if emin increases due to peaking 148 168 e(r+1) = emin; 149 169 R = Rmin; 150 170 if (r+1) < ksel 151 [R,e,D,J,nlab,clab] = protselfd (D,ksel,R,Jmin,e,nlab,clab);171 [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,Jmin,e,nlab,clab); 152 172 end 153 173 end 154 174 175 return 176 177 %PROTSELFD_UNSUPER Forward prototype selection 178 % 179 % N = PROTSELFD_UNSUPER(D,P,CRIT) 180 % 181 % INPUT 182 % D Square dissimilarity matrix, zeros on diagonal 183 % P Number of prototypes to be selected 184 % CRIT 'dist' or 'centre' 185 % 186 % OUTPUT 187 % N Indices of selected prototypes 188 % 189 % DESCRIPTION 190 % Sort objects given by square dissim matrix D using a greedy approach 191 % such that the maximum NN distance from all objects (prototypes) 192 % to the first K: max(min(D(:,N(1:K),[],2)) is minimized. 193 % 194 % This routines tries to sample the objects such that they are evenly 195 % spaced judged from their dissimilarities. This may be used as 196 % initialisation in KCENTRES. It works reasonably, but not very good. 197 % 198 % SEE ALSO 199 % KCENTRES 200 201 % Copyright: R.P.W. Duin, r.p.w.duin@prtools.org 202 % Faculty EWI, Delft University of Technology 203 % P.O. Box 5031, 2600 GA Delft, The Netherlands 204 205 function N = protselfd_unsuper(d,p,crit) 206 207 d = +d; 208 [m,k] = size(d); 209 if isempty(crit), crit = 'max'; end 210 if nargin < 2 || isempty(p), p = k; end 211 L = 1:k; 212 N = zeros(1,p); 213 switch crit 214 case 'maxdist' 215 [~,n] = min(max(d)); % this is the first (central) prototype 216 case 'meandist' 217 [~,n] = min(mean(d)); % this is the first (central) prototype 155 218 end 219 e = d(:,n); % store here the distances to the nearest prototype (dNNP) 220 f = min(d,repmat(e,1,k)); % replace distances that are larger than dNNP by dNNP 221 N(1) = n; % ranking of selected prototypes 222 L(n) = []; % candidate prototypes (all not yet selected objects) 223 224 for j=2:p % extend prototype set 225 switch crit % select the next prototype out of candidates in L 226 case 'maxdist' 227 [~,n] = min(max(f(:,L))); 228 case 'meandist' 229 [~,n] = min(mean(f(:,L))); 230 end 231 e = min([d(:,L(n)) e],[],2); % update dNNP 232 f = min(d,repmat(e,1,k)); % update replacement of distances that are larger 233 % than dNNP by dNNP 234 N(j) = L(n); % update list of selected prototypes 235 L(n) = []; % update list of candidate prototypes 236 end 237
Note: See TracChangeset
for help on using the changeset viewer.