Ignore:
File:
1 edited

Legend:

Unmodified
Added
Removed
  • distools/protselfd.m

    r30 r10  
    55%
    66% INPUT
    7 %   D     Dataset, dissimilarity matrix
     7%   D     Dataset, square dissimilarity matrix
    88%   K     Integer, desired number of prototypes
    9 %   PAR   'SUPER' supervised selection using 1NN error on prototypes.
    10 %         'LOO' - supervised selection using leave-one-out error estimation.
    11 %         'MAXDIST' - unsupervised selection minimizing the maximum
    12 %                     distance to the nearest prototype.
    13 %         'MEANDIST' - unsupervised selection minimizing the average
    14 %                      distance to the nearest prototype.
     9%   PAR  'LOO' - leave-one-out option. This should be used if
     10%          the objects are related to themselves. If D is not square,
     11%          it is assumed that the first sets of objects in columns and
     12%          rows match.
     13%        'ALL' - use all objects (default).
    1514%
    1615% OUTPUT
    1716%   W     Selection mapping ('feature selection')
    1817%   E     Error stimate as a function of number of selected prototypes
    19 %         (for supervised selection only reliable for prototype sizes >= class size)
    20 %   KOPT  Estimate for best size in avoiding peaking
    21 %         (supervised selection only)
     18%         (only reliable for prototype sizes >= class size)
     19%   KOPT  Estimate for best size in avoiding peaking
    2220%
    2321% DESCRIPTION
    24 % This procedure for optimizing the representation set of a dissimilarity
    25 % matrix is based on a greedy, forward selection of prototypes.
     22% This procedure for optimizing the representation set of a
     23% dissimilarity matrix is based on a greedy, forward selection of
     24% prototypes using the leave-one-out error estimate of the 1NN rule
     25% as a criterion. As this is computed on the given distances in
     26% D, the procedure is based on sorting and counting only and is
     27% thereby fast. In case K=1 just a single prototype has to be returned,
     28% but computing the 1NN error is not possible as all objects are assigned
     29% to the same class. In that case the centre object of the largest class
     30% will be returned.
    2631%
    27 % In case of supervised selection D should be a labeled dataset with
    28 % prototype labels stored as feature labels. The 1NN error to the nearest
    29 % prototype is used as a criterion. In case of leave-one-out error
    30 % estimation it is assumed that the first objects in D correspond with the
    31 % prototypes.
    32 %
    33 % In case K=1 just a single prototype has to be returned, but computing the
    34 % 1NN error is not possible as all objects are assigned to the same class.
    35 % In that case the centre object of the largest class will be returned.
     32% Note that the search continues untill K prototypes are found.
     33% This might be larger than desired due to peaking (curse of
     34% dimensionality, overtraining). Therefor an estimate for the
     35% optimal number of prototype is returned in KOPT.
    3636%
    37 % Note that the search continues untill K prototypes are found. This might
    38 % be larger than desired due to peaking (overtraining). Therefor an
    39 % estimate for the optimal number of prototype is returned in KOPT.
    40 %
    41 % The prototype selection may be applied by C = B*W(:,1:KSEL), in which B
    42 % is a dissimilarity matrix based on the same representation set as A (e.g.
    43 % A itself) and C is a resulting dissimilarity matrix in which the KSEL
    44 % (e.g. KOPT) best prototypes are selected.
    45 %
    46 % In case of unsupervised selection the maximum or the mean distances to
    47 % the nearest prototype are minimized. These criteria are the same as used
    48 % in the KCENTRE and KMEDIOD cluster procedures.
     37% The prototype selection may be applied by C = B*W(:,1:KSEL),
     38% in which B is a dissimilarity matrix based on the same
     39% representation set as A (e.g. A itself) and C is a resulting
     40% dissimilarity matrix in which the KSEL (e.g. KOPT) best prototypes
     41% are selected.
    4942%
    5043% REFERENCE
     
    6255%
    6356
    64 function [R,e,D] = protselfd(D,ksel,type)
     57function [R,e,D,J,nlab,clab] = protselfd(D,ksel,par,J,e,nlab,clab)
    6558
    66   if nargin < 2, ksel = []; end
    67   if nargin < 3, type = []; end
     59if nargin < 2, ksel = []; end
     60if nargin < 3 | isempty(par), par = 'all'; end
    6861
    69   if nargin < 1 || isempty(D)  % allow for D*protselfd([],pars)
    70     R = mapping(mfilename,'untrained',{ksel,type});
     62if nargin < 4 % user call
     63 
     64  if nargin < 1 | isempty(D)  % allow for D*protselfd([],pars)
     65    R = mapping(mfilename,'untrained',{ksel,par});
    7166    R = setname(R,'Forward Prototype Sel');
    7267    return
    7368  end
    7469 
    75   switch lower(type)
    76     case {'loo','LOO','super','SUPER','',''}
    77       [R,e,D,J,nlab,clab] = protselfd(D,ksel,type);
    78     case {'maxdist','meandist'}
    79       R = protselfd_unsuper(D,ksel,type);
    80     otherwise
    81       error('Unknown selection type')
    82   end
    83      
    84 return 
    85 
    86 
    87 function [R,e,D,J,nlab,clab] = protselfd_super_init(D,ksel,par)
    88 % this routine takes care of the initialisation of supervised selection
    89 
    90   isdataset(D);
    9170  [m,k,c] = getsize(D);
    9271  if isempty(ksel), ksel = k; end
     
    12099                % this will be a deep recursive call !!!
    121100                prwaitbar(ksel,'Forward prototype selection')
    122                 [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,J,e,nlab,clab);
     101                [R,e,D,J,nlab,clab] = protselfd(D,ksel,R,J,e,nlab,clab);
    123102                prwaitbar(0);
    124103        end
     
    131110  D = floor((Jopt(end)+Jopt(1))/2);
    132111 
    133 return
     112  % done!
    134113 
    135 function [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,J,e,nlab,clab)
    136 
     114else  % internal call, parameters may have another meaning!
     115 
     116  R = par;  % prototypes sofar
    137117  [m,k,c] = getsize(D);
    138118  d = +D;
     
    155135    de = sum(ds);
    156136                % if better, use it
    157     if ee < emin || ((ee == emin) && (de < dmin))
     137    if ee < emin | ((ee == emin) & (de < dmin))
    158138      emin = ee;
    159139      jmin = j;
     
    165145  end
    166146
    167   if emin <= e(r) || 1 % we even continue if emin increases due to peaking
     147  if emin <= e(r) | 1 % we even continue if emin increases due to peaking
    168148    e(r+1) = emin;
    169149    R = Rmin;
    170150    if (r+1) < ksel
    171                         [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,Jmin,e,nlab,clab);
     151                        [R,e,D,J,nlab,clab] = protselfd(D,ksel,R,Jmin,e,nlab,clab);
    172152    end
    173153  end
    174154 
    175 return
    176 
    177 %PROTSELFD_UNSUPER Forward prototype selection
    178 %
    179 %               N = PROTSELFD_UNSUPER(D,P,CRIT)
    180 %
    181 % INPUT
    182 %   D     Square dissimilarity matrix, zeros on diagonal
    183 %   P     Number of prototypes to be selected
    184 %   CRIT  'dist' or 'centre'
    185 %
    186 % OUTPUT
    187 %   N     Indices of selected prototypes
    188 %
    189 % DESCRIPTION
    190 % Sort objects given by square dissim matrix D using a greedy approach
    191 % such that the maximum NN distance from all objects (prototypes)
    192 % to the first K: max(min(D(:,N(1:K),[],2)) is minimized.
    193 %
    194 % This routines tries to sample the objects such that they are evenly
    195 % spaced judged from their dissimilarities. This may be used as
    196 % initialisation in KCENTRES. It works reasonably, but not very good.
    197 %
    198 % SEE ALSO
    199 % KCENTRES
    200 
    201 % Copyright: R.P.W. Duin, r.p.w.duin@prtools.org
    202 % Faculty EWI, Delft University of Technology
    203 % P.O. Box 5031, 2600 GA Delft, The Netherlands
    204 
    205 function N = protselfd_unsuper(d,p,crit)
    206 
    207 d = +d;
    208 [m,k] = size(d);
    209 if isempty(crit), crit = 'max'; end
    210 if nargin < 2 || isempty(p), p = k; end
    211 L = 1:k;
    212 N = zeros(1,p);
    213 switch crit
    214   case 'maxdist'
    215     [~,n] = min(max(d));    % this is the first (central) prototype
    216   case 'meandist'
    217     [~,n] = min(mean(d));   % this is the first (central) prototype
    218155end
    219 e = d(:,n);                 % store here the distances to the nearest prototype (dNNP)
    220 f = min(d,repmat(e,1,k));   % replace distances that are larger than dNNP by dNNP
    221 N(1) = n;                   % ranking of selected prototypes
    222 L(n) = [];                  % candidate prototypes (all not yet selected objects)
    223 
    224 for j=2:p                   % extend prototype set
    225   switch crit               % select the next prototype out of candidates in L
    226     case 'maxdist'
    227       [~,n] = min(max(f(:,L)));
    228     case 'meandist'
    229       [~,n] = min(mean(f(:,L)));
    230   end
    231   e = min([d(:,L(n)) e],[],2);   % update dNNP
    232   f = min(d,repmat(e,1,k));      % update replacement of distances that are larger
    233                                  % than dNNP by dNNP
    234   N(j) = L(n);                   % update list of selected prototypes
    235   L(n) = [];                     % update list of candidate prototypes
    236 end
    237 
Note: See TracChangeset for help on using the changeset viewer.