Changeset 30 for distools


Ignore:
Timestamp:
05/23/12 13:52:16 (12 years ago)
Author:
bduin
Message:

unsupervised criteria included in protselfd

File:
1 edited

Legend:

Unmodified
Added
Removed
  • distools/protselfd.m

    r10 r30  
    55%
    66% INPUT
    7 %   D     Dataset, square dissimilarity matrix
     7%   D     Dataset, dissimilarity matrix
    88%   K     Integer, desired number of prototypes
    9 %   PAR  'LOO' - leave-one-out option. This should be used if
    10 %          the objects are related to themselves. If D is not square,
    11 %          it is assumed that the first sets of objects in columns and
    12 %          rows match.
    13 %        'ALL' - use all objects (default).
     9%   PAR   'SUPER' supervised selection using 1NN error on prototypes.
     10%         'LOO' - supervised selection using leave-one-out error estimation.
     11%         'MAXDIST' - unsupervised selection minimizing the maximum
     12%                     distance to the nearest prototype.
     13%         'MEANDIST' - unsupervised selection minimizing the average
     14%                      distance to the nearest prototype.
    1415%
    1516% OUTPUT
    1617%   W     Selection mapping ('feature selection')
    1718%   E     Error stimate as a function of number of selected prototypes
    18 %         (only reliable for prototype sizes >= class size)
    19 %   KOPT  Estimate for best size in avoiding peaking
     19%         (for supervised selection only reliable for prototype sizes >= class size)
     20%   KOPT  Estimate for best size in avoiding peaking
     21%         (supervised selection only)
    2022%
    2123% DESCRIPTION
    22 % This procedure for optimizing the representation set of a
    23 % dissimilarity matrix is based on a greedy, forward selection of
    24 % prototypes using the leave-one-out error estimate of the 1NN rule
    25 % as a criterion. As this is computed on the given distances in
    26 % D, the procedure is based on sorting and counting only and is
    27 % thereby fast. In case K=1 just a single prototype has to be returned,
    28 % but computing the 1NN error is not possible as all objects are assigned
    29 % to the same class. In that case the centre object of the largest class
    30 % will be returned.
    31 %
    32 % Note that the search continues untill K prototypes are found.
    33 % This might be larger than desired due to peaking (curse of
    34 % dimensionality, overtraining). Therefor an estimate for the
    35 % optimal number of prototype is returned in KOPT.
    36 %
    37 % The prototype selection may be applied by C = B*W(:,1:KSEL),
    38 % in which B is a dissimilarity matrix based on the same
    39 % representation set as A (e.g. A itself) and C is a resulting
    40 % dissimilarity matrix in which the KSEL (e.g. KOPT) best prototypes
    41 % are selected.
     24% This procedure for optimizing the representation set of a dissimilarity
     25% matrix is based on a greedy, forward selection of prototypes.
     26%
     27% In case of supervised selection D should be a labeled dataset with
     28% prototype labels stored as feature labels. The 1NN error to the nearest
     29% prototype is used as a criterion. In case of leave-one-out error
     30% estimation it is assumed that the first objects in D correspond with the
     31% prototypes.
     32%
     33% In case K=1 just a single prototype has to be returned, but computing the
     34% 1NN error is not possible as all objects are assigned to the same class.
     35% In that case the centre object of the largest class will be returned.
     36%
     37% Note that the search continues untill K prototypes are found. This might
     38% be larger than desired due to peaking (overtraining). Therefor an
     39% estimate for the optimal number of prototype is returned in KOPT.
     40%
     41% The prototype selection may be applied by C = B*W(:,1:KSEL), in which B
     42% is a dissimilarity matrix based on the same representation set as A (e.g.
     43% A itself) and C is a resulting dissimilarity matrix in which the KSEL
     44% (e.g. KOPT) best prototypes are selected.
     45%
     46% In case of unsupervised selection the maximum or the mean distances to
     47% the nearest prototype are minimized. These criteria are the same as used
     48% in the KCENTRE and KMEDIOD cluster procedures.
    4249%
    4350% REFERENCE
     
    5562%
    5663
    57 function [R,e,D,J,nlab,clab] = protselfd(D,ksel,par,J,e,nlab,clab)
    58 
    59 if nargin < 2, ksel = []; end
    60 if nargin < 3 | isempty(par), par = 'all'; end
    61 
    62 if nargin < 4 % user call
    63  
    64   if nargin < 1 | isempty(D)  % allow for D*protselfd([],pars)
    65     R = mapping(mfilename,'untrained',{ksel,par});
     64function [R,e,D] = protselfd(D,ksel,type)
     65
     66  if nargin < 2, ksel = []; end
     67  if nargin < 3, type = []; end
     68
     69  if nargin < 1 || isempty(D)  % allow for D*protselfd([],pars)
     70    R = mapping(mfilename,'untrained',{ksel,type});
    6671    R = setname(R,'Forward Prototype Sel');
    6772    return
    6873  end
    6974 
     75  switch lower(type)
     76    case {'loo','LOO','super','SUPER','',''}
     77      [R,e,D,J,nlab,clab] = protselfd(D,ksel,type);
     78    case {'maxdist','meandist'}
     79      R = protselfd_unsuper(D,ksel,type);
     80    otherwise
     81      error('Unknown selection type')
     82  end
     83     
     84return 
     85
     86
     87function [R,e,D,J,nlab,clab] = protselfd_super_init(D,ksel,par)
     88% this routine takes care of the initialisation of supervised selection
     89
     90  isdataset(D);
    7091  [m,k,c] = getsize(D);
    7192  if isempty(ksel), ksel = k; end
     
    99120                % this will be a deep recursive call !!!
    100121                prwaitbar(ksel,'Forward prototype selection')
    101                 [R,e,D,J,nlab,clab] = protselfd(D,ksel,R,J,e,nlab,clab);
     122                [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,J,e,nlab,clab);
    102123                prwaitbar(0);
    103124        end
     
    110131  D = floor((Jopt(end)+Jopt(1))/2);
    111132 
    112   % done!
    113  
    114 else  % internal call, parameters may have another meaning!
    115  
    116   R = par;  % prototypes sofar
     133return
     134 
     135function [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,J,e,nlab,clab)
     136
    117137  [m,k,c] = getsize(D);
    118138  d = +D;
     
    135155    de = sum(ds);
    136156                % if better, use it
    137     if ee < emin | ((ee == emin) & (de < dmin))
     157    if ee < emin || ((ee == emin) && (de < dmin))
    138158      emin = ee;
    139159      jmin = j;
     
    145165  end
    146166
    147   if emin <= e(r) | 1 % we even continue if emin increases due to peaking
     167  if emin <= e(r) || 1 % we even continue if emin increases due to peaking
    148168    e(r+1) = emin;
    149169    R = Rmin;
    150170    if (r+1) < ksel
    151                         [R,e,D,J,nlab,clab] = protselfd(D,ksel,R,Jmin,e,nlab,clab);
     171                        [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,Jmin,e,nlab,clab);
    152172    end
    153173  end
    154174 
     175return
     176
     177%PROTSELFD_UNSUPER Forward prototype selection
     178%
     179%               N = PROTSELFD_UNSUPER(D,P,CRIT)
     180%
     181% INPUT
     182%   D     Square dissimilarity matrix, zeros on diagonal
     183%   P     Number of prototypes to be selected
     184%   CRIT  'dist' or 'centre'
     185%
     186% OUTPUT
     187%   N     Indices of selected prototypes
     188%
     189% DESCRIPTION
     190% Sort objects given by square dissim matrix D using a greedy approach
     191% such that the maximum NN distance from all objects (prototypes)
     192% to the first K: max(min(D(:,N(1:K),[],2)) is minimized.
     193%
     194% This routines tries to sample the objects such that they are evenly
     195% spaced judged from their dissimilarities. This may be used as
     196% initialisation in KCENTRES. It works reasonably, but not very good.
     197%
     198% SEE ALSO
     199% KCENTRES
     200
     201% Copyright: R.P.W. Duin, r.p.w.duin@prtools.org
     202% Faculty EWI, Delft University of Technology
     203% P.O. Box 5031, 2600 GA Delft, The Netherlands
     204
     205function N = protselfd_unsuper(d,p,crit)
     206
     207d = +d;
     208[m,k] = size(d);
     209if isempty(crit), crit = 'max'; end
     210if nargin < 2 || isempty(p), p = k; end
     211L = 1:k;
     212N = zeros(1,p);
     213switch crit
     214  case 'maxdist'
     215    [~,n] = min(max(d));    % this is the first (central) prototype
     216  case 'meandist'
     217    [~,n] = min(mean(d));   % this is the first (central) prototype
    155218end
     219e = d(:,n);                 % store here the distances to the nearest prototype (dNNP)
     220f = min(d,repmat(e,1,k));   % replace distances that are larger than dNNP by dNNP
     221N(1) = n;                   % ranking of selected prototypes
     222L(n) = [];                  % candidate prototypes (all not yet selected objects)
     223
     224for j=2:p                   % extend prototype set
     225  switch crit               % select the next prototype out of candidates in L
     226    case 'maxdist'
     227      [~,n] = min(max(f(:,L)));
     228    case 'meandist'
     229      [~,n] = min(mean(f(:,L)));
     230  end
     231  e = min([d(:,L(n)) e],[],2);   % update dNNP
     232  f = min(d,repmat(e,1,k));      % update replacement of distances that are larger
     233                                 % than dNNP by dNNP
     234  N(j) = L(n);                   % update list of selected prototypes
     235  L(n) = [];                     % update list of candidate prototypes
     236end
     237
Note: See TracChangeset for help on using the changeset viewer.