Changeset 30

Timestamp:

05/23/12 13:52:16 (13 years ago)

Author:

bduin

Message:

unsupervised criteria included in protselfd

File:

1 edited

distools/protselfd.m (modified) (6 diffs)

distools/protselfd.m

-                      r10
+                      r30
+%
 % INPUT
-%   D     Dataset, square dissimilarity matrix
+%   D     Dataset, dissimilarity matrix
 %   K     Integer, desired number of prototypes
-%   PAR  'LOO' - leave-one-out option. This should be used if
-%          the objects are related to themselves. If D is not square,
-%          it is assumed that the first sets of objects in columns and
-%          rows match.
-%        'ALL' - use all objects (default).
+%   PAR   'SUPER' - supervised selection using 1NN error on prototypes.
+%         'LOO' - supervised selection using leave-one-out error estimation.
+%         'MAXDIST' - unsupervised selection minimizing the maximum
+%                     distance to the nearest prototype.
+%         'MEANDIST' - unsupervised selection minimizing the average
+%                      distance to the nearest prototype.
+%
 % OUTPUT
 %   W     Selection mapping ('feature selection')
 %   E     Error estimate as a function of number of selected prototypes
-%         (only reliable for prototype sizes >= class size)
-%   KOPT  Estimate for best size in avoiding peaking
+%         (for supervised selection only reliable for prototype sizes >= class size)
+%   KOPT  Estimate for best size in avoiding peaking
+%         (supervised selection only)
+%
 % DESCRIPTION
-% This procedure for optimizing the representation set of a
-% dissimilarity matrix is based on a greedy, forward selection of
-% prototypes using the leave-one-out error estimate of the 1NN rule
-% as a criterion. As this is computed on the given distances in
-% D, the procedure is based on sorting and counting only and is
-% thereby fast. In case K=1 just a single prototype has to be returned,
-% but computing the 1NN error is not possible as all objects are assigned
-% to the same class. In that case the centre object of the largest class
-% will be returned.
-%
-% Note that the search continues until K prototypes are found.
-% This might be larger than desired due to peaking (curse of
-% dimensionality, overtraining). Therefore an estimate for the
-% optimal number of prototypes is returned in KOPT.
-%
-% The prototype selection may be applied by C = B*W(:,1:KSEL),
-% in which B is a dissimilarity matrix based on the same
-% representation set as A (e.g. A itself) and C is a resulting
-% dissimilarity matrix in which the KSEL (e.g. KOPT) best prototypes
-% are selected.
+% This procedure for optimizing the representation set of a dissimilarity
+% matrix is based on a greedy, forward selection of prototypes.
+%
+% In case of supervised selection D should be a labeled dataset with
+% prototype labels stored as feature labels. The 1NN error to the nearest
+% prototype is used as a criterion. In case of leave-one-out error
+% estimation it is assumed that the first objects in D correspond with the
+% prototypes.
+%
+% In case K=1 just a single prototype has to be returned, but computing the
+% 1NN error is not possible as all objects are assigned to the same class.
+% In that case the centre object of the largest class will be returned.
+%
+% Note that the search continues until K prototypes are found. This might
+% be larger than desired due to peaking (overtraining). Therefore an
+% estimate for the optimal number of prototypes is returned in KOPT.
+%
+% The prototype selection may be applied by C = B*W(:,1:KSEL), in which B
+% is a dissimilarity matrix based on the same representation set as A (e.g.
+% A itself) and C is a resulting dissimilarity matrix in which the KSEL
+% (e.g. KOPT) best prototypes are selected.
+%
+% In case of unsupervised selection the maximum or the mean distances to
+% the nearest prototype are minimized. These criteria are the same as used
+% in the KCENTRE and KMEDIOD cluster procedures.
+%
 % REFERENCE
 …
+%
-function [R,e,D,J,nlab,clab] = protselfd(D,ksel,par,J,e,nlab,clab)
-if nargin < 2, ksel = []; end
-if nargin < 3 | isempty(par), par = 'all'; end
-if nargin < 4 % user call
-  if nargin < 1 | isempty(D)  % allow for D*protselfd([],pars)
-    R = mapping(mfilename,'untrained',{ksel,par});
+function [R,e,D] = protselfd(D,ksel,type)
+  if nargin < 2, ksel = []; end
+  if nargin < 3, type = []; end
+  if nargin < 1 || isempty(D)  % allow for D*protselfd([],pars)
+    R = mapping(mfilename,'untrained',{ksel,type});
     R = setname(R,'Forward Prototype Sel');
     return
   end
+  switch lower(type)
+    case {'loo','LOO','super','SUPER','',''}
+      [R,e,D,J,nlab,clab] = protselfd_super_init(D,ksel,type);
+    case {'maxdist','meandist'}
+      R = protselfd_unsuper(D,ksel,type);
+    otherwise
+      error('Unknown selection type')
+  end
+return
+function [R,e,D,J,nlab,clab] = protselfd_super_init(D,ksel,par)
+% this routine takes care of the initialisation of supervised selection
+  isdataset(D);
   [m,k,c] = getsize(D);
   if isempty(ksel), ksel = k; end
 …
                 % this will be a deep recursive call !!!
                 prwaitbar(ksel,'Forward prototype selection')
-                [R,e,D,J,nlab,clab] = protselfd(D,ksel,R,J,e,nlab,clab);
+                [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,J,e,nlab,clab);
                 prwaitbar(0);
         end
 …
   D = floor((Jopt(end)+Jopt(1))/2);
-  % done!
-else  % internal call, parameters may have another meaning!
-  R = par;  % prototypes so far
+return
+function [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,J,e,nlab,clab)
   [m,k,c] = getsize(D);
   d = +D;
 …
     de = sum(ds);
                 % if better, use it
-    if ee < emin | ((ee == emin) & (de < dmin))
+    if ee < emin || ((ee == emin) && (de < dmin))
       emin = ee;
       jmin = j;
 …
   end
-  if emin <= e(r) | 1 % we even continue if emin increases due to peaking
+  if emin <= e(r) || 1 % we even continue if emin increases due to peaking
     e(r+1) = emin;
     R = Rmin;
     if (r+1) < ksel
-                        [R,e,D,J,nlab,clab] = protselfd(D,ksel,R,Jmin,e,nlab,clab);
+                        [R,e,D,J,nlab,clab] = protselfd_super(D,ksel,R,Jmin,e,nlab,clab);
     end
   end
+return
+%PROTSELFD_UNSUPER Forward prototype selection
+%
+%               N = PROTSELFD_UNSUPER(D,P,CRIT)
+%
+% INPUT
+%   D     Square dissimilarity matrix, zeros on diagonal
+%   P     Number of prototypes to be selected
+%   CRIT  'maxdist' or 'meandist'
+%
+% OUTPUT
+%   N     Indices of selected prototypes
+%
+% DESCRIPTION
+% Sort objects given by square dissimilarity matrix D using a greedy approach
+% such that the maximum NN distance from all objects (prototypes)
+% to the first K: max(min(D(:,N(1:K)),[],2)) is minimized.
+%
+% This routine tries to sample the objects such that they are evenly
+% spaced judged from their dissimilarities. This may be used as
+% initialisation in KCENTRES. It works reasonably, but not very well.
+%
+% SEE ALSO
+% KCENTRES
+% Copyright: R.P.W. Duin, r.p.w.duin@prtools.org
+% Faculty EWI, Delft University of Technology
+% P.O. Box 5031, 2600 GA Delft, The Netherlands
+function N = protselfd_unsuper(d,p,crit)
+d = +d;
+[m,k] = size(d);
+if nargin < 3 || isempty(crit), crit = 'maxdist'; end
+if nargin < 2 || isempty(p), p = k; end
+L = 1:k;
+N = zeros(1,p);
+switch crit
+  case 'maxdist'
+    [~,n] = min(max(d));    % this is the first (central) prototype
+  case 'meandist'
+    [~,n] = min(mean(d));   % this is the first (central) prototype
 end
+e = d(:,n);                 % store here the distances to the nearest prototype (dNNP)
+f = min(d,repmat(e,1,k));   % replace distances that are larger than dNNP by dNNP
+N(1) = n;                   % ranking of selected prototypes
+L(n) = [];                  % candidate prototypes (all not yet selected objects)
+for j=2:p                   % extend prototype set
+  switch crit               % select the next prototype out of candidates in L
+    case 'maxdist'
+      [~,n] = min(max(f(:,L)));
+    case 'meandist'
+      [~,n] = min(mean(f(:,L)));
+  end
+  e = min([d(:,L(n)) e],[],2);   % update dNNP
+  f = min(d,repmat(e,1,k));      % update replacement of distances that are larger
+                                 % than dNNP by dNNP
+  N(j) = L(n);                   % update list of selected prototypes
+  L(n) = [];                     % update list of candidate prototypes
+end
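
The greedy 'maxdist' selection added in this changeset can be illustrated on toy data. The following is a minimal sketch in plain MATLAB/Octave that mirrors the loop above. It is not the PRTools routine itself, and the data vector x and the choice p = 2 are assumptions made purely for illustration.

```matlab
% Toy data (assumed): five points on a line, in two well-separated groups.
x = [0 1 2 10 11]';
d = abs(bsxfun(@minus, x, x'));   % 5x5 dissimilarity matrix, zeros on diagonal
p = 2;                            % number of prototypes to select (assumed)

k = size(d,2);
L = 1:k;                          % candidate prototypes (not yet selected)
N = zeros(1,p);                   % selected prototypes, in order of selection

[~,n] = min(max(d));              % first prototype: smallest maximum distance
e = d(:,n);                       % distance of each object to its nearest prototype
N(1) = n;
L(n) = [];
for j = 2:p
  f = min(d, repmat(e,1,k));      % nearest-prototype distances if column were added
  [~,n] = min(max(f(:,L)));       % candidate giving the smallest maximum distance
  e = min(d(:,L(n)), e);          % update nearest-prototype distances
  N(j) = L(n);
  L(n) = [];
end
disp(N)                           % indices of the selected prototypes
disp(max(e))                      % resulting maximum nearest-prototype distance
```

Replacing max by mean in the two selection lines gives the 'meandist' variant. On this toy data, the first prototype picked is the most central object of the whole set and the second one covers the remaining group.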
