Context Navigation

source: distools/genddat.m @ 27

Last change on this file since 27 was 10, checked in by bduin, 14 years ago

File size: 3.5 KB

Line
1	%GENDDAT Generate random training and test sets for dissimilarity data
2	%
3	% [DTR,DTE,I,J] = GENDDAT(D,M,K)
4	%
5	% INPUT
6	% D NxN dissimilarity dataset
7	% M Cx1 vector of class sizes or frequencies, or a single number / frequency in(0,1)
8	% K Cx1 vector of class sizes or frequencies, or a single number / frequency in(0,1)
9	% (optional, default: K = M)
10	%
11	% OUTPUT
12	% DTR Training dissimilarity dataset
13	% DTE Test dissimilarity dataset
14	% I Indices of the training objects
15	% J Indices of the test objects
16	%
17	% DESCRIPTION
18	% Generates random training and test sets from a square dissimilarity dataset D.
19	% Feature labels and object labels of D should be equal. Note that M and K can be
20	% either scalars or vectors with as many components as classes (=C), defining
21	% specific sizes/fractions per class.
22	%
23	% Per default, all training are used as a represenation set (M=K). If M is Cx1 vector,
24	% then GENDDAT selects at random M(i) vectors out of the i-th class in D and stores them
25	% in the dataset DTR of the size [sum(M)]x[sum(M)]. The remaining objects are stored in
26	% [N-sum(M)]x[sum(M)] dissimilarity data DTE. Classes are ordered by using RENUMLAB(GETLAB(D)).
27	% If M is a scalar, then M objects in total (given by number or frequency) are selected
28	% at random according to the class priors. I and J are the indices of the training
29	% and testing objects, respectively.
30	%
31	% If K is provided, then the first K(i) training objects per i-the class are used
32	% for the representation set. Note that K(i) can be a frequency.
33	%
34	% DEFAULT
35	% K = M
36	%
37	% EXAMPLE
38	% Let D be 100 x 100 dataset with two classes [40 60] and class priors [0.4 0.6].
39	% 1) [DTR,DTE] = GENDDAT(D,0.6)
40	% DTR is 60x60 and DTE is 40x60. 60% of objects of the first class (24 in total)
41	% and 40% of the objects of the second class (36 in total) are selected for DTR.
42	%
43	% 2) [DTR,DTE] = GENDDAT(D,0.6,0.1)
44	% DTR is 60x6 and DTE is 40x6. 60% of objects of the first class (24 in total)
45	% and 40% of the objects of the second class (36 in total) are selected for training.
46	% From that, 10% of the first training objects per class are selected for the
47	% represenatation set. 10% from 24 rounds to 2 objects for the first class, while
48	% 10% of 36 rounds to 4 objects for the second class.
49	%
50	% SEE ALSO
51	% DATASETS, RENUMLAB
52	%
53
54	% Copyright: R.P.W. Duin, r.p.w.duin@prtools.org, and
55	% Elzbieta Pekalska, ela.pekalska@googlemail.com
56	% Faculty EWI, Delft University of Technology, and
57	% School of Computer Science, University of Manchester
58
59
60	function [DTR,DTE,Itr,Ite] = genddat(D,m,k);
61
62	[n,nk,c] = getsize(D);
63	nlab = getnlab(D);
64	discheck(D,[],1); % allow for similarities
65
66	if nargin < 3,
67	k = [];
68	else
69	if length(k) == 1
70	k = k*ones(1,c);
71	elseif length(k) == c
72	;
73	else
74	error('Vector length of the number of objects should equal the number of classes.')
75	end
76
77	if ~(all(k == round(k))) & ~(all(k > 0 & k < 1))
78	error('K should be given either by integers or frequencies in (0,1).')
79	end
80	end
81	if ~(all(m == round(m))) & ~(all(m > 0 & m < 1))
82	error('M should be given either by integers or frequencies in (0,1).')
83	end
84
85	[ja,jb] = gendat(dataset([1:n]',nlab),m);
86	ja = +ja;
87	jb = +jb;
88
89	J = [];
90	for j=1:c
91	K = find(nlab(ja)==j);
92	if ~isempty(k)
93	if k(j) < 1,
94	k(j) = round(k(j)*length(K));
95	end
96	if k(j) > length(K)
97	error('Requested size of the representation set is not possible.')
98	end
99	K = K(1:k(j));
100	end
101	J = [J; K(:)];
102	end
103	M = setdiff([1:length(ja)]',J);
104
105	Itr = [ja(J); ja(M)];
106	Ite = jb;
107	DTR = D(Itr,ja(J));
108	DTE = D(Ite,ja(J));
109

Note: See TracBrowser for help on using the repository browser.

Download in other formats: