Changeset 136 for prdatasets/pr_readdataset.m
- Timestamp: 12/16/19 11:48:22 (5 years ago)
- File: 1 edited
prdatasets/pr_readdataset.m
r132 r136
55 55      s = textscan(fid,forms,'Delimiter',del,'Headerlines',nhead);
56 56  end
57     if ~ischar(fid) ;
57     if ~ischar(fid)
58 58      fclose(fid);
59 59  end
…  …
70 70  return
71 71
72
73     % function [x,strvals] = pr_readdataset(fname,strtype)
74     % %    [X,STRVALS] = PR_READDATASET(FNAME)
75     % %
76     % % Read the dataset from the text file FNAME. It can process categorical
77     % % features, or features for which categories are given in text. A matrix
78     % % X is returned containing the numerical values, or integers. The
79     % % integers point to the entry in STRVALS containing for each
80     % % (categorical) feature its string members.
81     % %
82     % %    X = PR_READDATASET(FNAME,STRTYPE)
83     % %
84     % % The user can supply a vector STRTYPE that indicates for each feature
85     % % if it is numerical (0) or string/categorical (1).
86     % %
87     % %    X = PR_READDATASET(FNAME,STRTYPE,DELIMITER)
88     % %
89     % % For datasets that have a strange delimiter (not comma or space), you
90     % % have to supply it.
91     % if nargin<3
92     %    delimiter = ',';
93     % end
94     % if nargin<2
95     %    strtype = [];
96     % end
97     %
98     % % try to open the file
99     % [fid,message] = fopen(fname,'r');
100    % if fid==-1
101    %    disp(message)
102    %    error('I cannot open file %s.',fname);
103    % end
104    % % get the first line:
105    % dline = fgetl(fid);
106    % % check if the delimiter is present:
107    % I = find(dline==delimiter);
108    % if isempty(I)
109    %    delimiter = ' ';
110    %    I = find(dline==delimiter);
111    %    if isempty(I)
112    %       error('Cannot determine the delimiter');
113    %    end
114    % end
115    %
116    % % now run over all elements in the line:
117    % I = [0 I length(dline)+1];
118    % w = {};
119    % for i=1:length(I)-1
120    %    w{i} = dline((I(i)+1):(I(i+1)-1));
121    % end
122    %
123    % % remove the empty entries:
124    % I = zeros(length(w),1);
125    % for i=1:length(w)
126    %    if isempty(w{i})
127    %       I(i) = 1;
128    %    end
129    % end
130    % w(find(I)) = [];
131    % n = length(w);
132    % x = [];
133    %
134    % % see if we have strings or numbers, and put the result in the matrix:
135    % strvals = {};
136    % if isempty(strtype)
137    %    for i=1:n
138    %       num = str2double(w{i});
139    %       if isnan(num) % the feature is string
140    %          strtype(i) = 1; % remember that it is a string
141    %          strvals{i}{1} = w{i}; % put it to the collection
142    %          x(1,i) = 1;
143    %       else % feature is a number, life is simple
144    %          strtype(i) = 0;
145    %          x(1,i) = num;
146    %       end
147    %    end
148    % else
149    %    for i=1:n
150    %       strtype(i) = 1; % remember that it is a string
151    %       strvals{i}{1} = w{i}; % put it to the collection
152    %       x(1,i) = 1;
153    %    end
154    % end
155    % % now run over the other lines:
156    % nrx = 1;
157    % while 1
158    %    dline = fgetl(fid);
159    %    if ~ischar(dline), break, end %end of file...
160    %
161    %    % now process this line:
162    %    nrx = nrx+1;
163    %    % find delimiters again:
164    %    I = find(dline==delimiter);
165    %    % cut out the words:
166    %    I = [0 I length(dline)+1];
167    %    w = {};
168    %    for i=1:length(I)-1
169    %       w{i} = dline((I(i)+1):(I(i+1)-1));
170    %    end
171    %    % remove the empty entries:
172    %    I = zeros(length(w),1);
173    %    for i=1:length(w)
174    %       if isempty(w{i})
175    %          I(i) = 1;
176    %       end
177    %    end
178    %    w(find(I)) = [];
179    %    % check:
180    %    if length(w)~=n
181    %       error('I cannot find enough values on line %d.',nrx);
182    %    end
183    %    % fill the values in the matrix
184    %    for i=1:n
185    %       if strtype(i)==0 % we have a number:
186    %          tmp = str2double(w{i});
187    %          if isnan(tmp)
188    %             error('It seems that feature %d is not numeric (encountered "%s" on line %d).',i,w{i},nrx);
189    %          end
190    %          x(nrx,i) = tmp;
191    %       else
192    %          % we have to find matching strings for feature i:
193    %          I = strmatch(w{i},strvals{i});
194    %          if ~isempty(I) % it is found
195    %             x(nrx,i) = I;
196    %          else % we have to add this entry:
197    %             x(nrx,i) = length(strvals{i})+1;
198    %             strvals{i}{end+1} = w{i};
199    %          end
200    %       end
201    %    end
202    % end
203    %
204    % fclose(fid);
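
The help text in the commented-out listing says that categorical features are stored in X as integers pointing into STRVALS. The following is a minimal sketch of that encoding for a single column, not code from this changeset: the sample column `words` and the variable names `labels` and `idx` are made up for illustration, and it compares with `strcmp` (exact match) where the listing calls `strmatch`.

```matlab
% Illustrative sketch only; not part of pr_readdataset.m.
% Encode one categorical column as integer indices into a label list.
words  = {'red','blue','red','green'};  % made-up sample column
labels = {};                            % distinct strings seen so far
idx    = zeros(numel(words),1);         % integer code for each row

for k = 1:numel(words)
    % exact string comparison against the labels collected so far
    hit = find(strcmp(words{k}, labels), 1);
    if isempty(hit)
        % first time this string appears: append it and use its position
        labels{end+1} = words{k};
        hit = numel(labels);
    end
    idx(k) = hit;
end

disp(idx')   % codes for the four rows: 1 2 1 3
```

Here `labels` plays the role of one entry of STRVALS and `idx` the role of one column of X.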