Changeset 142


Timestamp: 01/05/20 23:22:59 (5 years ago)
Author: bduin
Message: Updated collection of datasets
Location: prdatasets
Files: 44 added, 12 edited

  • prdatasets/Contents.m

    --- r141
    +++ r142
    @@ -1,40 +1,80 @@
     % PRDATASETS: Pattern Recognition Datasets in PRTools format
    -% Version 3.0 22-Dec-2019
    +% Version 3.0.1 6-Jan-2020
     %
     %Feature based labeled datasets
     %------------------------------
    -%name      objects  feats  classes
    -%x80          45      8       3    radial distances of characters
    -%arrhythmia  420    278       2    presence or absence of cardia arrhythmia
    -%auto_mpg        398      6       2    car/miles-per-gallon
    -%biomed      194      5       2    various patient indicators
    -%breast      683      9       2    Wisconsion breast cancer dataset
    -%car        1728      6       4    Car evaluation database
    -%cbands    12000     30      24    chromosome banding patterns
    -%chromo     1143      8      24    chromosome blob features
    -%diabetes    768      8       2    Pima Indians Diabetes Database
    -%ecoli       272      7       3    protein localisation sites
    -%glass       214      9       4    glass types from chemical components
    -%heart       297     13       2    heart disease dataset
    -%hepatitis   112     19       2    hepatitis database
    -%imox        192      8       4    radial distances of characters
    -%ionosphere  351     34       2    radar data
    -%iris        150      4       3    Fisher's Iris dataset
    -%liver       345      6       2    liver disorder
    -%malaysia    291      8      20    segment features in utility symbols
    -%satellite  6435     36       6    spectral data
    -%sonar       208     60       2    rock / metal sonar features
    -%soybean1    266     35      19    large Soybeans
    -%soybean2    136     35       4    small Soybeans
    -%twonorm    7400     20       2    Leo Breiman's two normal example.
    -%ringnorm   7400     20       2    Leo Breiman's ringnorm example.
    -%wine        178     13       3    wine recognition
    -%mfeat_fac  2000    216      10    face features in digits dataset
    -%mfeat_fou  2000     76      10    Fourier features in digits dataset
    -%mfeat_kar  2000     64      10    Karhunen Loeve features in digits dataset
    -%mfeat_pix  2000    240      10    pixel features in digits dataset
    -%mfeat_zer  2000     53      10    Zernike moments in digits dataset
    -%mfeat_mor  2000      6      10    morphological features in digits dataset
    -%mfeat      2000    649      10    combined features of the mfeat datasets
    +%name          objects  feats classes
    +%abalone          4177      8   28  Abalone Age Estimation
    +%adult           45222     14    2  Census Income Original
    +%annealing         898      9    5  Steel Annealing Data
    +%arcene            200  10000    2  Arcene Mass Spectra
    +%arrhythmia        452    275   13  Arrhythmia normal
    +%audiology         226     63   24  Standardized Audiology
    +%australian_sl     690     14    2  Statlog Australian Credit
    +%auto_mpg          398      6    2  Auto MPG
    +%balance_scale     625      4    5  Balance Scale
    +%balloons           76      4    2  Balloons
    +%biomed            194      5    2  Biomedical Data
    +%breast            683      9    2  Breast Wisconsin
    +%car              1728      6    4  Car Evaluation
    +%cbands          12000     30   24  Chromosome Bands
    +%census         142521     41    2  Census Income KDD
    +%chromo           1143      8   24  Chromosome Features
    +%cmc              1473      9    3  Contraceptive Method Choice
    +%connect4        67557     42    3  Connect-4 Dataset
    +%credit            690     15    2  Credit Approval Dataset
    +%cylinderbands     540     39    2  Cylinder Bands Dataset
    +%diabetes          768      8    2  Diabetes Dataset
    +%ecoli             336      7    8  Ecoli Dataset
    +%flowcyto          612    254    3  Flow Cytometry 1
    +%german_num_sl    1000     24    2  Statlog German Credit Num
    +%german_sl        1000     20    2  Statlog German Credit
    +%glass             214      9    4  Glass Identification Dataset
    +%haberman          306      3    2  Haberman''s Survival
    +%heart             297     13    2  Heart Cleveland
    +%heart_sl          270     13    2  Statlog Heart
    +%hepatitis         112     19    2  Hepatitis Data Set
    +%imox              192      8    4  IMOX Characters
    +%imsegment        2310     19    7  Image Segmentation
    +%imsegment_sl     2310     19    7  Statlog Image Segmentation
    +%ionosphere        351     34    2  Ionosphere Dataset
    +%iris              150      4    3  Iris Dataset
    +%isolet           7797    617   26  Isolet
    +%letter          20000     16   16  Letter Recognition
    +%liver             345      6    2  Liver disorder dataset
    +%magic04         19020     10    2  Magic Gamma Telescope
    +%malaysia          291      8   20  Malaysia Data
    +%mammograph        961      5    2  Mammographic Mass
    +%mfeat            2000    649   10  MFEAT Combined Features
    +%mfeat_fac        2000    216   10  MFEAT Face Features
    +%mfeat_fou        2000     76   10  MFEAT Fourier Features
    +%mfeat_kar        2000     64   10  MFEAT KL Features
    +%mfeat_mor        2000      6   10  MFEAT Morphological Features
    +%mfeat_pix        2000    240   10  MFEAT Pixel Features
    +%mfeat_zer        2000     47   10  MFEAT Zernike Moments
    +%musk1             476    166    2  Musk version 1
    +%musk2            6598    166    2  Musk version 2
    +%optdigits        5620     64   10  Optical Digit Recognition
    +%pageblocks       5473     10    5  Page Blocks
    +%pendigits       10992     16   10  Pen Based Handwritten Digits
    +%ringnorm         7400     20    2  Ringnorm Data
    +%satellite        6435     36    6  Satellite dataset
    +%satellite_sl     6435     36    6  Statlog Satellite
    +%shuttle_sl      58000      9    7  Statlog Shuttle
    +%sonar             208     60    2  Sonar dataset
    +%soybean1          266     35   19  Large soybean dataset
    +%soybean2          136     35    4  Small soybean dataset
    +%spambase         4601     57    2  Spambase
    +%spectf             80     44    2  Spectf Heart
    +%spectrometer      531    101    9  Low Resolution Spectrometer
    +%teachassist       151      5    3  Teaching Assistant Evaluation
    +%tic_tac_toe       958      9    2  Tic Tac Toe
    +%twonorm          7400     20    2  Twonorm Data
    +%waveform1        5000     21    3  Simple Waveform Data
    +%waveform2        5000     40    3  Advanced Waveform Data
    +%wine              178     13    3  Wine Recognition
    +%x80                45      8    3  80X Characters
    +%yeast            1484      8   10  Protein Localization
    +%zoo               101     16    7  Animal Recognition
     %
     %Multi-band images (pixels are objects, bands are features)
    @@ -43,12 +83,16 @@
     %emim      128*128    8      1  A seto of 5 8-band EM images
     %lena      256*256    3      1  full-color image
    -%texturel  5*128*128  7      5  texture features for 5 different texture images
    +%texturel  5*128*128  7      5  texture features of 5 different images
     %texturet  256*256    7      5  composite texture image
     %
     %Image datasets (pixels are features, images are objects)
     %--------------------------------------------------------
    -%name   images  pixels  classes
    -%kimia     216   64*64    18  resampled Kimia dataset of silhouettes
    -%mnist8  70000    8*8     10  normalized MNIST digits
    -%nist16   2000   16*16    10  normalized NIST digits
    -%nist32   5000   32*32    10  resemapled MNIST digits
    +%name     images  pixels  classes
    +%kimia       216   64*64    18  resampled Kimia dataset of silhouettes
    +%mnist     70000   28*28    10  MNIST8 Reduced Digits
    +%mnist8    70000    8*8     10  MNIST digits
    +%nist16     2000   16*16    10  normalized NIST digits
    +%nist32     5000   32*32    10  resemapled MNIST digits
    +%
    +%Most datasets are based on the <a
    +%href="http://archive.ics.uci.edu/ml/datasets/SPECTF+Heart">UCI Machine Learning Repository.</a>
  • prdatasets/biomed.m

    --- r138
    +++ r142
    @@ -32,5 +32,5 @@
       opt.desc = 'The purpose of the analysis is to develop a screening procedure to detect carriers and to describe its effectiveness. ';
       opt.link = 'http://lib.stat.cmu.edu/datasets/';
    -  opt.dsetname = 'Biomed';
    +  opt.dsetname = 'Biomedical Data';
       a = pr_download('http://prtools.tudelft.nl/prdatasets/biomed.dat',[],opt);
     end
  • prdatasets/car.m

    --- r138
    +++ r142
    @@ -25,5 +25,5 @@
       opt.desc = 'The purpose of the analysis is to develop a screening procedure to detect carriers and to describe its effectiveness. ';
       opt.link = 'http://lib.stat.cmu.edu/datasets/';
    -  opt.dsetname = 'Car dataset';
    +  opt.dsetname = 'Car Evaluation';
       a = pr_download('http://prtools.tudelft.nl/prdatasets/car.data',[],opt);
     end
  • prdatasets/diabetes.m

    --- r138
    +++ r142
    @@ -20,5 +20,5 @@
       opt.link  = 'ftp://ftp.ics.uci.edu/pub/machine-learning-databases/pima-indians-diabetes/';
       opt.desc  = 'The Pima Indians Diabetes Database from UCI.';
    -  opt.dsetname = 'Diabetes';
    +  opt.dsetname = 'Diabetes Dataset';
       a = pr_download('http://prtools.tudelft.nl/prdatasets/diabetes.dat',[],opt);
     end
  • prdatasets/ecoli.m

    --- r137
    +++ r142
    @@ -20,5 +20,5 @@
       opt.desc='The Ecoli database from UCI. Goal is to Predict the localization site of protein in a cell, by Kenta Nakai Institue of Molecular and Cellular Biology Osaka, University.';
       opt.link = 'ftp://ftp.ics.uci.edu/pub/machine-learning-databases/ecoli/';
    -  opt.dsetname = 'Ecoli';
    +  opt.dsetname = 'Ecoli Dataset';
       a = pr_download('http://prtools.tudelft.nl/prdatasets/ecoli.dat',[],opt);
     end
  • prdatasets/imox.m

    --- r137
    +++ r142
    @@ -7,4 +7,14 @@
     % measured form the corners along the diagnoals and from the edge midpoints
     % along the horizontal and vertical central axes.
    +%
    +% REFERENCES
    +% 1. R. Dubes and A.K. Jain, Clustering techniques: The user's dilemma,
    +% Pattern Recognition, Volume 8, Issue 4, October 1976, Pages 247-260.
    +% 2. A.K. Jain, R.C. Dubes, C.C. Chen, Bootstrap Techniques for Error Estimation
    +% IEEE Trans. Pattern Anal. and Mach. Intel., 9(5), pp. 628-633, 1987.
    +% 3. W.F. Schmidt, D.F. Levelt, and R.P.W. Duin, An experimental comparison
    +% of neural classifiers with traditional classifiers, in: E.S. Gelsema,
    +% L.N. Kanal (eds.), Pattern Recognition in Practice IV, Elsevier,
    +% 1994, 391-402.
     %
     % See also DATASETS, PRDATASETS, X80
    @@ -17,5 +27,5 @@
     
     a = pr_getdata;
    -a = setname(a,'IMOX Dataset');
    +a = setname(a,'IMOX Characters');
     a = setlablist(a,char('I','M','O','X'));
     a = setfeatlab(a,char(...
  • prdatasets/pr_download_uci.m

    --- r135
    +++ r142
    @@ -78,5 +78,5 @@
     
     %% if matfiles available, use them
    -[varargout{:}] = loadmatfile(comname);
    +[varargout{:}] = pr_loadmatfile(comname);
     if ~isempty(varargout{1}), return; end
     
    @@ -102,10 +102,12 @@
         dataname = comname;
       end
    -  opt{j}.dsetname = dataname;
    +%   opt{j}.dsetname = dataname;
       savemat = ~isfield(opt{j},'matfile') || opt{j}.matfile;
       opt{j}.matfile  = false;
    +  opt{j}.delimeter= ',';
    +  opt{j} = fielddef(opt{j},'dsetname',callername);
       a = pr_download(data.url,fullfile(datadir,dataname),opt{j});
       a = setuser(a,data,'user'); % store dataset info
    -  a = setname(a,dataname);    % set dataset name
    +%   a = setname(a,dataname);    % set dataset name
       if ~isfield(opt{j},'labfeat') || isempty(opt{j}.labfeat)
         a = feat2lab(a,size(a,2));
    @@ -120,8 +122,8 @@
     if numel(ucinames) > 1
       % multiple datasets loaded, alignment might be needed
    -  [varargout{:}] = dset_align(varargout{:});
    +  [varargout{:}] = pr_dset_align(varargout{:});
       a = vertcat(varargout{:});
       a = setuser(a,data,'user'); % store dataset info
    -  a = setname(a,comname);    % set dataset name
    +  opt{end} = fielddef(opt{end},'dsetname',callername);
       if ~isfield(opt{end},'matfile') || opt{end}.matfile
         save(fullfile(datadir,comname),'a');
    @@ -167,5 +169,5 @@
         dataname = prname;
       end
    -  filenames{j} = fullfile(thisdir,dataname);
    +  filenames{j} = fullfile(fullfile(thisdir,'data'),dataname);
       if exist([filenames{j} '.mat'],'file') == 2
         % if mat-file is available, use it
    @@ -174,5 +176,5 @@
         a = getfield(s,f{1});
       else
    -    if ~exist('data')
    +    if ~exist('data','var')
           % get UCI info
           data = parselink(name);
    @@ -218,5 +220,5 @@
     if anynew && numel(ucinames) > 1
       % multiple datasets loaded, alignment might be needed
    -  [varargout{:}] = dset_align(varargout{:});
    +  [varargout{:}] = pr_dset_align(varargout{:});
       for j=1:numel(ucinames)
         a = varargout{j};
    @@ -273,4 +275,8 @@
     data.type = type;
     
    +function s = fielddef(s,field,x)
    +  if ~isfield(s,field)
    +    s.(field) = x;
    +  end
     
     function name = callername(n)
  • prdatasets/pr_getdata.m

    --- r140
    +++ r142
    @@ -7,5 +7,5 @@
     % By default DSET is COMMAND.mat with COMMAND the name of the calling
     % m-file. If this is not available in the directory of COMMAND the URL will
    -% be downloaded. If ASK = true (default), the user is asked for approval.
    +% be downloaded. If ASK = true, the user is asked for approval.
     % If given, SIZE (in MByte) is displayed in the request.
     %
    @@ -49,5 +49,5 @@
       url = ['http://prtools.tudelft.nl/prdatasets/' name '.mat'];
     end
    -[dummy,uname,ext] = fileparts(url);
    +[~,uname,ext] = fileparts(url);
     
     if isempty(name)
    @@ -91,6 +91,4 @@
         out = [];
       end
    -else
    -  a = dset;
     end
     
    @@ -120,5 +118,5 @@
     
     % make sure we check for a matfile
    -[dummy,dummy,ext] = fileparts(dset);
    +[~,~,ext] = fileparts(dset);
     if isempty(ext)
       dsetmat = [dset '.mat'];
    @@ -145,5 +143,5 @@
       end
     elseif exist(dset,'dir') == 7
    -  [dummy,dfile] = fileparts(dset);
    +  [~,dfile] = fileparts(dset);
       if exist(fullfile(dset,[dfile '.mat']),'file') == 2
         out = prdatafile(dset);
  • prdatasets/pr_savematfile.m

    --- r137
    +++ r142
    @@ -18,5 +18,4 @@
       if nargout == 1
         a = vertcat(varargin{:});
    -    a = setname(a,name);
         save(matfile,'a');
         varargout{1} = a;
    @@ -27,5 +26,4 @@
       else
         a = vertcat(varargin{:});
    -    a = setname(a,name);
         save(matfile,'a');
         for i=1:nargin
  • prdatasets/pr_showdsets.m

    --- r139
    +++ r142
    @@ -1,5 +1,5 @@
     %PR_SHOWDSETS Show datasets and store results in DSET
     
    -forget = {'Contents','mfeat_all'};
    +forget = {'Contents','mfeat_all','check_','pr_'};
     commands = struct2cell(dir('*.m'));
     commands = commands(1,:);
    @@ -10,13 +10,14 @@
       commands{j} = commands{j}(1:end-2);
     end
    +J = [];
     for i=1:numel(forget)
    -  J = strcmp(forget{i},commands);
    -  commands(J) = [];
    +  J = [J strmatch(forget{i},commands)];
     end
    +commands(J) = [];
     
     for j=1:numel(commands)
       a = feval(commands{j});
       [m,k,c] = getsize(a);
    -  fprintf('%6i %4i %4i %15s %s\n',m,k,c,commands{j},getname(a));
    +  fprintf('%c%-14s %6i %6i %4i  %s\n','%',commands{j},m,k,c,getname(a));
     end
     
  • prdatasets/wine.m

    --- r137
    +++ r142
    @@ -18,4 +18,5 @@
       opt.delimeter = ',';
       opt.desc      = 'These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars.  The analysis determined the quantities of 13 constituents found in each of the three types of wines.';
    +  opt.link      = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.names';
       opt.labfeat   = 1;
       opt.featnames = char(...
    @@ -34,5 +35,5 @@
             'proline');
       opt.classnames = {'cultivar 1','cultivar 2','cultivar 3'};
    -  opt.dsetname   = 'Wine recognition data';
    +  opt.dsetname   = 'Wine Recognition';
       a = pr_download('http://prtools.tudelft.nl/prdatasets/wine.dat',[],opt);
     end
  • prdatasets/x80.m

    --- r140
    +++ r142
    @@ -15,4 +15,8 @@
     % 2. A.K. Jain, R.C. Dubes, C.C. Chen, Bootstrap Techniques for Error Estimation
     % IEEE Trans. Pattern Anal. and Mach. Intel., 9(5), pp. 628-633, 1987.
    +% 3. W.F. Schmidt, D.F. Levelt, and R.P.W. Duin, An experimental comparison
    +% of neural classifiers with traditional classifiers, in: E.S. Gelsema,
    +% L.N. Kanal (eds.), Pattern Recognition in Practice IV, Elsevier,
    +% 1994, 391-402.
     %
     % See also DATASETS, PRDATASETS, IMOX
    @@ -25,5 +29,5 @@
     
     a = pr_getdata('http://prtools.tudelft.nl/prdatasets/80x.mat');
    -a = setname(a,'80X Dataset');
    +a = setname(a,'80X Characters');
     a = setlablist(a,char('8','0','X'));
     a = setfeatlab(a,char(...