Commit fca977e0 authored by bbguimaraes

Added dataset files.

parent c5955ef3
Description of the Dataset:
THIS CREDIT DATA ORIGINATES FROM QUINLAN (see below).
1. Title: Australian Credit Approval
2. Sources:
(confidential)
Submitted by quinlan@cs.su.oz.au
3. Past Usage:
See Quinlan,
* "Simplifying decision trees", Int J Man-Machine Studies 27,
Dec 1987, pp. 221-234.
* "C4.5: Programs for Machine Learning", Morgan Kaufmann, Oct 1992
4. Relevant Information:
This file concerns credit card applications. All attribute names
and values have been changed to meaningless symbols to protect
confidentiality of the data.
This dataset is interesting because there is a good mix of
attributes -- continuous, nominal with small numbers of
values, and nominal with larger numbers of values. There
are also a few missing values.
5. Number of Instances: 690
6. Number of Attributes: 14 + class attribute
7. Attribute Information: THERE ARE 6 NUMERICAL AND 8 CATEGORICAL ATTRIBUTES.
THE LABELS HAVE BEEN CHANGED FOR THE CONVENIENCE
OF THE STATISTICAL ALGORITHMS. FOR EXAMPLE,
ATTRIBUTE 4 ORIGINALLY HAD 3 LABELS p,g,gg AND
THESE HAVE BEEN CHANGED TO LABELS 1,2,3.
A1: 0,1 CATEGORICAL
a,b
A2: continuous.
A3: continuous.
A4: 1,2,3 CATEGORICAL
p,g,gg
A5: 1, 2,3,4,5, 6,7,8,9,10,11,12,13,14 CATEGORICAL
ff,d,i,k,j,aa,m,c,w, e, q, r,cc, x
A6: 1, 2,3, 4,5,6,7,8,9 CATEGORICAL
ff,dd,j,bb,v,n,o,h,z
A7: continuous.
A8: 1, 0 CATEGORICAL
t, f.
A9: 1, 0 CATEGORICAL
t, f.
A10: continuous.
A11: 1, 0 CATEGORICAL
t, f.
A12: 1, 2, 3 CATEGORICAL
s, g, p
A13: continuous.
A14: continuous.
A15: 1,2
+,- (class attribute)
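The relabelling described in item 7 can be reversed with a small lookup table. The sketch below is illustrative only: it assumes that each symbol line lists the original symbols in the same order as the numeric labels above it (the text confirms this only for attribute 4, where p,g,gg became 1,2,3). The names `LABELS` and `decode` are hypothetical, and the class attribute A15 is left out on purpose.

```python
# Hypothetical lookup tables built from the attribute list above.
# Assumption: symbols appear in the same order as the numeric labels.
LABELS = {
    "A1": {0: "a", 1: "b"},
    "A4": {1: "p", 2: "g", 3: "gg"},   # confirmed by the example in item 7
    "A8": {1: "t", 0: "f"},
    "A12": {1: "s", 2: "g", 3: "p"},
}


def decode(attribute, code):
    """Return the original symbol for a numeric label of a categorical attribute."""
    return LABELS[attribute][code]


print(decode("A4", 3))
```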
8. Missing Attribute Values:
37 cases (5%) HAD one or more missing values. The missing
values from particular attributes WERE:
A1: 12
A2: 12
A4: 6
A5: 6
A6: 9
A7: 9
A14: 13
THESE WERE REPLACED BY THE MODE OF THE ATTRIBUTE (CATEGORICAL)
OR THE MEAN OF THE ATTRIBUTE (CONTINUOUS).
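The replacement rule in item 8 (mode for categorical attributes, mean for continuous ones) can be sketched as follows. This is a minimal illustration, not the procedure actually used to prepare the file: the function name `impute` and the use of `None` to mark a missing value are assumptions.

```python
from collections import Counter
from statistics import mean


def impute(column, categorical):
    """Fill None entries with the mode (categorical) or the mean (continuous)
    of the observed values in the same column."""
    observed = [v for v in column if v is not None]
    if categorical:
        # Most frequent observed value; ties resolve to the first one seen.
        fill = Counter(observed).most_common(1)[0][0]
    else:
        fill = mean(observed)
    return [fill if v is None else v for v in column]


print(impute([1, 2, 2, None], categorical=True))
print(impute([1.0, 3.0, None], categorical=False))
```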
9. Class Distribution
+: 307 (44.5%) CLASS 2
-: 383 (55.5%) CLASS 1
10. There is no cost matrix.
[Desktop Entry]
Encoding=UTF-8
Name=UCI Machine Learning Repository: Statlog (Australian Credit Approval) Data Set
Type=Link
URL=http://archive.ics.uci.edu/ml/datasets/Statlog+(Australian+Credit+Approval)
Description of the German credit dataset.
1. Title: German Credit data
2. Source Information
Professor Dr. Hans Hofmann
Institut für Statistik und Ökonometrie
Universität Hamburg
FB Wirtschaftswissenschaften
Von-Melle-Park 5
2000 Hamburg 13
3. Number of Instances: 1000
Two datasets are provided. The original dataset, in the form provided
by Prof. Hofmann, contains categorical/symbolic attributes and
is in the file "german.data".
For algorithms that need numerical attributes, Strathclyde University
produced the file "german.data-numeric". This file has been edited
and several indicator variables added to make it suitable for
algorithms which cannot cope with categorical variables. Several
attributes that are ordered categorical (such as attribute 17) have
been coded as integer. This was the form used by StatLog.
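The two transformations mentioned here, indicator variables for unordered categories and integer codes for ordered ones, can be sketched as below. This is not a reconstruction of the Strathclyde encoding, whose exact column layout is not given in this description. The helper names are hypothetical, and the category codes are taken from attributes 15 and 17 in the attribute description.

```python
# Category codes from the attribute description (attributes 15 and 17).
HOUSING = ["A151", "A152", "A153"]       # rent, own, for free (unordered)
JOB = ["A171", "A172", "A173", "A174"]   # listed from unskilled to management


def indicator_columns(value, categories):
    """One 0/1 indicator per category; exactly one entry is 1."""
    return [1 if value == c else 0 for c in categories]


def ordinal_code(value, ordered_categories):
    """Integer code (1-based) for an ordered categorical value."""
    return ordered_categories.index(value) + 1


print(indicator_columns("A152", HOUSING))
print(ordinal_code("A173", JOB))
```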
6. Number of Attributes german: 20 (7 numerical, 13 categorical)
Number of Attributes german.numer: 24 (24 numerical)
7. Attribute description for german
Attribute 1: (qualitative)
Status of existing checking account
A11 : ... < 0 DM
A12 : 0 <= ... < 200 DM
A13 : ... >= 200 DM /
salary assignments for at least 1 year
A14 : no checking account
Attribute 2: (numerical)
Duration in months
Attribute 3: (qualitative)
Credit history
A30 : no credits taken/
all credits paid back duly
A31 : all credits at this bank paid back duly
A32 : existing credits paid back duly till now
A33 : delay in paying off in the past
A34 : critical account/
other credits existing (not at this bank)
Attribute 4: (qualitative)
Purpose
A40 : car (new)
A41 : car (used)
A42 : furniture/equipment
A43 : radio/television
A44 : domestic appliances
A45 : repairs
A46 : education
A47 : (vacation - does not exist?)
A48 : retraining
A49 : business
A410 : others
Attribute 5: (numerical)
Credit amount
Attribute 6: (qualitative)
Savings account/bonds
A61 : ... < 100 DM
A62 : 100 <= ... < 500 DM
A63 : 500 <= ... < 1000 DM
A64 : ... >= 1000 DM
A65 : unknown/ no savings account
Attribute 7: (qualitative)
Present employment since
A71 : unemployed
A72 : ... < 1 year
A73 : 1 <= ... < 4 years
A74 : 4 <= ... < 7 years
A75 : ... >= 7 years
Attribute 8: (numerical)
Installment rate in percentage of disposable income
Attribute 9: (qualitative)
Personal status and sex
A91 : male : divorced/separated
A92 : female : divorced/separated/married
A93 : male : single
A94 : male : married/widowed
A95 : female : single
Attribute 10: (qualitative)
Other debtors / guarantors
A101 : none
A102 : co-applicant
A103 : guarantor
Attribute 11: (numerical)
Present residence since
Attribute 12: (qualitative)
Property
A121 : real estate
A122 : if not A121 : building society savings agreement/
life insurance
A123 : if not A121/A122 : car or other, not in attribute 6
A124 : unknown / no property
Attribute 13: (numerical)
Age in years
Attribute 14: (qualitative)
Other installment plans
A141 : bank
A142 : stores
A143 : none
Attribute 15: (qualitative)
Housing
A151 : rent
A152 : own
A153 : for free
Attribute 16: (numerical)
Number of existing credits at this bank
Attribute 17: (qualitative)
Job
A171 : unemployed/ unskilled - non-resident
A172 : unskilled - resident
A173 : skilled employee / official
A174 : management/ self-employed/
highly qualified employee/ officer
Attribute 18: (numerical)
Number of people being liable to provide maintenance for
Attribute 19: (qualitative)
Telephone
A191 : none
A192 : yes, registered under the customer's name
Attribute 20: (qualitative)
foreign worker
A201 : yes
A202 : no
8. Cost Matrix
This dataset requires use of a cost matrix (see below)
       1    2
   -----------
1  |   0    1
   -----------
2  |   5    0
(1 = Good, 2 = Bad)
the rows represent the actual classification and the columns
the predicted classification.
It is worse to class a customer as good when they are bad (5),
than it is to class a customer as bad when they are good (1).
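As an illustration of how the cost matrix is applied, the sketch below sums the cost over a list of (actual, predicted) class pairs, with rows as the actual class and columns as the predicted class as stated above. The names `COST` and `total_cost` and the example labels are hypothetical.

```python
# Cost matrix from item 8, keyed by (actual, predicted); 1 = Good, 2 = Bad.
COST = {
    (1, 1): 0,  # good customer classed as good
    (1, 2): 1,  # good customer classed as bad
    (2, 1): 5,  # bad customer classed as good (the expensive error)
    (2, 2): 0,  # bad customer classed as bad
}


def total_cost(actual, predicted):
    """Sum the misclassification cost over paired actual/predicted labels."""
    return sum(COST[(a, p)] for a, p in zip(actual, predicted))


# One error of each kind: 1 + 5 = 6.
print(total_cost([1, 1, 2, 2], [1, 2, 1, 2]))
```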
[Desktop Entry]
Encoding=UTF-8
Name=UCI Machine Learning Repository: Statlog (German Credit Data) Data Set
Type=Link
URL=http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)