Physiological data of patients tested for breast cancer.
Format
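A data frame containing 683 patients (rows; the 699 original cases minus the 16 excluded for missing values) and 10 variables (columns): 9 predictors and the criterion diagnosis.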
- thickness
Clump Thickness
- cellsize.unif
Uniformity of Cell Size
- cellshape.unif
Uniformity of Cell Shape
- adhesion
Marginal Adhesion
- epithelial
Single Epithelial Cell Size
- nuclei.bare
Bare Nuclei
- chromatin
Bland Chromatin
- nucleoli
Normal Nucleoli
- mitoses
Mitoses
- diagnosis
Criterion: Absence/presence of breast cancer.
Values: FALSE vs. TRUE (65.0% vs. 35.0%).
Source
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Original)
Original creator:
Dr. William H. Wolberg (physician), University of Wisconsin Hospitals, Madison, Wisconsin, USA
Details
We made the following enhancements to the original data for improved usability:
- The ID number of the cases was excluded.
- The numeric criterion with value 2 for benign and 4 for malignant was converted to logical (i.e., TRUE/FALSE).
- 16 cases were excluded because they contained NA values.
Other than that, the data remains consistent with the original dataset.
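The three preprocessing steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the code that was actually used to build the dataset: the data frame raw, its values, and the column names id and class are hypothetical stand-ins for the layout of the original file.

```r
# Hypothetical raw data mimicking the original layout.
# The column names 'id' and 'class' and all values are assumptions
# made for this illustration.
raw <- data.frame(
  id          = c(1000025, 1002945, 1015425, 1057013, 1017122),
  thickness   = c(5, 5, 3, 8, 8),
  nuclei.bare = c(1, 10, 2, NA, 10),
  class       = c(2, 2, 2, 4, 4)
)

# 1. Exclude the case ID.
clean <- raw[, setdiff(names(raw), "id")]

# 2. Convert the numeric criterion (2 = benign, 4 = malignant) to logical.
clean$diagnosis <- clean$class == 4
clean$class <- NULL

# 3. Exclude cases that contain NA values.
clean <- clean[complete.cases(clean), ]

str(clean)
```

In this toy example the fourth case has a missing nuclei.bare value, so it is dropped in step 3 and four cases remain.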
See also
Other datasets: blood, car, contraceptive, creditapproval, fertility, forestfires, heart.cost, heart.test, heart.train, heartdisease, iris.v, mushrooms, sonar, titanic, voting, wine