# Data and Data Types

Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
- E.g., customer_ID, name, address
Types
- Categorical
  - Nominal
  - Binary
  - Ordinal
- Numerical
  - Interval-scaled or ratio-scaled
  - Discrete or continuous

Once the data is available, data preprocessing may be needed.

# Data Preprocessing

Deal with missing values
- What are the possible solutions?
  - For a numerical variable, compute an estimate of the missing value.
  - For a nominal variable, fill in the missing value.
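
The notes do not say which estimate or fill value to use. As a minimal sketch, assume the mean for a numerical variable and the most frequent value (mode) for a nominal one; both choices, the helper names, and the sample data are illustrative assumptions.

```python
from statistics import mean, mode

def impute_numeric(values):
    """Replace None with the mean of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def impute_nominal(values):
    """Replace None with the most frequent observed value (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]

# Hypothetical sample data with one missing entry per attribute.
ages = [20, None, 40, 30]
cities = ["Paris", "Rome", None, "Paris"]

filled_ages = impute_numeric(ages)
filled_cities = impute_nominal(cities)
print(filled_ages)
print(filled_cities)
```

Other strategies (median, a model-based prediction, or dropping the record) may suit a given data set better.
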
Reduce the variance of a numerical variable: binning
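
One common form of binning is smoothing by bin means: sort the values, split them into equal-size bins, and replace each value with the mean of its bin. The sketch below assumes that variant; the function name, the bin size, and the sample values are illustrative.

```python
from statistics import mean, pvariance

def smooth_by_bin_means(values, bin_size):
    """Sort the values, cut them into bins of bin_size, and replace
    each value with its bin mean. The result is in sorted order."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

# Hypothetical sample data.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)

print(smoothed)
print(pvariance(prices), pvariance(smoothed))  # variance before and after
```

The overall mean is preserved, while the within-bin variation is removed, so the variance of the smoothed values cannot exceed that of the original values.
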
Correlation analysis: between different variables
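
For two numerical variables, one widely used measure is the Pearson correlation coefficient. The notes do not name a specific measure, so treat the sketch below as one possible choice; the variable names and data are illustrative.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.
    Assumes neither list is constant (otherwise the denominator is zero)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical sample data: study hours and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
r = pearson(hours, scores)
print(round(r, 3))
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship. Nominal variables would need a different test, such as chi-square.
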
Data Normalization
- Min-max, z-score, decimal scaling
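
The three normalization methods listed above can be sketched as follows. The function names and the sample data are illustrative, and the z-score version assumes the population standard deviation.

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale linearly to [new_min, new_max]. Assumes max(values) > min(values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Centre on the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer such that
    every scaled absolute value is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]

# Hypothetical sample data.
incomes = [200, 300, 400, 600, 1000]
print(min_max(incomes))
print(z_score(incomes))
print(decimal_scaling(incomes))
```
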
Data Transformation: numerical <--> nominal, e.g., discretization (numerical to nominal)
Feature Selection and Reduction
- Wrapper, Filter Models
- PCA
Outlier Detection
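
The notes do not name an outlier detection method. One simple option for a single numerical variable is the interquartile range (IQR) rule, sketched here under that assumption; the 1.5 multiplier is the conventional default, and the data is illustrative.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample data with one suspicious reading.
readings = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(readings)
print(outliers)
```
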

Supervised learning: infer a (predictive) function from data associated with predefined targets/classes/labels
Example: group objects by predefined labels
Goal: learn a model from labelled data (with multiple features) for future predictions
Outcomes: the outcomes are known; they are the predefined labels
Evaluation: error rate, accuracy, and other metrics
Data Mining Task: Classification
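
To make the workflow concrete, here is a minimal sketch of a nearest-centroid classifier: it learns one centroid per label from labelled data, predicts the label of the closest centroid, and reports accuracy on held-out examples. The classifier choice, the function names, and the data are illustrative assumptions, not a method prescribed by these notes.

```python
from math import dist

def fit_centroids(features, labels):
    """Learn the model: the mean feature vector (centroid) of each label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: dist(centroids[y], x))

def accuracy(centroids, features, labels):
    """Fraction of examples whose predicted label matches the known label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return hits / len(labels)

# Hypothetical labelled data: two features per object.
train_X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
train_y = ["small", "small", "small", "large", "large", "large"]
test_X = [(1.0, 1.0), (4.0, 4.0), (0.9, 1.3)]
test_y = ["small", "large", "small"]

model = fit_centroids(train_X, train_y)
acc = accuracy(model, test_X, test_y)
print(acc)
```

Because the labels are known, the predictions can be checked directly, which is what makes error and accuracy meaningful here.
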

Unsupervised learning: discover or describe the underlying structure of unlabelled data
Example: group objects by multiple features
Goal: learn the structure from unlabelled data (with multiple features)
Outcomes: the outcomes are not known
Evaluation: no clear performance or evaluation methods
Data Mining Task: Clustering

Partitional Clustering: group objects so as to minimize intra-cluster distances and maximize inter-cluster distances, e.g., K-Means
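
A minimal K-Means sketch follows. It assumes a naive initialisation (the first k points become the initial centroids) and a fixed number of iterations; practical implementations usually initialise more carefully and stop on convergence. The data is illustrative.

```python
from math import dist

def k_means(points, k, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(points[:k])  # naive initialisation (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters

# Hypothetical unlabelled data: two visible groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```
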
Hierarchical Clustering: a clustering process that discovers hierarchical structure, such as a hierarchical tree
Example: categories and subcategories; taxonomies
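
One way to build such a tree is bottom-up (agglomerative) clustering: start with every object in its own cluster and repeatedly merge the two closest clusters. The sketch below assumes single linkage (cluster distance is the distance between the closest pair of members); the notes do not specify a linkage, and the data is illustrative.

```python
from math import dist

def single_linkage(points):
    """Agglomerative clustering with single linkage.
    Returns the merges in order as (cluster_a, cluster_b, distance)."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical unlabelled data: two tight pairs far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = single_linkage(points)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 3))
```

Reading the merges from last to first gives the tree from the root down: the final merge joins the two top-level groups, and earlier merges form their subgroups.
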