4. Dataset transformations

scikit-learn provides a library of transformers, which may clean (see Preprocessing data), reduce (see Unsupervised dimensionality reduction), expand (see Kernel Approximation) or generate (see Feature extraction) feature representations.
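To make the four roles concrete, here is a minimal sketch that picks one transformer per role and looks at how each changes the data. The specific classes (SimpleImputer, PCA, RBFSampler, DictVectorizer) and their parameters are illustrative choices, not the only transformers in each category, and the data are random toy values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_extraction import DictVectorizer
from sklearn.impute import SimpleImputer
from sklearn.kernel_approximation import RBFSampler

rng = np.random.RandomState(0)
X = rng.rand(10, 5)  # toy data: 10 samples, 5 features

# Clean: replace a missing value with the mean of its column
X_missing = X.copy()
X_missing[0, 0] = np.nan
X_clean = SimpleImputer(strategy="mean").fit_transform(X_missing)

# Reduce: project the 5 features onto 2 principal components
X_reduced = PCA(n_components=2).fit_transform(X)

# Expand: approximate an RBF kernel feature map with 50 random features
X_expanded = RBFSampler(n_components=50, random_state=0).fit_transform(X)

# Generate: turn dicts of raw values into numeric columns
# (one-hot columns for the string "city", a numeric column for "temp")
records = [{"city": "Paris", "temp": 20}, {"city": "Rome", "temp": 25}]
X_generated = DictVectorizer(sparse=False).fit_transform(records)

print(X_clean.shape, X_reduced.shape, X_expanded.shape, X_generated.shape)
```

Note how the reducing and expanding transformers change the number of columns, while the imputer keeps the shape and only fills in values.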

Like other estimators, these are represented by classes with a fit method, which learns model parameters (e.g. mean and standard deviation for normalization) from a training set, and a transform method, which applies this transformation model to unseen data. fit_transform may be more convenient and efficient for modelling and transforming the training data simultaneously.
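The fit / transform / fit_transform contract can be sketched with StandardScaler, chosen here as one example of a normalizing transformer; the tiny arrays are made-up toy data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[4.0, 40.0]])  # "unseen" data

scaler = StandardScaler()
scaler.fit(X_train)  # learns per-feature mean_ and scale_ (standard deviation)

# transform reuses the parameters learned on X_train; it does not refit
X_new_scaled = scaler.transform(X_new)

# fit_transform fits and transforms the training data in one call
X_train_scaled = StandardScaler().fit_transform(X_train)

print(scaler.mean_, scaler.scale_)
print(X_new_scaled)
```

The unseen sample is scaled with the training mean and standard deviation, which is why fit should only ever see the training set.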

Combining such transformers, either in parallel or in series, is covered in Pipeline and FeatureUnion: combining estimators. Pairwise metrics, Affinities and Kernels covers transforming feature spaces into affinity matrices, while Transforming the prediction target (y) considers transformations of the target space (e.g. categorical labels) for use in scikit-learn.
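A short sketch of the three ideas referenced above: chaining transformers in series with Pipeline, running them in parallel with FeatureUnion, building an affinity matrix with rbf_kernel, and encoding categorical labels with LabelEncoder. The step names ("scale", "pca") and the toy data are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(20, 4)  # toy data: 20 samples, 4 features

# Series: the output of each step is the input of the next
serial = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=2))])
X_serial = serial.fit_transform(X)  # 2 columns from PCA

# Parallel: each transformer sees X; outputs are concatenated column-wise
parallel = FeatureUnion([("scale", StandardScaler()), ("pca", PCA(n_components=2))])
X_parallel = parallel.fit_transform(X)  # 4 scaled columns + 2 PCA columns

# Affinity matrix: pairwise RBF similarities between samples
K = rbf_kernel(X)  # shape (n_samples, n_samples), ones on the diagonal

# Target transformation: map class labels to integers (classes_ are sorted)
le = LabelEncoder()
y = le.fit_transform(["cat", "dog", "cat", "bird"])

print(X_serial.shape, X_parallel.shape, K.shape, le.classes_, y)
```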