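Introduction

Why tsfresh?

tsfresh is used for systematic feature engineering from time series and other sequential data [1]. What these data have in common is that they are ordered by an independent variable. The most common independent variable is time (time series). Other examples of sequential data are reflectance and absorption spectra, whose ordering dimension is the wavelength. For simplicity, we refer to all of these types of sequential data simply as time series.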

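Consider, for example, a time series of temperature measurements (and yes, it is pretty cold!). Now you want to calculate various characteristics, such as the maximum or minimum temperature, the average temperature, or the number of temporary temperature peaks.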

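Without tsfresh, you would have to calculate all of these characteristics by hand; tsfresh automates this process, calculating and returning all of these features for you.

In addition, tsfresh is compatible with the Python libraries pandas and scikit-learn, so you can easily integrate feature extraction into your existing routines.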
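To make the manual alternative concrete, here is a minimal sketch in plain Python that computes the characteristics named above for two toy series given as (id, timestamp, value) records. The records, the helper names `count_peaks` and `hand_made_features`, and the peak definition (a value strictly larger than both of its direct neighbours) are illustrative assumptions, not part of tsfresh. For comparison, tsfresh's own `number_peaks` calculator takes a support parameter `n`; with `n=1` it should count peaks the same way.

```python
from statistics import mean

# Hypothetical long-format records: (series id, timestamp, temperature).
# They are deliberately out of order; the timestamp is only used for sorting.
records = [
    ("A", 3, -2.0), ("A", 1, -5.0), ("A", 2, -1.0), ("A", 4, -4.0), ("A", 5, -3.0),
    ("B", 2, 1.0), ("B", 1, 0.0), ("B", 3, 0.5), ("B", 4, 2.0),
]


def count_peaks(values):
    """Count strict local maxima: values larger than both direct neighbours."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    )


def hand_made_features(records):
    """Compute a few hand-picked characteristics per series id."""
    # Group the observations by series id.
    series = {}
    for sid, t, v in records:
        series.setdefault(sid, []).append((t, v))

    features = {}
    for sid, obs in series.items():
        # Sort by timestamp, then keep only the values: only their order matters.
        values = [v for _, v in sorted(obs)]
        features[sid] = {
            "maximum": max(values),
            "minimum": min(values),
            "mean": mean(values),
            "number_peaks": count_peaks(values),
        }
    return features


for sid, feats in hand_made_features(records).items():
    print(sid, feats)
# A {'maximum': -1.0, 'minimum': -5.0, 'mean': -3.0, 'number_peaks': 1}
# B {'maximum': 2.0, 'minimum': 0.0, 'mean': 0.875, 'number_peaks': 1}
```

Every further characteristic means another hand-written function to implement and test, which is exactly the work tsfresh takes over.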

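What can we do with these features?

The extracted features can be used to describe the time series; they often provide new insights into the time series and its dynamics. They can also be used to cluster time series and to train machine learning models that perform classification or regression tasks on time series.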

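The tsfresh package has been used successfully in the following projects:

prediction of the quality of steel billets during a continuous casting process [2],

activity recognition from synchronized sensors [3],

volcanic eruption forecasting [4],

authorship attribution of written text samples [5],

characterization of extrasolar planetary systems from time series with missing data [6],

sensor anomaly detection [7],

as well as many other projects.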

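What can't tsfresh do?

Currently, tsfresh is not suitable for

streaming data (streaming data is usually processed online, whereas time series data is usually processed offline)

training models on the extracted features (we do not want to reinvent the wheel; to train machine learning models, have a look at the Python package scikit-learn)

highly irregular time series; tsfresh uses timestamps only to order the observations. Many features are interval-agnostic (e.g., the number of peaks) and can be determined for any series, but some other features (e.g., the linear trend) assume equal spacing in time and should be used with care when this assumption is not met.

However, some of these use cases could be implemented. If you have an application in mind, open an issue at https://github.com/blue-yonder/tsfresh/issues, or feel free to contact us.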
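The caveat about equal spacing for irregular time series can be illustrated with a small plain-Python sketch. It fits an ordinary least-squares line to an irregularly sampled series twice: once against the position of each value (which is all that remains when timestamps are used only for ordering) and once against the actual timestamps. The data and the helper `least_squares_slope` are hypothetical, and the sketch does not call tsfresh.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical irregular sampling: an eight-hour gap between the 3rd and 4th reading.
hours = [0.0, 1.0, 2.0, 10.0, 11.0]
# The underlying signal rises by exactly 0.5 degrees per hour.
temperature = [0.5 * h for h in hours]

# Slope against the sample position 0, 1, 2, ... (timestamps used only for ordering).
slope_per_step = least_squares_slope(list(range(len(temperature))), temperature)
# Slope against the actual timestamps.
slope_per_hour = least_squares_slope(hours, temperature)

print(f"per step: {slope_per_step:.2f}, per hour: {slope_per_hour:.2f}")
# prints: per step: 1.55, per hour: 0.50
```

Both slopes describe the same readings, but only the per-hour value reflects the true rate of 0.5 degrees per hour; the per-step value is inflated by the eight-hour gap. A count of peaks, in contrast, depends only on the order of the values and is not affected by the gaps.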

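Are there similar tools?

There is a MATLAB package called hctsa that can be used to automatically extract features from time series. hctsa can also be used from Python via the pyopy package. Other available packages include featuretools, FATS, and cesium.

References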

[1]
Christ, M., Braun, N., Neuffer, J. and Kempa-Liehr, A.W. (2018). Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh – A Python package). Neurocomputing 307 (2018), 72–77. doi: 10.1016/j.neucom.2018.03.067.

[2]
Christ, M., Kempa-Liehr, A.W. and Feindt, M. (2016). Distributed and parallel time series feature extraction for industrial big data applications. Asian Conference on Machine Learning (ACML), Workshop on Learning on Big Data (WLBD). https://arxiv.org/abs/1610.07717v1.

[3]
Kempa-Liehr, A.W., Oram, J., Wong, A., Finch, M. and Besier, T. (2020). Feature engineering workflow for activity recognition from synchronized inertial measurement units. In: Pattern Recognition. ACPR 2019. Ed. by M. Cree et al. Vol. 1180. Communications in Computer and Information Science (CCIS). Singapore: Springer 2020, 223–231. doi: 10.1007/978-981-15-3651-9_20.

[4]
Dempsey, D.E., Cronin, S.J., Mei, S. and Kempa-Liehr, A.W. (2020). Automatic precursor recognition and real-time forecasting of sudden explosive volcanic eruptions at Whakaari, New Zealand. Nature Communications 11.3562, pp. 1–8. doi: 10.1038/s41467-020-17375-2.

[5]
Tang, Y., Blincoe, K. and Kempa-Liehr, A.W. (2020). Enriching Feature Engineering for Short Text Samples by Language Time Series Analysis. EPJ Data Science 9.26 (2020), 1–59. doi: 10.1140/epjds/s13688-020-00244-9.

[6]
Kennedy, A., Gemma, N., Rattenbury, N. and Kempa-Liehr, A.W. (2021). Modelling the projected separation of microlensing events using systematic time-series feature engineering. Astronomy and Computing 35.100460 (2021), 1–14. doi: 10.1016/j.ascom.2021.100460.

[7]
Teh, H.Y., Wang, K.I.-K. and Kempa-Liehr, A.W. (2021). Expect the Unexpected: Unsupervised feature selection for automated sensor anomaly detection. IEEE Sensors Journal 21.16, pp. 18033–18046. doi: 10.1109/JSEN.2021.3084970.