SAX Discretization Does Not Guarantee Equiprobable Symbols
- Submitting institution
-
University of York
- Unit of assessment
- 11 - Computer Science and Informatics
- Output identifier
- 54879740
- Type
- D - Journal article
- DOI
-
10.1109/TKDE.2014.2382882
- Title of journal
- IEEE Transactions on Knowledge and Data Engineering
- Article number
- -
- First page
- 1162
- Volume
- 27
- Issue
- 4
- ISSN
- 1041-4347
- Open access status
- Out of scope for open access requirements
- Month of publication
- December
- Year of publication
- 2014
- URL
-
-
- Supplementary information
-
-
- Request cross-referral to
- -
- Output has been delayed by COVID-19
- No
- COVID-19 affected output statement
- -
- Forensic science
- No
- Criminology
- No
- Interdisciplinary
- No
- Number of additional authors
-
1
- Research group(s)
-
-
- Citation count
- 7
- Proposed double-weighted
- No
- Reserve for an output with double weighting
- No
- Additional information
- This article exposes a serious flaw in SAX, a highly cited (1378+ citations, https://tinyurl.com/y57j8b6x) discretisation algorithm used by NASA, among others, to compress large data streams with minimal loss of information. The original SAX article claims optimality with respect to information loss; we demonstrate that this claim never holds when one of the algorithm's steps, PAA (Piecewise Aggregate Approximation), is applied. We also describe a way to estimate the negative impact of this flaw from properties of the dataset: the incurred loss of information correlates very strongly with the autocorrelation of the time series.
- Author contribution statement
- -
- Non-English
- No
- English abstract
- -
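
The flaw summarised under "Additional information" can be illustrated with a minimal sketch. It is not the article's code or experimental setup. It assumes the usual SAX pipeline (z-normalise, apply PAA, cut at standard-normal quantiles), and the AR(1) test series, the segment length of 8, the alphabet size of 4, and all helper names are hypothetical choices made only for illustration. Because averaging within a PAA frame shrinks the variance, breakpoints chosen for a unit-variance Gaussian no longer split the frames into equally likely symbols, and in this toy setup the shrinkage is much larger for the uncorrelated series than for the strongly autocorrelated one.

```python
import math
import random
from bisect import bisect_right
from collections import Counter
from statistics import NormalDist, fmean, pstdev


def znorm(xs):
    """Rescale a series to zero mean and unit standard deviation."""
    mu, sd = fmean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]


def paa(xs, seg):
    """Piecewise Aggregate Approximation: mean of each non-overlapping frame."""
    return [fmean(xs[i:i + seg]) for i in range(0, len(xs) - seg + 1, seg)]


def sax_symbol_freqs(xs, seg, alphabet=4):
    """Relative frequency of each SAX symbol after z-normalisation and PAA."""
    # Breakpoints that would be equiprobable for a standard normal variable.
    cuts = [NormalDist().inv_cdf(k / alphabet) for k in range(1, alphabet)]
    frames = paa(znorm(xs), seg)
    counts = Counter(bisect_right(cuts, v) for v in frames)
    return [counts[s] / len(frames) for s in range(alphabet)]


def entropy(freqs):
    """Shannon entropy in bits; log2(alphabet) only if symbols are equiprobable."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def ar1(n, phi, rng):
    """Hypothetical test data: AR(1) series with lag-1 autocorrelation phi."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(0)
white = ar1(40000, 0.0, rng)    # no autocorrelation
smooth = ar1(40000, 0.95, rng)  # strong autocorrelation

freqs_white = sax_symbol_freqs(white, seg=8)
freqs_smooth = sax_symbol_freqs(smooth, seg=8)

print("white :", [round(p, 3) for p in freqs_white], round(entropy(freqs_white), 3))
print("smooth:", [round(p, 3) for p in freqs_smooth], round(entropy(freqs_smooth), 3))
```

With four symbols, equiprobable output would carry 2 bits per symbol. The printed entropies show how far each series falls short of that once PAA has been applied.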