Learning from the Shape of Data

Sarita Rosenstock

Philosophy of Science 2020 conference proceedings volume (29 July 2020)

Data scientists take large quantities of noisy measurements and transform them into tractable, qualitative descriptions of the phenomena being measured. While this frequently involves statistical methods, the burgeoning field of data science distinguishes itself from statistics by branching out to a wider range of methods from mathematics and computer science. One such distinctly non-statistical method of growing popularity is topological data analysis (TDA).

Topology is the study of the properties of shapes that are invariant under continuous deformations, such as stretching, twisting, bending, or re-scaling, but not tearing or gluing. TDA aims to identify the essential "structure" of a data set as it "appears" in an abstract space of measurement outcomes. This paper attempts to reconstruct the reasoning data scientists give for why and how the resulting analysis should be understood as reflecting significant features of the systems that generated the data.
