Animated examples from Autodesk called the "Datasaurus Dozen".Dynamic Applet made in GeoGebra showing the data & statistics and also allowing the points to be dragged (Set 5).Department of Physics, University of Toronto.^ Andrienko, Natalia Andrienko, Gennady Fuchs, Georg Slingsby, Aidan Turkay, Cagatay Wrobel, Stefan (2020), "Visual Analytics for Investigating and Processing Data", Visual Analytics for Data Scientists, Cham: Springer International Publishing, pp. 151–180, doi: 10.1007/978-6-8_5, ISBN 978-5-1, S2CID 226648414, retrieved.Decision Sciences Journal of Innovative Education. "Generating data sets for teaching the importance of regression analysis". "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing". ^ Matejka, Justin Fitzmaurice, George (2017).Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. "Generating Data with Identical Statistics but Dissimilar Graphics: A follow up to the Anscombe dataset". ^ a b Chatterjee, Sangit Firat, Aykut (2007).The Visual Display of Quantitative Information (2nd ed.). You've even number of values, so the median might be interpreted differently. Statistical Methods: The geometric approach. Returns M -0.2640 I did this calculation in matlab. One of these, the Datasaurus Dozen, consists of points tracing out the outline of a dinosaur, plus twelve other data sets that have the same summary statistics. Since its publication, several methods to generate similar data sets with identical statistics and dissimilar graphics have been developed. It is not known how Anscombe created his datasets. The x values are the same for the first three datasets. The quartet is still often used to illustrate the importance of looking at a set of data graphically before starting to analyze according to a particular type of relationship, and the inadequacy of basic statistic properties for describing realistic datasets. Finally, the fourth graph (bottom right) shows an example when one high-leverage point is enough to produce a high correlation coefficient, even though the other data points do not indicate any relationship between the variables.The calculated regression is offset by the one outlier, which exerts enough influence to lower the correlation coefficient from 1 to 0.816. In the third graph (bottom left), the modelled relationship is linear, but should have a different regression line (a robust regression would have been called for).A more general regression and the corresponding coefficient of determination would be more appropriate. For the second graph (top right), while a relationship between the two variables is obvious, it is not linear, and the Pearson correlation coefficient is not relevant.The first scatter plot (top left) appears to be a simple linear relationship, corresponding to two correlated variables, where y could be modelled as gaussian with mean linearly dependent on x.Coefficient of determination of the linear regression: R 2
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |