Foreword

Most undergraduate studies in science, engineering, psychology, business, social sciences and agriculture, to mention a few, require a unit in statistics. This book was developed over some thirty years of teaching a first-level statistics unit at the University of Western Australia. Over the years the unit material has been refined and expanded. In particular, a complete set of problems and solutions has been developed.

The software used in this book is R, and all the examples and problem sets use it. However, the instructor and students may use any software of their choice.

The purpose of this book is to enable students to:

  • understand the concepts and methods used in the analysis of data;
  • apply these methods to a given data set;
  • read, understand and critique documents containing statistical analysis;
  • interpret and report the results of statistical analysis; and
  • interact and collaborate with statisticians and data analysts.

The book begins with data collection and data exploration in Part I, followed by probability and random variables in Part II. Part III is on modelling and inference, and contains the standard statistical techniques, ending with multiple regression. Instructors may choose to omit any topics that are not in their syllabus. The book is organised in such a way that mathematical details that are not of interest may also be omitted.

The title of the book, Statistics: Meaning from data, is deliberate, since most of statistics is focused on estimating mean effects. Almost all the methods covered in this book deal with estimating the mean and how this mean may depend on other variables. For example, we may be interested in the mean difference in salaries of male and female employees. In the Bank data set discussed in this book, we model the salaries using a regression model, which describes how the mean salary depends on the variables in the data, such as age, experience, education and sex. In our discussion of the Bank data we determine whether there is a difference in mean salaries between males and females. The cover also encompasses the idea that meaning in data is hidden and needs to be extracted and interpreted using appropriate statistical models.
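
As a rough illustration of what "the mean depends on other variables" can look like, a regression model for the Bank salaries might be written as below. This is only a sketch: the coefficient symbols and the choice to enter each variable as a single linear term are assumptions made here for illustration, not the model fitted later in the book.

```latex
\[
E(\text{Salary} \mid \text{Age}, \text{Experience}, \text{Education}, \text{Sex})
  = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Experience}
  + \beta_3\,\text{Education} + \beta_4\,\text{Sex}
\]
```

If Sex is coded as 0 or 1 (again an assumption), then the coefficient on Sex is the difference in mean salary between the two groups for employees of the same age, experience and education.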

I would like to thank all my students over so many years who were part of my experiments in teaching statistics. My experiences have led to several papers and conference presentations in mathematics and statistics education. Indeed, the business statistics unit has been a rich source of data and inspiration for me and a focus for teaching and learning research and experiments.

The book assumes basic algebra to the level of manipulating symbols, and enough numerical competence to evaluate numerical expressions. Some prior exposure to basic ideas of probability and data exploration is useful but not necessary.

I would also like to thank Marty Firth and Berwin Turlach for several editorial corrections to the original lecture notes on which this book is based. I also thank the CAUL group for encouraging me to write this book. Finally, after several years of my intending to turn this material into a book, the vision is turning into reality. I also thank the staff at the UWA library, without whose support, encouragement and technical assistance this book would not have materialised. To Stephanie Davenport, Amanda Dion and Chloe Czerwiec, thank you!

All the data sets used in the book are available from the book website. The book uses R for all data exploration and analysis. We recommend that students install the latest versions of R and RStudio. We encourage R scripting as a means of recording your work, so that you can use it to further your learning. Indeed, practitioners of R commonly re-use existing R code for their own work.

R is usually a mystery to new users and can seem unfathomable. However, with practice one becomes familiar with the main aspects of R, and learning R is a continuous process. The student should get used to searching online for appropriate R functions and packages. Since R is free and open-source software, it keeps growing, and R users grow with it.

Below are some good resources for R. In addition to these, many very good videos exist on YouTube. Many other sources exist and will be added over time, and I encourage you to do your own searches. Note that R is not just statistical analysis software; it is more properly called a statistical environment. R is also a programming environment. If you use R in any work or publication, we encourage you to reference it properly, as we have below.

 

R for Beginners

R Tutorial: From Beginner to Expert in R Programming

Learning statistics with R: A tutorial for psychology students and other beginners

R Tutorial

Reference:

R Core Team (2023). _R: A Language and Environment for Statistical Computing_. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
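
A reference of this kind can be obtained from within R itself. The following is a minimal sketch using the base function citation(); the object name r_cite is chosen here purely for illustration, and the year and wording printed depend on the installed version of R, so the output may differ from the entry above.

```r
# Retrieve the recommended citation for the installed version of R.
r_cite <- citation()

# Print it as plain text, suitable for a reference list.
print(r_cite, style = "text")

# Produce a BibTeX entry, for use in LaTeX documents.
toBibtex(r_cite)
```

Passing a package name to citation() gives the recommended reference for that package instead.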

R. Nazim Khan

30 April 2024

nazim.khan@uwa.edu.au

Licence

Statistics: Meaning from data Copyright © 2024 by Dr Nazim Khan is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.
