Statistics

Statistics is the foundation of the Master of Analytics course. The following assignments illustrate how to examine data and fit it to relevant statistical models to draw inferences:

  1. Body Measurements - Visualize the given “chest girth” data and use a hypothesis test to assess whether they are consistent with a normal distribution.

  2. Supermarket Price Wars - Collect prices of the same products from Coles and Woolworths and use a paired t-test to investigate whether there is a general price difference between these two retail giants (which one is generally cheaper to buy from?).

  3. Bike Rental Investigation* - Can we show that people tend to use the bike-sharing program in Melbourne mainly on cooler days? Use regression to test for a negative linear relationship between temperature and rental rate.
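
As an illustration of the paired t-test mentioned in assignment 2, the following is a minimal sketch in Python using only the standard library. The function name paired_t_statistic and the prices (in cents) are hypothetical and are not taken from the assignment. The sketch returns only the test statistic and degrees of freedom; a p-value would additionally require the t-distribution (for example from a statistics package).

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom, mean_difference) for paired samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    if len(a) < 2:
        raise ValueError("need at least two pairs")
    # A paired test works on the per-item differences, not on the two groups.
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1, mean_diff


# Hypothetical prices in cents for the same five products at two stores.
store_a_cents = [250, 400, 320, 610, 180]
store_b_cents = [240, 420, 300, 600, 170]

t, df, mean_diff = paired_t_statistic(store_a_cents, store_b_cents)
print(f"mean difference = {mean_diff} cents, t = {t:.3f}, df = {df}")
```

A positive mean difference here would mean store A is dearer on average; whether that difference is statistically significant depends on comparing t against the t-distribution with df degrees of freedom.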

Data Wrangling

Transforming raw data into the desired format is a fundamental step before commencing analysis. The following assignments show our progress in manipulating data, in order of increasing difficulty:

  1. Weather Records* - Explore open data from common sources (e.g. http://www.abs.gov.au/ ; https://www.kaggle.com ), identify data types, and learn how to filter and restructure data with data frames. Weather records from Falls Creek are examined in this assignment.

  2. WHO, Species and Surveys - Perform 10 given tasks on 3 given datasets. Besides transforming the data, we also need to handle missing values and identify outliers in this assignment.

  3. Formula One Racing* - Explore open data from common sources, come up with your own research questions, and document your findings using all the transformation and data cleansing (treatment of missing values and outliers) techniques. Formula One data from Kaggle was chosen, and the investigation was based on the following questions:

    • Who are the winner(s) for all the Grand Prix races in 2017? Which constructor team(s) do these winners belong to?
    • Compare the results of the top 3 drivers in the Japanese Grand Prix in 2016 and 2017.
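
The treatment of missing values and outliers mentioned above can be sketched as follows. This is only one common approach, assumed here for illustration: missing entries (represented as None) are replaced by the median of the observed values, and outliers are flagged with the 1.5 x IQR rule. The function names and the readings are hypothetical and do not come from the assignment datasets.

```python
import statistics


def fill_missing_with_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical readings with two missing entries and one extreme value.
readings = [12.0, None, 14.0, 13.0, 15.0, 100.0, 14.0, None, 13.0]

observed = [v for v in readings if v is not None]
outliers = iqr_outliers(observed)
filled = fill_missing_with_median(readings)
print("outliers:", outliers)
print("filled:", filled)
```

Outliers are detected on the observed values only, so that imputed entries do not influence the quartiles. Whether to impute, drop, or keep such values is a judgment call that depends on the dataset.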

Time Series Analysis

When data is composed of observations collected sequentially over time, the analysis of such data is called Time Series Analysis. Throughout the course, we learned how to identify trends and randomness (stochasticity) in time series data and model them accordingly. Refer to the Time Series Analysis page for further details on individual assignments.
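
As a rough illustration of separating a trend from the random component, the sketch below smooths a series with a centered moving average and treats what remains as the irregular part. This is a simple stand-in chosen for illustration, not necessarily the modelling approach used in the course. The function names and the series are hypothetical.

```python
def centered_moving_average(series, window=3):
    """Estimate the trend as the mean of each point and its neighbours."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    half = window // 2
    return [
        sum(series[i - half:i + half + 1]) / window
        for i in range(half, len(series) - half)
    ]


def detrend(series, window=3):
    """Return the observations minus the moving-average trend."""
    half = window // 2
    trend = centered_moving_average(series, window)
    # The trend is undefined at the edges, so align with the interior points.
    interior = series[half:len(series) - half]
    return [x - t for x, t in zip(interior, trend)]


# Hypothetical observations collected at equally spaced times.
series = [3, 5, 4, 6, 8, 7, 9]

trend = centered_moving_average(series)
residuals = detrend(series)
print("trend:", trend)
print("residuals:", residuals)
```

A steadily rising trend with residuals scattered around zero would suggest an upward drift plus noise; real time series models usually go further and describe the dependence structure of the residuals as well.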

Applied Bayesian Statistics

In classical statistics, we observe the cause and infer the effect. In Bayesian statistics, it is exactly the opposite: we observe the effect and infer the cause. As we are in the era of data explosion, applying Bayesian statistics to the present data (the actual facts) to infer the causes of events is considered to be the future of statistics. Refer to the Applied Bayesian Statistics page for further details on individual assignments.
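
To make the idea of inferring a cause from observed effects concrete, here is a minimal sketch of a Bayesian update, assuming a Beta prior on an unknown success rate and binomial observations (the standard conjugate Beta-Binomial model). The function names, the uniform Beta(1, 1) prior, and the counts are hypothetical choices for illustration only.

```python
def update_beta(prior_a, prior_b, successes, trials):
    """Return the posterior Beta parameters after binomial observations."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    # Conjugacy: successes add to a, failures add to b.
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 7 successes observed in 10 trials, uniform prior.
post_a, post_b = update_beta(1, 1, successes=7, trials=10)
print(f"posterior: Beta({post_a}, {post_b}), mean = {beta_mean(post_a, post_b):.3f}")
```

The observed outcomes play the role of the effect, and the posterior distribution summarizes what they imply about the underlying rate. A different prior would give a different posterior, which is why the prior is stated explicitly.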



N.B. * indicates open-ended assignment