ARIMA

ARIMA is a widely used approach for predicting future values of a time series from its past. The model has three parts that work together to capture the patterns in the data: AutoRegressive, Integrated, and Moving Average. The AutoRegressive (AR) part says that the current value depends on the values that came before it; in AR(p), each observation is modeled from the p observations preceding it, somewhat like a chain reaction. The “I” component stands for Integrated and means the data has been adjusted to make it more predictable: each observation is replaced by the difference between it and the previous observation, and the notation I(d) records the degree of differencing, where d is the number of times this adjustment is applied. The Moving Average (MA) component relates the current observation to the forecast errors made in the past, that is, the differences between what the model expected and what actually happened. The notation MA(q) just means we are looking at the q most recent of these errors.
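To make the three components concrete, here is a minimal sketch, assuming a synthetic random-walk series, of how the AR, I, and MA orders map onto the order=(p, d, q) argument of the ARIMA class in statsmodels; the series and the order (1, 1, 1) are illustrative assumptions only, not values tied to any real dataset.

```python
# Minimal sketch: mapping the AR, I, and MA components onto statsmodels.
# The synthetic series and the order (1, 1, 1) are assumptions for illustration.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# A random walk with drift: non-stationary, so one round of differencing helps.
series = pd.Series(np.cumsum(rng.normal(loc=0.1, scale=1.0, size=200)))

# order=(p, d, q): p past values (AR), d rounds of differencing (I),
# q past forecast errors (MA).
fitted = ARIMA(series, order=(1, 1, 1)).fit()

print(fitted.summary())
print(fitted.forecast(steps=5))  # predictions for the next five time steps
```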

Building an ARIMA model follows a few steps that help us understand the data and predict what might happen next (a short code sketch of this workflow follows below):

  1. Stationarity check: verify that the statistical properties of the series, such as its mean and variance, do not change too much over time. If they do, transform the data, typically by differencing, until it is stationary; the number of differences needed becomes the order d.
  2. Order selection: choose the values of p, d, and q by studying patterns in the data with tools such as the autocorrelation function (ACF) and partial autocorrelation function (PACF). This helps us find the right model for the data.
  3. Fitting and diagnostics: fit the ARIMA model with the chosen orders and check that it is working well, meaning it accurately reproduces the patterns it learned from the data.
  4. Forecasting: use the fitted model to generate predictions for future values.

ARIMA models are good at capturing patterns in many kinds of data, such as financial, weather, and other time-dependent series, and they give us a way to understand how things change over time.
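As a rough sketch of this workflow, the snippet below assumes a pandas Series named series and walks through the same steps: an augmented Dickey-Fuller stationarity check, ACF/PACF plots to guide the choice of p and q, fitting, and forecasting. The 0.05 threshold and the default order (1, 1, 1) are placeholder choices for illustration.

```python
# Hedged sketch of the ARIMA workflow; `series` is an assumed pandas Series.
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

def arima_workflow(series: pd.Series, order=(1, 1, 1), steps=10):
    # 1. Stationarity check with the augmented Dickey-Fuller test.
    p_value = adfuller(series.dropna())[1]
    status = "looks stationary" if p_value < 0.05 else "likely needs differencing"
    print(f"ADF p-value: {p_value:.4f} ({status})")

    # 2. Inspect ACF/PACF plots to guide the choice of p and q.
    plot_acf(series.dropna(), lags=20)
    plot_pacf(series.dropna(), lags=20)
    plt.show()

    # 3. Fit the model with the chosen orders and review the diagnostics.
    fitted = ARIMA(series, order=order).fit()
    print(fitted.summary())

    # 4. Forecast future values from the fitted model.
    return fitted.forecast(steps=steps)
```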

Moving Average Model

I read up on the Moving Average model, also known as MA(q), which predicts what will happen next from the errors made in past predictions. It is helpful when we are studying data that changes over time, because it captures the ups and downs that happen in the short term. Some examples of how we use it are in predicting the weather or figuring out how many people will visit a park on a certain day.
Interpreting a moving average (MA) chart involves analyzing the patterns it reveals about the underlying time series. Here is a short guide on how to read it (a code sketch for producing such a plot follows below):

  • The plot shows, at each time step, the average of a fixed window of recent values. Because short-term fluctuations are averaged out, the line highlights the smoothed trend of the data over time and makes long-run behavior easier to see.
  • Comparing the smoothed line with the original data shows how far apart the two are at each point, which helps separate the big picture from small, temporary changes.
  • Peaks and dips that recur at regular intervals suggest seasonality, such as a pattern that repeats every year, and can also help flag values that are different or unusual.
  • Sudden jumps or drops in the smoothed line can point to outliers or unusual events in the data; those points deserve a closer look against the raw series.
  • Changes can also be examined at different lags: the first difference compares neighboring observations, the second difference compares observations two steps apart, and so on.
  • The plot is useful for checking a model. If the smoothed series still shows large, unexpected swings, the model may have missed important relationships between time points, and its assumptions should be revisited.
  • When comparing models or smoothing choices, plot each candidate and see which one reveals the most helpful patterns. Different people may read the same plot differently depending on what they are looking for and what the data is like, so reviewing the plots regularly helps improve the method.
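Here is a minimal sketch, assuming a pandas Series called series indexed by time, of how a rolling (moving) average can be drawn next to the raw data so the smoothed trend can be read off; the 7-step window is an arbitrary choice for illustration.

```python
# Minimal sketch: rolling average plotted against the raw data.
# `series` and the 7-step window are assumptions for illustration.
import matplotlib.pyplot as plt
import pandas as pd

def plot_moving_average(series: pd.Series, window: int = 7):
    rolling_mean = series.rolling(window=window).mean()

    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(series.index, series, alpha=0.4, label="original data")
    ax.plot(rolling_mean.index, rolling_mean, label=f"{window}-step moving average")
    ax.set_title("Moving average vs. original series")
    ax.legend()
    plt.show()
    return rolling_mean
```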

Forecasting Principles

I read a copy of Rob J. Hyndman and George Athanasopoulos’ “Forecasting: Principles and Practice,” which is an extensive and understandable introduction to the fundamentals of forecasting. The book, which is free to read online, covers a variety of forecasting approaches and techniques with an emphasis on real-world applications. It offers detailed instructions for putting forecasting models into practice and is especially meant for readers who have a working knowledge of basic statistics.

The authors present basic ideas such as time series analysis and decomposition, along with forecasting techniques like exponential smoothing and ARIMA models. The significance of understanding the data and selecting suitable models according to the properties of the time series is emphasized throughout the book. The text is filled with real-world examples and case studies that help readers put the authors’ ideas into practice.

Reading Douglas Hamilton’s take on Time Series

I read this interesting book called Time Series Analysis by James Douglas Hamilton, which is a comprehensive textbook that provides an in-depth look at the principles and techniques of analyzing time series data.
This book covers a variety of topics, including basic concepts of time series, stochastic processes, and the mathematical tools needed for analysis. It explores univariate and multivariate time series models, discussing issues such as stationarity, autoregressive and moving average models, and forecasting. Hamilton also introduces readers to advanced ideas like cointegration, state-space models, and the analysis of economic time series. The book is well suited to students, researchers, and practitioners in economics, finance, and related fields, and Hamilton uses examples and real-life applications to help readers understand the more complicated material.
In summary, James Douglas Hamilton’s Time Series Analysis is a valuable resource for those who wish to gain an in-depth understanding of time series data and its applications in econometrics and related fields.
Hopefully I can apply these principles to their full capacity.

Exploring Time Series

A collection of data points gathered or recorded over time, with each data point linked to a unique timestamp, is called a time series. Since the data points are usually arranged chronologically, trends, patterns, and behaviors throughout time can be examined. Many disciplines, including signal processing, finance, economics, and environmental research, frequently employ time series data.

Important features of time series data consist of:

Temporal Order: Each data point represents a measurement or observation at a specific time, and the points are arranged in chronological order.

Trends: Time series data commonly show long-term movements or patterns that can point to underlying changes or developments.

Seasonality: Some time series data displays repeating patterns or cycles, known as seasonality, which may be influenced by regular, periodic factors like seasons, months, or days of the week.

Irregularity and Noise: Time series data can also contain irregularities and random fluctuations, referred to as noise, making it important to distinguish between true patterns and random variations.

Analyzing time series data involves various techniques, such as statistical methods, machine learning models, and forecasting approaches, to uncover insights, make predictions, or understand underlying dynamics.
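As a small illustration of these features, the sketch below builds a synthetic monthly series with a trend, a yearly seasonal cycle, and random noise, then splits it back into those pieces with seasonal_decompose from statsmodels; the data and the period of 12 are assumptions made purely for the example.

```python
# Hedged sketch: decomposing a synthetic series into trend, seasonality, and noise.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

rng = np.random.default_rng(1)
index = pd.date_range("2015-01-01", periods=96, freq="MS")  # monthly timestamps
trend = np.linspace(50, 150, 96)                            # long-term upward movement
seasonal = 10 * np.sin(2 * np.pi * np.arange(96) / 12)      # repeating yearly cycle
noise = rng.normal(scale=3, size=96)                        # irregular fluctuations
series = pd.Series(trend + seasonal + noise, index=index)

# Split the observed series back into trend, seasonal, and residual components.
result = seasonal_decompose(series, model="additive", period=12)
result.plot()
plt.show()
```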

Report – Beyond the Headlines: Deep Dive into Police Shootings Data and Patterns

This thorough report, which provides a thoughtful and nuanced analysis of police shooting incidents, is something I’m proud to present. Through careful data analysis, statistical testing, and machine learning approaches, we hope to shed light on trends, demographic differences, and predictive insights. This report is a committed attempt to disentangle the complexities surrounding police shootings and to provide a better understanding of this important topic.

Report_2_MTH522 (1)

Week 9 – November 6 (Combination Analysis)

A combination analysis of two variables, “armed” and “fleeing,” involves examining how these two categorical variables are related or combined within a dataset. It aims to explore the patterns and associations between the different categories of these variables and may be useful for understanding various aspects of police incidents, such as the behavior of individuals involved.

This is what I am going to do to perform combination analysis for the “armed” and “fleeing” variables (a short code sketch follows the steps below):

  1. Data Preparation:
    • First, ensure that the data is appropriately cleaned and organized, and that the “armed” and “fleeing” variables are categorical in nature, meaning they represent distinct categories or labels.
  2. Cross-Tabulation (Contingency Table):
    • Create a cross-tabulation (also known as a contingency table) that shows the counts or frequencies of each combination of categories. The rows represent the categories of the “armed” variable, and the columns represent the categories of the “fleeing” variable.
    • The cells of the table display the count of occurrences for each combination, showing how many incidents fall into each category.
  3. Chi-Square Test of Independence:
    • To assess the statistical significance of the association between the “armed” and “fleeing” variables, you can perform a chi-square test of independence. This test evaluates whether there is a significant relationship between the two variables or if their distributions are independent.
    • The null hypothesis of the chi-square test is that the two variables are independent, and the alternative hypothesis is that they are dependent.
    • If the p-value obtained from the test is below a chosen significance level (e.g., 0.05), you may conclude that there is a significant association between the two variables.
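A minimal sketch of these steps, assuming the cleaned data sits in a pandas DataFrame df with “armed” and “fleeing” columns and using 0.05 as the significance level:

```python
# Minimal sketch: cross-tabulation plus chi-square test of independence.
# The DataFrame `df` and the 0.05 alpha are assumptions for illustration.
import pandas as pd
from scipy.stats import chi2_contingency

def combination_analysis(df: pd.DataFrame, alpha: float = 0.05):
    # Contingency table: rows = "armed" categories, columns = "fleeing" categories.
    table = pd.crosstab(df["armed"], df["fleeing"])
    print(table)

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p-value = {p_value:.4f}")
    if p_value < alpha:
        print("Reject the null: 'armed' and 'fleeing' appear to be associated.")
    else:
        print("Fail to reject the null: no evidence of an association.")
    return table, p_value
```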

Week 9 – November 3

Heatmaps are a handy tool for visualizing large, complicated datasets and uncovering patterns, trends, and connections between data points. Here’s a simple explanation of heatmaps, with a short code sketch after the list:

  1. Color Codes: In a heatmap, each square in a grid gets a color based on its value. Darker colors mean higher values, while lighter ones represent lower values. Typically, a color scale goes from cool colors like blue for low values to warm colors like red for high values.
  2. Applications: Heatmaps are versatile and can be used in various fields like data analysis, biology, finance, and more. They’re used for things like:
    • Displaying relationships between data points, like which variables are related.
    • Showing temperature changes, often on weather maps or in engineering.
    • Highlighting areas of high or low risk in a dataset.
  3. Interactive Options: Some heatmaps allow users to interact with them. This means you can hover over squares to see exact values, zoom in, or filter the data.
  4. Sorting with Hierarchy: Heatmaps can be made even more informative by arranging rows and columns based on data similarities using a method called hierarchical clustering. This helps reveal hidden patterns in the data.
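As a quick sketch of these ideas, the snippet below draws a plain correlation heatmap and a hierarchically clustered version with seaborn; the numeric DataFrame df is an assumption standing in for whatever dataset is being explored.

```python
# Hedged sketch: correlation heatmap and a hierarchically clustered heatmap.
# `df` is an assumed DataFrame with some numeric columns.
import matplotlib.pyplot as plt
import seaborn as sns

def plot_heatmaps(df):
    corr = df.select_dtypes("number").corr()  # pairwise correlations

    # Plain heatmap: warm colors for high values, cool colors for low ones.
    sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
    plt.title("Correlation heatmap")
    plt.show()

    # Clustered heatmap: rows and columns reordered by hierarchical clustering.
    sns.clustermap(corr, cmap="coolwarm", vmin=-1, vmax=1)
    plt.show()
```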

Week 9 – November 1

I worked out the two main differences between ANOVA and MANOVA (a short code sketch follows at the end):

Dependent Variables:

ANOVA: ANOVA analyzes a single continuous dependent variable, testing for differences in its mean across two or more groups.
MANOVA: MANOVA is used when there are two or more continuous dependent variables that are often correlated. It considers the relationships among these variables and tests for differences in their means across several groups.

Analysis Complexity:

ANOVA: ANOVA is a univariate analysis, so it looks at the variation in just one dependent variable and reports group differences for that variable on its own.
MANOVA: MANOVA is a multivariate analysis that examines the combined variation of several dependent variables at once. It accounts for interdependencies among the variables and therefore offers a more thorough assessment of group differences.
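To make the contrast concrete, here is a minimal sketch assuming a DataFrame df with one grouping column (“group”) and two continuous outcomes (“y1” and “y2”); these column names are placeholders for illustration only.

```python
# Minimal sketch: one-way ANOVA on a single outcome vs. MANOVA on two outcomes.
# The DataFrame `df` and its column names are assumptions for illustration.
import pandas as pd
from scipy.stats import f_oneway
from statsmodels.multivariate.manova import MANOVA

def anova_vs_manova(df: pd.DataFrame):
    # ANOVA: one dependent variable, compare its mean across the groups.
    samples = [g["y1"].to_numpy() for _, g in df.groupby("group")]
    f_stat, p_value = f_oneway(*samples)
    print(f"ANOVA on y1: F = {f_stat:.2f}, p = {p_value:.4f}")

    # MANOVA: both dependent variables tested jointly across the same groups.
    manova = MANOVA.from_formula("y1 + y2 ~ group", data=df)
    print(manova.mv_test())
```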