Will your lunch get you an A?

In this assignment, I chose a dataset of students' exam scores that includes gender, ethnicity, parental education, lunch type, whether or not they completed a test preparation course, and their scores on three standardized tests. I found the data on Kaggle: https://www.kaggle.com/datasets/whenamancodes/students-performance-in-exams?resource=download.

The questions I wanted to answer.

Overall, did more students pass or fail? Did the ones who completed the test preparation course gain an advantage? Was there any significant relationship between students' average test scores and their lunch type? Did students whose parents had more education have an advantage on the math exam? Does gender affect writing scores?

To begin, I tackled my first two questions: did more students pass or fail, and did the students who completed the test preparation course gain an advantage? I added up each student's three test scores and divided by three to get their average; an average of 60% or higher counts as a pass, and anything below is a fail. This gave 713 passes and 287 fails. I then compared these results with whether each student completed the test preparation course. Here is that graph.
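The pass/fail labeling described above can be sketched in a few lines. This is a minimal illustration, not the code used for the post: it assumes the column names of the common Kaggle "StudentsPerformance.csv" layout ("math score", "reading score", "writing score"), and the rows are made up.

```python
from collections import Counter

# Assumed column names (the layout of the common Kaggle "StudentsPerformance.csv");
# rename these if your copy of the data uses different headers.
SCORE_COLUMNS = ("math score", "reading score", "writing score")
PASS_THRESHOLD = 60  # an average of 60 or higher counts as a pass


def average_score(row):
    """Mean of one student's three test scores; values may be strings from a CSV."""
    return sum(float(row[col]) for col in SCORE_COLUMNS) / len(SCORE_COLUMNS)


def pass_fail(row):
    """Label a student 'pass' or 'fail' using the 60% cutoff."""
    return "pass" if average_score(row) >= PASS_THRESHOLD else "fail"


def count_outcomes(rows):
    """Count passes and fails over an iterable of row dicts."""
    return Counter(pass_fail(row) for row in rows)


# Made-up rows for illustration only; with the real file you could load rows
# with csv.DictReader(open(path, newline="")).
rows = [
    {"math score": "72", "reading score": "72", "writing score": "74"},
    {"math score": "47", "reading score": "57", "writing score": "44"},
    {"math score": "60", "reading score": "60", "writing score": "60"},
]
print(count_outcomes(rows))
```

Note that a student averaging exactly 60 counts as a pass, matching the "over or equal to 60%" rule.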

As we can see, of the students who passed, around 40% completed the preparation course, while of those who failed, only around 17% did. What does this mean? My interpretation is that the 40% who took the test prep most likely scored higher than those who passed without it, and those who failed despite completing the prep most likely didn't fully understand the material.
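The 40% and 17% figures are shares computed within each outcome group (of the students who passed, what fraction completed the course), not shares of all students. A minimal sketch of that calculation, assuming the Kaggle-style column "test preparation course" with values "completed" and "none"; the rows are made up.

```python
from collections import Counter

# Assumed column names and values ("completed" / "none"), as in the common
# Kaggle version of this dataset; adjust to match your file.
SCORE_COLUMNS = ("math score", "reading score", "writing score")
PREP_COLUMN = "test preparation course"


def outcome(row, threshold=60):
    """'pass' if the mean of the three scores is at least the threshold, else 'fail'."""
    avg = sum(float(row[col]) for col in SCORE_COLUMNS) / len(SCORE_COLUMNS)
    return "pass" if avg >= threshold else "fail"


def prep_share_by_outcome(rows, completed_value="completed"):
    """Fraction of students in each outcome group who completed the prep course."""
    totals = Counter()
    completed = Counter()
    for row in rows:
        group = outcome(row)
        totals[group] += 1
        if row[PREP_COLUMN] == completed_value:
            completed[group] += 1
    # Divide by the size of each outcome group, not by the total number of students.
    return {group: completed[group] / totals[group] for group in totals}


# Made-up rows for illustration only.
rows = [
    {"math score": 80, "reading score": 80, "writing score": 80, PREP_COLUMN: "completed"},
    {"math score": 70, "reading score": 70, "writing score": 70, PREP_COLUMN: "none"},
    {"math score": 40, "reading score": 40, "writing score": 40, PREP_COLUMN: "none"},
    {"math score": 50, "reading score": 50, "writing score": 50, PREP_COLUMN: "none"},
]
print(prep_share_by_outcome(rows))
```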

My next question was whether there was any significant relationship between students' average test scores and their lunch type. To answer it, I used the other column I made, each student's average test score as a percentage, and compared it across the two lunch options, free/reduced and standard. Before running a test, to see whether it was even worth testing, I created a notched box plot.

Free/reduced lunch scores are much lower than standard lunch scores. Free/reduced scores peak at around 97 but have a Q3 of only about 70, while standard lunch scores have a Q3 of around 81. As you can see, the notches of the two boxes do not overlap, which suggests the median average scores of the two lunch groups differ, so it is worth running a t-test.
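A box-plot notch is roughly a 95% confidence interval for the median, which is why non-overlapping notches hint at a real difference. The sketch below computes notch bounds with the rule matplotlib uses by default (median ± 1.57 × IQR / √n) and checks whether two samples' notches overlap. The score lists are made up, and the post does not say which plotting tool was used.

```python
import math
import statistics


def notch_interval(scores):
    """Return (low, high) of a box-plot notch: median +/- 1.57 * IQR / sqrt(n).

    This is matplotlib's default notch rule (no bootstrapping); method="inclusive"
    gives the same linear-interpolation quartiles as NumPy's default percentile.
    """
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(scores))
    return median - half_width, median + half_width


def notches_overlap(a, b):
    """True if the notch intervals of two samples overlap."""
    low_a, high_a = notch_interval(a)
    low_b, high_b = notch_interval(b)
    return low_a <= high_b and low_b <= high_a


# Made-up average scores for illustration only.
free_reduced = [55, 58, 60, 62, 63, 65, 66, 68, 70, 72]
standard = [68, 70, 72, 74, 75, 77, 78, 80, 81, 83]
print(notch_interval(free_reduced), notch_interval(standard))
print("notches overlap:", notches_overlap(free_reduced, standard))
```

Non-overlap is only a visual screen; the t-test is what actually quantifies the difference.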

To run the t-test, I made two separate arrays of average test scores, one for each lunch type, and compared them, getting a p-value of 3.18*10^-29. Since this is far below the 0.05 significance level, I concluded that there is a statistically significant difference in average test scores between students with free/reduced lunch and students with standard lunch.
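For reference, here is the statistic behind a standard two-sample t-test. This assumes the post used Student's version with pooled variance (for example, the default of scipy.stats.ttest_ind); the post does not name the tool. The x-bars are the two group means, s1 and s2 the sample standard deviations, n1 and n2 the group sizes, and T is a t-distributed variable with n1 + n2 − 2 degrees of freedom.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
p = 2\,P\!\left(T_{n_1 + n_2 - 2} \ge |t|\right)
```

A result is called significant at the 0.05 level when p < 0.05, which is the comparison made for the lunch groups.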

The next question I wanted to figure out was whether parental education differs much across race/ethnicity groups. Here is what I found.

I made a bar chart with race/ethnicity on the x-axis, where the different colors represent the level of education the parents received. Looking at this plot, no group stands out as unusually high or low. Some groups, like group C, have much taller bars overall, but that is only because more students from that group were sampled, while group A had fewer students sampled in comparison.
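One way to check the point about sample sizes is to turn raw counts into shares within each group, so a large group like C and a small group like A sit on the same scale. A minimal sketch, assuming the Kaggle-style columns "race/ethnicity" and "parental level of education"; the rows are made up.

```python
from collections import Counter, defaultdict

# Assumed column names from the common Kaggle version of this dataset.
GROUP_COLUMN = "race/ethnicity"
EDUCATION_COLUMN = "parental level of education"


def education_share_by_group(rows):
    """For each race/ethnicity group, the fraction of students at each parental
    education level, so groups with more sampled students can be compared fairly."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[GROUP_COLUMN]][row[EDUCATION_COLUMN]] += 1
    shares = {}
    for group, level_counts in counts.items():
        total = sum(level_counts.values())
        shares[group] = {level: n / total for level, n in level_counts.items()}
    return shares


# Made-up rows: group C has three times as many students as group A,
# but the same mix of parental education levels.
rows = (
    [{GROUP_COLUMN: "group A", EDUCATION_COLUMN: "master's degree"}]
    + [{GROUP_COLUMN: "group A", EDUCATION_COLUMN: "some high school"}]
    + [{GROUP_COLUMN: "group C", EDUCATION_COLUMN: "master's degree"}] * 3
    + [{GROUP_COLUMN: "group C", EDUCATION_COLUMN: "some high school"}] * 3
)
print(education_share_by_group(rows))
```

In the raw counts group C's bars would be three times taller, but the within-group shares are identical.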

Another test I wanted to run compared the math scores of students whose parents have a master's degree with those whose parents have only some high school education. Before doing so, I again made a box plot, this time of math scores for every level of parental education, and got this.
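The Q3 values read off this box plot can also be computed directly. A minimal sketch, assuming the Kaggle-style columns "parental level of education" and "math score" and linear-interpolation quartiles (the usual box-plot default); the scores are made up.

```python
import statistics
from collections import defaultdict

# Assumed column names from the common Kaggle version of this dataset.
LEVEL_COLUMN = "parental level of education"
SCORE_COLUMN = "math score"


def q3_by_level(rows):
    """Upper quartile (Q3) of math scores for each parental education level,
    using linear-interpolation quartiles (method="inclusive")."""
    scores = defaultdict(list)
    for row in rows:
        scores[row[LEVEL_COLUMN]].append(float(row[SCORE_COLUMN]))
    return {
        level: statistics.quantiles(values, n=4, method="inclusive")[2]
        for level, values in scores.items()
    }


# Made-up rows for illustration only.
rows = [{LEVEL_COLUMN: "master's degree", SCORE_COLUMN: s} for s in (60, 70, 80, 90, 100)]
rows += [{LEVEL_COLUMN: "some high school", SCORE_COLUMN: s} for s in (50, 60, 70)]
print(q3_by_level(rows))
```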

Again, as I predicted, the biggest difference was between students whose parents had a master's degree and those whose parents had only some high school education: the master's degree group had a Q3 of around 83, and the some high school group had a Q3 of around 70. Since the notches do not overlap, the medians likely differ, so a t-test on these two groups is worth running. To do so, I made two separate arrays, one for master's degree and the other for some high school, and ran the t-test on them, getting a p-value of 1.23*10^-34. Like before, this is far below our alpha value of 0.05, so I concluded that there is a statistically significant difference in math scores between students whose parents have a master's degree and those whose parents have only some high school education. I was especially interested in math scores because, growing up, many kids who struggle with math or need help go to their parents. My hypothesis was that students with more highly educated parents have a better chance of receiving a good grade on their math test, and this result supports it.
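The two-array step described above might look like this with SciPy. It is a sketch, not the post's actual code: it assumes the Kaggle-style column names and category labels, uses scipy.stats.ttest_ind with its default (Student's t-test), and runs on made-up scores.

```python
from scipy import stats

# Assumed column names and category labels from the common Kaggle version of this dataset.
LEVEL_COLUMN = "parental level of education"
SCORE_COLUMN = "math score"
ALPHA = 0.05


def compare_levels(rows, level_a="master's degree", level_b="some high school"):
    """Two-sample t-test on math scores for two parental education levels.

    Builds one array per level, then calls scipy.stats.ttest_ind, whose default
    (equal_var=True) is Student's t-test with pooled variance.
    """
    group_a = [float(r[SCORE_COLUMN]) for r in rows if r[LEVEL_COLUMN] == level_a]
    group_b = [float(r[SCORE_COLUMN]) for r in rows if r[LEVEL_COLUMN] == level_b]
    result = stats.ttest_ind(group_a, group_b)
    return result.statistic, result.pvalue, result.pvalue < ALPHA


# Made-up scores for illustration only.
rows = [{LEVEL_COLUMN: "master's degree", SCORE_COLUMN: s} for s in (80, 85, 90, 95, 88, 92)]
rows += [{LEVEL_COLUMN: "some high school", SCORE_COLUMN: s} for s in (50, 55, 60, 52, 58, 54)]
t_stat, p_value, significant = compare_levels(rows)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}, significant at 0.05: {significant}")
```

A positive t means the first group (master's degree) has the higher mean; the p-value alone does not say which direction the difference goes.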

Another relationship I wanted to explore was how writing and reading scores relate, broken down by gender. Since both are numeric values, I made a scatter plot of one against the other. Here is what I found.

For starters, as writing scores go up, so do reading scores, which makes sense: someone who did well on the writing test most likely also did well on the reading test. I separated the points by gender to see whether there was any major difference between males and females when it came to reading or writing scores, and there do seem to be some big differences. In both reading and writing, a larger share of females than males sit at the higher scores.

Next, I ran a t-test on males versus females for writing scores and got a p-value of 2.96*10^-15, concluding that the difference is significant and that females tend to do better. One thing I noticed was that this was the highest p-value of all my tests, meaning the closest to 0.05, although it is still far below it. If I had to guess, this is because a significant difference based on gender is harder to establish than the other comparisons. In the future, running this on a bigger dataset would be beneficial.
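The post does not say which t-test variant was used. If the two genders' writing scores have noticeably different spreads, Welch's t-test (which does not assume equal variances) is a common alternative to Student's test, and comparing both is a quick robustness check. A sketch with SciPy, assuming the Kaggle-style "gender" and "writing score" columns; the scores are made up.

```python
from scipy import stats


def writing_ttest_by_gender(rows, equal_var=False):
    """t-test of female vs male writing scores (assumed column names and labels).

    equal_var=False runs Welch's t-test, which does not assume the two groups
    have the same variance; pass equal_var=True for Student's pooled-variance test.
    """
    female = [float(r["writing score"]) for r in rows if r["gender"] == "female"]
    male = [float(r["writing score"]) for r in rows if r["gender"] == "male"]
    return stats.ttest_ind(female, male, equal_var=equal_var)


# Made-up scores for illustration only.
rows = [{"gender": "female", "writing score": s} for s in (78, 82, 85, 80, 88, 76)]
rows += [{"gender": "male", "writing score": s} for s in (65, 70, 68, 72, 66, 69)]
welch = writing_ttest_by_gender(rows)
student = writing_ttest_by_gender(rows, equal_var=True)
print(f"Welch p = {welch.pvalue:.3g}, Student p = {student.pvalue:.3g}")
```

If both variants give p-values far below 0.05, the conclusion does not hinge on the equal-variance assumption.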

Overall, this dataset let me analyze, and make predictions about, several factors related to students receiving better test scores. I concluded that completing the test preparation course gave students a much higher chance of receiving a better score than not completing it. I found that free/reduced versus standard lunch also had a significant relationship with test scores. I also compared the math scores of students whose parents had received a master's degree with those of students whose parents had only some high school education, because when students struggle in math, the first people they often go to are their parents. I found a significant difference in math scores between these two groups. Finally, I compared writing and reading scores by gender in a scatter plot and concluded that females tended to have higher scores in both sections, which is what I predicted, since I expected females to be stronger writers.

In the future, I would like to test gender against writing scores with a bigger dataset and compare parental education to each test score individually. I would also like to have more columns, such as whether the student was an athlete, or possibly build a generator to randomly assign whether a student was an athlete. I also think adding more columns, such as whether the student packed their lunch, would help show more clearly whether these statistical tests hold up.
