How do I calculate if a test like this is statistically significant?
I let people rate how much they like different things on a scale of 1-10. How do I actually tell if people like one thing more than another thing if the sample sizes are different? This is not about any real scientific study, more like a personal test :)
For example, if one thing got voted on 10 times and has an average value of 6.5, and another thing got voted on 6 times and has a 6.1, is the 6.5 thing actually more liked? Or is a sample this small still so random that it could easily go either way?
I've never done anything like this, so if someone could explain it or point me to the right keywords/links, that would be hugely appreciated :)
I've read up a bit on p-value determination, but I'm not sure what my "null hypothesis" is here actually, numerically. If I'd put it in words I guess my hypothesis would be "this thing is more liked than the other thing", but honestly, it seems like my specific case would be much simpler than all the stuff I'm reading here :D
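You could use a few different null hypotheses here. The null hypothesis is the "no difference" statement; "this thing is more liked than the other thing" is the alternative you are looking for evidence of. One null with minimal assumptions is that the medians are equal, or more precisely that a rating of one thing is just as likely to be higher as lower than a rating of the other. This can be tested using the Mann-Whitney U test, which does not need equal sample sizes.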
https://en.m.wikipedia.org/wiki/Mann–Whitney_U_test
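To make that concrete, here is a rough sketch in plain Python of an exact (permutation) version of the Mann-Whitney U test. The individual ratings are made up for illustration, since the question only gives sample sizes and averages; they are chosen to have means close to 6.5 and 6.1. The function names are my own, not from any library. If SciPy is installed, `scipy.stats.mannwhitneyu` runs the same kind of test for you.

```python
from itertools import combinations


def u_statistic(a, b):
    """Mann-Whitney U for sample a: number of (a, b) pairs where the
    rating from a is higher, with ties counted as half a pair."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_mann_whitney(a, b):
    """Return (U, two-sided p-value).

    The p-value is exact: it enumerates every way of splitting the pooled
    ratings into groups of the original sizes and counts how often U lands
    at least as far from its null expectation as the observed U does.
    Only practical for small samples like the ones in the question.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    expected = n_a * len(b) / 2  # mean of U when there is no difference
    u_obs = u_statistic(a, b)
    observed_gap = abs(u_obs - expected)

    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(group_a, group_b) - expected) >= observed_gap - 1e-9:
            extreme += 1
    return u_obs, extreme / total


# Hypothetical ratings (NOT real data from the question).
thing_a = [7, 6, 8, 5, 7, 6, 9, 4, 7, 6]  # 10 votes, mean 6.5
thing_b = [6, 7, 5, 6, 8, 5]              # 6 votes, mean about 6.2

u, p = exact_mann_whitney(thing_a, thing_b)
print(f"U = {u}, two-sided p = {p:.3f}")
```

A small p-value (commonly below 0.05) would suggest the difference is unlikely to be chance alone. With made-up numbers like these the p-value comes out well above 0.05, which matches the intuition in the question: with 10 and 6 votes, a gap of 0.4 points could easily go either way.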