In one of our previous digital iterations, we noticed that, while it was obvious to people how to remove a word when starting their own game, they often tapped the wrong button and accidentally removed the wrong word.
One of the changes we made after that iteration was making the font size bigger, mainly for readability reasons. Consequently, the button to remove a word grew a bit in size (as it depends on the font size). However, we wanted to be sure that the buttons now function properly.
Goal: We want to find out how well the buttons for removing a word from the list perform. Simple as that.
Method: We devised a pretty specific test for this one: we made a variation of our addWords screen that shows a list of ten words and asks the test user to remove a specific one. As soon as any word is removed, a new list is presented with a new assignment. This is repeated ten times, so every test user had to remove ten words in total. The words and lists were the same for every test user, and in every list the “right” word was in a different position, so each position was tested exactly once per run. Furthermore, to get some information on the influence of screen size and quality, we asked every test user to do the test twice: once on a Samsung Galaxy S3 (4.8-inch Android smartphone) and once on a Mobistel Cynus T1 (4-inch low-end Android smartphone). You can try it out yourself right here. Before reading on, we highly recommend doing so, since everything will be way clearer 😉
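To make the setup a bit more concrete, here's a minimal sketch of the test logic in Java (class and method names are placeholders we made up for this post, not our actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the test flow described above: ten fixed lists of ten
// words each, with the target word in a different position every round,
// so each position is tested exactly once per run.
public class RemoveWordTest {

    static class Round {
        final List<String> words;   // the ten words shown on screen
        final int targetPosition;   // index of the word to remove (0-9)

        Round(List<String> words, int targetPosition) {
            this.words = words;
            this.targetPosition = targetPosition;
        }
    }

    // One run = ten rounds; round i puts the target word at position i.
    static List<Round> buildRun(List<List<String>> fixedLists) {
        List<Round> rounds = new ArrayList<>();
        for (int i = 0; i < fixedLists.size(); i++) {
            rounds.add(new Round(fixedLists.get(i), i));
        }
        return rounds;
    }

    // A sample is "right" when the tapped position matches the target position.
    static boolean recordSample(Round round, int tappedPosition) {
        return tappedPosition == round.targetPosition;
    }
}
```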
Rationale: With this accuracy test, we can see how well our buttons perform, expressed in objective numbers: no opinions/think-aloud/questions, just right or wrong. We refrained from letting test users use their own smartphone, and chose to use the same devices for every test. Because accidentally removing the wrong word is super-extra-annoying, we were hoping to get an accuracy rate of at least 95 percent on each device.
Test subjects: We performed 30 tests: 15 test subjects, two tests each, on different smartphones. The test user group consisted of 9 males and 6 females, ranging in age from 18 to 24. All of them are students, which we don't think is really a problem, because our application is aimed at students. Five of the fifteen testers are doing a master's in computer science, and one is in his first year of engineering. There were two medicine students, a nursing student, three students doing an extra bachelor's in ER & intensive care (a follow-up program to nursing) and a pharmacology student. Yes, we went to another campus 🙂. Finally, we also tested with a nanotechnology student and an industrial sciences student.
Results: Letting 15 users remove ten words twice results in 300 test samples. However, two people were a bit fast and accidentally tapped the “ok” button of the assignment twice, resulting in a failed test sample. Luckily for us, those occurrences weren't on the same device. We discarded those two samples, leaving us with 298 samples, 149 per device. Obviously we're not going to talk through all of them one by one, so we'll give a thorough summary instead (a small sketch of how we tallied these numbers follows the list):
- Overall score: in 267 out of the 298 cases, the right word was removed. That amounts to 89.6%, falling well short of our goal of 95%.
- However, for the bigger device, the score was way better: 146/149 (98%), while the smaller device led to more problems: 121/149 (81%).
- ALL 31 errors were of the same nature: instead of the right word, the test subject accidentally removed the word directly below it.
- On the smaller device, we noticed a clear correlation between error rate and position in the list: the higher a word's position in the list, the more errors. To give you an idea of the distribution, we put the errors per position in a diagram and combined it with one of the lists (keep in mind that there were ten different lists, so test subjects were never instructed to remove “sardine”; the target word for this particular list was “pizza”):
- Also remarkable: guys made 2.33 errors on average, while girls made only 1.67. Oops.
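For the curious: the summary numbers above can be tallied from the raw samples roughly like this (again, class and field names are placeholders for this post, not our actual code):

```java
import java.util.List;

// Each sample records the device, the position of the target word,
// and whether the right word was removed.
class Sample {
    String device;      // e.g. "Galaxy S3" or "Cynus T1"
    int position;       // 0-9, position of the target word in the list
    boolean correct;    // true if the right word was removed
}

class Summary {
    // Accuracy for one device: correct samples / all samples on that device, in percent.
    static double accuracy(List<Sample> samples, String device) {
        int total = 0, correct = 0;
        for (Sample s : samples) {
            if (!s.device.equals(device)) continue;
            total++;
            if (s.correct) correct++;
        }
        return total == 0 ? 0.0 : 100.0 * correct / total;
    }

    // Error count per list position, as plotted in the diagram above.
    static int[] errorsPerPosition(List<Sample> samples) {
        int[] errors = new int[10];
        for (Sample s : samples) {
            if (!s.correct) errors[s.position]++;
        }
        return errors;
    }
}
```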
Interpretation: We think these results are quite remarkable. First of all, there is a very obvious difference between devices. While this may be due to screen size, it might also be related to touchscreen quality: Samsung (and, according to some of our test users, Apple) apparently know that people aim with the top of their thumb while the actual contact point is a bit lower, which explains the touchscreen calibration on those brands' devices. The low-end phone we used doesn't do anything like that, explaining the poor results on that device. This also explains why all of our errors were of the same nature: accidentally tapping the button below the one you intended to tap. Conclusion: people aim badly, and some devices make up for that while others don't. In either case, our app should work smoothly. A pretty obvious solution is upsizing the buttons again, or leaving a bit more margin between buttons, without exaggerating.
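To give an idea of what that could look like in practice, here's a small sketch with assumed numbers (the 48dp minimum touch target is the usual Android recommendation; the 8dp gap is just a guess, not our final value):

```java
// Sketch of the "upsize the buttons" fix: give each remove button a minimum
// touch target of 48dp and an extra gap between rows, so a slightly-low tap
// still lands on the intended row. Numbers are assumptions, not final values.
public class TouchTargets {

    // Convert density-independent pixels to raw pixels for a given screen
    // density (the Galaxy S3 is an xhdpi device, i.e. density 2.0).
    static int dpToPx(float dp, float density) {
        return Math.round(dp * density);
    }

    public static void main(String[] args) {
        float density = 2.0f;                       // assumed screen density
        int minButtonHeight = dpToPx(48, density);  // recommended minimum touch target
        int rowMargin = dpToPx(8, density);         // assumed extra gap between rows
        System.out.println("button height: " + minButtonHeight + "px, margin: " + rowMargin + "px");
    }
}
```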
At first, we were a bit puzzled by the difference caused by position in the list. However, we think it has something to do with the fact that the contact surface between the touchscreen and the thumb is smaller when touching the lower part of the screen. We made two pictures to illustrate this:
Lower part of the screen
Upper part of the screen
As you can see, the contact area is bigger when touching higher parts of the screen, which is also where the “higher” parts of the list were located. It's kinda hard to incorporate this into our app. Upsizing the buttons will obviously help anyway, and most newer smartphones already compensate for the errors made by human hands, so we don't think we need to do a lot to fix this.
Finally, we don't want to change anything about the difference between males and females. We're fine with the fact that women are the more elegant creatures of our species. 😉 On a more serious note, this was probably just random variation, and we don't actually believe there is a significant difference in how males and females use our app 🙂