Hello and welcome back. Today we'll be talking about drafting a user test plan, which really brings back together quite a few topics that we've been covering in this course and specialization so far. If it's been a while since you viewed the course two or course four videos, I really suggest you review a few of them. In particular, you will see a lot of parallels with designing a user research protocol, which we covered in course two, and with some of the ethics and consent information that we also covered in course two. Additionally, more recently in course four, the videos on user testing goals (formative and summative), the usability lab tour, and the usability lab examples give you a good foundation that we're going to build this lecture on. So the first question that you should be asking yourself as you're drafting the user test plan is: what do you hope to learn? Are you hoping to make your UI better and get ideas for improving the system? If so, that's more of a formative evaluation. It tends to be more exploratory, and the data you collect tend to be more qualitative, because you're really interested in the why questions and the how-can-we-make-this-better questions. You typically have less control over what the user does, because you'd really like to see them interacting with your system in a more natural way. It's typically less formal, the user tasks are relatively open, and it's usually something that you do in the design and prototyping stage. But maybe you feel like you've pretty much finished designing your user interface, and what you'd like to do is show that this user interface works better than another user interface, for example. So here the goal is really evaluation. The work tends to be a lot more quantitative, because you would actually like to have some numbers to compare between two interfaces. You generally have a lot more control over what the user does, because you provide them with specific user tasks, and it's generally a lot more formal, because you really want it to be consistent enough to give you the kind of data that will allow you to compare two interfaces, two alternatives, or two approaches. This is known as summative evaluation. Now, we've actually talked quite a bit about formative evaluations before, when we talked about usability lab testing, so we'll refer to those a little bit less in this set of slides. But we will be talking a lot more about this idea of summative evaluation and how you draft a user test plan for it. As you may remember from the usability lab tour, one of the usability testing steps is to plan ahead, and this is really the step that we'll be expanding on in this video. So first of all, why should you even develop a user test plan? Why not just go in there and see how the user does? Well, if you're comparing, you really want some consistency between sessions. You don't want one user to get a task that is going to take them an hour and another user to get a task that's going to take them 30 seconds, because you're not going to be able to compare what those users did. You also want to manage your time and get to everything. Getting users into the lab is hard, and you want to be able to tell them, okay, this is going to take you an hour. If they're taking longer than anticipated on a particular task, you may actually want to intervene and say, okay, thank you very much for your feedback, we may want to move on to the next task.
You want to anticipate and prepare for problems, and practice and refine your protocol. If you don't have everything written out and ready to be read, then you may not be able to practice and refine it in that way. All of this may sound familiar from developing a user research plan, but the last two points are new to this video. The first is knowing what you're measuring. You don't want to just go in there, have users do some tasks, and fail to collect the metrics that are later going to be important for you, such as subjective metrics of usability, how much time it took them to do a particular task, or counts of particular errors they made along the way. The second is that if you're really doing a summative evaluation, if you want to show that your user interface works, you need to define upfront what that means. What does success mean for your interface? Is it that users are able to complete every task in under two minutes? Is it that your user interface outperforms another user interface, maybe something that the participants are currently using in their work? It's really important to define that, so that you can then measure it and actually have an answer in the end that tells you whether your user interface performs well or not. So the first step of planning is selecting and documenting the following four things: users and setting, methods and metrics for your goals, tasks and prompts, and researcher roles. Let's go through all of these in more detail. First of all, you want to select your users and your setting. You need to decide: is this a study that you're going to run in the lab, where you invite people to come in and use your system? Or is this something that you're going to be doing in the field? One variation of that might be some sort of alpha release of your system that you give to a small subset of your eventual users, and you just want their feedback in the process; that is an example of a field study, happening in the real world. And there might be something else in between, where perhaps the study itself is happening in the user's context. Maybe you want them to try the system in their actual home, but you're not really deploying it and letting them have it for the next three months, and the system just has to work well enough as is. Maybe it's more like you're visiting them in their home and watching them as they're using the system. You also really want to figure out how to recruit representative users. You don't want to just be user testing with your three closest friends, and you definitely don't want to be doing user testing with your team, because they know the system too well. You really want to see if you can find a few users to represent each segment of your population, for example. If you're doing formative testing, if you're more interested in improving your system, you can get away with a small number of users: maybe five to seven will find the majority of usability errors in your system and will give you a good way of figuring out where to go next with your interface. But if you are really conducting a summative evaluation, if you want to show that your system works and you want quantitative data to back that up, then I would suggest exploring a bit more statistics and conducting an actual power analysis to figure out how many users you need, so that if there is in fact an effect of your system, you will be able to see statistical significance with that number of users. This can be as many as 40 or even 60 users if you're comparing two different systems.
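If you do want to run that kind of power analysis, here's a minimal sketch of what it could look like in Python, assuming a between-subjects comparison of two interfaces with an independent-samples t-test, a large expected effect, and the conventional 0.05 significance level and 80% power. The statsmodels call is real, but the effect size and thresholds are assumptions you would need to justify for your own study.

```python
# Minimal power analysis sketch: how many participants per condition do we need
# to detect a large effect (Cohen's d = 0.8) with alpha = 0.05 and 80% power?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.8,
                                    alternative='two-sided')
print(f"Participants needed per condition: {n_per_group:.1f}")
# Roughly 26 per condition, so around 50-60 participants total for two conditions,
# in line with the numbers mentioned above. A smaller expected effect would push
# this number up considerably.
```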
You also need to figure out what kind of information you need to know about your users that will help you understand their experience later. This may include basic demographics, like how old they are, and it may include baseline skills. Let's say that you built a system for graphic designers. You might want to know how many years of experience they have in graphic design and what kinds of technologies they use right now, because that can really influence their data and your findings. The next thing you should think about is selecting your methods and metrics, and this can look very different depending on whether you are working in the lab or in the field. In the lab, you can do an experiment comparing multiple alternatives. Perhaps you ask each user to use two different interfaces, or maybe you compare across users, so each user only gets to use one interface, but you invite enough users that you can compare averages across conditions. Or you can do something more formative, like a think-aloud qualitative usability study; we showed you a few examples of those when we gave you the tour of the usability lab. In the field, you can also do experiments. You can ask people to spend some time using multiple alternatives: for example, for two weeks you ask them to use the system that you have built, and for two weeks you ask them to use the existing system that you're trying to compare against, and you may want to switch the order of these for different participants. You can also do mixed-methods field studies, where you leave your system out in the world and collect lots of data about how it's being used, whether that's just log data or comparisons against some individual baseline. I will give an example of a field study in another video, when I talk about my own system, ShareTable, and how we evaluated it. But for now, just think about this: what makes more sense for your study, the lab or the field? Then, what kinds of information do you actually hope to collect? If you're interested in quantitative metrics of usability, understanding how well the system performs on different measures, I think it's really important to decide how you are going to define success ahead of time. Is it time to learn a specific feature? This may be relevant if you're building a system that somebody might use once or twice, say a system for buying train tickets. Somebody might not use that every day; they might be new to the city, and so it's important for them to be able to learn its features very quickly. It may be less relevant if what you're building is some sort of system for experts who will all be trained in it and will be using it every day in their workplace. You might be interested in use time for specific tasks: how long does it actually take them to do something? You may be interested in figuring out which features of your system are used and which are not. This is particularly valuable in a field study, for example, and it can really help guide how much effort and resources you put into developing a particular feature if you find that your users don't really value it or use it.
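As one concrete way to capture that kind of field data, here's a minimal sketch of a usage logger you might embed in a prototype. The event names, participant IDs, and file path are placeholders I made up for illustration, not part of any particular system.

```python
# Minimal usage logger for a field study: append one timestamped row per UI event
# to a CSV file, so you can later count which features were used and how often.
import csv
from datetime import datetime, timezone

LOG_FILE = "usage_log.csv"  # hypothetical path; adjust for your own deployment

def log_event(participant_id: str, feature: str, detail: str = "") -> None:
    """Record who used which feature, and when."""
    with open(LOG_FILE, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.now(timezone.utc).isoformat(), participant_id, feature, detail]
        )

# Example calls you might sprinkle through the interface code:
log_event("P03", "search_restaurant", "query=pizza")
log_event("P03", "open_settings")
```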
Another set of metrics you can collect is error rates: how many times did somebody make a mistake along the way? Your system should be helping the user prevent errors, so if they're making a lot of them, then maybe there's a mismatch between how your system approaches the task and how the user thinks about the system. You can also collect subjective measures of user satisfaction. We've discussed one questionnaire, and there are lots of questionnaires available out there that ask basic questions, usually on a one-to-five or one-to-seven scale, asking the user to describe their experience with the system. And of course, since these are Likert scales, they can be converted to numbers as well. I think the key point, and I'll emphasize it here, is that if you're taking quantitative metrics, it's really important to be doing some sort of comparison, either to a prior version of the system or to an alternative system; otherwise it's really hard to make sense of them. If I told you that it took me 30 seconds to do a task on my phone, well, is that a lot? Is that a little? Without any sort of context, it's really hard to evaluate that finding. But compare it to something else: if I said it took me 30 seconds to do a task in this new app that I built, and it usually takes me two minutes to do the same task in the existing app, well, that sounds a little more significant. Especially if I do that task many times a day, it could actually save me quite a bit of time; I'll sketch what such a comparison might look like at the end of this segment. These quantitative metrics are more about the summative evaluation of your system, but you may also want to add a few formative aspects to your user test plan as well. For example, you may want to note the trouble spots that people hit in completing tasks, rather than just saying, okay, they had trouble with this task. You may want to see which features they found or didn't find in the process. You may gather reactions to design elements or decisions: for example, we decided to put this feature deep inside the menu structure because we didn't think a lot of people would use it; what do you think about that? You can spend some time learning the users' mental models. That's one of the things that could be driving high error rates or leading them to take more time on a task: the way you've structured the interface may not match their mental model of how they think the interaction should be structured, or of what they think the interface is doing. And lastly, qualitative metrics are really good at getting to the why questions. Why are they having these error rates? Why are they using these features? Why are they seeing these particular use times? I think that combining the two really gives your study the most strength. So do think about which aspects of your user test plan will be summative, gathering quantitative data and perhaps even comparisons, and which aspects will be more formative, giving you qualitative data about how you can improve your interface in the future. This is all part of the iterative design process: remember, we're never really done with an interface, and you can always improve it a little bit more.
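Coming back to that comparison example, here's a minimal sketch of how you might compare task times between two interfaces once your sessions are done. The numbers are invented, the independent-samples t-test is just one reasonable choice, and a real analysis would need the sample size from your power analysis and checks of the test's assumptions.

```python
# Compare task-completion times (in seconds) between two interfaces with an
# independent-samples t-test. All the times below are made up for illustration.
from scipy import stats

new_app_times = [28, 35, 31, 40, 26, 33, 30, 38]             # new interface
existing_app_times = [110, 95, 130, 120, 105, 125, 98, 115]  # existing interface

t_stat, p_value = stats.ttest_ind(new_app_times, existing_app_times)
mean_new = sum(new_app_times) / len(new_app_times)
mean_old = sum(existing_app_times) / len(existing_app_times)
print(f"Mean new: {mean_new:.1f}s, mean existing: {mean_old:.1f}s, p = {p_value:.4f}")
```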
The next thing you should think about is the tasks and prompts that you actually give to the user. What instructions will you give? Will you just hand them your interface and say, pretend that you are a tourist in a new city, use this as you would, and then follow them around? Or will you give them very specific tasks, such as: you are a French visitor in the United States for the first time, and you're looking for a particular restaurant nearby that a family friend recommended; use our interface to find this restaurant. That's a much more concrete task. There are advantages to both. If you give specific tasks, you can collect metrics like how long it took them to do it and what errors they encountered, whereas if you just let somebody play with your system, you may find that they use it in unexpected ways, or that they find certain features more or less valuable than you expected. You also have a spectrum in terms of the tasks that you choose. You can choose the tasks that you think people will do with your system frequently, and then measure how much time it actually took to do each task, because saving just a little bit of time off a task that you do three times a day could save a lot of time in the long run. Maybe you'll choose tasks that you anticipate being particularly difficult, so that you can see whether your system succeeds in making those tasks easier. Maybe you'll choose tasks where you're not really certain how a person will perform, because observing them will give you a lot of information about where to proceed next in the design. You should also really think about how long the tasks will be. If you're recruiting somebody and asking them to do a ten-minute task on their phone, that's quite different from putting a system in the field and asking them to use it every day for a year. As you think about these things, think about your strategies for recruiting participants and also your strategies for compensation: if you are asking somebody, perhaps a professional, to use your system every day for a year, you need to compensate them for their time so that it's not a huge burden on them. And lastly, I think it's really important to think about when the task will be over. This is perhaps more important in the lab than in the field. Let's say that the participant has finished the task that you set out for them, but they still seem to be fiddling with the interface; maybe that's because they don't realize that they have actually completed the task. Will you stop them at that point, or will you let them continue until they explicitly say, I am done with this task? All of these things are really important to think about, and I'll show a small sketch of how you might write tasks down in your plan at the end of this segment. Overall, for these three pieces, users and setting, methods and metrics, and tasks and prompts, I would just say: make sure that they are consistent with your goals for running the user test in the first place. Keep coming back to this idea of, what do we hope to learn from this? Are these questions going to help us get there? Are these tasks going to help us get the information we need? Is this the right metric for the information we need? Is this the right setting for the information we hope to gather? Are these the right users for us to be working with?
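Here's that sketch: one minimal way to document tasks in the plan itself, so every session uses the same prompt, time budget, success criterion, and stop rule. The fields and the example task are placeholders I chose for illustration, not a fixed format.

```python
# One structured way to write down tasks in a user test plan: each task carries the
# exact prompt, a time budget, what counts as success, and when the facilitator stops.
from dataclasses import dataclass

@dataclass
class Task:
    task_id: str
    prompt: str             # exactly what you will read to the participant
    time_budget_min: int    # minutes allotted before moving on
    success_criterion: str  # what "done" means, decided before the session
    stop_rule: str          # when the facilitator ends the task

tasks = [
    Task(
        task_id="T1",
        prompt=("You are visiting the city for the first time and a friend recommended "
                "a restaurant nearby. Use the app to find it."),
        time_budget_min=5,
        success_criterion="Restaurant detail page is open on screen.",
        stop_rule="Stop at 5 minutes, or when the participant says they are done.",
    ),
]

for t in tasks:
    print(f"{t.task_id}: {t.time_budget_min} min budget -- success: {t.success_criterion}")
```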
Now, the last thing you want to select is the researcher roles. It can be really hectic to go out and run a usability test, or any sort of study at all, so it's really helpful to work with your team to figure out who's going to be doing what, so there isn't a ton of confusion and running around on the day of the actual study. It's nice if one person has the job of greeting, getting consent, facilitating, and debriefing, while a different set of people are actually observing and collecting data, because it can be really hard to combine the two. So it's good to have a distinct set of roles for each of the people who are running the usability study. You may also decide that different people are going to collect different types of data: maybe one person is responsible for keeping track of how long it takes the user to do a particular task, and another person is focused on noting when something is difficult for the user, or on counting the errors they make. I think a clear definition and separation of roles really helps, and I would put that in your user study test plan as well. Also, and this is perhaps not strictly part of the plan, it's important to consider preparing things like your equipment. For example, are you going to record the session, or are you going to do eye tracking? Are you going to need to reserve the usability lab? Plan for this ahead of time, and also for your instruments. Are you going to have specific questionnaires that you give out? Are you going to take notes on a structured note sheet? Preparing all of that and including it as part of your user study test plan will help you plan, will help you pilot, and will help you anticipate potential issues that might arise when you actually run the study with your users. And that brings me to my last point, and I will end on this slide, because I think it is absolutely the thing that separates a good user study from a bad one: piloting, actually trying it out. First, maybe you try it out with somebody on your team, just to see whether they can get through all the tasks in the time that you have planned for them. Then maybe you try it out with a friendly user who's not in the formal user study, such as a family member or friend. And lastly, you might even try it out with somebody who is the kind of user you would include in your study, not using that data, but just using the session as a way of refining your study. Almost any mistake that you make in your user study test plan, anything that you forget in the process, can be fixed if you pilot and find those mistakes ahead of time. So this is the number one thing I really encourage you to do after you design your user study test plan, in order to improve it and make it better. Thank you for joining me in this video. Good luck making your user study test plans.