To continue with writing closed-ended survey questions, we're going to look into designing ordinal questions. What we're going to do is look at some general guidelines for writing ordinal closed-ended questions. As a reminder, and hopefully you know this at this point, ordinal questions are closed-ended questions where the responses are ranked. Sometimes these are called Likert questions, and they're basically the bread and butter of the survey question industry. Often we don't necessarily care about the nominal part of these options; we're really interested in the degree to which people believe certain things.

There are two major types of ordinal questions: unipolar and bipolar. Let's look at an example of each. "How successful do you think this redesign was? Completely successful, very successful, somewhat successful, slightly successful, or not at all successful." This is a unipolar question. Our zero point, highlighted in red here, is at "not at all successful," and everything plays off of that. "How likely or unlikely are you to visit this site in the future? Very likely, somewhat likely, neither likely nor unlikely, somewhat unlikely, very unlikely." This is a great example of a bipolar arrangement for an ordinal question. Here the zero point is in the middle. That's basically what we mean by unipolar versus bipolar: where does the zero point of the scale exist? On one end of the scale, or in the middle?

Another tip for writing the right ordinal question is to choose an appropriate scale length. In general we keep our ordinal scales to somewhere between four and seven categories. Now, why is that? You can have anywhere from two options ("better than," "worse than") to 100-plus. But what are the tradeoffs? What happens when we add more categories to these ordinal questions? If you add more options, there are some real positives. You get more nuance between responses.
You let the respondent think more carefully about the difference between, say, "kind of like" versus "really like." You also get more data for later analysis: the more points you have, the more variance you get, and that allows for easier analyses of different types.

There are benefits to fewer response categories, too. One, of course, is that it's easier on respondent cognitive load. A respondent isn't going to care about the difference between a 69 and a 70 on a 100-point scale; they're going to have a much rougher granularity for what they care about. The other benefit of fewer options is that people skip or satisfice questions that carry too much burden. If a respondent sees a question that looks like it's going to be a giant pain to answer, they're just going to skip it and move on to the next one. And of course you don't want that; it's better to have some data than no data.

A lot of research has been done on the optimal level of granularity here. For bipolar scales you want between five and seven options. For unipolar scales, four to five options are best. Over multiple studies, these two ranges have been shown to strike the best balance between having all the options you need to actually answer the question and reducing respondent burden.

Two common questions always come up when people are writing ordinal questions for surveys. One is: should we have a midpoint on a bipolar scale? The reason people ask is that they want to force a choice. "Neither likely nor unlikely" could mean that the person truly is at the middle of the scale, or it could mean they're just trying to skip the question, or they don't really know or haven't thought about it. So that midpoint can be problematic for a lot of survey researchers.
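As a rough sketch, the unipolar/bipolar distinction and the scale-length rules of thumb above can be captured in code. All names here are hypothetical and for illustration only; the lengths encoded are just the lecture's guidelines.

```python
# Illustrative sketch (hypothetical names) of the two ordinal scale types.
# Unipolar: the zero point sits at one end; bipolar: it sits in the middle.
from dataclasses import dataclass

@dataclass
class OrdinalScale:
    polarity: str   # "unipolar" or "bipolar"
    labels: list    # ordered response options

    def zero_index(self):
        # Conceptual zero point: the last option for a unipolar scale
        # ("not at all ..."), the middle option for a bipolar scale.
        if self.polarity == "unipolar":
            return len(self.labels) - 1
        return len(self.labels) // 2

    def length_ok(self):
        # Rules of thumb from the lecture: 4-5 options for unipolar
        # scales, 5-7 for bipolar scales.
        n = len(self.labels)
        return 4 <= n <= 5 if self.polarity == "unipolar" else 5 <= n <= 7

success = OrdinalScale("unipolar", [
    "Completely successful", "Very successful", "Somewhat successful",
    "Slightly successful", "Not at all successful"])

likelihood = OrdinalScale("bipolar", [
    "Very likely", "Somewhat likely", "Neither likely nor unlikely",
    "Somewhat unlikely", "Very unlikely"])

print(success.zero_index(), success.length_ok())      # 4 True
print(likelihood.zero_index(), likelihood.length_ok())  # 2 True
```

Both example scales from above pass the length check, and the position of the zero point is what distinguishes the two types.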
There's a temptation to drop that midpoint in order to force a choice one way or the other, basically to have an even-numbered scale instead of an odd-numbered one for the bipolar arrangement. Lots of research has been done on this. Yes, there is some danger that if you include the midpoint, some of that data won't reflect a well-thought-out answer. But in general, the survey methodology literature has shown that it's better to include a midpoint: it creates a truer sense of what the mean response to a scale like this is.

A second question that always comes up when writing these types of questions is: should we start with positive or negative options? Should I start with "very satisfied" or "very dissatisfied"? Again, as you can imagine, a lot of research has been done on this, and to sum it all up in a slightly unsatisfying way, it turns out it doesn't matter too much. You can start either at the high end or at the low end. What really matters is that you're consistent from question to question. You don't want question one to start with "very satisfied" and question two to start with "very dissatisfied." That's too much cognitive load for your respondents, and they're going to mess up because of it. So you can start either positive or negative; just make sure all your questions start the same way.

I mentioned earlier that it's uncommon to have a hundred options in an ordinal scale. One common exception is what's known as the feeling thermometer. This example is from the American National Election Study, probably the most famous survey that uses a feeling thermometer. In this case, they ask people to rate how they feel about a certain political figure. It's very common, has been asked for many decades, and has moved into UX research; I see this question asked a lot for feelings about a product, a company, or a service. People really do understand it.
It uses the metaphor of a temperature thermometer, and even though there are many more options here than you would have with a seven-point scale, people really do get it. The metaphor helps reduce their cognitive burden, and it can be a good way to get more points along an ordinal scale.

A third tip for writing ordinal questions is to choose direct labels to improve cognition. What do we mean by that? Let's look at this question: "To what extent do you agree or disagree that the site was accessible on your mobile phone?" Our construct of interest in the question stem is accessibility. And we have a very typical set of response categories here: strongly agree, agree, neutral, disagree, strongly disagree. These scales are so common they're almost ubiquitous; we'll talk about why they're slightly problematic in a second. They're easy to write, you can copy and paste them, and that's part of why they're so common. But does this really measure accessibility? A stronger framing would be "How accessible or inaccessible was the site on your mobile phone?" with response categories that match that concept of accessibility. Both versions address the same core issue, but the second is going to be much more understandable for the respondent, and my guess is that it will lead to much better data, because the response categories now match the question.

Here's a quick example of a mismatch between a question stem and response categories. This is a survey I was invited to participate in very recently. It's run by FORESEE for the National car rental company. Let's explore what the options look like. Question one: "Please rate the options available for navigating this site," on a one-to-ten scale where one equals poor and ten equals excellent, plus a "don't know" category; the points in between are just numbered 2, 3, 4, and so on.
I have a couple of issues with this, and it's a great opportunity for us to critique a design. One thing I really like is that they use shading to differentiate the questions from one another. That's a great UX tip when you have a whole set of questions in a row like this.

There are a couple of issues I would raise as a survey designer, though. One is that "poor" to "excellent" is being matched with constructs like "options available for navigating" and "how well the site layout helps you find what you need." Those don't match up very well. I get what they're after: they really want a ten-point Likert-style scale that gives a sense of where I fall and what my beliefs are, but it's not matched up well. How does "poor" really correspond to a concept like how well page content loads?

Two other quick things, just to presage another module a little. First, they haven't labeled all of the response categories here. It's not too bad; you can see they're trying to save space, since this was a pop-up on a web page and they're trying to be parsimonious with the layout while asking a whole bunch of questions. I can see why they did it, but in general it's better to label all of your response categories. Second, as we just discussed in the last module, they included a "don't know" category. Is it really possible not to know the answer to one of these questions? These are all opinion questions; they're all trying to measure my experience with the site. Do I really not know what my experience was? I can see why they added a "don't know": maybe it's an ethical consideration, wanting to give respondents an out. But it's unlikely to lead to good data, and it's not really essential for these types of questions, because a respondent can simply skip a question or skip the survey entirely.
I did not skip the survey; I'm a good survey participant, so I actually took it. That brings us back to the trouble with agree/disagree scales. "How much do you agree or disagree with the following statement: survey questions are more complicated than I thought. Strongly agree, somewhat agree, neither agree nor disagree, somewhat disagree, strongly disagree." Again, this is a really common framework of response categories that has been used for decades. And because of that, there are decades of survey methodology literature showing a couple of issues with these agree/disagree scales.

The most important one is that they can increase what is called acquiescence bias. Remember, we talked very early in this module about how respondents subconsciously want to make the survey researcher happy. Well, if you have an agree/disagree scale like this, people want to be agreeable, and they are going to be more likely to choose the agreeable end of the scale than if you asked the question in other ways. Researchers, especially at Stanford, have shown again and again over multiple studies that agree/disagree scales can bias your results because they lead to acquiescence.

They can also increase cognitive burden. Here the agree/disagree scale is fairly straightforward, but we saw another example around accessibility where the concept didn't match an agree/disagree scale. That can lead to a lot of cognitive burden for a respondent, and they may skip the question because they're satisficing. I find this to be especially true in matrix questions. Matrix questions are a specific type of ordinal question where, because you have the same response categories, you can ask multiple questions in a row like this. They're very common, especially when designers are trying to be careful about the space they use, and because it's very easy to write these types of questions.
I would prefer that you be very parsimonious, even cheap, about how you use matrix questions. Matrix questions are problematic to me because they really depend on the respondent picking up the cognitive load, and they are often just a lazy way of writing questions. Now, yes, there are reasons to include a matrix; I'm not saying never use them, because they really are great for space, and sometimes it really is easier, depending on the type of question you're asking. But in general, if matrix questions are overused, especially with these agree/disagree scales, they can be really problematic for getting at the concepts you're after.

Tip four for writing ordinal questions is to choose specific metrics for your questions rather than vague quantifiers, when possible. What do we mean by that? Let's go back to some research I did on Facebook earlier in my career. "How often do you visit Facebook? Regularly, occasionally, rarely, or never." This is a fairly reasonable-looking ordinal scale trying to get at some measurement of time, and it's very common for survey designers to use a vague set of categories like this, because, as we've already discussed, people's memory for time isn't very good. If I ask how often you looked in your car's rear-view mirror a month ago, you're going to have a hard time recalling that kind of information. So people often use these vague categories as a crutch. The problem is that everybody interprets them very differently, especially for something as idiosyncratic as Facebook use. What "regularly" using Facebook means to me is going to be very different from what it means to you, and that's going to lead to error in the data that may be systematic. A better way to ask this question: "In the past seven days, how often have you looked at Facebook?"
"Several times per day, at least once per day, every few days, not at all." Now, this is still an imperfect question; I'm not trying to hold it up as a great example. But you can see a couple of better choices than in the last version. First, we constrain the recall period to seven days, which is much more memorable than a vague sense of time overall. We also constrain the behavior to "looked at Facebook," which creates a more universal frame. And we provide an actual set of response options that establishes the same meaning for all users, so every respondent is going to know what "several times per day" means, as opposed to some of the options we looked at previously.

A fifth tip for writing ordinal questions is to provide a balanced scale, where categories are conceptually about equal distances apart. This is really important, especially for bipolar scales. Let's look at a bad example: "How would you rate the quality of your experience on this site? Excellent, very good, good, fair, and poor." This seems pretty typical, so what's wrong with it? The problem is that excellent, very good, and good are three positive options on a five-point scale. That means, for instance, that if you collapse responses later in analysis, you are biasing your results, because you don't have a true zero point for this scale. This is actually a bipolar scale, and you haven't set a zero point. A better set of options would be very good, good, fair, poor, very poor. Here you have much more symmetry between the two ends of the spectrum, which allows for better responses from your respondents.

The sixth tip for writing your ordinal questions is to label all of your response options. We saw that example in my National car survey. In general, when possible, you want to label them all. "How do you rate the quality of your experience on the site?"
"Excellent," 2, 3, 4, "very poor." We often do this to save space, especially in the horizontal format we looked at previously. But lots and lots of studies have looked at these kinds of response categories and found that if you label every response category, you get higher reliability, higher validity, and a better respondent experience overall. Multiple studies have shown that taking labels away from response categories makes your data worse. Now, sometimes it may be worth it if you're really trying to save space and that's important for the construction of your survey, but you should be very aware of the choice you're making when you remove labels from your response categories like this.

A seventh tip for writing your ordinal questions is to provide scales that approximate the actual values in the population. What does that mean? It means that if you're going to give your respondents a set of rank-ordered options, you should make sure they actually reflect what's going on in the population. Here's an example of a question I asked in my previous research: "How many Facebook friends do you have? 1,500 or more? Between 500 and 1,500? Between 250 and 500? Between 100 and 250? Fewer than 100?" Now, how would we know the right balance for this? You would have to know the histogram, the distribution of Facebook friends across all Facebook users. When we first asked this question, we were very wrong. Our categories were pretty bad, and we ended up getting useless data, because nearly everybody chose the bottom category; we thought more people had more friends than that. In general, the average number of Facebook friends is actually between 250 and 500. So doing some pre-survey data collection that lets you know what the population values look like helps you construct better scales when you're putting these things together.

So, some summary points when thinking about writing your ordinal scales.
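To make that "know your population first" advice concrete, here's a minimal sketch of deriving category boundaries from pilot data. The function name and the pilot numbers are made up for illustration; the idea is simply to cut the pilot distribution into equal-sized bins rather than guessing.

```python
# Hypothetical sketch: use pilot data to pick response-category
# boundaries that approximate the population distribution, instead
# of guessing (as in the Facebook-friends example).
import statistics

def quantile_boundaries(pilot_values, n_categories):
    """Split pilot observations into n roughly equal-sized bins and
    return the cut points to use as category boundaries."""
    qs = statistics.quantiles(pilot_values, n=n_categories)
    return [round(q) for q in qs]

# Fabricated pilot sample of friend counts, for illustration only.
pilot = [40, 80, 120, 150, 220, 260, 310, 400, 520, 800, 1200, 1600]
print(quantile_boundaries(pilot, 4))
```

With boundaries chosen this way, each response category should capture a comparable share of respondents, avoiding the situation where nearly everyone lands in one bin.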
Make sure you're thinking carefully when choosing between unipolar and bipolar scales. Both can be great, but they lead to very different types of data, and you want to make a conscious choice about which one you're picking. Use an appropriate scale length: for unipolar scales you're looking at four to five options; for bipolar scales, five to seven. Do include a midpoint in your scales, especially bipolar scales with a pole at either end. Use direct labels; the agree/disagree scale, I'm not going to say never use it, but be careful about overusing it, and think about how you match the labels in your response categories to the actual concepts you've operationalized in your question stem. Use specific metrics rather than vague quantifiers. Provide a balanced scale, and make sure you're not subconsciously introducing bias by having a scale that isn't balanced across all of your categories. Label all of your response categories; it reduces the burden on your respondents and gets you better data. And provide scales that approximate your population values, which may require a little footwork upfront to make sure you actually know what that approximation should be.

Ordinal questions, again, are the core of so many of the surveys you're going to run, and a great way to get opinion data and a sense of how people feel about different things in the UX world. In the next set of modules, we'll talk more about how to construct questions from an actual design perspective, and what design means for writing our survey questions.