Hello, and welcome back. So we've been talking a lot about usability studies that occur in the lab or another controlled setting. But there are many contexts where you may actually want to run your study in a field setting. So I'll go through the process of planning a field study and what's involved, and I'll do this by example. I'll use one of the systems that I've built and talked about in this course before, the ShareTable system, and I'll talk about the deployment, the field study of that system, as one example of how you might go about this work.

As I mentioned, I've already spoken about the ShareTable before, but if you're watching the videos out of order or you're just joining us now, the ShareTable is a system that helps parents and kids that live apart stay connected. Commonly, parents and kids may be living apart because of divorce, separation, military deployment, or travel. The idea is that it's actually a piece of furniture, because the hope is to make it as easy to use as possible. There are no buttons or keyboards in this system. To start a connection, you just open a set of cabinet doors. That makes the table on the other side ring, and if somebody there opens it, the connection starts. The system combines something like video chat, so on the monitor you see the other person's face, with a shared tabletop space. The shared tabletop space is created by a camera and a projector above each table: the camera captures what's happening on one table, and the projector displays it on top of the other. So in the image here, you see a paper board game placed on one side being projected onto the other side, and as long as both people have something to use as tokens on their own sides, the tokens are projected back onto the paper board game. This is meant to support a lot of activities between parents and kids, because it's kind of boring to just talk to each other, and it's a lot more fun to do something together. So whether it's playing a board game like in this example, or a child puts a book on the table and the parent can see it, or a worksheet from school so the parent can help with homework, or if they want to have a tea party, they can grab some cups from the kitchen and now they're having a tea party.

So we built this system, and we really wanted to see how it would be used by families that are actually living apart, that are actually facing divorce. So we wanted to run a field study, and we had quite a few goals; some of these were summative and some were formative. On the summative side, we wanted to see whether parents and children would spend more time communicating when they had the ShareTable versus a baseline comparison, that is, versus what they did before we gave them the ShareTable system. We also wanted to see whether children initiated more connections with the ShareTable versus baseline, because one of our goals was to make the system so easy to use that it didn't require the sort of sophisticated scheduling, changing settings, or dealing with buddy lists that young children may not be comfortable with. So we were really hoping to see that they would initiate more of these connections. We also wanted to use a few different validated metrics, validated questionnaires that other people have developed, to see how the ShareTable compared with baseline in terms of emotional costs and benefits.
And in terms of how it actually affected the relationships between different family members, both between the parents and children, but also between the different parents in, say, a divorced family.

We also had a few formative goals, because we really wanted to know where to go next with the system. We wanted to see what kinds of activities the parents and children would do with the ShareTable. We had some ideas in mind, and we had done a brief lab study to understand how well it would work for different activities, but we really didn't know what would happen once we left the system in the home with parents and kids and just let them do whatever they wanted with it. We also wanted to know what worked and didn't work for the family generally, so we could improve the system in the future. And lastly, since I'm coming at this from a research perspective, not necessarily from an industry perspective, I also wanted to see what the promising next research challenges were, so the study could actually inform my program of research in the future.

So the first thing we did was recruit some users. We really only had two prototypes of the system, so we could only run two households, that is, one family, at a time. And we wanted to be very careful with our recruiting. For our recruitment calls, we had a flyer and a few posts online, and we were really looking for families that were divorced or separated with at least one child between the ages of seven and eleven. Older than that, kids can deal with regular Skype and it's not as much of a problem; younger than that, we weren't sure it was going to work, though we did have a few younger siblings end up using it, and it worked fairly well. But we didn't know that going in. We also wanted to be able to interview the children about their experience, and I don't know if you've ever tried interviewing a child younger than seven, but it's definitely more of a challenge.

The other requirement for us was that both households needed to be at most a two-hour drive from Atlanta, because we actually had to deliver the system and set it up in their homes, and we were going back every week to do interviews, so it would really have been unreasonable for the research team to drive more than two hours to visit the families. But we still wanted the families to be at least a one-hour drive from each other, because if they already live on, say, the same street, we're not really sure the ShareTable adds anything; they can see each other in person almost every day. And the last requirement was one we added after we had a few families reach out to us who were interested but in whose homes we couldn't actually set up the system: high-speed internet had to be available in the area. Now, the home didn't have to have high-speed internet already; we actually provided it for them. But if it wasn't available at all, our system just wouldn't work, and the study was done in Georgia, where there were in fact a few rural areas where high-speed internet was just not available at all.

Now, in addition to these criteria that we actually stated up front, there were a few self-selection criteria. Basically, the families had to be willing to participate in a fairly significant study.
It was an eight-week study that involved significant data collection, so it could feel like a bit of a privacy intrusion. And it included weekly interviews, so it was quite a time commitment on the participants' part. Even though we compensated them, for lots of busy families it's really hard to find the time to dedicate to this kind of study. The other self-selection criterion was that the families had to be low-conflict enough that both parents could actually agree to participate in the study. We weren't really looking at families where every form of contact had to be litigated by a judge; I think it would be too difficult to try to deploy this kind of system in that setting. So in lots of cases the families just self-selected: nobody would volunteer for the study if they didn't meet these two criteria.

But these criteria were quite restrictive, and it was quite a challenge to actually recruit users for this study. In the end we ended up having to go through a professional recruitment firm to do the recruiting for us, because there were just so many constraints on the kinds of families that could be in the study. But through this process we were able to recruit two families, so four different households, to use the system for the amount of time that we wanted.

As for the setting: as I mentioned, this was done in the field, so the ShareTables were actually deployed in the families' homes, as you see here. In some cases the parents chose to put them in the kid's room; in some cases they put them in a more communal family space. The one labeled B is in the living room, and as you see, the cat has already appropriated the top shelf of the system. The bottom one, labeled D, is in the den, the hang-out spot in the house. So it was really up to the families where they wanted to put the system, and we made sure that we brought enough extension cables and set up the internet in such a way that they could put it wherever they wanted. Now, to make sure that our system worked well, we did provide business-class internet for each of our families, and we used our own router just so we could do a little bit of computer science magic and make sure everything worked well. We really needed business-class internet because of the upload speeds: typically download speeds were fine, but upload speeds are really restricted for residential internet, and it just wouldn't have worked as well for us.

Now, in terms of the actual methods, what we wanted to do in the field is what's called an A-B-A deployment. A-B-A means that you measure some sort of baseline for some period of time; that's the first A. Then you introduce your intervention and keep it there for some period of time; that's the B. And then, after you remove the intervention, you take your measures again; that's the second A. If your intervention is the thing that led to the change, what you would expect is a change during the B condition, and then things returning to the way they were during the second A condition. In this case, our first A was a pre-deployment two-week baseline where we asked the parents and kids to just do what they usually do, except keep communication diaries about any time they used the telephone, Skype, texting, or anything like that.
Then we deployed the ShareTable for four weeks and collected a bunch of measures during that time, and finally we did a post-deployment two weeks of diaries and measures, again to see what happened after the ShareTable was removed. Now, in one of the two families that last phase was actually abbreviated: they had a new baby, a new addition to the family, and it just became really hard for them to also keep communication diaries. This does happen in field studies: over a long period of time there may be major changes in the families, the homes, or the participants that you're studying, and you need to know that there may be some attrition.

In terms of tasks and prompts, because we actually wanted to see how they would use the system if they had bought it, without anybody telling them how to use it, we did not provide them with any tasks, though we did give them a manual with some ideas for ways they might use the system. However, the families were asked to fill out short diaries after any communication session, whether it was phone, text, video chat, or ShareTable. Any time they communicated during any phase of our study, A, B, or the second A, they had to fill out a short diary about it.

In terms of the metrics we collected: during the pre-deployment two-week baseline, we collected baseline measures of relationship quality, using the NRI, a validated psychological inventory. We collected the communication diaries that I just mentioned from both the parents and the children, and we did weekly interviews just to see what their experience was like. Then during the four-week ShareTable deployment, we continued collecting communication diaries, but now that we had actually put our own system in place, we were also able to collect text logs of any system activity; so if person A tried to call person B but person B didn't pick up, that would be something we could record in the logs. And any time the system was actively in use, so both people were actually trying to chat, we would record video of the system use. Now, to protect participant privacy, all of that video was stored locally on the machine, and in the communication diaries the families could note, "I don't want you to watch this video, can you delete it without watching it?" When we visited, we would delete it in front of them. That only happened in a couple of cases, so generally we had video records of everything the families did with the systems. And again, we did weekly interviews to understand what the experience was like for them. Then after the deployment, we continued to collect the communication diaries, we did weekly interviews, and we did the post-measure of relationship quality to see if the system had actually changed their relationships. We also asked them to fill out a validated questionnaire called Affective Benefits and Costs of Communication Technologies, which focuses on comparing different systems based on their emotional costs, things like a feeling of obligation or a loss of privacy, and their emotional benefits, things like feeling closer or having that sense of the other person being there for you, even if you're not there in person. So we collected all of these things, and what I'm showing in the image here on the right is the folder that I have with all of the different scripts, protocols, questionnaires, diaries, and all the other materials we had them fill out.
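Just to make the data collection side concrete, here's a rough sketch in Python of what a single diary or log entry might look like once it's digitized. To be clear, the field names here are hypothetical, made up for illustration; this is not our actual study code.

    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class CommunicationSession:
        started: datetime           # when the session began
        phase: str                  # "A1" baseline, "B" deployment, "A2" post
        medium: str                 # "phone", "text", "skype", "sharetable"
        minutes: float              # approximate length, from the diary
        initiated_by: str           # "parent" or "child"
        connected: bool             # False if the other side never picked up
        do_not_watch: bool = False  # the diary checkbox: delete video unviewed

    # For example, a ShareTable call the child started during the deployment:
    session = CommunicationSession(
        started=datetime(2011, 6, 14, 18, 30), phase="B", medium="sharetable",
        minutes=25.0, initiated_by="child", connected=True)

The point of a uniform record like this is that every phone call, text, and ShareTable session ends up in the same shape, which is what makes the A-B-A comparison possible later.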
So as you see, a lot actually goes into a field study. You have to have a lot of these things prepared ahead of time, and you have to be pretty organized. You want to make sure that you know what's going to happen every single week, so that every time you go there to do a weekly interview, you have the right script with you. You want to make sure they fill out the right questionnaires at the right time, because if too much time passes, for example, they might forget what it's like to use the system. So, overall, quite a lot of work went into organizing this kind of study.

And lastly, I just wanted to show you an example of the communication diaries that we used. We actually had two different versions, one that we used with the parents and one that we used with the kids; I think you can guess which one is which. Basically, the kids had to circle when they talked and how they talked, so in this case the child has circled the phone; they could also draw something different. They had to say how they felt after talking; in this case the child wrote "excited." I know it's a little bit hard to read, but that's why we also did interviews, so we could interpret these. The child also circled the topics that they talked about. In this case, they talked about how they felt, and they also talked about earrings; I think the girl had just gotten her ears pierced, so she drew the topic. The parents had something similar, though they probably did less drawing. They recorded the time that they communicated, the approximate length of the session, the date, how they communicated, what it was about, and how they felt about it. And as you see here, there's a checkbox they could mark if they didn't want us to view the recorded video of that session. So we went through quite a few of these diaries over the course of the study.

Now, the point of this is really just to give you an example of what goes into a field study, so I don't want to go too much into the results. But I feel like if I didn't talk a little bit about the results, it would be too much of a cliffhanger, so let me give you just a little bit of what happened once we deployed the system. These are the results from the pre-deployment baseline, and what you're seeing is that for both families there's generally fairly little talking: in an average week, one family was at about five minutes of communication and the other at about 11 or 12. This was mostly very, very short phone calls; none of them really used video chat regularly, it was just too hard to set up, so the telephone was really the technology they used the most. And with the pie charts, what you see is what proportion of the sessions were actually initiated by the child. In the first family, every session was initiated by the parent; the child didn't initiate any. In the second family, that small pie slice represents the one session that was initiated by the child; the rest were initiated by the parents.
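By the way, the numbers behind these charts are just simple aggregations over diary and log records like the hypothetical one I sketched above. Here's a minimal sketch, again in Python and with made-up example numbers rather than our real data, of how you might compute the weekly totals for the line charts and the child-initiation share for the pie charts:

    from collections import defaultdict

    # Each record: (study_week, minutes, initiated_by). Made-up numbers,
    # shaped like family one's baseline: short, parent-initiated phone calls.
    diary = [(1, 2.0, "parent"), (1, 3.0, "parent"), (2, 5.0, "parent")]

    def minutes_per_week(records):
        # Total communication minutes per study week (the line charts).
        totals = defaultdict(float)
        for week, minutes, _initiator in records:
            totals[week] += minutes
        return dict(sorted(totals.items()))

    def child_share(records):
        # Fraction of sessions initiated by the child (the pie charts).
        if not records:
            return 0.0
        return sum(who == "child" for _w, _m, who in records) / len(records)

    print(minutes_per_week(diary))  # {1: 5.0, 2: 5.0}, about five minutes a week
    print(child_share(diary))       # 0.0, all sessions parent-initiated

With records like these from all three phases, the A-B-A comparison is just a matter of computing these summaries per phase and looking for a change during B and a return during the second A.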
And so, then we compared that to what happened after we deployed the ShareTable, and what we see is that the amount of time spent communicating each week actually more than doubled for both families, and the children were initiating a lot more of the communication sessions. Though, because of the social practices in the first family, it was still very parent-driven: their rule was that they had to coordinate by phone before they could use the ShareTable, so it was still mostly the parents initiating the communication. But in the second family, it actually shifted so that the kids were doing more of the initiating than the parents were, and that was interesting to see.

Now, the other thing I want you to note here is that if you look at week three for family one and week four for family two, these were the weeks that we actually deployed the ShareTable system, and you see a pretty tall spike for those weeks. This is what we call the novelty effect: we just gave them this cool new toy, and they wanted to use it every day, figure out how it works, and do all these exciting things with it. But then you see a leveling off. In fact, the dip that we saw in weeks four and five for one family, and in week five for the other, is them trying to figure out: okay, we've tried all the cool, quirky things we can do with the system; how do we actually make it work for our family, for our relationships? That took a little bit of figuring out for both families. We see a dip in use, and then a plateau as they figure it out: they now have their practices around it, they decide that they're going to use it once a week for a certain amount of time, or whatever practice they come up with, and you see that leveling off. And this is really cool, because you can't see this in a lab study; you can only see this if you deploy a system in the field for a longer period of time.

The last element that I want to talk about a little bit is the researcher roles, because that's also part of a user test plan, and in this case it took a fairly large team to actually run this deployment. There was a research lead; in this case, this was me. The research lead was in charge of things like recruitment, getting consent from the people who were recruited, and the weekly interviews with the children, and served as the point of contact for all issues: if something didn't work, or if the participants had questions, those went to the research lead. There was also a research apprentice, in this case my wonderful Masters student at the time, Sanica, and you can see more of her work in the paper that I'll reference later. The research apprentice handled things like the weekly interviews with the parents, so that we could conduct all the interviews in parallel and didn't have to have a really long visit, and also collecting and cleaning all the log data that was coming through the system. Then there was the tech support lead, very important, doing things like setting up the ShareTable and making sure that if there was some sort of problem, it was addressed. Also keeping an eye on the logs, watching for things like a lot of sessions where somebody tried calling and nobody picked up on the other side.
Was that in fact because nobody was picking up on the other side, or was there some sort of technical problem with the system that needed to be addressed? This was a research prototype, so it was actually not that robust, and we had to do quite a bit of babysitting to make sure the system would work when participants needed it. There were still quite a few situations where there were tech troubles that we had to address on the fly. And last but not least, we also had helpers to set up, take down, and transport the system, because it was actually a piece of furniture; every time we needed to bring it somewhere, we needed to rent a pickup truck to actually get it there. So we had two or three helpers from the lab who also assisted us with those tasks. That's a fairly significant research team just to run a study with two families, four households.

However, it's still worth it to do these field studies, and I think the main reason is that you really want to get beyond those novelty effects. I question every study that gives people a new technology and asks them how much they like it. Well, yeah, of course they like it; it's new, it's interesting. But are they going to still find it as fascinating four weeks from now, when they've had a chance to use it for a while and they're seeing all the ways that it may or may not work for them? Field studies also lead to more ecologically valid use and comparisons. The families weren't using these systems for tasks that we made up for them; they were using them in ways that forced them to find value in the system for themselves. And so we can actually make a very clear comparison between the use of the ShareTable during the deployment versus the use of the phone before the deployment. And lastly, and I think this is also really important, in the end it led to very well-informed feedback from our users. This wasn't just a system that they saw a picture of, or that they played around with for five minutes; this was something that was part of their lives for a whole month. So they were really able to tell us what worked, what didn't work, and what they would do differently with it. It led to a lot of interesting research questions for me to explore in the future, and I think that was a really valuable side of running a field study.

Now, of course, it's not all positive; there are cautions as well. One of these, which I've kind of been going through this entire time, is cost: time, money, recruitment struggles. It's hard to do studies that last this long; they cost a lot of money, and they may require a very large research team. I also mentioned system robustness: the system has to work pretty well if participants are using it without you there; it has to work pretty much every time. It is a research system, so there's some patience on the participants' part if it doesn't work, but mostly you want it to be a fairly independent thing that runs without you being there to hand-hold it. And the last one is called demand characteristics. What this refers to is the idea that participants generally want to be good participants, so if they have a sense of the kind of data you're hoping to get, they may actually lean towards providing you with that kind of data.
And they're not doing this to mess up your science; they can see that you put all this effort into building the system, and they really do hope that it works out for you. This is quite common in all sorts of studies, but I think in field studies it may be even more prevalent, because you actually end up developing a relationship with your participants. In this case, we were in their homes every week for two months, asking them questions about their communication practices. So they got to know us, and they wanted the system to succeed for us, potentially. And so, if I had just asked them how much they liked the system, I don't know if I would trust that data, because I think it would be very hard for them to say, "Well, actually, it kind of sucked and we didn't want to use it at all." So we relied more on the quantitative data about how much they used it and the kinds of activities they did with it, rather than just asking, "Well, did you like it better than the phone?"

So those are the main cautions for field studies. I do hope at some point you get a chance to do a field study for a system that you're building or testing. I think it's really an interesting experience, and it's a much richer evaluation than just running a study in the lab.

For more information: I've only hinted at some of the data that we got from the ShareTable deployment. I didn't really talk about the relationship quality, and I didn't talk about the emotional benefits and costs. You can find all of this in the paper that we published on this study, so check it out if you want to know more about the results, the whole process, and the team. I also really like a research paper that talks about this idea of demand characteristics in field trials in a lot more detail, and provides a lot more data about why this happens, how it happens, and what you can do about it. That's another paper I suggest you check out, and we'll link both of these in the resources.

So thank you for sticking around for a very long video. That's all I have for you today, and I hope to see you next time.