How to Train Your Dinosaur: Building a Smart(er) Chatbot

We set out to address the question: How can we create an engaging experience that allows people to interact with the Field Museum’s newest, and biggest, dinosaur? The result is Message Máximo (fieldmuseum.org/maximo), a chat experience that lets users text or send an online message to Máximo the Titanosaur. Seeking to develop an easy-to-use, friendly, and welcoming interface that creates opportunities for personalized interactions, we placed a particular focus on persona definition, content strategy, and conversational logic. In discussing our process, challenges, and learnings from developing Message Máximo, this session will cover topics that may be relevant to a wide variety of content-heavy projects: identifying content scope, creating interactions that are engaging and also integrate learning opportunities, and anticipating users’ needs and expectations.

Transcript

Unknown Speaker 00:00
Good afternoon. We're going to jump right in; we have a lot to throw at you in this 30-minute session. Can you hear me okay in the back? Yep. Awesome. All right. I'm Katherine Urich, the Social Media Manager at The Field Museum. I'm going to give some background on Máximo and then talk about the persona work that informed the chatbot project.

Unknown Speaker 00:23
And I'm Caitlin Kearney, the Digital Content Manager, also at The Field Museum. I'll talk a little about our initial strategy and approach to content development for this chatbot.

Unknown Speaker 00:34
Hey, I'm Caitlin Picchio. I'm a senior user researcher at Beyond, but when I was working with The Field Museum, I was at Purple, Rock, Scissors, an agency out of Orlando. I'm going to chat a little about Dialogflow, the chatbot platform we used for Máximo, and why we chose it.

Unknown Speaker 00:53
And then, of course, the star: Máximo's contact information is in the bottom left.

Unknown Speaker 00:57
We will show this again at the end, so you'll have plenty of opportunities to chat with him.

Unknown Speaker 01:02
And we trust that you won't be chatting with him during our presentation. So who is Máximo? He is a cast of the world's largest dinosaur, a titanosaur species called Patagotitan mayorum, which stretches 122 feet in our main hall and does a really nice job of filling up that cavernous space. But making room for Máximo meant that we had to move the T. rex, and that was not exactly a popular choice, even though Sue was getting a much bigger, better suite where she would be contextualized within the rest of our dinosaur hall. So Máximo had big shoes to fill. Sue is the largest and most complete T. rex ever discovered, so she was really important to science. But she was also really important to the Chicago community: people who grew up seeing Sue and are now bringing their own kids, or coming on field trips. There's a lot of nostalgia around this fossil. And you may know that Sue also has a very vibrant Twitter persona; Sue has been offering up their carnivorous hot takes on Twitter for about a decade. The success of that account had a lot of internal staff clamoring for Máximo to get his own Twitter account. But we were concerned about the success that it would see, or more likely not see, because when Sue started, Sue was one of the first museum accounts out there with a personality. Ten years ago that was pretty novel, and it's really no longer that big of a deal. It's not that Sue isn't a big deal; it's just not that surprising anymore. Also, Máximo is a large herbivore. He's sweet, he's gentle, and Twitter is probably a hostile environment where he would not thrive. So the challenge for ourselves (oh, I totally forgot to advance the slide) became: how can we endear Máximo to the public without setting ourselves up for failure, either by copying ourselves or by creating something that nobody really wants?
So before we get into the solution, which, spoiler alert, was a chatbot, I want to back up. It can be easy to get carried away when you're brainstorming and lose sight of what you're actually trying to solve for. So we tried to orient ourselves around the objectives our marketing colleagues had set out. We realized that in order to endear visitors to this new dinosaur, and to make this massive replica approachable, we needed to imbue him with personality. We also suspected that people would want to engage with him, either by taking selfies with the cast or by touching it.

Unknown Speaker 03:57
We knew from Sue, who turned out to be a good north star, that a name was really meaningful. "Máximo" reflects his enormous size, and it's also a nod to his Argentinian roots. And we knew we needed not just a name, but a backbone. So the persona we started to flesh out was informed by what we knew about this species. Máximo was huge, so we knew he was going to be a little slow, but still very wise: this species is 101 million years old. So Máximo is not going to be up to date with current trends or technology, but he is very curious and eager to learn. We quickly realized in creating this persona that he was essentially a foil to Sue, and that if we pursued this foil relationship, there would probably be space for both dinosaurs to be beloved, in the museum and in visitors' hearts. This early version of the persona felt like enough to let Máximo run wild on social media and take the persona for a test drive. He did two Twitter takeovers tied to different museum events, which allowed us to introduce Máximo to our online audience not in a promotional way ("Look, come see our new titanosaur!") but as "Here's the titanosaur; hear from him yourself." As you can see from some of the screenshots, it piqued visitors' curiosity, so we went a little further, and Máximo did a weeklong Instagram takeover where he went around Chicago visiting some of its institutions to get to know his city and share his reflections. He went to the Chicago Bears, the Shedd Aquarium, and the Museum of Contemporary Art. And again, you can see from the responses that the comments were really encouraging: people were responding in a way that felt like they were connecting with this persona, and, even more importantly, they were asking questions.
So these takeovers confirmed that we were headed in the right direction with the persona. They also validated that the persona was platform agnostic, which was really important to take back to the internal stakeholders who had initially been championing a Twitter account: he could live in a lot of different places. And they confirmed the importance of consistency in the persona; to be believable, he couldn't have inconsistencies. So we developed a detailed persona document that went into Máximo's origin story and background. We fleshed out a dating persona, or rather a dating profile, covering his likes, dislikes, hopes, fears, and dreams, and even built out a Q&A so that people could get a sense of the tone and style Máximo would use in a written response. This would come to inform the chatbot content that Caitlin and I worked on, giving us guidelines and serving as a reference when building out Máximo's responses. So I'm going to pass it to Caitlin, who will go into our initial strategy and content for the chatbot.

Unknown Speaker 07:13
So after we figured out who Máximo was and what his personality was, the next big question we wanted to tackle was: what kind of interaction should we build, and what platform should it live on? As Katherine mentioned, we were pretty sure we weren't going to create a brand-new social media account. One initial idea was to build a chatbot that would run on Facebook Messenger. So we worked with our in-house Audience Insights team to do some very early proof-of-concept user research: a card-sorting activity with visitors in the museum, where they ranked a variety of ways they might want to interact with Máximo. This included Messenger and texting, and it also included some things we knew we probably weren't going to do, because we wanted a sense of where people's interests were. To give you the very simplified results: the highest-ranked options were a YouTube series about Máximo and taking selfies, which people were already doing on their own. The lowest-ranked turned out to be Facebook Messenger, which was good information to have. Texting fell somewhere in the middle. So we really debated the platform. We may have gone against this user research in some ways, but we had to consider what was feasible for our team, our resources, and ongoing maintenance, and we knew we weren't going to be able to invest in something like a recurring YouTube series or a new computer game about Máximo. Texting itself didn't provoke really strong reactions: it's not new, it's not outdated, it's really functional. So, as Katherine mentioned, we decided to lean into the personality, and we took a bit of a risk that the personality would be strong enough that the platform, texting, wouldn't matter as much to users.
Online chat would also be really accessible, something people could use anytime, anywhere. We tried to define our audiences fairly early in the project. "Explorers" is a category our advertising team uses: a predefined group of people 18 to 49, local to the Chicagoland area, casual learners interested in science who like being in the know. After that early user research, we ended up grouping adults, with or without younger kids, into that group. We didn't set out with kids in mind, but through that user research we saw a lot of adult-child facilitated experiences. When adults heard "talk to a dinosaur," they assumed it was something for kids, but we observed a lot of multigenerational enjoyment. So it was an unexpected benefit, but also a bit of a challenge: how do we talk to both of these audiences at the same time? Okay, so, getting into the content: what does Máximo talk about? These are some real visitor questions we heard: What was life like for him? Who else lived during the Cretaceous with him? What did he eat? How did he die? This started getting into how we define the scope of the content. Unlike a Twitter account with a real person behind it, we had to be pretty specific about the scope, because it would determine how much content we ultimately developed and had to maintain into the future. So his knowledge couldn't be unlimited. His persona gave us guidance: he lived 101 million years ago, so he doesn't know much about modern life, and his knowledge has to be limited in certain ways. We talked about the idea of customer service, which was really appealing because it would make this a more useful tool: someone could be in the museum and ask where the nearest bathroom was, or get tailored recommendations for exhibitions.
But we ultimately realized that this wasn't something we could support from a technology standpoint. So we went for what would make it fun and engaging, while also bringing in a bit of our educational mission around science and paleontology. Something we recognized in retrospect is that determining scope goes hand in hand with user expectations much farther down the road. So it's helpful to think about how you might describe or categorize content as you're getting started.

Unknown Speaker 12:07
So, organizing content: currently we have 15 different topic areas covering Máximo's life, a little about other dinosaurs, some general museum visit tips, and even conversation, just small talk about the weather or telling a joke. We used Airtable to organize all of our content. It's really helpful to decide on a clear naming system from the beginning; this makes it easier to reference what you already have and to expand that content in the future. And document your decisions. Content development was a long, ongoing process throughout this project, so we came up with guiding principles along the way. These even came down to basic sentence structure: we have to answer the question directly first, before we think about adding humor. Máximo's responses should also help keep the conversation moving, so it's not all on the user. You won't see a lot of yes-or-no responses from Máximo. He gives a lot of suggestions if you don't know what to ask, or he'll help redirect the conversation if it comes to a dead end. And now I'm going to pass it over to Caitlin, who will talk more about bringing these conversations to life.
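The advice to pick a naming system early can be made concrete. The sketch below assumes a hypothetical `<topic>.<subtopic>` convention (the museum's actual Airtable fields and names are not described in the talk) and shows how a consistent scheme lets you group content by topic and catch entries that break the convention.

```python
import re
from collections import defaultdict

# Hypothetical naming convention (not the museum's actual Airtable scheme):
# "<topic>.<subtopic>", lowercase words joined by underscores, e.g. "diet.favorite_plants".
INTENT_NAME = re.compile(r"^[a-z]+(?:_[a-z]+)*\.[a-z]+(?:_[a-z]+)*$")


def group_by_topic(intent_names):
    """Split names into {topic: [subtopics]} plus a list of names that break the convention."""
    groups = defaultdict(list)
    invalid = []
    for name in intent_names:
        if INTENT_NAME.fullmatch(name):
            topic, _, subtopic = name.partition(".")
            groups[topic].append(subtopic)
        else:
            invalid.append(name)
    return dict(groups), invalid


groups, invalid = group_by_topic(
    ["diet.favorite_plants", "diet.hamburgers", "smalltalk.weather", "Where discovered"]
)
print(groups)   # {'diet': ['favorite_plants', 'hamburgers'], 'smalltalk': ['weather']}
print(invalid)  # ['Where discovered']
```

Any scheme works as long as it is applied consistently; the point is that the topic can be recovered from the name alone when content is reviewed or expanded later.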

Unknown Speaker 13:20
Thank you, Caitlin. I was part of the team at the agency Purple, Rock, Scissors that helped turn, as Caitlin said, the intents of the chatbot into an actual technological solution: figuring out how it works with the website, and how it works for the texting application that the Field had decided was the best use case for Máximo. When we started talking about picking a chatbot platform, we really had to consider what kind of chatbot we were designing, and that's true when you're designing any type of chatbot. There are many ways to categorize chatbots taxonomically, but one common way is to ask whether they're conversational or transactional. A good example of a transactional chatbot, which you may or may not have used, is a Facebook Messenger bot like Expedia's, or a chatbot for placing a coffee order: something with a really predefined conversion goal. On the other hand, conversational bots need to handle conversation in nonlinear ways, with questions that may or may not follow from each other. Máximo is this type of chatbot, and I think one of the reasons conversation with Máximo can be so satisfying is that it isn't driving toward one particular conversion, but rather fostering the sense of discovery that was evident in the early user research the Field did. So when we decided which chatbot platform we wanted to use, we chose Dialogflow, which is a Google chatbot product. It has the conversational flexibility that allowed for that nonlinear, natural conversational experience. Another reason we chose it was its pretty easy-to-use content management.
Other systems we looked at were not quite as user-friendly from the back-end perspective, and that scalability was important, so that was a good reason to use it. Lastly, our development team was most pleased with its technical capabilities and its ease of implementation into the existing website, since it was going to be both a web bot and a text bot. There are three things I'd like to go through in terms of guidelines for actually executing on a chatbot: the technical considerations that are natural extensions of the content strategy Caitlin was talking about. The first, which we've touched on already, is framing the conversation up front. This ties into setting user expectations; chatbots are not exempt from that common user experience heuristic. When we're in a conversation with someone in the human realm, we generally know what we want out of it. It's an unspoken accord you have with the other person. A good conversation usually results when both people get what they want; it's an alignment. The same is true for chatbots. Our hypothesis was that once users know what Máximo can do, they're much more likely to engage with him successfully. On the next slide, you can see an example from our Dialogflow platform.
It's a little small, sorry, but it shows how Máximo was able to frame what he knew when users asked him a question, since he can't field every particular question successfully. When you asked, "Cool, where were you discovered?", that was a natural lead-in from the framing he'd given right away. The user testing we continued throughout the project revealed that users didn't know what he knew from his opening statements alone. When your chatbot just says, "Hi, hello, what can I help you with today?", that sometimes isn't enough to spark the rest of the conversation. So, based on that data, we decided to include a speech bubble with a statement about what Máximo knows when users load his exhibition page on the website, which we'll show in a later slide.
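One way to act on this finding is to have the opening message name a few topics the bot can handle, rather than asking an open-ended question. This is a minimal sketch; the topic phrases and the `opening_message` helper are hypothetical, not Máximo's actual copy.

```python
# Hypothetical topic phrases for illustration; the talk mentions 15 topic areas
# but does not give the exact wording of Máximo's opening message.
KNOWN_TOPICS = ["where I was discovered", "what I ate", "who else lived in the Cretaceous"]


def opening_message(topics, name="Máximo"):
    """Greeting that states up front what the bot can talk about."""
    if not topics:
        return f"Hi, I'm {name}!"
    if len(topics) == 1:
        listed = topics[0]
    elif len(topics) == 2:
        listed = f"{topics[0]} or {topics[1]}"
    else:
        listed = ", ".join(topics[:-1]) + ", or " + topics[-1]
    return f"Hi, I'm {name}! Ask me about {listed}."


print(opening_message(KNOWN_TOPICS))
# Hi, I'm Máximo! Ask me about where I was discovered, what I ate, or who else lived in the Cretaceous.
```

Naming two or three topics gives users a concrete first question to ask, which is the gap the user testing identified.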

Unknown Speaker 17:59
The second thing we wanted to consider was what happens when a user inevitably asks Máximo something he doesn't know about, and how we would address that technically. As we moved on with user testing, which we conducted both in the museum with visitors and by looking back through the training data that Dialogflow provides and quantitatively assessing whether each conversation was accurate, it became clear that some of the error responses were causing confusion and frustration. By an error response, I mean any time you ask a chatbot something and it returns "Sorry, I didn't get that" or "Can you say that again?" Because of this, we realized there was a really interesting opportunity: not just to provide the standard error message, but to listen for patterns in what users were saying in order to give them a more personalized experience. I'll read this example out for people who can't see it: "Do you think the Cubs will win this year?" Máximo doesn't know the answer to that; he doesn't necessarily have an opinion on it right now. But he's able to reframe the conversation toward something he does know about by offering what in chatbot design we often call a happy flow, redirecting somebody back into a conversation he can answer. The next slide shows this in the Dialogflow interface as well. Another question: "Who is the president?" He doesn't know. Stumped again! The Cretaceous hadn't yet heard of presidents. That's okay.
And lastly, the third thing we really wanted to do was build on the really great work the Field had done crafting Máximo's persona. As Katherine already mentioned, Máximo's personality took a lot of strategy, and the museum is already pretty crowded with very charming dinosaurs.
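The redirect described above can be sketched as a small decision: if no intent matches confidently, answer in persona and suggest a topic the bot knows, instead of returning a generic error. Dialogflow triggers a fallback intent when no intent scores above its classification threshold; the `Match` type, the threshold value, and the redirect lines below are illustrative assumptions, not the Dialogflow API or the museum's configuration.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative threshold. Dialogflow applies its own configurable classification
# threshold before triggering a fallback intent; this sketch only imitates that decision.
CONFIDENCE_THRESHOLD = 0.6

# Hypothetical in-persona redirects toward topics the bot can actually answer.
REDIRECTS = [
    "Want to hear where I was discovered?",
    "Ask me what I like to eat!",
    "I can tell you who else lived in the Cretaceous.",
]


@dataclass
class Match:
    intent: Optional[str]  # best-matching intent name, or None if nothing matched
    confidence: float      # matcher's score in [0, 1]


def respond(match, responses, fallback_count):
    """Answer a confident match; otherwise redirect instead of saying 'Sorry, I didn't get that.'"""
    if match.intent in responses and match.confidence >= CONFIDENCE_THRESHOLD:
        return responses[match.intent]
    # Rotate through redirects so repeated misses don't repeat the same line.
    redirect = REDIRECTS[fallback_count % len(REDIRECTS)]
    return "Stumped again! That hasn't reached the Cretaceous yet. " + redirect


responses = {"smalltalk.hamburgers": "Let us focus on the things we enjoy in life."}
print(respond(Match("smalltalk.hamburgers", 0.92), responses, 0))
# Let us focus on the things we enjoy in life.
print(respond(Match(None, 0.0), responses, 1))
# Stumped again! That hasn't reached the Cretaceous yet. Ask me what I like to eat!
```

Rotating the redirect by a per-session miss counter is one simple design choice; picking a redirect related to the user's words would be closer to the pattern-listening the talk describes.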
So how were we going to imbue this chatbot with personality without making it seem human? That's usually where a lot of chatbots fail: you're talking to it, you realize it's not human, and you think, well, yeah, I knew that, but it's frustrating. So how could we use personality to make that transition easier, especially when handling messages and guiding the conversation when people were wrong, or when they asked him a question he couldn't answer? This is just a fun example of that in action. Does it say, "How come you don't like hamburgers?" I think that's it. Yeah: "How come you don't like hamburgers?" Máximo is a vegetarian dinosaur. What does he say? I can't read it.

Unknown Speaker 20:49
"Let us focus on the things we enjoy in life." Yeah.

Unknown Speaker 20:53
A good example of how his unique spirit comes through, and that was all these guys crafting it. So how did he do? The last part of assessing whether a chatbot is "successful" (and I use that in air quotes) was to systematically assess its progress and improvement over time. We tracked the number of conversations from May to October, and there were over 7,000. Initially, in some of the soft testing we were doing, he had around a 72% accurate response rate, and we were able to improve that with some training between the soft launch and the public launch of the chatbot, which was really great. The average conversation length turns out to be around six messages, and he really likes talking about food, what he looks like, and how old he is. Those are some hot topics with Máximo. As for next steps (and you two feel free to jump in on this), one challenge with Dialogflow is keeping up with training data. As you might imagine, depending on the traffic to your website, or however you choose to implement your chatbot, you're going to get a lot of data. Keeping up with training the chatbot is what ensures it meets users' expectations down the road. But it's not possible to train every single query that comes in, as I'm sure you two can agree.
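The figures mentioned here (conversation count, accuracy rate, average length, popular topics) come from reviewing conversation logs. A minimal sketch of that tally, assuming a hypothetical log in which each user message is recorded with its matched intent and a reviewer's correct/incorrect judgment; the talk does not describe the real log format, and the sample data below is invented.

```python
from collections import Counter
from statistics import mean


def summarize(conversations, top_n=3):
    """Accuracy rate, average conversation length, and most common topics.

    conversations: one list per conversation, holding one
    (intent_name, judged_correct) pair per user message (hypothetical format).
    """
    turns = [turn for convo in conversations for turn in convo]
    if not turns:
        raise ValueError("no messages to summarize")
    accuracy = sum(1 for _, correct in turns if correct) / len(turns)
    avg_length = mean(len(convo) for convo in conversations)
    # Topic = the part of the intent name before the first dot.
    topics = Counter(intent.split(".")[0] for intent, _ in turns)
    return {
        "conversations": len(conversations),
        "accuracy": accuracy,
        "avg_length": avg_length,
        "top_topics": [topic for topic, _ in topics.most_common(top_n)],
    }


# Invented sample data, for illustration only.
sample = [
    [("diet.plants", True), ("looks.size", False)],
    [("diet.hamburgers", True), ("age.how_old", True), ("diet.plants", True), ("looks.neck", True)],
]
print(summarize(sample))
```

Running the same summary over each review period is what makes before-and-after comparisons, such as soft launch versus public launch, possible.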

Unknown Speaker 22:26
Yeah. So Katherine and I have the opportunity to read every single question that comes in. We don't read all of them, but we typically try to fix the longest conversations, because errors are more likely in long conversations. Sometimes people are having 25-query-long conversations with him. But yeah, there's quite a lot of data coming in regularly.
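Prioritizing the longest conversations for review is a simple sampling rule. A sketch, assuming sessions are available as a mapping from a session ID to its list of user messages (a hypothetical export format, not one described in the talk):

```python
import heapq


def review_queue(sessions, k):
    """Return the IDs of the k longest sessions, longest first.

    sessions maps a session ID to that session's list of user messages.
    """
    longest = heapq.nlargest(k, sessions.items(), key=lambda item: len(item[1]))
    return [session_id for session_id, _ in longest]


sessions = {
    "a": ["hi"],
    "b": ["hi", "what did you eat?", "ferns?"],
    "c": ["hi", "how old are you?"],
}
print(review_queue(sessions, 2))  # ['b', 'c']
```

`heapq.nlargest` avoids sorting the whole log when only a small review batch is needed each week.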

Unknown Speaker 22:52
So it's a good way to sample that data and train the conversations that will potentially have the biggest effect on improving the machine learning behind his brain. In terms of next steps, there are a lot of cool features in Dialogflow that extend beyond using machine learning to match what users say to the intents you load into the chatbot. Some of those are contexts and entities. I won't get too deep into what those are, but contexts are an interesting feature that help Máximo remember previous queries in a conversation, and entities allow chatbots to pass in really predefined information to make a conversation richer. For example, if a user asks Máximo about ferns (he really likes to talk about ferns) and then refers to ferns as "them" or some other indirect pronoun, by default Máximo won't be able to remember what the user is talking about. But you can configure Dialogflow to do that, which makes the conversation more natural. I think there are also some potential plans for translating the chatbot in the future; that's something that has been discussed. So those are some next steps for where Máximo could grow. Even though he's very old already, there's still room for him to grow.
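In Dialogflow, contexts carry a lifespan measured in conversational turns, and entity types map synonyms to canonical values. The sketch below is not Dialogflow's API; it is a small, self-contained imitation of the same idea, with a hypothetical plant entity and a session that remembers the last plant mentioned for two turns, so that "them" can be resolved to "ferns".

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical entity type: canonical value -> synonyms.
PLANT_ENTITY = {
    "ferns": ["fern", "ferns"],
    "conifers": ["conifer", "conifers", "pine trees"],
}


def extract_plant(text):
    """Return the canonical plant value mentioned in text, or None."""
    lowered = text.lower()
    for value, synonyms in PLANT_ENTITY.items():
        if any(re.search(rf"\b{re.escape(s)}\b", lowered) for s in synonyms):
            return value
    return None


@dataclass
class Session:
    topic: Optional[str] = None
    lifespan: int = 0  # user turns left before the remembered topic expires

    def resolve(self, text):
        """While the context is alive, replace a bare 'them'/'it' with the remembered topic."""
        if self.lifespan > 0 and self.topic:
            text = re.sub(r"\b(them|it)\b", lambda m: self.topic, text, flags=re.IGNORECASE)
        self.lifespan = max(self.lifespan - 1, 0)  # each user turn uses up one unit of lifespan
        return text


def handle(session, text, lifespan=2):
    """One user turn: resolve pronouns, then (re)set the context if a plant is mentioned."""
    text = session.resolve(text)
    plant = extract_plant(text)
    if plant:
        session.topic, session.lifespan = plant, lifespan
    return text, plant


s = Session()
print(handle(s, "Tell me about ferns"))    # ('Tell me about ferns', 'ferns')
print(handle(s, "Why do you like them?"))  # ('Why do you like ferns?', 'ferns')
```

Without the remembered context, the second message reaches the matcher as "Why do you like them?", which is the failure the talk describes.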

Unknown Speaker 24:14
So I think that's our part. We have about five minutes if anyone has any questions.

Unknown Speaker 24:22
Yeah. We're using a chatbot right now.

Unknown Speaker 24:30
I was actually doing a lot of reading into chatbot accessibility recently. I can't really say for sure from a Google perspective, but from an implementation perspective (and feel free to chat with me afterwards about the specific things you're thinking about), a lot of the accessibility guidelines that apply to websites can apply to chatbots, too, like making sure it's accessible to screen readers. Another common thing is placing it in an area that's familiar to users: keeping it in the bottom right-hand corner is often an affordance that people recognize. Those are some things that can really help. There are way more specific things we could get into, but those are just some high-level things, not specific to Google's application.

Unknown Speaker 25:37
So we're in North Carolina, at our state library, trying to do a chatbot. What we ran into was lots of teens messaging sometimes threatening and violent things, and that's something we have to keep really locked down. We eventually had to just turn the chatbot off, because they would get those messages in it. If it couldn't answer, then it would get quite vivid, and staff would personally message the person to try to help them, to be an educational tool.

Unknown Speaker 26:07
I didn't know if y'all ran into that, or if that was included in your screening process for messages, where you had to flag them and then move them up a level?

Unknown Speaker 26:20
We've definitely seen more conversations with teens than we expected, I think, or

Unknown Speaker 26:26
people we perceived to be teens based on their language, but who may be adults.

Unknown Speaker 26:31
Yeah, shortly after the public launch, we did create a new intent that I think we called "inappropriate" or something. Máximo responds very politely, in typical fashion, saying something like, "If you'd like to have a nice conversation, we can talk about these things." I think also the fact that it's a dinosaur personality lets people suspend their disbelief a little further; they just don't have the same expectations as they would if they were talking to a person.
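The panel's fix was a trained "inappropriate" intent inside the bot. As a simpler, swapped-in technique, the sketch below uses a keyword pre-filter that routes flagged messages to a polite canned reply before normal intent matching. The term list is a placeholder, and the reply wording and `route` helper are hypothetical.

```python
import re

# Placeholder terms only; a real list would be curated and revised by staff.
FLAGGED_TERMS = {"placeholderthreat", "placeholderslur"}

# Hypothetical wording, paraphrasing the polite reply described in the talk.
POLITE_REPLY = "If you'd like to have a nice conversation, I'm happy to talk about my life in the Cretaceous!"


def route(message, normal_handler):
    """Send flagged messages to a polite canned reply; everything else goes to the normal handler."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    if words & FLAGGED_TERMS:
        return "inappropriate", POLITE_REPLY
    return "normal", normal_handler(message)


print(route("What did you eat?", lambda m: "Mostly plants!"))  # ('normal', 'Mostly plants!')
```

A keyword list is easy to audit but misses paraphrases; a trained intent, as the panel used, generalizes better at the cost of needing example phrases to learn from.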

Unknown Speaker 27:03
I don't know. Yeah, and for better or for worse, we aren't reviewing these in real time, and I think people are pretty aware that there's not a person chatting with them behind the scenes. So that might help shift those conversations in a different way. But we see as many spam conversations as we see people who are actually seeking the experience and the engagement. It is the internet.

Unknown Speaker 27:32
We were also surprised that we were getting a lot of conversations where people were just telling Máximo about their lives, or what they were doing. We hadn't planned for that, so we developed some new intents to respond to it. I feel like that only speaks to the fact that people are interested in having, on some level, a relationship with a dinosaur.

Unknown Speaker 27:58
And I do think it'd be interesting to look at the number of text-based versus web-message conversations and see if there are any trends in which ones tend to be more spammy. I'd assume the ones that come in via text would be a little more legitimate, if you will, than the web, where it might just be, "I'm a kid in between classes, sort of wasting time." I don't know. Yeah. That's a great question.

Unknown Speaker 28:25
Do you remember? I don't know if we have that data.

Unknown Speaker 28:28
Yeah, I think, just in the way we had to configure analytics for the platform, there's no clear way to tell the platforms apart right now, since Dialogflow brings in the data from both places. But that could potentially be something we figure out in the future. Well, thank you for your time. Thanks.
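One way to make the text-versus-web comparison possible later is to record the channel with each conversation. This sketch assumes the integration code that forwards messages to the bot can choose the session identifier and prefix it with the channel; the ID scheme and the spam labels are hypothetical.

```python
from collections import Counter

CHANNELS = {"sms", "web"}


def make_session_id(channel, raw_id):
    """Hypothetical scheme: prefix the session ID with its channel before calling the bot."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown channel: {channel}")
    return f"{channel}-{raw_id}"


def spam_rate_by_channel(logs):
    """logs: iterable of (session_id, is_spam) pairs, one per reviewed conversation."""
    totals, spam = Counter(), Counter()
    for session_id, is_spam in logs:
        channel = session_id.split("-", 1)[0]
        totals[channel] += 1
        spam[channel] += int(is_spam)
    return {channel: spam[channel] / totals[channel] for channel in totals}


logs = [
    (make_session_id("sms", "1"), False),
    (make_session_id("sms", "2"), True),
    (make_session_id("web", "1"), True),
]
print(spam_rate_by_channel(logs))  # {'sms': 0.5, 'web': 1.0}
```

Tagging at the point where each message enters the bot means the channel survives even when both channels' conversations end up in a single analytics export.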