Patron-Participatory Machine Learning through In-Gallery Interactives

Over the course of the Collections as Data: Part to Whole project, Carnegie Museum of Art has not only increased points of access as related to the Teenie Harris collection data, but we are currently expanding our role as stewards by building in-gallery interactives for the public. Significantly, new information gathered from these interactives will then become a part of the collections as data that is then provided back into the community, beginning the cycle all over again. In the fall of 2019, CMOA will open a semi-permanent exhibition and community engagement space in its permanent galleries. This will be a dedicated Teenie Harris gallery space for exhibitions, community relations, and the omni-directional exchange of information with the public. This space will be staffed periodically by “citizen archivists” who will have a public facing presence to aid patrons in research and retrieval of images, as well as collecting image information of the who/where/when/why of Harris images. In addition to prints and gallery panels, these interactives will allow patrons to engage with faceted search, heat maps using GIS technology, personal and family identification using facial recognition technology, and public history using an amalgamation of newly developed programming.

Transcript

Unknown Speaker 00:00
Hi, my name is Charlene Foggie-Barnett. I'm with the Teenie Harris photo archive in Pittsburgh, PA, at the Carnegie Museum of Art. I've been with the archive since 2006. I started as a volunteer, as a community member who actually knew the artist and was photographed by him. The museum was looking for people to help identify the nearly 80,000 images that we had, most of which were and still are not fully identified. And so as we have progressed, we've been able to collect oral histories. With the help of archivists, such as my colleague Dominique Luster, we've been able to keep everything in order and to make sure that we have things that are retrievable for a variety of projects, which we'll both describe later. But this photo up here, right here, is our very handsome Teenie Harris. And we're talking about patron-participatory machine learning through in-gallery interactives, but I'm going to lay a little quick foundation about Teenie. He shot for the Pittsburgh Courier newspaper and as an independent photographer for nearly 40 years, over four decades, between the late 1930s and the late 1970s, and into the '80s a little bit. He was known for his ability to capture people in the moment: not only celebrities (we'll show you these photos, and I'll talk about it as we go through) but everyday people. So in a lot of ways, Teenie Harris is actually the historian of the African American community and its experiences in the mid-20th century. As I said, we have nearly 80,000 black-and-whites that are four-by-five negatives. And because we retrieved these negatives, and have been able to keep this very healthy-size negative, that has allowed us to replicate off this negative. That's a more digital thing that Dominique can explain. But if it weren't for that old Speed Graphic camera-size negative, we might not have this collection. It's been through a lot. It wasn't always kept in the greatest condition. 
But it has survived all these years. We also have 15,000 color photos, and we have yet to do a lot of the digitization of those, and we have 750 prints and about 150 oral histories. We've done hours and hours of that. And in 2010, 2011, I was hired to do the oral histories as well. So who did Teenie shoot? And by the way, this is a classic shot of Teenie Harris on the beat with his Speed Graphic camera. And no, that's not a cell phone. That's probably a light meter or notepad. He did studio portraits. He's known for his classic, very handsome studio portraits. And he's actually known for that old kind of halo effect behind the subject; a lot of his subjects have that. So sometimes you'll see Teenie Harris photos that are outside of Pittsburgh, and it's probably a Teenie Harris if it has that halo effect. And then there are the glamour portraits. And as you go through this, you're going to see that Teenie shot everything. He was a working man. He wasn't necessarily a wedding photographer. He wasn't just a publicity hound. He wasn't just on the beat, following just the mayor and the celebrities. He was also photographing water running down the street, children's parties, the personal history of people that he was asked to photograph. And families, and this family is kind of dear to me, because that's me and my dad. So I've known Teenie for 62 years. He's been gone a while, but Teenie was important not only to the African American history that we were participating in as a family, but also to our personal family history. We have 250 or so weddings and baptisms and proms and all kinds of things in the photos. That's where I came in, to help explain some of what we're seeing, where we're seeing it, why we're seeing it. You see a lot of things in imagery, but you don't always know what you're looking at. Right? So there are weddings

Unknown Speaker 04:42
and news photography. This is something that's really amazing about him. We aren't sure; at that time, there were no cell phones. Maybe someone told him there's a fire going on over at the Flamingo skating rink. Or maybe he happened upon it. Maybe he saw the smoke coming from another side. At any rate, he went over, and this is the kind of thing that he was really good at. It's not often that you see someone who photographs major sports figures and can also do that prior type of photography. And of course, entertainers. Do you know who this beautiful woman is? I know five of you in this room do. Very good. Well, thank you, Katie. Katie and several people in the room have been staunch supporters of the Teenie Harris archive over the years. Musicians: if you're looking for musicians from that era, Teenie has them all. The sitting presidents, campaigning presidents, and most of the civil rights leaders of the day came to Pittsburgh. Now that's another thing to bear in mind. Pittsburgh was where Teenie shot. He wasn't all over the country shooting. So I think it's very impressive to see what he was able to collect in one region. Of course, I hope you know who this guy is, right? Okay. JFK. The social clubs, the Alpha Kappa Alpha sorority, the nightlife, another highlight of Teenie's career. And if you've seen films like Fences or other shows, films, TV shows, Teenie is used for a huge variety of film and television and plays, and even an opera was written based on Teenie's imagery. The LGBTQIA community: we didn't know that he had shot so much. There is a vast array of images of this community, the entertainers, the lifestyle that they were living at that time. And that was one of the most valuable finds in the collection. And of course, discrimination and civil rights imagery, crime scenes, catching someone finding out that their loved one was injured. He was right there to capture it. 
Those of us who knew him and lived in the community had confidence in him because he was one of us. He was an everyday guy, but he actually became a superstar, he became an icon, because he kept our confidence, he kept our secrets, and he kept us alive. So we're all very grateful for Teenie. And through my work with the archive, I've been able to learn curation, collection of oral history, presentations, and working, like I said, with authors and film companies and whatnot. So I consider it a privilege to work for this very important archive. Dominique.

Unknown Speaker 07:55
Okay. So the main activities that the archive does: it's an archive by title but not necessarily by activity. These are actually the list of activities that our archive participates in. The majority of our work, first and foremost, is preservation of the collection. And thankfully, through a tremendous line of support over the years, about 75 to 80% of the collection has actually been item-level digitized. The entirety of the collection has already been item-level cleaned, rehoused, and prepared, and then about 80% of it has been digitized and item-level catalogued. So that's great. That means that, as an archivist, the physical preservation lift, the backlog that a lot of archivists go through, is not necessarily something that I have to worry about. Instead, we have the additional challenge that none of the collection, when it came to the museum, was catalogued. Teenie Harris did not provide us information on who, what, or where for any of the 80,000 images. So the majority of Charlene's and my job is to add names, dates, locations, and the story behind the photographs. And I'll show you a little bit about what we've been doing. We now know that there are a little north of about maybe 325,000 individuals, as in appearances of humans who may have been repeated in the collection throughout the 80,000 photographs, and maybe about 125,000 individual people. And without the work of the community, without the work of us going through newspapers and whatnot, all of those names and dates might be lost. 
So a huge part of our job is community engagement: outreach events, activities, going to senior citizen community events. We do a lot of outreach. We probably have somewhere around, I don't know, 65-plus partners a year that we work with, from schools to colleges to community centers to senior citizen centers to whomever, anyone who has an interest in Teenie Harris, who may be able to identify their father, their grandfather, their grandmother, their great-aunt in the photographs. And then once we work with the community and we get those names, dates, and information intact, we then do exhibition of the materials in the museum, which is the majority of what we're here to talk about: the results of our Collections as Data grant. If you're not familiar, the Collections as Data grant, just as a recap, is an A.W. Mellon-originated grant that was sub-awarded to UNLV, and UNLV then sub-awarded Collections as Data sub-grants to these five or six institutions, us being one of them. This is our team for this project: myself; Charlene; our colleague who actually just left yesterday evening, unfortunately, Ed Masnick (so if you see him, maybe next year, say hello, he's really fantastic); and our creative technologist and software developer, Samantha Ticknor. We've all been working on this project together for the past almost a year. And what we are doing is two things. One, we are creating a standardization tool. Our collections database is a Canadian Axiell product called EMu. And like I mentioned, all 80,000 of our objects have been item-level cataloged, which means they were cataloged over about 10 years and about three or four major grants. So there were maybe 10-plus different interns, volunteers, staff members, part-time staff members; anyone and everyone was coming through and helping the archive do this cataloging work, which is absolutely amazing. 
And we're very, very grateful to have such a wealth of information. However, that does mean that all of that information is incredibly unstandardized. And so we've created a tool that will take one full export of our entire data dump, clean it according to rules that we've taught the machine, and then create a product that we can put back into EMu, so that names, dates, a lot of this information will be standardized. And I'm going to show you the details. And then the second part of our grant was to create an in-gallery interactive, which is the title of this presentation, that takes all of that cleaned data and creates an in-gallery interactive that our patrons can be in our galleries playing with, that is beneficial to them.
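The export, clean, and re-import cycle described here can be illustrated with a minimal sketch. It is not the project's actual tool: the field names (`name`, `date`), the two rules, and the sample record are hypothetical, and it assumes only what the talk states, namely that rules produce suggestions and that nothing goes back into the database without a human approving it.

```python
import re
from dataclasses import dataclass


@dataclass
class Suggestion:
    field: str
    original: str
    cleaned: str
    approved: bool = False  # stays False until a person signs off


def clean_name(raw: str) -> str:
    """Collapse whitespace, flip 'Last, First', and fix all-caps or all-lowercase words."""
    text = " ".join(raw.split())
    if text.count(",") == 1:
        last, first = (part.strip() for part in text.split(","))
        text = f"{first} {last}".strip()
    # Mixed-case words such as 'McDonald' are left alone.
    return " ".join(
        w.capitalize() if (w.islower() or w.isupper()) else w
        for w in text.split()
    )


def clean_year(raw: str) -> str:
    """Pull a four-digit 20th-century year out of free text, or return ''."""
    match = re.search(r"\b(19\d{2})\b", raw)
    return match.group(1) if match else ""


# Hypothetical rule table: one cleaning function per exported field.
RULES = {"name": clean_name, "date": clean_year}


def suggest(record: dict) -> list:
    """Run every rule and return suggestions only where something would change."""
    suggestions = []
    for field, rule in RULES.items():
        original = record.get(field, "")
        cleaned = rule(original)
        if cleaned and cleaned != original:
            suggestions.append(Suggestion(field, original, cleaned))
    return suggestions


def apply_approved(record: dict, suggestions: list) -> dict:
    """Return a copy of the record with only the human-approved changes applied."""
    updated = dict(record)
    for s in suggestions:
        if s.approved:
            updated[s.field] = s.cleaned
    return updated


if __name__ == "__main__":
    record = {"name": "SMITH,  mary", "date": "circa 1945"}  # made-up sample row
    for s in suggest(record):
        print(s)
```

Keeping suggestions separate from the record, and applying only the approved ones, mirrors the human quality control step the speakers describe.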

Unknown Speaker 12:43
So as I mentioned, the first service that we've created is a cleaning tool. This is a screenshot of what the front-end UI looks like. And it basically does these things. It takes our very, very, very long, complicated titles and, through a course of rules that we have taught it, abbreviates them. Don't laugh, the titles are really long, it's fine. They're no longer really long, because the tool has gotten really good at understanding parts of speech. We've had to really understand the English language in order to be able to teach the computer parts of speech so that it can abbreviate titles. It can also go through the subject headings that had been provided to it by humans, and then search through the Library of Congress and try to match up what might be appropriate. It lists names, so we can strip out all of those names, no matter if they're free text or whatever, and sort them by NPR. So now we can actually use standardized first names, last names, titles, locations, things like that. We're using a bunch of Google Maps APIs. We are pulling all of the Pittsburgh Courier data as one data dump and matching up those original Courier articles with the images that ran in the Courier back in the '40s, '50s, or whatever. And we're also using object recognition and a few other protocols to implement some type of keywording. So this is what the tool looks like on my end as the archivist. It requires human quality control. On top of probably a constellation of maybe about 12 tools and libraries, it is spitting out all of these suggestions, from dates to locations using GIS data; it spits out a whole lot of information. And then on my end, I have to go through and manually double-check that everything is correct. We never want to export and then import information into our database that hasn't been human quality controlled. So that's what it looks like. And it goes through every single image. There it goes. So that's service one. And then service two. 
Now that we have all this clean data, we are developing the permanent Teenie Harris gallery. We're very, very, very, very proud and excited, after X number of years: the collection was acquired in 2001 and hasn't been on permanent view in the galleries. So finally, we believe that Teenie Harris is due for a permanent gallery, for as prominent and important as the collection is. And we're really excited that this is going to be the space. This is what it will look like when it's up and ready. Well, this is what it will look like for those who AutoCAD. So in the galleries, as I was saying, the second requirement of our grant was to create an in-gallery interactive that allowed our patrons to engage with this new data in ways that were actually beneficial to them. As with all projects, there were multiple tries at this. I'll show you our first take on the app that we wanted to create. It used text, maps, the Pittsburgh Courier articles, and oral history that we had collected over the past 15 to 20 years, and it created basically a content board for our visitors to be able to scroll and read. And this is what the layout looks like. You would have an iPad, or actually, we have two iPads that will be installed in the gallery. And guests could flip through and read and get deeper information about people, Courier articles, places, titles, and stories by scrolling through this iPad. This is what it looked like. Ignore the Latin; we were just kind of playing with how it was going to scroll and work.

Unknown Speaker 16:44
So this is what it would do. It would include oral history files; it would include a compilation of a lot of things. However, with our administration, we went back and forth on deciding whether or not a storybook was the best use of all this new data. And it was decided that it wasn't; we needed to try again, and try harder. So we took a second stab at the app, take two. And we're really excited to have been sent back for a second challenge, because we think we've come up with something that is a lot better. We went back to the drawing board and decided that what mattered to us was what mattered to the patrons who were using it, specifically the African American patron community that has historically not been included in the museum. And we wanted to make sure that people in the collection could find their own people better, faster, quicker, and more efficiently, more effectively, whatever. So our job in building this in-gallery app was to make sure that people could find people better. And so we decided that a simple but highly advanced search-and-browse application was exactly what we needed, using all this new, clean, standardized data. I know this is really tiny, but for those of you who can read mock-up land, this is now what we came up with. Basically, you still have our two iPads in our gallery. And again, ignore the Latin; we're still building content. But someone will be able to browse topics such as the Hill District, such as civil rights, such as whatever, and they will also be able to free-type search. And it will scroll through all of our new clean data in order to provide better results. And it will get smarter. I try not to use this example, but it's the best one that works: it's like the brain that powers Amazon, which offers you suggestions and pumps better suggestions closer to the top the more that you search. That's what we're building. So the more people use the app, the better it will get at understanding when someone types "civil rights." 
These are the types of results that have proven to be most effective for the past 100 people who have searched the same word. Or if someone's looking for information about the Black Panther Party or the history of housing discrimination, or whatever, these are the types of images that the last 100 searches have found effective. So it's going to start pumping those results to the top. And in another version that we're planning for next year, it will also be able to start incorporating those changes more on the fly, rather than through an export and import of the data. So that's what it does. So we're in October, and we are approaching our finish line. I actually added this slide this morning, because I was at a session yesterday that talked about project management and about how important it is to always be planning for the end of a project, from the beginning of the project and while you're in the project. And while we have these plans in place, we don't really talk about them or present them. But they're actually really important to how we plan on closing up this huge project and all of this data. One, we're opening this permanent gallery in January. And we're going to be doing a lot of impact and engagement analysis on how people are actually using the tools that we're putting in the galleries. But also, we'll be presenting at Code4Lib. So if you'll be in Pittsburgh at Code4Lib in March, come to the gallery and see us; we'll be giving a tour. And then, for any of those who are interested, all of our tools and applications will be open sourced and available on our GitHub repo. The Teenie Harris metadata is already available on our GitHub repo. So if this tool is something that might be useful in your institution, everything that we do is open source, and it will be available to everyone else who might be interested. And then we're officially closing and doing documentation of this project and our results in April. And that is it. 
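The "last 100 searches" behavior described here can be sketched in a few lines. This is an illustration under stated assumptions, not the gallery app's code: the class name `FeedbackRanker`, its methods, and the image identifiers are hypothetical, and the only detail taken from the talk is that results chosen in the most recent 100 searches for the same term move toward the top.

```python
from collections import Counter, defaultdict, deque


class FeedbackRanker:
    """Promote results that recent searchers of the same term selected."""

    def __init__(self, window: int = 100):
        # One bounded history per query; the oldest selections fall off automatically.
        self._history = defaultdict(lambda: deque(maxlen=window))

    @staticmethod
    def _key(query: str) -> str:
        # Treat 'Civil Rights' and ' civil rights ' as the same search term.
        return " ".join(query.lower().split())

    def record_selection(self, query: str, image_id: str) -> None:
        """Call when a visitor opens an image from the results for a query."""
        self._history[self._key(query)].append(image_id)

    def rerank(self, query: str, results: list) -> list:
        """Sort results by recent selection count; ties keep their original order."""
        counts = Counter(self._history.get(self._key(query), ()))
        return sorted(results, key=lambda image_id: -counts[image_id])


if __name__ == "__main__":
    ranker = FeedbackRanker(window=100)
    for chosen in ["img2", "img2", "img3"]:  # made-up selections
        ranker.record_selection("civil rights", chosen)
    print(ranker.rerank("Civil Rights", ["img1", "img2", "img3"]))
```

Because the sort is stable, images nobody has selected yet stay in whatever order the underlying search returned them, so the feedback only nudges the ranking rather than replacing it.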
We have about six minutes for questions, if there are any, but, like, no pressure if there are not. Oh, there's a lot. Okay. Maybe Sue's, and then maybe we can just bounce around, and then

Unknown Speaker 21:11
my friend, and then I'll walk around with a microphone for you.

Unknown Speaker 21:13
Bless you. Thank you. Thank you.

Unknown Speaker 21:16
I have two questions: one which I think is super easy, and one which might be a little bit more complex. You were talking about the order that things go up, sort of 100 photos. Will you tell people, "The reason you're seeing these photographs is these are the 100 most successful images"? Because often what actually appears is not made transparent. And I wondered whether that was part

Unknown Speaker 21:40
of it? Yes. So we are working on that. And actually, our director was very clear that we want to have a bunch of content around the app about why we even got here in the first place, and why we're using machine learning in museum galleries, and how. So what you're playing with on this iPad is a direct result, and it will be doing these things because of this experimentation that we've been doing around data.

Unknown Speaker 22:07
That's really great. I have a quick question. I was really interested when you were talking about cleaning up the data, and that you did talk about decolonization of data. And I did just want to hear a little bit more about what that means in this context.

Unknown Speaker 22:22
So that was actually one of the reasons why it took so long to even start this project. I've been at the museum since 2016, and I came into this job knowing that decolonization of archives is important, and something that we wanted to do, especially here, because it was so heavy-handed in the titles. But we weren't really sure how to do it in ways that weren't just my biases, because everybody comes into something with their own perceptions of how it should be done. And so I personally had to do a lot of thinking and waiting for the right way to do it to come forward. And a lot of that happens with the quality control checks that we're doing. So we're writing a manual right now that incorporates a significant amount of research on racially conscious, culturally competent descriptive practices, which I have a TED talk on, if you'd like to go hear me talk about it. But that is something that took a really long time to develop as a personal theory of practice. And it's something that we thought about for years before we started building machines to try and implement something so complicated.

Unknown Speaker 23:41
That would be great and cool. So I'll encourage you to put that up with your documentation. And everything should have been positive.

Unknown Speaker 23:47
Absolutely. That's a good idea. Thank you. There was a question. Sorry. She was first. Sorry.

Unknown Speaker 23:53
I'm holding the mic. Oh. Talking stick. I have a question, and I may have missed this, but are you also giving visitors the opportunity to add or correct data, happening in the gallery? Okay, yes.

Unknown Speaker 24:06
So on the mock-up, it was a super small mock-up, so I don't expect anybody to be able to see it. But yes, it's in there. This pop-up box allows people to connect directly. If you found your father, your grandfather, or somebody, and you want to share that information, or you went to Schenley High School and you know that that's the building behind the corner, or whatever, tell us, because otherwise people's names and places remain undocumented. So communicating with me and Charlene is of utmost importance for everything that we do.

Unknown Speaker 24:36
Do they do that within the interface in the gallery, or do they have a separate way of communicating?

Unknown Speaker 24:41
So there's the interface in the gallery, and right there, they can enter it directly on the iPad. There's also email information on the bottom of every screen, if they don't want to give their email through an application. We want to be very conscious of that. So if someone just wants to take the email address and email us separately, that's available on every screen. And then also, we have probably a bajillion events a year, so anybody at any time can come and talk to us. Great, thanks. Thank you.

Unknown Speaker 25:16
That question was, because I could just say so. So following on from that, then: you know how, like, Photos on a Mac will go and, like, pull out all the faces? Are you doing that kind of stuff?

Unknown Speaker 25:29
We did. So we use a couple of protocols, one or two, that are open; everything we did is open source. There's one called OpenPose and one called OpenFace that I would highly recommend running concurrently or simultaneously, because that's how we were able to actually count the individual faces and find that there are 80,000 photos and this many humans.

Unknown Speaker 25:49
Okay, cool. And just a follow up question then on that, because a lot of that's open source. But what scares me a little is that you have used Google Maps, you pull in like location data from Google Maps, which comes with a whole lot of licensing restrictions that you're not allowed to store. So that really kind of concerned me.

Unknown Speaker 26:07
So one of the cool things that we did in the very, very, very, very, very beginning is that, archives are awesome, as an archivist, so we pulled a ton of old historic maps from the University of Pittsburgh archive. They have a really excellent historic map collection that has been digitized for years, and it's very, very, very accurate. And so we were able to use a lot of their original maps, along with some of the more open historic maps that are available through Google. This collection is taking place from 1930 to 1960, so for everything that we're doing, a lot of the streets actually don't even exist anymore. Google geocoding results? No, no, we can't do that.

Unknown Speaker 26:57
But along those lines, that's where the community comes in. And that's why we are so closely connected with the work of our community. Because those of us, especially, who've lived in the community know: yes, Google doesn't show that building anymore, but the building in the photo, we know exactly where it was, and then it moved to this place, and then it went over here. And for the last 12 or 13 years, I've been the facial recognition software that we haven't had, because another problem is that, with this photo data, it doesn't always work well with African American skin tones. And so one of the things that we also need to do is to make sure that we are seeing what Teenie saw. One of the best things he could do, as you saw in some of those group photos, was make everyone look like they have the same feature content, so that people didn't fade out or bleed out. That was done by him. And our responsibility is to make that available for people to have at hand, because they're looking for a woman: "Oh, my parents had a wedding. They couldn't afford to get the photos out. They had on this dress and that dress." We have 600 unidentified weddings, right? So the conversation with the community also narrows down a lot of information, and that gets tacked on to our EMu records. And at some point we hope to have all of this available, so that you can hear someone talking about this event, that event, this dress, her shoes, whatever. Because without community, we couldn't have done this.

Unknown Speaker 28:44
Computers don't replace community engagement. I have bookmarks.

Unknown Speaker 28:51
All of these are the images that we have that are cleaned up and are available. We have about 60,000 images online. And we'd like to hear from you if you need to have information and utilize the archive. Thanks for spending time with us.