Tags, Art, and AI. Oh My.

In 2018, The Met added subject keyword tags to 300,000 artworks in its online collection. The goal of the project was to improve search and discovery of the collection, increase user engagement, and provide a new access point around depicted subject matter. The keyword tags have also opened up the collection to new types of exploration with artificial intelligence (AI). The Met recently collaborated with the Wikimedia community, using the keyword dataset to explore the use of AI around the museum's collection. The keywords were connected to Wikidata terms and used to train a machine learning (ML) model that predicted tags for artworks it had never seen. In many cases depicted subject matter was recognized accurately, but results were still mixed. With its global network of skilled volunteers, the Wikimedia community was able to add the additional element of human judgment in reviewing tags generated by AI. An analysis of both machine- and human-generated tags revealed issues around accuracy, completeness, relevance, and bias, highlighting the challenges of describing subject matter depicted in art. This session will examine the tagging process, discuss the collaboration with the Wikimedia community, and identify areas where AI models succeed and fail.

Transcript

Unknown Speaker 00:00
Good afternoon, everyone. Thanks for coming to Tags, Art, and AI. Oh My. My name is Jennie Choi; I'm the General Manager of Collection Information at the Metropolitan Museum. This is my colleague Elena Villaespesa, assistant professor at Pratt, and Andrew Lih, Wikimedia strategist and Ignite superstar. We're each going to talk, and I'm going to kick things off by talking about a tagging project we completed at the Met last year, which was adding subject keyword tags to our online collection. The goals of the project were that we wanted to increase user engagement, improve search and discovery of the collection, make the collection accessible to the widest possible audience, and experiment with using our tags as training data for AI models.

And we used humans to do this tagging work; we hired a vendor. The process, generally, was: we drafted a taxonomy (our taxonomy is about 1,000 terms, and we're adding to it frequently, let's put it that way, as we acquire new objects and as we review our records); we selected an outside vendor; and we trained that vendor and their team. We limited it to single judgments. Often in these sorts of tagging projects you will get multiple judgments, but because of the complications of art that doesn't work, because there's never a single right answer for art. And we had weekly calls and data review with the vendor. We then imported all the tags into our collections management system, and we're constantly doing ongoing review.

These are some of the overall stats on our 1,000 tags. As I mentioned earlier, 233,000 objects have been tagged. Our top tag is men, surprise, surprise. Women is 38,000, portraits 35,000, flowers 20,000. And some fun facts: even though men is our top tag, female nudes outnumber male nudes. We have 3,000 dogs, and we only have 600 cats. This is the tag distribution, so it's all about the long tail: 90% of our tags have fewer than 1,000 occurrences, and 60% of our tags have fewer than 100 occurrences. So men, portraits, athletes (we have a lot of athletes because we have a large collection of baseball cards, so that's what skews it) are at the top, but most of it is the long tail, and actually some of the most interesting terms are in the long tail.

So as we were reviewing these tags, we noticed some issues that cropped up, and I'm going to go through some of the things we noticed. The first thing we noticed is completeness. This is an embroidery of a circus, so it was tagged circus, and that was the only term it was tagged with, but there are also tigers, acrobats, dogs, monkeys, horses, elephants, snakes, men, and women; none of those terms were tagged by our vendor. So we found a lot of issues where they did not add all the relevant tags. This is another example: it was tagged women, but for some reason they didn't tag the horse, which is fairly prominent. Another issue we found was accuracy. These are just three examples. This is a cathedral, and it was tagged mihrab, which is a prayer niche often found in mosques. This was tagged crucifixion; it's actually a kneeling saint. And the object on the right was tagged lighting, and you can sort of see why it was tagged with lighting, but it's actually a hat on a stand. That was actually in the title, it said man's hat, but it was tagged lighting. When I find these errors I go into a slight panic and I think, what else is wrong? But luckily, we don't have too many of these really inaccurate tags. The other issue we found was subjectivity.
So we included terms in our taxonomy related to human emotions, and all of these images were tagged with the same term. Does anyone want to try to guess what the tag is? Fear? Good, yeah, these were all tagged fear. This is a ghost; this is a person confronted by a ghost; that's a baby portrait; this is an Austrian New Year's card of a woman standing with pigs. I think she looks more adoring than fearful.

Unknown Speaker 05:01
But this is where you get into subjectivity. I didn't think they were fear, so actually we removed them. And we use these terms very sparingly, because it is so subjective. But the most common issue we found is relevance. So we told the team to tag what they see, and, you know, people interpreted that differently. So, again, does anyone want to guess what term was used for all these images? Close: tree. So tree was tagged in each of these images, and there is a tree in each of these images, but I wouldn't say it's relevant to these images. So we removed that. It's not incorrect, but it's not relevant. And so we're spending a lot of time fixing these.

Then we have these sorts of images, and this is when we get to talk about AI. This is something an AI model will never be able to answer correctly. So how many women are depicted in this slide? How many men? If you said four men, and if you said four women, you would both have been correct. On the far right, that is a male Japanese actor depicting a female dancer. These three images depict the same person, someone called the Chevalier d'Eon. He was a French diplomat in the 18th century. For various reasons he was sent to live in exile in Britain, because he tried to blackmail the king, and on his return to France he was forced to live as a woman. So these are all the same person. I actually found this by chance when I was reviewing images in our database, and I still haven't tagged these images. I don't know what to tag them: should I tag them as women, should I tag them as men, should I tag them as both? The problem is, there's nothing in our collection's cataloging record to explain the story. I actually had to go to Wikipedia to find out what was going on, and it's actually a really interesting story. We have nothing in our record to tell it, which is really, really sad. So if someone saw this record and saw that it was tagged man, it would be very confusing. So I still don't know what to do with these records.

And then we have a lot of images like these. This is a portrait of a boy. This is a work by Renoir, Madame Charpentier and Her Children; the child in the middle is a boy. And the image on the far right is a Native American chief. So our vendor pretty consistently identified Native American men as women, I think because of the long hair. And any machine learning model, if it saw these, would assume it was a girl, for obvious reasons; but it was, you know, the practice at the time for boys to wear dresses. So we're going to get to the AI tags a lot more, but this is where a model can get really tripped up.

So I want to move to our partners and what we're doing with this data. During this process, we were put into contact with machine vision scientists at Cornell Tech, and we told them what we were working on, and they said, wow, that's going to be a really interesting dataset; it would be a great focus for a data science competition. So Kaggle is a website for data scientists. It hosts several competitions; major companies like Microsoft and Google will host a competition, things like create a model that predicts the stock market, or create a model that identifies wildlife animals. So we were expecting 20 to 30 teams; we actually got over 520 teams to participate. And we weren't offering a cash prize, and a lot of these competitions offer cash prizes.
So we were thrilled with the participation.

Unknown Speaker 09:19
I checked this website every single day, because they have a very active discussion board, so it was interesting to see what they were finding. This is what they called a kernels-only competition, because all our data is available through open access, in CSV, so it would be very easy for people to cheat and find out the right answers. So what they had access to was a training set, and then another set they would have to run their algorithm on. Kernels-only meant they had to submit their code; they couldn't just submit their answers, they had to submit their code. And the leaderboard was fairly consistent; there was just one person always at the top of the leaderboard, until the last week of the competition, when two other people just shot up the leaderboard, and the community went crazy. They said, you're cheating, and they were basically disqualified, even though this was a kernels-only competition. But we got comments like, this is such a great dataset, it's much more interesting than the whale one (one Kaggle competition was about identifying whales). So it was very inspiring to see how inspired this community was.

And another researcher, based at Google, created his own model, and he put it on TensorFlow Hub. TensorFlow is a framework that many machine learning models use, often using Python. So he made his model public, and what he did was create a Met collection attribute classifier, which you can drop any image into, and it will do a tag prediction and give it a confidence score. So that's the picture of the boy I showed earlier; it actually came up as girl, and the confidence score is 95%, which is actually wrong.

So these are the challenges we found with AI. We have a lack of developer resources: we have a large team of developers, but they are so swamped with other projects and priorities that it's not like we could take one of these models from the competition and play with it ourselves. I would love to train our own model, or, you know, download that TensorFlow model and see how we could train it further, but we don't have the resources for that. We have imperfect training data: everything I just mentioned before, subjectivity, completeness, accuracy, relevance. It's imperfect, and if our training data is imperfect, the model is going to be imperfect. We don't have enough training data: we only have 600 cats. Normally, with machine learning models, you need tens of thousands, if not hundreds of thousands, of examples, and we just don't have that. There are no right answers for tagging art: if I showed you a picture from our collection and we all tagged it, we're going to get probably 50 different answers, so that's also very difficult. And then there's bias. Our collection is inherently biased: only 4% of our artists are women, and as you've seen, most of our objects depict men. I have bias when I am reviewing images: I like dogs, so I tag dogs no matter how prominent they are in the image. That's just my natural bias. So it's very difficult to get around this; we're going to talk a lot about bias coming up. And I'm going to turn it over to Elena.
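
For readers who want to try the kind of TensorFlow Hub classifier Jennie describes, here is a minimal sketch of loading a published image-tagging model and printing tag predictions with confidence scores. The module handle, the 224x224 input size, and the label list are placeholder assumptions, not the actual model from the talk.

```python
# Minimal sketch: run a TensorFlow Hub image classifier on one artwork image and
# print tag predictions with confidence scores. The handle, input size, and label
# vocabulary below are assumptions for illustration, not the model from the talk.
import numpy as np
import tensorflow_hub as hub
from PIL import Image

MODULE_HANDLE = "https://tfhub.dev/example/met-attribute-classifier/1"  # hypothetical handle
LABELS = ["boy", "girl", "dog", "portrait"]                             # hypothetical vocabulary

classifier = hub.KerasLayer(MODULE_HANDLE)

img = Image.open("artwork.jpg").convert("RGB").resize((224, 224))       # assumed input size
batch = np.asarray(img, dtype=np.float32)[None, ...] / 255.0            # shape (1, 224, 224, 3)

scores = np.asarray(classifier(batch))[0]   # assumes the model returns per-label probabilities
for label, score in sorted(zip(LABELS, scores), key=lambda pair: -pair[1]):
    print(f"{label}: {score:.0%}")
```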

Unknown Speaker 12:38
Thanks, Jennie. Okay, so what I'm going to do now is talk about how to generate tags using machine learning and computer vision, and how those tags compare to the tags that Jennie and her team have created. So for this study, what we used was image recognition to generate different labels. We focused on a specific data sample from the Met's collection: we analyzed the Highlights, which are a selected number of artworks from all the different departments of the Met, and we looked at those that are in the public domain and have an image available. So in total, tags and labels were generated for 1,414 objects from the Met's collection.

Just reviewing the methodology: we collected the data through the Met's collection API, with a script we gathered all the images, and then we used the Google Vision API and Amazon Rekognition to generate all those tags. So in summary, you can see here the three systems that created the tags for these objects from the collection. What is interesting is to compare the unique number of tags: from the human-generated tags we have 537, from Google Vision 918, which is the highest unique number, and from Amazon Rekognition 733. The figure that is interesting to highlight is the average number of tags per object. For the human-generated ones, on average we had around two, and you can see how high that is for Google Vision and Amazon Rekognition. There are reasons behind that. For example, we didn't tag medium or art form, and those types of tags are generated by Google Vision and Amazon Rekognition. Then, those AI tools also add a lot of similar words or synonyms for the same object; so, for example, for a photo they will add photograph, photography, and photo, basically adding three tags, where we would probably have none, because we don't add medium as a tag, that's a separate field in the database. And in the case of human tagging, we didn't add a tag when it was the same as the title, so if an object was a chair or a vase, we would not tag the object with that term.

So here I'm going to show you some examples so you can get a better picture of what I'm talking about. Here's an artwork, and you can see the human tags that were added were girls and portraits. For Google, it was painting, art, dress, acrylic paint, illustration, watercolor paint, and flower. For Amazon, it was art, person, human, clothing, apparel, painting, rag, dress, robe, gown, fashion, evening dress, female, and combat. So you can see the number of tags and the differences between those three systems. Here you have another example, this one for a mask. And this was an interesting one: this is a chair, and I'll zoom out in a second. What human tagging was able to do was zoom in and tag everything that was depicted in the chair, getting all the details, which Google and Amazon were not able to do. I believe it's pretty impressive that Google was able to identify it as a chair based on the image you just saw; Amazon didn't get it right, it thought it was something like a rocker. So how were the tags used across the collection? Similar to what Jennie presented, a long-tail distribution. And interestingly, Google and Amazon used some tags that for us were not relevant at all; many objects were tagged with terms like art or artwork.
We wouldn't tag that for the collection, because all of ours are art. And it's actually interesting that out of around 1,400 objects, Google said that only 875 were art, and Amazon thought it was only 540.
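
As a rough sketch of the pipeline Elena describes (fetch an object record from the Met collection API, download its image, and request labels from Google Vision and Amazon Rekognition), something like the following could be used; the object ID is only an example, and the snippet assumes google-cloud-vision and AWS credentials are already configured.

```python
# Sketch of the label-gathering step: one Met object record -> image -> machine labels.
# Assumes configured credentials for google-cloud-vision and boto3; object ID is an example.
import requests
import boto3
from google.cloud import vision

MET_OBJECT_URL = "https://collectionapi.metmuseum.org/public/collection/v1/objects/{}"

record = requests.get(MET_OBJECT_URL.format(436535)).json()   # example object ID
image_bytes = requests.get(record["primaryImage"]).content    # public-domain image URL

# Google Cloud Vision label detection
gv_client = vision.ImageAnnotatorClient()
gv_labels = gv_client.label_detection(image=vision.Image(content=image_bytes)).label_annotations
google_tags = [(label.description, label.score) for label in gv_labels]

# Amazon Rekognition label detection
rekognition = boto3.client("rekognition")
rek_labels = rekognition.detect_labels(Image={"Bytes": image_bytes})["Labels"]
amazon_tags = [(label["Name"], label["Confidence"] / 100) for label in rek_labels]

print("Google:", google_tags)
print("Amazon:", amazon_tags)
```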

Unknown Speaker 17:36
I'll talk a little bit more about the differences, but there are clear differences between the tags that were used. Looking just at unique tags, and how they overlap between the three different systems, this diagram shows that only 12 tags were used in exactly the same way by the humans, Google, and Amazon. I have to say that I didn't compare synonyms, or plural versus singular, which would probably increase that number, but still, if you look at the long tail, the usage of tags is very, very different. So it's a very small number of tags that overlap. In the case of Google and Amazon, the number is much higher: actually 40% of the tags used by Amazon are also used by Google. Some of the reasons for those differences are what I mentioned before: we didn't tag medium or art form, and sometimes the machines will tag color, which we didn't do, or art movements, like modern art or contemporary art, which we didn't do either. And the humans tagged emotions, and also some actions, like reading, as you can see here, which the machines didn't do. Also, the human tagging was very detailed about what was depicted in an artwork, while the machines were more generic about what the artwork was about.

So now, going back to accuracy and all the different criteria that Jennie mentioned in terms of assessing how useful the tags are. She already showed many examples of issues with accuracy when using humans to tag a collection, and that's the biggest challenge, I think, for AI too: a diaper bag, a birthday cake for this vase, a skateboard. Here are a few more examples: this musical instrument identified as a weapon, this Egyptian object from 1500 BC tagged as 3D modeling, or this artwork from Arms and Armor tagged as modern art. So yeah, this is one of the biggest challenges. What is useful from these tools is that they offer you this confidence score, so for some tag implementations you may want to decide, okay, if it's over 80% as a confidence score, we add it. Because, as you can see here, this is tagged as an airport, I think because this looks like a plane landing or something; I'm not sure if it's the way they see the wheels or some of the other things. But you can see that the confidence varies, and still airport is at 71.

And then subjectivity. This is hard to assess, because it's very subjective, but what I found is that the machines are adding a lot of terms that are more contemporary terms that we use. Actually, we had data from Google Vision from two years ago that was very e-commerce driven in the terms that were used, because of how the models are being trained. So for photography, for contemporary art, or for some of the art movements, it works pretty well, and it's very detailed. But for some other objects, from earlier art movements, or very old objects, they are less accurate, and the terms used to tag those objects are more generic; in the examples you see here it's pretty inaccurate. Then completeness. I already mentioned that, and you saw it with the chair example, that humans were able to tag much more than the machines. So you can see here how many things are being tagged by the humans, versus Google or Amazon just saying these are painting, art.
Or they try to, but it's definitely not much more than that. So you have here two other examples, where the human tagging goes into the details of the actions, like smoking, or the dogs, which maybe you would want to tag in this one. Then relevance. I mean, some of the keywords are accurate, but not actually very relevant for our collection; tagging with art in a museum's online collection is not that relevant for users, you know, to explore.
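
To make the two ideas above concrete (keeping only machine labels above a confidence threshold such as 80%, and measuring how the tag vocabularies overlap), here is a small sketch with invented sample data:

```python
# Sketch: confidence-threshold filtering plus vocabulary overlap. The tag sets and
# scores are invented for illustration; they are not the study's actual data.
CONFIDENCE_THRESHOLD = 0.80

machine_labels = [("painting", 0.97), ("airport", 0.71), ("art", 0.93)]
kept = {label for label, score in machine_labels if score >= CONFIDENCE_THRESHOLD}
print("kept above threshold:", kept)                          # {'painting', 'art'}

human_tags  = {"men", "portraits", "dogs", "reading"}
google_tags = {"painting", "art", "dogs"}
amazon_tags = {"art", "person", "dogs"}

print("used by all three systems:", human_tags & google_tags & amazon_tags)
print("share of Amazon tags also used by Google:",
      len(amazon_tags & google_tags) / len(amazon_tags))      # cf. the ~40% overlap mentioned above
```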

Unknown Speaker 22:19
And one of the biggest challenges, which is linked with accuracy, is the lack of context for these models to tag the collection. I mean, these were accurate tags, portrait, painting, person, human, some lady, but the humans were able to see the context and tag this artwork with George Washington. Or here in this example: Google and Amazon identified this as a sculpture, a figure, a human, but were not able to identify it as Cleopatra. And something interesting, I don't know, because we don't know all the behind-the-scenes of these tools like Google and Amazon, but what we noticed is that they don't risk tagging with woman or man, as we've done with the human tagging. In a lot of the data, a lot of the tags just say person or human, and they don't identify a woman or a man. Or here in this case, we have those tags from the humans, but not from the machine ones.

So why is all this useful? We have to think about our users. As Jennie mentioned, one of the goals of having tags is to allow users to explore and discover our online collection, and this is something we hear a lot in user feedback. These quotes come from surveys on the Met online collection: people really want different ways to explore it. They want tags, or they call them subjects, but something where they can find out about specific topics. So one way of implementing the tags is adding them to the search functionality, so that when someone searches for those tags, the artworks show up in the search results. That will improve the discoverability of objects that don't have that information in the label, the title, or the description. And potentially, the machine-generated tags can complement what we have with human tagging.

Some data about the usage of the search functionality on the Met website: it is used by 10% of our online collection users. Here is the beginning of the chart of the long tail of the usage of different keywords and searches on the online collection; it includes 875,000 different keywords, so it's an endless chart on my computer. Okay. So, I mean, you can see at the top we have artist names, Van Gogh, Monet, and we have art movements. But when you look at the long tail is when you can see where these tags, generated by either humans or machines, can be useful, because the searches are very varied, and, as I said, the tags can help people discover objects. Here are some examples: these are tagged with the word birds, but the word birds does not appear in the title or in the description. Or here's another example, for war dance, which is depicted in some of the objects.

So, looking at the volume: what I did was look at those tags and compare them with the search volume, to see how much potential discoverability these keywords would mean for the online collection. You can see here that the volume of those unique terms is pretty similar, and what is interesting is that, although there was a smaller number of unique keywords for the tags generated by humans, their volume is actually the highest. And you can see here the top 20 tags that were generated by humans and machines, and their search volume. For Google and Amazon they're pretty similar; for the human tags they are a bit different. What you can see is that cats and dogs are definitely among the most searched things on the online collection, and this is, to clarify, not Google but the internal search.
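
A rough sketch of the search-log comparison Elena describes (joining internal search terms and their counts against each tag vocabulary to estimate how much search volume the tags could serve) might look like the following; the file name and column names are assumptions.

```python
# Sketch: estimate how much internal-search volume each tag vocabulary could answer.
# The file name and column names (term, searches) are assumptions for illustration.
import pandas as pd

searches = pd.read_csv("internal_search_terms.csv")          # columns: term, searches

human_tags   = {"cats", "dogs", "birds", "war dance"}        # sample human tags
machine_tags = {"art", "painting", "dogs"}                   # sample machine tags

def covered_volume(tags):
    """Total number of searches whose query exactly matches one of the tags."""
    mask = searches["term"].str.lower().isin(tags)
    return int(searches.loc[mask, "searches"].sum())

print("search volume covered by human tags:  ", covered_volume(human_tags))
print("search volume covered by machine tags:", covered_volume(machine_tags))
```
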
So, to conclude: probably the biggest challenges for using machine-generated tags are accuracy and the lack of context to get it right.

Unknown Speaker 26:58
However, using image recognition can generate labels that may increase the diversity of the tags we already have that have been set by us. And that can help, of course, to improve the search functionality and the navigation within the collection, and ultimately the search engine optimization of the site. Things to do in the future: I mean, we started adding some of these tags to the search functionality on the Met website, although they are not weighted very high yet. Once that happens, the idea is to look at search analytics and see how that impacts metrics like search exit rate. We will look at adding tags with a high level of confidence, and see how that affects this whole analysis. And then there are other systems, like Clarifai, Imagga, or Microsoft, that also have their own models that could be used to generate tags and compare with the data that we have. And if we add those tags to the online collection, it will be very interesting to see how that is presented, how users value that information, how they use it, and how that helps to increase the discoverability of the collection. I'm going to hand the mic over to Andrew.

Unknown Speaker 28:30
All right, that was a great overview of what the Met has been doing with tags and AI. What I thought I'd show you is the project we did specifically with the Microsoft AI technology, and how we fed those results into Wikidata and did some interesting things with them. Some of that you saw last night, but I'll give you some more details.

So everyone knows about Wikipedia: it's got 50-some million pages across 200-plus languages. For English Wikipedia, you're looking at about 6 million articles. So that's, you know, ten times larger than Britannica, if you remember Britannica back in the day. We'd like to think that's a lot of stuff, we're ten times bigger than Britannica, but it's actually still very much highly notable topics, and if you think about it, even highly notable Western academic topics, that are in Wikipedia. If you look at Wikimedia Commons, and I think a lot of you know about that, it is the image and multimedia repository for the wiki projects: out there are 50-some million media files, with a very wide project scope. So you don't have to be highly notable to have an image in Wikimedia Commons. But it's not very structured, right? It's just kind of a big pile of images, and if anyone's ever tried to search it or look at the categories, it's kind of a big mess, and discoverability is a huge problem in Wikimedia Commons.

So that's where the need for Wikidata comes in, right? Wikidata is kind of sitting in between. This is how we try to explain where Wikidata sits for a lot of the C-level folks in the GLAM world, because they don't really quite know what to make of Wikidata. I think anyone in collections management, or anyone who has worked with linked open datasets, thinks this is a great thing, but the C-levels don't understand what Wikidata is, right? So we try to present it as the structured database of all notable works, or items in a particular domain. So this doesn't have to be as notable as something that has a Wikipedia article. The nice thing about it is it's language independent, so we deal with objects and concepts and not lexical information. And it supports comprehensive linkages to collections, right? So we can have an object in Wikidata and have the LC identifier, and the one in different national libraries, and point out to your master crosswalk database. And because of that, it's highly searchable, interactive, and scalable.

And what does this mean? So we showed this to you last night, but in case you didn't get it loaded on your cell phone, because the connections might have been a bit sketchy there: hopefully what you saw was Wikidata loading up a live query, allowing you to actually, you know, drag and move these nodes around interactively. So this is something that any of you can try yourself. If you know the Wikidata items, you hit the SPARQL code there, and you just replace this one line, where it has VALUES and that long, long, long line, with just the objects you're interested in, and it'll find all the connections and graph it for you. That's it. So if anyone wants to play with it, that's something you can play with right now, using that query.
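
For anyone who wants to try that query outside the live demo, here is a minimal sketch of calling the Wikidata Query Service from Python: a SPARQL query whose VALUES line lists the items you care about and returns what each one depicts (P180). The Q-ids in the VALUES line are placeholders to replace with your own items.

```python
# Sketch: ask the Wikidata Query Service what a set of items depicts (P180).
# Replace the placeholder Q-ids in the VALUES line with the artworks you care about.
import requests

SPARQL = """
SELECT ?item ?itemLabel ?depicted ?depictedLabel WHERE {
  VALUES ?item { wd:Q12345 wd:Q67890 }        # placeholder items
  ?item wdt:P180 ?depicted .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "depicts-demo/0.1 (example)"},
)
for row in response.json()["results"]["bindings"]:
    print(row["itemLabel"]["value"], "depicts", row["depictedLabel"]["value"])
```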

Unknown Speaker 31:10
All right, so what did we do with AI and machine learning, using Wikidata as the basis? So we used the Met subject keywords to train a machine learning model, and we used the image classifier to predict labels for other artworks. So unlike using generic Google Cloud Vision, or Rekognition from Amazon, or just a generic Azure image recognition system from Microsoft, if you feed your images in with the keywords, you're basically training a model. And what it effectively does is take the neural network, chop off the last layer, and then replace that prediction layer with your custom vocabulary and your images, right? So it's a much better, more customized version, which is why Microsoft's system is called Custom Vision, right? So almost all these things work the same way, whether it's Google, Amazon, or Microsoft: it is trained with a very popular ImageNet base level of images and keywords, but by giving it your images and your keywords, you're training that last prediction layer to be much more specific to your dataset. So training takes hours, but predictions are very fast: once you train a model, you can feed it images, and it'll come out with predictions several times a second. And this is what we did: we used the Met data to train it, but we actually fed it paintings from Cleveland, from the Rijksmuseum, from all over the place, just to see how it did. And it did pretty well.

So we created a Wikidata game, because the results are pretty good, but not good enough to just feed directly into the database, right? So how do you get folks to vet good predictions and bad predictions? Now, for most people, you're just going to throw up your hands and say, how do I get good data from this pile of stuff that's of questionable quality? Most people would stop, give up, or try again a few years later when it gets better. The good news about the Wikipedia and Wikidata community is you can get them to do cool stuff with this, right? So this is where we had a hackathon with Microsoft at their research facility in Boston; Jennie there is looking very, very intense, asking deep, serious questions about how we refine our machine learning model. And this is basically what we did: we took those 200-plus-thousand works that were hand tagged with the 1,000 terms (this is what Jennie worked a lot on, customizing this vocabulary), we fed them into Azure Custom Vision, again, we trained it using the images and the specific vocabulary from the Met, and then we came up with predictions.

So what do you do with predictions that are maybe 70% correct? You feed them to the wiki community. It's the most successful volunteer community pretty much in the history of the world, and you say, here's a bunch of stuff, tell us whether it's right or wrong. And if you make it really simple to do this, you can get really good results. So what we did was create a game, and this game is so easy to play. This is when we had a little reception at the Met, and we actually had people who'd never edited Wikipedia or Wikidata before, and they had champagne, and they had canapés, and they walked up to the screens, and all we did was say, here's a picture, does it contain a boat or not? And you hit green if it's yes, or blue if no, and you can hit the middle one to say I don't know, I give up, or skip. All right, so you just give them three choices: yes, no, skip. Very simple, no training needed. Just: what do you see? Click on one of these buttons.
I even had my eight-year-old boys play this game with no training, and they were meaningfully adding information to Wikidata. Yeah. So this is an example of what it looks like. So there's the painting, and it says, does it depict a tree or does it not depict a tree? Right? So I think the trees are big enough in this image, I think the trees are significant, right? So I would say yes, it depicts a tree, or you can hit skip. And we actually had multiple categories: you could go process horses, soldiers, dresses, houses, flowers, whatever you wanted. In this setup they were pretty successful. What we did was, we also created a tool that just watched as our community was adding these depiction statements, and we saw these things kind of scroll through there.
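
The "chop off the last layer and retrain it on your own vocabulary" idea Andrew describes is ordinary transfer learning; here is a generic Keras sketch of that idea, offered as an illustration of the concept rather than the actual Azure Custom Vision internals.

```python
# Generic transfer-learning sketch: freeze a pretrained image network and train a new
# multi-label prediction head on your own images and your own tag vocabulary.
# This illustrates the idea described in the talk; it is not Azure Custom Vision itself.
import tensorflow as tf

NUM_TAGS = 1000  # size of the custom vocabulary (the Met's taxonomy is roughly 1,000 terms)

base = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3), weights="imagenet"
)
base.trainable = False  # keep the pretrained "base level" layers frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_TAGS, activation="sigmoid"),  # the new prediction layer
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# train_images: float array of shape (N, 224, 224, 3); train_tags: (N, NUM_TAGS) multi-hot labels.
# model.fit(train_images, train_tags, epochs=5)
```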

Unknown Speaker 34:47
All right, so this is a picture from that event. So this is the conclusion: these AI predictions don't really work well unless you have someone to sift through them and put humans in the loop, and the Wikimedia community is like the ultimate humans to put in that loop. Right. So there was a Q&A earlier today in this room about whether we should use ML, and I think when we get into that discussion, it's not a matter of whether you use ML, it's what's the right application and proportion of it.

So, some numbers: more than 7,000 judgments were made, resulting in about 5,000 edits to Wikidata in the period that we tested. The depiction topics that worked the best were landscape painting features, probably not a big surprise to folks: tree, boat, flower, horse, soldier, house, ship. Really good results from the AI, and good additions of that info to Wikidata. But gender determination, not so good; cats, not so good; dogs, not so good. As Jennie said, we don't have tons of training data for that, and there are just so many different breeds of cats, so many different styles of paintings and artworks for cats, that it's just not going to give us a real deep set of training data right there. Right now, the folks running Wikipedia are putting in similar machine learning capabilities: when you upload an image to Wikimedia Commons, they might suggest some tags after putting it through some ML systems.

So something that might be surprising to folks: when you hit that button (all these folks drinking champagne, hopefully not too much champagne, hit that button), that edit goes live to Wikidata. We got a lot of questions from people saying, oh, so I hit this button, you're going to put it in a queue and some humans will review it? Nope, it's live on Wikidata the moment you hit that button, and they're really surprised to hear that. And I said, the ethos of Wikipedia and the community is that recruiting and retaining a user is much more expensive than undoing vandalism. So if you think you've done some productive work, we're going to take it in and assume it's productive work, because it's really easy to revert bad actors. So why not assume super good faith and say, you're probably here to do good stuff? And in general, it works, right? So for AI, the Wikimedia editors are perhaps the best ones to do this kind of job.

So here are some examples of what the breakdown of depiction statements looked like after we did this experiment. Before we did this, the majority of depiction statements in Wikidata were about the Virgin Mary and the child Jesus: basically, Europeana and these folks pulled in all the religious paintings and kind of overwhelmed the depiction statements in Wikidata. But after our experiment, you can see the keywords that we picked up on wound up being dominant in Wikidata; these are for paintings specifically in Wikidata. Then, if you look today, woman and man are much bigger, because we put the woman and man predictions into the system, and now most people are playing the game using woman and man as the determination there. All right, so we're going to be presenting some of this stuff at South by Southwest in Texas in March, hopefully, and there aren't too many museums presenting there, so hopefully we can try to get more museum-related content there at South by Southwest. Yeah. Future work? Well, we want to feed the judgments back in; we really haven't done that yet. So get all the predictions and the confirmations from the Wikidata side and feed them back into the ML model to refine the predictions there.
And then we want to try to perform specific training for different types of works. So right now, statues, paintings, silverware, everything's just pulled in, and all the labels are there as well. So we probably want to break it down by different types of artwork to come up with better predictions. One other thing is maybe to use this machine learning suggestion for other types of tools that we already have in the Wikimedia sphere. So there's a project that is Knight-funded called the Wiki Art Depiction Explorer, and it's a way to make an interface for folks to create more intelligent, not predictions, but labels for artworks in Wikidata. So an example of what this looks like is something like that: that's the basic interface that we have for the Wiki Art Depiction Explorer. Depending on the site, we actually might bring in the tombstone description here, but we try to give you all the context that we know image classifiers don't have, right? If you just deal with the image, you don't get the title, you don't get any kind of information about the artist, and you don't get any kind of extra metadata. So the idea here is to surround the user with as much data about this painting as possible, so they can do something intelligent. So why not use that big empty space down there, perhaps, to put in the AI-generated tags, so you can kind of click on them? You could even have three or four different image classifiers down there; you could say Google says this, Amazon says this, Microsoft says this, and then you can choose any of the tags that make sense for this type of image. Right. So the idea here, I think, is that ML can be a great aid. It shouldn't be the only tool you use, but it can be a really powerful tool, as some folks said earlier, to go do the menial tasks, things that let us do more productive work faster. All right, and I think that is it. We can have some Q&A. Thank you.
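
To make "the edit goes live the moment you hit that button" concrete: behind a game click like that is, roughly, a single write to the Wikidata API, along the lines of the sketch below. The Q-ids are placeholders, and a real edit needs an authenticated session and a CSRF token, both elided here.

```python
# Sketch: the kind of single API write behind a "yes, it depicts a tree" click, adding a
# depicts (P180) statement to a Wikidata item. Q-ids are placeholders; a real edit needs a
# logged-in session and a CSRF token obtained via action=query&meta=tokens (elided here).
import json
import requests

API = "https://www.wikidata.org/w/api.php"
session = requests.Session()      # assumed to be already authenticated
csrf_token = "..."                # fetched separately; left elided in this sketch

response = session.post(API, data={
    "action": "wbcreateclaim",
    "entity": "Q12345",           # placeholder: the artwork's Wikidata item
    "property": "P180",           # "depicts"
    "snaktype": "value",
    "value": json.dumps({"entity-type": "item", "numeric-id": 10884}),  # Q10884 = tree (assumption)
    "token": csrf_token,
    "format": "json",
})
print(response.json())
```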

Unknown Speaker 39:37
You mentioned the tagging went out to an outside vendor; can you talk a little bit about how that worked?

Unknown Speaker 39:46
We looked at, we had to, because our procurement department requires it, we looked at multiple vendors. They actually did some test records, and based on that they were pretty similar, so based on that and based on price, we chose this vendor. And their team is actually based outside the US, so that brought up a lot of issues, because they weren't familiar with Western names, events, that sort of thing, so those types of things were missed. And also, these people work a lot with self-driving car technology, so they're used to looking at detailed images. But they loved looking at art, and they were really sad when our project ended. There were things I definitely would do differently. Because of funding, we had to finish this project in three months, so they did it in three months, but they had 70 people working on it 24/7. So we lost consistency; when you have that many people, that's 70 opinions. So I would have done that differently. Also, they worked department by department, and because they worked so quickly, during our weekly call I would bring issues to them about, say, Arms and Armor, and they'd already be done with it, they'd already have moved on. So if I were doing it again, I would do it in random order.

Unknown Speaker 41:10
With self-driving cars, we're also all choosing the pictures in a CAPTCHA for them. With the game, you've got potentially thousands of

Unknown Speaker 41:27
folks doing that. That's a good idea, yeah. Did you take the tag information from your vendor and put it into your actual database?

Unknown Speaker 41:44
We did that. They finished in April, and I imported them. We did some cleanup, we spent a lot of time cleaning up, but then I imported them into our collections management system. It was pretty simple to do, but we did it about five months after the work was completed. So now all our tags are in the collections management system, they're in our API, they're in our Open Access CSV, and they're also searchable now on our online collection. Would I like to weight them like tombstone data? Right, I would like to weight them; tombstone is weighted very high. But it's in the queue, it's in the developer queue; they're actually weighted too low, I believe, right now. The taxonomy that we used? We actually made our own taxonomy. We looked at the AAT, and I definitely refer to the AAT, but the AAT is way too dense for our needs: you search tree in the AAT and you're going to get 10,000 scientific names for trees, whereas we just need tree. And we made the conscious decision that we wanted it fairly general, because we had this time crunch and these were not subject matter experts. So our taxonomy is fairly basic. And like I said, as we review records, I add terms, because, you know, sometimes I'll see an image and I'll say, oh, that's just a one-off, and then I'll find ten more examples. So it's constantly a moving target.

Unknown Speaker 43:26
I have not done that, no. I'm trying to do that for Andrew: we're trying to map our terms and get the AAT IDs, so Andrew can put that into Wikidata. It's very slow going, using OpenRefine; it's just very tedious.

Unknown Speaker 43:46
When you were talking about the game part of it, did you do any testing on what the actual retention is? Like

Unknown Speaker 43:52
how long will people play it? How many tags do they put in?

Unknown Speaker 44:01
Yeah, that's a great question. I don't think anyone has done that, because the funny thing is that the game interface has been there for years; it's kind of a generic framework. So if anyone wanted to create a game for it, it's actually quite simple: you just need to have an API that supplies things, and then three decisions, and that's it. So that's a great question. We should look into that to see what the average engagement is. We could probably go back and look at the edits to Wikidata that resulted from the game and get some basic information about how much people played that game.

Unknown Speaker 44:35
I want to go back to that last slide, about providing the title and the information about the

Unknown Speaker 44:42
artwork with the image to a user, right. This is for humans to

Unknown Speaker 44:49
do you know, is that possible? Or are there studies where you can feed a machine an image and text together, like its title?

Unknown Speaker 45:05
Yeah, that's a great, that's a great idea. I mean, I don't know of anyone who looks at the total record and tries to do AI based on everything you find in the record.

Unknown Speaker 45:13
Microsoft Cognitive Search is trying to do that, if anyone from Microsoft is here.

Unknown Speaker 45:19
And we also have some tricks in this interface, just basic heuristics. Like, if the title is all proper nouns, like if it's Mike Smith, and then has a date, a dash, and another date, then we will look up someone of that name born in that year and died in that year in Wikidata, and provide a suggestion saying, here's a Mike Smith that was born on that date and died on that date. So we try to provide smart suggestions. But I think that's a good idea: look at the total record, and try to do some machine learning based on everything you see there. That way you'll remove some very basic errors. That's right.

Unknown Speaker 45:56
I guess it's more of a Jennie question: how does this go over with your subject matter experts?

Unknown Speaker 46:07
Um, yeah. So some of them think it's great; others do not like it, they do not like it at all, because it's not data they've created, it's not content they've created, and it's a loss of control. So because of that, our tags are not visible yet on our object pages. They are searchable. I haven't even exposed them to the curators in our collections management system; they're hidden from them, mostly because I don't want them deleting a lot of tags. But also, I think having a non-subject-matter perspective can actually be very, very useful. Some of our departments do tag their own works, and we have looked at those, and we will try to incorporate them, but some of them I just don't think would be that useful for the general public. And the tags are for the general public; they're not geared at the PhD scholars, it's really for accessibility.

Unknown Speaker 47:05
Where do you think we'll be in five years? What's going to happen? Elena, you're the researcher, so I'd love to hear your views on this.

Unknown Speaker 47:16
Well, I think what we've seen, what's interesting with Google Vision particularly (I mean, we had the data from 2017 and from this year), is that it has improved so much in terms of accuracy and detail. So I do think that maybe in a few years these models will become much, much better, and maybe, you know, museums can come together and train a model, and then these can be much better. And I see some reasons for that: I have been working on a project called Museums and AI, and I know there are more museums exploring this at the current state. So I think maybe at one point a museum will say, okay, I'm putting these on my online collection and seeing what happens. And we actually talk about how you can display them and make sure that you let the user know that this is machine-generated data, that it's not generated by the museum, and maybe say something like, if this is wrong, you can email us, or have a system to get some feedback. But I think at one point museums will start adding this data to their online collections, hopefully, because I see the data from the users, and you can clearly see that these are words people are searching for.

Unknown Speaker 48:30
For me, one thing we're looking at doing very soon, in the near future, is to add a visually similar search option on our object page. It's very low risk; it's not tag prediction, it's just this image looks like these other images, in a related-objects setting, just to get people to continue to explore the collection. So that is doable, that technology exists now, and it's pretty good. So we're hoping to do that in the near future on our site.

Unknown Speaker 49:00
Actually, we tackled the hard problem first. I know the Barnes and other folks have done the visually similar thing for a while. So I guess we like making life difficult for ourselves by hitting the hard questions. I mean, the funny thing is, as Elena said, it's amazing how far the state of the art has come in five years. I've been studying or observing AI since the 1980s, when I was in computer science, and I've always been very skeptical of AI. And even when we had this hackathon with Microsoft, I was kind of going, all right, you've got to convince me, because I'm really not on the side that AI is going to provide us something useful. And after that two-day period, I said, whoa, things have really come a long way in the three years that preceded that hackathon last year.

Unknown Speaker 49:44
And those things are also becoming more accessible and more user friendly. So actually, my research assistant and I gathered the data in a week, so it's actually pretty doable. The QA is probably what takes longer now, but the technology is there, and it is becoming much more accurate and also accessible.

Unknown Speaker 50:05
A big portion of it is that the libraries are so freely available now, the training data is open source, and the computing power is fairly cheap. Google even makes custom hardware that does TensorFlow for like $79; you can get a specialized USB key that does things lightning fast. So it's pretty amazing what's out there right now. So there's really no excuse not to experiment with it, even if you don't go to production with it.

Unknown Speaker 50:37
Media are is the,

Unknown Speaker 50:43
The way they're stored in our collections management system, we have the ability to enter equivalent terms. We haven't done that yet, because I don't know if our website search will be able to understand that; it's really just pulling the terms. I would love for it to pull the equivalent terms, but we're not there yet.

Unknown Speaker 51:05
In Wikidata, you have the ability to do that; it's just inherent in what Wikidata does, right? So if you label it George Washington, then you don't have to tag it man as well, because Wikidata knows it's a human male, so any search for man can surface that as well. But most CMSes in museums, you know, are not semantic. So there might be some bridging of those two worlds at some point.

Unknown Speaker 51:29
Have you had any problems with offensive tags? Not just incorrect, but either machine-generated or even in the game: you know, some evil actor or something who just wants to mess up the system?

Unknown Speaker 51:46
Not with our vendor, no, because they were using our own taxonomy. Yeah, but the only, I mean, you know, the only offensive things were around gender: Gertrude Stein was almost always tagged as a man.

Unknown Speaker 52:02
Yeah, I mean, we're wrestling with it in Wikidata right now. Like, do we say oriental costume? We wouldn't use that term today, but in that era, that's what they used, right? So we're still trying to figure out what's the right thing to do with these terms that have changed over time, whether it's Negro, Black, African American, all these things. It's not easy. I don't think we have a great policy about that right now, but we look to museums that might have better insights into this to help us out.

Unknown Speaker 52:30
I was just wondering, when you were talking about the amount of tags that were generated from the project, were you able to do an analysis of how many of those tags were not already in your metadata to see how many extra entry points

Unknown Speaker 52:45
Even if they were in the title (for instance, a lot of tags repeat the title), we wanted a consistent central place where everyone could search, so there is a lot of overlap. We are removing tags where the tag repeats the object type: if it's a vase and it was tagged vase, we're removing that tag, because we don't want to set the precedent of having to do double data entry. But there is a lot of overlap with the title field, and we're leaving those, because it's the subject. Any other questions? Thank you, guys. Thank you.