Unknown Speaker 00:00
My name is Courtney Callahan. My pronouns are she and her. I'm a third-year MCN board member and a mentor in the MCN mentorship program. MCN is a nonprofit, volunteer-run professional organization committed to growing the digital capacity of museum professionals. MCN has developed a deep, active community engaged in year-round conversations, webinars, and resource sharing. As an MCN member, you can join special interest groups, you can participate in our mentorship program, and you can shape MCN's future by holding a leadership role such as a SIG (special interest group) chair, and with time perhaps on the MCN board, if you're interested. If you're not already a member, we hope you'll join us. You can learn more about any of this at mcn.edu. And hopefully you know where that is, because you're here and so you got tickets to the conference. I'd like to thank Microsoft, who is our Registration Assistance Fund sponsor, Axel, who is our Ignite sponsor, and all of the sponsors we have listed on the program schedule for helping us make this conference possible. Today's session is a Zoom meeting workshop. We are recording; it did say Chatham House Rules, but that was wrong, so just be aware. We're using the chat box for questions, as you'll see in our chat from our emcee and volunteer. If you have a question, you can post your name and someone will call on you. We would love it if you were willing to have your face and voice on the screen and make it more active. If you prefer not to ask your question verbally, though, feel free to type it in the chat box, and we'll still work to get to it. So we're three minutes in, and I'm guessing we're going to wait another minute or two, but I want to make sure before we do that, that I turn this over to our fantastic and wonderful presenter, Andrew Lih.
He's a Wikimedia strategist at the Metropolitan Museum of Art, and someone that I met many years ago, helping me get our museum on Wikipedia, and I will forever be grateful for that, Andrew.
Unknown Speaker 02:09
Great, thanks so much, Courtney. Can you hear me okay? Everything sounding all right? Excellent. So I do have a tendency to talk fast at times when I'm excited, and I'm very excited about the topic. So feel free, either in the chat or by signaling the moderators, to ask me to slow down or repeat something; I'm more than happy to do that. We're really lucky that we have two hours today. We're going to try to mix it up and have a lot of different types of activities, because there's nothing worse in the Zoom COVID era than sitting at your spot for two hours trying to learn something without doing anything. So the first thing I'd love for you folks to do is to make sure you see my screen. It should say, with a half red background, "The Joys of Connecting Your Collections to Wikidata." And there are three links that we have there. Probably the most important one is the main one, Bitly slash MCN 2020 Wikidata, and that'll get you a link to this exact slide deck. I'm one of those folks who likes sharing the full slide deck with anyone, because we're covering so many things, and a lot of this might be new. You should have the opportunity to, you know, take a screen snapshot or go back and look at anything that you want. But right now, I'd appreciate it if you folks could go to the survey. So if you go to Bitly slash MCN 2020 Wikidata survey, that should get you to this form. You don't have to enter your name or email or anything; I just want to get a basic idea of what you folks know, so we can cater the session better to what you folks have knowledge of. So thank you, COVID. And other folks, great to see you in the chat. It's been a year. Wish we were there in person, wish we were doing Ignite in person, but this is the next best thing. So if you folks could go to that survey and fill out basically three things. One is, what is your familiarity with Wikipedia in general? Two is, what is your familiarity with Wikidata? And it's perfectly fine to answer none, I never heard of it until the workshop was described.
Number three, just how would you rate your familiarity with using databases in general? Okay, again, we don't require any of this experience. But it helps, as I go through some of this, to see where we need to recalibrate some of our content. And then the last one is to check as many boxes as are relevant to your title or your role in the organization. This is especially tough, so sorry if your role is not in this list, but just something in the right ballpark would be good here. I'm seeing the responses come in already. I'm not going to bring up the numbers for you quite yet; I might share them later on in the session. But it's pretty good that we have, right in the middle, about 50% of folks who know something about Wikipedia, and some percent don't. It's really nice to know the familiarity with Wikidata; we're seeing, you know, most people are on the "I don't know much about it" end of the scale. So that's great to see that we will not be boring you; we will be telling you a lot of new things about Wikidata. But folks who already know some Wikidata should learn something new as well; a lot of things have happened in the last year. And then in terms of familiarity with databases, we're seeing that more than 50% have a familiarity with databases, which is good, but we don't require any expertise in databases at all. And then when we come to titles, museum staff, the titles that I keep seeing here are manager, developer, technologist, arts and culture consultant. So it's a good spread of folks. So thank you very much for doing that survey; it really does help to get a sense of where things are with the audience that we have. Okay, so here are the links again. We will be using a working doc later on; we're going to be asking you to try some things out, and we'd love to see some of your results.
So the easiest way to do this is with a Google Doc, and you can paste in screenshots there if you'd like. So again, these are the three links that we have: the slide deck, the survey, and the Google Doc that everyone can write into later on. Okay, does that sound okay, Courtney? Is everything sounding logical so far?
Unknown Speaker 06:23
Everything sounds fantastic.
Unknown Speaker 06:25
Excellent. So let me quickly introduce myself. This is my third MCN. And even when I was there for the first time, I felt right at home, working with all the folks at MCN, who were, you know, pretty much in the same ballpark as what I was doing, which is working with museums on how to work with Wikipedia content and how to work with digital collections. So as Courtney said, right now I'm working with the Met as a Wikimedia strategist to work with their content. We also have Jeannie Troy on the call today, who will talk a little bit about what we've been doing there, and I think that'll be really interesting for you folks to hear about. This is kind of a new thing: the Smithsonian Institution, with the launch of their strategic plan, is bringing in more folks to work on their goal of a billion digital viewers or users of their content. So I will be working with them as well, as Wikimedian at Large, which is a very odd title, I know, but I will try to explain what that means later on. I've written a book about Wikipedia called The Wikipedia Revolution. So if you want to know more about the history of Wikipedia and how it came about, there's a whole narrative about that. And there's also a book called Leveraging Wikipedia by the American Library Association. So for folks who work in that area, that might be an interesting compendium of stories on how different GLAM organizations have been working with Wikipedia over the years. So just to give you an overview: it's a nice long two-hour period that we have, so we're going to be covering these three things. And COVID, no, I'm not at a cool bar. This is my set. In the COVID era, more and more, you know, if I'm going to be stuck at home, I might as well make it look good. So there's my background there, and I can change the colors as people want. So we're going to cover what Wikidata is and what its benefits are.
And we're also going to be covering how organizations can work with Wikidata and Wikipedia content. So folks might be familiar with Wikipedia, and might even be familiar with Wikimedia Commons as a multimedia repository, but you're probably fairly new to Wikidata as a metadata storage area. And then we'll see what some practical next steps are that you can take as an organization to work with Wikidata or the community. This will include some case studies and some recommendations on what to do. And as a basic fallback, you can always just contact me. We have a big network of folks in the Wikimedia community who work with museums and institutions, so feel free to just contact me directly, but I'll also give you the address of a new employee with the Wikimedia Foundation who is dedicated to GLAM activity. So some of you might know Fiona Romeo, who used to be with the Museum of Modern Art, I think, in New York City, and has been in the UK for a while now. So just a very quick summary of the history of libraries, archives, and museums with the Wikipedia community. It kind of had a breakout year in 2010, when the British Museum hosted their first Wikipedian in Residence
Unknown Speaker 09:21
Andrew, to interrupt you for a second: we're still seeing the first slide, and so we couldn't see your book. I don't know if what you're talking about now has a slide, so we'd love to see it.
Unknown Speaker 09:32
Oh, I don't know why you're not seeing it. Okay, let me see if I can reshare the screen. Okay, let me reshare my screen. Okay, are you seeing it? Let me try to move the slides here. Are you seeing my books right now? Okay, excellent. Thank you for the feedback. And I hope you see A, B, C on the screen right now. So those are the three things we're going to be doing. Excellent. And so this is the timeline of where we are now. I think we had, in 2013, kind of the breakout year in the United States, at least for the US National Archives, having a full-time Wikimedian in Residence. And since 2017, a lot of organizations have started their open access initiatives; we now have the Met, the Cleveland Museum, and now the Smithsonian as of earlier this year, or last year, doing their Open Access Initiative. So this has been a big two or three years in the life of GLAMs and Wikipedia. So for example, the open access that we had at the Met started in 2017. The Smithsonian's third strategic goal is to reach a billion people a year with a digital-first strategy. So I think it's astute of these folks to say, I don't think a billion people are going to visit our website or museum website. So we need to find the right places to have impact and exposure, and that's where most folks look to Wikipedia and the Wikimedia community when they have these types of large-scale goals. So if I were to describe Wikidata briefly, and why it's crucial to those 1 billion eyeballs for something like the Smithsonian digital strategy: it's the evolution of free knowledge, which we currently kind of see as Wikipedia, towards a multilingual linked open database. So if I were to describe Wikidata in one sentence, this would kind of be it, and I'm not using any technical terms like "semantic" or "structured" at this point.
But here's a very basic explanation for what Wikidata is, right? It doesn't help to talk about Wikidata without some basic introduction to it. So this is just an example of a sentence that you might read in Wikipedia, in what we call a lexical form, right? It's a sentence that we'd read in English: "The United States Congress is a bicameral legislature of the federal government of the United States." And the way that we would express this in Wikidata is through structured statements, right? So we would basically store three pieces of info: the United States Congress is an instance of a bicameral legislature. Okay. And then the second thing we might add is that the United States Congress is in the country of the United States. So we basically take anything that we read in Wikipedia, and we try to break it down into these three-part statements that we store in Wikidata. That's like 90% of what Wikidata is: just taking facts and storing them as these three-part statements in a database. And wonderful things can happen once you store knowledge in this format, right? So this is basically the Semantic Web in like 10 seconds. So how do we actually do this in practical terms? Well, what happens is, instead of article names, or people or places, each of these entities or concepts has what we call a Q number in Wikidata. So Q numbers are unique identifiers for items in Wikidata. And to be human-usable, for something like Q1234, we should have some kind of label or description for what this Q number is. But the nice thing is that this Q number is unique, and then we can put all different kinds of language labels on top of it, or language descriptions on top. Right, and we'll show you some examples later on of why this is so important. So here's an example of what a Wikidata item looks like for the United States Congress.
So you can see that it has an English-language label of "United States Congress," and it has a very brief description, "legislature of the United States." And then what you might have seen on Wikipedia, for example, are redirects or aliases or links. You know, some people might call it US Congress, or American Congress, or Legislature of the United States. How do we keep this all straight? Well, the nice thing is that there's only one Q number for the concept of Congress, and we can have multiple aliases or different labels. Because when someone says American Congress or Congress of the United States, they're still talking about the United States Congress. So that's why we have all these aliases, where it says "Also known as," right? That's where we can have multiple names for this thing in Wikidata. Okay, so the Q number is the unique identifier, and then the label, description, and aliases are the main information that we have about that concept or entity.
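The item structure described above (one Q number, per-language labels, descriptions and aliases, plus three-part statements) can be sketched as a small data model. This is only an illustration, not Wikidata's actual storage format: the `Item` class and its field names are assumptions, and the item IDs are made-up placeholders. P31 ("instance of") and P17 ("country") are real Wikidata property IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Illustrative stand-in for a Wikidata item (not the real schema)."""
    qid: str                                            # unique, language-independent ID
    labels: dict = field(default_factory=dict)          # language code -> label
    descriptions: dict = field(default_factory=dict)    # language code -> description
    aliases: dict = field(default_factory=dict)         # language code -> list of aliases
    statements: list = field(default_factory=list)      # (property ID, value ID) pairs

    def triples(self):
        """Return the statements as (subject, property, value) triples."""
        return [(self.qid, prop, value) for prop, value in self.statements]

    def names(self, lang):
        """All names this item can be found under in one language."""
        found = []
        if lang in self.labels:
            found.append(self.labels[lang])
        found.extend(self.aliases.get(lang, []))
        return found


# Placeholder item IDs, chosen only for this example.
CONGRESS, BICAMERAL, USA = "Q9000001", "Q9000002", "Q9000003"

congress = Item(
    qid=CONGRESS,
    labels={"en": "United States Congress"},
    descriptions={"en": "legislature of the United States"},
    aliases={"en": ["US Congress", "American Congress"]},
    statements=[("P31", BICAMERAL), ("P17", USA)],  # instance of, country
)

for triple in congress.triples():
    print(triple)
```

The point of the design is that every name in `names("en")` resolves to the same `qid`, so adding an alias never creates a second item.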
Unknown Speaker 14:28
Then these are the things that we just talked about before. We have things like Congress is an instance of a bicameral legislature; it's part of the federal government of the United States; the country that we're talking about is the United States of America, because you could have congresses of other countries here, right? So this is where the three-part statements come into play for Wikidata, and you can start to see why this is called structured data, right? It's very rigid, because we break down what we know into these three-part statements. So what are we looking at in terms of these statements? We'd like to find out what it is: is it a human, is it a mammal, is it a dog, these types of things. We might want to know the location of something that's a Q item. We might want to know if there's an ISBN number associated with a book, if a book is a Q item. We might want to know the ULAN ID if you're an artist, you know, what the ID is in another database. So these are all just parts of what make up statements in Wikidata. If you walk away today with nothing else, you should have a link to this page. This is kind of a one-page summary of Wikidata, pretty much what we're going to be talking about today, boiled down to one page with links out to the relevant parts of Wikidata. So if you go to Wikidata dash one page, this is a guide that I put together, and it's available in 10 languages now; other people have translated it. So it's a nice 10,000-foot summary of what Wikidata is and what you should be paying attention to. So feel free to have this next to you as we talk about a lot of these things today. Okay, so, hands on. Let's start with looking at what Wikidata is about. So one of the things that I'll also point you to is this Wikidata doc. If you click on that link in the slides, you'll get there, or you can just type into your URL bar Bitly slash MCN 2020 Wikidata doc.
Alright, so if you go to that doc, hopefully you should see something like this. Alright, so I'm going to wait for a number of icons to pop up there, meaning that folks are actually visiting this doc. So the top of the doc is trying to recap what I just talked about; here are the links to all the major things I'm talking about. Then I took a very quick look at who was attending the session, and I tried to digest the organizations that I saw there. So some of you might have more, and feel free to add yourself to the list. If you have other rows to add, just put your name there, or I'm sorry, not your name, but your organization name there. Because what we're going to do is have you look up the Q numbers for your different institutions, and start to break down what they are and what we could do to improve them. So yeah, go ahead and add more links or more rows to that table. And if someone took up the last row, go ahead and right-click on that table and add more rows if you want. This is great. Seeing your Historical Society, Mohonk Preserve, Isabella Stewart Gardner Museum, awesome. Great. So what we're going to do is go to Wikidata. So let's go ahead and go to wikidata.org. Hopefully, you're seeing this on the screen. And if ever I'm doing something, Courtney, and you don't see what's going on, then feel free to just ping me and jump in. But this is the front page of Wikidata right here. And if you go up here, we're going to go ahead and try one of those entries out. So let's say I'm going up here, and I'm going to type Smithsonian American Art Museum, right? Okay, so notice that as I type, it's matching against what is known in Wikidata. And I'm going to go ahead and click on Smithsonian American Art Museum. Okay, so if you do find your entity, your organization, there, go ahead and look at your Q number right there, and go ahead and paste it into the table.
And don't worry if yours does not exist; that's actually more interesting than if you found it. Yeah. So go ahead and paste it in there. Yep, by default, Google pastes all the style in there. If you don't know this quick shortcut, if you're using a Mac, you can hit Shift-Command-V, and it'll paste it without the style, which is kind of nice. So Shift-Command-V will paste it without the fancy fonts or the color and all that stuff. Great. So I'm seeing people fill in those Q numbers. And don't worry if you don't have one; you can say "missing," sad face, whatever you want there. But go ahead. And I know for a fact that long year and picker do not have Wikidata entries. Okay, good. So we're seeing people fill in those Wikidata entries there, or the Wikidata Q numbers there.
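The lookup done by hand in the search box above can also be scripted. The sketch below builds a request for the MediaWiki `wbsearchentities` API action on wikidata.org and pulls Q numbers out of a response. The helper names (`build_search_url`, `extract_candidates`, `search`) are made up for this example, and the sample response is fabricated to show the expected shape, not real output.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="en", limit=5):
    """Build a wbsearchentities URL that searches labels and aliases for `name`."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)


def extract_candidates(response):
    """Turn a decoded API response into (Q number, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


def search(name):
    """Perform the lookup over the network (not exercised in this sketch)."""
    with urllib.request.urlopen(build_search_url(name)) as reply:
        return extract_candidates(json.load(reply))


# Made-up response with a placeholder Q number, just to show the parsing step.
sample = {"search": [{"id": "Q9000004", "label": "Example Museum",
                      "description": "museum in Exampleville"}]}
print(build_search_url("Smithsonian American Art Museum"))
print(extract_candidates(sample))
```

An empty candidate list corresponds to the "missing" case in the table: the institution has no item yet, or none under that name.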
Unknown Speaker 19:17
And if you have a Q number for your organization, go ahead and inspect some things related to your entry. Right. So the first thing I would do is make sure that the name is the right common name for your museum, and the description is an accurate description. Normally, it'll be "X museum in location"; that's a pretty good basic description. If it's not, you might want to go in and edit it right away. So what you might want to do is go to the Edit button right here. It has a little pencil right next to it. Hit Edit, and you'll notice that these fields are now editable, and you can actually edit them like that. I'm not going to change it right now, but go ahead and edit right there. Or we can add more aliases, right? So I'm going to put in SAAM with no dots. You'll notice that someone put in S.A.A.M. with dots, and commonly within the Smithsonian we'll just say SAAM with no dots. So in case someone types that into the field, I'd like to match it against SAAM with no dots. So I'm going to go ahead and do that, and hopefully that will be committed to Wikidata. Publish. Oh, sorry, someone added that already. Okay, good. So whatever you want to add there, go ahead and hit Publish right up here after you're done, and that'll be added to the entry. Right. So I noticed a number of organizations had a good label, but they didn't have a lot of aliases. So for example, some of them have "The" in the front as the formal title, but don't have it in the aliases. So you might want to add that as well. So go ahead and try adding them, and then go ahead and type into the third column any changes or aliases you might have added, so that we get a sense of what you are putting in there. Now some of these, which are very high traffic, may not allow you to edit them. So there's something I did not make you do, because I want to show you that Wikipedia and Wikidata don't require you to actually create an account.
So you probably noticed; you're like, wait a minute, I didn't have to create an account. We recommend that you do. But one of the weird quirks and amazing things about Wikipedia and this community is that by default, anyone can edit any page. But a lot of the high-traffic pages will be kind of locked down; you need to create an account for things like the Smithsonian or the Met museum or the Yale Center for British Art. But for the ones that are not as high traffic, they want to encourage new folks to edit them. So feel free. Yes, the bog anyone is probably going to be locked. So if you have a Wikipedia account, you have a Wikidata account as part of that. So you can actually log in and edit that if your account is more than seven days old. Okay, so if it's locked, go ahead and enter an alias that you would want in the doc, and I can make that change for you. Or you can log in and make an account there. So we can see the Nelson-Atkins Museum; it looks like someone's adding a lot of good aliases there. That's great. And this is just to increase the discoverability of your organization, right? We'd like the Q number to be well known and the title to be the common name. But we highly recommend folks enter as many aliases as make sense for your organization, ones that are likely to be looked up there in Wikidata. So it's great to see different folks adding those aliases in there. Good. So already, you're increasing the exposure of your organization simply by adding more aliases. And we'll tell you later on about the function of matching content outside of Wikidata to stuff that's inside Wikidata. By providing more of these aliases, you provide a higher chance of a match for these types of things. Okay, great. So we are seeing you change those aliases. That's great; that's the most basic thing. So notice that when you change an alias, there's an edit button at the top.
But then every other function that you want to use to edit is pretty much in context, right where you want to make that change. So for example, here, "instance of: art museum." If you want to change that, you hit the edit button right there next to it. Okay, and I don't think this is an item right now. No, it's not. But if I want to say here.
Unknown Speaker 23:26
So you see, if I want to add more things here, it will not let me enter free text, which is quite interesting. It has to match against other items in Wikidata. So you're starting to see the structured part of Wikidata take hold here. If anyone's ever edited Wikipedia, you'll know that you can type whatever you want and hit the Save button, typos and all. But here, you can see you're kind of editing on rails; you have to enter something here that is already in Wikidata. And if it's not in Wikidata, you need to create a new Wikidata item first, before you add it to something like this. So that's why Wikidata is quite different from Wikipedia. It's much more rigid, and it's much more quality-checked, because you need to have all these nouns and verbs fixed in Wikidata first before you make those changes. So are there any questions right now? If anyone has any questions right now regarding what we just talked about: the Q number for Wikidata, the labels, the description, the aliases, where it's really good to add any common aliases people might know your organization by, and then the statements that you can edit one by one below. Okay, and you can see that some things are special. Like images: it'll show the image. If it's coordinates, it will show a map right there. And we'll show you some examples of that later on. So you can see that Wikidata tries to be more multimedia by default, if you enter data that can be, you know, mapped to geo coordinates, or can pull images from Wikimedia Commons. Something that is very, very common for folks to ask is, oh, can I put any image in here? Unfortunately, the only images you can put in here are ones that are legitimately in Wikimedia Commons under a free license. So that is quite a restriction that we have: the images need to be free and uploaded to Wikimedia Commons. You can't just point to a random Google image search image or one at an institution; it has to be free on Wikimedia Commons.
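The "editing on rails" behavior described above, where a statement's value has to be an existing item rather than free text, can be illustrated with a toy in-memory store. This is a sketch of the idea only, not how Wikidata is implemented; `TinyStore`, `UnknownItemError`, and the Q numbers it hands out are all hypothetical.

```python
class UnknownItemError(ValueError):
    """Raised when a statement refers to an item that does not exist yet."""


class TinyStore:
    """Toy store that only accepts statements between existing items."""

    def __init__(self):
        self.labels = {}        # Q number -> label
        self.statements = []    # (subject, property, value) triples

    def create_item(self, label):
        """Create a new item and return its (toy) Q number."""
        qid = f"Q{len(self.labels) + 1}"
        self.labels[qid] = label
        return qid

    def add_statement(self, subject, prop, value):
        """Add a triple, rejecting anything that is not a known item."""
        for qid in (subject, value):
            if qid not in self.labels:
                raise UnknownItemError(f"{qid!r} is not an existing item")
        self.statements.append((subject, prop, value))


store = TinyStore()
museum = store.create_item("Example Art Museum")
art_museum = store.create_item("art museum")

store.add_statement(museum, "P31", art_museum)       # accepted: both items exist

try:
    store.add_statement(museum, "P31", "art musuem")  # free text with a typo
except UnknownItemError as error:
    print("rejected:", error)
```

Because the typo never matches an item, it cannot be saved, which is the quality check the speaker contrasts with Wikipedia's free-text editing.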
Okay, any other questions, feel free to put them in the chat right there, or raise your hand and Courtney can bring you in. Okay. So that's your basic kind of hands-on. Let's see what other things people have done. I think, yes, The Minneapolis Institute of Arts; anyone who's been to The Ohio State University knows that this is a big deal for some organizations. J. Paul Getty Museum, JPGM. Let's go ahead and change that. So we're going to go to J. Paul Getty Museum. Okay, and we do have some good aliases there. And we're going to go ahead and hit the edit button, and we're going to type in JPGM. And then we have "the Getty" in lowercase; let's just add "The Getty" with a capital, oops, sorry, in case it matters. So we're going to hit return there, and it's going to publish that to Wikidata. And just like Wikipedia, if you've ever looked at the history, you can go ahead and click on View history, and you will see all the changes that have been made to this entry, including the edit that I just made right there. It's a change in English aliases, and it automatically adds that edit summary there. Okay, so my screen is tiny, so I'm going to put on my glasses to read these questions. "With the galley that I'm a DOS net, our entry doesn't show links to popular artists. Would this improve search results? How do you go about this?" Yeah, we'll talk about that a little later, in terms of how artists relate to institution entries on Wikidata. Well, that's an interesting question: how do you link, you know, your most significant holdings to your Wikidata entry? And there are statements that you can add to Wikidata for that. Alright, great. So that is our first exercise, just to get familiar with the interface of Wikidata. And thank you for adding to it. You've already enriched Wikidata with these aliases that you folks are experts in. All right, so that's number one.
Number two: these are the most common reasons why we have aliases. Number one is language variation. So one of the folks I saw who registered was part of, I think, a Swedish museum. So sometimes you will put the name of your museum in the English translation, but you might also want to be findable by your authentic name, whether it's Spanish or Swedish or French or anything else. So that's a great reason to have not only the common name of your museum or institution in English, but also the Swedish version or the French version as well; even though it's not technically English, that's good. Also, something that's useful is to have the diacritical variation: sometimes José without the accent, sometimes with an accent, or both. And then a leading "The"; we just had an example with Minneapolis. And then formal titles versus colloquial ones, right? So you might have multiple things. And there's a little joke in our Wikipedia world: we're still not sure what to do with Newfields. Like, is Newfields the official name? Was the Indianapolis Museum of Art a subpart of Newfields? It's not clear from the Newfields site. So if anyone here is from Newfields or knows those folks, we could talk and figure out how to model Newfields correctly in Wikidata. Alright, so Wikidata items. Let's dive deeper. Using identifiers removes language dependency, right? So the nice thing is that you have one Q number, and you have lots of different things you can layer on top of it. So for example, probably the best example you can think of is Muammar Gaddafi. Anyone who's ever seen how his name is spelled in the press knows that there are like 50-some variations of this, depending on how you phoneticize Arabic-language text, right? So here's a great example: we have over 50 latinized variations of Muammar Gaddafi, but only one number for him as the person, right?
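The alias variations listed above (diacritics, a leading "The", case) suggest how a list of outside names could be matched against known labels and aliases. The sketch below is a deliberately naive exact-match-after-normalization approach, written only to illustrate the idea; it is not how any particular reconciliation tool works, and the function names and Q numbers are hypothetical placeholders.

```python
import unicodedata


def normalize(name):
    """Fold case, strip diacritics, collapse spaces, drop a leading 'the'."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = " ".join(no_marks.casefold().split())
    if folded.startswith("the "):
        folded = folded[len("the "):]
    return folded


def build_index(names_by_qid):
    """Map each normalized label or alias to the set of Q numbers using it."""
    index = {}
    for qid, names in names_by_qid.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(qid)
    return index


def match_names(candidates, index):
    """For each outside name, list the Q numbers whose names match it."""
    return {name: sorted(index.get(normalize(name), ())) for name in candidates}


# Placeholder Q numbers and names, for illustration only.
known = {
    "Q9000005": ["Minneapolis Institute of Art", "Mia"],
    "Q9000006": ["José Example", "Jose Example"],
}
index = build_index(known)
print(match_names(["The Minneapolis Institute of Art", "JOSE EXAMPLE", "Unknown Gallery"], index))
```

Every alias added to an item widens this index, which is why more aliases raise the chance of a match; names that map to several Q numbers, or to none, still need a human decision.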
So that's a great example of why we have these aliases: to find the person through multiple labels and aliases. And then this task of matching what we have in Wikidata with what you have in your institutional database is a process called reconciliation. All right, so you might hear this term quite a bit, not only in the Wikidata context. "I have this list of artists at our organization. How do I know they're in Wikidata?" This process is called reconciliation. And the most common tool for this is called OpenRefine. We're not going to go into it here, but you should look it up. It's like the industry standard for how you match databases, or try to align and compare different databases. So you should know about that tool. So 2017 was a real turning point for using Wikidata, and was pretty much the year where we started to see the emergence of digital assistants like Siri and Alexa. We already knew that Google uses Wikipedia and Wikidata extensively, but now we know that Apple does, Microsoft does, DuckDuckGo, all these different major companies are using it for searching and machine learning. Even Wikidata edits have risen pretty much every year, consistently over time. So you can see that the project started in 2012, but it now accounts for the most edits of any project in the Wikimedia universe, so even more than Wikipedia, because it's a multilingual project. So here's just a great illustration of how many items in Wikidata had content
Unknown Speaker 30:32
with a geo coordinate, and this is what it lit up as. It probably is no surprise that Wikidata started as a German project, so Europe has a lot of lit-up sites there, and even North America doesn't really compare to Europe back in 2015. But I'm going to step through year by year. 2015, 2016: you can see this little flashpoint in Africa; someone really went to town on Rwanda, I think it was, down there. And then between 2016 and 2017, you can start to see the Middle East, Asia, and South America start to really get involved with Wikidata. And then in 2018, you can see it really emerge, as a lot more content starts to come into Wikidata. So that's just a great example of how, you know, 2017 is a pivotal year, and 2018 really started to get a lot more exposure. So what are some examples of why this matters? So here's an example that we had at the Smithsonian, where we created an article as part of what we call an edit-a-thon for Ada Lovelace Day. So at the National Air and Space Museum, we created an article for a balloonist and inventor named Vera Simons. She had no Wikipedia content, no Wikidata content. If you googled her, you'd come up with the Air and Space Museum and some other smattering of links. Within 15 minutes of creating her Wikidata item and her Wikipedia article and uploading an image of her from the National Air and Space Museum, it showed up in Google as number three. And by the end of the day, if you went on your iPhone or iPad, and you didn't even fire up a web browser, you just typed Vera Simons into the search box, you would get what you see on the right-hand side of the screen there. You would see her Wikipedia content and Wikidata content saying that she's an inventor, balloonist, artist; her name, her picture, and the lead sentence from Wikipedia. Now that's pretty astonishing.
You couldn't even buy that from Google if you wanted to. To say, hey, I want to be search result number three? No, you can't. You could buy an ad to try to be number one. But to just organically be number three, there's only one site in the world that does that, and that's Wikipedia and Wikidata. We also did an experiment at the Met, where we created an article and Wikidata content for this sculpture, Why Born Enslaved!, and the exact same thing happened. DuckDuckGo, Google, Bing, Yahoo search: it all showed up in the first hour of creating this, and it's now available on voice assistants and Siri. And it just shows you the importance of this. So if you're trying to convince higher management, or upper management, why this matters, point to these two examples. We've got many other examples of how creating this content makes a huge difference within an hour. Something that might be on your museum website for years suddenly gets a lot more exposure once it is in the Wikipedia and Wikidata ecosystem. Yeah. So here's another great example of the growth that we saw at the Met over the years. There have now been 700 million or more pageviews of Wikipedia articles with Met images since the Open Access initiative started. Average traffic to Met images on Wikimedia projects has gone up five to six times over that period. And then we have these really weird anomalies, which are good anomalies, where we have this gigantic spike in use. This just happened to be the month where this photograph from Julia Margaret Cameron was on the front page of French Wikipedia, and coverage of the Notre-Dame Cathedral fire, if you remember that, used pictures from the Met's collection. So that caused a gigantic spike in May of 2019.
But what's really interesting, if you look at the right-hand side of this graph, is that even with a pandemic that started in January, February of this past year, you can see that the traffic has still gone up monotonically, as in the past. And this was a big thing to see for museums that are really suffering and asking, how are people going to visit our content and collections? People were still impacted by the Met content, even though the Met was closed for most of those months there.
Unknown Speaker 34:30
All right, so we're going to talk about why Wikidata, the design of Wikidata, and some examples of how we use it for institutional collections. So right now, you know that there are more than 6 million English-language articles in Wikipedia. It's one of the five most visited websites on the entire planet. And it's gotten so popular and so useful that in the last two years, in this world of, you know, fake news, or fake fake news, where we're not even sure what the real news is, we're starting to see multibillion-dollar corporations relying on Wikipedia. Facebook, YouTube, all these folks are pointing to Wikipedia to try to sift fact from fiction. This was an announcement in 2018: YouTube will link directly to Wikipedia articles to fight conspiracy theories. This is where they're directing people to go to try to get the truth about things. And then we're also seeing different organizations like the Tate linking to artist bios on Wikipedia, saying, we know we don't have the staff to do all this; in fact, the content of Wikipedia is better than we could do for most of these folks. The Museum of Modern Art now not only uses Wikimedia content for their artist files, they link to the Wikidata item specifically on their pages. So we're starting to see Wikidata not just as that behind-the-scenes metadata project, but actually being publicly exposed in this way, which is quite interesting. So it's fascinating to think that in just these last 20 years, with Wikipedia coming up on its 20th anniversary in January, it has gone from "wiki: fast, loose, weird, unreliable, should we trust it?" to "Wiki, please save us from fake news; please read it to get more information about artists and history." That's pretty amazing to think about over the last 20 years. All right, so we have Wikipedia challenges, though.
So as much as Wikipedia has been successful, we also know that knowledge is now scattered among the 30 million articles in 200 languages. We often look at Wikipedia in English and say, oh, it's got the 6 million articles, it's the biggest, it's a superset of everything that's out there. Absolutely not. English Wikipedia is great, but it's got a lot of holes. It's missing lots of biographies of women, it's missing lots of biographies of folks outside the United States, and it's missing a lot of information about historical sites. So we know there are inconsistencies and gaps in this content purely in the Wikipedia editions. So how do we consolidate these notable facts? You probably guessed: Wikidata is the answer to this. So this is my basic explanation for why Wikidata fits into this mix quite well. Wikipedia is kind of at the top of the pyramid here. Wikipedia consists of text articles, and we have a very high bar for notability. You have to be pretty well known: you need to be covered by The New York Times or USA Today, or have a book reviewed in a major publication, before you qualify for a Wikipedia article. So we have a lot of stuff that we leave out of Wikipedia intentionally. Because we rely on reliable sources, there are a lot of women scientists, even Nobel Prize-winning women scientists, that we don't have in Wikipedia, because they haven't been covered. There's systemic bias in the reliable sources out there, so they're missing from Wikipedia.
Unknown Speaker 37:31
There are inconsistencies across editions, and articles are stale and inaccurate at times. So we learned something in 2001, when we started to see images scattered across Wikipedia editions. We said, it's kind of silly to have Marie Curie's picture in French Wikipedia and in English Wikipedia as two different copies. Why can't we consolidate the copies of images, so we only have one place? That's what Wikimedia Commons was: a place to centralize and consolidate multimedia. So Wikimedia Commons is the multimedia repository. The bar for notability is very low. You can go out and take a picture with your iPhone and upload it to Commons, and as long as it's not spam, it would be welcome there. Unfortunately, Wikimedia Commons has very weak metadata functions. You can't really say, show me all pictures of birds that are green. It doesn't really do that right now. In fact, almost all the metadata is English-language based, which is terrible, so we're missing out on reaching folks who don't understand English. And it lacks complex search operations, as we said. But we do have this gigantic repository in Wikimedia Commons. So the challenge we have for Wikipedia is: how do we convert all that text content we have in Wikipedia into structured statements and turn that into something machine readable and machine understandable? This is often called "let's store things and not strings." Let's store concepts and entities, and not specific spellings of the name Moammar Gaddafi. So that's what Wikidata is. Wikidata has a much lower bar for notability. We can import every single woman PhD who has written a research paper that's been cited; that's perfectly fine in Wikidata. That's great. It is language independent. So as we saw before, we could have Moammar Gaddafi in all these different languages and different spellings.
And the great thing about this is we can explicitly store external identifiers. We can point to library catalogs, we can point to the Met page, the Museum of Modern Art page, we can link to the American Museum of Natural History page for these scientists, and things like that. So we can start to link to other databases in ways you cannot really do with Wikipedia. And we can do complex queries. We can say, show me all women mayors of cities larger than 100,000 people. That's actually a query that we run in Wikidata on a regular basis; we can see all this information very quickly with a query. So that's why we want Wikidata as this kind of layer where we can store a lot more things than what's in Wikipedia and start to consolidate facts and information. So sometimes this is referred to as linked open data for libraries, archives and museums, or LODLAM. You might have heard that acronym before. Or as Wikidata linking to stable external data of GLAM institutions. And Wikidata can be kind of this database of databases; it could be this place where you're pointing to all the different places on the web. Okay, so here's an example of how we might take content from a Wikipedia article and break it down into Wikidata content. We looked at a very simple example before. But look at all the other stuff that we find in the Wikipedia article. We find geo coordinates, we find out that it's bicameral, we find that it meets at the Capitol. And then at the bottom of a lot of Wikipedia articles, we see things like this. This is what we call the navigation box in a Wikipedia article. There's tons of useful metadata here that we're actually not using at all. It's here for display, but it's not being used for logic, or for anything else. So what if we can mine all this content that we find at the bottom of articles about artists, archaeological sites, heritage sites, and take this and put it in a structured database?
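As a rough illustration of the "women mayors of cities larger than 100,000 people" query mentioned above, here is a minimal Python sketch that only builds the SPARQL text and a request URL for the public Wikidata Query Service; it does not send anything. The function names `women_mayors_query` and `request_url` are made up for this example, and the exact query shape is an assumption rather than the query shown in the talk. The identifiers used are Wikidata's own: P31 (instance of), P279 (subclass of), P6 (head of government), P21 (sex or gender), P1082 (population), Q515 (city) and Q6581072 (female).

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint. The query is only built here, never sent.
ENDPOINT = "https://query.wikidata.org/sparql"


def women_mayors_query(min_population: int = 100_000, limit: int = 50) -> str:
    """Build a SPARQL query for cities above a population threshold
    whose head of government is recorded as female."""
    return f"""
SELECT ?city ?cityLabel ?mayor ?mayorLabel ?population WHERE {{
  ?city wdt:P31/wdt:P279* wd:Q515 ;   # instance of (a subclass of) city
        wdt:P6 ?mayor ;               # head of government
        wdt:P1082 ?population .       # population
  ?mayor wdt:P21 wd:Q6581072 .        # sex or gender: female
  FILTER(?population > {min_population})
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?population)
LIMIT {limit}
""".strip()


def request_url(query: str) -> str:
    """Return a GET URL that asks the endpoint for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = women_mayors_query()
url = request_url(query)
print(query)
```

Pasting the printed query into the query service's web editor is probably the easiest way to try it; results depend on whatever is in Wikidata at that moment.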
And that's what Wikidata is doing. So we can see that there are different caucuses for Congress, there are different committees. What if we can model this in Wikidata to show all the relationships here, not just as text, but as links in the database? Okay, so Wikidata was launched in 2012 to try to capture all this content. And it provides not just the power of reading this stuff, but the power to search and sort and investigate this data. So the claims that we just talked about before, the statements, were things like "Congress is a bicameral legislature." That is what we call a statement in Wikidata, sometimes called a claim, but it pretty much is just made up of item, property, value. Washington, D.C.; geo coordinates; here are the numbers. Very basically, something in Wikidata has a relationship to something else in Wikidata. Very simple. So the Q numbers are kind of the nouns, or things, in this database. Anyone can create a new item. You can go in right now to Wikidata and say, create a new item. Hopefully it's notable enough to have something, but we encourage people to create Q numbers all the time. So here are some examples of Q numbers. Number one, Q1, is the universe. That just kind of makes sense; I guess you probably want to make that Q number one. Earth is Q2. Q5 is human. Obviously, Wikidata people are cat lovers, because for some reason cat is Q146. It comes before animal. Don't ask me why. But we love cats, I guess, in the Wikipedia community. Book is Q571. Library is Q7075. Museum, shame, is way down the list at Q33506. Nothing we can do about it now; that's just the numbers that we have. But the numbers are not important. It's the labels that we use on top of them. We don't really deal with the numbers that much. Then we also have properties. These are the relationships that we have. We have things like J. Paul Getty Museum is an instance of museum, right?
Or Herrick house is an instance of historic house museum. So we actually have these properties. And these are not created by just anyone. These are very tightly controlled. We don't want people willy-nilly making properties. We want to tightly control this vocabulary so that we are consistent and so that the properties align with other vocabularies like Getty, or Europeana, and things like that. So these are things like: Is it an instance of something? Is it the date of birth, the coordinate location? Or is it an inventory number at your organization, like an accession number? So these are things we want to control, and these are the ones that are useful to museum professionals. So this is just kind of the list of properties, and you can click on that there. Right, I can pause right now if there are any questions from anyone. Ninety percent of Wikidata is just understanding Q numbers as the nouns and P numbers as the connections between those objects or nouns in Wikidata. If you get that, you're pretty much understanding Wikidata. Okay, so feel free to type into the chat if you have any questions around those things. So here is an example of what a Q item looks like for George Washington, and you can start to see this makes sense. Just think of anything that you can break down. It's almost like sentence diagramming. You say George Washington was, you know, a founding father of the United States, and he was painted in this painting. So you can start to see this all broken out: instance of, human; part of, the Founding Fathers; sex or gender, male; country of citizenship, United States of America. So once we have these three-part statements (sometimes we call these triples, or statements, or claims; they're all pretty much the same), you can start to see that we have the representation in the database of Q23, P31,
Q5. That's what the database stores, but for human readability, we have "George Washington, instance of, human." So that's for our convenience. The cool thing is, we can translate those labels into all these other languages, like German, Spanish, Chinese. We don't have to create new numbers for all these things; we just put the different labels in those different languages. And the cool thing is, if I model something in Wikidata using English, everyone in those other languages gets it for free. That's something that does not occur in Wikipedia. Right now, if I write an English article in Wikipedia, someone has to go through the labor of translating every single word into all of the other 200 languages. Here, if I just say Q23, P31, Q5 in Wikidata, every other language benefits from me putting that knowledge into Wikidata. So you can start to see this is very powerful. I don't have to know Chinese or Malaysian Bahasa Melayu or Spanish. I don't even have to know these languages exist. As long as someone has translated the labels, they get that information and that knowledge, which is really cool. So what you probably did not know is that this is hiding in plain sight. I don't know how many people have actually clicked on the "Wikidata item" link in a Wikipedia article. If you don't respond in the chat, I'm going to assume you've never done it. But has anyone ever gone to that "Wikidata item" link in the Wikipedia article? I'd be surprised. "All the time." Good. Oh, Cal, you know what you're doing. But there's all kinds of "no," "yes," "nope." Yeah. So most people don't know it exists. Right. So let's say I'm over here at SpaceX Crew-1, which is the main story here on Wikipedia. The link
Unknown Speaker 46:20
is hiding in plain sight. In fact, anyone who's done eye-tracking studies has, I think, realized this is the least likely place anyone would ever click on a webpage. So it's right here, hiding in plain sight. You click on "Wikidata item." I encourage you to do this: inspect the Wikidata item and help improve it. So you can see "SpaceX Crew-1, Commercial Crew Program mission." You might want to say "American" or something like that to make that better. We can start to see the statements that are here: "part of," and here's the logo image. Now, whether these are really NASA images that are public domain or not, I probably need to check on those, but they probably are if they've survived. So you can start to see all the information that people are adding here: crew members, UTC date of spacecraft launch. These are all properties that other folks have sussed out as being the right properties to have for a space launch. Okay, so it's good to see some folks have done it. But don't worry if you've never done it. It's just really interesting to see how much the article matches what you see in Wikidata. And if it doesn't match, please do add more stuff to Wikidata. It's still a very young project that needs more eyeballs and labor. All right. So "Wikidata item" will get you to something like this. And you will see there is the item, there is the property right there, and this is the value. For most things, this value has to be something in Wikidata already. It has to be something that has been set up with a Q number. But other things, like coordinates or a date or an accession number, are not items in Wikidata. Those are just freeform fields. So they will be checked against things like: is it a number, is it an integer, is it a fraction? Wikidata tries to put you on rails as much as possible when you add this information. Well, you can see that underlying this, in Wikidata, we store Q11268, P31, Q189445.
That's what's stored in the database. And then we have those labels on top to give it meaning to us, so that it's readable. So sometimes we call these triples, sometimes we call them statements or claims. We can see that we can do all kinds of neat things here. We can say that bicameral legislature in Wikipedia, I'm sorry, in Wikidata, is an example of a voting system. That's not exactly a voting system, but I guess it is a voting system. It's part of a political system, part of a societal system. So you can actually do cool things like search Wikidata for all examples of legislatures that use a particular voting system, because we have bicameral legislature modeled like this in Wikidata. So let's just compare this to traditional databases. This is why I asked folks, how familiar are you with traditional databases? If you've ever used a spreadsheet or even an address book, you know what a traditional database is: rows and columns. That's all it is. And then that organization of how many columns should I have, and should it be a date, should it be a country, should it be a medium, that's what we call a schema. The organization of your database in these rows and columns is what we traditionally call a database schema. So things that you might have used before,
Unknown Speaker 49:22
anything from MySQL, or SQL, or, well, any of the PC-based database programs: these come from relational databases. If you learned Structured Query Language, "select this from that," that's what we call a relational database. These were the dominant databases for decades, and there's a good reason why: they're just organizing things in rows and columns. The problem with these is that if you want to change the schema, it's very disruptive. You need to get agreement from everyone who uses these databases that, you know, I'm not going to call it "medium" anymore, I'm going to call it "material," and I'm going to break it down into two different columns. Is that okay? And you need to get agreement from everyone that you're going to do this; otherwise, chaos reigns. So if you've used TMS, or any database or asset management system at an organization, you know that you usually have one or two folks who really dominate the database because they need to know what's going on. You can't just willy-nilly change columns and change something from an integer to some other kind of field. You need to have someone as a database administrator. Even that word, administrator of a database, is very traditional. So the difference here is that relationships in these traditional relational databases are not easy to find. Wikidata is what we call a new generation of RDF databases. There are different names for this kind of stuff. Sometimes you hear folks saying NoSQL, as in, it's not structured like a relational database; there are no rows and columns. And there's a good reason why: with those triples that we talked about, the statements, we didn't talk about rows and columns. We were pretty much saying, okay, Edward Hopper is a citizen of the United States. And he was born in this state, and he created this, and that's in this collection now. And that's a painting.
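To make the three-part statements described above a little more concrete, here is a small Python sketch of triples stored as bare identifiers with labels layered on top per language. It is only an in-memory illustration, not how Wikidata itself is implemented. The identifiers Q23, P31, Q5, Q11268 and Q189445 come from the talk; the `Triple`, `render` and `match` names and the hard-coded label table are assumptions made for this example, and live labels on Wikidata may differ.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    item: str   # Q number of the subject
    prop: str   # P number of the property
    value: str  # Q number (or a literal) for the value


# Statements as stored: identifiers only.
STATEMENTS = [
    Triple("Q23", "P31", "Q5"),          # George Washington, instance of, human
    Triple("Q11268", "P31", "Q189445"),  # US Congress, instance of, bicameral legislature
]

# Illustrative, hard-coded labels; the German and Spanish tables are
# deliberately incomplete to show the fallback to raw identifiers.
LABELS = {
    "en": {"Q23": "George Washington", "P31": "instance of", "Q5": "human",
           "Q11268": "United States Congress", "Q189445": "bicameral legislature"},
    "de": {"Q23": "George Washington", "P31": "ist ein(e)", "Q5": "Mensch"},
    "es": {"Q23": "George Washington", "P31": "instancia de", "Q5": "ser humano"},
}


def render(triple: Triple, lang: str) -> str:
    """Show a stored triple using labels in one language."""
    labels = LABELS.get(lang, {})
    # Fall back to the identifier when no label has been translated yet.
    return " | ".join(labels.get(part, part) for part in triple)


def match(statements, item: Optional[str] = None,
          prop: Optional[str] = None, value: Optional[str] = None):
    """Return statements matching a pattern; None acts as a wildcard."""
    return [t for t in statements
            if (item is None or t.item == item)
            and (prop is None or t.prop == prop)
            and (value is None or t.value == value)]


for lang in ("en", "de", "es"):
    print(lang, "->", render(STATEMENTS[0], lang))
print(match(STATEMENTS, prop="P31", value="Q5"))
```

The point of the sketch is that adding a statement never requires a schema change, and adding a language only means adding labels, not new statements.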
So you can start to see how we're starting to model relationships in kind of a free-form way here. And you can see why this kind of database is often called a graph database, or a triple store, in a way that's very different from the rigid rows-and-columns databases of the past. There's a role for each of these different types of databases. But for the Wikipedia community, this was a dream come true. Wikipedia is a project that's never done; it's always in flux, always a work in progress. And for someone to have to lay down the law, saying we're going to structure it this way with rows and columns, would never have worked. But if you do it in this way of saying, I'm going to add a statement here, a statement there, and I'm going to change this today, and it might be different tomorrow, then this kind of triple store, RDF database is ideal. It can always be changing. There's no fixed schema. But that can be challenging too; we're going to talk about that in a second. This is the same database for everyone: art historians are doing their work in the same space as military history folks and fashion history folks. It can get a little confusing at times. All right, so the summary: RDF triples, or the triple store, or the graph database, like Wikidata, make for very fast, flexible systems. They're suitable for the wiki culture of being bold and changing stuff. And the weird thing is that multiple parallel ontologies can exist. How we model dresses, and vases, and historical houses can all be mixed in the same database. The downsides are that it's hard to figure out how to find stuff. Honestly, it's kind of weird. If you want to go in there and say, show me all baseball cards, sometimes you ask: is it an instance of a card, is it an instance of a baseball card, is it an instance of a card that's used for sports? How do you find that thing? So that's not easy.
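One common way to cope with the "is it a card, a baseball card, or a sports card?" problem is to follow "subclass of" links upward from whatever class an item was tagged with. The Python sketch below shows that idea on a tiny, entirely hypothetical graph: the class and item names are invented stand-ins, not real Wikidata items, and `superclasses` and `instances_of` are names made up for this example. In the Wikidata query language, the same idea is usually written as the property path `wdt:P31/wdt:P279*`.

```python
# Hypothetical mini-graph; plain names stand in for Q numbers.
SUBCLASS_OF = {  # class -> parent classes (the role P279 plays)
    "baseball card": {"sports card"},
    "sports card": {"trading card"},
    "trading card": {"card"},
}
INSTANCE_OF = {  # item -> classes it was tagged with (the role P31 plays)
    "card A": {"baseball card"},
    "card B": {"sports card"},
    "card C": {"trading card"},
}


def superclasses(cls: str) -> set:
    """Return cls plus every class reachable via subclass-of links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def instances_of(target: str) -> list:
    """Find items that are instances of target or of any of its subclasses."""
    return sorted(
        item for item, classes in INSTANCE_OF.items()
        if any(target in superclasses(c) for c in classes)
    )


print(instances_of("baseball card"))
print(instances_of("sports card"))
print(instances_of("card"))
```

Asking for the broad class picks up items no matter which of the narrower classes an editor happened to choose, as long as the subclass links exist.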
It can be hard for newcomers to understand; these triples are kind of weird. And the same thing that makes it a benefit is also a downside: multiple parallel ontologies can exist. That means you sometimes don't need to fix the big problem; you can just concentrate on the little problem of how do I model baseball cards. But then someone from the Library of Congress can come by and say, well, I model baseball cards like this. And then you might be passing each other in the night; you might not be doing things the same way. And there's nothing to enforce you ever doing things the same way in Wikidata. That can be a big downside. All right, so let's take a quick pause here. Any questions before we try our hand at something hands-on? At this point, you should understand that Wikidata is a triple store, a graph database, an RDF database. These are pretty much synonymous for this kind of new database system where we're storing statements in this way, and they're easy to manipulate as three-part statements. Okay, any questions, feel free to enter them in the chat, or raise your hand somehow in Zoom and Courtney can call on you. We'll start to get to specific examples now of how this works with GLAM organizations. So if you want to take a little stretch break, feel free. We're going to start almost right now.
Unknown Speaker 53:59
So feel free to stretch. Before we start this exercise, I'm going to stretch. So we're going to be doing an example of using a Wikidata knowledge graph. And this is actually pretty new, so you folks are getting a sneak peek at some new tools that we've created here. For anyone who was at MCN last year, this is what I talked about at the Ignite talk, but I didn't get into the nuts and bolts of how it works. So we're going to actually get into how it works. Okay, so the easiest thing to do, if you have a mobile phone, is actually pretty cool: you can scan that QR code. That's what I made you folks do in San Diego last year. But you can also just click on w.wiki/55D and get to that. Let me go ahead and show you what that looks like. And the cool thing about this is that this is not a canned multimedia example. This is a live query on Wikidata that brought back all these different entities, and you can actually drag them around the screen and see the relationships here. So you can start to see that this Death of Socrates is a painting. The Death of Socrates here is also a painting. They both depict Socrates and depict the trial of Socrates here. And then it also kind of shows how different works are related here. Like, there's another work by Raphael that depicts Socrates. And this is not exhaustive, obviously, but it's just showing you some examples. All right, so make sure everyone can bring that up on your screen, whether it's on a mobile phone or in your web browser. I'm going to do it on a mobile phone just to show you this is doable. So if you bring this up on your mobile phone, on any modern phone you should see this come up in your Safari or Chrome browser. And feel free to drag them around and click on those items. You'll see that Wikidata scrolls away, and it says, oh, you want to know more information? Go ahead. Welcome. Look at that. That's really cool.
You can say, oh, it's at the Metropolitan Museum of Art. So the blue bubbles are showing you what it found. If you want to stick around, go ahead and click on Salon of 1787, for example, and look that up. This doesn't change Wikidata, just what you're displaying. You're starting with The Death of Socrates. Go back and say, I'm interested in the Metropolitan Museum of Art. All right, so there it is, and there's much more. I can click on that, and it can say, oh, it's actually here in Manhattan. I click on Manhattan, and you can start to grow your display, which is really cool. So this is live. This is not a canned exercise. I mean, you're used to seeing a lot of cool animations and stuff that people spent time working on in Adobe After Effects or whatever. This is live. And anyone can do these queries. We'll get into that later on, but I'll show you what the query looks like. This is the code that gets that query. Don't worry, I'll show you in a second how this all works. But this is just to prove to you that there's no fishy business going on here. This is actually just specifying the Q numbers you're interested in from Wikidata, and it's going to show you all that stuff. Any organization can do this right now with things that are related to your collections, or anything you have in Wikidata, and show those connections together. All right, so let's take a look at Trial of Socrates and see that, oh, it's Socrates. So, Jacques-Louis David, and we can start to see other information. So this is just another way of experiencing knowledge. And, you know, I've been working on this for two, three years now, and it still blows me away. I wish I had a tool like this when I was a kid to see these connections, because otherwise we're leafing through paper encyclopedias, and we never understand a lot of these things. Right. So here's the question. Yes. So Liz,
Unknown Speaker 57:32
I've got a question from Liz Neely. She wants to know: can linked data include data external to Wikidata, such as Wikidata that links to the LOC, the Library of Congress?
Unknown Speaker 57:44
Yes, that's a great question. What Wikidata does now is it can hold a pointer to what is in LC's database. This often happens for authority control records for people. So if you go to the Library of Congress, if we go to id.loc.gov, this is kind of their main portal for finding their linked open data records. So if you go in here and you say "Barack Obama," a lot of stuff comes back, like books he authored. But what you want to do is look under name authority and try to find the main name authority for Barack Obama. And that's probably the number which is the identifier there. The philosophy of these databases is that Wikidata shouldn't try to hold everything about Barack Obama. It shouldn't try to replicate all the stuff that the Library of Congress has. Hopefully all these organizations, whether it's the Library of Congress, OCLC, the Met, or the Smithsonian, will all have their own Wikidata-like databases, and then what we're going to do is something they call federation. That means that you don't have to have everything in one database, but I can say, look up this in the Met's database and look up that in LC's database, look up this at the Smithsonian, and show me the results. And that is possible today. It's not being used that much, but we actually have federation across 40-some different databases in Wikidata. So you can actually say, look up this identifier in LC's database, and then that field in LC's database, and bring them all together and display that in one record. So that is the dream, to have
Unknown Speaker 59:18
the dream, I guess.
Unknown Speaker 59:21
Yeah, I've gone into Wikidata records and linked them up to LOC and ULAN IDs, and then made sure that's in my collection. So if the graphic tool that you showed us could actually start breaking through the membrane, that would be so cool. Because if this knows it's linked to that LOC record, and that name file is also in linked data, it could link out. When I think about putting my collection here, if it's just another export and nothing ends up linking together, it's just not the dream quite yet. Right? And so I guess that's what I'm talking about. This tool is beautiful. If this tool helped break that wall, it'd be awesome, even through those top-level crosswalks. That'd be really cool.
Unknown Speaker 1:00:23
You're right. Yeah, absolutely. It's possible today. I've never done it before, but it should be possible. But as you said, it requires that those organizations have a functioning RDF database that you can actually do that federated query into. And more and more folks are experimenting with it. If you want to look at the software that Wikidata uses, it's called Wikibase. It's something you can actually download and play with. And if you're familiar with Docker, that is the simplest way to play with it. There's a Docker container; you can download it and just launch it on your Mac, your PC, even the Amazon cloud, and with kind of one click it'll start up. And you have a replica of the Wikidata system with nothing in it, and you can start to populate it with artworks or whatever you want to try out. So that's become more popular in the last year. But that's a great question. That's the dream: punching through to those federated databases and showing these really intricate connections. Yep. Yeah, thank you for that question. That's great. So I highly encourage you to play with this. I'm going to show you a tool later on, in the third hands-on, where you can specify anything that you want and start exploring. That's one example, but we can show you an example in a second. Okay, we're going to show it to you right now. Good. So we're going to show you this tool that I hacked up this past year. It's at knowledge grapher dot toolforge.org. So you can click on that. And it's a very simple interface like this. Okay, so it's knowledge grapher dot toolforge.org, and you can click on that right there. I can probably put a link in our chat, if that helps people. It's not hard to type, but it's a long, weird name to type. And what we can do here is just type in any two starting entities. So I suggest you type in Wright Flyer and Wright brothers, right?
So the Wright flyer was their first plane, and Wright brothers. So this is a very innocent looking textbox. But the cool thing is, if you look at the examples here, I can type in any name of a Wikipedia article in French, English, Spanish, Chinese, whatever I want. Or I can put in the raw queue number from wiki data, but I'm just gonna put in these two, right flyer and Wright Brothers as my two starting points. And if I hit GRAPH, it should show a connection between them. Right, it should say, oh, Wright Brothers designed, or the Wright flyer was designed by the Wright brothers, what I can do is can start clicking on these notes, and start building out my graph. So you can say, Oh, well, it was followed by the Wright flyer, too, which is kind of interesting. And the Wright flyer too, is instance of an aircraft. That's good to know. I can go in here and start exploring, it's Oh, it's at the National and Space Museum. That's kind of neat, I can go ahead and hopefully click on that. So go ahead and try that. So for folks go to Knowledge Graph or dot tool forge.org.
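One way to picture what "show a connection between them" could mean is a query that asks for any property linking two items in either direction. The following is only a sketch under that assumption; the talk does not describe how the tool works internally, and the function name `direct_links_query` is made up for illustration.

```python
import re

# A Wikidata Q number is the letter Q followed by a positive integer.
QID = re.compile(r"Q[1-9]\d*")


def direct_links_query(a: str, b: str) -> str:
    """Build a SPARQL query listing properties that directly link two items.

    Hypothetical helper: it only builds the query text, it does not run it.
    """
    for qid in (a, b):
        if not QID.fullmatch(qid):
            raise ValueError(f"not a Q number: {qid!r}")
    return (
        "SELECT ?direction ?prop WHERE {\n"
        "  {\n"
        f"    wd:{a} ?prop wd:{b} .\n"
        '    BIND("forward" AS ?direction)\n'
        "  } UNION {\n"
        f"    wd:{b} ?prop wd:{a} .\n"
        '    BIND("backward" AS ?direction)\n'
        "  }\n"
        "}"
    )


# Q1 and Q2 are stand-ins here, not the items used in the demo.
print(direct_links_query("Q1", "Q2"))
```

The printed text could be pasted into a SPARQL editor for Wikidata. An article name such as "Wright Flyer" would first have to be resolved to its Q number, which this sketch leaves out.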
Unknown Speaker 1:03:19
And this is just a very simple tool where you can type in any two entities, or three, or four, or five, whatever you want. If there's a connection, it'll graph that connection between them, and then you can start to grow that graph. Another one I suggest you folks try is Sue the dinosaur at the Field Museum, which is kind of fun. So you can look up an artifact that's famous, and hopefully we have a Wikidata item about it. I'll type in en for English and the dinosaur, that's our Wikipedia article name for it, and then Field Museum. And this is where those aliases help, right? Do I say the Field Museum or Field Museum? Hopefully this will match it. Let's hit Graph. Oh, there we are. So there's the Field Museum. And if I don't know where the Field Museum is, I can click on that, and I can see here that it's in Chicago. Let's bring Chicago in. And then someone had asked, what do the blue numbers represent? So are you talking about the numbers here? Yes. Unfortunately, the numbers here are not going to be that useful to you, except that this one says there are four nodes that show up, which is kind of interesting, right? So instead of one answer, it has four answers, and it's not going to expand the four answers for you until you click on the node. So you can see that it was significant event, and it says Great Chicago Fire, World's Columbian Exposition, Fort Dearborn, Century of Progress. That's kind of cool. So the numbers show you that there are multiple things that come back, and if you click on one, it will show you what those things are. It's kind of neat, right? So let's see, if I click on this one, what is this one? This is shares border with. Oh, that's kind of interesting. So someone has gone through the pain of stating that Chicago shares a border with Skokie, Park Ridge, Evanston, all that stuff, which is kind of interesting.
So you can see that you start with Sue the dinosaur, look at that museum, start to learn more about Chicago, and keep moving out. It's kind of like what I'm sure everyone has done before: you start at a Wikipedia article and, oops, 30 minutes later, you've accidentally learned 10 new things. This is just the graph version of the wiki wormhole, as we call it. You start to go in and learn these things, and it's like, oh, that's pretty cool, I didn't realize that we modeled all this information about Chicago. And if I want to know more about the World's Columbian Exposition, I can do that, and expand that: it commemorates the quadricentennial, and we can start to move out from this. So there are really cool things you can do with this tool. And now you have a general-purpose tool with knowledge grapher dot toolforge.org, and I recommend that, yeah,
Unknown Speaker 1:06:03
We had a quick question on your previous screen. Susan White was asking: what do the blue numbers represent?
Unknown Speaker 1:06:10
Oh, yeah. So I marched through that, but I'll show you again. If you go through clicking on these things, anything with a number, like this one, shows you that there are actually three instance of statements. If you click on that, you'll expand those three. So you can see here the multiple things that Sue the dinosaur is: it's a Tyrannosaurus rex, it's a skeleton, it's a fossil find. So it's multiple instances of things there. If you want to find out more about T. rex, you can do that. Find out more about skeletons, you can do that. And any time you click on the number, it's going to expand that to show you more information.
Unknown Speaker 1:06:54
Yeah. So it's just really fun to do this and try different things there. I will warn you that it kind of fails silently if you don't enter something that matches, right? If you put in two very unrelated things, it's just going to bring back a blank. So you need to kind of know something, something very solid, like Leonardo, let's say Leonardo da Vinci, and the Last Supper. Or Mona Lisa, that's better. There are many Last Suppers by many artists, but only one really famous Mona Lisa. I hope this comes back with that connection. Yes. We can start to explore out from there and start expanding those nodes. Okay, so I've seen folks put in 10, 20, 30 things, really cool things. I will show you one last thing if you want to play with this; we're not going to have time here. If you go to filmmaker, I have a special mode where you can put in one filmmaker, and it'll show you all the films and all the actors of that person. This is really cool. So if I click on Kathryn Bigelow, anyone here into film history? This is really fun to do. I put in Kathryn Bigelow, and look at that, it's going to show you all the films she's worked on, whether as a producer, screenwriter, or director, and then all the actors she's worked with as a result of that. So you start to see the favorite actors that have worked with her in different films, which is kind of fun, and sometimes you have these little isolated islands that have nothing to do with anything else, like TV series episodes. So there are a lot of cool insights you can have by using these graphing tools. Then, if you're interested, you can also go to creator mode and click on artists. So, for example, Mary Cassatt, which is huge, I hope it doesn't blow up my computer here. And just to give you a warning, you'll get more meaningful results here for artists that worked before 1925. Does anyone know why 1925 is important?
I think most people here probably know why 1925 is important. But you'll see all these paintings of hers and what they depict show up here. Yes, Liz is correct: copyright. Right. So pre-1925, generally, you can start to see a lot more public domain works. And for anything post-1925, you're not going to get lots of modern art, not a lot of images on Wikidata. All right, so you now have a tool in your hand that even most Wikipedia editors and most Wikidata editors don't know about. So you have a secret power that those folks don't know about: knowledge grapher dot toolforge.org. Okay, so Wikidata has more than 90 million items. Simple searches take less than a second, which is pretty amazing. And complex queries are supported by a language called SPARQL. So let me just show you an example of how this works. It's actually pretty simple. If I want to find all instances of bicameral legislatures, remember we had that triple before, all we need to do is specify it to the query engine of Wikidata. We put in question mark legislature, and then we put in P31 and Q189445. The wdt: tells Wikidata we're looking at this property, and the wd: means we're looking for a Wikidata item called Q189445. And because we put question mark legislature in the front, it's just going to match the last two against anything it can find for that first variable. That's it. So you basically just specify a pattern, and then it'll find all the stuff that matches that pattern. So here's an example of what that brings up. You can see here that when you run the query, it's talking about 48 results in less than one second. Okay, so that is pretty amazing if you think about it. I did this query a year or two ago, when there were 52 million items. Now there are 90 million items, and it still comes back in less than two seconds, I think it is. That's pretty amazing.
You're searching 90 million records from a database that's constantly changing, and you're getting results in a second or so. So you can start to see that there are bicameral legislatures, not just the US, but Kenya, India, Canada. And then you can graph these and do interesting things with that information.
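The pattern described here, a variable followed by a property and an item, can be written out as text. This is a minimal sketch that only assembles the query string; the function name `instances_of_query` is an assumption for illustration, while P31 and Q189445 are the numbers given in the talk.

```python
def instances_of_query(class_qid: str, var: str = "item", prop: str = "P31") -> str:
    """Build a one-pattern SPARQL query: ?var wdt:<prop> wd:<class_qid>.

    wdt: marks the property and wd: marks the item, as described above.
    """
    return f"SELECT ?{var} WHERE {{\n  ?{var} wdt:{prop} wd:{class_qid} .\n}}"


# The bicameral legislature example from the talk.
print(instances_of_query("Q189445", var="legislature"))
```

The variable name is arbitrary. Whatever follows the question mark is the slot the query engine fills with every item matching the other two parts of the triple.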
Unknown Speaker 1:11:05
Okay, so the last thing I'll show you here as part of a Wikidata item is identifiers. This is something that Liz and I talked about before: how do you point out to other databases, and how do you follow that information? At the bottom of any Wikidata entry are what we call identifiers. Sometimes I call this the pot of gold at the end of a Wikidata item. These are really valuable to say, hey, we have this number for this artist, here's who it is in that database, and here's that exact match in the Library of Congress, and Getty, and Europeana, and Tate, and the Met. That's super valuable, because we don't actually have anywhere else that does that in an authoritative or comprehensive way. And Wikidata, if for no other reason, is getting very popular just for that function of resolving these authority control records or identifiers to other places. Right. So here are some examples of what these P numbers look like, these properties, or identifiers, as we call them. These are just some examples that we use in the Smithsonian world. Cooper Hewitt has a person ID. SAAM, the Smithsonian American Art Museum, has a person or institution ID. There's a volcano ID, which is kind of cool, an identifier for volcanoes, so this facility is not just for museums but for scientific applications too. There are a lot of identifiers here for different folks. So we had another comment: if you're having a day that needs something calm, updating authority and ULAN IDs in Wikidata is very relaxing. It's actually really fun. There are some games that we have in our Wikidata world that allow you to just sit back and hit 1, 2, 3 buttons to match things with Wikidata, which is really cool. All right, so let's go through the querying part of Wikidata, and then we'll go to some examples. So, querying. This is what the query looks like in Wikidata, very quickly.
The basic search is very simple in Wikidata. You can actually choose from a bunch of examples, and we're going to go through that in a second. This is what the query language looks like. Don't be scared off by the all caps or the SERVICE wikibase part. Pretty much what you see in the middle there, on line three, is item, wdt:P31, wd:Q146, which is a cat. That's the main thing, the only thing you really change in any query, to say, I'm looking for all cats in Wikidata. So let's do that all together. Let's go and try query.wikidata.org. I could get you directly to the cat query, but I want you to learn how to do it on your own. So go to query.wikidata.org, and you should get this screen here. Let me make it a little bit bigger so you can all see it. Hopefully everyone has that there. I will also paste a link into the chat, so people can just click on that. I'll give you a second to get there. Now, the only big downside of this is that it doesn't really give you much direction on what to do. I recommend people just go to Examples. Oh, let's see. Yes, we have cats on screen. Awesome. So Courtney, that's her cat on screen. That's great. This is going to make your cat happy. So we actually have lots of cool queries. Show me all humans without children. Show me all countries that have sitelinks on the wiki. But we're going to go ahead and choose Cats. If you click on Cats, you will see a very simple query, where this is the content right there. If you click on the i icon right there, it'll show you a friendlier display of this. It says, show me all instances of house cat. Okay, and that's it. So you just brought up the query. Go ahead and click on the blue play button right there, and you will see, hopefully, there we are, 149 cats come back in 180 milliseconds. So believe it or not, all 90-some million records were searched in 180 milliseconds.
That's how fast these graph databases can be if you have a very pinpointed search like that. All right. So it's important to know that these are named, famous cats. They are not tabby or Maine Coon or anything. These are not breeds of cats. These are famous cats that rise to the level of having an individual entry. Some of these might look familiar as YouTube stars. Some of these are presidential cats that lived in the White House, like Socks for Clinton. Some of these are just famous in their own right; they had some famous role in history.
Unknown Speaker 1:15:37
And cats with college degrees. Yes. So one of the cool things about querying Wikidata is that you can fight vandalism and things like that by just doing some basic checks, like making sure no cat has lived more than 25 years or something like that. So sometimes you want to do some sanity checking on these types of records. But what's really cool about this is that you can change it very quickly. If I go up here and put my mouse on top of that, hopefully everyone's seeing that it says instance of, and Q146 is a house cat. But I can go in here and change this right here as well, if I click on the i button. So the way I got here is I clicked on the i button, and I can go in here and say horse. So what would be your guess? If I click on the blue button, will there be more or fewer horses? Hmm. And someone did the goats query: there are only eight goats? Yes. Unfortunately, goats are not as well known in our human world. But if you click on the blue button, you'll notice that there are a lot more horses that come up, 11,000-some horses, right? And you probably wonder what's going on. Well, it kind of makes sense, because these are mostly racehorses or show horses. So these are named famous horses versus named famous cats versus named famous goats. But this is your basic Wikidata query. Most of your queries are going to be this type: show me all instances of paintings, of archaeological sites, things like that. And once you get that down, it's actually quite simple to customize this going forward. So hopefully you folks have done the cats query and gotten your basic information about cats, and goats, as someone just did. Okay, so you can go back to the slides and play with this on your own. But it's really fun to try these different queries.
The cool thing also is that if you have different kinds of data types coming back in different columns, it'll do magic for you. It'll make graphs, maps, charts, all these different things. So let's do one more query before we deep dive into some case studies here. Let's examine DC area museums. And I'm saying DC only because I'm sitting here in DC; I'm going to have you change this query to be whatever you want. So go ahead and click on this one, w.wiki slash five V Zed. Let me see if I can copy this link. Oh, no, that's not good. Let me go ahead and try typing that in correctly, w.wiki slash five V Zed, like that. If you click on that query, notice what happens: it will look for all museums within a 100 kilometer radius of the center of Washington, DC. Now, when it comes back, it's going to show you the Q number, the item, and the label, and you know what the label is now: American Poetry Museum, Petersen House, Baltimore Museum of Industry. It's going to show you the geo coordinates as well. And this is where the magic happens. Because there's a geo coordinate, if you go to the left-hand side, the map option now lights up. And believe it or not, if you just click on Map, it's going to do that. It said, oh, there are geo coordinates, you probably want me to put them on a map. And it just does it, which I find really cool, even to this day. I'm a programmer, and I find this not that hard to do, but I'd much prefer to have it done automatically for me. So look at that. That's pretty cool. And what you can do is go back to that list, back here to the table, and say, but I'd rather see a gallery of images. A lot of these had pictures. Can I just see a scrapbook of images? Absolutely. You go back here and say Image Grid. And there's your grid. And if you have collections in Wikidata, this is really cool.
You can see all your content in this kind of slideshow format, not a slideshow, but a kind of scrapbook format here. You'll also notice that if you go back to the table, we also have inception, so you can do a timeline of when these museums were founded in DC, which is kind of cool. And then visits: there is a field for visitor data. You can go in here and say, make a bubble chart, and look at that. That's pretty cool, right? So you can see that Air and Space and the Museum of Natural History in DC are the most popular. And that is true, we know that for a fact. But it's kind of neat to see how many other things there are. Now, this is not a very useful title there, but we can actually go and look: National Gallery of Art, that's also very popular. So it's pretty neat that we can take the same query, hone in on any of these columns, and come up with a display. We can do a timeline too. It's not as pretty, but I'll show you just to be complete.
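Both ideas from this stretch, swapping the Q number in the basic query and sanity checking what comes back, can be sketched in a few lines. The function names and the sample rows below are invented for illustration; only Q146 (house cat) and the 25-year threshold come from the talk.

```python
def labeled_instances_query(class_qid: str, language: str = "en") -> str:
    """Basic "show me all instances of X" query, with readable labels."""
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}\n'
        "}"
    )


def implausibly_old(rows, max_years=25):
    """Return labels of rows whose lifespan exceeds max_years.

    Rows are plain dicts with integer "born" and "died" years; rows with
    a missing year are skipped rather than flagged.
    """
    flagged = []
    for row in rows:
        born, died = row.get("born"), row.get("died")
        if born is not None and died is not None and died - born > max_years:
            flagged.append(row["label"])
    return flagged


print(labeled_instances_query("Q146"))  # Q146 is house cat, as in the talk

# Made-up rows standing in for query results, not real Wikidata records.
sample = [
    {"label": "Cat A", "born": 1990, "died": 2005},
    {"label": "Cat B", "born": 1900, "died": 1990},
    {"label": "Cat C", "born": 2001, "died": None},
]
print(implausibly_old(sample))
```

Changing the argument to another class's Q number is the same one-token edit made on screen for horses and goats.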
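The views picked by hand here (map, image grid, bubble chart, timeline) can also be requested inside the query text: the Wikidata Query Service reads a `#defaultView:` comment on the first line. The helper below is a small sketch of that; its name and the short list of views are choices made for this example.

```python
# View names mentioned in the talk, written the way the query service spells them.
VIEWS = {"Table", "Map", "ImageGrid", "BubbleChart", "Timeline"}


def with_default_view(query: str, view: str) -> str:
    """Prefix a SPARQL query with a #defaultView comment.

    Any existing #defaultView line is dropped first, so calling this
    twice swaps the view instead of stacking directives.
    """
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view!r}")
    kept = [line for line in query.splitlines()
            if not line.startswith("#defaultView:")]
    return f"#defaultView:{view}\n" + "\n".join(kept)


query = "SELECT ?item ?coord WHERE { ?item wdt:P625 ?coord . }"
print(with_default_view(query, "Map"))
```

The directive only sets the starting view. The result still needs a suitable column, such as coordinates for a map, for that view to show anything.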
Unknown Speaker 1:20:29
That's what your timeline looks like. So all those options are available to you when you return those columns in here. And that's why I think Wikidata, as an adjunct to Wikipedia, is so powerful, because you don't need to wait for someone to make these graphs and charts. Believe it or not, that's how things are done right now on Wikipedia. They're hand drawn. They're handmade, mostly. You need to wait for someone who's adept at Adobe Illustrator, or GIMP, or any of these tools to make these graphs and charts. And here, you can make them on your own with Wikidata. Right? Okay, so what I want you to do is try modifying that query. I rarely make my own queries; I always copy someone else's query or base it off something else. So if you go into your query, the right-hand side here says Edit SPARQL. You click on Edit SPARQL, and you will be presented with code here that you're kind of scared of. But you can go ahead and click on the i button right here and change Washington, DC to whatever city you want. And it doesn't have to be a city; it could be some other thing that has a coordinate location. So let's say Space Needle, will that work? Let's try this: 100 kilometers from the Space Needle. Let me hit that play button. Yes, I've found 61 museums within 100 kilometers of the Space Needle. So I don't even have to specify a city. I can give it something that has geo coordinates, and it's going to find those coordinates and do that. And then what I can do is go down here to the eyeball button and choose Map. And look at that. There are all the museums in pretty much the Seattle area, 100 kilometers from the Space Needle. Pretty cool.
So what I want you to do is go ahead and try that with either where you're sitting now or some interesting location. If you're daring, you can even go ahead and change the 100 to something else. So you can go in here and say, I don't want 100, I want 500 kilometers, and change it to 500 right there. And I can change the location to, let's see, Mount Everest base camp. No, let's say something else here.
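A radius search like this one can be sketched with the query service's `wikibase:around` search. The exact query behind the short link is not shown in the talk, so the shape below is an assumption, as are the function name and the choice of Q33506 for museum. P625 is coordinate location, and Q61 is Washington, D.C.

```python
def museums_near_query(center_qid: str, radius_km: int = 100,
                       class_qid: str = "Q33506") -> str:
    """Build a SPARQL query for items of a class within radius_km of a center.

    The center can be any item with a coordinate location (P625); it does
    not have to be a city.
    """
    if radius_km <= 0:
        raise ValueError("radius_km must be positive")
    return (
        "SELECT ?item ?itemLabel ?coord WHERE {\n"
        f"  wd:{center_qid} wdt:P625 ?center .\n"
        "  SERVICE wikibase:around {\n"
        "    ?item wdt:P625 ?coord .\n"
        "    bd:serviceParam wikibase:center ?center .\n"
        f'    bd:serviceParam wikibase:radius "{radius_km}" .\n'
        "  }\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


# 100 km around Washington, D.C.; swap the Q number or radius to explore.
print(museums_near_query("Q61"))
```

Changing the center or the radius is a one-argument edit, which mirrors the two changes made by hand in the exercise. A larger radius returns more rows and may take longer to run.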
Unknown Speaker 1:22:57
Let's say Mount Fuji, and hit play. It should come back with something, but with a wider radius it'll take a longer time to look. Oh, that's actually a lot of stuff coming back: 1,600. So 500 kilometers is pretty big, and a lot of museums come back there. So go ahead and try that. I would say just change the location there and try generating a map. I'm going to change this back to 100. And then make sure that, after executing that query, you can see the results in a map: just pull down the eyeball button and choose Map right there. Okay, and there's my map for Mount Fuji. What I'm going to do is go ahead and screenshot that, and I'm going to go back to my Google Doc that I gave you this morning and paste it in there. So notice that I put in the one for Omaha earlier, and I'm going to say Mount Fuji and paste in my little
Unknown Speaker 1:24:15
Oops, go back here and do it again.
Unknown Speaker 1:24:19
So go ahead and try that. Make your map for your locale and try pasting it into the document there. Oops, not sure why it's not taking you there now. I'll give you a few minutes to make sure you can do that. So there is the Space Needle. There is Mount Fuji. If you have any problems with that, let us know in the chat, and we'll see if something went wrong. Yes: can you please show how to get from the query to the page with the graph? Oh, the page with a graph. Okay. I'm not sure if this one is going to have it, but you can pull down this menu right here, and you can get to timeline, you can get to bubble chart. So it looks like we do have some data coming back here. This query has got multiple National Art Centers here, but with different divisions. Yep. Great. So it's going to light up different options depending on which columns you have here. If you don't have geo coordinates, it won't light up the map option. But if you can, paste some of your creations in here. Oh, nice. Someone did Los Angeles. That's great. It's sometimes hard to guess where these places are. That looks like Yale, or New Haven. Yes, great. Gainesville, Florida. Nice. Okinawa, that's great. So if you know that there are museums missing, that's a great prompt to go into Wikidata and try to enter geo coordinate data or fix some of those entries. It's a great exercise to go in and investigate the GLAM organizations in your area, just to make sure they have something in Wikidata. Okay, so some examples like Chicago, Seattle, and Houston all have interesting results there. If you are interested in doing some of these queries, but you don't want to wrestle with the complexity of that query, we do have a simpler tool called VizQuery. It's more like Mad Libs, fill in the blanks for searching. And there are whole tutorials on how to do the more advanced searches.
And there are really cool things you can do. But we're not going to get into that in this presentation. We're going to talk about some interesting case studies that I hope you folks will be interested in. One that we want to hone in on is open access at the Met museum. So why don't we take a quick stretch break while we make sure that Jeannie Troy is in the room as well. We're at about 30 minutes after our first stretch break, so let's do another stretch break. And Jeannie hopefully has audio and video available, because Jeannie really is the database master at the Met that I work with to do most of these projects. Jeannie, are you around?
Unknown Speaker 1:27:29
Oh, great. Hi, Jeannie, why don't you introduce yourself to everyone real quick.
Unknown Speaker 1:27:34
Hi, everyone. I'm Jeannie Troy. I'm the General Manager of Collection Information at the Metropolitan Museum. I've been working with Andrew for about two years now with Wikidata, and it's been a lot of fun. I've learned a lot. And I think what Andrew is going to show you will hopefully pique your interest and inspire you to add your institution's records to Wikidata.
Unknown Speaker 1:28:02
Excellent, thank you. And we'll definitely get Jeannie to introduce some of these things that we've been working on. We wanted to break down the three parts of how we think about this. Some of you might recognize this as Nina Simon's construction, which I really love, of how you engage the public with the organization: contribution as the first stage, then collaboration, and then the elusive third step of co-creation. How do you get to this new area where you're making things that neither side ever thought of before? And I think we've hit all three stages now with the Met and the Met's engagement with the community, which I think is really exciting. So number one is the contribution stage. As we mentioned before, 2017 was the release of the open access materials. For anyone who wants to get into this, there are very good case studies on how organizations are releasing their images and metadata under a CC0 license. That's the most useful to the world and to the Wikimedia community. Cleveland Museum of Art, the Met, Smithsonian, a lot of folks have done this with great success. And once we have that metadata and the images available to us, we can bring those objects into Wikidata. So this is an example of what an ideal Wikidata item from the Met looks like. We have the label, description, and aliases, which we added before. We hopefully have meaningful statements and claims, and then external identifiers that point out to the Met content. So, for example, for The Death of Socrates that we just looked at, we have the label, the description, and the alias. But we also want other stuff as well, like the inception: when was it painted? What is it? Is it a painting? Is it a drawing? What genre might it be?
What's the material used, its dimensions, height and width, the copyright status of this thing, the inventory number within the organization? Or we can link directly to the object pages or the APIs of that organization to get more information about it. So this is an example of the ideal core statements that we want in Wikidata for a Met object. And hopefully, if you folks are interested, whether you're an art museum, a historical society, or some other entity, there are some meaningful things that you can add to Wikidata based on your expertise and collections. It may not even be a holding. It may just be a database of women authors that we don't have in Wikidata. That's super useful to have, just populating Wikidata and pointing to your data. And the core of everything that we do with the Met is a special Met object ID. As we mentioned before, this is a P number. So you have to propose this to the community and say, there's a good reason why we want to create a new, unique identifier for our organization. For most museums out there that have their database exposed to the public, whether it's a series of web pages or an API, it's not that hard to get an object ID or identifier like this. And then we have some extra special modeling that we do for the Met. We're very lucky that the Met has broken out an all-star set of artworks, which we call the highlights, then the Timeline of Art History, which is a bigger set, and then objects on view, which is the biggest set. So we actually have a special designation for those things in Wikidata. You can search all the highlight objects very quickly out of Wikidata, or the Timeline of Art History objects. And, you know, you've worked on this too. But,
Unknown Speaker 1:31:21
Andrew, we have a question from Brenda. Are there templates or style guides for facilitating creation of different types of common entries, to help ensure consistency across museums and artworks, etc.?
Unknown Speaker 1:31:38
That's a great question. It gets back to the thing we observed before, that there's no rigid schema on Wikidata, which means it's sometimes really hard to divine how to model a painting or a watercolor or something like that. So we do have best practices documented in a WikiProject on Wikidata called Visual Arts. You can go there, or contact me and I can give you a pointer to that. For most things in Wikidata, if there's enough critical mass, people will create what we call a WikiProject and then try to lay out the schema that is agreed upon. And this is really getting geeky, but there's a new standard in the Semantic Web called ShEx, spelled really weird, S-H-E-X, which is meant to really define schemas for things like this. So that's kind of next generation. For now, we do have tables where we try to come up with the best practices for modeling things like this. So paintings are pretty good, but other things, like sculpture, not so good. Hopefully that answers the question. It's a great question, because I said schemaless, but if it's schemaless, how do I add the next thing? It's not always easy to figure that out. I won't go too far into this, but this is just showing that we do have tools to mass import content from your databases into Wikidata, and also to crosswalk it. So we can say, you call something an altarpiece and we call it this, you call something this and we call it that, and then we can crosswalk and cross-link those things. And then we also have technical tools. Something most people don't know is that behind Wikipedia, the encyclopedia anyone can edit, we actually have a compute farm on the back end. And almost anyone can create an account, interestingly enough, and run code to help improve Wikipedia and Wikidata.
So anyone out there who's got a little bit of coding, or a little bit of interest in that, can create an account today on the back end of Wikipedia to do coding, which is really fascinating. Okay, so we've talked about the summary of representing the Met content in Wikidata. But some of the challenges here, which I'm sure everyone's going to run into, are things like: sometimes the Met object ID is for a set of things. It's not a cup or a chalice. It's a tea set, or it's an altarpiece with five distinct pieces. And sometimes that's complex to model in Wikidata. So those are the weird edge cases we need to deal with sometimes. Jeannie, you have a lot of experience with moving things into Wikidata from your database. Any words of advice, or insights into what it took to take TMS content and correlate it to Wikidata?
Unknown Speaker 1:34:13
Um, I would familiarize yourself with QuickStatements, which is a tool to do mass uploads. The challenge is mapping, so everything on Wikidata has to map to an existing item, except for titles and numerical values. All our object names have to be mapped, and all our artists. A lot of our Wikidata items do not have artists, because those artists do not yet exist on Wikidata. So right now I'm compiling all our artists that need Wikidata items, and I'm hoping the community will help and create those items. It can be very tedious, even for something as simple as circa dates. They have to be formatted just so, and you have to enter a qualifier. Very, very time consuming. We haven't done dimensions yet. Dimensions have just been added to our API as numeric values, so hopefully that's going to be a little easier. And then there are nuances that the schema doesn't accommodate, things like formerly attributed to. There's no way to enter that right now in Wikidata. Things like dates: we have tapestries where we have a date woven, we have sculpture with a date cast, we have negative photographs where we have a print date. There's no way to add that to Wikidata yet. We have complex weights. And we have Arms and Armor, with very complex dimensions. Again, the schema does not accommodate that. So I've been working with Andrew to try to propose new properties. It's not as simple as uploading a spreadsheet the way we're used to with TMS. It can be very tedious, but the more you familiarize yourself with these tools and the formatting they require, the easier it'll be. And it is kind of fun. I enjoy doing it.
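To make the mapping step concrete, here is a rough sketch of turning one collection record into tab-separated QuickStatements commands. The record, the object-name crosswalk, and the function name are invented for illustration. The property and item numbers used (P31 instance of, P3634 Met object ID, P571 inception, P1480 with Q5727902 for circa, Q3305213 for painting) are assumptions to check against Wikidata before relying on them, and this is not presented as the Met's actual workflow.

```python
def quickstatements_rows(record: dict, object_name_map: dict) -> list:
    """Turn one record into QuickStatements (version 1, tab-separated) rows.

    Raises KeyError when the object name has no mapped Wikidata item,
    because such values must point at an item that already exists.
    """
    name = record["object_name"]
    if name not in object_name_map:
        raise KeyError(f"no Wikidata item mapped for object name {name!r}")

    title = record["title"]
    object_id = record["object_id"]
    rows = [
        ["CREATE"],
        ["LAST", "Len", f'"{title}"'],          # English label
        ["LAST", "P31", object_name_map[name]],  # instance of
        ["LAST", "P3634", f'"{object_id}"'],     # Met object ID (assumed P number)
    ]

    # Year-precision date; "/9" is the QuickStatements precision code for year.
    date_row = ["LAST", "P571", f"+{record['year']:04d}-00-00T00:00:00Z/9"]
    if record.get("circa"):
        date_row += ["P1480", "Q5727902"]  # qualifier marking the date as circa
    rows.append(date_row)

    return ["\t".join(row) for row in rows]


# A made-up record and crosswalk, for illustration only.
record = {"title": "Example Painting", "object_name": "Painting",
          "object_id": "000000", "year": 1787, "circa": True}
for line in quickstatements_rows(record, {"Painting": "Q3305213"}):
    print(line)
```

The deliberate failure on an unmapped object name reflects the point made above: text values other than titles cannot be uploaded until a matching item exists.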
Unknown Speaker 1:35:57
We have two questions to follow up on that. The first one is from Liz Lee: how does the Met handle record updates and get that new information into Wikipedia and Wikidata? And I know you sort of touched on that. The second one, from Brenda, is: how did you deal with duplicates, existing records about objects already in your collection?
Unknown Speaker 1:36:21
I'll just answer the duplicate question. Because our object ID is already a property in Wikidata, when I try to add one that has been added in the past I get an error. It won't let me add something that has an existing ID. So that's good. I don't think we have any duplicates, unless object IDs have changed. Updates are sort of another holy grail that I hope to work with the community on, because we do have an API. So it could be possible for, you know, something to be built to call our API to get updates. But right now, we don't have an automated way to keep our data updated.
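The speaker says no automated update path exists yet, so the following is only a hypothetical sketch of what the comparison step of such a tool might look like. It assumes both sides have already been fetched and flattened into plain dictionaries; the function name and the field names are invented for illustration.

```python
def find_updates(museum_record, wikidata_snapshot, fields):
    """Return {field: (wikidata_value, museum_value)} where the two differ.

    Both arguments are hypothetical flattened dicts. Fields the museum
    leaves empty are skipped so that a blank never overwrites a value.
    """
    changes = {}
    for field in fields:
        ours = museum_record.get(field)
        theirs = wikidata_snapshot.get(field)
        if ours is not None and ours != theirs:
            changes[field] = (theirs, ours)
    return changes


if __name__ == "__main__":
    museum = {"title": "Example Cup", "date": "ca. 1750", "medium": None}
    wikidata = {"title": "Example Cup", "date": "1750", "medium": "silver"}
    print(find_updates(museum, wikidata, ["title", "date", "medium"]))
```

A real tool would still need a person or a bot policy to decide which differences are safe to write back.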
Unknown Speaker 1:37:02
Yeah, that's a great point from Jeannie. I think our long-term goal is to come up with a better kind of round-trip synchronization. It's helped a lot, and we'll talk about it in a few slides, that Jeannie now holds in the TMS database the Wikidata identifiers for the artwork, for depiction information, and for the artist. We did not have that two years ago when we started; it was kind of a one-way thing of Wikidata ingesting Met content. But now that TMS, on the left side, holds that Wikidata ID, it's really great to have synchronization that way, and we can solidify both sides much better. So we have a bot that kind of tries to do that on a periodic basis. We can probably do 80 to 90% of the synchronization easily, but then we have a lot of weird edge cases where we need to come up with some better solutions. Some other questions: are people thinking about a crosswalk between the Linked Art target model and Wikidata, so there's standardization? Yeah, I mean, that's a great question, and we are definitely looking into things like that as well. Yep. Okay, collaboration. So as we mentioned before, you know, finding new stories in things like the knowledge graph is a really great opportunity, and we have already discovered insights, because what we get out of it is like a fashion database and an arts database and a literature database and a database of biographies. It is allowing us to see connections we never really appreciated before. Then co-creation. So this is an example we talked about briefly last year at MCN, where we used the Met Museum's artwork keywords for machine learning. We trained a machine learning system with all the tags that the Met had added for their artworks. And we said, well, what happens if we feed the machine learning system a new painting it had never seen before, and see if we could predict what was in that painting?
And we made the job a lot easier, because we said paintings, not 3D artworks; 2D artworks only, probably paintings before 1925, because of public domain. And, you know, it's a much narrower domain than trying to match anything in the world. And we actually got some very nice results from this. So here's an example of what Liz was talking about before, like, can I just kind of relax and help Wikidata out? Yes, you can. So we actually turned this into a game. The recommendation from the AI, you know, was of uncertain quality, so we fed it into the game. And the game basically displays this to the user and asks: does this painting depict a tree? Because our AI thinks it depicts a tree. And all you need to do to be a useful contributor is to click on "depicts a tree" or "does not depict a tree", or you skip it if you're unsure. That's it. You just have three buttons on your keyboard that do this, or you can tap on your screen. I've done this waiting in line at McDonald's before; you just play this game. And it's great, because you can help improve the AI and add content. So here's an example of what we had in the lobby of the Great Hall at the Met. We had people come up who had never, ever edited Wikipedia, who didn't know what Wikidata is. We just had them play this game, and they were meaningfully adding statements to Wikidata simply by clicking on the green button or the blue button. And we actually had like 7,000-some judgments resulting in about 5,000 edits from our first test. And it did really well on things like tree, boat, flower, horse, soldier, house; so on kind of landscape painting features, it did really well. But it did not do well on gender determination, or on cats and dogs. Strangely enough, it didn't do that well on cats and dogs. I guess they can have all different positions and things like that. And, you know, we kind of felt like, oh, we really failed on gender determination.
But you know what's interesting? Google announced last year it was going to quit doing gender determination in image recognition because, they said, decontextualized, it does not make sense to take the pixels of an image and determine whether it's a male or female. It just doesn't make sense. And I think that's the right decision. It's something that everyone who deals in this space knows is the right move: to not try to predict with 90% certainty that's a boy, that's a girl, that's a woman, that's a man. It just doesn't make sense. Right.
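The three-button game described in this turn can be sketched as a small function that turns player judgments into candidate statements. This is an illustration under stated assumptions, not the game's actual code: the judgment dictionaries are hypothetical, P180 is assumed to be the "depicts" property, and the rule that only "yes" answers produce an edit is an assumption (it would be consistent with roughly 7,000 judgments yielding about 5,000 edits, but the talk does not spell it out).

```python
from collections import Counter

P_DEPICTS = "P180"  # assumed "depicts" property; verify on Wikidata


def judgments_to_edits(judgments):
    """Convert yes/no/skip judgments into (artwork, property, tag) triples.

    Each judgment is a hypothetical dict with the keys "artwork_qid",
    "tag_qid" and "answer". Only a "yes" becomes an edit; "no" and
    "skip" are counted but write nothing.
    """
    edits = []
    tally = Counter()
    for judgment in judgments:
        tally[judgment["answer"]] += 1
        if judgment["answer"] == "yes":
            edits.append(
                (judgment["artwork_qid"], P_DEPICTS, judgment["tag_qid"])
            )
    return edits, tally


if __name__ == "__main__":
    # The QIDs below are made-up placeholders, not real items.
    sample = [
        {"artwork_qid": "Q1001", "tag_qid": "Q2002", "answer": "yes"},
        {"artwork_qid": "Q1001", "tag_qid": "Q2003", "answer": "no"},
        {"artwork_qid": "Q1004", "tag_qid": "Q2002", "answer": "skip"},
    ]
    edits, tally = judgments_to_edits(sample)
    print(edits, dict(tally))
```

Keeping the tally alongside the edits makes it possible to see which tags players reject most often, which is the kind of feedback the speaker says helps improve the AI.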
Unknown Speaker 1:41:12
Andrew, we have a question from someone wondering where this game is. I think people would like to play.
Unknown Speaker 1:41:18
Yes. Well, the funny thing is, it was so successful that we exhausted the bucket of AI recommendations. We need to fire it up again to feed more artworks into it. We processed all the paintings we had, I think, from the Smithsonian American Art Museum, from the Rijksmuseum, and some others, the Cleveland Museum of Art. So we need to feed more in. Contact me and we can maybe think of more things to put in there. Maybe Jeannie and I can find some bandwidth to try to get more artworks processed by the AI, because it's quite interesting. But the funny thing is our communities are prolific. They burn through the game very fast, so we need more candidates in there. That's right. So yeah, it's a great comment in the chat, like, what is gender? Gender is overdetermined anyway. It's like, yeah, it's just a foolish exercise to try to do that. Okay, so we have contribution, collaboration, co-creation as kind of these three slices. And I think we're meaningfully doing it, not just in the open access slice and the collaboration-to-create-new-stories slice, but also in using AI and machine learning to come up with new recommendations we've not even thought of before, to create new tools. So I thought I'd show you just some of the dashboards and the stats that we have. You can click on these links and go to them yourself. We actually have little dashboards that track how complete the items in Wikidata from the Met are. And these are going to get better, as Jeannie said, now that they're structuring dimensions and keeping the artist IDs as not just strings, but things. We can start to do better in these areas of completing the properties for all these things in Wikidata. You can also go through and just look at the whole inventory, every single object we have in Wikidata from the Met, in a chart like this. And then we also have things like: what are the most popular artworks that are being described by Wikipedia and Wikidata?
Right now, these are the ones that rank near the top. Then we also have some stats that we often run on the most popular images and popular artists that are in the Met. And then I thought, Jeannie, you could talk about this. This is really cool, this kind of next step that we're doing, where now that the Met is ingesting and holding pointers to Wikidata specifically, there are all kinds of neat things that we can do here.
Unknown Speaker 1:43:28
Yeah, so during the past year, I imported all the Wikidata item QIDs for our objects, and we have about 22,000 objects that have Wikidata items. So I store that in TMS. I've also mapped our artists to Wikidata items. For that I used ULAN, because we had imported the ULAN IDs for all our artists that match to ULAN. I did that last year with OpenRefine. I queried all the names with a ULAN ID in Wikidata and then matched them to our existing ULAN IDs. So in one shot, I got about 12,000 names that now have Wikidata items. And then for our keyword tags, we have about 1,100 subject keywords. I mapped those to Wikidata items and stored them in TMS. And those have all been added to our API, which Andrew is showing right now. So we include the Wikidata URL and the AAT URL for the names, the objects, and our keyword tags. So hopefully people using our API can now use these links and then, you know, extend their queries to these other data sources.
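As a rough sketch of how someone using the API might pick up the links described above, the function below pulls Wikidata Q numbers out of one object record that has already been fetched and parsed as JSON. The field names (`objectWikidata_URL`, `artistWikidata_URL`, and `tags` entries with `term` and `Wikidata_URL`) are assumptions about the response shape and should be checked against the Met's current API documentation; the sample record and its Q numbers are made up.

```python
def qid_from_url(url):
    """Return the trailing Q number of a Wikidata URL, or None."""
    if not url:
        return None
    tail = url.rstrip("/").rsplit("/", 1)[-1]
    if tail.startswith("Q") and tail[1:].isdigit():
        return tail
    return None


def wikidata_links(record):
    """Collect the Wikidata identifiers held in one object record.

    `record` is a dict parsed from an API response; the keys read here
    are assumed names, not confirmed by the talk.
    """
    tags = record.get("tags") or []  # the field may be missing or null
    return {
        "object": qid_from_url(record.get("objectWikidata_URL")),
        "artist": qid_from_url(record.get("artistWikidata_URL")),
        "tags": {
            tag["term"]: qid_from_url(tag.get("Wikidata_URL")) for tag in tags
        },
    }


if __name__ == "__main__":
    sample_record = {  # made-up values for illustration only
        "objectWikidata_URL": "https://www.wikidata.org/wiki/Q1234567",
        "artistWikidata_URL": "https://www.wikidata.org/wiki/Q7654321",
        "tags": [{"term": "Trees",
                  "Wikidata_URL": "https://www.wikidata.org/wiki/Q2222"}],
    }
    print(wikidata_links(sample_record))
```

With the Q numbers in hand, a query against another data source can start from a precise identifier instead of a name string.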
Unknown Speaker 1:44:45
Yeah, this is great. Just to see all these connections now, in a way that we did not see just two or three years ago, is really inspiring. And there are all these neat things that we can do for making sure data is synced correctly and also connected to other databases in ways that we have never seen before, which we couldn't do if we didn't have these, you know, precise Q numbers that are being held on the Met side. So this has been available, I think, what, Jeannie, about two months now that you've been feeding this via the API?
Unknown Speaker 1:45:20
Um, yeah. So in summary, we added the Wikidata and AAT URLs to the API.
Unknown Speaker 1:45:28
Great. Yeah. So you can hit the API with that URL there. It's also on GitHub: there's a whole dump of the wiki, I'm sorry, the Met database in a CSV file. So you can download all 600,000 rows of their TMS database and do wonderful things. And that's kind of how we did the AI project, by going through all that content there. And as I took a look at that, and we actually...
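A small sketch of how the CSV dump might be narrowed to the kind of candidates the AI project used (public-domain paintings before 1925). The column names (`Object ID`, `Is Public Domain`, `Classification`, `Object End Date`) and the value spellings are assumptions about the file layout and should be checked against the header row of the actual dump; the sample rows are invented.

```python
import csv
import io


def candidate_paintings(csv_file, cutoff_year=1925):
    """Yield object IDs of public-domain paintings dated before the cutoff.

    `csv_file` is any open text file or file-like object. Column names
    are assumed, not confirmed by the talk.
    """
    for row in csv.DictReader(csv_file):
        if row.get("Classification") != "Paintings":
            continue
        if row.get("Is Public Domain") != "True":
            continue
        try:
            end_year = int(row.get("Object End Date", ""))
        except (TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric date
        if end_year < cutoff_year:
            yield row["Object ID"]


# Invented rows standing in for the real file.
SAMPLE = """Object ID,Is Public Domain,Classification,Object End Date
1,True,Paintings,1889
2,True,Paintings,1950
3,False,Paintings,1700
4,True,Sculpture,1800
5,True,Paintings,
"""

if __name__ == "__main__":
    print(list(candidate_paintings(io.StringIO(SAMPLE))))
```

Reading row by row with `csv.DictReader` keeps memory use flat, which matters for a file of several hundred thousand rows.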
Unknown Speaker 1:45:57
I just want to say quickly that we're trying to be strategic in how we decide what to add to Wikidata. So as Andrew showed in the chart, we started with our printed guidebooks, which are sort of like the top highlights. Then we have our website highlights, which is about 5,000 objects, and then our timeline objects, which is about 8,000. We're working on objects on view. I've also tried to add records where I think there are gaps in the Wikipedia ecosystem. As Andrew mentioned at the very beginning, like works by women: I added all our works by female artists, I added all our works by Black artists, and I added a huge bunch of our costume records, because there's very little fashion in Wikipedia, but it's very, very popular. We've had two edit-a-thons dedicated to fashion, and they've been very well attended and very well received. So I'm also trying to fill in the gaps where I know they exist in the wiki ecosystem at large. So we're trying to be strategic. It's not like we're going to recreate our online collection on Wikidata. That is not the point. We're not going to have all our spoons and all our shoes, you know, but we do want to contribute records that we think people should have access to, that should get better exposure, and that will contribute to open knowledge.
Unknown Speaker 1:47:15
Deborah House has a question that follows up on that, thinking about the MCN Ignite session on "your data are racist": are there things that we in the audience can do to collectively improve these amazing wiki platforms to be more inclusive?
Unknown Speaker 1:47:33
Yeah, that's a great question. And we are trying to do a lot in this area, not just on racism, but on sexism. So I think Wikidata is really useful in that it tries to tear down that super high bar that we had in Wikipedia, because that super high bar meant that we only had 15% of all biographies in Wikipedia about women. Right now we've improved it quite a bit, to like 18 and a half percent. That's a lot in relative terms, but it's still far short of what we'd like to see. Still, that's a net gain of 200,000 women's biographies over the last five years. That's really great, but we still have a long way to go. Wikidata gives us an opportunity to do a little bit better than that, because the notability bar is not that high; we can be more inclusive there. We have run into a lot of issues as well, but they're the same issues that you see all over the place, like how do we model ethnicity correctly, or properly, in Wikidata to find all African American artists? You know, that's sticky. And then even just the gender field that we have is really messy and cringeworthy. So it's, like, okay 85% of the time, but cringeworthy another five or 10% of the time. So, yes, we need expertise from the GLAM community to help in this area as well. So I think the decolonizing museums side is really great to help inform some of the stuff that we're doing in Wikidata, and also the issues with ethnicity and modeling; we need more help in that area, certainly. Oh, I don't know if we have time for Santa Anna's wooden leg, but I thought I'd just mention it. We don't have time to do it, but I'll just point out that this is interesting: there probably is not enough information about Santa Anna's wooden leg, General Santa Anna's, for it to have a Wikipedia article. But it certainly deserves a Wikidata entry. And believe it or not, there is no Wikidata entry for Santa Anna's wooden leg.
In fact, there's not a Wikidata item about the museum that holds Santa Anna's wooden leg. And you might know this is kind of a funny story, in that this was, I think, what was it, taken by some regiment of soldiers that brought it back to Illinois, and it's now sitting at the Illinois State Military Museum, a fairly small museum, and this is by far the most famous artifact they have. I cannot find an open source or open access picture of Santa Anna's wooden leg. It's not on Commons. There's no Wikidata item for the leg. There's no Wikidata item for the Illinois State Military Museum as we speak. So I thought if we had time we could do this together. But I think we're running a little bit out of time, and I want to leave time for questions. So this remains an exercise for the reader if you want to try this, or I might just do it later this week. But I thought this was really interesting because, you know, Texas has been trying to get this leg from Illinois for the longest time, because it's got a lot more relevance to Texas history than Illinois history. But the people in Illinois are like, no, we're keeping this leg. And it's been kind of a funny story. So just trying to model this in Wikidata would be quite interesting. So just to sum up, and I'd love to have some time for questions: Wikidata is still an early work in progress. Even though it's now, like, years old, many areas are quite bare. So there are still a lot of major issues with it, as we just talked about before with modeling ethnicity, gender, things like that. So what we'd like to do is to see more use of Wikidata content. You're starting to see this in Wikipedia already. Believe it or not, there has been some pushback on incorporating Wikidata content in Wikipedia editions. So the information boxes you see on the right-hand side of Wikipedia articles are starting to use Wikidata a lot more.
But interestingly enough, English Wikipedia has been pushing back on this, because they're like, we don't need this Wikidata project, we can do it all on our own. And it's the smaller languages that are embracing Wikidata, which, looking back at it, is probably not surprising. But it's quite interesting to see that the largest, most entrenched Wikipedia editions are the most resistant to using Wikidata. So if you're going to summarize Wikidata, I sometimes call it internet duct tape, but in the most affectionate way. Not that it's a hack, but it's like the thing that ties things together. It is becoming a hub for a lot of folks. Especially if you're an archivist, there's a lot more activity now with pointing Wikidata to the archives of, like, Eisenhower, or famous authors, or academic libraries. So it's quite interesting to see how that property is now being used a lot more. So join us in doing a lot more of these experiments. Contact myself or other folks at the... let's get this real quick.
Unknown Speaker 1:51:59
The Wikimedia Foundation: there is this site that shows kind of the workflow for data and the partnership. So if you're interested in trying your hand at contributing to Wikipedia and Wikidata, this is kind of a nice workflow for trying some of this. And there's the link right there on slide 115. Well, there are 115 slides in this deck, and you can go and click on there, or contact me and I can point you in the right direction. We also have Wikidata in One Page. And then in terms of tangible next steps: I mentioned that Fiona Romeo is now a full-time Senior Program Manager for GLAM and culture at the Wikimedia Foundation. She's no stranger to MCN, so that's great. You can contact her or myself. I'm not with the Wikimedia Foundation, but we tend to do community things and official things in different ways. There's also a project called FindingGLAMs that tries to document every single library, archive, and museum around the world. So feel free to get involved with that. We also have a Wikimedians in Residence Exchange Network. So if you're interested in a Wikimedian in residence, or some kind of part-time person, even on a volunteer basis, to help with some of this stuff at your organization, feel free to contact me and I can put you in touch with them. There's also the OpenGLAM movement. There's a lot of stuff being done with OpenGLAM and Creative Commons; they just had a hackathon a few months ago. And then also, there's a great project on wiki, Women in Red. And this is associated with the Smithsonian's new American Women's History Initiative as well, trying to address the gender gap problem and trying to get more and more biographies of women into Wikipedia. It's really moved the needle quite effectively, but we still have a long way to go here. So these are just some very simple ways to get involved. Or just contact me and I can point you in the right direction. So we do have these five minutes for questions or discussion on any of these things.
A lot to digest, I know. But I thought it was important to kind of give you a foundation for Wikidata, and give you some examples that might inspire you. So for any questions, you can either type directly in the chat or you can get recognized by audio and video. I'd love to chat with you.
Unknown Speaker 1:54:08
Oh, there is an MCN 2020 Wikidata channel. Oh, that's great. I've got to go join that.
Unknown Speaker 1:54:17
Not yet. But yeah, it'll be in about
Unknown Speaker 1:54:22
and we'll be soon. Happy to join that.
Unknown Speaker 1:54:24
Fantastic. And I know we had a couple of people very interested in seeing that demo, which I know we don't have time for, but they can reach out to you there on the Slack.
Unknown Speaker 1:54:35
For Santa Anna's leg?
Unknown Speaker 1:54:37
For Santa Anna's leg, or just any questions they have on how to get involved that they might not have caught during this.
Unknown Speaker 1:54:44
Yeah, yeah, reach out to me. There's my email and my handle on almost every social media platform. Thanks, Danielle. Thanks, everyone. Yeah, feel free to let me know: too fast, too slow, more of this, less of that. We did a brief Wikidata tutorial two years ago, but this goes further in terms of talking about real collections data.
Unknown Speaker 1:55:10
Thanks, Allegra and Emily. And I might do the Santa Anna's leg exercise and post the results in the Slack channel.
Unknown Speaker 1:55:35
Oh, I forgot to mention we do have a Facebook group. So if anyone is on Facebook, the Facebook group is fairly active and friendly. So there is a Wikidata GLAM group on Facebook as well.
Unknown Speaker 1:55:45
Do you have the link that you could share in the chat for that?
Unknown Speaker 1:55:49
For Facebook. Yeah, I can do that. Oops, I need to go public here.
Unknown Speaker 1:56:12
We have one quick question with our three minutes, from Brenda. She is focused on dupes and wants to know: are there any ways to consolidate existing dupes within Wikidata?
Unknown Speaker 1:56:25
Yes, there absolutely are. We have several tools that do deduplication or merging. So we do have what are called maintenance queries that try to find duplicates. And then if you are sure it's a duplicate, you can actually go in here and choose the merge option here. I'm not going to merge this with something else, but you can actually take two Wikidata items and merge them, and then one will be redirected to the other one. And, yeah, that's the cleanest way that we have to do merges. But we need to make sure they are, in fact, exactly the same. Sometimes they're not. I mean, sometimes they're very, very close, but they should be distinct. And that's oftentimes a point of debate. We actually do have a whole section for proposing merges if they're complex. Oh, another thing I could point out while we're waiting here: if you're interested, just go to Wikidata's Project chat. So it's this button, the third one down. And this is also another great way to get started: just read the chat that's here. Anyone can post a question or make a request here. And it's pretty friendly here. It's English, mostly. So you can go in there, go to Project chat on Wikidata, and start conversing with folks.
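One way to shortlist possible duplicates before using the merge option is to group items that share the same external identifier, such as a museum object ID. The sketch below assumes the (QID, external ID) pairs have already been exported from Wikidata by some other means; the function name and the sample values are hypothetical. A shared ID only flags a pair for review. As noted above, items that look very close may still need to stay distinct.

```python
from collections import defaultdict


def merge_candidates(items):
    """Group QIDs by external ID and keep the IDs claimed by several items.

    `items` is an iterable of (qid, external_id) pairs. The result maps
    each shared external ID to the sorted list of QIDs that carry it.
    """
    by_external_id = defaultdict(list)
    for qid, external_id in items:
        by_external_id[external_id].append(qid)
    return {
        external_id: sorted(qids)
        for external_id, qids in by_external_id.items()
        if len(qids) > 1
    }


if __name__ == "__main__":
    # Made-up placeholder identifiers.
    pairs = [("Q9001", "436535"), ("Q9002", "11417"), ("Q9003", "436535")]
    print(merge_candidates(pairs))
```

The output is a review list for a person to check, not a list of merges to run automatically.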
Unknown Speaker 1:57:42
All right, well, I'm going to say thank you to everyone, especially to Andrew for this amazing presentation. This will be recorded, and if you were at the conference, you will have access to this recording as soon as we release it. We promise to email and message when it is ready, so you can ask, but it won't make it go faster. Thank you again so much, Andrew, and we will see you on Slack.
Unknown Speaker 1:58:09
Great. Thanks, everyone. See you on Slack.