In this episode of K9 Conservationists, Kayla speaks with Ellen Dymit from Oregon State University about Metabarcoding, SNPs, and microsatellites.
Links Mentioned in the Episode:
Science Highlight: Training of Ecological Detection Dogs for Wolf Scat (Canis lupus)
You can support the K9 Conservationists Podcast by joining our Patreon at patreon.com/k9conservationists.
K9 Conservationists Website | Course Waitlist | Merch | Support Our Work | Facebook | Instagram | TikTok
Transcript (AI-Generated)
Kayla Fratt 00:01
Hey all, I’m just dropping in to the start of this episode with an ask. K9Conservationists is a nonprofit, and we are heading into the holiday fundraising season. I know there are so many worthwhile causes to support, and times are tough, but we do have to ask. So this year, our goal is to raise just under $5,000, which would get us a new GPS collar to make sure that all of our dogs have their own GPS collars for our surveys, and cover my travel to South Africa for the African Canines in Conservation Conference, which is hosted by the Endangered Wildlife Trust. I’ve been invited to be one of the keynote speakers, and I’m so, so, so excited, but it’s going to be really expensive to get me there, and we want to make sure that I can do this for free, rather than charging Endangered Wildlife Trust or raising the prices of this conference, which is aimed at helping African canine handlers get together and learn. We don’t want to be charging them for my, you know, very expensive plane tickets. So if you find the content that we put together on these episodes valuable, I really hope that you’ll consider donating, which you can do at k9conservationists.org. Your donations are tax deductible and will make it possible for me to travel to South Africa, again, at no cost to our hosts. And then, aside from being a keynote speaker at this conference, I would also be able to provide some one-on-one mentoring and shadowing for several different teams that we’ve worked with remotely for the past several years, which is really, really invaluable. I am so excited about this opportunity to provide free capacity building and support to these programs, but we do need your help to pull it off. Again, you can donate using the big green donate button on our website, k9conservationists.org. Thank you so much. And here’s the episode.
Kayla Fratt 01:53
Hello and welcome to the K9Conservationists podcast, where we’re positively obsessed with conservation detection dogs. Join us every other week to discuss detection training, canine welfare, conservation biology and everything in between. I’m Kayla Fratt, one of the co-founders of K9Conservationists, where we train dogs to detect data for land managers, researchers, agencies and NGOs. Today, I’m super stoked to be talking to Ellen Dymit from Oregon State University about what I’m affectionately calling genetic gobbledygook. So we’re going to be going into all of the stuff that happens to our scat samples and all the stuff I need to learn throughout this PhD process. And I figured that instead of just asking Ellen all these questions over our dinner table, because we now live together, we’re going to do it around the dinner table, but we’re going to share it with all of you. So Ellen, you’ve been on the podcast twice before, but why don’t you just give us a reminder of who you are. You know, all your accolades. Tell us all the fancy parts about you, maybe not all of them, because there’s too many, and then we’ll get into questions.
Ellen Dymit 02:54
Thank you, Kayla. I’m Ellen Dymit. I am a professional roommate to Kayla now, and also a professional poop scientist. I’m a fourth-year graduate student in wildlife sciences at Oregon State University, working also with Dr. Taal Levi. And I work in both Alaska and in Guatemala, studying carnivore diets, among other things.
Kayla Fratt 03:19
Cool. And so right before we do get into this episode, we do have a science highlight. Now that I am properly into PhD-ville, I’m doing a much better job of reading the scientific papers needed to prepare science highlights. So this week, I read a paper titled “Training of Ecological Detection Dogs for Wolf Scat (Canis lupus),” which was written by Hilde Vervaecke, Ellen Van Krunkelsven and Koen Van Den Berge. These folks came out of Belgium, and I really apologize for anything I did to their last names, in particular. This was published in May 2021, in the Bulletin of University of Agricultural Sciences and Veterinary Medicine Cluj-Napoca, Animal Sciences and Biotechnologies. So this paper is going to be relevant for some obvious reasons. Basically, they had four experienced detection dogs, two Belgian Malinois, a flat-coated retriever and an English Springer Spaniel, that all were operational or in training for various conservation tech targets, and they started training these dogs to find wolf scat. Over the course of early training, three of the four dogs showed markedly low enthusiasm and significant aversion to the wolf samples, to the point that the owners (these were kind of community science participants) decided not to continue the training for fear of kind of ruining the dogs’ performance in future conservation targets. So the methods of the study are going to be pretty familiar to anyone who has done imprinting using multiple box setups. So the dogs are kind of introduced to the target, given a positive reinforcer, then they’re expected to alert to the target, and then they started having multiple targets, or multiple blanks in identical boxes, with one containing a wolf scat, and then started adding distractions.
And what is really interesting about this particular paper is that very early in this study, three of the four dogs showed such a strong aversion to the wolf scat, as I said, that they were unable to continue, and one dog, the flat-coated retriever, did continue on with the study and was able to kind of pass through all of the sensitivity and specificity training. But because Barley is going to be getting trained on wolf scat here very soon, this was something that I was reading quite closely, and I have a couple ideas for how we’re hopefully going to have better than a 25% chance of success in this training protocol. And, yeah, well, I’m just going to read a couple little notes here. So, Malinois one. They write, quote, “The aversion remained in a lineup with two or three pots. She preferred to point to all of the other samples instead of the wolf scat. She systematically did not approach the wolf sample, which can be read as an alternative indication in itself, and fixating on the wolf sample improved with an extra toy as a reward. After several trials of brief corrective pointing, she started to point directly at the wolf sample, but still keeping more distance than normal during correct than a normal positive fixation.” So just as an example of the sort of aversion that they were dealing with. And again, these are experienced, highly motivated detection dogs. So we’ll be sharing more about our progress with Barley once I’m getting into it.
Kayla Fratt 06:41
But without further ado, let’s get on to our interview with Ellen Dymit. So Ellen, why don’t we start out with just a couple definitions. So the three that I hear kind of tossed around most often in lab meetings that still have me really confused are metabarcoding, microsatellites and SNPs. So these are all genetic things that we can get out of poop somehow. What are they?
Ellen Dymit 07:08
All right, very good question. I remember being new in the lab too, and confused by that terminology. And I’ll start with metabarcoding, which is perhaps the most confusing, because there are a lot of different things that we might call a barcode in the context of DNA, but when we talk about metabarcoding in our lab, we are typically talking about a genetic method by which we can read a specific region of the genome and use that sequence of nucleotides in the DNA to differentiate species from one another.
Kayla Fratt 07:46
So I’ve got follow-up questions already. So when you’re doing this, gosh, I don’t even know how we’re going to avoid making this just an absolute nightmare technologically. But so, you know, there’s like, a region that you might look at to tell the difference between, like, a margay and an ocelot. Is that what we’re kind of talking about with metabarcoding? Ellen is nodding. We’re sharing a mic, so we’ll go forward with that. So how do you know that that’s what you’re reading? Because I’m assuming, basically, what we’ve got is something that’s reading like your A’s and T’s and C’s and G’s, and they’re in the specific order, and that tells you whether it’s margay or ocelot. Yeah, do you even know this? How does it actually know where to read and what to be looking for, as far as, like, a copy?
Ellen Dymit 08:34
Yeah, really good question. That’s kind of the heart of it. So there are a lot of different regions of the genome that we can look at. Obviously, genomes are very, very large for most species, and we design one of these regions, or molecular markers, as we would also call them, for a particular purpose. So in the context of ocelots and margays, which are two congeneric cats that live in the Neotropics, we want to find a region of the genome that is different for ocelots and margays, but the catch is that that region needs to be flanked by two other regions that are the same between the two of them. So like you said, if you think of the genome as a string of A’s, T’s, G’s and C’s, you need to find an area, you know, maybe 100 or 200 base pairs long, that has variation between your target species in the center, but these conserved regions on either side. And the reason that we need these conserved regions is that we target specific parts of the genome with primers, which are short sequences that bind to those conserved regions.
Kayla Fratt 09:54
Oh, my God. Okay. Wow, that helped so much already. So to maybe oversimplify it: if we’re looking at like a sentence where we say “the black dog is running,” what we might be looking at is the difference between “the black dog is running” and “the black cat is running.” And that tells us, you know, the noun in that sentence, or in this case, the meta, the barcoding.
Ellen Dymit 10:19
Yeah, I think that’s a really good way of explaining it. So you would need to design primers that bind to the words “the” and “running.” And then those primers would amplify, via PCR, the region in the middle. And then when we read later on the outcome of that sequencing, we can see the difference between “dog” and “cat.”
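As a rough illustration of the primer idea in this exchange, here is a minimal Python sketch that pulls out whatever sits between two conserved flanking sequences. The function name `extract_amplicon` and the toy DNA strings are made up for illustration; real PCR primers bind to opposite strands and tolerate mismatches, so this is only the sentence analogy written as code, not a lab or bioinformatics workflow.

```python
def extract_amplicon(sequence, forward_primer, reverse_primer):
    """Return the text between two conserved flanking regions, or None.

    Hypothetical helper: it uses exact string matching on one strand,
    which is a simplification of how primers behave in real PCR.
    """
    start = sequence.find(forward_primer)
    if start == -1:
        return None  # forward "primer" does not bind anywhere
    start += len(forward_primer)
    end = sequence.find(reverse_primer, start)
    if end == -1:
        return None  # reverse "primer" does not bind downstream
    return sequence[start:end]


# The sentence analogy from the conversation.
dog_sentence = "the black dog is running"
cat_sentence = "the black cat is running"
dog_region = extract_amplicon(dog_sentence, "the ", " running")
cat_region = extract_amplicon(cat_sentence, "the ", " running")

# The same idea with invented DNA: identical flanks, variable middle.
FORWARD = "ACGTAC"  # made-up conserved region
REVERSE = "TTGACC"  # made-up conserved region
species_one = "GG" + FORWARD + "AATCCGA" + REVERSE + "AA"
species_two = "GG" + FORWARD + "AAACCGT" + REVERSE + "AA"
barcode_one = extract_amplicon(species_one, FORWARD, REVERSE)
barcode_two = extract_amplicon(species_two, FORWARD, REVERSE)

print(dog_region, "|", cat_region)
print(barcode_one, "|", barcode_two)
```

Because both "species" share the same flanks, one pair of primers recovers a different middle region from each, which is the part used to tell them apart.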
Kayla Fratt 10:43
Cool. Okay, yeah, that helps a ton. And for everyone at home, I am really not playing along with this. I genuinely don’t know what any of this stuff is, so y’all are learning with us. So next up, SNPs. So I know that’s a single nucleotide polymorphism, and I maybe could guess what each of those individual words means. But what does that mean? And kind of, how is that related to metabarcoding in any way? Is it not? If not, what does it do instead?
Ellen Dymit 11:16
Yeah, so definitely related to metabarcoding in sort of a nested way. You can think of SNPs as sort of the basic unit that comprises genetic variation in organisms. So if we break down the acronym: single nucleotide. So DNA is comprised of nucleotides, among other things, and the, you know, A’s, T’s, G’s and C’s are nucleotides. And when we talk about a SNP, we talk about a specific locus, which is just a location on the genome, for which that nucleotide is variable, either within an organism, between sister chromatids, or among organisms.
Kayla Fratt 12:12
Okay, so is this something that you could use to identify ocelot A versus ocelot B? Or is this still more on the species level, or does it depend?
Ellen Dymit 12:22
So SNPs, you can kind of think of as the most basic unit that comprises genetic variation among individuals within a population. So if you think of it in the context of dogs, for example, you have, do they know Niffler and Barley? Yeah, all right, you have Niffler’s genome, and you have Barley’s genome, and they’re both Border Collies, so a lot of their genome is going to be similar, but they have different colored fur, they have different personalities, they have different sizes, etc., and all of those differences are conveyed in their genetic code somehow. So if you were to look at, for example, the 12S region of the mitochondrial genome of Niffler and Barley, you might find that in one spot, specifically, Niffler’s DNA reads A, A, T, C, C, and Barley’s DNA reads A, A, A, C, C. That difference between the A and the T at the same location, or locus, is a SNP, so single nucleotide being the A or the T, polymorphism being the difference.
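The Niffler and Barley example can be written out as a small Python sketch. It assumes the two sequences are already aligned and the same length, and the function name `find_snps` is hypothetical; it simply reports every position where the two strings differ.

```python
def find_snps(seq_a, seq_b):
    """List (position, base_in_a, base_in_b) wherever two aligned sequences differ.

    Assumes both sequences are already aligned and equal in length.
    Positions are counted from 0.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, base_a, base_b)
        for position, (base_a, base_b) in enumerate(zip(seq_a, seq_b))
        if base_a != base_b
    ]


# The toy reads from the conversation.
niffler = "AATCC"
barley = "AAACC"
snps = find_snps(niffler, barley)
print(snps)  # [(2, 'T', 'A')]
```

The single entry in the result is the third base (index 2), where one dog has a T and the other has an A.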
Kayla Fratt 13:38
Cool, yeah, that’s super helpful. And I know, I mean, for the dog nerds in our audience, and particularly our border collie nerds, a lot of us are, I think, familiar with thinking about, like, alleles, as far as color genetics. So like, that’s something that I am very far from an expert on. But thinking about, like, Niffler as a merle, and the merle is dominant. So people know those things, hopefully, and that’s a specific allele, and the length of that can determine the amount of merling that the dog has in their coat type, and then also too much of that can bleed over in other things. So that’s a much larger change than just a SNP, although I think there might be other coat and eye color things that can be just a single change. And so this might be a dumb question, but when you change, so I forget, it was A, A, A, C, C is what we said for Barley. That means that that third A is now opposite a T. So why would we call it a single nucleotide and not, it’s a base pair change as well, generally, right? Because the base pair is that pair that are always opposite each other, right?
Ellen Dymit 14:50
Yes. However, typically, when we use the sequence data generated for our SNP application in the lab, we are only using the forward-read sequences, so we’re not looking, you know, the reverse complement of whatever nucleotide you’re looking at is always predictable because they only pair with each other. So that’s kind of a given, and you only really need to look at one strand. But yeah, the allele thing: in the context of SNPs, we talk about SNPs as being monoallelic, biallelic, triallelic. So sometimes a SNP can have two versions. So a SNP that could either be a T or an A at the same locus among organisms, we would say is a biallelic SNP. Sometimes it can have three or even four. So that would be respectively a triallelic or a quadriallelic SNP. And when we design SNP panels for different applications, we’re typically looking for biallelic SNPs. So two different versions.
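Two ideas from this answer lend themselves to a short Python sketch: the opposite strand is fully predictable from the one you read, and a SNP is labeled by how many versions are seen at its locus. The function names `reverse_complement` and `allele_class` are made up for illustration, and the labels follow the wording used in the conversation.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(sequence):
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def allele_class(observed_bases):
    """Label a locus by how many distinct bases were observed across samples."""
    labels = {1: "monoallelic", 2: "biallelic", 3: "triallelic", 4: "quadriallelic"}
    distinct = len(set(observed_bases))
    if distinct not in labels:
        raise ValueError("expected between one and four distinct bases")
    return labels[distinct]


# Barley's toy read from earlier: the other strand adds no new information.
barley_forward = "AAACC"
barley_reverse = reverse_complement(barley_forward)

# Bases seen at one locus across five hypothetical individuals.
locus_calls = ["A", "T", "A", "A", "T"]
locus_label = allele_class(locus_calls)

print(barley_reverse, locus_label)
```

Applying `reverse_complement` twice returns the original read, which is one way to see why only one strand needs to be examined.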
Kayla Fratt 16:01
Gotcha. So it sounds like these SNPs are mostly, you use those to differentiate between individuals, and the metabarcoding is more between species. Is that kind of broadly accurate? I’m sure all of this is roughly “it depends.”
Ellen Dymit 16:14
So kind of the first thing I said about SNPs, about them being the basic, you know, functional unit comprising genetic variation. It means they’re sort of nested within metabarcoding, in the sense that what makes that barcode region variable among species is SNPs, but it’s an assemblage of SNPs. It’s not just one or, you know, an individual locus. You’re looking at a series of SNPs which comprise the difference between the species. And in the context of, for example, puma and jaguarundi on the 12S region, you actually only have a one base pair difference, so a single SNP that distinguishes those two species there, which generally means that’s not a very useful marker, or reliable marker, to use for differentiating them.
Kayla Fratt 17:04
Okay, yeah, I think I’m following. So then the third thing. So I said we were going to talk about SNPs, metabarcoding, and what was the third thing I said we were going to talk about? Microsatellites. Yeah, what’s a microsatellite?
Ellen Dymit 17:17
So when DNA is replicated, sometimes there are errors in that replication that get repaired by the internal repair mechanisms in the DNA replication process. And the kind of relic from these replication errors is repeated sequences of DNA. So within the DNA, you might have a region that suddenly goes AT, AT, AT, AT, AT, AT for, you know, 100 bases, 200 bases, however long. That would be a bi-tandem repeat. Or you might have a little bit of a longer one: a tri-tandem repeat would be like ATG, ATG, ATG, for X number of repeats. That’s a microsatellite, and the number of times that those are repeated can be used, in some cases, to distinguish individuals from one another.
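To make the repeat-counting idea concrete, here is a minimal Python sketch that counts how many back-to-back copies of a motif appear in a sequence. The function name `longest_tandem_repeat` and the sequences are invented for illustration; this only shows the concept of comparing repeat counts between individuals, not how microsatellites are scored in a lab.

```python
import re


def longest_tandem_repeat(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    return max((len(run) // len(motif) for run in runs), default=0)


# Two hypothetical individuals that differ only in how often "AT" repeats.
individual_one = "GGC" + "AT" * 5 + "CCG"
individual_two = "GGC" + "AT" * 8 + "CCG"
repeats_one = longest_tandem_repeat(individual_one, "AT")
repeats_two = longest_tandem_repeat(individual_two, "AT")

# A three-base motif works the same way.
tri_repeats = longest_tandem_repeat("TT" + "ATG" * 3 + "CA", "ATG")

print(repeats_one, repeats_two, tri_repeats)
```

The flanking bases are identical in both toy individuals; only the repeat count (5 versus 8) separates them.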
Kayla Fratt 18:16
Excellent. So I think now that we’ve kind of laid out, hopefully, some of the harder parts of this podcast, what I’m thinking we can do now is talk about some of the analysis you’re doing on diet and ID of your wolves in Katmai, and use that as an example to talk about basically what happens after we’ve collected these scat samples. Not just what we can do with them, because we’ve done episodes on that, but kind of how it works and how all of these terms that we’ve just learned play into that. So tell us a little bit about the ecosystem you’re working in in Katmai and what your study questions are, and then how we’re using these things. And this might now just be like a half-hour response. So go ahead.
Ellen Dymit 18:58
Well, it’s good practice for me to try and keep it simple, because this is like my elevator pitch that I should be prepared to give. So my work in Katmai is focused on the diet of the coastal wolf population that’s living there. It was sort of inspired way back in 2016, when one of the rangers there, Kelsey Griffin, who is my main collaborator for this work, saw a wolf carrying a sea otter carcass in its mouth on the beach, and thought that was really interesting. How are wolves getting sea otters, and what other marine resources might they be taking advantage of in this system? That sort of inspired this super intensive field sampling effort in 2021, where we camped on the coast for four months and walked around looking for wolf poop and wolves, and set a bunch of trail cameras to get video of them as well, and we ended up collecting over 1000 scats. Turned out that not all of it was wolf, but a lot of it was. And with those wolf scats, we are performing, or we have performed already, two of those sort of genetic methods that we went over at the beginning. The ones that we have used are DNA metabarcoding, specifically of the 12S region on the mitochondrial genome, which is used commonly to differentiate vertebrates from one another, and the SNPs, which we use for genotyping, which is assigning individual identities to the wolf samples, so that we know which individual wolves produced which scats.
Ellen Dymit 20:37
So for the metabarcoding, we take a scat and we extract the DNA from it using a kit in the lab. And then I won’t go through the whole lab workflow right now. We can maybe talk about some parts of it, but basically, you end up submitting a prepared sample to a sequencing machine, which then reads the DNA, and then you are able to look at the DNA sequences that were produced from that scat sample and see which animals are present in it. For the SNP genotyping, we have a panel of several SNPs. I think there’s 32 in our panel that has been optimized by a postdoc in our lab, Charlotte Eriksson, that was designed to be able to differentiate individual wolves specifically. So for this, the lab prep is largely similar. There are some differences, but same idea, where we end up with sequences for the target regions that contain these SNPs that we are interested in. And then together, the SNPs sort of comprise a genetic fingerprint, you know, a series of individual base pairs that are different across all the individuals in our population, or largely different. And then we’re able to use that kind of sequence of SNPs as a code to differentiate individual wolves, and that allows us to assign identity to the scats, which, combined with the diet data, can tell us things like individual variation in wolf diets. Like, are, you know, the breeding male and female of this pack eating something different than, you know, the pack in the adjacent bay, or other members of their pack, or whatever.
Ellen Dymit 22:28
It also can tell us site- and pack-level differences in wolf diet. So are the wolf samples collected in this area of the park reflecting a completely different dietary composition than in a location 20 kilometers away? And it can also, perhaps most importantly, give us a minimum population estimate of wolves living on the coast. And of course, that’s limited by our sampling effort. We’re only able to, you know, use as many scats as we can find, which, you know, is why conservation detection dogs are super important, wink, wink, but it is really useful for the park to know the size of their wolf population.
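The link between SNP fingerprints, scat identity and a minimum population count can be sketched in a few lines of Python. Everything here is invented for illustration: the scat IDs, the six-base fingerprints (the panel described above has around 32 SNPs), and the function name `group_scats_by_genotype`. It also assumes genotypes match exactly, whereas real data would need to allow for genotyping errors and missing loci.

```python
from collections import defaultdict


def group_scats_by_genotype(scat_genotypes):
    """Map each distinct SNP fingerprint to the scats that carry it.

    Simplifying assumption: two scats come from the same wolf only if
    their fingerprints are identical.
    """
    individuals = defaultdict(list)
    for scat_id, genotype in scat_genotypes.items():
        individuals[genotype].append(scat_id)
    return dict(individuals)


# Hypothetical fingerprints: one base per panel SNP, in a fixed order.
scat_genotypes = {
    "scat_001": "ATTAGC",
    "scat_002": "ATTAGC",  # same fingerprint as scat_001
    "scat_003": "TTAAGC",
    "scat_004": "ACTAGG",
}

individuals = group_scats_by_genotype(scat_genotypes)
minimum_wolves = len(individuals)

print(minimum_wolves)  # 3
```

The count is a minimum because it can only include wolves whose scats were found and successfully genotyped.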
Kayla Fratt 23:09
Yeah, definitely. Unlike that, the environment that I’m going to be working in this summer with Barley is not a protected area, so there’s quite heavy harvest and take of this wolf population. So knowing the minimum numbers is really important for the government and Alaska Fish and Game to figure out how many wolves can be taken in a season, and therefore, kind of how long the season can be. So that makes sense. So gosh, I’ve got so many different places I want to go. So can you do these sorts of things with any scat sample, or does it have to be pretty fresh? Is this the sort of stuff like, I’ve heard epithelial thrown around, which I think means kind of like inner intestine cells that are kind of like on the outside of a scat sample. Do we need those to be intact? What does sun and rain do to all of that? You know, again, so, like, I assume that not all 1000 of your samples amplified the way that you were hoping. What are some factors? And, you know, again, sometimes we might just not know. But yeah.
Ellen Dymit 24:14
Yeah. So it’s definitely tricky working with poop because of the degradation challenges, among other things. So starting with sort of the freshness of the scats, we can get surprisingly high quality DNA out of scats that are pretty old. And when we come across a scat in the field, it’s hard to say with any certainty how old it is, especially because of all the climatic things that are happening, you know, the salt from the ocean, the harsh weather. Sometimes there’s recently thawed snow, and you’re like, okay, well, how long was this poop here? Was it potentially on snow or under snow? But generally, we find that we are more easily able to metabarcode degraded scat samples than genotype them. And I think the main reason for this is that for the metabarcoding, like we discussed, we’re looking at a specific region of the genome. I think for 12S our amplicon, which is, you know, the region that is amplified by the PCR that we read, is only like 133 base pairs, plus the primers and the index sequences. So it’s not very long.
Ellen Dymit 25:04
And, you know, you think there’s like a bajillion copies of DNA in a scat sample, because, you know, every single cell has DNA in it. And as long as some of those DNA copies have that region intact, we are hopefully able to sequence them and get that information. If that region of the genome is like consistently broken or degraded, then we don’t get enough reads or copies of that sequence in our results to say with confidence that it came from any particular species. With the SNPs, since they’re scattered across the genome, it’s more a question of what proportion of them can we, like, recover, because we’re, you know, targeting a single base pair. The amplicons for those are even smaller, but there’s several of them, and you typically want, like, 80% of them to be at consensus for you to be able to confidently call a genotype.
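The "roughly 80% of the panel" rule of thumb can be sketched in Python as a simple threshold check. This is a simplified reading of what is described above: a locus either produced a usable call or it did not (marked `None`), and the names `call_rate`, `is_callable` and `min_call_rate` are hypothetical. The 32-SNP panel size comes from the conversation; the example samples are invented.

```python
def call_rate(genotype):
    """Fraction of panel loci with a called base (None marks a failed locus)."""
    if not genotype:
        raise ValueError("genotype must contain at least one locus")
    called = sum(1 for base in genotype if base is not None)
    return called / len(genotype)


def is_callable(genotype, min_call_rate=0.8):
    """True if enough loci were recovered to attempt an individual ID."""
    return call_rate(genotype) >= min_call_rate


PANEL_SIZE = 32  # panel size mentioned in the conversation

# Invented examples: a well-preserved scat and a weathered one.
fresh_scat = ["A"] * 30 + [None] * 2       # 30 of 32 loci recovered
degraded_scat = ["A"] * 20 + [None] * 12   # 20 of 32 loci recovered

print(call_rate(fresh_scat), is_callable(fresh_scat))
print(call_rate(degraded_scat), is_callable(degraded_scat))
```

With a 32-SNP panel and a 0.8 cutoff, a sample needs at least 26 recovered loci to pass, which is why partially degraded scats can yield some information without a confident individual ID.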
Ellen Dymit 26:26
So what ends up happening, especially if a lot of your scats are degraded, like mine, is that you get some information for a lot of individuals, and you can maybe have an inkling of which wolf, or, you know, which group of wolves they could belong to, but it’s not comprehensive enough information for you to confidently say who it is, and that can be really frustrating, but it’s just part of the game. And then what else? The gut epithelial cells are definitely a consideration. So I don’t know if this is based in a paper or if it’s just, like, wisdom. I certainly haven’t read it in a particular study, but have absorbed it sort of osmotically through the people I worked with, and it makes sense intuitively, that the outside of a wolf scat is going to contain more wolf DNA than the inside of a wolf scat, because it passes through their, you know, colon, and the colon is lined with, you know, gut epithelial tissue, which will shed cells onto the outside of the scat that contain wolf DNA. So when we sample a scat to extract DNA from it with the intention of targeting prey, we sample from the interior of the scat. So we’ll, like, crack it open and then take a piece of the scat from the middle. And if we are specifically trying to get the defecator’s identity, so genotyping the wolf, we might scrape the outside of the scat with, like, a razor blade and collect that portion of it instead. And oftentimes, when we come across a scat in the field that’s, like, white and cracked and, like, clearly super weather degraded, like not going to be great DNA, we might be more concerned with trying to get the genotype out of that than the diet. And in that case, we might be more inclined to scrape the outside of the scat rather than cracking it open and going for the inside.
Kayla Fratt 28:26
Yeah, I think that makes a lot of sense. And this, you know, part of why I want to do this episode is, you know, sometimes we get these procedures as scat dog people that say how to specifically sample or collect what the dogs have found. And I think a lot of times we don’t really know why, or we kind of have this vague understanding of, like, you know, oh yeah, they need the outside because that’s how they tell who pooped, and not really knowing more about that. So now I want to go back a little bit to this idea of the diet. And so this diet is primarily done through metabarcoding? Yes, we’re getting a nod. Okay, good. When we get that back, I know, like, Marie Tosa, Dr. Marie Tosa, who just defended successfully, was talking about running her spotted skunk samples, and sometimes she would get back that they had detected no vertebrate DNA, even though the samples were fresh. And then it would come back, and you would, like, look at the sample, and it would be just all wasps. So that, to me, suggests that we’re not just able to, like, take a sample and magically, like, put it into a bleep-bloop magic machine that tells us exactly what’s in there. We actually have to, like, be picking something that then matches, and we’re figuring out who was eaten based on, how does that work? And can you actually identify, like, any species? Because I’m guessing the answer is no, but yeah, how do you actually figure out who is in the poop, what is in the poop, I guess.
Ellen Dymit 30:03
Yeah, so unfortunately, there is no one catch-all region or primer set that’s going to give you, you know, all of the vertebrates, invertebrates, plants, whatever else is in the scat. You have to be somewhat choosy. So it’s kind of this balancing act of trying to pick a primer set and target region that comprises the genetic variability of all of the taxa that you’re interested in, but doesn’t amplify too many things that you’re not interested in. So starting with 12S, it works for vertebrates. So we can get bird identity, fish identity, all of the mammals inside wolf diet. It does not work for invertebrates. So we know that our wolves are eating clams, for example, crabs. We see their shells in the scats.
Ellen Dymit 31:01
But when we’re using 12S and we sequence the DNA in those scats, we don’t get anything. And it’s because the primers that have been designed to target 12S, not that I have designed them, don’t bind to invertebrate DNA. So what you end up having to do is do one run on the sequencer with your library, which is like all your samples prepared with the 12S primers. And then you have to do a second run with a new primer set that will target the marine invertebrate prey. Or, in the context of maybe bear diet, where you’re interested in plants also, then you have to choose a marker and primer set that will work for the plants. So, like, trnL is a common marker that our lab uses for plant metabarcoding, for example. The marine invertebrates thing has been giving me grief in particular, because that is something I’m interested in showing for the diet of our wolves. And it’s difficult because there’s a lot of genetic variability among marine invertebrates. So something like a feather duster worm, which we know wolves eat, is very different genetically than a clam, and finding primers that bind to both feather duster worm DNA and clam DNA, but don’t also bind to algae, for example, or mold or something that’s going to be pretty ubiquitous in the environment, is really difficult. And what we have found is a primer set that works well, but also amplifies a bunch of non-target taxa. And that is tricky, because there is a limit to how many, you know, times the DNA is replicated, and how many sequences of DNA can actually be read by the sequencing machine. So you don’t want to have a bunch of non-target DNA in your sample.
Kayla Fratt 33:00
Okay, that’s super helpful. And so you can run it once for vertebrates and run another time for invertebrates. I would assume, like, we talked about this in Guatemala. I said, yeah, you know, like, you know, Niffler and I had a tough search. He didn’t alert to any scats. He did alert to a shrew, you know, I told him no, and then we moved on. And then you said, oh, my God, actually, we want that shrew. And that was because you weren’t sure if that would be in GenBank, I think is what we talked about. So talk to us a little bit about, you know, I assume most big North American things are in this database. But what is this database, and why did you want that shrew?
Ellen Dymit 33:48
Yeah, so sometimes, and this happens much more often for my projects in the Neotropics versus Alaska, but sometimes what happens is we read the DNA on the sequencer, and we do our bioinformatic process to assign the species, and I will end up with samples that, instead of saying the species identity, say “no significant similarity found.” And what that means is that the DNA sequence within that sample does not exist in the sort of universal key that we’re using to check the species identity of the DNA sequences we’re reading. And that sort of key is GenBank, which is an online, publicly accessible database of DNA sequences of all sorts of organisms worldwide, and it’s contributed to by scientists in the form of submitting the sequences that they read for various projects. And because it’s, you know, limited by what has actually been studied, there are a lot of species in the world, and sometimes in our study systems, that don’t have sequences uploaded to GenBank.
Ellen Dymit 35:10
So if there is no reference sequence in the reference database, then we aren’t able to know who that sequence belongs to. So, for example, some of the ocelot scats I had in Guatemala had, you know, 2,000 or 3,000 reads of a species that is probably a vertebrate, because it was amplified by 12S. But we have no idea what it was. It could be a bird, could be a rodent, could be a fish. We have no idea, because the DNA sequence is not in GenBank. So the reason I wanted that shrew was on the off chance that we had, you know, 50% of our ocelot scats come back with some sort of, like, mystery diet item. It might be worth sequencing the DNA of that shrew, checking whether it’s already in GenBank, and then checking whether that sequence matches any of the reads that we got from the ocelot scats.
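The role of the reference database, and why a voucher shrew could resolve a mystery diet item, can be sketched in Python with a plain dictionary standing in for the reference library. All sequences, labels and read counts are invented, and `assign_species` and `min_reads` are hypothetical names; real searches against GenBank score similarity rather than requiring exact matches.

```python
NO_MATCH = "No significant similarity found"


def assign_species(read_counts, reference_db, min_reads=100):
    """Sum reads per taxon, labeling sequences absent from the reference.

    Simplifying assumptions: exact-match lookup, and sequences with fewer
    than min_reads reads are ignored as too sparse to interpret.
    """
    assignments = {}
    for sequence, count in read_counts.items():
        if count < min_reads:
            continue
        label = reference_db.get(sequence, NO_MATCH)
        assignments[label] = assignments.get(label, 0) + count
    return assignments


# Invented reference library and reads from one hypothetical scat.
reference_db = {
    "ACGTTGCA": "hypothetical rodent",
    "TTGACCGA": "hypothetical bird",
}
scat_reads = {"ACGTTGCA": 1500, "GGGTTTAA": 2500, "TTGACCGA": 12}

before = assign_species(scat_reads, reference_db)

# Sequencing a voucher specimen adds the missing reference sequence.
reference_with_voucher = dict(reference_db, GGGTTTAA="hypothetical shrew (voucher)")
after = assign_species(scat_reads, reference_with_voucher)

print(before)
print(after)
```

The 2,500 mystery reads only get a name once a matching reference sequence exists, which is the reasoning behind wanting the shrew.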
Kayla Fratt 36:12
Very cool. And actually, that was a perfect segue. Part of the reason we need that, obviously, is because we need to know what it is, and we can’t do that from scat. But also, my understanding is that tissue samples are what we really need to actually develop the references for these things, so we can identify a species through metabarcoding in the scat. Just keep nodding if I’m getting this right. Yeah-ish? Okay, ish. But we couldn’t take something from a scat sample, even if we knew. So say we fed Barley nothing but elk forever. We couldn’t take his sample and necessarily use that to upload an elk to GenBank. Maybe we could, if it was 100%, I guess. But what I’m getting at is we need those tissue samples to develop some things. I know that’s sometimes individual ID, and it sounds like it’s also these species IDs. When do we need something we can’t get from scat?
Ellen Dymit 37:09
So the main reason that tissues are useful is because the DNA that you can recover from them is typically much higher quality than from a scat. So you get a more solid, complete sequence from a tissue sample, and you also get a higher concentration of DNA in your initial extract to work with. However, what’s more important for confidently uploading a sequence to GenBank is to have what we would call a voucher specimen. Not everybody does this, but in an ideal world, you have, along with the sequence you’ve produced, a photo of the organism or of the sample that is associated with that sequence, as sort of proof of where it came from. So that’s also why it’s difficult to use scat samples to generate data for GenBank: you’re not going to upload the sequences and then a picture of the turd and be like, we know it was fish, look, this poop clearly has fish in it. But not everybody does this, and that doesn’t necessarily mean that you shouldn’t upload data without a voucher specimen to GenBank.
Ellen Dymit 38:28
For example, I had a bunch of scats where I didn’t know whether they were ocelot or margay, and 12S, our typical vertebrate barcode region, was not able to distinguish those species. So I had to sequence them with primers targeting a different region on the mitochondrial genome called 16S, which we know does distinguish them. And what I ended up getting back was the sequences of the 16S gene for 20 samples, six of which ended up being from margay. Because GenBank has margay sequences, I knew, based on the pattern of the SNPs, that those were margay and not ocelot, because I was able to put them in an alignment, which is just when you look at a series of sequences from the same species, or from several species, aligned with one another to see where the differences are. That showed me which of my sequences matched the reference sequence for margay and which ones matched the reference sequence for ocelot. However, GenBank did not have Guatemalan margay or Guatemalan ocelot specifically, and because that’s a different population, there might be variants, SNPs, in the 16S region that are not present in, say, the Colombian margay that exists already in GenBank. So this is an instance where I might actually go ahead and upload my sequences, even though they came from scats, because I know confidently which ones are ocelot and which ones are margay, based on the large pattern of the SNPs, the haplotypes, from them. But I want to help GenBank out by having some Guatemala-specific margay and ocelots, in case somebody’s looking at that specifically in a future study.
Kayla Fratt 40:23
Gotcha, okay, that’s a really helpful example. And, you know, thinking again of our conservation dog pros in the audience in particular, they’ll be familiar with this: when we’re imprinting a dog on a new species, we ideally know very confidently the scat donor before we train the dogs. Because imagine you think you’ve got bobcat, and it’s actually coyote, and now you’ve trained a dog to find coyote, and you’re off to the races collecting all sorts of off-target scats. So we like it to either be confirmed genetically or through something where we’ve basically seen it come out of the butt of whatever animal. So in this case, for GenBank, they would like to have a photo of your specimen. Or, in some cases, it’s kind of sufficient that you’ve confirmed species ID in some other way. You’re not just going on, I don’t really want to say going on vibes, but, yeah, ew. One of the other papers that’s in the hopper for science highlights is an entire paper that is titled something along the lines of misidentification of mesocarnivore scats, or factors affecting misidentification of mesocarnivore scats. We are terrible at visually identifying scats, so this is really important.
Ellen Dymit 41:46
There are errors in the GenBank database too, and it’s very annoying. So for example, there is a study somewhere, sometime, that uploaded sequences that they had assigned to Catopuma, to, like, a cat, that are actually human DNA, and for whatever reason, that has not been removed. I don’t know what the process is for removing errors from GenBank, but this has not been removed. So when I do my coding, I have to have a line in my code that removes all of the sequences that have been assigned to Catopuma, because it’s actually human contamination. And that’s just one example. There are several instances of this. One that’s kind of comical is this extinct species of bear that comes up pretty consistently, especially in my samples that contain brown bear, and can be a little bit confusing. And then there’s also, like, hog badger that always comes up in my samples. Obviously, there are no hog badgers in Alaska. I don’t even know if they’re extinct. But yeah, there are several instances of species that have been misidentified in GenBank and never corrected, such that when you use BLAST, which is the Basic Local Alignment Search Tool, the tool that we use to check our sequences against GenBank, it comes back as one of these errors that exist in GenBank. That can be really confusing, especially if you’re just getting started interpreting your metabarcoding data.
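The "line in my code" Ellen mentions could look roughly like the Python sketch below. This is only an illustration under stated assumptions: the row layout, sample IDs, read counts, and the function name `drop_blocklisted` are hypothetical and are not taken from her actual pipeline.

```python
# Hypothetical taxonomic assignments: (sample_id, assigned_taxon, read_count).
# Genera whose reference sequences are treated as unreliable for this study.
# "Catopuma" is the genus named in the episode; the rest of this data is made up.
BLOCKLIST = {"Catopuma"}

def drop_blocklisted(assignments, blocklist=BLOCKLIST):
    """Return only the rows whose assigned genus is not on the blocklist."""
    kept = []
    for sample_id, taxon, reads in assignments:
        genus = taxon.split()[0]  # first word of a binomial name
        if genus not in blocklist:
            kept.append((sample_id, taxon, reads))
    return kept

assignments = [
    ("scat_01", "Canis lupus", 50000),
    ("scat_01", "Catopuma temminckii", 1200),  # flagged as human contamination
    ("scat_02", "Ursus arctos", 42000),
]
filtered = drop_blocklisted(assignments)
```

Keeping the blocklist as a named set, rather than scattering one-off conditions through the code, makes it easier to add other known problem taxa later.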
Kayla Fratt 43:33
Okay, gosh, yeah. Figuring out how to remove a line of Catopuma that is actually human is not something I was anticipating needing to figure out how to code. But, yeah, this is why we’re doing this. And I did a quick fact check on our hog badger. They are hilarious, weird-looking mustelids from Southeast Asia and Sumatra, definitely not in Alaska. Do you have any memory, are the samples that are coming back as hog badger even actually mustelid? Are these, like, wolverine samples or anything?
Ellen Dymit 44:15
I don’t remember. I don’t know. Okay, yeah,
Kayla Fratt 44:17
I ask you lots of questions about very specific samples, and you never remember. She says, I know, that’s because I have thousands. And I’m like, hey, do you remember that one sample that Niffler found? What did that one come back as? And she’s like, I have no idea what you’re talking about, I have thousands of samples. Okay, we’re all over the place now. So we’ve talked a little bit about the process of collecting these samples in the field. Depending on what you’re planning on trying to do with that sample, or what you’re worried about most as far as the sample quality, you’re maybe sampling from the outside, maybe sampling from the inside. You take it back to the lab, you’re going to put on some gloves, put on a fancy coat, and do some PCR and then run some gels. Is that roughly right? I assume there’s pipetting involved. Tell us a little bit about that. We don’t need to share your whole lab protocol and bore everyone to death, but what does it actually look like in the lab?
Ellen Dymit 45:21
Yeah, I’d say it’s 99% pipetting. There are rarely gels involved in what I do, but I have run gels. I’ll try to walk through my metabarcoding workflow in as simple a way as I can. So I have a poop. I put that poop in the tube, and we actually use a very small part of the scat. Interestingly, if you use too much of it, it decreases your DNA yield, which I learned through experience. So we usually say you want, like, a lentil-sized chunk of the scat. So very little. I don’t know why too much is a problem. I think it’s something about it swamping the filter on the spin column.
Ellen Dymit 46:06
Anyway, we take a lentil-sized piece of the poop, we put it in a tube, and then the first step is DNA extraction. We use a kit called the Qiagen Blood and Tissue Kit, which comes with a protocol. If you’re super nifty, you can do it in like two hours. You add a bunch of buffers to the tube in a series that basically leads to enzymatic activity breaking down the cell walls, you know, the cell walls of your plant, and making the DNA available. I think of it as cracking open the cell to expose the DNA. And then you flush the DNA in a centrifuge through something called a spin column, which basically has a glass filter on it that is, like, microscopic and catches the DNA, and you flush it through with a series of buffers. So what you end up with at the end is all of your DNA clinging to this glass filter, which you’re then able to elute, or wash down, into a DNA preservation buffer, which is called Buffer AE in the context of this kit.
Ellen Dymit 47:12
So for this process, you start with your tubed poop, and you end up with a tiny bit of clear liquid that contains the concentrated DNA, ideally just the DNA, of your sample. And then the next step is PCR. PCR stands for polymerase chain reaction. We won’t get into the nitty-gritty of how that works, but basically, you use a machine called a thermocycler. So you’re taking the extracted DNA that you’ve just made, and you are putting a bunch of those samples together on a little plate that contains 96 wells. And one of our quality control checks is that we plate everything in our lab in triplicate, which means there are three copies of each sample. So a 96-well plate will typically contain, like, 31 samples of DNA and then one blank, which is water. And we want to have the blank there so that we can see later on whether there was contamination happening at this stage of the preparation.
Ellen Dymit 48:19
So you put, you know, 31 samples in triplicate in this plate, and you put it in a thermocycler, which is just a machine that heats and cools it in a predetermined pattern. This basically copies the DNA a bunch of times, and then you have your post-PCR amplified DNA. Typically, what we’ll do next is normalize it, which is making sure that the concentration of DNA across all of the samples on a plate is more or less the same, so that no one sample takes up all of the reads in your sequencing run. Because when you’re sequencing something, it’s not just one sample. It’s hundreds of samples that have been pooled together. So you normalize it, and you pool them together into one tube. They have specific indices which identify... wait, we’ve got a pause here.
Kayla Fratt 49:14
Hang on. Hang on. So wait, what are we pooling? Because don’t we want to know wolf A versus wolf B? So if we’re pooling all of them, don’t we not know that anymore? Confused.
Ellen Dymit 49:28
Yeah, I was about to say, we attach these indices to the samples from individual wells, and that attachment happens during the PCR reaction. Those allow us to later on bioinformatically demultiplex, which is basically saying which individual samples the reads in our results came from. So those are added on so that we can pool them freely, and we can always sort them back into their original samples later on. So, yeah, they get pooled, and then we typically do something called a magnetic bead cleanup, which is really neat. We use magnets to catch DNA, and then we wash off everything that’s not DNA, and that just helps with sample purity.
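The idea of pooling reads and then sorting them back out by index can be sketched in a few lines of Python. This is a toy illustration only: the index sequences, their four-base length, the sample names, and the `demultiplex` function are all assumptions made for readability, and real demultiplexing is done by dedicated sequencing software.

```python
from collections import defaultdict

# Hypothetical index sequences attached to each well during PCR.
INDEX_TO_SAMPLE = {"ACGT": "wolf_A", "TGCA": "wolf_B"}
INDEX_LEN = 4  # shortened for the example; real indices are longer

def demultiplex(reads):
    """Group pooled reads by the sample whose index starts each read."""
    by_sample = defaultdict(list)
    for read in reads:
        index, insert = read[:INDEX_LEN], read[INDEX_LEN:]
        # Reads with an unknown index cannot be traced to a well.
        sample = INDEX_TO_SAMPLE.get(index, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)

# All reads come off the sequencer mixed together in one pool.
pooled = ["ACGTGGGCCC", "TGCAGGGCCA", "ACGTGGGCCC", "NNNNGGGCCC"]
sorted_reads = demultiplex(pooled)
```

Because every read carries its own index, pooling loses no information about which scat it came from, which is the point Ellen makes in answer to the wolf A versus wolf B question.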
Kayla Fratt 50:23
Sorry, I’ve got so many interruptions. Wait, so DNA is magnetic and other things aren’t? Okay, Ellen says DNA is negatively charged. Or maybe it’s positively charged, but it’s got a charge. Oh my God, who knew? Um, okay, yeah, DNA is negatively charged. My, like, conspiratorial brain is like, so magnets could scramble my DNA? Okay, absolutely not. But okay, back to it: we’ve got DNA on a magnet, and we’re washing off the other stuff.
Ellen Dymit 51:02
Yeah. So then you have your library, which contains a bunch of samples that have been normalized to be more or less the same concentration of DNA and cleaned so that they don’t contain impurities. And then I pass that off to our lab manager, and she does some sort of pre sequencing prep magic, and then she brings that tube to the CQLS, which is, oh gosh, the Center for Quantitative Life Sciences here at OSU, which houses the Illumina sequencers, and we pay them to read the DNA for us.
Kayla Fratt 51:48
Very cool. So, oh, we’ve got a robot. Hold on. We’ve got to hear about the robot.
Ellen Dymit 51:53
Yeah. So out of all this lab workflow, the most tedious and time-consuming step is the normalization, because you have to add a different volume of water to every individual well on the plate. And as a lab, we’re going through, like, thousands and thousands of samples every month, so we don’t want people to all have carpal tunnel from doing this. So a couple of years ago, we got a liquid-handling robot, which we have programmed to do this normalization step and the pooling of the samples for us. And her name is Glorificus.
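The per-well arithmetic behind "a different volume of water to every individual well" is the standard dilution relation C1 x V1 = C2 x V2. The Python sketch below shows that calculation plus the plate arithmetic from earlier in the conversation. The concentrations, volumes, target value, and function name are hypothetical; the lab’s actual robot program is not described in the episode.

```python
def water_to_add(conc_ng_per_ul, volume_ul, target_ng_per_ul):
    """Water volume (uL) that dilutes one well down to the target concentration.

    Uses C1 * V1 = C2 * V2. A well already at or below the target
    cannot be fixed by dilution, so it gets no water.
    """
    if conc_ng_per_ul <= target_ng_per_ul:
        return 0.0
    final_volume = conc_ng_per_ul * volume_ul / target_ng_per_ul
    return final_volume - volume_ul

# 31 samples plus 1 water blank, each plated in triplicate, fills the plate.
wells_used = (31 + 1) * 3

# Hypothetical post-PCR concentrations (ng/uL) for three wells of 10 uL each.
concentrations = {"A1": 20.0, "A2": 8.0, "A3": 3.0}
water_plan = {well: water_to_add(c, 10.0, 5.0) for well, c in concentrations.items()}
```

Every well ends up with its own volume in `water_plan`, which is why the step is tedious by hand and a natural job for a liquid-handling robot.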
Kayla Fratt 52:31
My next question was going to be whether or not our robot has a name. That’s so great, Glorificus. I’m so excited, Gloria. I’m so excited to meet her. We’ve got some puma samples I’ll be learning all of this on shortly, so we’ll report back once I’ve actually tried this. For the record, in the one genetics class I took in undergrad, I had a concussion throughout the entire thing, and the one lab project that we had was basically going through this process. It was actually a cool study, or a cool lab process. Shout out to Colorado College for this one. We went to the supermarket and bought fish, and then we were supposed to be amplifying the DNA from our fish samples to see if the species ID matched what it was being sold as. Long story short, my samples just didn’t amplify at all. So I spent like four weeks working through all of these lab protocols, and got to the end and was like, oh, there’s just no DNA, it’s just not fish. Which maybe that’s true, but probably I was concussed. So it’s gonna be really fun to actually get to learn all of this stuff properly. Okay, we’re at risk of going long. Do you have anything else you want to say about the lab? I guess one of the things, and I don’t know what I was expecting, but I’m like, oh, okay, I guess it does make sense that we don’t just have a monitor that then reads, oh yes, you have this percent toucan in your ocelot sample. It’s not like it pops up with species names and percentages at the end. I guess I knew that, but good to know for sure.
Ellen Dymit 54:07
I think one of the more interesting realizations that I had, well, for context, I didn’t take genetics in college. I am now, I’d say, a beginning molecular geneticist, but I had no genetics background before graduate school, so I didn’t really have any expectations about anything going into this. That being said, one thing that I thought was really surprising is just how subjective the interpretation of genetic data can be, especially if you’re working with really degraded samples. So we have, not a lot of, but we have problems with knowing which species assignments, and which DNA in our samples, we can trust as actually having come from the sample itself versus contamination, either between individuals in the wild, or from a dirty glove that I was wearing when I tubed a poop, or from ambient DNA that’s just present in the air in our lab. And in order to deal with that, you have to set cutoffs for your curation and filtering of your results in order to make sure that everything that you’re reporting on is trustworthy.
Ellen Dymit 55:32
And I kind of thought that there would be some sort of universal standard like, oh, this percentage of you know, the sample must be from the defecator or whatever, in order for you to trust it, but nothing like that really exists. I had to sort of decide for myself what sort of cutoffs were conservative enough to make my samples trustworthy, but also not getting rid of, like, absolutely everything in my data. So that’s been really interesting for me. And same with the genotyping, like, it’s just not black and white, like sometimes with these really degraded scats, I have to really think hard about whether the results I’m getting are making sense, and if I have any sort of hesitation or uncertainty about what I’m trying to interpret, I’ll just throw that sample out. And the result of that is sometimes that you have a smaller sample size, and that sucks, but it’s better than, you know, creating false data.
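To make the idea of cutoffs concrete, here is a minimal Python sketch of one possible filtering rule. Ellen says there is no universal standard and does not state her own thresholds, so both cutoff values, the species in the example, and the function name are purely hypothetical.

```python
MIN_READS = 1000      # hypothetical absolute read-count cutoff
MIN_FRACTION = 0.01   # hypothetical minimum share of the sample's total reads

def filter_detections(read_counts, min_reads=MIN_READS, min_fraction=MIN_FRACTION):
    """Keep only taxa that clear both the absolute and the relative cutoff."""
    total = sum(read_counts.values())
    return {
        taxon: n
        for taxon, n in read_counts.items()
        if n >= min_reads and n / total >= min_fraction
    }

# Made-up read counts for one scat sample.
sample = {"Canis lupus": 50000, "Enhydra lutris": 20000, "Oncorhynchus sp.": 200}
kept = filter_detections(sample)
```

Raising either cutoff makes the reported diet more conservative at the cost of dropping real but rare items, which is the trade-off she describes having to decide for herself.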
Kayla Fratt 56:37
Yeah, definitely. So I’ve got a couple more questions, and then we will probably try to wrap up here. We might have to do more of these. I am thinking about just continuing to interview people in our lab to get more and more detail on this. So if we’re going back to this idea of that shrew in Guatemala, I guess the answer to this is no, because you said that in your ocelot samples, sometimes it’ll come back as just a big fat question mark. So it’s not like, if there’s not a match, it’ll be like, oh yeah, this is probably a rodent, or this is probably a fish? Or does it ever have the ability to tell you family or genus?
Ellen Dymit 57:19
Yeah. So one piece of information that we get back when we check our sequences against GenBank is a percent match. So it’s not always going to tell us that our sequence matched 100%. Sometimes it’ll say this sequence was a 98% match for red fox, for example. And that doesn’t necessarily mean that it’s not a red fox. It just means that there could be a SNP that is making the sequence for whatever red fox made it into GenBank a different sequence from what we’re reading in our sample. A good anecdote for this is that, for whatever reason, the wolves in Alaska, at the 12S region that we’re looking at, have a signature that is 96% similar to some red fox sequence that exists in GenBank. So for a long time, we were really confused about why we were seeing so much red fox DNA in our wolf scats. And we were kind of like, are red foxes consistently peeing on the wolf scats, or something like that? What we ended up finding was that almost all of these red fox reads we were seeing in our wolf scats, which were co-occurring with wolf reads, by the way, were a 96% match, whereas others were a 100% match for red fox. So what we ended up figuring out is that the 96% match red fox reads are actually wolf reads that are just very similar at 12S to red foxes, or to some red fox that exists in GenBank, and the 100% match red fox reads were true red fox. So after we corrected that, the proportion of red fox in wolf diet made a lot more sense. It was a lot less.
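The percent-match logic in this anecdote can be sketched in Python as follows. This is a simplified illustration, not the lab’s pipeline: `percent_identity` assumes two already-aligned sequences of equal length, and the reassignment rule (any red fox hit below 100% that co-occurs with wolf is treated as wolf) is a deliberately blunt stand-in for the correction Ellen describes. The read counts are made up.

```python
def percent_identity(seq_a, seq_b):
    """Percent of matching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)

def reassign_fox_hits(hits):
    """hits: list of (taxon, percent_identity, reads) for a single scat.

    Imperfect red fox matches that co-occur with wolf are relabeled as wolf;
    exact (100%) red fox matches are left alone.
    """
    has_wolf = any(taxon == "Canis lupus" for taxon, _, _ in hits)
    corrected = []
    for taxon, pid, reads in hits:
        if taxon == "Vulpes vulpes" and pid < 100.0 and has_wolf:
            taxon = "Canis lupus"
        corrected.append((taxon, pid, reads))
    return corrected

# One mismatch in 25 aligned bases gives a 96% match.
one_snp = percent_identity("A" * 25, "A" * 24 + "T")

hits = [
    ("Canis lupus", 100.0, 40000),
    ("Vulpes vulpes", 96.0, 8000),
    ("Vulpes vulpes", 100.0, 500),
]
corrected = reassign_fox_hits(hits)
```

After the correction, only the 500 exact-match reads still count as red fox in this toy sample, mirroring how the red fox share of wolf diet dropped once the 96% matches were recognized as wolf.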
Kayla Fratt 59:19
Oh, gosh, this is so cool. One of these days we’re going to have a lab meeting where I really want to talk about this paper that I’ve told our podcast listeners about a couple of times, which is the Karen DeMatteo et al. paper. I don’t have it in front of me, so I forget what year it was, but this is the glitter poop paper that we’ve talked about. They were really trying to figure out why, basically, a bunch of the puma samples that this really experienced, really accomplished dog had been trained on were coming back as coyote. In this study, they intentionally went through and confirmed that a coyote can overmark on a puma scat, and it will then come back genetically as coyote instead of puma. But that doesn’t mean that the dog is then trained on coyote. They did all sorts of stuff. We’ve covered it in a science highlight, and I’ll link the paper in the show notes. So that’s helpful as a reminder as well: when we’re thinking about urine marking or saliva from your detection dog, that can be a huge source of headache for you, right? Because I assume sometimes you could have domestic dog naturally in your scat somewhere, maybe with your jaguars in El Salvador. I’m sure they get a stray dog every so often. But it also could just be Barley drool.
Ellen Dymit 1:00:41
Yeah, and that kind of cross-contamination between samples is particularly an issue for the genotyping, because if you have one scat sample that somehow has the DNA from two individual wolves in it, and both of those wolves are homozygous, meaning that both of their allele copies are the same, so one wolf is homozygous TT and the other one is homozygous AA, then your results might make it look like an individual that is heterozygous AT and has a different genetic signature than either of the true wolves that were in that sample. So then you risk overestimating how many unique individuals you have at that location, which can be really tricky.
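The TT plus AA example can be written out as a tiny Python sketch. It models only the logic of the problem (a genotype call sees whichever alleles are present in the tube), not any real genotyping software; the function names are hypothetical.

```python
def observed_alleles(*individual_genotypes):
    """Alleles detectable at one SNP if DNA from all given individuals is mixed."""
    alleles = set()
    for genotype in individual_genotypes:
        alleles.update(genotype)
    return sorted(alleles)

def call_genotype(alleles):
    """Naive diploid call from the set of alleles seen at one locus."""
    if len(alleles) == 1:
        return alleles[0] * 2          # one allele seen: homozygous
    if len(alleles) == 2:
        return "".join(alleles)        # two alleles seen: heterozygous
    return "ambiguous"

wolf_1 = "TT"  # homozygous
wolf_2 = "AA"  # homozygous
mixed_call = call_genotype(observed_alleles(wolf_1, wolf_2))
```

Here `mixed_call` is a heterozygous AT genotype that belongs to neither real wolf, which is how a contaminated scat can inflate the count of unique individuals.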
Kayla Fratt 1:01:32
So is that something where, I can imagine, you’re doing your analysis, and you’re like, gosh, we only got one sample from this one wolf. And then you go back and look at your data sheet, and you’re like, hmm, we got that at a latrine site or a denning site or a rendezvous site. Maybe this isn’t a real individual, because we never detected it anywhere else. Is that more or less how you solve that sort of thing? And this is where good data sheets come in?
Ellen Dymit 1:01:58
Yeah, as a general rule, we’re really skeptical of singleton samples. They do definitely happen, but I always re-sequence that sample and make sure I get the exact same genetic signature as the first time. And then the other thing is, we don’t want our samples to differ from one another by just one or two SNPs. You want it to be several. And if you have one sample that is only different from the rest at one or two of those loci, then that’s sketchy.
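One way to picture the "one or two SNPs" rule is as a pairwise mismatch count between multilocus genotypes. The Python sketch below flags singleton genotypes that sit within two loci of some other genotype. The six-locus genotypes, sample IDs, threshold default, and function names are all hypothetical; the episode does not say how many loci are used or how the check is implemented.

```python
from collections import Counter

def mismatches(genotype_a, genotype_b):
    """Number of loci at which two multilocus genotypes differ."""
    return sum(1 for a, b in zip(genotype_a, genotype_b) if a != b)

def suspicious_singletons(genotypes, max_mismatch=2):
    """Sample IDs whose genotype occurs once AND is within max_mismatch loci
    of another distinct genotype in the dataset."""
    counts = Counter(genotypes.values())
    flagged = []
    for sample_id, genotype in genotypes.items():
        if counts[genotype] > 1:
            continue  # seen more than once, not a singleton
        for other in counts:
            if other != genotype and mismatches(genotype, other) <= max_mismatch:
                flagged.append(sample_id)
                break
    return flagged

genotypes = {
    "s1": ("AA", "AT", "TT", "CC", "CG", "GG"),
    "s2": ("AA", "AT", "TT", "CC", "CG", "GG"),  # same wolf as s1
    "s3": ("AA", "AT", "TT", "CC", "CG", "CG"),  # differs from s1 at one locus
    "s4": ("TT", "AA", "AT", "CG", "GG", "CC"),  # differs at every locus
}
flagged = suspicious_singletons(genotypes)
```

In this made-up dataset `s3` is flagged as a likely genotyping error or contaminated sample, while `s4` is a singleton that is at least clearly distinct and would go on to be re-sequenced.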
Kayla Fratt 1:02:30
So, okay, the last question I have for you is, I know, a little bit of a debate, but why don’t you explain some of the approaches for this. Okay, so I’ve got a sample, and we’ve read it, and we have detected that this is a wolf scat, and we have detected sea otter and Arctic hare and ptarmigan in it. Is there any way that we can tell that this wolf is primarily eating sea otter, or ate sea otter that week, and what are our options for that?
Ellen Dymit 1:03:08
There is a school of thought that holds that the number of reads that you get for each species in the scat is somewhat reflective of the proportional biomass consumed from that prey item. So I can’t remember the exact species you gave in your example, but say you have a scat that has 200 salmon reads. Well, that wouldn’t pass filtering. Say 2,000 salmon reads, 20,000 sea otter reads, and then, like, 50,000 wolf reads. So, you know, it’s a wolf scat, and the wolf ate salmon and sea otter. There are some who would say that, because there are 20,000 sea otter reads and only 5,000 salmon reads, the wolf ate more sea otter meat, and that is then reflected in the proportion of reads. I do not subscribe to this school of thought, because it has been shown that there are a lot of sources of bias in what DNA ultimately gets read. So particularly in PCR, there are biases in primer binding affinity for different species. So it might be the case that this wolf actually ate just as much salmon meat as sea otter meat, but our primers don’t bind as consistently to salmon DNA, and consequently less of that salmon DNA gets amplified in the sample.
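A small Python sketch shows why read proportions can mislead. It uses the textbook idealization that each PCR cycle multiplies template copies by (1 + efficiency); the starting copy numbers, the per-species efficiencies, and the 30-cycle count are hypothetical values chosen only to illustrate the direction of the bias, not measurements from Ellen’s lab.

```python
def expected_read_share(templates, efficiency, cycles=30):
    """Share of amplified DNA per taxon after idealized exponential PCR.

    templates:  starting template copies per taxon
    efficiency: per-cycle amplification efficiency per taxon (0 to 1)
    """
    amplified = {
        taxon: copies * (1 + efficiency[taxon]) ** cycles
        for taxon, copies in templates.items()
    }
    total = sum(amplified.values())
    return {taxon: amount / total for taxon, amount in amplified.items()}

# Equal amounts of prey DNA going into the reaction...
templates = {"sea otter": 1000, "salmon": 1000}
# ...but the primers bind slightly less well to salmon (hypothetical values).
efficiency = {"sea otter": 0.95, "salmon": 0.90}
share = expected_read_share(templates, efficiency)
```

Even with identical starting amounts, the small per-cycle difference compounds over the cycles so that sea otter ends up with well over half of the amplified DNA, which is the kind of primer bias behind her skepticism about reading biomass from read counts.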
Kayla Fratt 1:04:41
Okay, I think that makes sense. I can kind of see the debate. I can see how, intuitively, 20,000 sea otter reads versus 5,000 salmon reads seems like there’s more sea otter in there. But I can see, yeah, the primer is binding preferentially, or even just minor things in how you’re picking up that sample, or digestibility. I was kind of thinking even, and this might be something that’s more specific to, like, I know this happens with hair and whisker growth, but if you have a really long turd and you grab from one half versus the other half, you could have some differences there. Oh, Ellen says this is why we sample three distinct locations in our samples. That makes sense. So anyway, I think we’re gonna wrap it there with all of our genetic gobbledygook, because I’m tired, and I’m sure everyone at home is tired. We learned a lot today. So thank you so much for listening. Ellen, thank you so much for being here and for answering all of these questions. We’ll maybe have to do more of these, definitely with other lab mates, maybe with you again. But again, thanks for being here.
Ellen Dymit 1:05:54
Yeah, I loved this. I’m excited for you to come home from your first day in the lab, and we’ll do this all over again with new questions.
Kayla Fratt 1:06:04
I’ve been thinking about maybe getting a lapel mic so I could just be, like, muttering into a lapel mic in the lab as I’m learning how to do all of this. People do their mic’d-up workouts. Yeah, mic’d-up lab time. It’d be like Kayla ASMR, just me quietly cussing to myself. I mean, honestly, it’d be kind of a fun podcast thing to have a lapel mic for field work as well. Yeah, yeah. But for everyone, I hope you learned a lot and you’re feeling inspired to get outside and be a canine conservationist in whatever way suits your passions and skill set. You can find show notes, donate to K9Conservationists, join our online class, or Patreon learning club, all at k9conservationists.org. Until next time – bye!