Penguin Watch Talk

What scientific use is 30?

  • Shiphrah by Shiphrah

    I have read that counting only 30 penguins is of scientific use to you. I wonder why it is of use. The picture might have 35 birds, or 100, or 250. If I try to count them all, even if inaccurately because they're too small, too crowded, etc., I suppose averaging my count with those of others would give you a pretty good estimate of the penguins present at that time & place, and over time you could follow population trends and such things. But counts that stop at 30 could not be included in such averages. Counting 30 tells you there are still living penguins at that place, but so would counting 10, or 25, or 50.

    I try to count all I can because I don't see any use in marking only 30. I'd appreciate some explanation of how you use those counts.

    Posted

  • yshish by yshish moderator, translator in response to Shiphrah's comment.

    Hi @Shiphrah

    May I ask where you read that information about the 'scientific use' of counting only 30 penguins? It must be some misunderstanding 😦

    We've been explaining to everyone that the scientists prefer counting as many animals as possible, but if that is a problem for you, feel free to stop after marking 30 of them (if more are present, of course), as written in the pop-up message that appears when you mark the 31st penguin.

    I think I've read all the comments in Talk since the beginning of this project and I haven't noticed the information you're talking about. I'm really surprised. So please, continue counting all the animals you can! I also never stop at 30 when more of them are visible.

    Thank you and sorry for such a misunderstanding!

    Zuzi

    Posted

  • Shiphrah by Shiphrah

    @yshish Thank you for your prompt reply.

    I have read many comments from moderators and others saying things like "every click counts" and "counting 30 is useful" and, since this is a scientific project I think it was natural for me to assume that this data is useful to the scientists in their work. From your message it would seem that counting only up to 30 is NOT scientifically useful.

    So let me restate my question: Are counts that stop at 30 marks useful in this scientific work? If so, how?

    Posted

  • yshish by yshish moderator, translator in response to Shiphrah's comment.

    OK, I got it.

    As for your question, my English is not perfect but I'll do my best to explain how I understand the classification.

    As you know, each image is classified by several users. I'd say that when an image contains 300 penguins and you, like the other users, count only 30 of them, then more users will be needed to classify that single image to get the results we need. If you imagine that each classifier marks a different 30 penguins, then 10 classifiers will be needed to mark all the penguins there*. On the other hand, if we asked you to count them all by yourself, most users would give up without even marking 30! 😦

    There was a Beta version of the project before its release, in which users were asked how many penguins they would be OK with marking in each image. I'd say that the 30 is a result of that survey.

    *In fact we're looking for a consensus among the classifiers. So it isn't only about marking all 300 penguins; each of them should be marked more than once by different users. Unfortunately I don't know the exact number for each image, nor the exact number for each penguin, but I will ask the scientists to answer this question and correct my answer if necessary 😃

    So for the image with 300 penguins, many more users will be needed to get all the penguins marked. I'm not sure whether the scientists really need every single penguin to be marked or whether, for example, 80% of them is enough... But I know that they're trying to teach a computer to count the penguins itself 😃

    Let's wait for the scientists' reply!

    Thanks for your question, hope I explained that clearly enough.
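
    The "10 classifiers for 300 penguins" figure above is the best case, where nobody's 30 marks overlap. As a rough sketch only, and not the project's actual analysis, the Python below assumes each volunteer independently marks a uniformly random 30 of the penguins (an assumption made purely for illustration; real volunteers tend to start from similar places) and estimates what fraction of penguins gets marked at least once. The function name and its parameters are hypothetical.

```python
def expected_coverage(total_penguins, marks_per_user, n_users):
    """Expected fraction of penguins marked at least once.

    Assumes every volunteer independently marks a uniformly random subset
    of `marks_per_user` penguins (an illustrative assumption, not a fact
    about how Penguin Watch volunteers actually behave).
    """
    # Chance that one particular penguin is skipped by one volunteer.
    miss_one_user = 1 - marks_per_user / total_penguins
    # Chance that it is skipped by all volunteers, then take the complement.
    return 1 - miss_one_user ** n_users


# 300 penguins, each volunteer stops at 30 marks.
for n in (10, 20, 40):
    print(n, "volunteers:", round(expected_coverage(300, 30, n), 3))
```

    Under these assumptions, 10 volunteers would be expected to cover only about 65% of the penguins, which is why more than 10 would be needed once overlap and repeat marks are taken into account.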

    Zuzi

    Posted

  • Shiphrah by Shiphrah

    Thanks, yshish! Your explanation is very helpful.

    There are aspects I still don't understand. It'd be great if someone could write a blog about how the counts are used, combined, averaged together, or whatever. For example, I don't understand how a useful consensus could be gotten from a combination of "counts-of-30" with "counts-of-as-many-as-I-can". But then, I'm not a statistician!

    Posted

  • yshish by yshish moderator, translator in response to Shiphrah's comment.

    To be honest I'm very curious and have spent many hours thinking about how that works as well 😉 I've just asked the scientists to check this thread and leave their answer...

    Zuzi

    Posted

  • DZM by DZM admin

    I've wondered about this, too. I hope that Caitlin or one of the other scientists can answer why 30 is the "magic number" for counting!

    Posted

  • yshish by yshish moderator, translator in response to DZM's comment.

    I'm almost sure that the 30 comes from the survey made during the Beta testing. But I'm not sure about the rest 😦 like how they work with the results, etc.

    Posted

  • Shiphrah by Shiphrah

    @yshish & @DZM is there any prospect of an answer to this? I'm still very eager to understand better. Thanks!

    p.s. It's not really why 30 vs 25 or 50, but why are such limited counts useful?

    Posted

  • yshish by yshish moderator, translator in response to Shiphrah's comment.

    Dear @Shiphrah

    I've asked Caitlin and Tom for a reply on this, but perhaps they're too busy with their work right now. Please be patient, I'm sure they'll answer as soon as they come back 😃

    Cheers,

    Zuzi

    Posted

  • caitlin.black by caitlin.black moderator, scientist, admin

    Hey guys,

    30 was actually a number chosen by the Zooniverse developers so that nobody is overwhelmed by the number of penguins present in some of the images. Ideally, every image would be annotated for every penguin, but this isn't realistic, especially in the king penguin colonies. We thought that 30 was a good number to limit the amount of time spent on each image and in turn limit some of the frustration that may come with annotating hundreds of penguins in each image. The annotations from each image are eventually clustered so that, hopefully, even with only 30 penguins annotated, when the annotations from several users are combined, the end result is the same as if all the individuals had been annotated.

    Make sense?
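
    To make the clustering idea concrete, here is a minimal sketch in Python. It is not the science team's actual method: the greedy distance-based grouping, the `radius` and `min_marks` parameters, and the pixel coordinates are all assumptions invented for illustration. Marks from different volunteers that fall close together are treated as the same penguin, so the combined count can exceed what any single volunteer marked.

```python
from math import hypot


def cluster_marks(marks, radius=10.0):
    """Greedy clustering: a mark joins the first cluster whose centre is
    within `radius` pixels; otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of (x, y) marks
    for x, y in marks:
        for members in clusters:
            cx = sum(px for px, _ in members) / len(members)
            cy = sum(py for _, py in members) / len(members)
            if hypot(x - cx, y - cy) <= radius:
                members.append((x, y))
                break
        else:
            # No existing cluster was close enough.
            clusters.append([(x, y)])
    return clusters


def estimate_count(marks_by_user, radius=10.0, min_marks=2):
    """Pool every volunteer's marks and count the clusters that at least
    `min_marks` marks agree on."""
    all_marks = [mark for marks in marks_by_user for mark in marks]
    clusters = cluster_marks(all_marks, radius)
    return sum(1 for members in clusters if len(members) >= min_marks)


# Hypothetical image with 4 penguins; each volunteer stops after 3 marks.
user_a = [(100, 100), (150, 102), (200, 98)]
user_b = [(151, 100), (199, 101), (250, 100)]
user_c = [(101, 99), (249, 103), (148, 101)]

estimated = estimate_count([user_a, user_b, user_c])
print("Estimated penguins:", estimated)
```

    In this toy example no volunteer marks more than 3 penguins, yet pooling the marks recovers all 4, because the volunteers did not all pick the same ones. The sketch also shows the limit of the approach: a penguin that fewer than `min_marks` volunteers mark is not counted.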

    Posted

  • gardenmaeve by gardenmaeve moderator

    Yes- so do you want all of us to continue counting from the front, or will you be more likely to have an accurate final count if some start from the front, some the back, some from one side, etc?

    Posted

  • DZM by DZM admin

    Thanks, @caitlin.black -- !

    I also learned, looking back to when we launched this project, that we did a survey asking how many penguins people thought it would be reasonable to count. I think that the "30" number was derived from that survey. I believe that selecting the "too many penguins to count" option also contributes strongly to how we analyze the data.

    Posted

  • Shiphrah by Shiphrah

    Thanks for answering, @caitlin.black!

    But I'm still very puzzled. How can limited counts from several users be combined so that the end result is as if every bird had been marked?

    Suppose you have an image of 45 penguins and another of 245 (probably Kings). 5 users mark each image, stopping at 30. How can you combine those two clusters in a way that distinguishes between the smaller group and the larger? Maybe I'm obtuse and it's plain to everyone else, but to me it just doesn't make sense!

    Those shaded spaces that block out part of the picture, now that makes sense! (except that with Kings it often leaves 200+ birds). It's sort of like the common technique for estimating the size of large crowds, marking off a limited area, carefully counting those within it, then estimating the total area and multiplying. If most people are discouraged by big groups, you could divide the original image into smaller sections.

    I'm really not interested in the arbitrary number 30. What's bugging me is how you can crunch the numbers to make any arbitrary cut-off useful. I'm not a statistician nor a programmer, but perhaps someone could explain it to laypeople like me, in a blog perhaps?

    Posted

  • yshish by yshish moderator, translator in response to Shiphrah's comment.

    Hi, I'm in a hurry and can't explain it all now. I just wanted to say that you need many more than 5 classifiers for such a big number of penguins. I think that even if everyone marked all 200 penguins in a single image, they would need more than 5. When you stop before marking all the penguins, more users are needed to finish the image (to do it instead of you). That's how I understand it. But better to wait for the scientists when they're back online...

    Cheers,

    Zuzi

    Posted

  • gardenmaeve by gardenmaeve moderator

    Just a reminder: part of the purpose behind our painstaking counting (and identification where needed) is to work toward a software program that can handle accurate counting. Thus even if we only count 30 (or 2 or 50 or many more), each count helps.
    From the Science page tab on the home page: "With the help of the Computer Vision laboratory at the University of Oxford, we are working to develop a recognition tool by which computers can automatically count every penguin individual in an image. By working at the fruitful interface of ecology and computer vision, we hope to improve the management of data from imagery and answer novel questions about wildlife dynamics that would otherwise be impossible....Your notations on the images will also aid in “training” a computer to automatically recognize penguin individuals. With your help, we hope that future image analysis can be automated for a range of species and speed up the lengthy data extraction process to produce near real-time results." http://www.penguinwatch.org/#/science

    Posted

  • yshish by yshish moderator, translator in response to gardenmaeve's comment.

    Yes, that's true. But as far as I know the computer learns from pictures where all the penguins have been classified, and you can't teach it to mark them all without giving it a large number of images that were classified properly. I hope I'm not wrong about this, but that is how I understand it.

    So I think that first they need to classify them all properly to teach the computer to do so alone. And then, when it is able to mark them, they can give it pictures where only a part has been classified and let it finish them.

    Please, someone correct me if I am wrong!

    Anyway, thank you for your post, @Gardenmaeve and happy Friday (the weekend's almost here!!!!!!!!!!! .)

    Zuzi

    Posted

  • gardenmaeve by gardenmaeve moderator

    I've volunteered in other such projects through Cornell, and somehow all of our classifications were helpful in programming the computers. Looking forward to more on this. Thanks, Zuzi.

    Posted

  • yshish by yshish moderator, translator

    ...I was thinking about it for a while again and I may be wrong about that. I'm really curious what the truth is and would be glad if someone explained it to us better :]

    Anyway, every single mark is definitely helpful for the science! Keep marking, dear classifiers. You're awesome and important for the results!!!!

    Thank you for your hard work! Enjoy the weekend 😃

    Zuzi

    Posted

  • anilam by anilam

    I can understand that. It may be more beneficial to have a "more than 30" button leading to
    30-50,
    50-100,
    100-300,
    too many to guess... then the 30+ images could be analysed as a separate group, making the most of the citizen input.
    Just an idea, I don't know if it's feasible. I must say I stopped marking images with 40ish penguins after a while and just put them as too many to count. Which way is better for you?

    Posted

  • DZM by DZM admin in response to anilam's comment.

    As far as we can tell, the science team finds that the current system works and provides good data. (yes, science team?)

    But we definitely appreciate the brainstorm!

    Posted

  • yshish by yshish moderator, translator in response to anilam's comment.

    Hi @anilam

    Thanks for your ideas. I am not entirely sure what you meant by the benefit there (and who would it be for?). Are you asking for guess buttons, like one for 30-50, another for 50-100, etc.? What would the subsequent analysis look like to get the exact count? There are many users (like me) who try to mark every single penguin when possible, so I think the current system is useful enough. But I am interested in your idea as well, I just need more details since I don't understand it well enough :)

    Thanks,

    Zuzi

    Posted

  • gardenmaeve by gardenmaeve moderator

    For a brief but intriguing discussion of the number of animal identification volunteers required for confirmed data, based on the degree of agreement between volunteers, read the National Geographic blog about the Zooniverse "Snapshot Serengeti" project, here: http://proof.nationalgeographic.com/2014/11/26/hidden-cameras-reveal-the-secret-life-of-the-serengeti/?utm_source=Facebook&utm_medium=Social&utm_content=link_fb20141201pr-serengeti&utm_campaign=Content&sf6020285=1

    Posted

  • Nickypeng by Nickypeng moderator

    Thanks for that gardenmaeve. Interesting to see that the volunteers collectively are so accurate.

    Great pictures too - even if there aren't any penguins!

    Posted

  • penguinTom79 by penguinTom79 scientist, admin in response to Shiphrah's comment.

    Hi Guys,
    Sorry, just noticed this!
    This was on the advice of Zooniverse, as we didn't want people to get bored. We cluster the results of several people to make sure that we've got everything. While we would prefer it if you mark everything, that's not always realistic and we want to keep everyone interested. So, even if they stop at 30, it still works!
    Cheers,
    Tom

    Posted

  • gardenmaeve by gardenmaeve moderator

    We penguin counters have longer attention spans than some others! 😄

    Posted

  • AvastMH by AvastMH moderator

    We are inspired by penguins - for every pebble they collect for a nest we can count 10 penguins I'm sure 😉

    Posted

  • mhwallace01 by mhwallace01

    When I saw the message about marking thirty, I wondered if multiple people are marking the same photos.

    If that is the case, it is likely that combining the marks from multiple people will result in identifying more penguins in total for that image, since we all have different approaches to examining the photos.

    Is this the case?

    Cheers,

    Michelle

    Posted

  • yshish by yshish moderator, translator in response to mhwallace01's comment.

    Hi,

    Yes, as you said, each image is classified by multiple users, just as in other Zooniverse projects.

    Hope this answers your question :)

    Cheers,

    Zuzi

    Posted

  • GizmoMischief by GizmoMischief

    Personally I always mark every penguin I can clearly see... and find the "30" pop-up quite annoying!

    Posted

  • AvastMH by AvastMH moderator

    I know what you mean Gizmomischief - I often wish I could just say 'I'm going to do way over 30' and just keep clicking away - but I guess that could be open to abuse. It WOULD be nice if it knew where I was clicking and didn't appear just there 😉 But hey - we won the competitive tender against computers for this job - so perhaps they can't be quite clever enough? 😄

    Posted

  • yshish by yshish moderator, translator

    To make the classification process more clear to you all, I asked Caitlin for some more details.

    Each image is reviewed between 5 and 20 times (most get seen at least 10) by our volunteers, depending on when it was first reviewed and if there are any animals present in the image. Also, there are a total of 409,808 "subjects" (aka. images) that need to be classified. As of today, 73 % of those images have been classified and are considered complete, while 5 % are currently active, and 22 % are inactive but still need to be classified. Then a new data set will be uploaded and new images will be activated for the classification process 😃

    Hopefully, we'll be able to finish the 27 % of the 'remaining' images as soon as possible. I can't wait to see images from the new data set!! 😃

    Posted

  • gardenmaeve by gardenmaeve moderator

    Thank you, @yshish ! That's very good to know. I'm aiming for completion by April 16- anyone game to try?

    Posted

  • coldcounter by coldcounter

    I'm keen to see some new images ...... so I'm going back to classifying right now !!!! 😃 😃

    Posted

  • yshish by yshish moderator, translator

    Caitlin adds that we are just chipping away at the 27 %, 5 % at a time. They hope to upload more data from the last field season in April. They're afraid that the 27 % won't be finished by April, so they probably won't wait for it to be finished before uploading the new images. However, when they add new images to the load, they come in at the back, so we must finish the old ones before seeing the new ones anyway (uploading images doesn't mean activating them for classification).

    Indeed, we should try our best to help finish the current data as soon as possible so we could see the new ones.

    Let's finish them all!! 😄

    Posted

  • AvastMH by AvastMH moderator

    Let me at them - my mouse is primed and ready for action. In fact I own a spare computer mouse just in case the current one gets penguin fatigue 😉 😄

    Posted

  • gardenmaeve by gardenmaeve moderator

    On your mark, get set, start counting!

    (A mouse with penguin fatigue, too funny! 😄 😄 )

    Posted

  • AvastMH by AvastMH moderator

    Well, 5% of 409808 is 20490, which is about 10 days' work for us. So we certainly have a month's worth to go... perhaps Easter will see us most of the way there? 😄
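
    For anyone who wants to redo this arithmetic, here is a small Python sketch. The total of 409,808 subjects and the 5 % and 27 % shares come from Caitlin's figures quoted in this thread; the daily rate of 2078 is the figure from the log posted further down and is used here only as an assumption. The sketch also treats one classification as finishing one image, which is a simplification, since each image is reviewed several times. The function names are made up for illustration.

```python
TOTAL_SUBJECTS = 409_808  # total images ("subjects") quoted in this thread


def subjects_in_share(percent, total=TOTAL_SUBJECTS):
    """Number of subjects that a percentage share of the total represents."""
    return total * percent / 100


def days_needed(subjects, per_day):
    """Days to work through `subjects` at a steady rate of `per_day`."""
    return subjects / per_day


# Assumed rate: the 2078-per-day figure from the log later in this thread.
ASSUMED_RATE = 2078

batch = subjects_in_share(5)       # the 5 % currently active
remaining = subjects_in_share(27)  # everything not yet complete

print("5 % batch:", round(batch), "subjects,",
      round(days_needed(batch, ASSUMED_RATE), 1), "days")
print("27 % remaining:", round(remaining), "subjects,",
      round(days_needed(remaining, ASSUMED_RATE), 1), "days")
```

    At that assumed rate the 5 % batch works out to roughly 10 days, in line with the estimate above, and the full 27 % to well over a month.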

    Posted

  • gardenmaeve by gardenmaeve moderator

    Time for a progress report. @AvastMH have you kept track of our starting number in the end of February (or shall I travel back in time to find it)?

    Posted

  • AvastMH by AvastMH moderator

    Let me just check @gardenmaeve 😃 Got to nip to the other computer - back in 5 minutes 😃

    Posted

  • gardenmaeve by gardenmaeve moderator

    Lovely- thank you. If you don't have it at hand I am content to dive back into time.

    Posted

  • AvastMH by AvastMH moderator

    No problem - I've got two dates from about then...23rd Feb or 1st March:

    23.2.16; since last reading (12 days back) 2813985
    24939 138 : 2078 per day 11+ folks per day

    01.03.16
    2822906 images classified, 23354 volunteers participating
    2822906 - 2813985 = 8921 over 6 days = 1486 per day

    Looks like we've done lots - I'll let you do the numbers @gardenmaeve 😄

    In fact I'll do a log here from now onwards - once a month - will that be often enough? 😃

    Posted

  • AvastMH by AvastMH moderator

    So I just recorded 2866435, less 2822906 at 1st March = 43529 over 4 weeks (less one day), or 10882 per week! 😄 That's a LOT of penguins counted. THANK YOU ALL
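
    The rates in this log can be checked with a few lines of Python. This is only a sketch: the helper name `average_rate` is made up for illustration, and the readings are the "images classified" totals quoted in this thread. The weekly figure divides by 4 whole weeks, as the post above does.

```python
def average_rate(earlier_total, later_total, periods):
    """Average increase per period between two counter readings."""
    if periods <= 0:
        raise ValueError("periods must be positive")
    return (later_total - earlier_total) / periods


# Readings of the "images classified" counter quoted in this thread.
per_day = average_rate(2_813_985, 2_822_906, 6)   # 23 Feb to 1 March
per_week = average_rate(2_822_906, 2_866_435, 4)  # 1 March to about 4 weeks on

print("per day:", int(per_day))
print("per week:", int(per_week))
```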

    And THANKS @gardenmaeve for keeping an eye on what a great job the Peng-Zooites are doing - it's good isn't it!

    Posted

  • gardenmaeve by gardenmaeve moderator

    You're typing so fast I can't get a letter in edgewise! 😄
    Thanks for starting a log. I think you'll encourage us all.

    Posted

  • gardenmaeve by gardenmaeve moderator

    Yshish told us Caitlin said we were, "... just chipping away at 27 %, 5 % at a time." Looking back at your initial estimate of what we needed to accomplish, "Well 5% of 409808 is 20490..." to finish that 5% we were working on, so we've perhaps finished nearly half of what's left of the old data? I'm sleepy- does that make sense?

    Posted

  • AvastMH by AvastMH moderator

    Seems to make sense to me @gardenmaeve. Then again I'm up late so hopefully I am not too sleepy! 😉

    Posted