A new Computer Vision Model (v2.3) including 1,624 new taxa

We released a new computer vision model today. It covers 74,135 taxa, up from 72,511. This new model (v2.3) was trained on data exported last month, on April 2nd, and adds 1,624 new taxa.

Taxa differences to previous model

The charts below summarize these new taxa using the same groupings we described in past release posts.

By category, most of these new taxa were insects and plants

Here are species level examples of new species added for each category:

Click on the links to view these taxa on the Explore page, where the samples are rendered as species lists. Remember, to check whether a particular species is included in the currently live computer vision model, you can look at the “About” section of its taxon page.

We couldn't do it without you

Thank you to everyone in the iNaturalist community who makes this work possible! Sometimes the computer vision suggestions feel like magic, but it’s truly not possible without people. None of this would work without the millions of people who have shared their observations and the knowledgeable experts who have added identifications.

In addition to adding observations and identifications, here are other ways you can help:

  • Share your Machine Learning knowledge: iNaturalist’s computer vision features wouldn’t be possible without learning from many colleagues in the machine learning community. If you have machine learning expertise, these are two great ways to help:
      • Participate in the annual iNaturalist challenges: Our collaborators Grant Van Horn and Oisin Mac Aodha continue to run machine learning challenges with iNaturalist data as part of the annual Computer Vision and Pattern Recognition conference. By participating you can help us all learn new techniques for improving these models.
      • Start building your own model with the iNaturalist data now: If you can’t wait for the next CVPR conference, thanks to the Amazon Open Data Program you can start downloading iNaturalist data to train your own models now. Please share with us what you’ve learned by contributing to iNaturalist on GitHub.
  • Donate to iNaturalist: For the rest of us, you can help by donating! Your donations help offset the substantial staff and infrastructure costs associated with training, evaluating, and deploying model updates. Thank you for your support!
Posted on May 12, 2023 by loarie

Comments

Sweeet!

Posted by kevinfaccenda over 1 year ago

This is one of my favorite days for Inat!

Posted by yayemaster over 1 year ago

Nice update. Why is https://www.inaturalist.org/taxa/908145 included in the new model, since it only has 12 research grade observations with 29 photos? Has the minimum bar been lowered, or has this taxon been cleaned up recently?

Posted by rudolphous over 1 year ago

Btw, you might want to change the destination of the link to the description of the groupings of taxa in the stats - right now it links in several rounds to pages that just contain the same reference to an earlier post and no further explanation. By repeated clicking I assume the correct destination should be https://www.inaturalist.org/blog/69958-a-new-computer-vision-model-including-4-717-new-taxa

Posted by jlisby over 1 year ago

You should not be heroising the CVI platform. You do not know how much time this one person spends trying to correct incorrect/ridiculous CV identifications. It mostly makes me want to leave the platform.

Posted by oneanttofew over 1 year ago

@oneanttofew While the CV certainly does make conspicuous mistakes, it is still generally quite accurate at getting things to genus / family level where they can then be improved by humans. I'm personally very proud of the CV and how much it has improved over the past two years.

Posted by kevinfaccenda over 1 year ago

CV is a tool. We need better onboarding for new people on how to evaluate CV options. It has most definitely improved in the years that I have been on iNat. And almost half of my patch (okay, one third) is 'green stuff': just dump it in plants.

PS left a few comments - added to CV May 2023 - on the Cape Peninsula species. Since the info holds only while we can use the links embedded in this blog post.

Posted by dianastuder over 1 year ago

@rudolphous I think the rules for inclusion in the model may have changed, but the Help page (https://www.inaturalist.org/pages/help#cv-taxa) still lists having at least 100 observations, so that is confusing.

Posted by cthawley over 1 year ago

In between, I think it was 100 pictures - about 60 obs depending.

But that seems to have changed since? @loarie ?

Posted by dianastuder over 1 year ago

This could be wrong but I think more iconic species (which presumably have more photos on the internet) get included in the model even if they are well below the threshold.

E.g. The pygmy hippopotamus was included in this model despite having only 4 RG observations.

Posted by mabuva2021 over 1 year ago

@mabuva2021 @cthawley @rudolphous The model is also training on captive observations. E.g. there are 140 observations of pygmy hippos from zoos: https://www.inaturalist.org/observations?place_id=any&taxon_id=74192&verifiable=any

I don't think that the model trains on any data that isn't on iNat.

Posted by kevinfaccenda over 1 year ago

Please add the button to drop this post from the dropbox page. I have read it.

Posted by tonyrebelo over 1 year ago

"You should not be heroising the CVI platform."

I for one do think the CV AI is a hero! For all its mistakes it is brilliant on what it is trained for. To me it is already an indispensable tool. I foresee the day that when I post an ID, it will ask me - are you sure? It looks more like ...
(and I wish it already did it, for when I post - instead of a plant - Passerina the bird or Elegia the moth or Erica the spider).

Posted by tonyrebelo over 1 year ago

So, where is it released, and does the model have a free/libre/open license? I miss a link to the download.

Posted by davidak over 1 year ago

It is what we are now using on iNat.

Posted by dianastuder over 1 year ago

Absolutely, the CV is an awe-inspiring achievement. I tested once that it got 80% of Eristalis hoverfly observations right at species level. It won't be great for every taxon (and obviously not for those species that are not included), but it is great for the majority of observations. When the suggestion is poor, I doubt the user's best effort would have been better!

Posted by matthewvosper over 1 year ago

@rudolphous It looks like there has been some identification churn, as there has been a considerable effort recently to update misidentified observations in genus Eucereon. The next model will likely remove E. chalcodon in favor of some of its siblings. See https://www.inaturalist.org/journal/regisrafael/77128-eucereon-chalcodon-are-being-misidentified-as-eucereon-compositum for some info.

Posted by alexshepard over 1 year ago

@matthewvosper That clarifies a lot. Thanks for pointing that out.

Posted by rudolphous over 1 year ago

@alexshepard could you confirm whether "This has changed over time, but as of the model released in March 2020, taxa included in the computer vision training set must have at least 100 observations, at least 50 of which must have a community ID." is still the case, or whether the requirements are different now?

Posted by thebeachcomber over 1 year ago

@thebeachcomber that's not the case anymore - we require 100 photos, but that can be spread across a number of observations. We prefer not to use too many photos from the same observation, but we will use a few from each, so the number of observations can be lower than 100.

We're always looking for ways to increase the robustness of the training dataset, so these criteria might change in the future.

For example, lately I've been concerned about taxa like https://www.inaturalist.org/taxa/387943-Costelytra-brunnea - it has enough photos to be included in the vision model, but it only has 3 observers and 3 identifiers, and 98 of the 100 observations of this taxon were made by a single observer, almost all in 2 months in late 2021. I fear there may not be enough diversity of photography equipment, observer behavior, and identifier opinions to say that we really know what this thing looks like well enough to teach a computer vision algorithm. If this persists and turns out to hurt the model (still TBD), we'll probably have to add a floor to the number of photographers and perhaps identifiers as well.

Posted by alexshepard over 1 year ago
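
The inclusion rule described in the comment above (100 photos, with only a few taken from any one observation) can be sketched roughly as follows. This is an illustration only, not iNaturalist's actual pipeline: the cap of 5 photos per observation, the `Photo` record, and all function names are assumptions, and the observer floor is the hypothetical future change mentioned in the comment, not a rule that exists today.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_PHOTOS = 100          # threshold stated in the comment above
MAX_PHOTOS_PER_OBS = 5    # assumption: stands in for "a few from each" observation
MIN_OBSERVERS = 1         # hypothetical floor; the comment says none exists yet


@dataclass(frozen=True)
class Photo:
    photo_id: int
    observation_id: int
    observer: str


def select_training_photos(photos, max_per_obs=MAX_PHOTOS_PER_OBS):
    """Keep at most max_per_obs photos from each observation."""
    kept = []
    per_obs = defaultdict(int)
    for photo in photos:
        if per_obs[photo.observation_id] < max_per_obs:
            per_obs[photo.observation_id] += 1
            kept.append(photo)
    return kept


def is_eligible(photos, min_photos=MIN_PHOTOS, min_observers=MIN_OBSERVERS):
    """A taxon qualifies if enough photos survive the per-observation cap
    and (hypothetically) enough distinct observers contributed them."""
    usable = select_training_photos(photos)
    observers = {photo.observer for photo in usable}
    return len(usable) >= min_photos and len(observers) >= min_observers


def make_photos(n_obs, photos_per_obs, observer="obs1"):
    """Build synthetic photos: n_obs observations by a single observer."""
    return [
        Photo(photo_id=o * photos_per_obs + i, observation_id=o, observer=observer)
        for o in range(n_obs)
        for i in range(photos_per_obs)
    ]


print(is_eligible(make_photos(5, 20)))   # 5 obs x 20 photos: only 25 usable
print(is_eligible(make_photos(34, 3)))   # 34 obs x 3 photos: 102 usable
```

Under these assumed numbers, a taxon with 5 observations of 20 photos each falls short because only 25 photos survive the cap, while 34 observations of 3 photos each qualifies. Raising `min_observers` shows how a single-observer taxon would be excluded if such a floor were ever added.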

thanks for that; so theoretically a species could enter the model if it had just 5 observations, with each of those having 20 photos?

Posted by thebeachcomber over 1 year ago

@thebeachcomber nope, as I said, we'll use a few from each, but we prefer to not use too many photos from the same observation.

Posted by alexshepard over 1 year ago

thanks again [I should've clarified, I meant without the human choice aspect, 5 obs should be possible based purely on the numerical requirement]

Posted by thebeachcomber over 1 year ago

'add a floor to the number of identifiers'

That would also be good for new species, where there may literally only be one or two identifiers.

Posted by dianastuder over 1 year ago

'add a floor to the number of identifiers' - except that as the AI gets trained on rarer and rarer species, the number of people capable of accurately identifying them gets less and less. Especially with rare plants or invertebrates. If the only identifier is the world expert, then surely that is good enough: otherwise we will require "supporting" IDs behind taxonomical specialists to get species onto the training dataset.

Posted by tonyrebelo over 1 year ago

I am torn both ways. It is not good for CV to offer a rare new species, instead of the genus where there are many species. For some reason the Species novum bounces to the top of the list.

Posted by dianastuder over 1 year ago

"For some reason the Species novum bounces to the top of the list." - any examples please: this should not be the case, unless the "Species novum" is the best match.

Posted by tonyrebelo over 1 year ago

Can't find it, but I remember an Indigofera? With a (new species) number, instead of a name. Which was subsequently offered by CV where it was not relevant.

@alexshepard is there a threshold for Seen Nearby? I tidy up distribution maps as it only takes one obs, with one ID to offer as Seen Nearby. That error multiplies fast. Threshold should at least be CID and Research Grade for That One?

Posted by dianastuder over 1 year ago
