Closing data gaps with Lacuna Fund

Machine learning has shown enormous promise for social good, whether in helping respond to pandemics or in reaching people before natural disasters hit. But even as machine learning technology becomes increasingly accessible, social innovators still face significant barriers in their efforts to use this technology to unlock new solutions. From languages to health and agriculture, there is a lack of relevant, labeled data to represent and address the challenges that face much of the world’s population.

To help close this gap, Google.org is making a $2.5 million grant alongside The Rockefeller Foundation, Canada’s International Development Research Centre (IDRC) and Germany’s GIZ FAIR Forward to launch Lacuna Fund, the world’s first collaborative nonprofit effort to directly address this missing data. The Fund aims to unlock the power of machine learning by providing data scientists, researchers, and social entrepreneurs in low- and middle-income communities around the world with resources to produce labeled datasets that address urgent problems.

Labeled data is a type of data that is particularly useful for training machine learning models: it provides the “ground truth” from which a model learns to make predictions about cases it hasn’t yet seen. To create a labeled dataset, knowledgeable humans systematically “tag” each example with one or more of the concepts or entities it represents. For example, a researcher might label short videos of insects with their type; images of fungi with whether or not they are harmful to plants around them; or passages of Swahili text with the part of speech of each word. In turn, these datasets could enable biologists to track insect migration; farmers to accurately identify threats to their crops; and Swahili speakers to use an automated text messaging service to get vital health information.
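To make the idea concrete, here is a minimal sketch in Python of what a labeled dataset looks like and how a model can use it to predict a case it hasn't seen. Everything in it is an assumption for illustration: the `LabeledExample` structure, the two numeric features, the made-up values and the simple nearest-neighbor vote are stand-ins, not a description of how Lacuna Fund datasets are built or used.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class LabeledExample:
    features: tuple  # hypothetical numeric measurements describing one example
    label: str       # the tag assigned by a knowledgeable human


# Toy, made-up values for illustration only; these are not real measurements.
DATASET = [
    LabeledExample((0.90, 0.80), "harmful"),
    LabeledExample((0.80, 0.90), "harmful"),
    LabeledExample((0.85, 0.70), "harmful"),
    LabeledExample((0.10, 0.20), "harmless"),
    LabeledExample((0.20, 0.10), "harmless"),
    LabeledExample((0.15, 0.30), "harmless"),
]


def predict(dataset, features, k=3):
    """Return the majority label among the k labeled examples closest to `features`."""
    nearest = sorted(dataset, key=lambda ex: dist(ex.features, features))[:k]
    votes = Counter(ex.label for ex in nearest)
    return votes.most_common(1)[0][0]


# An unseen example: the prediction comes from the human-assigned labels nearby.
print(predict(DATASET, (0.70, 0.75)))
```

The prediction can only ever be one of the labels that people supplied, which is why missing or unrepresentative labeled data limits what a model can do.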

Guided by committees of domain and machine learning experts and facilitated by Meridian Institute, the Fund will provide resources and support to produce new labeled datasets, as well as augment or update existing ones to be more representative, relevant and sustainable. The Fund’s initial work will focus on agriculture and underrepresented languages, but we welcome additional collaborators and anticipate the Fund will grow in the years to come. And our work is bigger than just individual datasets: Lacuna Fund will focus explicitly on growing the capacity of local organizations to be data collectors, curators and owners. While following best practices for responsible collection, publication and use, we endeavor to make all datasets as broadly available as possible.

Thanks in part to the rise of cloud computing, in particular services like Cloud AutoML and libraries like TensorFlow, AI is increasingly able to help address society’s most pressing issues. Yet we’ve seen firsthand in our work on the Google AI Impact Challenge the gap between the potential of AI and the ability to successfully implement it. The need for data is quickly becoming one of the most salient barriers to progress. It’s our hope that the Fund provides not only a way for social sector organizations to fund high-impact, immediately applicable data collection and labeling, but also a foundation from which changemakers can build a better future.

Image at top: A team from AI Challenge Grantee Wadhwani Institute for Artificial Intelligence in India is working with local farmers to manage pest damage to crops.
