December 2023 – Page 15

AV 2.0, the Next Big Wayve in Self-Driving Cars

A new era of autonomous vehicle technology, known as AV 2.0, has emerged, marked by large, unified AI models that can control multiple parts of the vehicle stack, from perception and planning to control.

Wayve, a London-based autonomous driving technology company, is leading the surf.

In the latest episode of NVIDIA’s AI Podcast, host Katie Burke Washabaugh spoke with the company’s cofounder and CEO, Alex Kendall, about what AV 2.0 means for the future of self-driving cars.

Unlike AV 1.0’s focus on perfecting a vehicle’s perception capabilities using multiple deep neural networks, AV 2.0 calls for comprehensive in-vehicle intelligence to drive decision-making in real-world, dynamic environments.

The AI Podcast · Wayve CEO Alex Kendall on Making a Splash in Autonomous Vehicles – Ep. 209

Embodied AI — the concept of giving AI a physical interface to interact with the world — is the basis of this new AV wave.

Kendall pointed out that it’s a “hardware/software problem — you need to consider these things separately,” even as they work together. For example, a vehicle can have the highest-quality sensors, but without the right software, the system can’t use them to execute the right decisions.

Generative AI plays a key role, enabling synthetic data generation so AV makers can use a model’s previous experiences to create and simulate novel driving scenarios.

It can “take crowds of pedestrians and snow and bring them together” to “create a snowy, crowded pedestrian scene” that the vehicle has never experienced before.

According to Kendall, that will “play a huge role in both learning and validating the level of performance that we need to deploy these vehicles safely” — all while saving time and costs.

In June, Wayve unveiled GAIA-1, a generative world model for developing autonomous vehicles.

The company also recently announced LINGO-1, an AI model that allows passengers to use natural language to enhance the learning and explainability of AI driving models.

Looking ahead, the company hopes to scale and further develop its solutions, improving the safety of AVs to deliver value, build public trust and meet customer expectations. Kendall views embodied AI as playing a definitive role in the future of the AI landscape, pushing pioneers to “build better” and “build further” to achieve the “next big breakthroughs.”

Driver’s Ed: How Waabi Uses AI Simulation to Teach Autonomous Vehicles to Drive

Teaching the AI brains of autonomous vehicles to understand the world as humans do requires billions of miles of driving experience—the road to achieving this astronomical level of driving leads to the virtual world. Learn how Waabi uses powerful high-fidelity simulations to train and develop production-level autonomous vehicles.

Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans

Driving enjoyment and autonomous driving capabilities can complement one another in intelligent, sustainable vehicles. Learn about the automaker’s plans to unveil its third vehicle, the Polestar 3, the tech inside it, and what the company’s racing heritage brings to the intersection of smarts and sustainability.

GANTheftAuto: Harrison Kinsley on AI-Generated Gaming Environments

Humans playing games against machines is nothing new, but now computers can develop games for people to play. Programming enthusiast and social media influencer Harrison Kinsley created GANTheftAuto, an AI-based neural network that generates a playable chunk of the classic video game Grand Theft Auto V.

SUBHEAD: Subscribe to the AI Podcast, Now Available on Amazon Music

The AI Podcast is now available through Amazon Music.

In addition, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Google at EMNLP 2023

Posted by Malaya Jules, Program Manager, Google

Google is proud to be a Diamond Sponsor of Empirical Methods in Natural Language Processing (EMNLP 2023), a premier annual conference, which is being held this week in Sentosa, Singapore. Google has a strong presence at this year’s conference with over 65 accepted papers and active involvement in 11 workshops and tutorials. Google is also happy to be a Major Sponsor for the Widening NLP workshop (WiNLP), which aims to highlight global representations of people, perspectives, and cultures in AI and ML. We look forward to sharing some of our extensive NLP research and expanding our partnership with the broader research community.

We hope you’ll visit the Google booth to chat with researchers who are actively pursuing the latest innovations in NLP, and check out some of the scheduled booth activities (e.g., demos and Q&A sessions listed below). Visit the @GoogleAI X (Twitter) and LinkedIn accounts to find out more about the Google booth activities at EMNLP 2023.

Take a look below to learn more about the Google research being presented at EMNLP 2023 (Google affiliations in bold).

Board & Organizing Committee

Sponsorship Chair: Shyam Upadyay

Industry Track Chair: Imed Zitouni

Senior Program Committee: Roee Aharoni, Annie Louis, Vinodkumar Prabhakaran, Shruti Rijhwani, Brian Roark, Partha Talukdar

Google Research booth activities

This schedule is subject to change. Please visit the Google booth for more information.

Developing and Utilizing Evaluation Metrics for Machine Translation & Improving Multilingual NLP

Presenter: Isaac Caswell, Dan Deutch, Jan-Thorsten Peter, David Vilar Torres

Fri, Dec 8 | 10:30AM -11:00AM SST

Differentiable Search Indexes & Generative Retrieval

Presenter: Sanket Viabhav Mehta, Vinh Tran

Fri, Dec 8 | 3:30PM -4:00PM SST

Retrieval and Generation in a single pass

Presenter: Palak Jain, Livio Baldini Soares

Sat, Dec 9 | 10:30AM -11:00AM SST

Amplifying Adversarial Attacks

Presenter: Anu Sinha

Sat, Dec 9 | 12:30PM -1:45PM SST

Automate prompt design: Universal Self-Adaptive Prompting (see blog post)

Presenter: Xingchen Qian^*, Ruoxi Sun

Sat, Dec 9 | 3:30PM -4:00PM SST

Papers

SynJax: Structured Probability Distributions for JAX

Miloš Stanojević, Laurent Sartran

Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning

Clifton Poth, Hannah Sterz, Indraneil Paul, Sukannya Purkayastha, Leon Engländer, Timo Imhof, Ivan Vulić, Sebastian Ruder, Iryna Gurevych, Jonas Pfeiffer

DocumentNet: Bridging the Data Gap in Document Pre-training

Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander Hauptmann, Hanjun Dai, Wei Wei

AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-Powered Applications

Bhaktipriya Radharapu, Kevin Robinson, Lora Aroyo, Preethi Lahoti

CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks

Mete Ismayilzada, Debjit Paul, Syrielle Montariol, Mor Geva, Antoine Bosselut

Large Language Models Can Self-Improve

Jiaxin Huang^*, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han

Dissecting Recall of Factual Associations in Auto-Regressive Language Models

Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks

Alon Jacovi, Avi Caciularu, Omer Goldman, Yoav Goldberg

Selective Labeling: How to Radically Lower Data-Labeling Costs for Document Extraction Models

Yichao Zhou, James Bradley Wendt, Navneet Potti, Jing Xie, Sandeep Tata

Measuring Attribution in Natural Language Generation Models

Hannah Rashkin, Vitaly Nikolaev, Matthew Lamm, Lora Aroyo, Michael Collins, Dipanjan Das, Slav Petrov, Gaurav Singh Tomar, Iulia Turc, David Reitter

Inverse Scaling Can Become U-Shaped

Jason Wei^*, Najoung Kim, Yi Tay^*, Quoc Le

INSTRUCTSCORE: Towards Explainable Text Generation Evaluation with Automatic Feedback

Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Yang Wang, Lei Li

On the Robustness of Dialogue History Representation in Conversational Question Answering: A Comprehensive Study and a New Prompt-Based Method

Zorik Gekhman, Nadav Oved, Orgad Keller, Idan Szpektor, Roi Reichart

Investigating Efficiently Extending Transformers for Long-Input Summarization

Jason Phang^*, Yao Zhao, Peter J Liu

DSI++: Updating Transformer Memory with New Documents

Sanket Vaibhav Mehta^*, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Jinfeng Rao, Marc Najork, Emma Strubell, Donald Metzler

MultiTurnCleanup: A Benchmark for Multi-Turn Spoken Conversational Transcript Cleanup

Hua Shen^*, Vicky Zayats, Johann C Rocholl, Daniel David Walker, Dirk Padfield

Findings of EMNLP

Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

Jiefeng Chen^*, Jinsung Yoon, Sayna Ebrahimi, Sercan O Arik, Tomas Pfister, Somesh Jha

A Comprehensive Evaluation of Tool-Assisted Generation Strategies

Alon Jacovi^*, Avi Caciularu, Jonathan Herzig, Roee Aharoni, Bernd Bohnet, Mor Geva

1-PAGER: One Pass Answer Generation and Evidence Retrieval

Palak Jain, Livio Baldini Soares, Tom Kwiatkowski

MaXM: Towards Multilingual Visual Question Answering

Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut

SDOH-NLI: A Dataset for Inferring Social Determinants of Health from Clinical Notes

Adam D. Lelkes, Eric Loreaux^*, Tal Schuster, Ming-Jun Chen, Alvin Rajkomar

Machine Reading Comprehension Using Case-based Reasoning

Dung Ngoc Thai, Dhruv Agarwal, Mudit Chaudhary, Wenlong Zhao, Rajarshi Das, Jay-Yoon Lee, Hannaneh Hajishirzi, Manzil Zaheer, Andrew McCallum

Cross-lingual Open-Retrieval Question Answering for African Languages

Odunayo Ogundepo, Tajuddeen Gwadabe, Clara E. Rivera, Jonathan H. Clark, Sebastian Ruder, David Ifeoluwa Adelani, Bonaventure F. P. Dossou, Abdou Aziz DIOP, Claytone Sikasote, Gilles HACHEME, Happy Buzaaba, Ignatius Ezeani, Rooweither Mabuya, Salomey Osei, Chris Chinenye Emezue, Albert Kahira, Shamsuddeen Hassan Muhammad, Akintunde Oladipo, Abraham Toluwase Owodunni, Atnafu Lambebo Tonja, Iyanuoluwa Shode, Akari Asai, Anuoluwapo Aremu, Ayodele Awokoya, Bernard Opoku, Chiamaka Ijeoma Chukwuneke, Christine Mwase, Clemencia Siro, Stephen Arthur, Tunde Oluwaseyi Ajayi, Verrah Akinyi Otiende, Andre Niyongabo Rubungo, Boyd Sinkala, Daniel Ajisafe, Emeka Felix Onwuegbuzia, Falalu Ibrahim Lawan, Ibrahim Said Ahmad, Jesujoba Oluwadara Alabi, CHINEDU EMMANUEL MBONU, Mofetoluwa Adeyemi, Mofya Phiri, Orevaoghene Ahia, Ruqayya Nasir Iro, Sonia Adhiambo

On Uncertainty Calibration and Selective Generation in Probabilistic Neural Summarization: A Benchmark Study

Polina Zablotskaia, Du Phan, Joshua Maynez, Shashi Narayan, Jie Ren, Jeremiah Zhe Liu

Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation

Markus Freitag, Behrooz Ghorbani^*, Patrick Fernandes^*

Sources of Hallucination by Large Language Models on Inference Tasks

Nick McKenna, Tianyi Li, Liang Cheng, Mohammad Javad Hosseini, Mark Johnson, Mark Steedman

Don’t Add, Don’t Miss: Effective Content Preserving Generation from Pre-selected Text Spans

Aviv Slobodkin, Avi Caciularu, Eran Hirsch, Ido Dagan

What Makes Chain-of-Thought Prompting Effective? A Counterfactual Study

Aman Madaan^*, Katherine Hermann, Amir Yazdanbakhsh

Understanding HTML with Large Language Models

Izzeddin Gur, Ofir Nachum, Yingjie Miao, Mustafa Safdari, Austin Huang, Aakanksha Chowdhery, Sharan Narang, Noah Fiedel, Aleksandra Faust

Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

Kundan Krishna^*, Yao Zhao, Jie Ren, Balaji Lakshminarayanan, Jiaming Luo, Mohammad Saleh, Peter J. Liu

In-Context Learning Creates Task Vectors

Roee Hendel, Mor Geva, Amir Globerson

Pre-training Without Attention

Junxiong Wang, Jing Nathan Yan, Albert Gu, Alexander M Rush

MUX-PLMs: Data Multiplexing for High-Throughput Language Models

Vishvak Murahari, Ameet Deshpande, Carlos E Jimenez, Izhak Shafran, Mingqiu Wang, Yuan Cao, Karthik R Narasimhan

PaRaDe: Passage Ranking Using Demonstrations with LLMs

Andrew Drozdov^*, Honglei Zhuang, Zhuyun Dai, Zhen Qin, Razieh Rahimi, Xuanhui Wang, Dana Alon, Mohit Iyyer, Andrew McCallum, Donald Metzler^*, Kai Hui

Long-Form Speech Translation Through Segmentation with Finite-State Decoding Constraints on Large Language Models

Arya D. McCarthy, Hao Zhang, Shankar Kumar, Felix Stahlberg, Ke Wu

Unsupervised Opinion Summarization Using Approximate Geodesics

Somnath Basu Roy Chowdhury^*, Nicholas Monath, Kumar Avinava Dubey, Amr Ahmed, Snigdha Chaturvedi

SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data

Ruoxi Sun, Sercan O. Arik, Rajarishi Sinha, Hootan Nakhost, Hanjun Dai, Pengcheng Yin, Tomas Pfister

Retrieval-Augmented Parsing for Complex Graphs by Exploiting Structure and Uncertainty

Zi Lin, Quan Yuan, Panupong Pasupat, Jeremiah Zhe Liu, Jingbo Shang

A Zero-Shot Language Agent for Computer Control with Structured Reflection

Tao Li, Gang Li, Zhiwei Deng, Bryan Wang^*, Yang Li

Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches

Daniel Fried, Nicholas Tomlin, Jennifer Hu, Roma Patel, Aida Nematzadeh

Improving Classifier Robustness Through Active Generation of Pairwise Counterfactuals

Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Jilin Chen, Ed H. Chi, Alex Beutel

mmT5: Modular Multilingual Pre-training Solves Source Language Hallucinations

Jonas Pfeiffer, Francesco Piccinno, Massimo Nicosia, Xinyi Wang, Machel Reid, Sebastian Ruder

Scaling Laws vs Model Architectures: How Does Inductive Bias Influence Scaling?

Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler

TaTA: A Multilingual Table-to-Text Dataset for African Languages

Sebastian Gehrmann, Sebastian Ruder, Vitaly Nikolaev, Jan A. Botha, Michael Chavinda, Ankur P Parikh, Clara E. Rivera

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

Sebastian Ruder, Jonathan H. Clark, Alexander Gutkin, Mihir Kale, Min Ma, Massimo Nicosia, Shruti Rijhwani, Parker Riley, Jean Michel Amath Sarr, Xinyi Wang, John Frederick Wieting, Nitish Gupta, Anna Katanova, Christo Kirov, Dana L Dickinson, Brian Roark, Bidisha Samanta, Connie Tao, David Ifeoluwa Adelani, Vera Axelrod, Isaac Rayburn Caswell, Colin Cherry, Dan Garrette, Reeve Ingle, Melvin Johnson, Dmitry Panteleev, Partha Talukdar

q2d: Turning Questions into Dialogs to Teach Models How to Search

Yonatan Bitton, Shlomi Cohen-Ganor, Ido Hakimi, Yoad Lewenberg, Roee Aharoni, Enav Weinreb

Emergence of Abstract State Representations in Embodied Sequence Modeling

Tian Yun^*, Zilai Zeng, Kunal Handa, Ashish V Thapliyal, Bo Pang, Ellie Pavlick, Chen Sun

Evaluating and Modeling Attribution for Cross-Lingual Question Answering

Benjamin Muller^*, John Wieting, Jonathan H. Clark, Tom Kwiatkowski, Sebastian Ruder, Livio Baldini Soares, Roee Aharoni, Jonathan Herzig, Xinyi Wang

Weakly-Supervised Learning of Visual Relations in Multimodal Pre-training

Emanuele Bugliarello, Aida Nematzadeh, Lisa Anne Hendricks

How Do Languages Influence Each Other? Studying Cross-Lingual Data Sharing During LM Fine-Tuning

Rochelle Choenni, Dan Garrette, Ekaterina Shutova

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models

Benjamin Minixhofer, Jonas Pfeiffer, Ivan Vulić

IC3: Image Captioning by Committee Consensus

David Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A Ross, John Canny

The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

Aviv Slobodkin, Omer Goldman, Avi Caciularu, Ido Dagan, Shauli Ravfogel

Evaluating Large Language Models on Controlled Generation Tasks

Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Wieting, Nanyun Peng, Xuezhe Ma

Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration

Daniel Deutsch, George Foster, Markus Freitag

Transcending Scaling Laws with 0.1% Extra Compute

Yi Tay^*, Jason Wei^*, Hyung Won Chung^*, Vinh Q. Tran, David R. So^*, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani

Data Similarity is Not Enough to Explain Language Model Performance

Gregory Yauney^*, Emily Reif, David Mimno

Self-Influence Guided Data Reweighting for Language Model Pre-training

Megh Thakkar^*, Tolga Bolukbasi, Sriram Ganapathy, Shikhar Vashishth, Sarath Chandar, Partha Talukdar

ReTAG: Reasoning Aware Table to Analytic Text Generation

Deepanway Ghosal, Preksha Nema, Aravindan Raghuveer

GATITOS: Using a New Multilingual Lexicon for Low-Resource Machine Translation

Alex Jones^*, Isaac Caswell, Ishank Saxena

Video-Helpful Multimodal Machine Translation

Yihang Li, Shuichiro Shimizu, Chenhui Chu, Sadao Kurohashi, Wei Li

Symbol Tuning Improves In-Context Learning in Language Models

Jerry Wei^*, Le Hou, Andrew Kyle Lampinen, Xiangning Chen^*, Da Huang, Yi Tay^*, Xinyun Chen, Yifeng Lu, Denny Zhou, Tengyu Ma^*, Quoc V Le

“Don’t Take This Out of Context!” On the Need for Contextual Models and Evaluations for Stylistic Rewriting

Akhila Yerukola, Xuhui Zhou, Elizabeth Clark, Maarten Sap

QAmeleon: Multilingual QA with Only 5 Examples

Priyanka Agrawal, Chris Alberti, Fantine Huot, Joshua Maynez, Ji Ma, Sebastian Ruder, Kuzman Ganchev, Dipanjan Das, Mirella Lapata

Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision

Eugene Kharitonov, Damien Vincent, Zalán Borsos, Raphaël Marinier, Sertan Girgin, Olivier Pietquin, Matt Sharifi, Marco Tagliasacchi, Neil Zeghidour

AnyTOD: A Programmable Task-Oriented Dialog System

Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, Yonghui Wu

Selectively Answering Ambiguous Questions

Jeremy R. Cole, Michael JQ Zhang, Daniel Gillick, Julian Martin Eisenschlos, Bhuwan Dhingra, Jacob Eisenstein

PRESTO: A Multilingual Dataset for Parsing Realistic Task-Oriented Dialogs (see blog post)

Rahul Goel, Waleed Ammar, Aditya Gupta, Siddharth Vashishtha, Motoki Sano, Faiz Surani^*, Max Chang, HyunJeong Choe, David Greene, Chuan He, Rattima Nitisaroj, Anna Trukhina, Shachi Paul, Pararth Shah, Rushin Shah, Zhou Yu

LM vs LM: Detecting Factual Errors via Cross Examination

Roi Cohen, May Hamri, Mor Geva, Amir Globerson

A Suite of Generative Tasks for Multi-Level Multimodal Webpage Understanding

Andrea Burns^*, Krishna Srinivasan, Joshua Ainslie, Geoff Brown, Bryan A. Plummer, Kate Saenko, Jianmo Ni, Mandy Guo

AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages

Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Said Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Alipio Jorge, Pavel Brazdil, Felermino D. M. A. Ali, Davis David, Salomey Osei, Bello Shehu-Bello, Falalu Ibrahim Lawan, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Destaw Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, Stephen Arthur

Optimizing Retrieval-Augmented Reader Models via Token Elimination

Moshe Berchansky, Peter Izsak, Avi Caciularu, Ido Dagan, Moshe Wasserblat

SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez, Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan Das, Ankur P Parikh

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Joshua Ainslie, James Lee-Thorp, Michiel de Jong^*, Yury Zemlyanskiy, Federico Lebron, Sumit Sanghai

CoLT5: Faster Long-Range Transformers with Conditional Computation

Joshua Ainslie, Tao Lei, Michiel de Jong, Santiago Ontanon, Siddhartha Brahma, Yury Zemlyanskiy, David Uthus, Mandy Guo, James Lee-Thorp, Yi Tay, Yun-Hsuan Sung, Sumit Sanghai

Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting

Preethi Lahoti, Nicholas Blumm, Xiao Ma, Raghavendra Kotikalapudi, Sahitya Potluri, Qijun Tan, Hansa Srinivasan, Ben Packer, Ahmad Beirami, Alex Beutel, Jilin Chen

Universal Self-Adaptive Prompting (see blog post)

Xingchen Wan^*, Ruoxi Sun, Hootan Nakhost, Hanjun Dai, Julian Martin Eisenschlos, Sercan O. Arik, Tomas Pfister

TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models

Zorik Gekhman, Jonathan Herzig, Roee Aharoni, Chen Elkind, Idan Szpektor

Hierarchical Pre-training on Multimodal Electronic Health Records

Xiaochen Wang, Junyu Luo, Jiaqi Wang, Ziyi Yin, Suhan Cui, Yuan Zhong, Yaqing Wang, Fenglong Ma

NAIL: Lexical Retrieval Indices with Efficient Non-Autoregressive Decoders

Livio Baldini Soares, Daniel Gillick, Jeremy R. Cole, Tom Kwiatkowski

How Does Generative Retrieval Scale to Millions of Passages?

Ronak Pradeep^*, Kai Hui, Jai Gupta, Adam D. Lelkes, Honglei Zhuang, Jimmy Lin, Donald Metzler, Vinh Q. Tran

Make Every Example Count: On the Stability and Utility of Self-Influence for Learning from Noisy NLP Datasets

Irina Bejan^*, Artem Sokolov, Katja Filippova

Workshops

The Seventh Widening NLP Workshop (WiNLP)

Major Sponsor

Organizers: Sunipa Dev

Panelist: Preethi Lahoti

The Sixth Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC)

Invited Speaker: Bernd Bohnet

The 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS)

Organizer: Geeticka Chauhan

Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics (SpLU-RoboNLP)

Invited Speaker: Andy Zeng

Natural Language Generation, Evaluation, and Metric (GEM)

Organizer: Elizabeth Clark

The First Arabic Natural Language Processing Conference (ArabicNLP)

Organizer: Imed Zitouni

The Big Picture: Crafting a Research Narrative (BigPicture)

Organizer: Nora Kassner, Sebastian Ruder

BlackboxNLP 2023: The 6th Workshop on Analysing and Interpreting Neural Networks for NLP

Organizer: Najoung Kim

Panelist: Neel Nanda

The SIGNLL Conference on Computational Natural Language Learning (CoNLL)

Co-Chair: David Reitter

Areas and ACs: Kyle Gorman (Speech and Phonology), Fei Liu (Natural Language Generation)

The Third Workshop on Multi-lingual Representation Learning (MRL)

Organizer: Omer Goldman, Sebastian Ruder

Invited Speaker: Orhan Firat

Tutorials

Creative Natural Language Generation

Organizer: Tuhin Chakrabarty^*

* Work done while at Google

TensorFlow 2.15 update: hot-fix for Linux installation issue

Posted by the TensorFlow team

We are releasing a hot-fix for an installation issue affecting the TensorFlow installation process. The TensorFlow 2.15.0 Python package was released such that it requested tensorrt-related packages that cannot be found unless the user installs them beforehand or provides additional installation flags. This dependency affected anyone installing TensorFlow 2.15 alongside NVIDIA CUDA dependencies via pip install tensorflow[and-cuda]. Depending on the installation method, TensorFlow 2.14 would be installed instead of 2.15, or users could receive an installation error due to those missing dependencies.

To solve this issue as quickly as possible, we have released TensorFlow 2.15.0.post1 for the Linux x86_64 platform. This version removes the tensorrt Python package dependencies from the tensorflow[and-cuda] installation method. Support for TensorRT is otherwise unaffected as long as TensorRT is already installed on the system. Now, pip install tensorflow[and-cuda] works as originally intended for TensorFlow 2.15.

Using .post1 instead of a full minor release allowed us to push this release out quickly. However, please be aware of the following caveat: for users wishing to pin their Python dependency in a requirements file or other situation, under Python’s version specification rules, tensorflow[and-cuda]==2.15.0 will not install this fixed version. Please use ==2.15.0.post1 to specify this exact version on Linux platforms, or a fuzzy version specification, such as ==2.15.^*, to specify the most recent compatible version of TensorFlow 2.15 on all platforms.

Enable faster training with Amazon SageMaker data parallel library

Large language model (LLM) training has become increasingly popular over the last year with the release of several publicly available models such as Llama2, Falcon, and StarCoder. Customers are now training LLMs of unprecedented size ranging from 1 billion to over 175 billion parameters. Training these LLMs requires significant compute resources and time as hundreds to thousands of graphics processing units (GPUs) must be used to handle today’s vast training datasets and model sizes. One bottleneck in distributed training can be GPU communication handled by the NVIDIA Collective Communication Library (NCCL). In some large-distributed training jobs, more time can be spent on inter-GPU communication than actual GPU computation. To alleviate the GPU communication bottleneck and enable faster training, Amazon SageMaker is excited to announce an optimized AllGather collective operation as part of the SageMaker distributed data parallel library (SMDDP). AllGather is the most used collective operation in popular memory-efficient data parallelism solutions like DeepSpeed Zero Redundancy Optimizer (ZeRO) and Fully Sharded Data Parallelism (FSDP), and it is the main contributor to GPU communication overhead. In this post, we show a high-level overview of how SMDDP works, how you can enable SMDDP in your Amazon SageMaker training scripts, and the performance improvements you can expect.

Solution overview

Traditional data parallel training involves replicating an entire model across multiple GPUs, with each model training on different shards of data from the dataset. During the backward pass, gradients are averaged among GPU workers so that each model replica is updated with the same gradient values despite them being trained with different data shards. This technique allows much faster training on vast datasets by parallelizing the consumption of training data. However, some of today’s large models (e.g., Llama2 70B) are far too large to fit entirely within GPU memory, which makes traditional data parallelism unusable. To continue reaping the benefits of data parallelism while overcoming limited GPU memory, sharded data parallel solutions such as DeepSpeed ZeRO, PyTorch FSDP, and the Amazon SageMaker model parallelism library have grown in popularity.

In sharded data parallelism, rather than replicating the entire model on GPU workers, the model parameters, gradients, and optimizer states are broken up and distributed (i.e., sharded) across GPUs in the training job. To perform forward and backward pass computation, parameters are gathered from shards on other GPU workers to form one or more model layers. After computation is performed, these layers are then freed from memory to allow for the next set of layers to be gathered. Note that there are variants of sharded data parallelism where only the optimizer states and gradients are sharded, but not the model parameters. AllGather is still used in this type of sharded data parallelism, but only prior to forward pass computation in order to gather model parameters that have been updated by different gradient or optimizer state shards from other GPU workers. Refer to the different DeepSpeed ZeRO stages and the SHARD_GRAD_OP FSDP sharding strategy for more detail.

An AllGather collective operation is performed each time parameters are unsharded—NCCL provides the standard open-source implementation of this routine. As shown in the following, each GPU worker involved in the AllGather starts off with an input buffer and ends up with all of the input buffers from other workers concatenated together. When AllGather is used in sharded data parallelism, the input buffers contain the model parameter shards and the large output buffers contain one or more model layers materialized from the other shards.

Although NCCL is typically used for AllGather in distributed training, its underlying low-level implementation isn’t tailored to the networking infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) instances, and thus its performance can slow down end-to-end training. The SMDDP library is a collective communication library for NVIDIA GPUs that serves as a drop-in replacement for NCCL and provides better performance for distributed training jobs with PyTorch. Specifically, SMDDP provides an optimized implementation of AllGather for p4d/p4de instance types.

Since collective operations like AllGather block forward and backward pass computation, faster execution of these operations directly translates into shorter end-to-end training time with no side effects on convergence. Other collective operations that’re used less frequently in sharded data parallel training are handled by falling back to NCCL.

Walkthrough

AWS-optimized AllGather

AWS-optimized AllGather uses the following techniques to achieve better performance on AWS infrastructure compared to NCCL:

We move data between instances via Elastic Fabric Adapter (EFA) network with an all-to-all communication pattern. EFA is AWS’s low-latency and high-throughput network solution, and an all-to-all pattern for inter-node network communication is more tailored to the characteristics of EFA and AWS’ network infrastructure by requiring fewer packet hops compared to NCCL’s ring or tree communication pattern.
GDRCopy to coordinate local NVLink and EFA network traffic. GDRCopy is a library that provides low-latency communication between CPU processes and GPU CUDA kernels. With this technology, we’re able to pipeline the intra-node and inter-node data movement.
Reduced usage of GPU streaming multiprocessors to give back more compute power to model kernels. AWS P4d/P4de instances are equipped with NVIDIA A100 GPUs each of which has 108 streaming multiprocessors. While NCCL takes up to 24 streaming multiprocessors to execute collectives, SMDDP Collectives only use up to nine streaming multiprocessors. The saved streaming multiprocessors can be picked up by model compute kernels for quicker execution.

Usage

SMDDP collectives natively integrates with PyTorch through the process group abstraction in the torch.distributed module. A process group defines the interfaces for common collective operations such as AllGather, ReduceScatter, AllReduce, etc. Users can write generic distributed code and then choose the underlying backend, which provides the implementation for these operations based on the compute device used. CPU training jobs often use the gloo or mpi backend while NVIDIA GPUs use the nccl backend.

The SMDDP library comes into the picture by registering itself as a custom backend in the process group abstraction. This is done by the import statement, which is shown in the following code snippets. Then, when selecting the backend for your GPU-based distributed training job, just replace nccl with smddp. The smddp backend abides by the same semantics as the nccl backend and supports the same training scenarios.

DeepSpeed

import smdistributed.dataparallel.torch.torch_smddp
deepspeed.init_distributed(dist_backend="smddp")  # replacing "nccl"

FSDP

import smdistributed.dataparallel.torch.torch_smddp
dist.init_process_group(backend="smddp")  # replacing "nccl"

Benchmarks

We benchmarked standalone AllGather performance where the collective operation is run in isolation without any model training. Below is a sample result on 32 p4d instances comparing NCCL and SMDDP AllGather. The X-axis represents the output size of AllGather, and the Y-axis represents the network utilization rate of p4d’s 400 Gbps EFA network. The 4 sub-graphs represent the common communication group patterns where we have 1, 2, 4, and 8 ranks per p4d instance participating in the AllGather operation, respectively.

These microbenchmarks show that SMDDP outperforms NCCL with two key characteristics:

The peak performance of SMDDP (approximately 90% bandwidth utilization) is higher than that of NCCL (approximately 80% bandwidth utilization) in all configurations.
SMDDP reaches the peak performance at much smaller buffer sizes than NCCL. This particularly improves training speeds for smaller models or when the user sets a small AllGather buffer size in DeepSpeed (where AllGather size need not be equal to layer size).

Model training benchmarks

In large-scale training jobs where GPU communication is a significant bottleneck, SMDDP can markedly improve training speeds, as measured by model TFLOPS/GPU.

Configuration			Performance
Model/Training	Cluster	Sharded Data Parallelism Solution	Model TFLOPS/GPU with NCCL	Model TFLOPS/GPU with SMDDP	% speedup
13B Llama2 Seq length: 4096 Global batch size: 4M tokens	64 p4d.24xlarge nodes (512 NVIDIA A100 GPUs)	PyTorch FSDP	97.89	121.85	24.40%
65B GPT-NeoX Seq length: 2048 Global batch size: 4M tokens	64 p4d.24xlarge nodes (512 NVIDIA A100 GPUs)	DeepSpeed ZeRO Stage 3*	99.23	108.66	9.50%

*EleutherAI’s Megatron-DeepSpeed repository was used. Tensor parallelism was also enabled with a tensor-parallel degree of eight.

Note: Model TFLOPS/GPU is based on the Model FLOPS Utilization calculation defined in the paper here and benchmark figures elsewhere may cite hardware TFLOPS/GPU as the performance metric. Hardware TFLOPS/GPU can be approximated as 4/3 x model TFLOPS/GPU.

Conclusion

In this post, we showed you how to significantly speed up sharded data parallel training jobs on Amazon SageMaker with just two lines of code change. Large-scale distributed training is becoming increasingly ubiquitous with the emergence or LLMs, but with this scale comes high costs. By reducing the communication bottleneck between GPUs, SMDDP helps you train faster at scale and save on compute resources. You can find more SMDDP examples with sharded data parallel training in the Amazon SageMaker Examples GitHub repository.

About the Authors

Apoorv Gupta is a Software Development Engineer at AWS, focused on building optimal deep learning systems for AWS infrastructure and hardware. He is interested in distributed computing, deep learning systems, and ML accelerators. Outside of work, Apoorv enjoys traveling, hiking, and video games.

Karan Dhiman is a Software Development Engineer at AWS, based in Toronto, Canada. He is very passionate about the machine learning space and building solutions for accelerating distributed computed workloads.

Ruhan Prasad is a Software Development Engineer at AWS who is working on making distributed deep learning training faster, cheaper, and easier to use on SageMaker. Outside of work, Ruhan enjoys playing tennis, traveling, and cooking.

Zhaoqi Zhu is a Senior Software Development Engineer at AWS, passionate about distributed systems and low level optimizations. He enjoys watching soccer matches while drinking (non-diet) soda.

Use custom metadata created by Amazon Comprehend to intelligently process insurance claims using Amazon Kendra

Structured data, defined as data following a fixed pattern such as information stored in columns within databases, and unstructured data, which lacks a specific form or pattern like text, images, or social media posts, both continue to grow as they are produced and consumed by various organizations. For instance, according to International Data Corporation (IDC), the world’s data volume is expected to increase tenfold by 2025, with unstructured data accounting for a significant portion. Enterprises may want to add custom metadata like document types (W-2 forms or paystubs), various entity types such as names, organization, and address, in addition to the standard metadata like file type, date created, or size to extend the intelligent search while ingesting the documents. The custom metadata helps organizations and enterprises categorize information in their preferred way. For example, metadata can be used for filtering and searching. Customers can create the custom metadata using Amazon Comprehend, a natural-language processing (NLP) service managed by AWS to extract insights about the content of documents, and ingest it into Amazon Kendra along with their data into the index. Amazon Kendra is a highly accurate and easy-to-use enterprise search service powered by Machine Learning (AWS). The custom metadata can then be used to enrich the content for better filtering and facet capabilities. In Amazon Kendra, facets are scoped views of a set of search results. For example, you can provide search results for cities across the world, where documents are filtered by a specific city with which they are associated. You could also create facets to display results by a specific author.

Insurance companies are burdened with increasing numbers of claims that they must process. Additionally, the complexity of claims processing is also increasing due to the diverse types of insurance documents involved, and custom entities in each of these documents. In this post, we describe a use case for custom content enrichment for insurance providers. The insurance provider receives payout claims from the beneficiary’s attorney for different insurance types, such as home, auto, and life insurance. In this use case, the documents received by the insurance provider do not contain any metadata that allows searching the content based on certain entities and classes. The insurance provider wants to filter Kendra content based on custom entities and classes specific to their business domain. This post illustrates how you can automate and simplify metadata generation using custom models by Amazon Comprehend. The metadata generated can be customized during the ingestion process with Amazon Kendra Custom Document Enrichment (CDE) custom logic.

Let’s look at a few examples of Amazon Kendra search with or without filtering and facets capabilities.

In the following screenshot, Amazon Kendra provides a search result but there is no option to further narrow down the search results by using any filters.

The following screenshot shows Amazon Kendra search results can be filtered by using different facets like Law Firm, Policy Numbers, created by custom metadata to narrow down the search results.

The solution discussed in this post can easily be applied to other businesses/use-cases as well, such as healthcare, manufacturing, and research.

Solution overview

In this proposed solution, we will 1) classify insurance claims submissions into various classes, and 2) retrieve insurance-specific entities from these documents. When this is complete, the document can be routed to the appropriate department or downstream process.

The following diagram outlines the proposed solution architecture.

Amazon Comprehend custom classification API is used to organize your documents into categories (classes) that you define. Custom classification is a two-step process. First, you train a custom classification model (also called a classifier) to recognize the classes that are of interest to you. Then, you use your model to classify any number of document sets.

Amazon Comprehend custom entity recognition feature is used to identify specific entity types (names of insurance company, names of the insurer, policy number) beyond what is available in the generic entity types by default. Building a custom entity recognition model is a more effective approach than using string matching or regular expressions to extract entities from documents. A custom entity recognition model can learn the context where those names are likely to appear. Additionally, string matching will not detect entities that have typos or follow new naming conventions, while this is possible using a custom model.

Before diving deeper, let’s take a moment to explore Amazon Kendra. Amazon Kendra is a highly accurate and easy-to-use enterprise search service powered by machine learning. It allows users to find the information they need within the vast amount of content spread across their organization, ranging from websites and databases to intranet sites. We will first create an Amazon Kendra index to ingest the documents. While ingesting the data, it’s essential to consider the concept of Custom Data Enrichment (CDE). CDE enables you to enhance the search capability by incorporating external knowledge into the search index. For more information, refer to Enriching your documents during ingestion. In this post, the CDE logic invokes the custom APIs of Amazon Comprehend to enrich the documents with identified classes and entities. Finally, we use the Amazon Kendra search page to show how the metadata enhanced the search capability by adding faceting and filtering capabilities.

The high-level steps to implement this solution are as follows:

Train the Amazon Comprehend custom classifier using training data
Train the Amazon Comprehend custom entity recognition using training data
Create the Amazon Comprehend custom classifier and custom entity recognition endpoints
Create and deploy a Lambda function for post extraction enrichment
Create and populate the Amazon Kendra index
Use the extracted entities to filter searches in Amazon Kendra

We have also provided a sample application in the GitHub repo for reference.

Data security and IAM considerations

With security as the top priority, this solution follows the least privilege permissions principle for the services and features used. The IAM role used by Amazon Comprehend custom classification and custom entity recognition has permissions to access the dataset from the test bucket only. The Amazon Kendra service has access to a specific S3 bucket and Lambda function used to call comprehend APIs. The Lambda function has permissions to call the Amazon Comprehend APIs only. For more information, review section 1.2 and 1.3 in the notebook.

We recommend you do the following in a non-production environment prior to implementing the solution in the production environment.

Train the Comprehend custom classifier using training data

Amazon Comprehend Custom Classification supports two data format types for annotation files:

Using the CSV option
Using the Augment manifest file option with the help of using SageMaker Ground Truth to Label Data

Since our data is already labeled and stored in CSV files, we will use the CSV file format for the annotation file as an example. We have to provide the labeled training data as UTF-8 encoded text in a CSV file. Do not include a header row in the CSV file. Adding a header row in your file may cause runtime errors. An example to the training data CSV file is as follows:

CLASS, Text of document 1
CLASS, Text of document 2

To prepare classifier training data, refer to Preparing classifier training data. For each row in the CSV file, the first column contains one or more class labels. A class label can be any valid UTF-8 string. We recommend using clear class names that don’t overlap in meaning. The name can include white space, and can consist of multiple words connected by underscores or hyphens. Do not leave any space characters before or after the commas that separate the values in a row.

Next, you will train either using Multi-class mode or Multi-label mode. Specifically, in multi-class mode, classification assigns one class for each document, while in multi-label mode, individual classes represent different categories that aren’t mutually exclusive. In our case we will be using the Multi-Class mode for Plain-text models.

You can prepare separate training and testing datasets for Amazon Comprehend custom classifier training and model evaluation. Or, only provide one dataset for both training and testing. Comprehend will automatically select 10% of your provided dataset to use as testing data. In this example, we are providing separate training and testing datasets.

The following example shows a CSV file containing the class names associated with the various documents.

Document format – Type of Insurance, Content of document 1

When the custom classification model is trained, it can capture different classes of insurance on the documents (Home, Auto, or Life insurance).

Train the Amazon Comprehend custom entity recognizer (NER) using training data

The training dataset for Amazon Comprehend Custom Entity Recognition (NER) can be prepared in one of two different ways:

Annotations – Provides a data set that contains the annotated entities for mode training
Entity lists (plain text only) – Provides a list of entities and their label type (such as “Insurance company names”) and a set of unannotated documents containing those entities for model training

For more information, refer to Preparing entity recognizer training data.

When training a model using entity list, we need to provide two pieces of information: a list of entity names with their associated custom entity types and a collection of unannotated documents in which the entities appear.

Automatic training requires having two types of information: sample documents and the entity list or annotations. Once the recognizer is trained, you can use it to detect custom entities in your documents. You can quickly analyze a small body of text in real time, or you can analyze a large set of documents with an asynchronous job.

You can prepare separate training and testing datasets for Amazon Comprehend custom entity recognizer training and model evaluation. Or provide only one dataset for both training and testing. Amazon Comprehend will automatically select 10% of your provided dataset to use as testing data. In the below example, we specified the training dataset as Documents.S3Uri under InputDataConfig.

The following example shows a CSV file containing the of entities:

Once the custom entities (NER) model is trained, it will be able to extract the various entities like “PAYOUT“, “INSURANCE_COMPANY“, “LAW_FIRM“, “POLICY_HOLDER_NAME“, “POLICY_NUMBER“.

Create the Amazon Comprehend custom classifier and custom entities (NER) endpoints

Amazon Comprehend’s endpoints make your custom models available for real-time classification. After you create an endpoint, you can make changes to it as your business needs evolve. For example, you can monitor your endpoint utilization and apply auto scaling to automatically set endpoint provisioning to fit your capacity needs. You can manage all your endpoints from a single view, and when you no longer need an endpoint, you can delete it to save costs. Amazon Comprehend support both synchronous and asynchronous options, if real-time classification isn’t required for your use case, you can submit a batch job to Amazon Comprehend for asynchronous data classification.

For this use case, you create an endpoint to make your custom model available for real-time analysis.

To meet your text processing needs, you assign inference units to the endpoint, and each unit allows a throughput of 100 characters per second. You can then adjust the throughput up or down.

Create and deploy a Lambda function for post extraction enrichment

The post-extraction Lambda function allows you to implement the logic to process the text extracted by Amazon Kendra from the ingested document. The post-extraction function we configured implements the code to invoke Amazon Comprehend to detect custom entities and custom classifying the documents from the text extracted by Amazon Kendra, and uses them to update the document metadata, which is presented as facets in an Amazon Kendra search. The function code is embedded in the notebook. The PostExtractionLambda code works as follows:

Splits the page text into sections that do not exceed the max byte length limit of the comprehend detect_entities API. (See Limits ).
NOTE the script uses a naive character length splitting algorithm for simplicity – production use cases should implement overlapping or sentence boundary splits, based on UTF8 byte length.
For each section of the text, calls the comprehend real-time endpoints for custom entities and custom classifier to detect the following entity types: [“PAYOUT“, “INSURANCE_COMPANY“, “LAW_FIRM“, “POLICY_HOLDER_NAME“, “POLICY_NUMBER“, “INSURANCE_TYPE“].
Filters out detected entities that are below the confidence score threshold. We are using 0.50 threshold which means only entities with confidence 50% and more will be used. This can be tuned based on the use case and requirements.
Tracks the frequency count of each entity.
Selects only the top N (10) unique entities for each page, based on frequency of occurrence.
For document classification, the multi-class classifier assigns only one class for each document. In this Lambda function, the documents will be classified as Auto Insurance, Home Insurance, or Life Insurance.

#The function to read the input text and detect entities in it using Comprehend 
def entity_detector(doc_text):
    #List of JSON objects to store entities
    entity_data = dict()
    #List of observed text strings recognized as categories
    category_text = dict()
    #Frequency of each text string
    text_frequency = dict()
    for et in categories:
        entity_data[ et ] = []
        category_text[ et ] = []
        text_frequency[ et ] = dict()
    
    #Make detect_entities_v2 call in a loop to work with the text limit
    for i in range(0, len(doc_text), compre_text_size):
        try:
            entities = compre.detect_entities(Text=doc_text[i:i+compre_text_size], LanguageCode='en', EndpointArn=endpoint_custom_entity)
            
        except Exception as e:
            logger.info("Exiting - detect_entities_v2 terminated with exception")
            return []
        for e in entities["Entities"]:
            #For each of the recognized entities take only those that have confidence score higher than min_score, 
            #are printable, dont contain quotes and are previously unseen
            if ((e["Score"] > min_score) and (e["Text"].isprintable()) and (not '"' in e["Text"]) and (not e["Text"].upper() in category_text[e["Type"]])):
                #Append the text to entity data to be used for a Kendra custom attribute
                entity_data[e["Type"]].append(e["Text"])
                #Keep track of text in upper case so that we don't treat the same text written in different cases differently
                category_text[e["Type"]].append(e["Text"].upper())
                #Keep track of the frequency of the text so that we can take the text with highest frequency of occurrance
                text_frequency[e["Type"]][e["Text"].upper()] = 1
            elif (e["Text"].upper() in category_text[e["Type"]]):
                #Keep track of the frequency of the text so that we can take the text with highest frequency of occurrance
                text_frequency[e["Type"]][e["Text"].upper()] += 1
    #The Kendra attribute metadata JSON object to be populated
    metadata = dict()
    for et in categories:
        metadata[et] = []
        #Take at most elimit number of recognized text strings having the highest frequency of occurrance
        el = [pair[0] for pair in sorted(text_frequency[et].items(), key=lambda item: item[1], reverse=True)][0:elimit]
        for d in entity_data[et]:
            if (d.upper() in el):
                metadata[et].append(d)
    for md in metadata:
        metaUL.append({
            "name": md,
            "value": {
                "stringListValue": metadata[md]
            }
        })
    return metaUL

Note that as of this writing, CDE only supports synchronous calls or if it has to be asynchronous, then an explicit wait loop is needed. For post extraction Lambda the max execution time is 1 min. The Lambda custom logic can be changed based on the requirements that fit your use case.

Create and populate the Amazon Kendra index

In this step, we will ingest the data to the Amazon Kendra index and make it searchable for the users. During the ingestion, we will use the Lambda function created in the previous step as a post extraction step and the Lambda function will call the custom classification and custom entity recognition (NER) endpoints to create the custom metadata fields.

The high-level steps to implement this solution are as follows:

Create Amazon Kendra Index.
Create Amazon Kendra Data source – There are different data sources which can be used to ingest dataset. In this post we are using an S3 bucket.
Create Facets Law_Firm, Payout, Insurance_Company, Policy_Number, Policy_Holder_Name, Insurance_Type with string type as ‘STRING_LIST_VALUE’.
Create Kendra CDE and point it to the post-extraction Lambda function previously created.
Perform the sync process to ingest the dataset.

Once completed, you can populate the index with the insurance data, using the Kendra CDE with post extraction lambda, you can filter searches based on the custom entity types and custom classification as custom metadata fields.

Use the extracted entities to filter searches in Kendra

Now the index is populated and ready to use. In the Amazon Kendra console, choose Search Indexed Content under Data Management and do the following.

Query the following: List of insurance failed due to late filing?

The results show an answer from the policy type – HOME INSURANCE and brings text_18 and text_14 as the top results.

Choose “Filter search results” on the left. Now you will see all the Entity types and classification values extracted using Comprehend, and for each entity value and classification you will see the number of matching documents.

Under INSURANCE_TYPE choose “Auto-Insurance”, and then you will get an answer from text_25 file.

Note that your results may vary slightly from the results shown in the screenshot.

Try searching with your own queries, and observe how the entities and document classification identified by Amazon Comprehend quickly allows you to:

See how your search results are distributed across the categories.
Narrow your search by filtering on any of the entity/classification values.

Clean up

After you have experimented with the search and tried the notebook provided in the Github repository, delete the infrastructure you provisioned in your AWS account to avoid any unwanted charges. You can run the cleanup cells in the notebook. Alternatively, you can delete the resources manually through the AWS console:

Amazon Kendra Index
Comprehend custom classifier and custom entity recognition (NER) endpoints
Comprehend custom classifier and custom entity recognition (NER) custom models
Lambda function
S3 bucket
IAM roles and policies

Conclusion

In this post, we showed how Amazon Comprehend custom entities and custom classifier enables Amazon Kendra search powered by CDE feature to help end-users perform better searches on the structured/unstructured data. The custom entities of Amazon Comprehend and custom classifier makes it very useful for different use cases and various domain specific data. For more information about how to use Amazon Comprehend, refer to Amazon Comprehend developer resources and for Amazon Kendra, refer to Amazon Kendra developer resources.

Give this solution a try for your use case. We invite you to leave your feedback in the comments sections.

About the Authors

Amit Chaudhary is a Senior Solutions Architect at Amazon Web Services. His focus area is AI/ML, and he helps customers with generative AI, large language models, and prompt engineering. Outside of work, Amit enjoys spending time with his family.

Yanyan Zhang is a Senior Data Scientist in the Energy Delivery team with AWS Professional Services. She is passionate about helping customers solve real problems with AI/ML knowledge. Recently, her focus has been on exploring the potential of Generative AI and LLM. Outside of work, she loves traveling, working out and exploring new things.

Nikhil Jha is a Senior Technical Account Manager at Amazon Web Services. His focus areas include AI/ML, and analytics. In his spare time, he enjoys playing badminton with his daughter and exploring the outdoors.

Foundational data protection for enterprise LLM acceleration with Protopia AI

This post is written in collaboration with Balaji Chandrasekaran, Jennifer Cwagenberg and Andrew Sansom and Eiman Ebrahimi from Protopia AI.

New and powerful large language models (LLMs) are changing businesses rapidly, improving efficiency and effectiveness for a variety of enterprise use cases. Speed is of the essence, and adoption of LLM technologies can make or break a business’s competitive advantage. AWS is especially well suited to provide enterprises the tools necessary for deploying LLMs at scale to enable critical decision-making.

In their implementation of generative AI technology, enterprises have real concerns about data exposure and ownership of confidential information that may be sent to LLMs. These concerns of privacy and data protection can slow down or limit the usage of LLMs in organizations. Enterprises need a responsible and safer way to send sensitive information to the models without needing to take on the often prohibitively high overheads of on-premises DevOps.

The post describes how you can overcome the challenges of retaining data ownership and preserving data privacy while using LLMs by deploying Protopia AI’s Stained Glass Transform to protect your data. Protopia AI has partnered with AWS to deliver the critical component of data protection and ownership for secure and efficient enterprise adoption of generative AI. This post outlines the solution and demonstrates how it can be used in AWS for popular enterprise use cases like Retrieval Augmented Generation (RAG) and with state-of-the-art LLMs like Llama 2.

Stained Glass Transform overview

Organizations seek to retain full ownership and control of their sensitive enterprise data. This is a pillar of responsible AI and an emerging data protection and privacy requirement above and beyond basic security and legal guarantees of LLM providers.

Although enterprise business units want to utilize LLMs for various tasks, they are also concerned about trade secrets, intellectual property, and other proprietary information leaking through data sent to these models. At the same time, enterprise security, compliance, data management, and information offices are apprehensive of exposing or leaking plain text customer information or other regulated data outside of the enterprise. AWS and Protopia AI are partnering to deliver the critical component that solves this common enterprise customer need.

Protopia AI’s Stained Glass Transform (SGT) solves these challenges by converting unprotected enterprise data to a randomized re-representation, referred to as RmoRed data, as shown in the following figure. This representation is a stochastic embedding of the original data, preserving the information the target LLM needs to function without exposing sensitive prompts or queries, context, or fine-tuning data. This re-representation is a one-way transformation that can’t be reversed, ensuring holistic privacy of enterprise data and protection against leaking plain text sensitive information to LLMs. SGT’s applicability is not limited to language models. Randomized re-representations can also be generated for visual and structured data. The name Stained Glass Transform is rooted in the visual appearance of randomized re-representations of visual data that can resemble viewing the data through stained glass, as demonstrated in this US Navy use case.

SGT works with state-of-the-art LLMs such as Llama 2. The following figure shows an example of applying SGT to a Llama 2 model for instruction following while adding a layer of protection to the instruction and context. The left side of the figure shows an example of a financial document as context, with the instruction asking the model to summarize the document. On the bottom left, the response generated by Llama 2 when operating on the raw prompt is shown. When using SGT, the embeddings associated with this prompt are transformed on the client side into stochastic embeddings, as described in more detail later in this post. The bottom right shows Llama 2 can still generate a correct response if the RmoRed data (post-transformation embeddings) are sent instead of the unprotected embeddings. The top right shows that if the RmoRed data leaked, a reconstruction of the original prompt would result in unintelligible text.

To create an SGT for a given model such as Llama 2, Protopia AI provides a lightweight library called the Stained Glass SDK, which is an extension of PyTorch. As shown in the following figure, after an SGT is created, it can be integrated into deployment pipelines in multiple ways. The transform that is created from the SDK can be deployed locally, in a hybrid setup, or completely on the cloud. This is possible because SGT is designed to be a lightweight process requiring very little compute resources and as such has minimal impact on the inference critical path. Another key evaluation is retention of model accuracy using re-represented data. We observe that across different data types and model variations, accuracy is retained within desirable tolerance limits when using re-represented data.

These options for deployment and maintaining the accuracy allows for confident adoption of SGT by all the stakeholders within an enterprise organization. To further protect the output of the LLM, Protopia AI can encode query outputs to a representation whose decoder is only available to the enterprise data owner.

Solution overview

The previous section described how you can use Stained Glass Transform in a variety of architectures. The following figure details the steps involved in creating, deploying, and using SGT for LLMs:

SGT creation – The team that trains the baseline LLM foundation model (providers of proprietary LLMs, cloud service provider, or enterprise ML teams creating their own LLMs) runs Protopia AI’s Stained Glass SDK software without altering their existing practices for training and deploying the LLM. After the foundation model training is complete, the SDK runs as an optimization pass over the language model to compute the SGT. This optimization pass is delivered through an extension to PyTorch. The SDK wraps the foundation model and mathematically discovers a unique Stained Glass Transform for that LLM. Further details of the underlying math can be found in the accompanying whitepaper. Note that because the team training the LLM itself is also running the Stained Glass SDK, there is no exposure or sending of model weights that is necessary for this step to be completed.
SGT release and deployment – The SGT that is output from the earlier optimization step is deployed as part of the data pipeline that feeds the trained LLM. As described in the previous section, the SGT sits on the enterprise client side.
SGT use – The SGT runs on the prompts created by the enterprise and generates protected prompts, which are sent to the deployed LLM. This enables the enterprise to retain ownership of their sensitive queries and context. Using Protopia AI Stained Glass, the unprotected sensitive data does not leave the enterprise’s site or trust zone.

You can use the Stained Glass SDK to create an SGT in multiple ways. For example, you can use the Stained Glass SDK in self-managed machine learning (ML) environments with Amazon Elastic Kubernetes Service (Amazon EKS) for training and inferencing or within Amazon Elastic Compute Cloud (Amazon EC2) directly. Another option is it can run within Amazon SageMaker to create an SGT for a given trained model. Transforming the input for deployment during inference from the client is independent of the chosen deployment implementation.

The following figure illustrates a possible implementation in a self-managed ML environment where training a Stained Glass Transform is performed on Amazon EKS.

In this workflow, a container is created using the Stained Glass SDK and deployed to Amazon Elastic Container Registry (Amazon ECR). This container is then deployed on Amazon EKS to train an SGT that is saved to Amazon Simple Storage Service (Amazon S3). If you’re using Amazon EC2, you can train a transformation directly on your instance as part of your ML setup. The Stained Glass SDK can run on a variety of instance types, including Amazon P5, P4, or G5 instance families, based on your base LLM requirements. After the LLM is deployed to be used for inference, the client application uses the created SGT, which is a lightweight operation, to transform prompts and context before sending them to the LLM. By doing so, only transformed data is exposed to the LLM, and ownership of the original input is retained on the client side.

The following figure demonstrates how you can train a transform and run inferencing on SageMaker.

The creation of the SGT follows a similar path as the Amazon EKS setup by ingesting the training data from Amazon S3, training an SGT on a container, and saving it to Amazon S3. You can use the Stained Glass SDK in your existing SageMaker setup with Amazon SageMaker Studio, SageMaker notebooks, and a SageMaker training job. The LLM is hosted as a SageMaker endpoint that is accessible by the client application. The inferencing for the client application is also identical to the Amazon EKS setup, except for what is serving the model.

Randomized re-representations to protect LLM prompts and fine-tuning data

This section covers a variety of use cases demonstrating how randomized re-representation protects LLM prompts. The examples illustrate major implications for enterprise generative AI efforts: opening new doors to AI use cases, accelerating speed to market while properly protecting enterprise data, and retaining ownership of the sensitive data required for use in LLM prompts.

RAG use case

A popular enterprise use case for LLMs is Retrieval Augmented Generation (RAG). The following figure shows an illustrative example where the prompts and sources are protected using Stained Glass. The left side of the figure shows the unprotected prompts and source information. In an enterprise implementation of RAG, the sources could include sensitive information such as enterprise trade secrets, intellectual property, or financial information. The right side shows the best possible reconstruction in human readable text from the RmoRed prompts created by the SGT.

We can observe that even in the best possible reconstruction, the information is completely obfuscated. However, the response from the model with and without the transformation is the same, with pointers to the original source documents, thereby preserving the accuracy of both the question and source documents while performing this popular enterprise use case.

Broad applicability across LLMs and languages

One of the highlights of the Stained Glass SDK is that it’s highly resilient to model advancements and adaptable to state-of-the-art models such as Llama 2. The following figure shows an SGT that was created on a Llama 2 LLM that was previously fine-tuned for working with Japanese text. This example further illustrates that SGTs can be created and applied for any language and that even inputs for fine-tuned models can be transformed. The general applicability of SGT is driven by the robust foundation of the Stained Glass SDK being model- and data-agnostic.

Protecting fine-tuning data as well as prompts

Stained Glass Transform is not limited solely to protecting data at inference time; it can also protect data used to fine-tune a foundation model. The process for creating the transformation for fine-tuning datasets is the same as that explained in the solution architecture section earlier in this post. The transformation is created for the foundation model to be fine-tuned without accessing the fine-tuning data. After the SGT has been created and trained for the foundation model, the fine-tuning dataset is transformed to randomized re-representations that will then be used to fine-tune the foundation model. This process is explained in more detail in the accompanying whitepaper.

In the following example, an enterprise customer needed to fine-tune an existing model for network log anomaly detection. They used Stained Glass to transform the sensitive fine-tuning dataset to randomized embeddings, which were used to fine-tune their foundation model. They found that the detection model that was fine-tuned on the transformed representations performed with almost identical accuracy compared to the hypothetical scenario of fine-tuning the foundation model on the unprotected fine-tuning dataset. The following table shows two examples of plain text data records from the fine-tuning dataset and a reconstruction to text of those same data records from the fine-tuning dataset.

Under the hood of Stained Glass Transform for LLMs

When applied to computer vision, SGT operates on input pixel features, and for LLMs, it operates at the embedding level. To highlight how Stained Glass Transform works, imagine the prompt embeddings as a matrix, as illustrated on the left of the following figure. In each entry, there is a deterministic value. This value can be mapped to the original data, exposing the unprotected prompt. Stained Glass Transform converts this matrix of deterministic values to a matrix whose elements are a cloud of possibilities.

The transformed prompt is rendered by sampling noise from probability distributions defined by the SGT and adding the sampled noise to the deterministic embeddings, which randomizes the original prompt values irreversibly. The model still understands the randomized re-represented prompt at the mathematical level and can carry out its task accurately.

Conclusion

This post discussed how Protopia AI’s Stained Glass Transform decouples raw data ownership and protection from the ML operations process, enabling enterprises to retain ownership and maintain privacy of sensitive information in LLM prompts and fine-tuning data. By using this state-of-the-art data protection for LLM usage, enterprises can accelerate adoption of foundation models and LLMs by worrying less about exposure of sensitive information. By safely unlocking the value in real enterprise data, organizations can enable the promised efficiencies and business outcomes of LLMs more efficiently and quickly. To learn more about this technology, you can find further reading in the accompanying whitepaper and connect with Protopia AI to get access and try it on your enterprise data.

About Protopia AI

Protopia AI is a leader in data protection and privacy-preserving AI/ML technologies based in Austin, Texas, and specializes in enabling AI algorithms and software platforms to operate without the need to access plain text information. Over the past 2 years, Protopia AI has successfully demonstrated its flagship Stained Glass Transform product across a variety of ML use cases and data types with the US Navy, leading financial services, and global technology providers.

Protopia AI works with enterprises, generative AI and LLM providers, and Cloud Service Providers (CSPs) to enable maintaining ownership and confidentiality of enterprise data while using AI/ML solutions. Protopia AI has partnered with AWS to deliver a critical component of data protection and ownership for enterprise adoption of generative AI, and was one of 21 startups selected for the inaugural AWS Generative AI Accelerator in 2023.

About the authors

Balaji Chandrasekaran is the VP for Go-to-Market & Customer Enablement at Protopia AI, works closely with clients to leverage AI in their business while prioritizing data protection and privacy. Prior to Protopia AI, Balaji was the Product Lead for AI Solutions at Infor, developing value-centric products while acting as a trusted partner for enterprise customers across diverse industries. Outside work, he enjoys music, hiking, and traveling with family.

Jennifer Cwagenberg leads the engineering team at Protopia AI and works to ensure that the Stained Glass technology meets the needs of their customers to protect their data. Jennifer has prior experience with security working at Toyota in their Product Cybersecurity Group, managing Cloud workloads at N-able, and responsible for data at Match.com.

Andrew Sansom is an AI Solutions Engineer at Protopia AI where he helps enterprises use AI while preserving private and sensitive information in their data. Prior to Protopia AI, he worked as a Technical Consultant focused on enabling AI solutions for clients across many industries including Finance, Manufacturing, Healthcare, and Education. He also taught Computer Science and Math to High School, University, and Professional students.

Eiman Ebrahimi, PhD, is a co-founder and the Chief Executive Officer of Protopia AI. Dr. Ebrahimi is passionate about enabling AI to enrich the human experience across different societal and industry verticals. Protopia AI is a vision for enhancing the lens through which AI observes the necessary and quality data it needs while creating novel capabilities for safeguarding sensitive information. Prior to Protopia AI, he was a Senior Research Scientist at NVIDIA for 9 years. His work at NVIDIA research aimed to solve problems of accessing massive datasets in ML/AI. He also co-authored peer-reviewed publications on how to utilize the power of thousands of GPUs to make training large language models feasible.

Rohit Talluri is a Generative AI GTM Specialist at Amazon Web Services (AWS). He is partnering with top generative AI model builders, strategic customers, key AI/ML partners, and AWS Service Teams to enable the next generation of artificial intelligence, machine learning, and accelerated computing on AWS. He was previously an Enterprise Solutions Architect, and the Global Solutions Lead for AWS Mergers & Acquisitions Advisory.

Exploring LLMs’ potential to help facilitators enhance online healthcare communities

This research paper was presented at the Fourth African Human Computer Interaction Conference (opens in new tab) (AfriCHI 2023), the pan-African conference on interactive digital technology design.

Online health communities can be a lifeline for people seeking healthcare support, enabling them to share experiences, ask questions, and receive help. These are particularly vital in low-and-middle-income countries (LMICs), where access to quality healthcare can be limited and online health communities function as a doorway for receiving expert advice and accessing trustworthy content. One platform that is widely used for this purpose is WhatsApp due to its popularity and ability to host facilitated communities for specific groups, like patients affiliated with a particular clinic.

For all their benefits, online health communities also face challenges due to the myriad responsibilities and equal lack of support for facilitators, who must answer questions, respond to ongoing discussions, and review reports. Facilitation requires staying abreast of ongoing chat threads, verifying facts, and generally just being available. Given that most healthcare professionals already have a full day of in-person healthcare work, facilitation occurs during lunch breaks, evenings, and even mornings before the workday begins.

Our paper, “Can Large Language Models Support Medical Facilitation Work? A Speculative Analysis (opens in new tab),” presented at AfriCHI 2023 (opens in new tab), discusses research conducted in collaboration with the University of Washington, where we examined facilitated WhatsApp groups created for young people living with HIV in informal settlements in Kenya. Facilitation involved moderating chats, providing emotional support, conducting administrative tasks, sharing information, and resolving conflicts. Because many discussions occurred at night, facilitators struggled to keep up with the chats, often missing important questions or responding to them a few days after they were posted. Facilitators also found it difficult to defuse tensions, which occurred from time to time.

LLMs’ potential in supporting online health facilitators

To help resolve these challenges, we explored ways large language models (LLMs) could potentially support facilitators, for example, by flagging important messages and helping with content authoring. LLMs’ language translation capabilities and capacity to answer questions and summarize information made them great candidates for online heath communities, understanding that facilitators should always verify the content that LLMs create. To explore their potential, we tested their application on chat log data. We concluded that an LLM-enabled copilot could help facilitators in several ways, such as:

Coproducing compelling content: LLMs could help facilitators create educational and informative content for group members. They can summarize frequently asked questions, patient stories, and best practices for managing chronic conditions.
Summarizing messages: LLMs could summarize long discussions in the chat, making it easier for facilitators to get up to date and identify important issues. Summarization can also help participants who need to be offline and might otherwise miss important information.
Providing recommendations: LLMs could help facilitators conduct research when answering questions. However, facilitators must exercise due diligence and verify any suggestions the LLM makes.
Performing sentiment analysis: LLMs could flag potential trouble spots in messages, such as declines in mental health, tension among participants, harmful advice, and misinformation.
Assigning badges: LLMs could assign badges to group members in recognition for participating in discussions, completing tasks, or achieving milestones. This could help to motivate and engage members.

Importance of human facilitation

While LLMs offer numerous potential benefits for healthcare facilitation, it’s important to consider their challenges and limitations. We strongly believe that LLMs should be used to augment, not replace, human facilitation. One crucial reason is that this technology cannot provide the emotional support essential in these groups. Another challenge involves the potential for bias and harm. LLMs are trained on massive datasets of text and code, which might contain harmful biases and stereotypes. Additionally, LLMs can produce errors when dealing with content from outside the training data, such as cultural backgrounds that are underrepresented in this data.

Our research shows that the benefits these groups provide lie beyond merely providing information. Their success, gauged by participation levels, perceived value by members, and adherence to medical protocols, is attributed not only to the facilitators’ expertise but also to their empathy, humor, and care. These are human qualities that LLMs cannot replace.

a medical professional in scrubs holding a stethoscope posing for the camera

Looking forward

When used to augment and support existing medical professionals, LLMs show promise in healthcare solutions, such as those for patients with chronic diseases in LMICs. We recommend that future research and practice in this area prioritize the following:

Developing and testing LLM-enabled copilot systems that are tailored to specific patient populations and online health communities.
Ensuring that design supports medical professionals, taking special care to preserve their agency.
Designing copilot systems so that users can easily evaluate output as well as identify and correct erroneous content.
Developing guidelines and regulations to ensure quality and safety when using LLMs for healthcare purposes.

Overall, the use of LLMs to support the work of online health community facilitation is an exciting new area of research. By making the facilitators’ tasks easier, they can pave the way for groups supporting more patients, improve adherence to medical protocols, and enhance well-being. While our research focused on a specific type of WhatsApp group, the potential of LLMs reaches far beyond. These models have the potential to support facilitators of online health communities across a diverse range of platforms.

The post Exploring LLMs’ potential to help facilitators enhance online healthcare communities appeared first on Microsoft Research.

Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

In this episode, Gretchen Huizinga speaks with Cecily Morrison (opens in new tab), MBE, a Senior Principal Research Manager at Microsoft Research, and Karolina Pakėnaitė (opens in new tab), who also goes by Caroline, a PhD student and member of the citizen design team working with Morrison on the research project Find My Things. An AI phone application designed to help people who are blind or have low vision locate their personal items, Find My Things is an example of a broader research approach known as Teachable AI. Morrison and Pakėnaitė explore the Teachable AI goal of empowering people to make an AI experience work for them. They also discuss how “designing for one” when it comes to inclusive design leads to innovative solutions and what they learned about optimizing these types of systems for real-world use (spoiler: it’s not necessarily more or higher-quality data).

Learn more:

Teachable AI Experiences (Tai X)
Project page
Understanding Personalized Accessibility through Teachable AI: Designing and Evaluating Find My Things for People who are Blind or Low Vision
Publication, October 2023
(opens in new tab)Microsoft Inclusive Design (opens in new tab)
Inclusive design resource center
DeafBlind Everest Project (opens in new tab)
Karolina (Caroline) Pakėnaitė personal website

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

CECILY MORRISON: One of the things about Teachable AI is that it’s not about the AI system. It’s about the relationship between the user and the AI system. And the key to that relationship is the mental model of the user. They need to make good judgments about how to give good teaching examples if we want that whole cycle between user and AI system to go well.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC FADES]

Today I’m talking to Dr. Cecily Morrison, MBE, a Senior Principal Research Manager at Microsoft Research, and Karolina Pakėnaitė, a PhD student and a participant on the citizen design team for the Teachable AI research project Find My Things. Cecily and Karolina are part of a growing movement to bring accessible technologies to people with different abilities by closely collaborating with those communities during research and development. Cecily, Karolina, welcome to Collaborators!

CECILY MORRISON: Thank you, Gretchen.

KAROLINA PAKĖNAITĖ: Yeah, thank you.

HUIZINGA: Before we hear more about Find My Things, let’s get to know the both of you. And, Cecily, I’ll start with you. Give us a brief overview of your background, including your training and expertise, and what you’re up to in general right now. We’ll get specific shortly, but I just want to have sort of the umbrella of your raison d’être, or your reason for research being, as it were.

MORRISON: Sure, I’m a researcher in human-computer interaction with a very specific focus on AI and inclusion. Now this for me brings together an undergraduate degree in anthropology—understanding people—a PhD in computer science—understanding computers and technology—as well as a life role as a parent of a disabled child. And I’m currently leading a team that’s really trying to push the boundaries of what’s possible in human-AI interaction and motivated by creating technologies that lead us to a more inclusive world.

HUIZINGA: As a quick follow-up, Cecily, for our non-UK listeners, tell us what MBE stands for and why you were awarded this honor.

MORRISON: Yes, MBE. I also had to look it up when I first received the, uh, the award. [LAUGHTER] It stands for Member of the British Empire, and it’s part of the UK honor system. My MBE was awarded in 2020 for services to inclusive design. Now much of my career at Microsoft Research has been dedicated to innovating inclusive technology and then ensuring that it gets into the hands for those whom we made it for.

HUIZINGA: Right. Was there a big ceremony?

MORRISON: Things were a little bit different during the, the COVID times, but I did have the honor of going to Buckingham Palace to receive the award. And it was a wonderful time bringing my mother and my manager, uh, the important women around me, who’ve made it possible for me to do this work.

HUIZINGA: That’s wonderful. Well, Karolina, let’s talk to you for a minute here. You’re one of the most unique guests we’ve ever had on this podcast. Tell us a bit about yourself. Obviously, we’d like to know where you’re studying and what you’re studying, but this would be a great opportunity to share a little bit about your life story, including the rare condition that brought you to this collaboration.

PAKĖNAITĖ: Thank you so much again for having me. What an amazing opportunity to be here on the podcast. So I’m a PhD student at the University of Bath looking into making visual photographs accessible through text. Maybe you can tell from my speech that I am deaf-blind. So I got diagnosed with Usher syndrome type 2A at the age of 19, which means that I was born hard of hearing but then started to lose sight just around my early 20s. It has been a journey accepting this condition, but it’s also brought me some opportunities like becoming part of this collaboration for Microsoft Research project.

HUIZINGA: Karolina, a quick follow-up for you. Because of the nature of your condition, you’ve encountered some unique challenges, um, one of which made the news a couple of years ago. Can you talk a little bit about how perceptions about people with varying degrees of disability can cause skepticism, both from others and in fact, as you’ve pointed out, yourself? What can we learn about this here?

PAKĖNAITĖ: Yeah, so I have experienced many misunderstandings, and I know I’m not alone. So I have tunnel vision, a progressive condition at the stage where my specialists have registered me as blind instead of partially sighted. My central sight is still excellent, so that means I can still make eye contact, read books, do photography. Some people even tell me that I don’t look blind, but what does that even mean? [LAUGHTER] So since my early 20s, I became very, very clumsy. I stepped over children, walked into elderly, stepped on cat tails, experienced too many near-miss car accidents. So my brain no longer processes the world in the same way as before. But, yeah, for the longest time in my sight-loss journey, I felt like I had imposter syndrome, being completely skeptical about my own diagnosis despite the clumsy experiences, extensive eye tests, and genetic confirmation. I think the major reason is because of a lack of representation of the blind community in the media. Blindness is not black and white. Statistically, most of us have some remaining vision. Disability is not about having a certain look. This also applies to people with some form of visual impairment. I love it, how I can … how there’s so many more new Instagrammers and YouTubers who are just like me, but I still think there is a long way to go before having disability representation becoming a norm for greater understanding and inclusivity.

HUIZINGA: You know, I have to say, this is a great reminder that there is a kind of a spectrum of ability, and that we should be gracious to people as opposed to critical of them. So, um, thank you so much for that understanding that you bring to this, Karolina. Before we get into specifics of this collaboration—and that’s what we’re here for on this podcast—I think the idea of Teachable AI warrants some explication. So, Cecily, what is Teachable AI, and why is it an important line of research, including its applications in things like Find My Things?

MORRISON: Gretchen, that’s a great question. Teachable AI enables users to provide examples or higher-level constraints to an AI model in order to personalize that AI system to meet their own needs. Now most people are familiar with personalization. Our favorite shopping site or entertainment service offers us personalized suggestions. But we don’t always have a way to shape those suggestions. So you can imagine it’s pretty annoying, for example, if you keep being offered nappies by your favorite shopping service because you’ve been buying them for a friend, but actually, you don’t have or even plan to have a baby. So now Teachable AI gives, us—the user—agency in personalizing that AI system to make a choice about what are the things you want to be reflected in yourself, your identity, when you work or interact with that AI system? Now this is really important for AI systems that enable inclusion. So if we consider disability to be a mismatch between a person’s capabilities and their environment, then AI has a really significant role to play in reducing that mismatch. However, as we were working on this, we soon discovered that the number of potential mismatches between a person and their environment is incredibly large. I mean, it’s like the number of stars, right.

HUIZINGA: Right, right.

MORRISON: Because disability is a really heterogeneous group. But then we say, oh, well, let’s just consider people who are blind. Well, as Karolina has just shown us, um, even people who are blind are very, very diverse. So there are people with different amounts of vision or different types of vision. People who have different … experience the world with vision or without. People can lose their vision later in life. They can be born blind. People have different personalities. Some people are happy to go with whatever. Some people not so much.

HUIZINGA: Right.

MORRISON: People are from different cultures. Maybe they, they are used to being in an interdependent context. Other people might have intersecting disabilities like deaf-blindness and have, again, its own set of needs. So as we got into building AI for accessibility and AI for inclusion more generally, we realized that we needed to figure out how can we make AI systems work for individuals, not quote-unquote “people with disabilities”? So we focused on Teachable AI so that each user could shape the AI system to work for their own needs as an individual in a way that they choose, not somebody else. So Find My Things is a simple but working example of a Teachable AI system. And in this example, people can personalize a object finder or object detector for the personal items that matter to them. And they can do this by taking four videos of that personal item that’s important to them and then training, on their phone, a little model that will then recognize those items and guide them to those items. So you might say, well, recognizing objects with phone, we can do that now for a couple of years. And that’s very true. But much of what’s been recognizable wasn’t necessarily very helpful for people who are blind and low vision. Now it’s great if you can recognize doors, chairs, but carnivores and sombrero hats? [LAUGHTER] You know, perhaps this is less handy on a day-to-day basis. But your own keys, your friend’s front door, your guide cane, maybe even the TV remote that somebody’s always putting somewhere else. I mean these are the things that people want to keep track of. And each person has their own set of things that they want. So the Find My Things research prototype allows people to choose what they want to train or to teach to their phone and then be able to teach it and to find those things.

HUIZINGA: OK, so just to clarify, I have my phone. I’ve trained it to find certain objects that I want to find. What’s the mechanism that I use to say, what, you know … do you just say, “Find my keys,” and your phone leads you there through beeps or, you know, Marco Polo? Closer? Warmer?

MORRISON: Sure, how, how does it work?

HUIZINGA: Yeah!

MORRISON: Well, that’s a great question. So you then have a list of things that you can find. So for most people, there’s five or 10 things that are pretty important to them. And then you would find that … then you would scan your phone around the room. And you need to be within sort of 4 to 6 meters of something that you want to find. So if, if it’s in your back studio in the garden, it’s not going to find it. It’s not telepathic in that regard. It’s a computer vision system using vision. If it’s underneath your sofa, you probably won’t find it either. But we found that with all things human-AI interaction, we, we rely on the interaction between the person and the AI to make things work. So most people know where things might be. So if you’re looking for a TV remote, it’s probably not in the bathtub, right? It’s probably going to be somewhere in the living room, but, you know, your, your daughter or your brother or your housemate might have dropped it on the floor; they might have accidentally taken it into the kitchen. But you probably have some good ideas of where that thing might be. So this is then going to help you find it a little bit faster so you don’t need to get on your hands and knees and feel around to where it is.

HUIZINGA: Gotcha. The only downside of this is “find my phone,” which would help me find my things! [LAUGHTER] Anyway, that’s all …

MORRISON: Well, well, I think Apple has solved that one.

HUIZINGA: They do! They have, they have an app. Find My phone. I don’t know how that works. Well, listen, let’s talk about the collaboration a bit and, and talk about the meetup, as I say, on how you started working together. I like to call this bit “how I met your mother” because I’m always interested to hear each side of the collaboration story. So, Karolina, why don’t you take the lead here and then Cecily can fill in the blanks from her side on how you got together.

PAKĖNAITĖ: Um, yeah, so I found this opportunity to join this collaboration for Microsoft Research project as a citizen designer through an email newsletter from a charity, VICTA. From the newsletter, it looked like it was organized in a way where you were way more than just a participant for another research project. It looked like an amazing opportunity to actually get some experiences and skills. So gaining just as much as giving. So, yeah, I thought that I shouldn’t miss out.

HUIZINGA: So you responded to the email, “Yeah, I’m in.”

PAKĖNAITĖ: Yeah.

HUIZINGA: Cecily, what, what was going on from your side? How did you put this out there with this charity and bring this thing together?

MORRISON: So VICTA is a fantastic charity in the UK that works with, uh, blind and low vision young people up to the age of 30. And they’re constantly trying to bring educational and meaningful experiences to the people that they serve. And we thought this would be a great moment of collaboration where we could bring an educational experience about learning how to do design and they could help us reach out to the people who might want to learn about design and might want to be part of this collaboration.

HUIZINGA: So Karolina was one of many? How many other citizen designers on this project did you end up with?

MORRISON: Oh, that’s a great question. We had a lot of interest, I do have to say, and from there, we selected eight citizen designers from around the UK who were willing to make the journey to Cambridge and work with us over a period of almost six months. People came up to us about monthly, although we did some virtual ones, as well.

HUIZINGA: Well, Cecily, let’s talk about this idea of citizen designers. I, I like that term very much. Inclusive design isn’t new in computer-human interaction circles—or human-computer interaction circles—and you already operate on the principle of “nothing about us without us,” so tell us how the concept of citizen designer is different and why you think citizen designers take user input to another level.

MORRISON: Sure, I think citizen designer is a really interesting concept and one that we, we need more of. But let me first start with inclusive design and how that brings us to think about citizen designers. So inclusive design has been a really productive innovation tool because it brings us unusual constraints to the design problem. Within the Microsoft Inclusive Design toolkit, we refer to this as “designing for one.” And once you’ve got this very novel design that emerges, we then optimize it to work for everyone, or we extend it to many. So this approach really jogs the mind to radical solutions. So let me give you just one example. In years past, we developed a physical coding language to support blind and sighted children to learn to code together. So we thought, ah, OK, sighted children have blocks on a screen, so we’re going to make blocks on a table. Well, our young design team lined up the blocks on the table, put their hands in their lap, and I looked at them and I thought, we failed! [LAUGHTER] So we started again, and we said, OK, show us. And we worked with them to show us what excites the hands. You know, here are kids who live through their hands. You know, what are the shapes? What are the interactions? What are the kinds of things they want to do with their hands? And through this, we developed a completely different base idea and design, and we found that it didn’t just excite the hands of children who are blind or low vision, but it excited the hands of all children. They had brought us their expertise in thinking about the world in a different way. And so now we have this product Code Jumper, which kids just can’t put down.

HUIZINGA: Right.

MORRISON: So that’s great. So we, we know that inclusive design is going to generate great ideas. We also know that diverse teams generate the best ideas because diverse life experience can prompt us to think out of the box. But how do we get diverse teams when it can be hard for people with disabilities to enter the field of design and technology? So design assumes often good visual skills; it assumes the ability to draw. And that can knock out a lot of people who might be great at designing technology experiences without those skills. So with our citizen design team, we wanted to open up the opportunity to young people who are blind and low vision to really set the stage for them to think about what would a career in technology design be like? Could I be part of this? Can I be that generation who’s going to design the next cohort of accessible, inclusive technologies? So we did this through teaching key design skills like the design process itself, prototyping, as well as having, uh, this team act as full members of our own R&D team, so in an apprenticeship style. So our citizen designers weren’t just giving feedback as, as participants might, but they were creating prototypes, running A/B tests, and it was our hope and I think we succeeded in making it a give-give situation. We were giving them a set of skills, and they were giving us their design knowledge that was really valuable to our innovation process.

HUIZINGA: That is so awesome. I’m, you know, just thinking of, of the sense of belonging that you might get instead of being, as Karolina kind of referred to, it’s not just another user-research study where you’ll go and be part of a project that someone else is doing. You’re actually integrally connected to the project. And on that note, Karolina, talk a little bit about what it’s like to be a citizen designer. What were some of your aha moments on the project, maybe the items that you wanted to be able to find and what surprises you encountered in the process of developing a technique to teach a personal item?

PAKĖNAITĖ: Yeah, so it was, uh, incredibly fascinating to play the role of a citizen designer and testing a Teachable AI for use and providing further comments. It took me a bit of time to really understand how this tool is different from existing ones, but then I realized it’s literally in the name, a Teachable AI. [LAUGHTER] So it’s a tool designed for teaching it about your very own personal items. Yeah, your items may, may not look like a typical standard item; maybe you personalized them with engravings or stickers, or maybe it’s a unique gadget or maybe, say, a medical device. So it’s not about teaching every single item that you own, but rather a tool, a tool that lets us identify what matters most to you. So, yeah, I have about five to 10 small personal items that I always carry with me, and most of them are like very, very, very important to me. Like losing a bus pass means I can’t get anywhere. Losing a key means I can’t get home. Because these items are small and I use them daily, that means they are also, uh, being lost most commonly. So now I have a tool that is able to locate my personal items if they happen to be lost.

HUIZINGA: Right. And as you said earlier, you do have some sight. It’s, it’s tunnel vision at this point, so the peripheral part, um, is more challenging for you. But having this tool helps you to focus in a broader spectrum of, of visual sight. Cecily, this would be a great time to get a bit more specific about your Teachable AI discovery process. Tell us some research stories. How did you go about optimizing this AI system, and what things did you learn from both your successes and your failures?

MORRISON: Ah, yes, lots of research stories with this system, I’m afraid, but I think the very first thing we did was, OK, a user wants to teach this system, so we need to tell the user what makes a good teaching example. Well, we don’t know. Actually, we assumed we did know because in machine learning, the idea is more data, better quote-unquote “quality data,” and the system will work better. So the first thing that really surprised us when we actually ran some experimental analysis was that more data was not better and higher-quality data, or data that has less blur or is perfectly framed, was also not better. So what we realized is that it wasn’t our aim to kind of squeeze as much data as we could from the users but really to get the data that was the right kind of data. So we did need the object in the image. It’s, it’s really hard to train a system to recognize an object that’s not there at all. But what we needed was data that looked exactly like what the user was going to use when they were finding the objects. So if the user moves the camera really fast and the image becomes blurry, then we need those teaching examples to have blur, too.

HUIZINGA: Right.

MORRISON: So it was in understanding this relationship between the teaching examples and the user that really helped us craft a process that was going to help the user get the best result from the system. One of the things about Teachable AI is that it’s not about the AI system. It’s about the relationship between the user and the AI system. And the key to that relationship is the mental model of the user. They need to make good judgments about how to give good teaching examples if we want that whole cycle between user and AI system to go well. So I remember watching Karolina taking her teaching frames, and she was moving very far away. And I was thinking, hmm, I don’t think that data is going to work very well because there’s just not going to be enough pixels of the object to make a good representation for the system. So I asked Karolina about her strategy, and she said, well, if I want it to work from far away, then I should take teaching examples from far away. And I thought, ah, that’s a very logical mental model.

HUIZINGA: Right.

MORRISON: But unfortunately, we’ve broken the user’s mental model because that’s not actually how the system works because we were cropping frames and taking pixels out and doing all kinds of fancy image manipulation to, actually, to improve the performance under the hood. So I think this was an experience where we thought, ah, we want the user to develop a good mental model, but to do that, we need to actually structure this teaching process so they don’t need to think so hard and we’re guiding them into the, the kinds of things that make the system work well as opposed to not, and then they don’t need to guess. So the other thing that we found was that teaching should be fast and easy. Otherwise, it’s just too much work. No matter how personalized something is, if you have to work too hard, it’s a no-go. So we thought, ah, we want this to be really fast. We want it to take as few frames as possible. And we want the users to be really confident that they’ve got the object in the frame because that’s the one thing we really need. So we’re going to tell them all the time if the object’s in the frame: it’s in frame; it’s in frame; it’s in frame; it’s in frame; it’s in frame; it’s in frame. Well, there’s … citizen designers [LAUGHTER], including Karolina, came back to us and said, you know, this is really stressful. You know, I’m constantly worrying, “Is it in frame? Is it in frame? Is it in frame?” And actually, the cognitive load of that, even though we were trying to make the process really, really easy, um, was, was really overwhelming. And one of them said to us, well, why don’t I just assume that I’m doing a good job unless you tell me otherwise? [LAUGHTER] And that really helped shift our mindset to say, well, OK, we can help the user by giving them a gentle nudge back on track, but we don’t need to grab all their cognitive attention to make the perfect video!

HUIZINGA: [LAUGHS] That’s, that’s so hilarious. Well, Cecily, I want to stay with you for a minute and discuss the broader benefits of what you call “designing outside the mean.” And despite the challenges of developing technologies, we’ve seen specialized research deliver the so-called curb-cut effect over and over. Now you’ve already alluded to this a bit earlier. But clearly people with blindness and low vision aren’t the only ones who can’t find their things. So might this research help other people? Could it, could it be something I could incorporate into my phone?

MORRISON: That’s a great question. And I think an important question when we do any research is how do we broaden this out to meet the, the widest need possible? So I’m going to think about rather than Find My Things specifically, I’m going to think about Teachable AI. And Teachable AI should benefit everybody who needs something specific to themselves. And who of us don’t think that we need things to be specific to ourselves in this day and age?

HUIZINGA: Right … [LAUGHS]

MORRISON: But it’s going to be particularly useful for people on the margins of technology design for many reasons. So it doesn’t matter—it could be where your home is different or the way you go about your daily lives or perhaps the intersection of your identities. By having Teachable AI, we make systems that are going to work for individuals. Regardless of the labels that you might have or the life experience you might have, we want an AI system that works for you. And this is an approach that’s moving us in that direction.

HUIZINGA: You know, I love … I, I remembered what you said earlier, and it was for individuals, not people with disabilities. And I just love that framing anyway because we’re all individuals, and everyone has some kind of a disability, whether you call it that or not. So I just love this work so much. Karolina, back to you for a minute. You have said you’re a very tactile person. What role does haptics, which is the touch/feel part of computer science, play for you in this research, and how do physical cues work for you in this technology?

PAKĖNAITĖ: Yeah, so because I’m deaf-blind, I think my brain naturally craves information through senses which I have full access to. For me, it’s touch. So I find it very stimulating when the tools are tactile, whether that’s vibrations or textures. Tactile feedback not only enhances the experiences, but I think it’s also a good accessibility cue, as well. For example, one big instance happened that as a citizen designer was when I was pointing my camera at an object and, being hard of hearing, that means I couldn’t hear what it was saying, so I had to bring it close to my, my ear, and that meant that the object was lost in the camera view. [LAUGHS]

HUIZINGA: Right … [LAUGHS]

PAKĖNAITĖ: So … yeah, yeah, I think having tactile cues could be very beneficial for people like me who are deaf-blind but also others. Like, for example, you don’t always want your phone to be on sound all the time. Maybe in a quiet train, in a quiet tube, you don’t want your phone to start talking; you might be feeling self-conscious. So, yeah, I think …

HUIZINGA: Right …

PAKĖNAITĖ: … always adding those tactile cues will benefit me and everyone else.

HUIZINGA: Yeah, so to clarify, is haptics or touch involved in any of this particular Teachable AI technology, Cecily? I know that Karolina has that as a, you know, a “want to have” kind of thing. Where does it stand here?

MORRISON: Yeah, no, I, I think Karolina’s participation, um, was actually fairly critical in us adding, um, vibration cues to the experience.

HUIZINGA: Yeah, so it does use the, the haptic …

MORRISON: Yeah, we use auditory, visual, and, and vibration as a means of interaction. And I think in general, we should be designing all of our experiences with technology to be multisensory because, as Karolina pointed out, in certain circumstances, you don’t really want your computer talking at you. In other circumstances, you need something else. And in our different individual needs, we might need something else. So this allows people to be as flexible as possible for their context and for their own needs to make an experience work for them.

HUIZINGA: Right. Yeah, and I feel like this is already kind of part of our lives when our phones buzz or, or, you know, vibrate or when you wear the watch that gives you a little tip on your wrist that you’ve got a notification or you need to turn left or [LAUGHTER] whatever you’re using it for. Cecily, I always like to know where a project is on the spectrum from lab to life, as we say on this show. What’s the status of Teachable AI in general and Find My Things in particular, and how close is it to being able to be used in real life by a broader audience than your citizen designers and your team?

MORRISON: So it’s really important for us that the technologies we research become available to the communities to whom they are valuable. And in the past, we’ve had a whole set of partners, including Seeing AI, American Printing House for the Blind, to help us take ideas, research prototypes, and make them into products that people can have. Now Teachable AI is a grand vision. I think we are … showed with this work in Find My Things that the machine learning is there. We can do this, and it’s coming. And as we move into this new era of machine learning with these very large models, we’re going to need it there, too, because the larger the model, the more personalized we’re probably going to need the experience. In terms of Find My Things, we are also on that journey to finding the right opportunity to bring it out to the blind community.

HUIZINGA: So this has been fascinating. I’m … there’s so many more questions I want to ask, but we don’t have a lot of time to ask them all. I’m sure that we’re going to be watching as this unfolds and probably becomes part of all of our lives at some point thanks to the wonderful people doing the research. I like to end the podcast with a little future casting from each of my guests, and, Karolina, I’d like you to go first. I have a very specific question for you. Aside from your studies and your research work, you’ve said you’re on a mission. What’s that mission, and what does Mount Everest have to do with it?

PAKĖNAITĖ: So firstly, I’m hoping to complete my PhD this year. That’s my big priority for, for this year. And then, uh, I will be on a mission, an ambitious one that I feel a little bit nervous to share but also very excited. As an adventurer at heart, my dream is to summit Mount Everest. So before it’s always seemed like a fantasy, but I recently came back from an Everest base camp trek just a few months ago, and I met some mountaineers who were on their way to the top, and I found myself quietly saying, what if? And then, as I was thinking how I’m slowly losing my sight, I realized that if I do want to summit Everest, I would want to go there while I still can see with my remaining vision, so I realized that it would have to be now or never.

HUIZINGA: Right!

PAKĖNAITĖ: So when I came back, I decided … I just made some actions. So I reached out to different organizations and surprisingly a film production team is eager to document this journey and … yeah, it seems like something might be happening. So this mission isn’t just about me potentially becoming the first deaf-blind person to summit Everest but also a commitment to raising awareness and providing representation for the blind and deaf-blind community. I hope to stay in the research field, and I believe this mission has some potential for research. So I think that, for example, I’m looking for accessibility tools for, for me to climb Everest so that I can be the best climber I can be as a deaf-blind person, being independent but part of the team, or maybe make a documentary film a multisensory experience, accessible to a wider community, including deaf-blind. So, yeah, I’m actively looking for collaborators and would love to be contacted by anyone.

HUIZINGA: I love the fact that you are bringing awareness to the fact, first of all, that the deaf-blind community or even the blind community isn’t a one-size-fits-all. So, um, yeah, I hope you get to summit Everest to be able to see the world from the tallest peak in the world before anything progresses that way. Well, Cecily, I’d like to close with you. Go with me on a little forward-thinking, backward-thinking journey. You’re at the end of your career looking back. What have you accomplished as a researcher, and how has your work disrupted the field of accessible technology and made the world a better place?

MORRISON: Where would I like to be? I would say more like where would we like to be. So in collaboration with colleagues, I hope we have brought a sense of individual’s agency in their experience with AI systems, which allow people to shape them for their own unique experience, whoever they might be and wherever they might be in the world. And I think this idea is no less important, or perhaps it’s even more important, as we move into a world of large foundation models that underpin many or perhaps all of our experiences as we, as we go forward. And I think particularly large foundation models will bring really significant change to accessibility, and I hope the approach of teachability will be a significantly positive influence in making those experiences just what we need them to be. And I have to say, in my life role, I’m personally really very hopeful for my own blind child’s opportunities in the world of work in 10 years’ time. At the moment, only 25 percent of people who are blind or low vision work. I think technology can play a huge role in getting rid of this mismatch between the environment and a person and allowing many more people with disabilities to enjoy being in the workplace.

HUIZINGA: This is exciting research and really a wonderful collaboration. I’m so grateful, Cecily Morrison and Karolina Pakėnaitė, for coming on the show and talking about it with us today. Thank you so much.

MORRISON: Thank you, Gretchen, and thank you, Karolina.

PAKĖNAITĖ: Thank you.

The post Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė appeared first on Microsoft Research.

‘Christmas Rush’ 3D Scene Brings Holiday Cheer This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows.

‘Tis the season for friends, family and beautifully rendered Santa animations from this week’s In the NVIDIA Studio artist, 3D expert Božo Balov.

This week also marks an incredible milestone, with over 500 NVIDIA RTX-powered games and creative apps now available with support for ray tracing and AI-powered technologies like NVIDIA DLSS. Over 120 of the most popular apps — including the Adobe Creative Cloud suite, Autodesk Maya, Blender, Blackmagic Design’s Davinci Resolve, OBS, Unity and more — use RTX to accelerate workflows by orders of magnitude, power new AI tools and enhancements and enable real-time, ray-traced previews.

To celebrate, NVIDIA GeForce is hosting a giveaway for gift cards, rare, sought-after #RTXON keyboard keycaps and more. Follow GeForce on Facebook, Instagram, TikTok or X (formerly known as Twitter) for instructions on how to enter.

THANK YOU FOR 500 RTX GAMES & APPS

To celebrate use #RTX500 ALL December across our social channels for a chance to win $500 gift cards courtesy of Green Man Gaming + other great prize giveaways.

For your first chance to win:

Comment #RTX500
Tell us your favorite… pic.twitter.com/l3g7kTKujD

— NVIDIA GeForce (@NVIDIAGeForce) December 4, 2023

Say it ain’t snow: the NVIDIA Studio #WinterArtChallenge is back. Through the end of the year, share winter-themed art on Facebook, Instagram or X for a chance to be featured on NVIDIA Studio social media channels. Be sure to tag #WinterArtChallenge to join.

Winter has returned and so has our #WinterArtChallenge!

Share your winter-themed art (like this incredible one created on an RTX GPU by @rafianimates) using the hashtag for a chance to be featured on our social channels!

We can’t wait to see what you create! pic.twitter.com/Ml4cUAUgW3

— NVIDIA Studio (@NVIDIAStudio) December 4, 2023

Finally, 80 Level — the creative community for digital artists, animators and computer-generated imagery specialists — is hosting its Community Metasites Challenge. Artists can showcase their creativity by applying unique aesthetics to a simple block level via characters, game mechanics, visual effects and more — with a chance to win a new NVIDIA Studio laptop. Register today.

Special Thanks to @NVIDIAStudio who is offering this incredible prize for the first place winner of the 80 Level Community Metasites Challenge.
An ASUS – ProArt Studiobook 16 OLED H7600 16″ Laptop!

Register Nowhttps://t.co/OigWqOUlSD #NVIDIA #80levelmetasiteschallenge #UE5 #CG pic.twitter.com/atOozRqSxR

— 80 LEVEL (@80Level) November 27, 2023

Wrapper’s Delight

Balov’s Christmas Rush 3D animation reimagines Santa as a resident of the coastal city of Split, Croatia — but with a harsher, less jolly edge.

Balov jumped straight into modeling edgy Saint Nick in the virtual-reality modeling software Quill. He deployed vertex-painting techniques and used a photogrammetry scan of a Vespa as a base, adding brushstrokes to blend it with the rest of the scene.

To achieve a flickering effect on Santa’s clothing, Balov created a custom texture with different brush strokes in Adobe Photoshop. The texture doubles as an alpha map, which intentionally clips the geometry.

“When it comes to rendering 3D graphics, nothing really comes close to NVIDIA GPUs.” — Božo Balov

He then used Adobe Photoshop to paint monochromatic background layers. Balov’s GeForce RTX 3080 Ti GPU unlocked over 30 GPU-accelerated features, including blur gallery, liquify, smart sharpen and perspective warp.

Balov then converted the files to the FBX adaptable file format for 3D software before importing them into Blender, where he animated the layers to move in the opposite direction of the character to create a sense of speed. He kept the lighting fairly simple, with one light source as the base and a few supplemental ones to emphasize specific parts of the scene.

Balov prefers working in Blender’s real-time engine EEVEE to animate his scene, cutting wait times. RTX-accelerated OptiX ray tracing in the viewport enabled greater interactivity with smoother movement, speeding his ideation and creative workflow.

“Rendering is a joy on NVIDIA RTX cards,” said Balov. “Since OptiX made its debut, rendering times have been cut in half or more — Blender Cycles feels like a real-time engine.”

When asked for advice to give aspiring artists, Balov emphasized the importance of individual passion.

“Pursue what matters to you,” he said. “Don’t spend time fulfilling other people’s ideas of what art should be.”

Check out Balov’s art portfolio on Instagram.

Follow NVIDIA Studio on Facebook, Instagram and X. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

Snowflake Joins the PyTorch Foundation as a General Member

The PyTorch Foundation, a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem, is announcing today that Snowflake has joined as a general member.

Snowflake enables thousands of organizations to unite siloed data, discover and securely share data, power data applications, and execute diverse AI/ML and analytic workloads across multiple clouds and geographies.

“By joining the PyTorch community, we know that Snowflake will help accelerate data warehousing solutions and cutting-edge AI frameworks. This showcases the commitment to advancing innovation for data and artificial intelligence,” said Ibrahim Haddad, Executive Director, PyTorch Foundation. “We are thrilled to have Snowflake join the PyTorch Foundation, marking a significant stride in the convergence of data management and deep learning technologies.”

Snowflake enables collaboration with AI technologies to handle the storage and analysis of large datasets generated by machine learning and AI applications through scalability and SQL support.

With the integrated repository of Python libraries from Anaconda in Snowpark, Snowflake users have always had a streamlined experience to deploy pre-trained PyTorch models in Snowflake to easily and securely make them a part of applications. Now with the addition of GPU instances in Snowpark Container Services (in private preview), training and other computationally intensive processing using PyTorch will also be streamlined, providing teams with an end-to-end solution for AI development and deployment.

“Most if not all of our customers incorporate open source software as part of their data stacks, so it is critical for us to work with open source ecosystems like the PyTorch Foundation, alongside incorporating open source to meet the needs of our customers,” said Adrien Treuille, Co-Founder of Streamlit, Director of Product Management at Snowflake. “As AI developers continue to integrate their models as part of applications, the power of Snowflake and PyTorch — coupled with Streamlit as the powerful front-end — creates near-limitless innovation for developers looking to build next-generation apps and unlock even more use cases.”

To learn more about the power of Snowflake and PyTorch, tune into Snowflake’s developer conference for AI and apps, BUILD.

To learn more about how you can be a part of the PyTorch Foundation, visit our website.

About Snowflake

Snowflake enables every organization to mobilize their data with Snowflake’s Data Cloud. Customers use the Data Cloud to unite siloed data, discover and securely share data, power data applications, and execute diverse AI/ML and analytic workloads. Wherever data or users live, Snowflake delivers a single data experience that spans multiple clouds and geographies. Thousands of customers across many industries, including 639 of the 2023 Forbes Global 2000 (G2K) as of July 31, 2023, use Snowflake Data Cloud to power their businesses. Learn more at snowflake.com.

About PyTorch Foundation

The PyTorch Foundation is a neutral home for the deep learning community to collaborate on the open source PyTorch framework and ecosystem. The PyTorch Foundation is supported by its members and leading contributors to the PyTorch open source project. The Foundation leverages resources provided by members and contributors to enable community discussions and collaboration.

About The Linux Foundation

The Linux Foundation is the world’s leading home for collaboration on open source software, hardware, standards, and data. Linux Foundation projects are critical to the world’s infrastructure including Linux, Kubernetes, Node.js, ONAP, PyTorch, RISC-V, SPDX, OpenChain, and more. The Linux Foundation focuses on leveraging best practices and addressing the needs of contributors, users, and solution providers to create sustainable models for open collaboration. For more information, please visit us at linuxfoundation.org. The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see its trademark usage page. Linux is a registered trademark of Linus Torvalds.

You Might Also Like

Board & Organizing Committee

Google Research booth activities

Papers

Findings of EMNLP

Workshops

Tutorials

Solution overview

Walkthrough

AWS-optimized AllGather

Usage

DeepSpeed

FSDP

Benchmarks

Model training benchmarks

Conclusion

About the Authors

Solution overview

Data security and IAM considerations

Train the Comprehend custom classifier using training data

Train the Amazon Comprehend custom entity recognizer (NER) using training data

Create the Amazon Comprehend custom classifier and custom entities (NER) endpoints

Create and deploy a Lambda function for post extraction enrichment

Create and populate the Amazon Kendra index

Clean up

Conclusion

About the Authors

Stained Glass Transform overview

Solution overview

Randomized re-representations to protect LLM prompts and fine-tuning data

RAG use case

Broad applicability across LLMs and languages

Protecting fine-tuning data as well as prompts

Under the hood of Stained Glass Transform for LLMs

Conclusion

About Protopia AI

About the authors

LLMs’ potential in supporting online health facilitators

Importance of human facilitation

Looking forward

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

Wrapper’s Delight

About Snowflake

About PyTorch Foundation

About The Linux Foundation

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.