PyTorch framework for cryptographically secure random number generation, torchcsprng, now available

One of the key components of modern cryptography is the pseudorandom number generator. Katz and Lindell stated, “The use of badly designed or inappropriate random number generators can often leave a good cryptosystem vulnerable to attack. Particular care must be taken to use a random number generator that is designed for cryptographic use, rather than a ‘general-purpose’ random number generator which may be fine for some applications but not ones that are required to be cryptographically secure.”[1] Additionally, most pseudorandom number generators scale poorly to massively parallel high-performance computation because of their sequential nature. Others do not satisfy the properties required for cryptographic security.

torchcsprng is a PyTorch C++/CUDA extension that provides cryptographically secure pseudorandom number generators for PyTorch.

torchcsprng overview

Historically, PyTorch had only two pseudorandom number generator implementations: Mersenne Twister for CPU and Nvidia’s cuRAND Philox for CUDA. Despite good performance properties, neither of them is suitable for cryptographic applications. Over the course of the past several months, the PyTorch team developed the torchcsprng extension API. Based on the PyTorch dispatch mechanism and operator registration, it allows users to extend c10::GeneratorImpl and implement their own custom pseudorandom number generators.

torchcsprng generates a random 128-bit key on the CPU using one of its generators and then runs AES128 in CTR mode either on CPU or GPU using CUDA. This then generates a random 128-bit state and applies a transformation function to map it to target tensor values. This approach is based on Parallel Random Numbers: As Easy as 1, 2, 3 (John K. Salmon, Mark A. Moraes, Ron O. Dror, and David E. Shaw, D. E. Shaw Research). It makes torchcsprng both crypto-secure and parallel on both CPU and CUDA.
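The counter-mode idea described above can be illustrated with a minimal sketch in plain Python. This is not torchcsprng's implementation: the `block` function uses SHA-256 as a stand-in for AES128 only because it ships with the standard library, and the names `block`, `to_unit_float`, and `random_values` are hypothetical. The point it shows is that element i depends only on the key and the counter i, so any slice of the output can be computed independently.

```python
import hashlib
import os

def block(key: bytes, counter: int) -> bytes:
    """Stand-in for encrypting one 128-bit counter block under the key.

    SHA-256 is an assumption made for this sketch; torchcsprng uses AES128
    in CTR mode.
    """
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:16]

def to_unit_float(state: bytes) -> float:
    """Transformation function: map a 128-bit state to a float in [0, 1)."""
    top53 = int.from_bytes(state[:8], "big") >> 11  # keep 53 bits of the state
    return top53 / (1 << 53)

def random_values(key: bytes, start: int, count: int) -> list:
    """Element i depends only on (key, i), so slices are independent."""
    return [to_unit_float(block(key, i)) for i in range(start, start + count)]

key = os.urandom(16)  # 128-bit key drawn from the OS entropy source
whole = random_values(key, 0, 8)
# Two "workers" computing disjoint halves reproduce the same sequence.
halves = random_values(key, 0, 4) + random_values(key, 4, 4)
```

Because no state is carried from one element to the next, the work can be split across CPU threads or CUDA threads without coordination, which is the property the counter-based design relies on.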

Since torchcsprng is a PyTorch extension, it is available on the platforms where PyTorch is available (support for Windows-CUDA will be available in the coming months).

Using torchcsprng

The torchcsprng API is very simple to use and is fully compatible with the PyTorch random infrastructure:

Step 1: Install via binary distribution

Anaconda:

conda install torchcsprng -c pytorch

pip:

pip install torchcsprng

Step 2: Import packages as usual but add csprng

import torch
import torchcsprng as csprng

Step 3: Create a cryptographically secure pseudorandom number generator from /dev/urandom:

urandom_gen = csprng.create_random_device_generator('/dev/urandom')

and simply use it with the existing PyTorch methods:

torch.randn(10, device='cpu', generator=urandom_gen)

Step 4: Test with CUDA

One of the advantages of torchcsprng generators is that they can be used with both CPU and CUDA tensors:

torch.randn(10, device='cuda', generator=urandom_gen)

Another advantage of torchcsprng generators is that they are parallel on CPU unlike the default PyTorch CPU generator.

Getting Started

The easiest way to get started with torchcsprng is by visiting the GitHub page where you can find installation and build instructions, and more how-to examples.

Cheers,

The PyTorch Team

[1] Introduction to Modern Cryptography: Principles and Protocols (Chapman & Hall/CRC Cryptography and Network Security Series) by Jonathan Katz and Yehuda Lindell

Stanford AI Lab Papers and Talks at ECCV 2020

The European Conference on Computer Vision (ECCV) 2020 is being hosted virtually from August 23rd – 28th. We’re excited to share all the work from SAIL that’s being presented, and you’ll find links to papers, videos and blogs below. Feel free to reach out to the contact authors directly to learn more about the work that’s happening at Stanford!

List of Accepted Papers

Contact and Human Dynamics from Monocular Video


Authors: Davis Rempe, Leonidas J. Guibas, Aaron Hertzmann, Bryan Russell, Ruben Villegas, Jimei Yang

Contact: drempe@stanford.edu

Links: Paper | Video

Keywords: 3d human pose, 3d human motion, pose estimation, dynamics, physics-based, contact, trajectory optimization, character animation, deep learning


Curriculum DeepSDF


Authors: Yueqi Duan, Haidong Zhu, He Wang, Li Yi, Ram Nevatia, Leonidas J. Guibas

Contact: duanyq19@stanford.edu

Links: Paper

Keywords: shape representation, implicit function, deepsdf, curriculum learning


Deformation-Aware 3D Model Embedding and Retrieval


Authors: Mikaela Angelina Uy, Jingwei Huang, Minhyuk Sung, Tolga Birdal, Leonidas Guibas

Contact: mikacuy@stanford.edu

Links: Paper | Video

Keywords: 3d model retrieval, deformation-aware embedding, non-metric embedding


Generative Sparse Detection Networks for 3D Single-shot Object Detection


Authors: JunYoung Gwak, Christopher Choy, Silvio Savarese

Contact: jgwak@cs.stanford.edu

Links: Paper | Video

Keywords: single shot detection, 3d object detection, generative sparse network, point cloud


Learning 3D Part Assembly from A Single Image


Authors: Yichen Li, Kaichun Mo, Lin Shao, Minhyuk Sung, Leonidas Guibas

Contact: liyichen@cs.stanford.edu

Links: Paper | Video

Keywords: 3d vision, vision for robotics, 3d representation


Learning Predictive Models From Observation and Interaction


Authors: Karl Schmeckpeper, Annie Xie, Oleh Rybkin, Stephen Tian, Kostas Daniilidis, Sergey Levine, Chelsea Finn

Contact: cbfinn@cs.stanford.edu

Links: Paper | Video

Keywords: video prediction, visual planning, action representations, robotic manipulation


PT2PC: Learning to Generate 3D Point Cloud Shapes from Part Tree Conditions


Authors: Kaichun Mo, He Wang, Xinchen Yan, Leonidas J. Guibas

Contact: kaichunm@stanford.edu

Links: Paper | Video

Keywords: 3d vision and graphics, generative adversarial network, 3d point cloud


Pix2Surf: Learning Parametric 3D Surface Models of Objects from Images


Authors: Jiahui Lei, Srinath Sridhar, Paul Guerrero, Minhyuk Sung, Niloy Mitra, Leonidas J. Guibas

Contact: lei_jiahui@zju.edu.cn, ssrinath@cs.stanford.edu

Links: Paper | Video

Keywords: 3d reconstruction, multi-view, single-view, parametrization


Quaternion Equivariant Capsule Networks for 3D Point Clouds


Authors: Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, Federico Tombari

Contact: tbirdal@stanford.edu

Links: Paper

Keywords: equivariance, 3d point clouds, quaternion, weiszfeld algorithm, capsule networks, dynamic routing, riemannian


ReferIt3D: Neural Listeners for Fine-Grained 3D Object Identification in Real-World Scenes


Authors: Panos Achlioptas, Ahmed Abdelreheem, Fei Xia, Mohamed Elhoseiny, Leonidas J. Guibas

Contact: panos@cs.stanford.edu

Links: Paper | Video

Keywords: 3d neural-listeners, spatial relations, object identification, referential language


Robust and On-the-fly Dataset Denoising for Image Classification


Authors: Jiaming Song, Yann Dauphin, Michael Auli, Tengyu Ma

Contact: tsong@cs.stanford.edu

Links: Paper

Keywords: web supervision, noisy labels, robust data denoising


RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition


Authors: Linxi Fan*, Shyamal Buch*, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei

Contact: {jimfan,shyamal}@cs.stanford.edu

Links: Paper | Video | Website

Keywords: efficient action recognition, spatiotemporal, learnable shift, budget-constrained, video understanding


Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data


Authors: Tim Salzmann*, Boris Ivanovic*, Punarjay Chakravarty, Marco Pavone

Contact: borisi@stanford.edu

Links: Paper | Blog Post

Keywords: trajectory forecasting, spatiotemporal graph modeling, human-robot interaction, autonomous driving


We look forward to seeing you at ECCV 2020!

Facebook research at KDD 2020

Facebook researchers and engineers in data science, data mining, knowledge discovery, large-scale data analytics, and more will be presenting their research at the Conference on Knowledge Discovery and Data Mining (KDD). Researchers will also be giving talks and participating in workshops and tutorials throughout the conference.

The Facebook Core Data Science team is presenting two research papers: “CLARA: Confidence of labels and raters” and “TIES: Temporal interaction embeddings for enhancing social media integrity at Facebook.” CLARA, which is described more in this blog post, is a system built and deployed at Facebook to estimate the uncertainty in human-generated decisions in our content review process. TIES is a deep learning, application-agnostic, scalable framework that leverages interactions between accounts to improve the safety and security of our online community.

For more information on Facebook’s presence at KDD this year, from August 23 to 28, check out the Facebook at KDD page.

Facebook research being presented at KDD 2020

AutoCTR: Towards automated neural architecture discovering for click-through rate prediction
Qingquan Song, Dehua Cheng, Eric Zhou, Jiyan Yang, Yuandong Tian, Xia Hu

Click-through rate (CTR) prediction is one of the most important machine learning tasks in recommender systems, driving personalized experience for billions of consumers. Neural architecture search (NAS), as an emerging field, has demonstrated its capabilities in discovering powerful neural network architectures, which motivates us to explore its potential for CTR predictions. Due to (1) diverse unstructured feature interactions, (2) heterogeneous feature space, and (3) high data volume and intrinsic data randomness, it is challenging to construct, search, and compare different architectures effectively for recommendation models. To address these challenges, we propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR. Via modularizing simple yet representative interactions as virtual building blocks and wiring them into a space of directed acyclic graphs, AutoCTR performs evolutionary architecture exploration with learning-to-rank guidance at the architecture level and achieves acceleration using a low-fidelity model. Empirical analysis demonstrates the effectiveness of AutoCTR on different data sets compared to human-crafted architectures. The discovered architecture also enjoys generalizability and transferability among different data sets.

CLARA: Confidence of labels and raters
Viet-An Nguyen, Peibei Shi, Jagdish Ramakrishnan, Udi Weinsberg, Henry C. Lin, Steve Metz, Neil Chandra, Jane Jing, Dimitris Kalimeris

Large online services employ thousands of people to label content for applications such as video understanding, natural language processing, and content policy enforcement. While labelers typically reach their decisions by following a well-defined protocol, humans may still make mistakes. A common countermeasure is to have multiple people review the same content; however, this process is often time-intensive and requires accurate aggregation of potentially noisy decisions.

In this paper, we present CLARA (confidence of labels and raters), a system developed and deployed at Facebook for aggregating reviewer decisions and estimating their uncertainty. We perform extensive validations and describe the deployment of CLARA for measuring the base rate of policy violations, quantifying reviewers’ performance, and improving their efficiency. In our experiments, we found that CLARA (a) provides an unbiased estimator of violation rates that is robust to changes in reviewer quality, with accurate confidence intervals, (b) provides an accurate assessment of reviewers’ performance, and (c) improves efficiency by reducing the number of reviews based on the review certainty, and enables the operational selection of a threshold on the cost/accuracy efficiency frontier.

Compositional embeddings using complementary partitions for memory-efficient recommendation models
Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the categorical data, embeddings map each category to a unique dense representation within an embedded space. Since each categorical feature could take on as many as tens of millions of different possible categories, the embedding tables form the primary memory bottleneck during both training and inference. We propose a novel approach for reducing the embedding size in an end-to-end fashion by exploiting complementary partitions of the category set to produce a unique embedding vector for each category without explicit definition. By storing multiple smaller embedding tables based on each complementary partition and combining embeddings from each table, we define a unique embedding for each category at smaller cost. This approach may be interpreted as using a specific fixed codebook to ensure uniqueness of each category’s representation. Our experimental results demonstrate the effectiveness of our approach over the hashing trick for reducing the size of the embedding tables in terms of model loss and accuracy, while retaining a similar reduction in the number of parameters.
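One way to picture complementary partitions is to split each category index into a quotient and a remainder, so that the pair of indices is unique for every category. The sketch below assumes that particular partition and assumes element-wise multiplication as the combining operation; the table sizes, the dimension, and all names are hypothetical and are not taken from the abstract.

```python
import random

NUM_CATEGORIES = 1000
NUM_BUCKETS = 32      # hypothetical size of the remainder partition
DIM = 4               # hypothetical embedding dimension

rng = random.Random(0)

def make_table(rows, dim):
    """Create a small randomly initialised embedding table."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]

num_quotients = -(-NUM_CATEGORIES // NUM_BUCKETS)  # ceiling division
quotient_table = make_table(num_quotients, DIM)
remainder_table = make_table(NUM_BUCKETS, DIM)

def partition_indices(category):
    """Map a category to one index per partition; the pair is unique."""
    return category // NUM_BUCKETS, category % NUM_BUCKETS

def embed(category):
    """Combine the two small embeddings element-wise (multiplication here)."""
    q, r = partition_indices(category)
    return [a * b for a, b in zip(quotient_table[q], remainder_table[r])]

rows_full = NUM_CATEGORIES                      # rows in one full table
rows_compositional = num_quotients + NUM_BUCKETS  # rows in the two small tables
```

With these sizes the two small tables hold far fewer rows than a single full table, while every category still receives its own index pair and therefore its own combined vector.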

Embedding-based retrieval in Facebook search
Jui-Ting Huang, Ashish Sharma, Shuying Sun, Li Xia, David Zhang, Philip Pronin, Janani Padmanabhan, Giuseppe Ottaviano, Linjun Yang

Search in social networks such as Facebook poses different challenges than in classical web search: Besides the query text, it is important to take into account the searcher’s context to provide relevant results. Their social graph is an integral part of this context and is a unique aspect of Facebook search. While embedding-based retrieval (EBR) has been applied in web search engines for years, Facebook search was still mainly based on a Boolean matching model. In this paper, we discuss the techniques for applying EBR to a Facebook Search system. We introduce the unified embedding framework developed to model semantic embeddings for personalized search, and the system to serve embedding-based retrieval in a typical search system based on an inverted index. We discuss various tricks and experiences on end-to-end optimization of the whole system, including ANN parameter tuning and full-stack optimization. Finally, we present our progress on two selected advanced topics about modeling. We evaluated EBR on verticals for Facebook Search with significant metrics gains observed in online A/B experiments. We believe this paper will provide useful insights and experiences to help people develop embedding-based retrieval systems in search engines.

ShopNet: Unified computer vision model trunk and embeddings for commerce
Sean Bell, Yiqun Liu, Sami Alsheikh, Yina Tang, Ed Pizzi, Michael Henning, Karun Singh, Fedor Borisyuk

In this paper, we present ShopNet, a deployed image recognition system for commerce applications. ShopNet leverages a multi-task learning approach to train a single computer vision trunk. We achieve a 2.1x improvement in exact product match accuracy when compared to the previous state-of-the-art Facebook product recognition system. We share our experience of combining different data sources with wide-ranging label semantics and image statistics, including learning from human annotations, user-generated tags, and noisy search engine interaction data. We experiment with a diverse set of loss functions, optimizing jointly for exact product recognition accuracy and various classification tasks. We provide insights on what worked best in practice. ShopNet is deployed in production applications with gains and operates at Facebook scale.

TIES: Temporal interaction embeddings for enhancing social media integrity at Facebook
Nima Noorshams, Saurabh Verma, Aude Hofleitner

Since its inception, Facebook has become an integral part of the online social community. People rely on Facebook to make connections with others and build communities. As a result, it is paramount to protect the integrity of such a rapidly growing network in a fast and scalable manner. In this paper, we present our efforts to protect various social media entities at Facebook from people who try to abuse our platform. We present a novel temporal interaction embeddings (TIES) model that is designed to capture rogue social interactions and flag them for further suitable actions. TIES is a supervised, deep learning, production-ready model at Facebook-scale networks. Prior works on integrity problems are mostly focused on capturing either only static or certain dynamic features of social entities. In contrast, TIES can capture both of these variant behaviors in a unified model owing to the recent strides made in the domains of graph embedding and deep sequential pattern learning. To show the real-world impact of TIES, we present a few applications especially for preventing spread of misinformation, fake account detection, and reducing ads payment risks in order to enhance the Facebook platform’s integrity.

Training deep learning recommendation model with quantized collective communications
Jie (Amy) Yang, Jongsoo Park, Srinivas Sridharan, Ping Tak Peter Tang

Deep learning recommendation model (DLRM) captures our representative model architectures developed for click-through rate (CTR) prediction based on high-dimensional sparse categorical data. Collective communications can account for a significant fraction of time in synchronous training of DLRM at scale. In this work, we explore using fine-grain integer quantization to reduce the communication volume of alltoall and allreduce collectives. We emulate quantized alltoall and allreduce, the latter using ring or recursive-doubling and each with optional carried-forward error compensation. We benchmark accuracy loss of quantized alltoall and allreduce with a representative DLRM model and Kaggle 7D data set. We show that alltoall forward and backward passes and dense allreduce can be quantized to 4 bits without accuracy loss, compared to full-precision training.
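To make the idea of integer quantization with carried-forward error compensation concrete, here is a small sketch. It assumes symmetric quantization with one shared scale per block of values and a residual that is added back before the next round; the function names and the sample values are hypothetical, and the paper's actual scheme may differ in its details.

```python
def quantize(values, bits=4):
    """Symmetric integer quantization of one block with a shared scale."""
    qmax = (1 << (bits - 1)) - 1          # 7 for 4 bits
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]

def send_with_error_feedback(values, residual, bits=4):
    """Quantize values plus the residual carried over from the last step."""
    compensated = [v + r for v, r in zip(values, residual)]
    codes, scale = quantize(compensated, bits)
    received = dequantize(codes, scale)
    # Whatever was lost to rounding is carried forward to the next step.
    new_residual = [c - x for c, x in zip(compensated, received)]
    return received, new_residual

gradient = [0.31, -0.07, 0.52, 0.11]
residual = [0.0] * len(gradient)
received, residual = send_with_error_feedback(gradient, residual)
```

Only the integer codes and one scale per block would need to cross the network in such a scheme, which is where the reduction in communication volume comes from.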

Other activities at KDD 2020

Applied data science invited talks

Data paucity and low resource scenarios: Challenges and opportunities
Mona Diab, invited speaker

Preserving integrity in online social media
Alon Halevy, invited speaker

Hands-on tutorials

Building recommender systems with PyTorch
Dheevatsa Mudigere, Maxim Naumov, Joe Spisak, and Geeta Chauhan, presenters

Special days

Deep learning day
Joelle Pineau

Workshops

Humanitarian mapping workshop
Shankar Iyer, initiative lead/chair
Kristen Altenburger, Eugenia Giraudy, Alex Dow, Paige Maas, Alex Pompe, Eric Sodomka, program committee

Workshop on applied data science for Healthcare
Paper: Information extraction of clinical trial eligibility criteria
Yitong Tseo, M. I. Salkola, Ahmed Mohamed, Anuj Kumar, Freddy Abnousi

Workshop on deep learning practice for high-dimensional sparse data
Liang Xiong, workshop chair

Workshop on mining and learning with graphs
Aude Hofleitner, organizer
Jin Kyu Kim, program committee

The post Facebook research at KDD 2020 appeared first on Facebook Research.

Improving the accuracy of Community Standards enforcement by certainty estimation of human decisions

What we did

Online services have made great strides in leveraging machine-learned models to fight abuse at scale. For example, 99.5 percent of takedowns on fake Facebook accounts are proactively detected before users report them. However, despite this progress, there are many areas where large-scale systems still rely on human decisions for a range of tasks, including collecting labels for training models, enforcing a range of policies, and reviewing appeals.

An obvious challenge that arises when relying on human reviewers is that humans are inherently noisy and potentially biased decision-makers. While bias is a trait of the individual, noise can result from subjectivity or ambiguity of the decision guidelines, or from simple mistakes that are commonly the result of fatigue or pressure. In this work, we consider three applications where mistakes in human decisions can have negative outcomes:

  • Enforcement: When community standards are being enforced, an incorrect decision can result in taking down a benign piece of content from the platform or leaving violating content on the platform.
  • Training machine learning models: Using inaccurate human-generated “ground truth” labels might lead to inaccurate models.
  • Prevalence estimation: Prevalence is the percentage of policy-violating content out of all content seen by Facebook users. It is computed by sampling content and sending it to reviewers, who review it for violations. Failing to consider mistakes in these reviews can lead to incorrect prevalence estimates and confidence intervals.
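The effect of review mistakes on a prevalence estimate can be illustrated with a short sketch. It computes the observed prevalence from sampled labels, a normal-approximation confidence interval, and a standard correction for known reviewer error rates (the Rogan-Gladen estimator). This is not CLARA's method; the sensitivity and specificity values and the sample counts are made-up assumptions for illustration.

```python
import math

def observed_prevalence(labels):
    """Fraction of sampled items that reviewers marked violating (1)."""
    return sum(labels) / len(labels)

def normal_interval(p, n, z=1.96):
    """Approximate 95% confidence interval for a proportion."""
    half = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def corrected_prevalence(p_obs, sensitivity, specificity):
    """Rogan-Gladen correction for known reviewer error rates."""
    p = (p_obs + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, p))

# Hypothetical sample: 60 of 1,000 reviewed items were labeled violating.
labels = [1] * 60 + [0] * 940
p_obs = observed_prevalence(labels)
interval = normal_interval(p_obs, len(labels))
# Hypothetical reviewer quality: 90% of violations caught, 3% false positives.
p_corr = corrected_prevalence(p_obs, sensitivity=0.90, specificity=0.97)
```

In this made-up example the corrected estimate is noticeably lower than the observed one, because a small false-positive rate applied to a large volume of benign content inflates the raw count.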

A scalable solution to reduce the likelihood of mistakes is to assign multiple reviewers to each task. If available, it is possible to augment human decisions with additional nonhuman signals, such as scores from machine learning models. A key challenge that arises in these settings is the need to aggregate multiple potentially conflicting decisions and provide an estimate for the certainty of the decision.

In our paper, to be published at the 2020 ACM Conference on Knowledge Discovery and Data Mining, we present CLARA (Confidence of Labels and Raters), a system built and deployed at Facebook to estimate the uncertainty in human-generated decisions. We show how CLARA is used at Facebook to obtain more accurate decisions overall while reducing operational resource use.

How we did it

We follow a rich body of research on crowdsourcing and take a Bayesian probabilistic approach to define different latent variables and the generative process of the observed data. In particular, the observed data includes a set of items, each of which receives multiple labels and potentially one or more scores from machine learning models. CLARA estimates the following latent variables:

  • Overall prevalence: The rate at which each label category occurs
  • Per-reviewer confusion matrix: Each reviewer’s ability to correctly label items of different true label categories
  • Per-item true label: The true latent label category of each item
  • Score mixture: The different score distributions of items from different label categories

For posterior inference, we implemented a collapsed Gibbs sampling algorithm to infer the values of all latent variables given the observed data. Figure 1 shows the graphical model of CLARA together with illustrative examples of the observed and latent variables.

Figure 1
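As a simplified illustration of how the latent variables listed above interact, the sketch below computes the posterior over one item's true label when the prevalence and the per-reviewer confusion matrices are treated as already known. CLARA infers all of these jointly with collapsed Gibbs sampling; this sketch does not do that, and every number and name in it is a hypothetical assumption.

```python
def label_posterior(prior, confusions, votes):
    """Posterior over the true label given independent reviewer votes.

    prior[k]            : prevalence of true label k
    confusions[j][k][l] : probability reviewer j reports l when truth is k
    votes               : list of (reviewer_index, reported_label) pairs
    """
    weights = list(prior)
    for reviewer, reported in votes:
        for k in range(len(weights)):
            weights[k] *= confusions[reviewer][k][reported]
    total = sum(weights)
    return [w / total for w in weights]

# Labels: 0 = benign, 1 = violating (hypothetical values throughout).
prior = [0.9, 0.1]
careful = [[0.95, 0.05], [0.10, 0.90]]
noisy = [[0.70, 0.30], [0.40, 0.60]]
confusions = [careful, noisy]

# The careful reviewer says "violating"; the noisy reviewer says "benign".
posterior = label_posterior(prior, confusions, [(0, 1), (1, 0)])
```

Unlike a majority vote, which would be tied here, the posterior weighs each vote by how reliable that reviewer is for each true label and also accounts for how rare violations are.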

Results

We’ve deployed CLARA at scale at Facebook. While similar underlying models have been studied in the literature, this work provides the details of a large-scale, real-world deployment of a complete system, with both offline and online aggregation and uncertainty estimation capabilities.

Figure 2 illustrates an overview of how CLARA is deployed at scale in production at Facebook.

Figure 2

One of the key applications where we use CLARA at Facebook is the efficient allocation of labeling resources based on confidence scores. We achieve this by obtaining additional reviews only when the decision confidence given by CLARA is not sufficiently high. This results in a cost/accuracy trade-off, where higher levels of decision confidence result in additional reviews. An example trade-off curve, which uses simulated “ground truth” and labeling mistakes, is shown in Figure 3. The figure depicts the change in accuracy (left) and mean absolute error (right) as a function of the percent of labels. Compared to a random sampling baseline, the figure shows that CLARA provides a better trade-off curve, enabling an efficient usage of labeling resources. In a production deployment, we found that CLARA can save up to 20 percent of total reviews compared to majority vote. You can find more details and results in our paper.

Figure 3
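The confidence-based allocation described above can be sketched as a loop that requests another review only while the posterior confidence is below a threshold. The simulation below is a toy: it assumes every reviewer has the same symmetric accuracy, and the prior, accuracy, threshold, and review cap are hypothetical values rather than anything reported for CLARA.

```python
import random

PRIOR = [0.9, 0.1]   # hypothetical prevalence: benign, violating
ACCURACY = 0.85      # hypothetical accuracy shared by all reviewers

def update(posterior, reported):
    """Bayes update for one vote from a reviewer with symmetric accuracy."""
    weights = [p * (ACCURACY if k == reported else 1.0 - ACCURACY)
               for k, p in enumerate(posterior)]
    total = sum(weights)
    return [w / total for w in weights]

def review_until_confident(true_label, threshold, max_reviews, rng):
    """Request reviews one at a time until the decision is confident enough."""
    posterior = list(PRIOR)
    used = 0
    while max(posterior) < threshold and used < max_reviews:
        correct = rng.random() < ACCURACY
        reported = true_label if correct else 1 - true_label
        posterior = update(posterior, reported)
        used += 1
    decision = posterior.index(max(posterior))
    return decision, used

rng = random.Random(0)
items = [1 if rng.random() < PRIOR[1] else 0 for _ in range(2000)]
results = [review_until_confident(t, threshold=0.95, max_reviews=5, rng=rng)
           for t in items]
average_reviews = sum(used for _, used in results) / len(results)
accuracy = sum(d == t for (d, _), t in zip(results, items)) / len(items)
```

Raising `threshold` in this sketch increases `average_reviews`, which mirrors the cost/accuracy trade-off: easy items stop after a single review, while ambiguous ones collect more.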

How we are extending this work

The current implementation of CLARA leverages machine learning scores by treating them as nonbinary “artificial reviewers.” However, we observe that human mistakes are often correlated with the difficulty of the task, which can be reflected in the machine learning score. We are developing a continuous “confusion function” and prevalence function, which take into account the difficulty of the task as captured by the machine learning score.

The post Improving the accuracy of Community Standards enforcement by certainty estimation of human decisions appeared first on Facebook Research.

Need Healthcare? AI Startup Curai Has an App for That

As a child, Neal Khosla became engrossed by the Oakland Athletics baseball team’s “Moneyball” approach of using data analytics to uncover the value and potential of the sport’s players. A few years ago, the young engineer began pursuing similar techniques to improve medical decision-making.

It wasn’t long after Khosla met Xavier Amatriain, who was looking to apply his engineering skills to a higher mission, that the pair founded Curai. The three-year-old startup, based in Palo Alto, Calif., is using AI to improve the entire process of providing healthcare.

The scope of their challenge — transforming how medical care is accessed and delivered — is daunting. But even modest success could bring huge gains to people’s well-being when one considers that more than half of the world’s population has no access to essential health services, and nearly half of the 400,000 deaths a year attributed to incorrect diagnoses are considered preventable.

“When we think about a world where 8 billion people will need access to high-quality primary care, it’s clear to us that our current system won’t work,” said Khosla, Curai’s CEO. “The accessibility of Google is the level of accessibility we need.”

Curai’s efforts to lower the barrier to entry for healthcare for billions of people center on applying GPU-powered AI to connect patients, providers and health coaches via a chat-based application. Behind the scenes, the app is designed to effectively connect all of the healthcare dots, from understanding symptoms to making diagnoses to determining treatments.

“Healthcare as it is now does not scale. There are not enough doctors in the world, and the situation is not going to get better,” Khosla said. “Our hypothesis is that we can not only scale, but also improve the quality of medicine by automating many parts of the process.”

COVID-19 Fans the Flames

The COVID-19 pandemic has only made Curai’s mission more urgent. With healthcare in the spotlight, there is more momentum than ever to bring more efficiency, accessibility and scale to the industry.

Curai’s platform uses AI and machine learning to automate every part of the process. It’s fronted by the company’s chat-based application, which delivers whatever the user needs.

Patients can use it to input information about their conditions, access their medical profiles, chat with providers 24/7, and see where the process stands.

For providers, it puts a next-generation electronic health record system at their fingertips, where they can access all relevant information about a patient’s care. The app also supports providers by offering diagnostic and treatment suggestions based on Curai’s ever improving algorithms.

“Our approach is to meticulously and carefully log and record data about what the practitioners are doing so we can train models that learn from them,” said Amatriain, chief technology officer at Curai. “We make sure that everything we implement in our platform is designed to improve our ‘learning loop’ – our ability to generate training data that improves our algorithms over time.”

Curai’s main areas of AI focus have been natural language processing (for extracting data from medical conversations), medical reasoning (for providing diagnosis and treatment recommendations) and image processing and classification (largely for dermatology images uploaded by patients).

Across all of these areas, Curai is tapping state-of-the-art techniques like using synthetic data in combination with natural data to train its deep neural networks.

Curai online assessment tool.

Most of Curai’s experimentation, and much of its model training, occurs on two custom Supermicro workstations each running two NVIDIA TITAN XP GPUs. For its dermatology image classification, Curai initialized a 50-layer convolutional neural network with 23,000 images. For its diagnostic models, the company trained a model on 400,000 simulated medical cases using a CNN. Finally, it trained a class of neural network known as a multilayer perceptron using electronic health records from nearly 80,000 patients.

Curai has occasionally turned to a combination of the Google Cloud Platform and Amazon Web Services to access larger compute capabilities, such as using a doubly fine-tuned BERT model for working out medical question similarities. This used 363,000 text training examples from its own service, with training occurring on two NVIDIA V100 Tensor Core GPUs.

Ready to Scale

There’s still much work to be done on the platform, but Amatriain believes Curai is ready to scale. The company is a premier member of NVIDIA Inception, a program that provides companies working in AI and data science with fundamental tools, expertise and marketing support to help them get to market faster.

Curai plans to finalize its go-to-market strategy over the coming months, and is currently focused on continued training of text- and image-based models, which are good fits for a chat setting. But Amatriain also made it clear that Curai has every intention of bringing sensors, wearable technology and other sources of data into its loop.

In Curai’s view, more data will yield a better solution, and a better solution is the best outcome for patients and providers alike.

“In five years, we see ourselves serving millions of people around the world, and providing them with great-quality, affordable healthcare,” said Amatriain. “We feel that we not only have the opportunity, but also the responsibility, to make this work.”

The post Need Healthcare? AI Startup Curai Has an App for That appeared first on The Official NVIDIA Blog.

Understanding View Selection for Contrastive Learning

Posted by Yonglong Tian, Student Researcher and Chen Sun, Staff Research Scientist, Google Research

Most people take for granted the ability to view an object from several different angles, but still recognize that it’s the same object: a dog viewed from the front is still a dog when viewed from the side. While people do this naturally, computer scientists need to explicitly enable machines to learn representations that are view-invariant, with the goal of seeking robust data representations that retain information that is useful to downstream tasks.

Of course, in order to learn these representations, manually annotated training data can be used. However, in many cases such annotations aren’t available, which has given rise to a series of self- and crossmodal supervised approaches that do not require manually annotated training data. Currently, a popular paradigm for training with such data is contrastive multiview learning, where two views of the same scene (for example, different image channels, augmentations of the same image, and video and text pairs) will tend to converge in representation space while two views of different scenes diverge. Despite the success of these approaches, one important question remains: “If one doesn’t have annotated labels readily available, how does one select the views to which the representations should be invariant?” In other words, how does one identify an object using information that resides in the pixels of the image itself, while still remaining accurate when that image is viewed from disparate viewpoints?

In “What makes for good views for contrastive learning”, we use theoretical and empirical analysis to better understand the importance of view selection, and argue that one should reduce the mutual information between views while keeping task-relevant information intact. To verify this hypothesis, we devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their mutual information. We also consider data augmentation as a way to reduce mutual information, and show that increasing data augmentation indeed leads to decreasing mutual information while improving downstream classification accuracy. To encourage further research in this space, we have open-sourced the code and pre-trained models.

The InfoMin Hypothesis
The goal of contrastive multiview learning is to learn a parametric encoder whose output representations can be used to discriminate between pairs of views with the same identities and pairs with different identities. The amount and type of information shared between the views determines how well the resulting model performs on downstream tasks. We hypothesize that the views that yield the best results should discard as much information in the input as possible except for the task-relevant information (e.g., object labels), which we call the InfoMin principle.

Consider the example below in which two patches of the same image represent the different “views”. The training objective is to identify that the two views belong to the same image. It is undesirable to have views that share too much information, for example, where low-level color and texture cues can be exploited as “shortcuts” (left), or to have views that share too little information to identify that they belong to the same image (right). Rather, views at the “sweet spot” share the information related to downstream tasks, such as patches corresponding to different parts of the panda for an object classification task (center).

An illustration of three regimes of information captured during contrastive multiview learning. Views should not share too much information (left) or too little information (right), but should find an optimal mix (the “sweet spot”, middle) that maximizes the downstream performance.

A Unified View on Contrastive Learning
We design several sets of experiments to verify the InfoMin hypothesis, motivated by the fact that there are simple ways to control the mutual information shared between views without any supervision. For example, we can sample different patches from the same images, and reduce their mutual information simply by increasing the distance between the patches. Here, we estimate the mutual information using InfoNCE (INCE), which is a quantitative measure of the mutual information lower bound. Indeed, we observe a reverse U-shape curve: as mutual information is reduced, the downstream task accuracy first increases and then begins to decrease.

Downstream classification accuracy on STL-10 (left) and CIFAR-10 (right) by applying linear classifiers on representations learned with contrastive learning. Same as the previous illustration, the views are sampled as different patches from the same images. Increasing the Euclidean distance between patches leads to decreasing mutual information. A reverse U-shape curve between classification accuracy and INCE (patch distance) is observed.
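To make the InfoNCE estimate more concrete, here is a minimal JavaScript sketch. It is not the authors' code. It assumes a hypothetical `scores` matrix in which `scores[i][j]` is a learned similarity between the first view of image i and the second view of image j, so the diagonal holds the matching pairs. It uses the standard form of the bound: log(N) minus the average contrastive loss over N candidates.

```javascript
// Minimal sketch of the InfoNCE lower bound on mutual information.
// Assumption: scores[i][j] is a similarity score between view 1 of
// sample i and view 2 of sample j; diagonal entries are the positive pairs.
function infoNceBound(scores) {
  const n = scores.length;
  let loss = 0;
  for (let i = 0; i < n; i++) {
    // Numerically stable log-sum-exp over row i.
    const max = Math.max(...scores[i]);
    const logSumExp =
      max + Math.log(scores[i].reduce((acc, s) => acc + Math.exp(s - max), 0));
    // Cross-entropy of picking the positive (diagonal) among n candidates.
    loss += logSumExp - scores[i][i];
  }
  loss /= n;
  // InfoNCE bound: I(v1; v2) >= log(n) - loss.
  return Math.log(n) - loss;
}

// Uninformative scores: the bound is (numerically) zero.
const flat = [[1, 1, 1], [1, 1, 1], [1, 1, 1]];
// Strongly diagonal scores: the bound approaches its ceiling, log(3).
const sharp = [[20, 0, 0], [0, 20, 0], [0, 0, 20]];
console.log(infoNceBound(flat), infoNceBound(sharp));
```

Note that the estimate can never exceed log(N), so the number of candidates per batch caps how much mutual information this measure can report.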

Furthermore, we demonstrate that several state-of-the-art contrastive learning methods (InstDis, MoCo, CMC, PIRL, SimCLR and CPC) can be unified through the perspective of view selection: despite the differences in architecture, objective and engineering details, all recent contrastive learning methods create two views that implicitly follow the InfoMin hypothesis, where the information shared between views is controlled by the strength of data augmentation. Motivated by this, we propose a new set of data augmentations, which outperforms the prior state of the art, SimCLR, by nearly 4% on the ImageNet linear readout benchmark. We also found that transferring our unsupervised pre-trained models to object detection and instance segmentation consistently outperforms ImageNet pre-training.

Learning to Generate Views
In our work, we design unsupervised and semi-supervised methods that synthesize novel views following the InfoMin hypothesis. We learn flow-based models that transfer natural color spaces into novel color spaces, from which we split the channels to get views. For the unsupervised setup, the view generators are optimized to minimize the InfoNCE bound between views. As shown in the results below, we observe a similar reverse U-shape trend while minimizing the InfoNCE bound.

View generators learned by unsupervised (left) and semi-supervised (right) objectives.

To reach the sweet spot without overly minimizing mutual information, we can use the semi-supervised setup and guide the view generator to retain label information. As expected, all learned views are now centered around the sweet spot, no matter what the input color space is.

Code and Pretrained Models
To accelerate research in self-supervised contrastive learning, we are excited to share the code and pretrained models of InfoMin with the academic community. They can be found here.

Acknowledgements
The core team includes Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid and Phillip Isola. We would like to thank Kevin Murphy for insightful discussion; Lucas Beyer for feedback on the manuscript; and the Google Cloud team for computation support.

Introducing Semantic Reactor: Explore NLP in Google Sheets

Posted by Dale Markowitz, Applied AI Engineer

Editor’s note: An earlier version of this article was published on Dale’s blog.

Machine learning can be tricky, so being able to prototype ML apps quickly is a boon. If you’re building a language-powered app — like a video game with characters players can talk to or a customer service bot — the Semantic Reactor is a tool that will help you do just that.

The Semantic Reactor is a new plugin for Google Sheets that lets you run natural language understanding (NLU) models (variations of the Universal Sentence Encoder) on your own data, right from a spreadsheet.
In this post, I’ll show you how to work with the tool and the NLU models it uses, but first, how does NLP actually work? What’s going on under the hood? (Want to skip straight to the tool? Scroll to the next section.)

Understanding Embeddings

What are Word Embeddings?

One simple (but powerful) technique for building natural-language-powered software is to use “embeddings.”

In machine learning, embeddings are a learned way of representing data in space (i.e. points plotted on an n-dimensional grid) such that the distances between points are meaningful. Word vectors are one popular example:
The picture above is a rough visual example of how words can be closer or further away from each other. Note that the words “Austin,” “Texas,” and “barbecue” have a close relationship with each other, as do “pet” and “dog,” and “walk” and “run.” Each word is represented by a set of coordinates (or a vector) and is placed on a graph where we can see relationships. For instance, we can see that the word “rat” is close to both “pet” and “cat”.

Where do these numbers come from? They’re learned by a machine learning model through many bits of conversational and language data. By showing all those examples, the model learns which words tend to occur in the same spots in sentences.

Consider these two sentences:

  • “My mother gave birth to a son.”
  • “My mother gave birth to a daughter.”

Because the words “daughter” and “son” are often used in similar contexts, the model will learn that they should be represented close to each other in space. Word embeddings are useful in natural language processing. They can be used to find synonyms (“semantic similarity”), to solve analogies, or as a preprocessing step for a more complicated model. You can quickly train your own basic word embeddings with TensorFlow here.

What are Sentence Embeddings?

It turns out that entire sentences (and even short paragraphs) can be effectively embedded in space too, using a type of model called a universal sentence encoder. Using sentence embeddings, we can figure out if two sentences are similar. This is useful, for example, if you’re building a chatbot and want to know if a question a user asked (e.g. “When will you wake me up?”) is semantically similar to a question you – the chatbot programmer – have anticipated and written a response to (“What time is my alarm?”).
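As a rough illustration of comparing two embedded sentences, the sketch below computes cosine similarity between plain number arrays. The three-dimensional vectors and the weather question are made up for the example; they are not output from any real model, and real sentence embeddings have hundreds of dimensions.

```javascript
// Cosine similarity between two embedding vectors (plain number arrays).
// Returns a value near 1 when the vectors point in similar directions.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings", invented for illustration only.
const userQuestion = [0.9, 0.1, 0.2];    // "When will you wake me up?"
const alarmQuestion = [0.8, 0.2, 0.1];   // "What time is my alarm?"
const weatherQuestion = [0.1, 0.9, 0.3]; // hypothetical unrelated question

// The alarm question scores much higher than the unrelated one.
console.log(cosineSimilarity(userQuestion, alarmQuestion));
console.log(cosineSimilarity(userQuestion, weatherQuestion));
```

A chatbot could use a score like this to pick the anticipated question closest to what the user typed, falling back to a default reply when nothing scores high enough.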

Semantic Reactor: Prototype using NLP in a Google Sheet

Alright, now onto the fun part: Building things! There are three NLP models available in the Semantic Reactor:

  • Local – A small TensorFlow.js version of the Universal Sentence Encoder that can run entirely within a webpage.
  • Basic Online – A full-sized, general-use version of the Universal Sentence Encoder.
  • Multilingual Online – A full-sized Universal Sentence Encoder model trained on question/answer pairs in 16 languages.

Each model offers two ranking methods:

  • Semantic Similarity: How similar are two blocks of text?

    Great for applications where you can anticipate what users might ask, like an FAQ bot. (Many customer service bots use semantic similarity to help deliver good answers to users.)

  • Input / Response: How good of a response is one block of text to another?

    Useful for when you have a large, and constantly changing, set of texts and you don’t know what users might ask. For instance, Talk to Books, a semantic search tool for a regularly updated collection of 100,000 books, uses input / response.

You can use the Semantic Reactor to test a response list against each model and ranking method. Sometimes it takes a good bit of experimenting before you get your response list and model selection to one you think will work for your application. The good news is that doing that work in a Google Sheet makes it fast and easy.
Once you have your response list, model selection and ranking method decided on, you can then begin writing code, and if you want to keep all operations within a website or on device (without requiring online API calls), you can use the newly updated TensorFlow.js model.
As mentioned, there are lots of great uses for NLU tech, and more interesting applications come out almost every day. Every digital assistant, customer service bot, and search engine is likely using some flavor of machine learning. Smart Reply and Smart Compose in Gmail are two well-used features that make good use of semantic tech.
However, it’s fun and helpful to play with the tech within applications where the quality demands aren’t so high, where failure is okay and even entertaining. To that end, we’ve used the same tech that’s within the Semantic Reactor to create a couple of example games. Semantris is a word association game that uses the input-response ranking method, and The Mystery of the Three Bots uses semantic similarity.
Playing those two games, and finding out where they work and where they don’t, might give you ideas on what experiences you might create.

Semantris, a word-association game powered by word embeddings.
The Mystery of the Three Bots is a simple game powered by NLU and available as open source code. (It’s also playable here.)

One of the coolest applications of this tech comes from Anna Kipnis, a former game designer at Double Fine who now works with Stadia. She used Semantic Reactor to prototype a video game world that infers how the environment should react to player inputs using ML. Check out our conversation here.
In Anna’s game, players interact with a virtual fox by asking any question they think of:

  • “Fox, can I have some coffee?”

Then, using Semantic ML, the game engine (or the utility system) considers all of the possible ways the game might respond:

  • “Fox turns on lights.”
  • “Fox turns on radio.”
  • “Fox move to you.”
  • “Fox brings you mug.”

Using a sentence encoder model, the game decides what the best response is and executes it (in this case, the best response is “Fox brings you a mug,” so the game animates the Fox bringing you a mug). If that sounds a little abstract, definitely watch the video linked above.
Let’s see how you might build something like Anna’s game with Semantic Reactor (for all the nitty gritties of the fox demo, check out her original post).
First, create a new Google sheet and write some sentences in the first column. I put these sentences in the first column of my Google sheet:

  • I grab a ball
  • I go to you
  • I play with a ball
  • I go to school.
  • I go to the mug.
  • I bring you the mug.
  • I turn on music.
  • I take a nap.
  • I go for a hike.
  • I tell you a secret.
  • I snuggle with you.
  • I ask for a belly rub.
  • I send a text.
  • I answer the phone.
  • I make a sandwich.
  • I drink some water.
  • I play a board game.
  • I do some coding.

You’ll have to use your imagination here and think of these “actions” that a potential character (e.g. a chatbot or an actor in a video game) might take.
Once you’ve applied for and been given access to Semantic Reactor, you’ll be able to enable it by clicking on “Add-ons -> Semantic Reactor -> Start”. Clicking “Start” will open a panel that allows you to type in an input and hit “React”.
When you hit “React”, Semantic Reactor uses a model to embed all of the responses you’ve written in that first column, calculate a score (how good a response is this sentence to the query?), and sort the results. For example, when my input was “I want some coffee,” the top-ranked responses from my spreadsheet were “I go to the mug” and “I bring you the mug.”
You’ll also notice that there are two different ways to rank sentences using this tool: “Input/Response” and “Semantic Similarity.” As the names imply, the former ranks sentences by how good they are as responses to the given query, whereas “Semantic Similarity” simply rates how similar the sentences are to the query.

From Spreadsheet to Code with TensorFlow.js

Underneath the hood, Semantic Reactor is powered by the open-source TensorFlow.js models found here.
Let’s take a look at how to use those models in JavaScript, so that you can convert your spreadsheet prototype into a working app.
1 – Create a new Node project and install the module:

npm init
npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

2 – Create a new file (use_demo.js) and require the library:

require('@tensorflow/tfjs');
const encoder = require('@tensorflow-models/universal-sentence-encoder');

3 – Load the model:

// Note: `await` must run inside an async function (or an ES module).
const model = await encoder.loadQnA();

4 – Encode your sentences and query:

const input = {
  queries: ["I want some coffee"],
  responses: [
    "I grab a ball",
    "I go to you",
    "I play with a ball",
    "I go to school.",
    "I go to the mug.",
    "I bring you the mug."
  ]
};

const embeddings = await model.embed(input);

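5 – Voila! You’ve transformed your responses and query into vectors. On their own, though, vectors are just points in space. To rank the responses, you’ll want to measure how close each response vector is to the query vector. You can do this by computing the dot product, which acts as a similarity score: the higher the dot product, the better the match. (The dot product is not itself a distance, but for unit-length vectors it is directly related to the squared Euclidean distance between the points.)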

// zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
// Combine two arrays element by element, truncating to the shorter one.
const zipWith = (f, xs, ys) => {
  const ny = ys.length;
  return (xs.length <= ny ? xs : xs.slice(0, ny))
    .map((x, i) => f(x, ys[i]));
};

// Calculate the dot product of two vector arrays.
// Returns undefined when the vectors have different lengths.
const dotProduct = (xs, ys) => {
  const sum = xs => xs ? xs.reduce((a, b) => a + b, 0) : undefined;

  return xs.length === ys.length ?
    sum(zipWith((a, b) => a * b, xs, ys)) :
    undefined;
};
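The scoring and sorting step itself can be sketched as follows. This is a self-contained illustration with made-up toy vectors, not the post's full sample: the helper name `rankResponses` is hypothetical, and with the real model you would first turn the tensors returned by `model.embed(input)` into plain arrays (for example with TensorFlow.js's `arraySync()`, an assumption worth checking against the model's README).

```javascript
// Dot product of two equal-length number arrays.
const dot = (xs, ys) => xs.reduce((sum, x, i) => sum + x * ys[i], 0);

// Score every response against one query vector and sort best-first.
// rankResponses is a hypothetical helper, not part of any library.
function rankResponses(queryVector, responses, responseVectors) {
  return responses
    .map((response, i) => ({
      response,
      score: dot(queryVector, responseVectors[i]),
    }))
    .sort((a, b) => b.score - a.score);
}

// Toy vectors, invented for illustration; real embeddings are much longer.
const queryVector = [1, 0, 2];
const responses = ['I grab a ball', 'I go to the mug.', 'I bring you the mug.'];
const responseVectors = [[0, 3, 0], [1, 1, 1], [2, 0, 2]];

// The highest-scoring response comes first in the returned array.
console.log(rankResponses(queryVector, responses, responseVectors));
```

The toy scores here will not match what a real model produces; only the pattern of scoring each response and sorting by score carries over.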

If you use dotProduct to score each response embedding against the query embedding, you should see output like:

[
  { response: 'I grab a ball', score: 10.788130270345432 },
  { response: 'I go to you', score: 11.597091717283469 },
  { response: 'I play with a ball', score: 9.346379028479209 },
  { response: 'I go to school.', score: 10.130473646521292 },
  { response: 'I go to the mug.', score: 12.475453722603106 },
  { response: 'I bring you the mug.', score: 13.229019199245684 }
]

Check out the full code sample here.
And that’s it–that’s how you go from a Semantic ML spreadsheet to code fast!
An earlier version of this post was published at https://daleonai.com/semantic-ml.

Real-time data for a better response to disease outbreaks

Kinsa was founded by MIT alumnus Inder Singh MBA ’06, SM ’07 in 2012, with the mission of collecting information about when and where infectious diseases are spreading in real-time. Today the company is fulfilling that mission along several fronts.

It starts with families. More than 1.5 million of Kinsa’s “smart” thermometers have been sold or given away across the country, including hundreds of thousands to families from low-income school districts. The thermometers link to an app that helps users decide if they should seek medical attention based on age, fever, and symptoms.

At the community level, the data generated by the thermometers are anonymized and aggregated, and can be shared with parents and school officials, helping them understand what illnesses are going around and prevent the spread of disease in classrooms.

By working with over 2,000 schools to date in addition to many businesses, Kinsa has also developed predictive models that can forecast flu seasons each year. In the spring of this year, the company showed it could predict flu spread 12-20 weeks in advance at the city level.

The milestone prepared Kinsa for its most profound scale-up yet. When Covid-19 came to the U.S., the company was able to estimate its spread in real-time by tracking fever levels above what would normally be expected. Now Kinsa is working with health officials in five states and three cities to help contain and control the virus.

“By the time the CDC [U.S. Centers for Disease Control] gets the data, it has been processed, deidentified, and people have entered the health system to see a doctor,” says Singh, who is Kinsa’s CEO as well as its founder. “There’s a huge delay from when someone contracts an illness and when they see a doctor. The current health care system only sees the latter; we see the former.”

Today Kinsa finds itself playing a central role in America’s Covid-19 response. In addition to its local partnerships, the company has become a central information hub for the public, media, and researchers with its Healthweather tool, which maps unusual rates of fevers, among the most common symptoms of Covid-19, to help visualize the prevalence of illness in communities.

Singh says Kinsa’s data complement other methods of containing the virus like testing, contact tracing, and the use of face masks.

Better data for better responses

Singh’s first exposure to MIT came while he was attending the Harvard University Kennedy School of Government as a graduate student.

“I remember I interacted with some MIT undergrads, we brainstormed some social-impact ideas,” Singh recalls. “A week later I got an email from them saying they’d prototyped what we were talking about. I was like, ‘You prototyped what we talked about in a week!?’ I was blown away, and it was an insight into how MIT is such a do-er campus. It was so entrepreneurial. I was like, ‘I want to do that.’”

Soon Singh enrolled in the Harvard-MIT Program in Health Sciences and Technology, an interdisciplinary program where Singh earned his master’s and MBA degrees while working with leading research hospitals in the area. The program also set him on a course to improve the way we respond to infectious disease.

Following his graduation, he joined the Clinton Health Access Initiative (CHAI), where he brokered deals between pharmaceutical companies and low-resource countries to lower the cost of medicines for HIV, malaria, and tuberculosis. Singh described CHAI as a dream job, but it opened his eyes to several shortcomings in the global health system.

“The world tries to curb the spread of infectious illness with almost zero real-time information about when and where disease is spreading,” Singh says. “The question I posed to start Kinsa was ‘how do you stop the next outbreak before it becomes an epidemic if you don’t know where and when it’s starting and how fast it’s spreading’?”

Kinsa was started in 2012 with the insight that better data were needed to control infectious diseases. In order to get that data, the company needed a new way of providing value to sick people and families.

“The behavior in the home when someone gets sick is to grab the thermometer,” Singh says. “We piggy-backed off of that to create a communication channel to the sick, to help them get better faster.”

Kinsa started by selling its thermometers and creating a sponsorship program for corporate donors to fund thermometer donations to Title 1 schools, which serve high numbers of economically disadvantaged students. Singh says 40 percent of families that receive a Kinsa thermometer through that program did not previously have any thermometer in their house.

The company says its program has been shown to help schools improve attendance, and has yielded years of real-time data on fever rates to help compare to official estimates and develop its models.

“We had been forecasting flu incidence accurately several weeks out for years, and right around early 2020, we had a massive breakthrough,” Singh recalls. “We showed we could predict flu 12 to 20 weeks out — then March hit. We said, let’s try to remove the fever levels associated with cold and flu from our observed real time signal. What’s left over is unusual fevers, and we saw hotspots across the country. We observed six years of data and there’d been hot spots, but nothing like we were seeing in early March.”

The company quickly made its real-time data available to the public, and on March 14, Singh got on a call with the former New York State health commissioner, the former head of the U.S. Food and Drug Administration, and the man responsible for Taiwan’s successful Covid-19 response.

“I said, ‘There’s hotspots everywhere,’” Singh recalls. “They’re in New York, around the Northeast, Texas, Michigan. They said, ‘This is interesting, but it doesn’t look credible because we’re not seeing case reports of Covid-19.’ Lo and behold, days and weeks later, we saw the Covid cases start building up.”

A tool against Covid-19

Singh says Kinsa’s data provide an unprecedented look into the way a disease is spreading through a community.

“We can predict the entire incidence curve [of flu season] on a city-by-city basis,” Singh says. “The next best model is [about] three weeks out, at a multistate level. It’s not because we’re smarter than others; it’s because we have better data. We found a way to communicate with someone consistently when they’ve just fallen ill.”

Kinsa has been working with health departments and research groups around the country to help them interpret the company’s data and react to early warnings of Covid-19’s spread. It’s also helping companies around the country as they begin bringing employees back to offices.

Now Kinsa is working on expanding its international presence to help curb infectious diseases on multiple fronts around the world, just like it’s doing in the U.S. The company’s progress promises to help authorities monitor diseases long after Covid-19.

“I started Kinsa to create a global, real-time outbreak monitoring and detection system, and now we have predictive power beyond that,” Singh says. “When you know where and when symptoms are starting and how fast they’re spreading, you can empower local individuals, families, communities, and governments.”
