Deriving conversational insights from invoices with Amazon Textract, Amazon Comprehend, and Amazon Lex

Organizations across industries have a large number of physical documents, such as invoices, that they need to process. It is difficult to extract information from a scanned document when it contains tables, forms, paragraphs, and check boxes. Organizations have been addressing these problems with manual effort, custom code, or Optical Character Recognition (OCR) technology, but OCR requires templates for form extraction and custom workflows.

Moreover, after extracting the text or content from a document, they want to extract insights from these receipts or invoices for their end users. However, that would require building a complex NLP model. Training the model would require a large amount of training data and compute resources. Building and training a machine learning model could be expensive and time-consuming.

Further, providing a human-like interface for interacting with these documents is cumbersome, so end users often call the help desk, which adds cost to the organization over time.

This post shows you how to use AWS AI services to automate text data processing and insight discovery. With AWS AI services such as Amazon Textract, Amazon Comprehend, and Amazon Lex, you can set up an automated serverless solution to address this requirement. We walk you through the following steps:

  1. Extract text from receipts or invoices in PDF or image format with Amazon Textract.
  2. Derive insights with Amazon Comprehend.
  3. Interact with these insights in natural language using Amazon Lex.

Next, we go through the services and the architecture used to build the solution.

Services used

This solution uses the following AI services, serverless technologies, and managed services to implement a scalable and cost-effective architecture:

  • Amazon Cognito – Lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily.
  • AWS Lambda – Executes code in response to triggers such as changes in data, shifts in system state, or user actions. Because Amazon S3 can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems.
  • Amazon Lex – Provides an interface to create conversational chatbots.
  • Amazon Comprehend – NLP service that uses machine learning to find insights and relationships in text.
  • Amazon Textract – Uses ML to extract text and data from scanned documents in PDF, JPEG, or PNG formats.
  • Amazon Simple Storage Service (Amazon S3) – Serves as an object store for your documents and allows for central management with fine-tuned access controls.


The following diagram illustrates the architecture of the solution.

The architecture contains the following steps:

  1. The backend user or administrator uses the AWS Management Console or AWS Command Line Interface (AWS CLI) to upload the PDF documents or images to an S3 bucket.
  2. The Amazon S3 upload triggers an AWS Lambda function.
  3. The Lambda function invokes an Amazon Textract StartDocumentTextDetection API, which sets up an asynchronous job to detect text from the PDF you uploaded.
  4. Amazon Textract notifies Amazon Simple Notification Service (Amazon SNS) when text processing is complete.
  5. A second Lambda function gets the notification from the SNS topic when the text detection job is complete.
  6. When notified of job completion by Amazon SNS, this Lambda function calls the Amazon Textract GetDocumentTextDetection API to retrieve the results of the asynchronous operation and loads them into an S3 bucket.
  7. A Lambda function is used for fulfillment of the Amazon Lex intents. For a more detailed sequence of interactions, refer to the Building your chatbot step in the “Deploying the architecture with AWS CloudFormation” section.
  8. Amazon Comprehend uses ML to find insights and relationships in text. The Lambda function uses the boto3 APIs that Amazon Comprehend provides for entity and key phrase detection.
    1. In response to the bot’s welcome message, the user types “Show me the invoice summary.” This invokes the GetInvoiceSummary Lex intent, and the Lambda function calls the Amazon Comprehend DetectEntities API to detect entities for fulfillment.
    2. When the user types “Get me the invoice details,” this invokes the GetInvoiceDetails intent; Amazon Lex prompts the user to enter the invoice number, and the Lambda function calls the Amazon Comprehend DetectEntities API to return the invoice details message.
    3. When the user types “Can you show me the invoice notes for <invoice number>”, this invokes the GetInvoiceNotes intent, and the Lambda function invokes the Amazon Comprehend DetectKeyPhrases API to return comments associated with the invoice.
  9. You deploy the Lex Web UI in your AWS CloudFormation template by using an existing CloudFormation stack as a nested stack. To download the stack, see Deploy a Web UI for Your Chatbot. This nested stack deploys a Lex Web UI; the webpage is served as a static website from an S3 bucket. The web UI uses Amazon Cognito to generate an access token for authentication and uses AWS CodeStar to set up a delivery pipeline. End users interact with this chatbot web UI.

Deploying the architecture with AWS CloudFormation

You deploy a CloudFormation template to provision the necessary AWS Identity and Access Management (IAM) roles, services, and components of the solution, including Amazon S3, Lambda, Amazon Textract, Amazon Comprehend, and the Amazon Lex chatbot.

  1. Launch the following CloudFormation template in the US East (N. Virginia) Region:
  2. Don’t make any changes to the stack name or the botname parameter (InvoiceBot).
  3. In the Capabilities and transforms section, select all three check boxes to acknowledge that AWS CloudFormation may create IAM resources and expand the template.

For more information about these resources, see AWS IAM resources.

This template uses AWS Serverless Application Model (AWS SAM), which simplifies how to define functions and APIs for serverless applications, and also has features for these services, like environment variables.

  4. Choose Create stack.

The following screenshot of the Stack Detail page shows the status of the stack as CREATE_IN_PROGRESS. It can take up to 20 minutes for the status to change to CREATE_COMPLETE.

  5. On the Outputs tab, copy the values of LexLambdaFunctionArn, AssetsUploadBucket, ExtractedTextfilesBucket, and LexUIWebAppUrl.

Uploading documents to the S3 bucket

To upload your documents to your new S3 bucket, choose the S3 bucket URL corresponding to AssetsUploadBucket that you copied earlier. Upload a PDF or image to start the text extraction flow.

You can download the invoice used in this blog from the GitHub repo and upload it to the AssetsUploadBucket S3 URL. We recommend customizing this solution for your invoice templates. For more information about uploading files, see How do I upload files and folders to an S3 bucket?

After the upload completes, you can see the file on the Amazon S3 console on the Overview tab.

After you upload the file, the text is extracted from the document. To see an extracted file with the text, open the bucket by choosing the URL you copied earlier.

On the Overview tab, you can download the file and inspect the content to see if it’s the same as the text in the uploaded file.

Building your chatbot

We will use the following conversation to model the bot:

Bot: Welcome to InvoiceBot. You can ask me to provide your invoice summary, or details of your invoices, or your invoice notes
User: Show me the invoice summary
Bot: I reviewed your input documents and found 1 invoice with invoice numbers 35678-9 totaling $2100.0. I can get you invoice details or invoice notes. Simply type your request
User: Get me the invoice details
Bot: Please enter the invoice number
User: 35678-9
Bot: Invoice Details for 35678-9: On 5/10/2019 for the item Merchant One there is a charge of 1500.00. On 5/11/2019 for the item Merchant Two there is a charge of 100.00. On 5/12/2019 for the item Merchant Three there is a charge of 300.00. On 5/13/2019 for the item Merchant Three there is a charge of 200.00. You can request me for invoice notes or simply close this chat.
User: Can you show me the invoice notes for 35678-9
Bot: Invoice Notes for 35678-9: 5/13/2019 Merchant Three 200.00 Merchant Three 300.00 Laptop Office Supplies Merchant Two 100.00 Team Dinner Food 5/12/2019 5/11/2019 Desks and Office Supplies 5/10/2019 Merchant One 1500.00 Chairs . Feel free to try the options again or you can simply close this chat

We will build an Amazon Lex bot (InvoiceBot) with the following intents:

  • GetInvoiceSummary – Intent that’s invoked when the user requests to view the invoice summary. It’s fulfilled by a Lambda function and returns the count of invoices available and the total amount of the invoices.
  • GetInvoiceDetails – Intent that’s invoked when the user requests to view the invoice details. It’s fulfilled by a Lambda function and provides an item-level breakdown of the invoices, including date, quantity, and item details.
  • GetInvoiceNotes – Intent that’s invoked when the user requests to view the invoice notes. It’s fulfilled by a Lambda function and provides notes from the uploaded invoices with date and item description.

Publishing your chatbot

As described in the solution overview earlier, you use an Amazon Lex chatbot (InvoiceBot) to interact with the insights Amazon Comprehend derives from the text Amazon Textract extracts.

To publish your chatbot, complete the following steps:

  1. On the Amazon Lex console, choose Bots.
  2. Choose the chatbot you created.
  3. Under Intents, choose GetInvoiceSummary.
  4. Under Fulfillment, select your Lambda function.
  5. Search for the function by entering LexLambdaFunction and selecting the result.

A pop-up box appears.

  6. Choose OK.
  7. Choose Save intent.
  8. Repeat these steps for the remaining two intents, GetInvoiceDetails and GetInvoiceNotes.
  9. Choose Build.
  10. When the build is complete, choose Publish.
  11. For Create an alias, enter Latest. You can consider a different name; names like test, dev, beta, or prod primarily refer to the environment of the bot.
  12. Choose Publish.

The following page opens after the bot is published.

  13. Choose Close.

Using the chatbot

Your chatbot is now ready to use. Navigate to the LexUIWebAppUrl URL you copied from the AWS CloudFormation Outputs tab. The following screenshots show the user conversation with the bot (read from left to right):


This post demonstrated how to create a conversational chatbot in Amazon Lex that enables interaction with insights that Amazon Comprehend derives from text Amazon Textract extracts from images or PDF documents. The code from this post is available on the GitHub repo for you to use and extend. We are interested to hear how you apply this solution to your use case. Please share your thoughts and questions in the comments section.

About the Authors

Mona Mona is an AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with the Worldwide Public Sector team and helps customers adopt machine learning on a large scale. She is passionate about NLP and ML explainability in AI/ML.




Prem Ranga is an Enterprise Solutions Architect based out of Houston, Texas. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an Autonomous Vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.



Saida Chanda is a Senior Partner Solutions Architect based out of Seattle, WA. He is a technology enthusiast who drives innovation through AWS partners to meet customers’ complex business requirements via simple solutions. His areas of interest are ML and DevOps. In his spare time, he likes to spend time with family and explore his inner self through meditation.




SpineNet: A Novel Architecture for Object Detection Discovered with Neural Architecture Search

Posted by Xianzhi Du, Software Engineer and Jaeyoun Kim, Technical Program Manager, Google Research

Convolutional neural networks created for image tasks typically encode an input image into a sequence of intermediate features that capture the semantics of an image (from local to global), where each subsequent layer has a lower spatial dimension. However, this scale-decreased model may not be able to deliver strong features for multi-scale visual recognition tasks where recognition and localization are both important (e.g., object detection and segmentation). Several works including FPN and DeepLabv3+ propose multi-scale encoder-decoder architectures to address this issue, where a scale-decreased network (e.g., a ResNet) is taken as the encoder (commonly referred to as a backbone model). A decoder network is then applied to the backbone to recover the spatial information.

While this architecture has yielded improved success for image recognition and localization tasks, it still relies on a scale-decreased backbone that throws away spatial information by down-sampling, which the decoder then must attempt to recover. What if one were to design an alternate backbone model that avoids this loss of spatial information, and is thus inherently well-suited for simultaneous image recognition and localization?

In our recent CVPR 2020 paper “SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization”, we propose a meta architecture called a scale-permuted model that enables two major improvements on backbone architecture design. First, the spatial resolution of intermediate feature maps should be able to increase or decrease anytime so that the model can retain spatial information as it grows deeper. Second, the connections between feature maps should be able to go across feature scales to facilitate multi-scale feature fusion. We then use neural architecture search (NAS) with a novel search space design that includes these features to discover an effective scale-permuted model. We demonstrate that this model is successful in multi-scale visual recognition tasks, outperforming networks with standard, scale-reduced backbones. To facilitate continued work in this space, we have open-sourced the SpineNet code to the TensorFlow TPU GitHub repository in TensorFlow 1 and the TensorFlow Model Garden GitHub repository in TensorFlow 2.

A scale-decreased backbone is shown on the left and a scale-permuted backbone is shown on the right. Each rectangle represents a building block. Colors and shapes represent different spatial resolutions and feature dimensions. Arrows represent connections among building blocks.

Design of SpineNet Architecture
In order to efficiently design the architecture for SpineNet, and avoid a time-intensive manual search of what is optimal, we leverage NAS to determine an optimal architecture. The backbone model is learned on the object detection task using the COCO dataset, which requires simultaneous recognition and localization. During architecture search, we learn three things:

  • Scale permutations: The orderings of network building blocks are important because each block can only be built from those that already exist (i.e., with a “lower ordering”). We define the search space of scale permutations by rearranging intermediate and output blocks, respectively.
  • Cross-scale connections: We define two input connections for each block in the search space. The parent blocks can be any block with a lower ordering or a block from the stem network.
  • Block adjustments (optional): We allow the block to adjust its scale level and type.

The architecture search process from a scale-decreased backbone to a scale-permuted backbone.
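As a toy illustration of the ordering constraint described above (this is not the actual NAS code, and the block counts are arbitrary), one can sample a candidate by permuting the intermediate blocks and then wiring each block only to blocks that already exist:

```python
import random


def sample_candidate(num_blocks=5, stem_blocks=2, seed=0):
    """Sample one point in a simplified scale-permuted search space:
    a permutation of intermediate blocks plus two cross-scale input
    connections per block, drawn only from lower-ordered blocks or
    the stem network (blocks 0..stem_blocks-1)."""
    rng = random.Random(seed)
    order = list(range(stem_blocks, stem_blocks + num_blocks))
    rng.shuffle(order)  # the scale permutation
    connections = {}
    for i, block in enumerate(order):
        pool = list(range(stem_blocks)) + order[:i]  # already-built blocks
        connections[block] = [rng.choice(pool) for _ in range(2)]
    return order, connections
```

NAS then scores many such candidates on the detection task and keeps the best-performing permutation and wiring; the optional block adjustments (scale level and block type) would add further choices per block.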

Taking the ResNet-50 backbone as the seed for the NAS search, we first learn scale-permutation and cross-scale connections. All candidate models in the search space have roughly the same computation as ResNet-50 since we just permute the ordering of feature blocks to obtain candidate models. The learned scale-permuted model outperforms ResNet-50-FPN by +2.9% average precision (AP) in the object detection task. The efficiency can be further improved (-10% FLOPs) by adding search options to adjust scale and type (e.g., residual block or bottleneck block, used in the ResNet model family) of each candidate feature block.

We name the learned 49-layer scale-permuted backbone architecture SpineNet-49. SpineNet-49 can be further scaled up to SpineNet-96/143/190 by repeating blocks two, three, or four times and increasing the feature dimension. An architecture comparison between ResNet-50-FPN and the final SpineNet-49 is shown below.

The architecture comparison between a ResNet backbone (left) and the SpineNet backbone (right) derived from it using NAS.

We demonstrate the performance of SpineNet models through comparison with ResNet-FPN. Using similar building blocks, SpineNet models outperform their ResNet-FPN counterparts by ~3% AP at various scales while using 10-20% fewer FLOPs. In particular, our largest model, SpineNet-190, achieves 52.1% AP on COCO for a single model without multi-scale testing during inference, significantly outperforming prior detectors. SpineNet also transfers to classification tasks, achieving 5% top-1 accuracy improvement on the challenging iNaturalist fine-grained dataset.

Performance comparisons of SpineNet models and ResNet-FPN models adopting the RetinaNet detection framework on COCO bounding box detection.
Performance comparisons of SpineNet models and ResNet models on ImageNet classification and iNaturalist fine-grained image classification.

In this work, we identify that the conventional scale-decreased model, even with a decoder network, is not effective for simultaneous recognition and localization. We propose the scale-permuted model, a new meta-architecture, to address the issue. To prove the effectiveness of scale-permuted models, we learn SpineNet by Neural Architecture Search in object detection and demonstrate it can be used directly in image classification. In the future, we hope the scale-permuted model will become the meta-architecture design of backbones across many visual tasks beyond detection and classification.

Special thanks to the co-authors of the paper: Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song. We also would like to acknowledge Yeqing Li, Youlong Cheng, Jing Li, Jianwei Xie, Russell Power, Hongkun Yu, Chad Richards, Liang-Chieh Chen, Anelia Angelova, and the larger Google Brain Team for their help.

Heart of the Matter: AI Helps Doctors Navigate Pandemic

A month after it got FDA approval, a startup’s first product was saving lives on the front lines of the battle against COVID-19.

Caption Health develops software for ultrasound systems, called Caption AI. It uses deep learning to empower medical professionals, including those without prior ultrasound experience, to perform echocardiograms quickly and accurately. 

The results are images of the heart often worthy of an expert sonographer that help doctors diagnose and treat critically ill patients.

The coronavirus pandemic provided plenty of opportunities to try out the first dozen systems. Two doctors who used the new tool shared their stories on the condition that their patients remain anonymous.

In March, a 53-year-old diabetic woman with COVID-19 went into cardiac shock in a New York hospital. Without the images from Caption AI, it would have been difficult to clinch the diagnosis, said a doctor on the scene.

The system helped the physician identify heart problems in an 86-year-old man with the virus in the same hospital, helping doctors bring him back to health. It was another case among more than 200 in the facility that was effectively turned into a COVID-19 hospital this spring.

The Caption Health system made a tremendous impact for a staff spread thin, said the doctor. It would have been hard for a trained sonographer to keep up with the demand for heart exams, he added.

Heart Test Becomes Standard Procedure

Caption AI helped doctors in North Carolina determine that a 62-year-old man had COVID-19-related heart damage. Thanks, in part, to the ease of using the system, the hospital now performs echocardiograms for most patients with the virus.

At the height of the pandemic’s first wave, the hospital stationed ultrasound systems with Caption AI in COVID-19 wards. Rather than sending sonographers from unit to unit, which is the usual practice, staff stationed at the wards used the systems. The change reduced staff exposure to the virus and conserved precious protective gear. 

Beyond the pandemic, the system will help hospitals provide urgent services while keeping a lid on rising costs, said a doctor at that hospital.

“AI-enabled machines will be the next big wave in taking care of patients wherever they are,” said Randy Martin, chief medical officer of Caption Health and emeritus professor of cardiology at Emory University, in Atlanta.

Martin joined the startup about four years ago after meeting its founders, who shared expertise and passion for medicine and AI. Today their software “takes a user through 10 standard views of the heart, coaching them through some 90 fine movements experts make,” he said.

“We don’t intend to replace sonographers; we’re just expanding the use of portable ultrasound systems to the periphery for more early detection,” he added.

Coping with Unexpected Demand Spike

In the early days of the pandemic, that expansion couldn’t come fast enough.

In late March, the startup exhausted supplies that included NVIDIA Quadro P3000 GPUs that ran its AI software. In the early days of the global shutdown, the startup reached out to its supply chain.

“We are experiencing overwhelming demand for our product,” the company’s CEO wrote, after placing orders for 100 GPUs with a distributor.

Caption Health has systems currently in use at 11 hospitals. It expects to deploy Caption AI at several additional sites in the coming weeks. 

GPUs at the Heart of Automated Heart Tests

The startup currently integrates its software in a portable ultrasound from Terason. It intends to partner with more ultrasound makers in the future. And it advises partners to embed GPUs in their future ultrasound equipment.

The Quadro P3000 in Caption AI runs real-time inference tasks using deep convolutional neural networks. These networks guide operators in positioning the probe that captures images, then automatically choose the highest-quality heart images and interpret them to help doctors make informed decisions.

The NVIDIA GPU also freed up four CPU cores, making space to process other tasks on the system, such as providing a smooth user experience.

The startup trained its AI models on a database of 1 million echocardiograms from clinical partners. An early study in partnership with Northwestern Medicine and the Minneapolis Heart Institute showed Caption AI helped eight registered nurses with no prior ultrasound experience capture highly accurate images on a wide variety of patients.

Inception Program Gives Startup Momentum

Caption Health, formerly called Bay Labs, was founded in 2015 in Brisbane, Calif. It received a $125,000 prize at a 2017 GTC competition for members of NVIDIA’s Inception program, which gives startups access to technology, expertise and markets.

“Being part of the Inception program has provided us with increased recognition in the field of deep learning, a platform to share our AI innovations with healthcare and deep learning communities, and phenomenal support getting NVIDIA GPUs into our supply chain so we could deliver Caption AI,” said Charles Cadieu, co-founder and president of Caption Health.

Now that its tool has been tested in a pandemic, Caption Health looks forward to opportunities to help save lives across many ailments. The company aims to ride a trend toward more portable systems that extend availability and lower costs of diagnostic imaging.

“We hope to see our technology used everywhere from big hospitals to rural villages to examine people for a wide range of medical conditions,” said Cadieu.

To learn more about Caption Health and other companies like it, watch this webinar on healthcare startups working against COVID-19.

The post Heart of the Matter: AI Helps Doctors Navigate Pandemic appeared first on The Official NVIDIA Blog.

NVIDIA Puts More Tools in Hands of Artists, Designers and Data Scientists Working Remotely

For many organizations, the coronavirus pandemic has created a permanent shift in how their employees work. From now on, they’ll have the option to collaborate at home or in the office.

NVIDIA is giving these millions of professionals around the world a boost with a new version of our virtual GPU software, vGPU July 2020. The software adds support for more workloads and is loaded with features that improve operational efficiencies for IT administrators.

GPU virtualization is key to offering everyone from designers to data scientists a flexible way to collaborate on projects that require advanced graphics and computing power, wherever they are.

Employee productivity was the primary concern among organizations addressing remote work due to the COVID-19 pandemic, according to recent research by IDC. When the market intelligence firm interviewed NVIDIA customers using GPU-accelerated virtual desktops, it found organizations with 500-1,000 users experienced a 13 percent increase in productivity, resulting in more than $1 million in annual savings.

According to Alex Herrera, an analyst with Jon Peddie Research/Cadalyst, “In a centralized computing environment with virtualized GPU technology, users no longer have to be tied to their physical workstations. As proven recently through remote work, companies can turn on a dime, enabling anywhere/anytime access to big data without compromising on performance.”

Expanded Support in the Data Center and Cloud with SUSE

NVIDIA has expanded hypervisor support by partnering with SUSE on its Linux Enterprise Server, providing vGPU support on its kernel-based virtual machine platform.

Initial offerings will be supported with NVIDIA vComputeServer software, enabling GPU virtualization for AI and data science workloads. This will expand hypervisor platform options for enterprises and cloud service providers that are seeing an increased need to support GPUs.

“Demand for accelerated computing has grown beyond specialized HPC environments into virtualized data centers,” said Brent Schroeder, global chief technology officer at SUSE. “To ensure the needs of business leaders are met, SUSE and NVIDIA have worked to simplify the use of NVIDIA virtual GPUs in SUSE Linux Enterprise Server. These efforts modernize the IT infrastructure and accelerate AI and ML workloads to enhance high-performance and time-sensitive workloads for SUSE customers everywhere.”

Added Support for Immersive Collaboration

NVIDIA CloudXR technology uses NVIDIA RTX and vGPU software to deliver VR and augmented reality across 5G and Wi-Fi networks. vGPU July 2020 adds 120Hz VSync support at resolutions up to 4K, giving CloudXR users an even smoother immersive experience on untethered devices. It creates a level of fidelity that’s indistinguishable from native tethered configurations.

“Streaming AR/VR over Wi-Fi or 5G enables organizations to truly take advantage of its benefits, enabling immersive training, product design and architecture and construction,” said Matt Coppinger, director of AR/VR at VMware. “We’re partnering with NVIDIA to more securely deliver AR and VR applications running on VMware vSphere and NVIDIA Quadro Virtual Workstation, streamed using NVIDIA CloudXR to VMware’s Project VXR client application running on standalone headsets.”

The latest release of vGPU enables a better user experience and manageability needed for demanding workloads like the recently debuted Omniverse AEC Experience, which combines Omniverse, a real-time collaboration platform, with RTX Server and NVIDIA Quadro Virtual Workstation software for the data center. The reference design supports up to two virtual workstations on an NVIDIA Quadro RTX GPU, running multiple workloads such as collaborative, computer-aided design while also providing real-time photorealistic rendering of the model.

With Quadro vWS, an Omniverse-enabled virtual workstation can be provisioned in minutes to new users, anywhere in the world. Users don’t need specialized client hardware, just an internet-connected device, laptop or tablet, and data remains highly secured in the data center.

Improved Operational Efficiency for IT Administrators

New features in vGPU July 2020 help enterprise IT admins and cloud service providers streamline management, boosting their operational efficiency.

This includes cross-branch support, where the host and guest vGPU software can be on different versions, easing upgrades and large deployments.

IT admins can move quicker to the latest hypervisor versions to pick up fixes, security patches and new features, while staggering deployments for end-user images.

Enterprise data centers running VMware vSphere will see improved operational efficiency by having the ability to manage vGPU powered VMs with the latest release of VMware vRealize Operations.

As well, VMware recently added Distributed Resource Scheduler support for GPU-enabled VMs into vSphere. Now, vSphere 7 introduces a new feature called “Assignable Hardware,” which enhances initial placement so that a VM can be automatically “placed” on a host that has exactly the right GPU and profile available before powering it on.

For IT managing large deployments, this means reducing deployment time of new VMs to a few minutes, as opposed to a manual process that can take hours. As well, this feature works with VMware’s vSphere High Availability, so if a host fails for any reason, a GPU-enabled VM can be automatically restarted on another host with the right GPU resources.


The NVIDIA vGPU July 2020 release is coming soon. Learn more and watch this video.

The post NVIDIA Puts More Tools in Hands of Artists, Designers and Data Scientists Working Remotely appeared first on The Official NVIDIA Blog.

The MIT Press and UC Berkeley launch Rapid Reviews: COVID-19

The MIT Press has announced the launch of Rapid Reviews: COVID-19 (RR:C19), an open access, rapid-review overlay journal that will accelerate peer review of Covid-19-related research and deliver real-time, verified scientific information that policymakers and health leaders can use.

Scientists and researchers are working overtime to understand the SARS-CoV-2 virus and are producing an unprecedented amount of preprint scholarship that is publicly available online but has not been vetted yet by peer review for accuracy. Traditional peer review can take four or more weeks to complete, but RR:C19’s editorial team, led by Editor-in-Chief Stefano M. Bertozzi, professor of health policy and management and dean emeritus of the School of Public Health at the University of California at Berkeley, will produce expert reviews in a matter of days.

Using artificial intelligence tools, a global team will identify promising scholarship in preprint repositories, commission expert peer reviews, and publish the results on an open access platform in a completely transparent process. The journal will strive for disciplinary and geographic breadth, sourcing manuscripts from all regions and across a wide variety of fields, including medicine; public health; the physical, biological, and chemical sciences; the social sciences; and the humanities. RR:C19 will also provide a new publishing option for revised papers that are positively reviewed.

Amy Brand, director of the MIT Press sees the no-cost open access model as a way to increase the impact of global research and disseminate high-quality scholarship. “Offering a peer-reviewed model on top of preprints will bring a level of diligence that clinicians, researchers, and others worldwide rely on to make sound judgments about the current crisis and its amelioration,” says Brand. “The project also aims to provide a proof-of-concept for new models of peer-review and rapid publishing for broader applications.”

Made possible by a $350,000 grant from the Patrick J. McGovern Foundation and hosted on PubPub, an open-source publishing platform from the Knowledge Futures Group for collaboratively editing and publishing journals, monographs, and other open access scholarly content, RR:C19 will limit the spread of misinformation about Covid-19, according to Bertozzi.

“There is an urgent need to validate — or debunk — the rapidly growing volume of Covid-19-related manuscripts on preprint servers,” explains Bertozzi. “I’m excited to be working with the MIT Press, the Patrick J. McGovern Foundation, and the Knowledge Futures Group to create a novel publishing model that has the potential to more efficiently translate important scientific results into action. We are also working with COVIDScholar, an initiative of UC Berkeley and Lawrence Berkeley National Lab, to create unique AI/machine learning tools to support the review of hundreds of preprints per week.”

“This project signals a breakthrough in academic publishing, bringing together urgency and scientific rigor so the world’s researchers can rapidly disseminate new discoveries that we can trust,” says Vilas Dhar, trustee of the Patrick J. McGovern Foundation. “We are confident the RR:C19 journal will quickly become an invaluable resource for researchers, public health officials, and healthcare providers on the frontline of this pandemic. We’re also excited about the potential for a long-term transformation in how we evaluate and share research across all scientific disciplines.”

On the collaboration around this new journal, Travis Rich, executive director of the Knowledge Futures Group notes, “At a moment when credibility is increasingly crucial to the well-being of society, we’re thrilled to be partnering with this innovative journal to expand the idea of reviews as first-class research objects, both on PubPub and as a model for others.”

RR:C19 will publish its first reviews in July 2020 and is actively recruiting potential reviewers and contributors. To learn more about this project and its esteemed editorial board, visit

Read More

Reflecting on Pride: How five Facebook researchers honor their LGBTQ+ history

The LGBTQ+ community has a long history of resilience and activism in the fight toward acceptance and equal rights in the United States. Pride Month is celebrated every June to honor the 1969 Stonewall Uprising in Manhattan and activists such as Marsha P. Johnson and Sylvia Rivera. This year, the 50th anniversary of Pride coincides with an increased swell of support for the fight against racial injustice and the Black Lives Matter movement, with protests and demonstrations occurring in every state across the U.S. and in countries around the world.

To reflect on the history of Pride Month and its roots in activism, we reached out to the LGBTQ+ community at Facebook. Researchers Gregory Davis, Meghan Rolfe, Darwin Mastin, TJ Olojede, and Hannah Furnas each volunteered their time to share what Pride means to them, how their research influences our products, and how they’re recognizing Pride this year.

Designing a product to bring our authentic selves

Gregory Davis (he/him) is a UX Researcher working on Portal.

Portal allows us to connect with significant others from all aspects of our lives. Making sure people are comfortable is vitally important to that goal. As a UX Researcher, I work on what Portal users need to be able to bring their multiple selves to the device. For LGBTQ+ users, these questions take on enhanced importance.

Pride, to me, is about celebrating the things about you that people can’t see and that many don’t want to see. The freedom to be out — to be all of our identities all of the time — is a gift that LGBTQ+ people cherish, given to us by our queer fore-parents and paid for with the blood, sweat, and tears of their activism and resistance. That activism tipped the scales toward equality with the Stonewall Uprising in June of 1969 when members of the LGBTQ+ community protested against the frequent police raids on the Stonewall Inn — a fight that is extremely relevant to today’s protests against police brutality.

Because queer people fought back at Stonewall and galvanized on the streets, in their homes, and at the ballot, we celebrate Pride Month every June. We celebrate winning marriage equality, protection against discrimination, and the ability to live our lives openly and honestly. That work isn’t done, however. In 2018 and 2019, at least two transgender or gender-nonconforming people were murdered each month. Most of these victims were Black trans women. Black LGBTQ+ people are still in the fight for respect from their families, recognition and equity at work, and safety from state violence.

When I look at my work as a bisexual Black man here at Facebook and beyond, I bring that history and knowledge with me. I design and implement projects at Portal thinking about the consumer in all their facets, including their race and sexuality. This helps us create a better product for everybody by making sure no one is excluded or neglected.

Working toward a safer platform for people to live their truth

Meghan Rolfe (she/her) is a UX Researcher working on Community Integrity.

Pride, to me, represents that journey we take that ends with the open-armed embrace of the LGBTQ+ community and the feeling that we are all in this together — that we see one another. I believe many of us grew up with a deep pain caused by the feeling that we are the “Other” in society, internalizing a deep-rooted fear of rejection and staying tightly in the closet. For me, Pride is about the release of that rejection and the overwhelming joy you feel once you can live your truth.

My work in Community Integrity relates to this. Part of my role is understanding the potential benefits and harms of identity verification on our platforms, as well as the steps we can take to support individuals regardless of identity, documentation status, or membership in marginalized groups. Many people use our platforms to find a community where they can safely express their authentic selves. Transgender people in particular are often able to be their true selves online before they’ve come out to their family and friends. This is a wonderful use of our products, and we should find ways to support this even more.

However, there is a flip side to this. Like many other companies, our security systems are built around identity verification: If someone is hacked, they are asked for government-issued documents that can confirm they are who they say they are. This means that for those who use a different identity online — even if that identity is their most authentic — an exact match with government-issued documents is expected, which makes it difficult to resolve disparities between on- and offline identities. Based on prior feedback, we’ve changed our policy to allow a wider range of documents beyond government-issued ID; even so, we are currently conducting additional research on this experience to understand how we can better support individuals with different on- and offline identities.

Identity verification also allows us to hold people accountable for any violations of our Community Standards, such as bullying and harassment. It’s important that we provide victims with the ability to report not just the accounts responsible for harassing behaviors, but also the individuals behind those accounts. By creating systems of accountability, we can better protect members of the LGBTQ+ community from both online and offline attacks.

This Pride, we must not only remember the LGBTQ+ leaders who fought for us to be able to live our truths, but also remind ourselves that this fight continues.

Listening, learning, and teaching with empathy

Darwin Mastin (he/him; they/them) is a UX Researcher working on pathfinding.

As a human behavior researcher, I love to learn about what drives people. I want to understand our unconscious actions, and make this knowledge available through stories and products. My research at Facebook is focused on understanding current and future gaps within the Facebook app and the company. We are not perfect, but I think research can influence the products and company by bringing other necessary and underrepresented voices to the table.

To me, Pride doesn’t stop at being proud. In addition to celebrating ourselves and our community, we must continue to stand up for our community and have the support of our allies in doing so. We all need to focus on listening, learning, and being involved — because our celebrations of Pride were born from similar calls for justice by queer trans people of color.

One of the ways we can help is educating others about the issues that marginalized communities face. The LGBTQ+ community spans every demographic group — race, age, education level, and so on. We can’t make assumptions about anyone else’s experience; we need to reach out and listen and learn, because all of our histories are so different and so broad, and coming together to celebrate and understand these differences makes for a stronger community.

However, while it is necessary to do the work of educating others, it should not be the sole responsibility of marginalized communities. I’ve found that when those who are less informed are able to attach to a story or an experience, it drives empathy and inspires them to want to learn more rather than just learn now. But once inspired, new allies must share the burden, internalize their learning, and educate others. It’s important that our allies take a moment and have those difficult conversations. It will be hard, but that’s where it starts.

A movement is not a moment. It is action and reaction, and building on that over and over again. Change will come from listening to our broader communities, giving voice to people who have not been heard. It’s not just something today and not tomorrow. We know the next step is voting, representation, policy — those steps will follow from the public’s demands. Those are the building blocks to pride.

Fostering inclusion through work and at work

TJ Olojede (he/him) is a Creative Researcher in Facebook’s Creative Shop.

As the Creative Research team within Facebook’s Creative Shop, our focus is on elevating creativity in advertising on the platform and making sure that advertisers utilize creativity maximally to achieve better business outcomes. Our research helps us understand what creative strategies perform better, and we share those best practices with all of the businesses who advertise on Facebook. I like to think that in this way, we make Facebook advertising more inclusive and accessible to the everyday mom-and-pop shop and to businesses big and small.

Pride Month feels bittersweet to me this year. It is an interesting time to exist at the intersection of being gay and Black in America, even as an immigrant. Often these two identities exist in conflict and influence how much I feel like I belong to either group, since I’m still an “other” within each community. When I first moved to the U.S., I was excited to leave Nigeria, and to be somewhere where LGBTQ+ rights were leaps and bounds above anything back home — even though not optimal. And then I started to understand what it meant to be Black in America, and I remember thinking I had just exchanged freedom from one oppression for another.

When the more recent Black Lives Matter protests started after the murder of George Floyd, it felt clear to me how to feel. With June being Pride Month, however, it didn’t feel right to celebrate Pride and be happy in the midst of all that was going on. Even worse, I felt betrayed seeing all of my non-POC friends who hadn’t said anything about BLM suddenly want to celebrate Pride.

But at work, I appreciated that the Pride team was sensitive and empathetic enough to hold off on all of the Pride fanfare in the middle of the protests, and that to me spoke volumes about how much we care about each other within the company. I appreciate that I work with a team of inclusive, caring people who make work a safe space and engender that sense of belonging and emotional closeness. For me, inclusion boils down to feeling like I matter, like I belong here, and that there are others here like me.

Acquiring a more complete picture of our community

Hannah Furnas (she/her) is a Research Science Manager on the Demography and Survey Science team.

At Facebook, I support a team of researchers working on projects at the intersection of survey research and machine learning. We design projects to collect survey ground truth that’s used to train and evaluate machine learning models.

To me, Pride means embracing my own queer identity and showing up for the LGBTQ+ community. I’m continuing to embrace my own queer identity in an ongoing process of showing up more fully for myself so that I can show up for others. I’m intentionally expanding my understanding of what it means to belong to the LGBTQ+ community — which includes noticing and unlearning a lot of what my upbringing has taught me.

I grew up in a very white, cis-normative, heteronormative environment. People, structures, and institutions praised heterosexual couples and shamed other types of relationships. When I came out two years ago, a lot of the pushback I received was from people who couldn’t fathom why I wanted to come out as bi/pan since I was in a relationship with a cis man. This idea that I should hide who I am is one of the reasons it took me so long to come out to myself. Through support from my colleagues in the Pride community at Facebook, I’ve begun to truly embrace who I am.

Not only did my early socialization impact my coming-out experience, but it also gave me an incomplete picture of LGBTQ+ history and the current issues we face. I was exposed to media that suggested the work was done and equal rights were won. This obviously isn’t the case. Systemic discrimination and violence disproportionately impacts BIPOC and trans communities despite the fact that our movement wouldn’t exist today without trans activists like Marsha P. Johnson and Sylvia Rivera.

If I fail to acknowledge the full lived experiences of others in our community, then I’m also upholding the structures and systems that continue to oppress the LGBTQ+ community. I’m continuing this lifelong work of building awareness and taking action to embrace ourselves and our community more fully. To me, this is what Pride is all about — this year and every year.

Diversity is crucial to understanding where we’re succeeding and where we need to do better in our business. It enables us to build better products, make better decisions, and better serve our communities. We are proud of our attention to the LGBTQ+ experience across our apps and technologies, often thanks to the many LGBTQ+ people and allies who work at Facebook.

To learn more about Diversity at Facebook, visit our website.

The post Reflecting on Pride: How five Facebook researchers honor their LGBTQ+ history appeared first on Facebook Research.

Read More

How Euler Hermes detects typo squatting with Amazon SageMaker


This is a guest post from Euler Hermes. In their own words, “For over 100 years, Euler Hermes, the world leader in credit insurance, has accompanied its clients to provide simpler and safer digital products, thus becoming a key catalyzer in the world’s commerce.”

Euler Hermes manages more than 600,000 B2B transactions per month and performs data analytics on over 30 million companies worldwide. At-scale artificial intelligence and machine learning (ML) have become the heart of the business.

Euler Hermes uses ML across a variety of use cases. One recent example is typo squatting detection, which came about after an ideation workshop between the Cybersecurity and IT Innovation teams to better protect clients. As it turns out, moving from idea to production has never been easier when your data is in the AWS Cloud and you can put the right tools in the hands of your data scientists in minutes.

Typo squatting, or hijacking, is a form of cybersecurity attack. It consists of registering internet domain names that closely resemble legitimate, reputable, and well-known ones with the goal of phishing scams, identity theft, advertising, and malware installation, among other potential issues. The sources of typo squatting can be varied, including different top-level domains (TLD), typos, misspellings, combo squatting, or differently phrased domains.
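
The variant classes listed above can be sketched with a short, purely illustrative generator. The keyword list and example domain are hypothetical, and Euler Hermes's detector is an ML classifier, not an enumerator like this:

```python
def typo_variants(domain, tlds=("com", "net", "co")):
    """Generate a few classes of typo-squatting candidates for a domain:
    alternate TLDs, character omissions (typos), adjacent transpositions
    (misspellings), and combo squatting with appended keywords."""
    name, _, tld = domain.rpartition(".")
    variants = set()
    # Different top-level domains
    for t in tlds:
        if t != tld:
            variants.add(f"{name}.{t}")
    # Character omission (typos)
    for i in range(len(name)):
        variants.add(f"{name[:i]}{name[i + 1:]}.{tld}")
    # Adjacent-character transposition (misspellings)
    for i in range(len(name) - 1):
        swapped = name[:i] + name[i + 1] + name[i] + name[i + 2:]
        variants.add(f"{swapped}.{tld}")
    # Combo squatting: appending a plausible keyword
    for word in ("login", "secure", "portal"):
        variants.add(f"{name}-{word}.{tld}")
    variants.discard(domain)
    return sorted(variants)
```

The attacker's search space grows quickly with domain length, which is one reason rule-based enumeration alone is insufficient and a learned model is needed.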

The challenge we faced was building an ML solution to quickly detect any suspicious domains registered that could be used to exploit the Euler Hermes brand or its products.

To simplify the ML workflow and reduce time-to-market, we opted to use Amazon SageMaker. This fully managed AWS service was a natural choice due to the ability to easily build, train, tune, and deploy ML models at scale without worrying about the underlying infrastructure while being able to integrate with other AWS services such as Amazon Simple Storage Service (Amazon S3) or AWS Lambda. Furthermore, Amazon SageMaker meets the strict security requirements necessary for financial services companies like Euler Hermes, including support for private notebooks and endpoints, encryption of data in transit and at rest, and more.

Solution overview

To build and tune ML models, we used Amazon SageMaker notebooks as the main working tool for our data scientists. The idea was to train an ML model to recognize domains related to Euler Hermes. To accomplish this, we worked on the following two key steps: dataset construction and model building.

Dataset construction

Every ML project requires a lot of data, and our first objective was to build the training dataset.

The dataset of negative examples was composed of 1 million entries randomly picked from Alexa, Umbrella, and publicly registered domains, whereas the dataset of 1 million positive examples was created from a domain generated algorithm (DGA) using Euler Hermes’s internal domains.
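
The assembly step described above can be sketched as follows; the function and argument names are illustrative, not Euler Hermes's actual code:

```python
import random

def build_dataset(legitimate_domains, dga_domains, n_per_class=1_000_000, seed=42):
    """Assemble a balanced, shuffled training set: label 0 for domains
    sampled from Alexa/Umbrella/public registrations, label 1 for
    DGA-generated look-alikes of internal domains."""
    rng = random.Random(seed)
    neg = [(d, 0) for d in rng.sample(legitimate_domains, min(n_per_class, len(legitimate_domains)))]
    pos = [(d, 1) for d in rng.sample(dga_domains, min(n_per_class, len(dga_domains)))]
    data = neg + pos
    rng.shuffle(data)  # avoid label ordering artifacts during training
    return data
```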

Model building and tuning

One of the project’s biggest challenges was to decrease the number of false positives to a minimum. On a daily basis, we need to unearth domains related to Euler Hermes from a large dataset of approximately 150,000 publicly registered domains.

We tried two approaches: classical ML models and deep learning.

We considered various models for classical ML, including Random Forest, Logistic regression, and gradient boosting (LightGBM and XGBoost). For these models, we manually created more than 250 features. After an extensive feature-engineering phase, we selected the following as the most relevant:

  • Number of FQDN levels
  • Vowel ratio
  • Number of characters
  • Bag of n-grams (top 50 n-grams)
  • TF-IDF features
  • Latent Dirichlet allocation features
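
The simpler features in this list can be computed directly from the domain string. A minimal sketch (TF-IDF and LDA features are omitted because they require a corpus-level fit; the top-n-gram vocabulary is assumed to be precomputed):

```python
from collections import Counter

VOWELS = set("aeiou")

def ngrams(s, n=2):
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def url_features(domain, top_ngrams=None):
    """Compute a few of the handcrafted features listed above."""
    name = domain.lower()
    letters = [c for c in name if c.isalpha()]
    feats = {
        "fqdn_levels": name.count(".") + 1,  # number of FQDN levels
        "n_chars": len(name),                # number of characters
        "vowel_ratio": (sum(c in VOWELS for c in letters) / len(letters)) if letters else 0.0,
    }
    # Bag of n-grams restricted to a precomputed top-50 vocabulary
    counts = Counter(ngrams(name.replace(".", ""), 2))
    for g in (top_ngrams or []):
        feats[f"ngram_{g}"] = counts.get(g, 0)
    return feats
```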

For deep learning, we decided to work with recurrent neural networks. The model we adopted was a Bidirectional LSTM (BiLSTM) with an attention layer. We found this model to be the best at extracting a URL’s underlying structure.

The following diagram shows the architecture designed for the BiLSTM model. To avoid overfitting, a Dropout layer was added.

The following code orchestrates the set of layers:

import tensorflow as tf
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense, GlobalMaxPooling1D

def AttentionModel_(vocab_size, input_length, hidden_dim):
    model = tf.keras.models.Sequential()
    # Character-level embedding of the domain string
    model.add(Embedding(vocab_size, hidden_dim, input_length=input_length))
    # BiLSTM with dropout; return_sequences=True keeps per-timestep outputs
    model.add(Bidirectional(LSTM(units=hidden_dim, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
    # Collapse the sequence to a single vector (the attention layer of the
    # full model plays this role) before the binary classification head
    model.add(GlobalMaxPooling1D())
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["acc", tf.keras.metrics.FalsePositives()])
    return model

We built and tuned the classical ML and the deep learning models using the Amazon SageMaker-provided containers for Scikit-learn and Keras.

The following table summarizes the results we obtained. The BiLSTM outperformed the other models with a 13% precision improvement compared to the second-best model (LightGBM). For this reason, we put the BiLSTM model into production.


(Results table not recoverable from the source: it reported Precision, F1-Score, and AUC (Area Under the Curve) for Random Forest and the other models evaluated.)
Model training

For model training, we made use of Managed Spot Training in Amazon SageMaker to run training jobs on Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances. This allowed us to train models at a significantly lower cost than with On-Demand Instances.

Because we predominantly used custom deep learning models, we needed GPU instances for time-consuming neural network training jobs, with times ranging from minutes to a few hours. Under these constraints, Managed Spot Training was a game-changing solution: it manages Spot interruptions and instance-stopping conditions for us, so training jobs run to completion without interrupting our data scientists.
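
A spot-training job of this kind can be sketched with the SageMaker Python SDK. The entry point, bucket, role, and instance choices below are placeholders, not Euler Hermes's actual configuration; the key parameters are `use_spot_instances`, `max_wait` (which must be at least `max_run`), and a checkpoint location so interrupted jobs can resume:

```python
# Hypothetical Managed Spot Training configuration
SPOT_KWARGS = dict(
    instance_count=1,
    instance_type="ml.p3.2xlarge",   # GPU instance for the BiLSTM
    use_spot_instances=True,         # Managed Spot Training
    max_run=2 * 3600,                # training time limit (seconds)
    max_wait=4 * 3600,               # >= max_run; time allowed to wait for Spot capacity
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",  # resume after interruption
)

def make_estimator(role):
    """Build a TensorFlow estimator; the import is kept local so this
    sketch can be read without the sagemaker SDK installed."""
    from sagemaker.tensorflow import TensorFlow
    return TensorFlow(
        entry_point="train.py",      # hypothetical training script
        role=role,
        framework_version="2.1.0",
        py_version="py3",
        **SPOT_KWARGS,
    )
```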


Model deployment
Euler Hermes’s cloud principles follow a serverless-first strategy, with an Infrastructure as Code DevOps practice. Systematically, we construct a serverless architecture based on Lambda whenever possible, but when this isn’t possible, we deploy to containers using AWS Fargate.

Amazon SageMaker allows us to deploy our ML models at scale within the same platform on a 100% serverless and scalable architecture. It creates a model endpoint that is ready to serve inference requests. To get inferences for an entire dataset, we use batch transform, which cuts the dataset off in smaller batches and gets the predictions on each one. Batch transform manages all the compute resources required to get inferences, including launching instances and deleting them after the batch transform job is complete.
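
A batch transform job of this shape can be sketched as follows; the instance type and output path are placeholder assumptions, and `model` stands for a trained SageMaker model object:

```python
# Hypothetical batch-transform settings
TRANSFORM_CONFIG = dict(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    strategy="MultiRecord",          # batch several domains per request
    output_path="s3://example-bucket/inferences/",
)

def run_batch_transform(model, input_s3):
    """transformer() launches the compute, splits the S3 input line by
    line, writes predictions to output_path, and deletes the instances
    when the job completes."""
    transformer = model.transformer(**TRANSFORM_CONFIG)
    transformer.transform(data=input_s3, content_type="text/csv", split_type="Line")
    transformer.wait()
    return transformer
```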

The following figure depicts the architecture deployed for the use case in this post.

First, a daily Amazon CloudWatch event is set to trigger a Lambda function with two jobs: download all the publicly registered domains and store them in an Amazon Simple Storage Service (Amazon S3) bucket subfolder and trigger the BatchTransform job. Amazon SageMaker automatically saves the inferences in an S3 bucket that you specify when creating the batch transform job.

Finally, a second CloudWatch event monitors the task success of Amazon SageMaker. If the task succeeds, it triggers a second Lambda function that retrieves the inferred domains and selects those that have label 1—related to Euler Hermes or its products—and stores them in another S3 bucket subfolder.
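
The core of that second Lambda function might look like the following. The JSON record schema (one record per line with a `domain` and a `score`) is an assumption for illustration, not Euler Hermes's actual output format:

```python
import json

def select_related_domains(transform_output_lines, threshold=0.5):
    """Keep only the domains the model flags as related to the brand
    (label 1, i.e. score at or above the decision threshold)."""
    flagged = []
    for line in transform_output_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record["score"] >= threshold:
            flagged.append(record["domain"])
    return flagged
```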

Following Euler Hermes’s DevOps principles, all the infrastructure in this solution is coded in Terraform to implement an MLOps pipeline to deploy to production.


Conclusion
Amazon SageMaker provides the tool that our data scientists need to quickly and securely experiment and test while maintaining compliance with strict financial service standards. This allows us to bring new ideas into production very rapidly. With flexibility and inherent programmability, Amazon SageMaker helped us tackle our main pain point of industrializing ML models at scale. After we train an ML model, we can use Amazon SageMaker to deploy the model, and can automate the entire pipeline following the same DevOps principles and tools we use for all other applications we run with AWS.

In under 7 months, we were able to launch a new internal ML service from ideation to production and can now identify URL squatting fraud within 24 hours after the creation of a malicious domain.

Although our application is ready, we have some additional steps planned. First, we’ll extend the inferences currently stored on Amazon S3 to our SIEM platform. Second, we’ll implement a web interface to monitor the model and allow manual feedback that is captured for model retraining.

About the Authors

Luis Leon is the IT Innovation Advisor responsible for the data science practice in IT at Euler Hermes. He is in charge of the ideation of digital projects as well as managing the design, build, and industrialization of at-scale machine learning products. His main interests are natural language processing, time series analysis, and unsupervised learning.




Hamza Benchekroun is a Data Scientist in the IT Innovation hub at Euler Hermes, focusing on deep learning solutions to increase productivity and guide decision-making across teams. His research interests include natural language processing, time series analysis, semi-supervised learning, and their applications.



Hatim Binani is a data scientist intern in the IT Innovation hub at Euler Hermes. He is an engineering student at INSA Lyon in the computer science department. His fields of interest are data science and machine learning. He contributed within the IT Innovation team to the deployment of Watson on Amazon SageMaker.



Guillaume Chambert is an IT security engineer at Euler Hermes. As SOC manager, he strives to stay ahead of new threats in order to protect Euler Hermes’ sensitive and mission-critical data. He is interested in developing innovative solutions to prevent critical information from being stolen, damaged, or compromised by hackers.





Read More

Holographic optics for thin and lightweight virtual reality


Facebook Reality Labs (FRL) is always exploring new optical architectures to improve form factor, comfort, and optical performance. Last fall, at Oculus Connect 6, FRL Chief Scientist Michael Abrash introduced new miniaturization progress in VR with Half Dome 2 and 3, two prototypes that examine how varifocal displays can improve visual and physical comfort. This year, at the virtual SIGGRAPH conference, we’re presenting another research milestone on this path: a new optical architecture that is significantly more compact and offers the potential for better visual performance.

In this work, “Holographic Optics for Thin and Lightweight Virtual Reality,” researchers Andrew Maimone and Junren Wang propose a new class of near-eye displays, which combine the power of holographic optics and polarization-based optical folding — an approach that could be used to develop future sunglasses-like VR hardware. These two methods help keep the optics as thin as possible while making the most efficient use of space. We anticipate that such lightweight and comfortable form factors may enable extended VR sessions and new use cases, including productivity.

The design is demonstrated in a proof-of-concept research device that uses only thin, flat films as optics to achieve a display thickness of less than 9 mm while supporting a field of view comparable to today’s consumer VR products. The work demonstrates the promise of better visual performance, as well: Laser illumination is used to deliver a much wider gamut of colors to VR displays, and progress is made toward scaling resolution to the limit of human vision.

This video demonstrates video game animation, as shown on our proof-of-concept research device.

The approach

This image shows our research device display modules mounted into a frame. This research device was used to capture the green image shown below (some components are mounted externally).

Today’s VR displays have three primary components: a source of light (e.g., LEDs), a display panel that brightens or dims the light to form an image (e.g., an LCD panel), and a viewing optic that focuses the image far enough away so that the viewer’s eyes can see it (e.g., a plastic lens). As the first two components can readily be formed into thin and flat modules, most of the weight and volume go into the viewing optics. To significantly reduce the overall size and weight of VR displays, we combine two techniques: holographic optics and polarization-based optical folding.

Most VR displays share a common viewing optic: a simple refractive lens composed of a thick, curved piece of glass or plastic. We propose replacing this bulky element with holographic optics. You may be familiar with holographic images seen at a science museum or on your credit card, which appear to be three-dimensional with realistic depth in or out of the page. Like these holographic images, our holographic optics are a recording of the interaction of laser light with objects, but in this case the object is a lens rather than a 3D scene. The result is a dramatic reduction in thickness and weight: The holographic optic bends light like a lens but looks like a thin, transparent sticker.

However, even if the lens itself is made thin, the viewing optics as a whole may still be large — a considerable amount of empty space must be placed between the display panel and the lens to properly focus the image. Ordinarily, light from the display panel propagates forward to the lens and then continues toward the eye. However, when we apply polarization-based optical folding, light can be controlled to move both forward and backward within the lens so that this empty space can be traversed multiple times, collapsing it to a fraction of the original volume.

Wider color gamut

Shown on the left, a photograph captured with the proof-of-concept research device shown above. On the right, a photograph taken through a larger full-color benchtop prototype. We are currently working on achieving full color on the smaller research prototype.

When we apply holographic optics to a VR display, we must reevaluate all other optical components. Notably, holographic optics compel the use of laser light sources, which are more difficult to integrate but provide a much richer set of colors than the LEDs common in nearly all of today’s VR headsets, phones, computers, and televisions.

To illustrate the difference, the figure below shows the gamut of human-visible colors. A common set of colors reproducible on many displays today is the sRGB color space (illustrated by the smaller triangle). Note that it can capture only a small fraction of the colors that we can actually see. In contrast, the outer triangle represents the much larger set of colors that can be reproduced using the lasers on one of our research prototype displays. This allows the reproduction of vivid and saturated colors. Think of a brightly lit neon sign or the iridescent sheen of a butterfly wing.
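
The relative sizes of the two triangles can be quantified with the shoelace formula in CIE 1931 xy coordinates. The sRGB primaries below are the standard values; the laser chromaticities are approximate values for 450/520/638 nm sources, assumed for illustration rather than taken from the prototype:

```python
# sRGB primaries (standard) and approximate chromaticities of
# 450/520/638 nm lasers (illustrative, not the prototype's actual primaries)
SRGB = [(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)]
LASER = [(0.708, 0.292), (0.074, 0.833), (0.157, 0.018)]

def triangle_area(pts):
    """Shoelace formula for the area of a triangle in xy space."""
    (x1, y1), (x2, y2), (x3, y3) = pts
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

# With these assumed primaries the laser gamut covers roughly twice
# the xy area of the sRGB triangle.
ratio = triangle_area(LASER) / triangle_area(SRGB)
```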

This figure illustrates the gamut of human-visible colors. The sRGB space represents a common set of colors reproducible on many displays today. The outer triangle represents the larger set of colors reproducible on our research prototype.

What’s next

While it points toward the future development of lightweight, comfortable, and high-performance AR/VR technology, at present our work is purely research. In our technical paper, we identify the current limitations of our proposed display architecture and discuss future areas of research that will make the approach more practical. To our knowledge, this is the thinnest VR display demonstrated to date, and we’re excited to see what the future holds.

The post Holographic optics for thin and lightweight virtual reality appeared first on Facebook Research.

Read More

Responsible AI with TensorFlow


Posted by Tulsee Doshi, Andrew Zaldivar

As billions of people around the world continue to use products or services with AI at their core, it becomes more important than ever that AI is deployed responsibly: preserving trust and putting each individual user’s well-being first. It has always been our highest priority to build products that are inclusive, ethical, and accountable to our communities, and in the last month, especially as the US has grappled with its history of systemic racism, that approach has been, and continues to be, as important as ever.

Two years ago, Google introduced its AI Principles, which guide the ethical development and use of AI in our research and products. The AI Principles articulate our Responsible AI goals around privacy, accountability, security, fairness, and interpretability. Each of these is a critical tenet in ensuring that AI-based products work well for every user.

As a Product Lead and a Developer Advocate for Responsible AI at Google, we have seen first-hand how developers play an important role in building toward Responsible AI goals using platforms like TensorFlow. As one of the most popular ML frameworks in the world, with millions of downloads and a global developer community, TensorFlow is used not only across Google but around the globe to solve challenging real-world problems. This is why we’re continuing to expand the Responsible AI toolkit in the TensorFlow ecosystem, so that developers everywhere can better integrate these principles into their ML development workflows.

In this blog post, we will outline ways to use TensorFlow to build AI applications with Responsible AI in mind. The collection of tools here are just the beginning of what we hope will be a growing toolkit and library of lessons learned and resources to apply them.

You can find all the tools discussed below at TensorFlow’s collection of Responsible AI Tools.

Building Responsible AI with TensorFlow: A Guide

Building into the workflow

While every TensorFlow pipeline likely faces different challenges and development needs, there is a consistent workflow that we see developers follow as they build their own products. And, at each stage in this flow, developers face different Responsible AI questions and considerations. With this workflow in mind, we are designing our Responsible AI Toolkit to complement existing developer processes, so that Responsible AI efforts are directly embedded into a structure that is already familiar.

You can see a full summary of the workflow and tools at:

To simplify our discussion, we’ll break the workflow into 5 key steps:

  • Step 1: Define the problem
  • Step 2: Collect and prepare the data
  • Step 3: Build and train the model
  • Step 4: Evaluate performance
  • Step 5: Deploy and monitor

In practice, we expect that developers will move between these steps frequently. For example, a developer may train the model, identify poor performance, and return to collect and prepare additional data to account for these concerns. Likely, a model will be iterated and improved numerous times once it has been deployed and these steps will be repeated.
Regardless of when, and in what order, you reach these steps, there are critical Responsible AI questions to ask at each phase, as well as related tools available to help developers debug and surface critical insights. As we go through each step in more detail, you will see several questions listed along with a set of tools and resources we recommend for answering them. These questions, of course, are not meant to be comprehensive; rather, they serve as examples to stimulate thinking along the way.
Keep in mind that many of these tools and resources can be used throughout the workflow, not just in the step where they are featured. Fairness Indicators and ML Metadata, for example, can be used as standalone tools to evaluate and monitor your model for unintended biases. These tools are also integrated into TensorFlow Extended (TFX), which not only provides a pathway for developers to put their models into production, but also equips them with a unified platform for iterating through the workflow more seamlessly.

Step 1: Define the Problem

What am I building? What is the goal?
Who am I building this for?
How are they going to use it? What are the consequences for the user when it fails?
The first step in any development process is the definition of the problem itself. When is AI actually a valuable solution, and what problem is it addressing? As you define your AI needs, make sure to keep in mind the different users you might be building for, and the different experiences they may have with the product.
For example, if you are building a medical model to screen individuals for a disease, as is done in this Explorable, the model may learn and work differently for adults versus children. When the model fails, it may have critical repercussions that both doctors and users need to know about.
How do you identify the important questions, potential harms, and opportunities for all users? The Responsible AI Toolkit in TensorFlow includes a couple of tools to help:
PAIR Guidebook
The People + AI Research (PAIR) Guidebook, which focuses on designing human-centered AI, is a companion as you build, outlining the key questions to ask as you develop your product. It is based on insights from Googlers across 40 product teams. We recommend reading through the key questions (and using the helpful worksheets!) as you define the problem, and referring back to them as development proceeds.
AI Explorables
A set of lightweight interactive tools, the Explorables provide an introduction to some of the key Responsible AI concepts.

Step 2: Collect & Prepare Data

Who does my dataset represent? Does it represent all my potential users?
How is my dataset being sampled, collected, and labeled?
How do I preserve the privacy of my users?
What underlying biases might my dataset encode?
Once you have defined the problem you seek to solve with AI, a critical part of the process is collecting data that takes into account the societal and cultural factors necessary to solve that problem. Developers training, say, a speech detection model for a very specific dialect might consider obtaining their data from sources that have made efforts to accommodate languages lacking linguistic resources.
As the heart and soul of an ML model, a dataset should be considered a product in its own right, and our goal is to equip you with the tools to understand who the dataset represents and what gaps may have existed in the collection process.
TensorFlow Data Validation
You can use TensorFlow Data Validation (TFDV) to analyze your dataset and slice across different features to understand how your data is distributed, and where there may be gaps or defects. TFDV works with tools such as TFX and Facets Overview to help you quickly understand the distribution of values across the features in your dataset, so you don’t have to create a separate codebase to monitor your training and production pipelines for skew.
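To illustrate the kind of gap analysis TFDV automates, here is a minimal sketch in plain pandas (this is not the TFDV API; the toy dataset and the 10% coverage threshold are invented for illustration) that flags feature values covering too little of the data:

```python
import pandas as pd

def slice_counts(df: pd.DataFrame, feature: str) -> pd.Series:
    """Count examples per value of a feature, the kind of
    per-slice distribution TFDV surfaces automatically."""
    return df[feature].value_counts()

def underrepresented_slices(df, feature, min_fraction=0.1):
    """Flag feature values covering less than min_fraction of the data."""
    fractions = slice_counts(df, feature) / len(df)
    return sorted(fractions[fractions < min_fraction].index)

# Toy dataset: age_group is heavily skewed toward adults.
df = pd.DataFrame({
    "age_group": ["adult"] * 90 + ["child"] * 8 + ["senior"] * 2,
    "label": [0, 1] * 50,
})
print(underrepresented_slices(df, "age_group"))  # ['child', 'senior']
```

A model trained on this dataset would see very few children or seniors; surfacing that gap early is exactly the point of validating the data before training.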

Example of a Data Card for the Colombian Spanish speaker dataset.

Analysis generated by TFDV can be used to create Data Cards for your datasets when appropriate. You can think of a Data Card as a transparency report for your dataset, providing insight into your collection, processing, and usage practices. As an example, one of our research-driven engineering initiatives focused on creating datasets for regions with both low resources for building natural language processing applications and rapidly growing Internet penetration. To help other researchers who want to explore speech technology for these regions, the team behind this initiative created Data Cards for different Spanish-speaking countries to start with, including the Colombian Spanish speaker dataset shown above, providing a template for what to expect when using their dataset.
Details on Data Cards, a framework on how to create them, and guidance on how to integrate aspects of Data Cards into processes or tools you use will be published soon.

Step 3: Build and Train the Model

How do I preserve privacy or think about fairness while training my model?
What techniques should I use?
Training your TensorFlow model can be one of the most complex pieces of the development process. How do you train it in such a way that it performs optimally for everyone while still preserving user privacy? We’ve developed a set of tools to simplify aspects of this workflow, and enable integration of best practices while you are setting up your TensorFlow pipeline:
TensorFlow Federated
Federated learning is a new approach to machine learning that enables many devices or clients to jointly train machine learning models while keeping their data local. Keeping the data local provides benefits around privacy, and helps protect against risks of centralized data collection, like theft or large-scale misuse. Developers can experiment with applying federated learning to their own models by using the TensorFlow Federated library.
[New] We recently released a tutorial for running high-performance simulations with TensorFlow Federated using Kubernetes clusters.
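To make the idea concrete, here is a minimal NumPy sketch of federated averaging (this is not the TensorFlow Federated API; the linear model, client data, and hyperparameters are invented for illustration): each client trains locally on data that never leaves it, and the server only averages the resulting weights.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=5):
    """One client's local training: a few gradient steps on squared
    loss for a linear model; raw data never leaves the client."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_round(weights, clients):
    """Server step of federated averaging: average the clients' locally
    trained weights (unweighted here, assuming equal data sizes)."""
    updates = [local_update(weights, X, y) for X, y in clients]
    return np.mean(updates, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w))  # each client's private data

w = np.zeros(2)
for _ in range(20):
    w = federated_round(w, clients)
print(w)  # approaches [2, -1] without the server ever seeing raw data
```

In real deployments, TensorFlow Federated adds much more (secure aggregation, client sampling, non-IID data handling), but the averaged-updates structure is the core idea.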
TensorFlow Privacy
You can also support privacy in training with differential privacy, which adds noise in your training to hide individual examples in the datasets. TensorFlow Privacy provides a set of optimizers that enable you to train with differential privacy, from the start.
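As a rough illustration of what such an optimizer does internally, here is a simplified NumPy sketch of the DP-SGD recipe (this is not the TensorFlow Privacy API; the clipping norm, noise multiplier, and gradient values are arbitrary): clip each example's gradient, sum, add Gaussian noise, then average.

```python
import numpy as np

def dp_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1,
                rng=np.random.default_rng(0)):
    """Differentially private gradient aggregation in the style of DP-SGD:
    per-example clipping bounds any one example's influence, and Gaussian
    noise hides whether a particular example was present."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]  # norms 5.0 and 0.5
print(dp_gradient(grads))  # a noisy, clipped average
```

The privacy guarantee itself comes from accounting for the noise scale across training steps, which TensorFlow Privacy's optimizers and accountants handle for you.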
TensorFlow Constrained Optimization and TensorFlow Lattice
In addition to building privacy considerations into model training, there may be a set of metrics that you want to configure and use during training to achieve desirable outcomes. Creating more equitable experiences across different groups, for example, can be difficult to achieve unless you take into account a combination of metrics that satisfy this real-world requirement. TensorFlow Constrained Optimization (TFCO) and TensorFlow Lattice are libraries that provide a number of research-based methods, enabling constraint-based approaches that can help you address broader societal issues such as fairness. In the next quarter, we hope to develop and offer more Responsible AI training methods, releasing infrastructure that we have used in our own products to help remediate fairness concerns. We’re excited to continue building a suite of tools and case studies that show how different methods may be more or less suited to different use cases.

Step 4: Evaluate the Model

Is my model privacy preserving?
How is my model performing across my diverse user base?
What are examples of failures, and why are these occurring?

Once a model has been initially trained, the iteration process begins. Often, the first version of a model does not perform the way a developer hopes, and it is important to have easy-to-use tools for identifying where it fails. It can be particularly challenging to identify the right metrics and approaches for understanding privacy and fairness concerns. Our goal is to support these efforts with tools that enable developers to evaluate privacy and fairness alongside traditional evaluation and iteration steps.
[New] Privacy Tests
Last week, we announced a privacy testing library as part of TensorFlow Privacy. This library is the first of many tests we hope to release that let developers interrogate their models and identify instances where a single datapoint’s information has been memorized, which may warrant further analysis, including consideration of training the model to be differentially private.
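A toy version of the intuition behind such tests (this is not the TensorFlow Privacy testing API; all loss values below are invented): compare a model's losses on training versus held-out examples, since a memorizing model hands attackers an easy membership signal.

```python
import numpy as np

def loss_threshold_attack_accuracy(train_losses, test_losses):
    """A minimal membership-inference signal: guess "member" when an
    example's loss falls below the midpoint of the two mean losses.
    Accuracy near 0.5 suggests little memorization; near 1.0 is a red flag."""
    threshold = (np.mean(train_losses) + np.mean(test_losses)) / 2
    correct = np.sum(train_losses < threshold) + np.sum(test_losses >= threshold)
    return correct / (len(train_losses) + len(test_losses))

# Invented losses: a model that memorized its training data...
memorized = loss_threshold_attack_accuracy(
    train_losses=np.array([0.01, 0.02, 0.01]),
    test_losses=np.array([1.2, 0.9, 1.5]))
# ...versus a model whose losses look the same on both splits.
generalizing = loss_threshold_attack_accuracy(
    train_losses=np.array([0.5, 0.6, 0.4]),
    test_losses=np.array([0.5, 0.6, 0.4]))
print(memorized, generalizing)  # 1.0 0.5
```

The real library runs far more sophisticated attacks, but the underlying question is the same: can an adversary tell whether a datapoint was in the training set?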
Evaluation Tool Suite: Fairness Indicators, TensorFlow Model Analysis, TensorBoard, and What-If Tool
You can also explore TensorFlow’s suite of evaluation tools to understand fairness concerns in your model and debug specific examples.
Fairness Indicators enables evaluation of common fairness metrics for classification and regression models on extremely large datasets. The tool is accompanied by a series of case studies to help developers easily identify appropriate metrics for their needs and set up Fairness Indicators with a TensorFlow model. Visualizations are available via the widely popular TensorBoard platform that modelers already use to track their training metrics. Most recently, we launched a case study highlighting how Fairness Indicators can be used with pandas, to enable evaluations over more datasets and data types.
Fairness Indicators is built on top of TensorFlow Model Analysis (TFMA), which contains a broader set of metrics for evaluating common metrics across concerns.
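As a sketch of the underlying computation (Fairness Indicators and TFMA do this at scale with confidence intervals and many more metrics; the toy labels, predictions, and group names here are invented), a per-slice false positive rate can be computed like this:

```python
import numpy as np

def false_positive_rate(labels, preds):
    """Fraction of true negatives that the model flags as positive."""
    negatives = labels == 0
    return np.mean(preds[negatives] == 1) if negatives.any() else float("nan")

def fpr_by_group(labels, preds, groups):
    """Per-slice false positive rate, one of the fairness metrics
    Fairness Indicators computes and visualizes."""
    return {g: false_positive_rate(labels[groups == g], preds[groups == g])
            for g in np.unique(groups)}

labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])
preds  = np.array([1, 0, 0, 1, 1, 0, 1, 1])
groups = np.array(["a", "a", "a", "b", "b", "b", "a", "b"])
print(fpr_by_group(labels, preds, groups))  # group b's FPR is double group a's
```

A gap like this between slices is exactly the kind of disparity the tooling is designed to surface before a model ships.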

The What-If Tool lets you test hypothetical situations on datapoints.

Once you’ve identified a slice that isn’t performing well, or want to understand and explain errors more carefully, you can further evaluate your model with the What-If Tool (WIT), which can be used directly from Fairness Indicators and TFMA. With the What-If Tool, you can deepen your analysis of a specific slice of data by inspecting the model predictions at the datapoint level. The tool offers a wide range of features, from testing hypothetical situations on a datapoint, such as “what if this datapoint was from a different category?”, to visualizing the importance of different data features to your model’s prediction.
Beyond the integration in Fairness Indicators, the What-If Tool can also be used in other user flows as a standalone tool and is accessible from TensorBoard or in Colaboratory, Jupyter and Cloud AI Platform notebooks.
[New] Today, to help WIT users get started faster, we’re releasing a series of new educational tutorials and demos to help our users better use the tool’s numerous capabilities, from making good use of counterfactuals to interpret your model behaviors, to exploring your features and identifying common biases.
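The core “what if” question can be sketched in a few lines (this is not the WIT API; the logistic model and its weights are hypothetical): change one feature of a datapoint and observe how the prediction moves.

```python
import numpy as np

# Hypothetical scoring model: a hand-set logistic model over
# [income, age, group] feature vectors (group encoded as 0/1).
weights = np.array([0.8, 0.1, 1.5])

def predict(x):
    """Sigmoid score for a single feature vector."""
    return 1 / (1 + np.exp(-x @ weights))

def counterfactual_delta(x, feature_index, new_value):
    """'What if this datapoint had a different value for one feature?'
    Returns the change in model score, the question WIT lets you
    explore interactively."""
    x_cf = x.copy()
    x_cf[feature_index] = new_value
    return predict(x_cf) - predict(x)

x = np.array([1.0, 0.5, 0.0])
delta = counterfactual_delta(x, feature_index=2, new_value=1.0)
print(delta)  # a large shift from group alone signals a dependence worth auditing
```

WIT does this visually and at scale, including finding the nearest real counterfactual datapoint rather than a synthetic edit, but the per-datapoint comparison is the same idea.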
Explainable AI
Google Cloud users can take WIT’s capabilities a step further with Explainable AI, a toolkit that builds upon WIT to introduce additional interpretability features, including Integrated Gradients, which identifies the features that most significantly contributed to a model’s prediction.
Tutorials on TensorFlow.org
You may also be interested in these tutorials for handling imbalanced datasets, and for explaining an image classifier using Integrated Gradients, similar to that mentioned above.

Using the tutorial above to explain why this image was classified as a fireboat (it’s likely because of the water spray).
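For intuition, Integrated Gradients itself is compact enough to sketch directly (a minimal NumPy version, assuming you can evaluate the model's gradient; the quadratic model is invented for illustration). A useful sanity check is the method's completeness property: attributions sum to the difference between the model's output at the input and at the baseline.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=100):
    """Integrated Gradients: average the model's gradient along the
    straight path from baseline to input, scaled by (input - baseline)."""
    alphas = (np.arange(steps) + 0.5) / steps  # midpoint rule
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    return (x - baseline) * total / steps

# Hypothetical model: f(x) = sum(x_i^2), so the gradient is 2x.
f = lambda x: float(np.sum(x ** 2))
grad = lambda x: 2 * x

x = np.array([1.0, 3.0])
baseline = np.zeros_like(x)
attr = integrated_gradients(grad, x, baseline)
print(attr)        # ≈ [1. 9.]
print(attr.sum())  # completeness: ≈ f(x) - f(baseline) = 10
```

For an image classifier, `x` would be pixel values and `grad_fn` the network's input gradient; the water-spray pixels in the fireboat example above are the ones with large attributions.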

Step 5: Deploy and Monitor

How does my model perform over time? How does it perform in different scenarios? How do I continue to track and improve it?
No model development process is static. As the world changes, users change, and so do their needs. The model discussed earlier to screen patients, for example, may no longer work effectively during a pandemic. It’s important that developers have tools that enable tracking of models, and clear channels and frameworks for communicating helpful details about their models, especially to developers who may inherit a model, or to users and policy makers who seek to understand how it will work for various people. The TensorFlow ecosystem has tools to help with this kind of lineage tracking and transparency:
ML Metadata
As you design and train your model, you can allow ML Metadata (MLMD) to generate trackable artifacts throughout your development process. From your training data ingestion and any metadata around the execution of the individual steps, to exporting your model with evaluation metrics and accompanying context such as changelists and owners, the MLMD API can create a trace of all the intermediate components of your ML workflow. This ongoing monitoring of progress that MLMD provides helps identify security risks or complications in training.
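To illustrate the kind of lineage MLMD captures (this toy class is not the MLMD API; the pipeline steps, artifact names, and URI are invented), artifacts are linked through the executions that produced them, so any output can be traced back to its inputs:

```python
class LineageStore:
    """A toy stand-in for MLMD-style lineage: artifacts (datasets,
    models, metrics) connected by the executions that produced them."""
    def __init__(self):
        self.artifacts, self.executions = {}, []

    def add_artifact(self, name, **properties):
        self.artifacts[name] = properties

    def add_execution(self, step, inputs, outputs):
        self.executions.append({"step": step, "inputs": inputs,
                                "outputs": outputs})

    def lineage(self, artifact):
        """Trace every upstream artifact that contributed to `artifact`."""
        upstream, frontier = set(), {artifact}
        while frontier:
            current = frontier.pop()
            for ex in self.executions:
                if current in ex["outputs"]:
                    new = set(ex["inputs"]) - upstream
                    upstream |= new
                    frontier |= new
        return upstream

store = LineageStore()
store.add_artifact("raw_data", uri="gs://bucket/raw")  # hypothetical URI
store.add_execution("ingest", inputs=["raw_data"], outputs=["training_data"])
store.add_execution("train", inputs=["training_data"], outputs=["model"])
store.add_execution("evaluate", inputs=["model"], outputs=["eval_metrics"])
print(store.lineage("eval_metrics"))  # traces back through model and data
```

MLMD persists this graph in a real backing store and integrates with TFX, but the traceability question it answers (“which data and steps produced this model?”) is the one sketched here.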
Model Cards
As you deploy your model, you can accompany its deployment with a Model Card: a structured document that communicates the values and limitations of your model. Model Cards enable developers, policy makers, and users to understand aspects of trained models, contributing to the larger developer ecosystem with added clarity and explainability so that ML is less likely to be used in contexts for which it is inappropriate. Based on a framework proposed in an academic paper by Google researchers published in early 2019, Model Cards have since been released with Google Cloud Vision API models, including the Object and Face Detection APIs, as well as a number of open source models.
Today, you can get inspiration from the paper and existing examples to develop your own Model Card. In the next two months, we plan to combine ML Metadata and the Model Card framework to provide developers with a more automated way of creating these important artifacts. Stay tuned for our Model Cards Toolkit, which we will add to the Responsible AI Toolkit collection.
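Until the toolkit arrives, a model card can be as simple as a structured object rendered to text (a hypothetical sketch; the section names loosely follow the paper, and the model name and metric values are placeholders):

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """A minimal model card, loosely following the sections proposed in
    the Model Cards paper; the Model Cards Toolkit will automate this."""
    name: str
    intended_use: str
    limitations: str
    metrics: dict = field(default_factory=dict)

    def to_markdown(self):
        lines = [f"# Model Card: {self.name}",
                 f"**Intended use:** {self.intended_use}",
                 f"**Limitations:** {self.limitations}",
                 "## Evaluation"]
        lines += [f"- {slice_}: accuracy {acc:.2f}"
                  for slice_, acc in self.metrics.items()]
        return "\n".join(lines)

card = ModelCard(
    name="toy-screening-model",
    intended_use="Illustration only; not a real deployed model.",
    limitations="Evaluated on a toy dataset; metrics are placeholders.",
    metrics={"adults": 0.92, "children": 0.71},
)
print(card.to_markdown())
```

Reporting per-slice metrics, as in this sketch, is what makes a model card useful: the gap between adults and children above is precisely the kind of limitation a reader needs to see.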

Excited to try out these resources? You can find all of them at

It’s important to note that while Responsible AI in the ML workflow is a critical factor, building products with AI ethics in mind is a combination of technical, product, policy, process, and cultural factors. These concerns are multifaceted and fundamentally sociotechnical. Issues of fairness, for example, can often be traced back to histories of bias in the world’s underlying systems. As such, proactive AI responsibility efforts require not only measurement and modeling adjustments, but also policy and design changes that provide transparency, rigorous review processes, and a diversity of decision makers who can bring in multiple perspectives.
This is why many of the tools and resources covered in this post are grounded in the sociotechnical research we do at Google. Without such a foundation, ML and AI models are unlikely to benefit society, and risk being erroneously woven into consequential decision-making systems. Adopting a cross-cultural perspective, grounding our work in human-centered design, extending transparency to everyone regardless of expertise, and operationalizing our learnings into practice: these are some of the steps we take to build AI responsibly.
We understand Responsible AI is an evolving space that is critical, which is why we are hopeful when we see how the TensorFlow community is thinking about the issues we’ve discussed—and more importantly, when the community takes action. In our latest Dev Post Challenge, we asked the community to build something great with TensorFlow incorporating AI Principles. The winning submissions explored areas of fairness, privacy, and interpretability, and showed us that Responsible AI tools should be well integrated into TensorFlow ecosystem libraries. We will be focusing on this to ensure these tools are easily accessible.
As you begin your next TensorFlow project, we encourage you to use the tools above and to provide us with feedback. Share your learnings with us, and we’ll continue to do the same, so that together we can build products that truly work well for everyone.