Exploring interactions of light and matter

Growing up in a small town in Fujian province in southern China, Juejun Hu was exposed to engineering from an early age. His father, trained as a mechanical engineer, spent his career working first in that field, then in electrical engineering, and then in civil engineering.

“He gave me early exposure to the field. He brought me books and told me stories of interesting scientists and scientific activities,” Hu recalls. So when it came time to go to college — in China students have to choose their major before enrolling — he picked materials science, figuring that field straddled his interests in science and engineering. He pursued that major at Tsinghua University in Beijing.

He never regretted that decision. “Indeed, it’s the way to go,” he says. “It was a serendipitous choice.” He continued on to a doctorate in materials science at MIT, and then spent four and a half years as an assistant professor at the University of Delaware before joining the MIT faculty. Last year, Hu earned tenure as an associate professor in MIT’s Department of Materials Science and Engineering.

In his work at the Institute, he has focused on optical and photonic devices, whose applications include improving high-speed communications, observing the behavior of molecules, designing better medical imaging systems, and developing innovations in consumer electronics such as display screens and sensors.

“I got fascinated with light,” he says, recalling how he began working in this field. “It has such a direct impact on our lives.”

Hu is now developing devices to transmit information at very high rates, for data centers or high-performance computers. This includes work on devices called optical diodes or optical isolators, which allow light to pass through only in one direction, and systems for coupling light signals into and out of photonic chips.

Lately, Hu has been focusing on applying machine-learning methods to improve the performance of optical systems. For example, he has developed an algorithm that improves the sensitivity of a spectrometer, a device for analyzing the chemical composition of materials based on how they emit or absorb different frequencies of light. The new approach made it possible to shrink a device that ordinarily requires bulky and expensive equipment down to the scale of a computer chip, by improving its ability to overcome random noise and provide a clean signal.

The miniaturized spectrometer makes it possible to analyze the chemical composition of individual molecules with something “small and rugged, to replace devices that are large, delicate, and expensive,” he says.

Much of his work currently involves the use of metamaterials, which don’t occur in nature and are usually synthesized as a series of ultrathin layers, so thin that they interact with wavelengths of light in novel ways. These could lead to components for biomedical imaging, security surveillance, and sensors on consumer electronics, Hu says. Another project he’s been working on involves developing a kind of optical zoom lens based on metamaterials, which uses no moving parts.

Hu is also pursuing ways to make photonic and photovoltaic systems that are flexible and stretchable rather than rigid, and to make them lighter and more compact. This could allow for installations in places that would otherwise not be practical. “I’m always looking for new designs to start a new paradigm in optics, [to produce] something that’s smaller, faster, better, and lower cost,” he says.

Hu says the focus of his research these days is mostly on amorphous materials — whose atoms are randomly arranged as opposed to the orderly lattices of crystal structures — because crystalline materials have been so well-studied and understood. When it comes to amorphous materials, though, “our knowledge is amorphous,” he says. “There are lots of new discoveries in the field.”

Hu’s wife, Di Chen, whom he met when they were both in China, works in the financial industry. They have twin daughters, Selena and Eos, who are 1 year old, and a son Helius, age 3. Whatever free time he has, Hu says, he likes to spend doing things with his kids.

Recalling why he was drawn to MIT, he says, “I like this very strong engineering culture.” He especially likes MIT’s strong system of support for bringing new advances out of the lab and into real-world application. “This is what I find really useful.” When new ideas come out of the lab, “I like to see them find real utility,” he adds.

Deriving conversational insights from invoices with Amazon Textract, Amazon Comprehend, and Amazon Lex

Organizations across industries have a large number of physical documents, such as invoices, that they need to process. It is difficult to extract information from a scanned document when it contains tables, forms, paragraphs, and check boxes. Organizations have been addressing these problems with manual effort, custom code, or Optical Character Recognition (OCR) technology. However, that requires templates for form extraction and custom workflows.

Moreover, after extracting the text or content from a document, organizations want to derive insights from these receipts or invoices for their end users. However, that would require building a complex NLP model. Training the model would require a large amount of training data and compute resources, and building and training a machine learning model could be expensive and time-consuming.

Further, providing a human-like interface to interact with these documents is cumbersome for end users. These end users often call the help desk, which adds cost to the organization over time.

This post shows you how to use AWS AI services to automate text data processing and insight discovery. With AWS AI services such as Amazon Textract, Amazon Comprehend, and Amazon Lex, you can set up an automated serverless solution to address this requirement. We walk you through the following steps, with a short code sketch after the list:

  1. Extract text from receipts or invoices in PDF files or images with Amazon Textract.
  2. Derive insights with Amazon Comprehend.
  3. Interact with these insights in natural language using Amazon Lex.
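
The following minimal boto3 sketch shows how steps 1 and 2 map to API calls for a single image. The bucket and object names are placeholders, and the deployed solution in this post uses the asynchronous Amazon Textract APIs from Lambda instead.

import boto3

textract = boto3.client("textract")
comprehend = boto3.client("comprehend")

# Step 1: synchronous text detection on a JPEG/PNG invoice stored in S3
# (bucket and object names are placeholders).
response = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "my-invoice-bucket", "Name": "invoice.png"}}
)
text = "\n".join(
    block["Text"] for block in response["Blocks"] if block["BlockType"] == "LINE"
)

# Step 2: derive entities (dates, organizations, quantities) from the extracted text.
# Truncate to stay under the synchronous Comprehend text size limit.
entities = comprehend.detect_entities(Text=text[:4500], LanguageCode="en")
for entity in entities["Entities"]:
    print(entity["Type"], entity["Text"], round(entity["Score"], 2))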

Next, we go through the services and the architecture used to build this solution.

Services used

This solution uses the following AI services, serverless technologies, and managed services to implement a scalable and cost-effective architecture:

  • Amazon Cognito – Lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily.
  • AWS Lambda – Executes code in response to triggers such as changes in data, shifts in system state, or user actions. Because Amazon S3 can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems.
  • Amazon Lex – Provides an interface to create conversational chatbots.
  • Amazon Comprehend – NLP service that uses machine learning to find insights and relationships in text.
  • Amazon Textract – Uses ML to extract text and data from scanned documents in PDF, JPEG, or PNG formats.
  • Amazon Simple Storage Service (Amazon S3) – Serves as an object store for your documents and allows for central management with fine-tuned access controls.

Architecture

The following diagram illustrates the architecture of the solution.

The architecture contains the following steps:

  1. The backend user or administrator uses the AWS Management Console or AWS Command Line Interface (AWS CLI) to upload the PDF documents or images to an S3 bucket.
  2. The Amazon S3 upload triggers an AWS Lambda function.
  3. The Lambda function invokes the Amazon Textract StartDocumentTextDetection API, which sets up an asynchronous job to detect text from the PDF you uploaded.
  4. Amazon Textract notifies Amazon Simple Notification Service (Amazon SNS) when text processing is complete.
  5. A second Lambda function gets the notification from the SNS topic when the text detection job is complete.
  6. When this Lambda function is notified of job completion by Amazon SNS, it calls the Amazon Textract GetDocumentTextDetection API to retrieve the results of the asynchronous operation and loads them into an S3 bucket (see the sketch after this list).
  7. A Lambda function is used for fulfillment of the Amazon Lex intents. For a more detailed sequence of interactions, refer to the Building your chatbot step in the “Deploying the architecture with AWS CloudFormation” section.
  8. Amazon Comprehend uses ML to find insights and relationships in text. The Lambda function uses the boto3 APIs that Amazon Comprehend provides for entity and key phrase detection.
    1. In response to the bot’s welcome message, the user types “Show me the invoice summary.” This invokes the GetInvoiceSummary Lex intent, and the Lambda function invokes the Amazon Comprehend DetectEntities API to detect entities for fulfillment.
    2. When the user types “Get me the invoice details,” this invokes the GetInvoiceDetails intent, Amazon Lex prompts the user to enter the Invoice Number, and the Lambda function invokes the Amazon Comprehend DetectEntities API to return the Invoice Details message.
    3. When the user types “Can you show me the invoice notes for <invoice number>,” this invokes the GetInvoiceNotes intent, and the Lambda function invokes the Amazon Comprehend DetectKeyPhrases API to return comments associated with the invoice.
  9. You deploy the Lex Web UI in your AWS CloudFormation template by using an existing CloudFormation stack as a nested stack. To download the stack, see Deploy a Web UI for Your Chatbot. This nested stack deploys a Lex Web UI that is served as a static website from an S3 bucket. The web UI uses Amazon Cognito to generate an access token for authentication and uses AWS CodeStar to set up a delivery pipeline. End users interact with this chatbot web UI.
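
To make steps 5 and 6 concrete, here is a hedged sketch of what the second Lambda function could look like. The output bucket name and key prefix are assumptions for illustration, not the exact resources created by the CloudFormation template.

import json
import boto3

textract = boto3.client("textract")
s3 = boto3.resource("s3")

OUTPUT_BUCKET = "extracted-textfiles-bucket"  # placeholder; the stack creates its own bucket

def handler(event, context):
    # The SNS notification from Amazon Textract carries the JobId and Status.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    if message["Status"] != "SUCCEEDED":
        return

    # Page through the asynchronous job results and collect the detected lines.
    lines, next_token = [], None
    while True:
        kwargs = {"JobId": message["JobId"]}
        if next_token:
            kwargs["NextToken"] = next_token
        result = textract.get_document_text_detection(**kwargs)
        lines += [b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"]
        next_token = result.get("NextToken")
        if not next_token:
            break

    # Load the extracted text into the results bucket for the downstream steps.
    key = f"textract-output/{message['JobId']}.txt"
    s3.Object(OUTPUT_BUCKET, key).put(Body="\n".join(lines).encode("utf-8"))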

Deploying the architecture with AWS CloudFormation

You deploy a CloudFormation template to provision the necessary AWS Identity and Access Management (IAM) roles, services, and components of the solution, including Amazon S3, Lambda, Amazon Textract, Amazon Comprehend, and the Amazon Lex chatbot.

  1. Launch the following CloudFormation template in the US East (N. Virginia) Region:
  2. Don’t make any changes to the stack name or the botname parameter, InvoiceBot.
  3. In the Capabilities and transforms section, select all three check boxes to acknowledge that AWS CloudFormation may create IAM resources and expand the template.

For more information about these resources, see AWS IAM resources.

This template uses AWS Serverless Application Model (AWS SAM), which simplifies how to define functions and APIs for serverless applications, and also has features for these services, like environment variables.

  4. Choose Create stack.

The following screenshot of the Stack Detail page shows the status of the stack as CREATE_IN_PROGRESS. It can take up to 20 minutes for the status to change to CREATE_COMPLETE.

  5. On the Outputs tab, copy the values of LexLambaFunctionArn, AssetsUploadBucket, ExtractedTextfilesBucket, and LexUIWebAppUrl.
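
If you prefer to script the deployment, the following hedged boto3 sketch performs the equivalent steps. The template URL and the botname parameter key shown here are placeholders; use the values from the Launch Stack link and template above.

import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="invoice-bot",
    TemplateURL="https://example-bucket.s3.amazonaws.com/invoice-bot-template.yaml",  # placeholder
    Parameters=[{"ParameterKey": "Botname", "ParameterValue": "InvoiceBot"}],  # key name assumed
    # The three console check boxes map to these capabilities.
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM", "CAPABILITY_AUTO_EXPAND"],
)
cfn.get_waiter("stack_create_complete").wait(StackName="invoice-bot")

# Read the same values you would copy from the Outputs tab.
outputs = cfn.describe_stacks(StackName="invoice-bot")["Stacks"][0]["Outputs"]
print({o["OutputKey"]: o["OutputValue"] for o in outputs})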

Uploading documents to the S3 bucket

To upload your documents to your new S3 bucket, choose the S3 bucket URL corresponding to AssetsUploadBucket that you copied earlier. Upload a PDF or image to start the text extraction flow.

You can download the invoice used in this blog from the GitHub repo and upload it to the AssetsUploadBucket S3 URL. We recommend customizing this solution for your own invoice templates. For more information about uploading files, see How do I upload files and folders to an S3 bucket?
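
If you prefer to script the upload instead of using the console, a minimal boto3 call does the same thing; the bucket name below is a placeholder for the AssetsUploadBucket value you copied from the stack outputs.

import boto3

# Replace the bucket name with your AssetsUploadBucket value from the stack outputs.
boto3.client("s3").upload_file("invoice.pdf", "your-assets-upload-bucket", "invoice.pdf")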

After the upload completes, you can see the file on the Amazon S3 console on the Overview tab.

After you upload the file, the text is extracted from the document. To see an extracted file with the text, open the bucket by choosing the URL you copied earlier.

On the Overview tab, you can download the file and inspect the content to see if it’s the same as the text in the uploaded file.

Building your chatbot

We will use the following conversation to model the bot:

Bot: Welcome to InvoiceBot. You can ask me to provide your invoice summary, or details of your invoices, or your invoice notes
User: Show me the invoice summary
Bot: I reviewed your input documents and found 1 invoice with invoice numbers 35678-9 totaling $2100.0. I can get you invoice details or invoice notes. Simply type your request
User: Get me the invoice details
Bot: Please enter the invoice number
User: 35678-9
Bot: Invoice Details for 35678-9: On 5/10/1019 for the item One there is a charge of 1500.00. On 5/11/2019 for the item Merchant Two there is a charge of 100.00. On 5/12/2019 for the item Merchant Three there is a charge of 300.00. On 5/13/2019 for the item Merchant Three there is a charge of 200.00. You can request me for invoice notes or simply close this chat.
User: Can you show me the invoice notes for 35678-9
Bot: Invoice Notes for 35678-9: 5/13/2019 Merchant Three 200.00 Merchant Three 300.00 Laptop Office Supplies Merchant Two 100.00 Team Dinner Food 5/12/2019 5/11/2019 Desks and Office Supplies 5/10/1019 Merchant One 1500.00 Chairs . Feel free to try the options again or you can simply close this chat

We will build an Amazon Lex bot (InvoiceBot) with the following intents (a simplified sketch of the fulfillment Lambda function follows the list):

  • GetInvoiceSummary – Intent that’s invoked when the user requests the invoice summary. This is fulfilled by a Lambda function and returns the count of invoices available and the total amount of the invoices.
  • GetInvoiceDetails – Intent that’s invoked when the user requests the invoice details. This is fulfilled by a Lambda function and provides an item-level breakdown of the invoices, including date, quantity, and item details.
  • GetInvoiceNotes – Intent that’s invoked when the user requests the invoice notes. This is fulfilled by a Lambda function and provides notes from the uploaded invoices, with date and item description.
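
The actual fulfillment Lambda function is part of the CloudFormation stack and the GitHub repo. As a heavily simplified, illustrative sketch of how the three intents could map to Amazon Comprehend calls (the bucket, key, and InvoiceNumber slot name are assumptions), it might look like the following.

import boto3

comprehend = boto3.client("comprehend")
s3 = boto3.client("s3")

# Placeholders: the real bucket and key come from the deployed stack's resources.
TEXT_BUCKET, TEXT_KEY = "extracted-textfiles-bucket", "textract-output/invoice.txt"

def close(message):
    # Amazon Lex (V1) fulfillment response that closes the conversation.
    return {"dialogAction": {"type": "Close", "fulfillmentState": "Fulfilled",
                             "message": {"contentType": "PlainText", "content": message}}}

def handler(event, context):
    intent = event["currentIntent"]["name"]
    text = s3.get_object(Bucket=TEXT_BUCKET, Key=TEXT_KEY)["Body"].read().decode("utf-8")
    text = text[:4500]  # stay under the synchronous Comprehend text size limit

    if intent == "GetInvoiceNotes":
        phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")["KeyPhrases"]
        return close("Invoice notes: " + ", ".join(p["Text"] for p in phrases[:10]))

    # GetInvoiceSummary and GetInvoiceDetails both rely on entity detection.
    entities = comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]
    if intent == "GetInvoiceDetails":
        invoice_number = event["currentIntent"]["slots"].get("InvoiceNumber")  # slot name assumed
        details = [e["Text"] for e in entities if e["Type"] in ("DATE", "ORGANIZATION", "QUANTITY")]
        return close(f"Invoice details for {invoice_number}: " + ", ".join(details[:15]))

    amounts = [e["Text"] for e in entities if e["Type"] == "QUANTITY"]
    return close(f"I found {len(amounts)} amounts across your invoice documents.")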

Publishing your chatbot

As described in the solution overview earlier, you use an Amazon Lex chatbot (InvoiceBot) to interact with the insights Amazon Comprehend derives from the text Amazon Textract extracts.

To publish your chatbot, complete the following steps:

  1. On the Amazon Lex console, choose Bots.
  2. Choose the chatbot you created.
  3. Under Intents, choose GetInvoiceSummary.
  4. Under Fulfillment, select your Lambda function.
  5. Search for the function by entering LexLambdaFunction and selecting the result.

A pop-up box appears.

  6. Choose OK.
  7. Choose Save intent.
  8. Repeat these steps for the remaining two intents, GetInvoiceDetails and GetInvoiceNotes.
  9. Choose Build.
  10. When the build is complete, choose Publish.
  11. For Create an alias, enter Latest. You can consider a different name; names like test, dev, beta, or prod primarily refer to the environment of the bot.
  12. Choose Publish.

The following page opens after the bot is published.

  13. Choose Close.

Using the chatbot

Your chatbot is now ready to use. Navigate to the URL LexUIWebAppUrl copied from the AWS CloudFormation Outputs tab. The following screenshots show the user conversation with the bot (read from left to right):

Conclusion

This post demonstrated how to create a conversational chatbot in Amazon Lex that enables interaction with insights that Amazon Comprehend derives from text Amazon Textract extracts from images or PDF documents. The code from this post is available on the GitHub repo for you to use and extend. We are interested to hear how you would like to apply this solution to your use case. Please share your thoughts and questions in the comments section.


About the Authors

Mona Mona is an AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with the Worldwide Public Sector team and helps customers adopt machine learning at a large scale. She is passionate about NLP and ML explainability in AI/ML.

Prem Ranga is an Enterprise Solutions Architect based out of Houston, Texas. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an autonomous vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.

Saida Chanda is a Senior Partner Solutions Architect based out of Seattle, WA. He is a technology enthusiast who drives innovation through AWS partners to meet customers’ complex business requirements with simple solutions. His areas of interest are ML and DevOps. In his spare time, he likes to spend time with family and explore his inner self through meditation.

SpineNet: A Novel Architecture for Object Detection Discovered with Neural Architecture Search

Posted by Xianzhi Du, Software Engineer and Jaeyoun Kim, Technical Program Manager, Google Research

Convolutional neural networks created for image tasks typically encode an input image into a sequence of intermediate features that capture the semantics of an image (from local to global), where each subsequent layer has a lower spatial dimension. However, this scale-decreased model may not be able to deliver strong features for multi-scale visual recognition tasks where recognition and localization are both important (e.g., object detection and segmentation). Several works including FPN and DeepLabv3+ propose multi-scale encoder-decoder architectures to address this issue, where a scale-decreased network (e.g., a ResNet) is taken as the encoder (commonly referred to as a backbone model). A decoder network is then applied to the backbone to recover the spatial information.

While this architecture has yielded improved success for image recognition and localization tasks, it still relies on a scale-decreased backbone that throws away spatial information by down-sampling, which the decoder then must attempt to recover. What if one were to design an alternate backbone model that avoids this loss of spatial information, and is thus inherently well-suited for simultaneous image recognition and localization?

In our recent CVPR 2020 paper “SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization”, we propose a meta architecture called a scale-permuted model that enables two major improvements on backbone architecture design. First, the spatial resolution of intermediate feature maps should be able to increase or decrease anytime so that the model can retain spatial information as it grows deeper. Second, the connections between feature maps should be able to go across feature scales to facilitate multi-scale feature fusion. We then use neural architecture search (NAS) with a novel search space design that includes these features to discover an effective scale-permuted model. We demonstrate that this model is successful in multi-scale visual recognition tasks, outperforming networks with standard, scale-reduced backbones. To facilitate continued work in this space, we have open sourced the SpineNet code to the TensorFlow TPU GitHub repository in TensorFlow 1 and the TensorFlow Model Garden GitHub repository in TensorFlow 2.

A scale-decreased backbone is shown on the left and a scale-permuted backbone is shown on the right. Each rectangle represents a building block. Colors and shapes represent different spatial resolutions and feature dimensions. Arrows represent connections among building blocks.

Design of SpineNet Architecture
To efficiently design the architecture for SpineNet and avoid a time-intensive manual search for what is optimal, we leverage NAS to determine an optimal architecture. The backbone model is learned on the object detection task using the COCO dataset, which requires simultaneous recognition and localization. During architecture search, we learn three things:

  • Scale permutations: The orderings of network building blocks are important because each block can only be built from those that already exist (i.e., with a “lower ordering”). We define the search space of scale permutations by rearranging intermediate and output blocks, respectively.
  • Cross-scale connections: We define two input connections for each block in the search space. The parent blocks can be any block with a lower ordering or a block from the stem network.
  • Block adjustments (optional): We allow the block to adjust its scale level and type (a toy sketch of these three searchable components follows the figure below).

The architecture search process from a scale-decreased backbone to a scale-permuted backbone.
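
The released implementation lives in the TensorFlow repositories linked above. As a purely illustrative toy sketch (not Google's code), the following Python snippet encodes a candidate scale-permuted backbone as an ordered list of blocks, where each block picks a scale level, a block type, and two parents that must come from the stem or from earlier (lower-ordered) blocks:

import random

STEM_BLOCKS = [(0, 2), (1, 2)]         # (index, feature level) of the stem outputs
CANDIDATE_LEVELS = [2, 3, 4, 5, 6, 7]  # spatial scales the search may assign

def sample_candidate(num_blocks=5, seed=0):
    rng = random.Random(seed)
    blocks = []
    for i in range(num_blocks):
        index = len(STEM_BLOCKS) + i
        level = rng.choice(CANDIDATE_LEVELS)                 # scale permutation
        block_type = rng.choice(["residual", "bottleneck"])  # block adjustment
        # Cross-scale connections: both parents must already exist (lower ordering).
        existing = [b[0] for b in STEM_BLOCKS] + [b["index"] for b in blocks]
        parents = rng.sample(existing, 2)
        blocks.append({"index": index, "level": level,
                       "type": block_type, "parents": parents})
    return blocks

for block in sample_candidate():
    print(block)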

Taking the ResNet-50 backbone as the seed for the NAS search, we first learn scale-permutation and cross-scale connections. All candidate models in the search space have roughly the same computation as ResNet-50 since we just permute the ordering of feature blocks to obtain candidate models. The learned scale-permuted model outperforms ResNet-50-FPN by +2.9% average precision (AP) in the object detection task. The efficiency can be further improved (-10% FLOPs) by adding search options to adjust scale and type (e.g., residual block or bottleneck block, used in the ResNet model family) of each candidate feature block.

We name the learned 49-layer scale-permuted backbone architecture SpineNet-49. SpineNet-49 can be further scaled up to SpineNet-96/143/190 by repeating blocks two, three, or four times and increasing the feature dimension. An architecture comparison between ResNet-50-FPN and the final SpineNet-49 is shown below.

The architecture comparison between a ResNet backbone (left) and the SpineNet backbone (right) derived from it using NAS.

Performance
We demonstrate the performance of SpineNet models through comparison with ResNet-FPN. Using similar building blocks, SpineNet models outperform their ResNet-FPN counterparts by ~3% AP at various scales while using 10-20% fewer FLOPs. In particular, our largest model, SpineNet-190, achieves 52.1% AP on COCO for a single model without multi-scale testing during inference, significantly outperforming prior detectors. SpineNet also transfers to classification tasks, achieving a 5% top-1 accuracy improvement on the challenging iNaturalist fine-grained dataset.

Performance comparisons of SpineNet models and ResNet-FPN models adopting the RetinaNet detection framework on COCO bounding box detection.
Performance comparisons of SpineNet models and ResNet models on ImageNet classification and iNaturalist fine-grained image classification.

Conclusion
In this work, we identify that the conventional scale-decreased model, even with a decoder network, is not effective for simultaneous recognition and localization. We propose the scale-permuted model, a new meta-architecture, to address the issue. To prove the effectiveness of scale-permuted models, we learn SpineNet by Neural Architecture Search in object detection and demonstrate it can be used directly in image classification. In the future, we hope the scale-permuted model will become the meta-architecture design of backbones across many visual tasks beyond detection and classification.

Acknowledgements
Special thanks to the co-authors of the paper: Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, and Xiaodan Song. We also would like to acknowledge Yeqing Li, Youlong Cheng, Jing Li, Jianwei Xie, Russell Power, Hongkun Yu, Chad Richards, Liang-Chieh Chen, Anelia Angelova, and the larger Google Brain Team for their help.

Heart of the Matter: AI Helps Doctors Navigate Pandemic

A month after it got FDA approval, a startup’s first product was saving lives on the front lines of the battle against COVID-19.

Caption Health develops software for ultrasound systems, called Caption AI. It uses deep learning to empower medical professionals, including those without prior ultrasound experience, to perform echocardiograms quickly and accurately. 

The results are images of the heart often worthy of an expert sonographer that help doctors diagnose and treat critically ill patients.

The coronavirus pandemic provided plenty of opportunities to try out the first dozen systems. Two doctors who used the new tool shared their stories on the condition that their patients remain anonymous.

In March, a 53-year-old diabetic woman with COVID-19 went into cardiac shock in a New York hospital. Without the images from Caption AI, it would have been difficult to clinch the diagnosis, said a doctor on the scene.

The system helped the physician identify heart problems in an 86-year-old man with the virus in the same hospital, helping doctors bring him back to health. It was another case among more than 200 in the facility that was effectively turned into a COVID-19 hospital this spring.

The Caption Health system made a tremendous impact for a staff spread thin, said the doctor. It would have been hard for a trained sonographer to keep up with the demand for heart exams, he added.

Heart Test Becomes Standard Procedure

Caption AI helped doctors in North Carolina determine that a 62-year-old man had COVID-19-related heart damage. Thanks, in part, to the ease of using the system, the hospital now performs echocardiograms for most patients with the virus.

At the height of the pandemic’s first wave, the hospital stationed ultrasound systems with Caption AI in COVID-19 wards. Rather than sending sonographers from unit to unit, which is the usual practice, staff stationed at the wards used the systems. The change reduced staff exposure to the virus and conserved precious protective gear. 

Beyond the pandemic, the system will help hospitals provide urgent services while keeping a lid on rising costs, said a doctor at that hospital.

“AI-enabled machines will be the next big wave in taking care of patients wherever they are,” said Randy Martin, chief medical officer of Caption Health and emeritus professor of cardiology at Emory University, in Atlanta.

Martin joined the startup about four years ago after meeting its founders, who shared expertise and passion for medicine and AI. Today their software “takes a user through 10 standard views of the heart, coaching them through some 90 fine movements experts make,” he said.

“We don’t intend to replace sonographers; we’re just expanding the use of portable ultrasound systems to the periphery for more early detection,” he added.

Coping with Unexpected Demand Spike

In the early days of the pandemic, that expansion couldn’t come fast enough.

In late March, the startup exhausted supplies that included NVIDIA Quadro P3000 GPUs that ran its AI software. In the early days of the global shutdown, the startup reached out to its supply chain.

“We are experiencing overwhelming demand for our product,” the company’s CEO wrote, after placing orders for 100 GPUs with a distributor.

Caption Health has systems currently in use at 11 hospitals. It expects to deploy Caption AI at several additional sites in the coming weeks. 

GPUs at the Heart of Automated Heart Tests

The startup currently integrates its software in a portable ultrasound from Terason. It intends to partner with more ultrasound makers in the future. And it advises partners to embed GPUs in their future ultrasound equipment.

The Quadro P3000 in Caption AI runs real-time inference tasks using deep convolutional neural networks. They provide operators guidance in positioning a probe that captures images. Then they automatically choose the highest-quality heart images and interpret them to help doctors make informed decisions.

The NVIDIA GPU also freed up four CPU cores, making space to process other tasks on the system, such as providing a smooth user experience.

The startup trained its AI models on a database of 1 million echocardiograms from clinical partners. An early study in partnership with Northwestern Medicine and the Minneapolis Heart Institute showed Caption AI helped eight registered nurses with no prior ultrasound experience capture highly accurate images on a wide variety of patients.

Inception Program Gives Startup Momentum

Caption Health, formerly called Bay Labs, was founded in 2015 in Brisbane, Calif. It received a $125,000 prize at a 2017 GTC competition for members of NVIDIA’s Inception program, which gives startups access to technology, expertise and markets.

“Being part of the Inception program has provided us with increased recognition in the field of deep learning, a platform to share our AI innovations with healthcare and deep learning communities, and phenomenal support getting NVIDIA GPUs into our supply chain so we could deliver Caption AI,” said Charles Cadieu, co-founder and president of Caption Health.

Now that its tool has been tested in a pandemic, Caption Health looks forward to opportunities to help save lives across many ailments. The company aims to ride a trend toward more portable systems that extend availability and lower costs of diagnostic imaging.

“We hope to see our technology used everywhere from big hospitals to rural villages to examine people for a wide range of medical conditions,” said Cadieu.

To learn more about Caption Health and other companies like it, watch this webinar on healthcare startups working against COVID-19.

The post Heart of the Matter: AI Helps Doctors Navigate Pandemic appeared first on The Official NVIDIA Blog.

NVIDIA Puts More Tools in Hands of Artists, Designers and Data Scientists Working Remotely

For many organizations, the coronavirus pandemic has created a permanent shift in how their employees work. From now on, they’ll have the option to collaborate at home or in the office.

NVIDIA is giving these millions of professionals around the world a boost with a new version of our virtual GPU software, vGPU July 2020. The software adds support for more workloads and is loaded with features that improve operational efficiencies for IT administrators.

GPU virtualization is key to offering everyone from designers to data scientists a flexible way to collaborate on projects that require advanced graphics and computing power, wherever they are.

Employee productivity was the primary concern among organizations addressing remote work due to the COVID-19 pandemic, according to recent research by IDC. When the market intelligence firm interviewed NVIDIA customers using GPU-accelerated virtual desktops, it found organizations with 500-1,000 users experienced a 13 percent increase in productivity, resulting in more than $1 million in annual savings.

According to Alex Herrera, an analyst with Jon Peddie Research/Cadalyst, “In a centralized computing environment with virtualized GPU technology, users no longer have to be tied to their physical workstations. As proven recently through remote work, companies can turn on a dime, enabling anywhere/anytime access to big data without compromising on performance.”

Expanded Support in the Data Center and Cloud with SUSE

NVIDIA has expanded hypervisor support by partnering with SUSE on its Linux Enterprise Server, providing vGPU support on its kernel-based virtual machine platform.

Initial offerings will be supported with NVIDIA vComputeServer software, enabling GPU virtualization for AI and data science workloads. This will expand hypervisor platform options for enterprises and cloud service providers that are seeing an increased need to support GPUs.

“Demand for accelerated computing has grown beyond specialized HPC environments into virtualized data centers,” said Brent Schroeder, global chief technology officer at SUSE. “To ensure the needs of business leaders are met, SUSE and NVIDIA have worked to simplify the use of NVIDIA virtual GPUs in SUSE Linux Enterprise Server. These efforts modernize the IT infrastructure and accelerate AI and ML workloads to enhance high-performance and time-sensitive workloads for SUSE customers everywhere.”

Added Support for Immersive Collaboration

NVIDIA CloudXR technology uses NVIDIA RTX and vGPU software to deliver VR and augmented reality across 5G and Wi-Fi networks. vGPU July 2020 adds 120Hz VSync support at resolutions up to 4K, giving CloudXR users an even smoother immersive experience on untethered devices. It creates a level of fidelity that’s indistinguishable from native tethered configurations.

“Streaming AR/VR over Wi-Fi or 5G enables organizations to truly take advantage of its benefits, enabling immersive training, product design and architecture and construction,” said Matt Coppinger, director of AR/VR at VMware. “We’re partnering with NVIDIA to more securely deliver AR and VR applications running on VMware vSphere and NVIDIA Quadro Virtual Workstation, streamed using NVIDIA CloudXR to VMware’s Project VXR client application running on standalone headsets.”

The latest release of vGPU enables a better user experience and manageability needed for demanding workloads like the recently debuted Omniverse AEC Experience, which combines Omniverse, a real-time collaboration platform, with RTX Server and NVIDIA Quadro Virtual Workstation software for the data center. The reference design supports up to two virtual workstations on an NVIDIA Quadro RTX GPU, running multiple workloads such as collaborative, computer-aided design while also providing real-time photorealistic rendering of the model.

With Quadro vWS, an Omniverse-enabled virtual workstation can be provisioned in minutes to new users, anywhere in the world. Users don’t need specialized client hardware, just an internet-connected device, laptop or tablet, and data remains highly secured in the data center.

Improved Operational Efficiency for IT Administrators

New features in vGPU July 2020 help enterprise IT admins and cloud service providers streamline management, boosting their operational efficiency.

This includes cross-branch support, where the host and guest vGPU software can be on different versions, easing upgrades and large deployments.

IT admins can move quicker to the latest hypervisor versions to pick up fixes, security patches and new features, while staggering deployments for end-user images.

Enterprise data centers running VMware vSphere will see improved operational efficiency by having the ability to manage vGPU-powered VMs with the latest release of VMware vRealize Operations.

As well, VMware recently added Distributed Resource Scheduler support for GPU-enabled VMs into vSphere. Now, vSphere 7 introduces a new feature called “Assignable Hardware,” which enhances initial placement so that a VM can be automatically “placed” on a host that has exactly the right GPU and profile available before powering it on.

For IT managing large deployments, this means reducing deployment time of new VMs to a few minutes, as opposed to a manual process that can take hours. As well, this feature works with VMware’s vSphere High Availability, so if a host fails for any reason, a GPU-enabled VM can be automatically restarted on another host with the right GPU resources.

Availability

NVIDIA vGPU July 2020 release is coming soon. Learn more at nvidia.com/virtualization and watch this video.

The post NVIDIA Puts More Tools in Hands of Artists, Designers and Data Scientists Working Remotely appeared first on The Official NVIDIA Blog.

The MIT Press and UC Berkeley launch Rapid Reviews: COVID-19

The MIT Press has announced the launch of Rapid Reviews: COVID-19 (RR:C19), an open access, rapid-review overlay journal that will accelerate peer review of Covid-19-related research and deliver real-time, verified scientific information that policymakers and health leaders can use.

Scientists and researchers are working overtime to understand the SARS-CoV-2 virus and are producing an unprecedented amount of preprint scholarship that is publicly available online but has not been vetted yet by peer review for accuracy. Traditional peer review can take four or more weeks to complete, but RR:C19’s editorial team, led by Editor-in-Chief Stefano M. Bertozzi, professor of health policy and management and dean emeritus of the School of Public Health at the University of California at Berkeley, will produce expert reviews in a matter of days.

Using artificial intelligence tools, a global team will identify promising scholarship in preprint repositories, commission expert peer reviews, and publish the results on an open access platform in a completely transparent process. The journal will strive for disciplinary and geographic breadth, sourcing manuscripts from all regions and across a wide variety of fields, including medicine; public health; the physical, biological, and chemical sciences; the social sciences; and the humanities. RR:C19 will also provide a new publishing option for revised papers that are positively reviewed.

Amy Brand, director of the MIT Press, sees the no-cost open access model as a way to increase the impact of global research and disseminate high-quality scholarship. “Offering a peer-reviewed model on top of preprints will bring a level of diligence that clinicians, researchers, and others worldwide rely on to make sound judgments about the current crisis and its amelioration,” says Brand. “The project also aims to provide a proof-of-concept for new models of peer-review and rapid publishing for broader applications.”

Made possible by a $350,000 grant from the Patrick J. McGovern Foundation and hosted on PubPub, an open-source publishing platform from the Knowledge Futures Group for collaboratively editing and publishing journals, monographs, and other open access scholarly content, RR:C19 will limit the spread of misinformation about Covid-19, according to Bertozzi.

“There is an urgent need to validate — or debunk — the rapidly growing volume of Covid-19-related manuscripts on preprint servers,” explains Bertozzi. “I’m excited to be working with the MIT Press, the Patrick J. McGovern Foundation, and the Knowledge Futures Group to create a novel publishing model that has the potential to more efficiently translate important scientific results into action. We are also working with COVIDScholar, an initiative of UC Berkeley and Lawrence Berkeley National Lab, to create unique AI/machine learning tools to support the review of hundreds of preprints per week.”

“This project signals a breakthrough in academic publishing, bringing together urgency and scientific rigor so the world’s researchers can rapidly disseminate new discoveries that we can trust,” says Vilas Dhar, trustee of the Patrick J. McGovern Foundation. “We are confident the RR:C19 journal will quickly become an invaluable resource for researchers, public health officials, and healthcare providers on the frontline of this pandemic. We’re also excited about the potential for a long-term transformation in how we evaluate and share research across all scientific disciplines.”

On the collaboration around this new journal, Travis Rich, executive director of the Knowledge Futures Group, notes, “At a moment when credibility is increasingly crucial to the well-being of society, we’re thrilled to be partnering with this innovative journal to expand the idea of reviews as first-class research objects, both on PubPub and as a model for others.”

RR:C19 will publish its first reviews in July 2020 and is actively recruiting potential reviewers and contributors. To learn more about this project and its esteemed editorial board, visit rapidreviewscovid19.mitpress.mit.edu.

Reflecting on Pride: How five Facebook researchers honor their LGBTQ+ history

The LGBTQ+ community has a long history of resilience and activism in the fight toward acceptance and equal rights in the United States. Pride Month is celebrated every June to honor the 1969 Stonewall Uprising in Manhattan and activists such as Marsha P. Johnson and Sylvia Rivera. This year, the 50th anniversary of Pride coincides with an increased swell of support for the fight against racial injustice and the Black Lives Matter movement, with protests and demonstrations occurring in every state across the U.S. and in countries around the world.

To reflect on the history of Pride Month and its roots in activism, we reached out to the LGBTQ+ community at Facebook. Researchers Gregory Davis, Meghan Rolfe, Darwin Mastin, TJ Olojede, and Hannah Furnas each volunteered their time to share what Pride means to them, how their research influences our products, and how they’re recognizing Pride this year.

Designing a product to bring our authentic selves

Gregory Davis (he/him) is a UX Researcher working on Portal.

Portal allows us to connect with significant others from all aspects of our lives. Making sure people are comfortable is vitally important to that goal. As a UX Researcher, I work on what Portal users need to be able to bring their multiple selves to the device. For LGBTQ+ users, these questions take on enhanced importance.

Pride, to me, is about celebrating the things about you that people can’t see and that many don’t want to see. The freedom to be out — to be all of our identities all of the time — is a gift that LGBTQ+ people cherish, given to us by our queer fore-parents and paid for with the blood, sweat, and tears of their activism and resistance. That activism tipped the scales toward equality with the Stonewall Uprising in June of 1969 when members of the LGBTQ+ community protested against the frequent police raids on the Stonewall Inn — a fight that is extremely relevant to today’s protests against police brutality.

Because queer people fought back at Stonewall and galvanized on the streets, in their homes, and at the ballot, we celebrate Pride Month every June. We celebrate winning marriage equality, protection against discrimination, and the ability to live our lives openly and honestly. That work isn’t done, however. In 2018 and 2019, at least two transgender or gender-nonconforming people were murdered each month. Most of these victims were Black trans women. Black LGBTQ+ people are still in the fight for respect from their families, recognition and equity at work, and safety from state violence.

When I look at my work as a bisexual Black man here at Facebook and beyond, I bring that history and knowledge with me. I design and implement projects at Portal thinking about the consumer in all their facets, including their race and sexuality. This helps us create a better product for everybody by making sure no one is excluded or neglected.

Working toward a safer platform for people to live their truth

Meghan Rolfe (she/her) is a UX Researcher working on Community Integrity.

Pride, to me, represents that journey we take that ends with the open-armed embrace of the LGBTQ+ community and the feeling that we are all in this together — that we see one another. I believe many of us grew up with a deep pain caused by the feeling that we are the “Other” in society, internalizing a deep-rooted fear of rejection and staying tightly in the closet. For me, Pride is about the release of that rejection and the overwhelming joy you feel once you can live your truth.

My work in Community Integrity relates to this. Part of my role is understanding the potential benefits and harms of identity verification on our platforms, as well as the steps we can take to support individuals regardless of identity, documentation status, or membership in marginalized groups. Many people use our platforms to find a community where they can safely express their authentic selves. Transgender people in particular are often able to be their true selves online before they’ve come out to their family and friends. This is a wonderful use of our products, and we should find ways to support this even more.

However, there is a flip side to this. Like many other companies, our security systems are built around identity verification: If someone is hacked, they are asked for government-issued documents that can confirm they are who they say they are. This means that for those who use a different identity online — even if that identity is their most authentic — an exact match with government-issued documents is expected, which makes it difficult to resolve disparities between on- and offline identities. Based on prior feedback, we’ve changed our policy to allow a wider range of documents beyond government-issued ID; even so, we are currently conducting additional research on this experience to understand how we can better support individuals with different on- and offline identities.

Identity verification also allows us to hold people accountable for any violations of our Community Standards, such as bullying and harassment. It’s important that we provide victims with the ability to report not just the accounts responsible for harassing behaviors, but also the individuals behind those accounts. By creating systems of accountability, we can better protect members of the LGBTQ+ community from both online and offline attacks.

This Pride, we must not only remember the LGBTQ+ leaders who fought for us to be able to live our truths, but also remind ourselves that this fight continues.

Listening, learning, and teaching with empathy

Darwin Mastin (he/him; they/them) is a UX Researcher working on pathfinding.

As a human behavior researcher, I love to learn about what drives people. I want to understand our unconscious actions, and make this knowledge available through stories and products. My research at Facebook is focused on understanding current and future gaps within the Facebook app and the company. We are not perfect, but I think research can influence the products and company by bringing other necessary and underrepresented voices to the table.

To me, Pride doesn’t stop at being proud. In addition to celebrating ourselves and our community, we must continue to stand up for our community and have the support of our allies in doing so. We all need to focus on listening, learning, and being involved — because our celebrations of Pride were born from similar calls for justice by queer trans people of color.

One of the ways we can help is educating others about the issues that marginalized communities face. The LGBTQ+ community spans every demographic group — race, age, education level, and so on. We can’t make assumptions about anyone else’s experience; we need to reach out and listen and learn, because all of our histories are so different and so broad, and coming together to celebrate and understand these differences makes for a stronger community.

However, while it is necessary to do the work of educating others, it should not be the sole responsibility of marginalized communities. I’ve found that when those who are less informed are able to attach to a story or an experience, it drives empathy and inspires them to want to learn more rather than just learn now. But once inspired, new allies must share the burden, internalize their learning, and educate others. It’s important that our allies take a moment and have those difficult conversations. It will be hard, but that’s where it starts.

A movement is not a moment. It is action and reaction, and building on that over and over again. Change will come from listening to our broader communities, giving voice to people who have not been heard. It’s not just something today and not tomorrow. We know the next step is voting, representation, policy — those steps will follow from the public’s demands. Those are the building blocks to pride.

Fostering inclusion through work and at work

TJ Olojede (he/him) is a Creative Researcher in Facebook’s Creative Shop.

As the Creative Research team within Facebook’s Creative Shop, our focus is on elevating creativity in advertising on the platform and making sure that advertisers utilize creativity maximally to achieve better business outcomes. Our research helps us understand what creative strategies perform better, and we share those best practices with all of the businesses who advertise on Facebook. I like to think that in this way, we make Facebook advertising more inclusive and accessible to the everyday “pop-and-pop” shop and to businesses big and small.

Pride Month feels bittersweet to me this year. It is an interesting time to exist at the intersection of being gay and Black in America, even as an immigrant. Often these two identities exist in conflict and influence how much I feel like I belong to either group, since I’m still an “other” within each community. When I first moved to the U.S., I was excited to leave Nigeria, and to be somewhere where LGBTQ+ rights were leaps and bounds above anything back home — even though not optimal. And then I started to understand what it meant to be Black in America, and I remember thinking I had just exchanged freedom from one oppression for another.

When the more recent Black Lives Matter protests started after the murder of George Floyd, it felt clear to me how to feel. With June being Pride Month, however, it didn’t feel right to celebrate Pride and be happy in the midst of all that was going on. Even worse, I felt betrayed seeing all of my non-POC friends who hadn’t said anything about BLM suddenly want to celebrate Pride.

But at work, I appreciated that the Pride team was sensitive and empathetic enough to hold off on all of the Pride fanfare in the middle of the protests, and that to me spoke volumes about how much we care about each other within the company. I appreciate that I work with a team of inclusive, caring people who make work a safe space and engender that sense of belonging and emotional closeness. For me, inclusion boils down to feeling like I matter, like I belong here, and that there are others here like me.

Acquiring a more complete picture of our community

Hannah Furnas (she/her) is a Research Science Manager on the Demography and Survey Science team.

At Facebook, I support a team of researchers working on projects at the intersection of survey research and machine learning. We design projects to collect survey ground truth that’s used to train and evaluate machine learning models.

To me, Pride means embracing my own queer identity and showing up for the LGBTQ+ community. I’m continuing to embrace my own queer identity in an ongoing process of showing up more fully for myself so that I can show up for others. I’m intentionally expanding my understanding of what it means to belong to the LGBTQ+ community — which includes noticing and unlearning a lot of what my upbringing has taught me.

I grew up in a very white, cis-normative, heteronormative environment. People, structures, and institutions praised heterosexual couples and shamed other types of relationships. When I came out two years ago, a lot of the pushback I received was from people who couldn’t fathom why I wanted to come out as bi/pan since I was in a relationship with a cis man. This idea that I should hide who I am is one of the reasons it took me so long to come out to myself. Through support from my colleagues in the Pride community at Facebook, I’ve begun to truly embrace who I am.

Not only did my early socialization impact my coming-out experience, but it also gave me an incomplete picture of LGBTQ+ history and the current issues we face. I was exposed to media that suggested the work was done and equal rights were won. This obviously isn’t the case. Systemic discrimination and violence disproportionately impact BIPOC and trans communities despite the fact that our movement wouldn’t exist today without trans activists like Marsha P. Johnson and Sylvia Rivera.

If I fail to acknowledge the full lived experiences of others in our community, then I’m also upholding the structures and systems that continue to oppress the LGBTQ+ community. I’m continuing this lifelong work of building awareness and taking action to embrace ourselves and our community more fully. To me, this is what Pride is all about — this year and every year.

Diversity is crucial to understanding where we’re succeeding and where we need to do better in our business. It enables us to build better products, make better decisions, and better serve our communities. We are proud of our attention to the LGBTQ+ experience across our apps and technologies, often thanks to the many LGBTQ+ people and allies who work at Facebook.

To learn more about Diversity at Facebook, visit our website.

The post Reflecting on Pride: How five Facebook researchers honor their LGBTQ+ history appeared first on Facebook Research.

How Euler Hermes detects typo squatting with Amazon SageMaker

This is a guest post from Euler Hermes. In their own words, “For over 100 years, Euler Hermes, the world leader in credit insurance, has accompanied its clients to provide simpler and safer digital products, thus becoming a key catalyzer in the world’s commerce.”

Euler Hermes manages more than 600,000 B2B transactions per month and performs data analytics on over 30 million companies worldwide. At-scale artificial intelligence and machine learning (ML) have become the heart of the business.

Euler Hermes uses ML across a variety of use cases. One recent example is typo squatting detection, which came about after an ideation workshop between the Cybersecurity and IT Innovation teams to better protect clients. As it turns out, moving from idea to production has never been easier when your data is in the AWS Cloud and you can put the right tools in the hands of your data scientists in minutes.

Typo squatting, or hijacking, is a form of cybersecurity attack. It consists of registering internet domain names that closely resemble legitimate, reputable, and well-known ones with the goal of phishing scams, identity theft, advertising, and malware installation, among other potential issues. The sources of typo squatting can be varied, including different top-level domains (TLD), typos, misspellings, combo squatting, or differently phrased domains.

The challenge we faced was building an ML solution to quickly detect any suspicious domains registered that could be used to exploit the Euler Hermes brand or its products.

To simplify the ML workflow and reduce time-to-market, we opted to use Amazon SageMaker. This fully managed AWS service was a natural choice due to the ability to easily build, train, tune, and deploy ML models at scale without worrying about the underlying infrastructure while being able to integrate with other AWS services such as Amazon Simple Storage Service (Amazon S3) or AWS Lambda. Furthermore, Amazon SageMaker meets the strict security requirements necessary for financial services companies like Euler Hermes, including support for private notebooks and endpoints, encryption of data in transit and at rest, and more.

Solution overview

To build and tune ML models, we used Amazon SageMaker notebooks as the main working tool for our data scientists. The idea was to train an ML model to recognize domains related to Euler Hermes. To accomplish this, we worked on the following two key steps: dataset construction and model building.

Dataset construction

Every ML project requires a lot of data, and our first objective was to build the training dataset.

The dataset of negative examples was composed of 1 million entries randomly picked from Alexa, Umbrella, and publicly registered domains, whereas the dataset of 1 million positive examples was created with a domain generation algorithm (DGA) using Euler Hermes’s internal domains.

Model building and tuning

One of the project’s biggest challenges was to decrease the number of false positives to a minimum. On a daily basis, we need to unearth domains related to Euler Hermes from a large dataset of approximately 150,000 publicly registered domains.

We tried two approaches: classical ML models and deep learning.

We considered various models for classical ML, including Random Forest, logistic regression, and gradient boosting (LightGBM and XGBoost). For these models, we manually created more than 250 features. After an extensive feature-engineering phase, we selected the following as the most relevant (a small sketch of a few of these features follows the list):

  • Number of FQDN levels
  • Vowel ratio
  • Number of characters
  • Bag of n-grams (top 50 n-grams)
  • TF-IDF features
  • Latent Dirichlet allocation features
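
As an illustrative sketch only (the production pipeline computes more than 250 features, and the exact definitions are not published here), a few of the simplest features could be computed like this:

import re
from collections import Counter

def simple_domain_features(domain):
    # A handful of hand-crafted features; the real pipeline has 250+.
    name = domain.lower().strip(".")
    letters = re.sub(r"[^a-z]", "", name)
    vowels = sum(ch in "aeiou" for ch in letters)
    features = {
        "num_fqdn_levels": name.count(".") + 1,  # e.g. shop.example.com -> 3
        "num_characters": len(name),
        "vowel_ratio": vowels / max(len(letters), 1),
    }
    # Character n-gram counts (bag of n-grams); TF-IDF weighting would be
    # applied across the full dataset, not per domain.
    trigrams = Counter(name[i:i + 3] for i in range(len(name) - 2))
    features.update({f"3gram_{g}": c for g, c in trigrams.most_common(5)})
    return features

print(simple_domain_features("eu1er-hermes-login.com"))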

For deep learning, we decided to work with recurrent neural networks. The model we adopted was a Bidirectional LSTM (BiLSTM) with an attention layer. We found this model to be the best at extracting a URL’s underlying structure.

The following diagram shows the architecture designed for the BiLSTM model. To avoid overfitting, a Dropout layer was added.

The following code assembles these layers:

import tensorflow as tf
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Reshape, Dense
from keras_self_attention import SeqSelfAttention  # attention layer from the keras-self-attention package

def AttentionModel_(vocab_size, input_length, hidden_dim):
    model = tf.keras.models.Sequential()
    # Embed each character of the domain into a dense vector
    model.add(Embedding(vocab_size, hidden_dim, input_length=input_length))
    # Bidirectional LSTM with dropout to limit overfitting
    model.add(Bidirectional(LSTM(units=hidden_dim, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
    # Self-attention over the LSTM outputs
    model.add(SeqSelfAttention(attention_activation='sigmoid'))
    # Flatten the (input_length, 2*hidden_dim) attention output for the classifier
    model.add(Reshape((2 * hidden_dim * input_length,)))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["acc", tf.keras.metrics.FalsePositives()])
    return model
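
For illustration, the model can then be built and trained as usual with Keras; the vocabulary size, sequence length, and hidden dimension below are placeholders rather than our production values.

# Illustrative hyperparameters (placeholders, not the production configuration)
model = AttentionModel_(vocab_size=40, input_length=75, hidden_dim=128)
model.summary()
# model.fit(X_train, y_train, validation_split=0.1, epochs=10, batch_size=256)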

We built and tuned the classical ML and the deep learning models using the Amazon SageMaker-provided containers for Scikit-learn and Keras.
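
For example, a training job for one of the classical models can be launched from a notebook with the SageMaker Python SDK and the Scikit-learn container; the entry point script, IAM role, instance type, and S3 path below are assumptions rather than our exact configuration.

from sagemaker.sklearn.estimator import SKLearn

# Hypothetical configuration for training one of the classical models
sklearn_estimator = SKLearn(
    entry_point="train_classical.py",        # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="0.23-1",
    py_version="py3",
)

sklearn_estimator.fit({"train": "s3://my-bucket/typosquatting/train/"})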

The following table summarizes the results we obtained. The BiLSTM outperformed the other models with a 13% precision improvement compared to the second-best model (LightGBM). For this reason, we put the BiLSTM model into production.

Models            Precision    F1-Score    ROC-AUC (Area Under the Curve)
Random Forest     0.832        0.841       0.908
XGBoost           0.870        0.876       0.921
LightGBM          0.880        0.883       0.928
RNN (BiLSTM)      0.996        0.997       0.997

Model training

For model training, we used Managed Spot Training in Amazon SageMaker, which runs training jobs on Amazon Elastic Compute Cloud (Amazon EC2) Spot Instances. This allowed us to train models at a significantly lower cost than with On-Demand Instances.

Because we predominantly used custom deep learning models, we needed GPU instances for time-consuming neural network training jobs, with run times ranging from minutes to a few hours. Under these constraints, Managed Spot Training was a game-changing solution: our data scientists were not interrupted in their work, while Amazon SageMaker handled the Spot instance-stopping conditions behind the scenes.
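
With the SageMaker Python SDK, enabling Managed Spot Training amounts to a few extra estimator arguments; the script name, instance type, limits, and checkpoint location below are illustrative assumptions for a Keras training job.

from sagemaker.tensorflow import TensorFlow

# Hypothetical GPU training job with Managed Spot Training enabled
estimator = TensorFlow(
    entry_point="train_bilstm.py",            # placeholder Keras training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",            # GPU instance for the BiLSTM
    framework_version="2.1",
    py_version="py3",
    use_spot_instances=True,                   # request Spot capacity for the job
    max_run=3600,                              # cap on training time (seconds)
    max_wait=7200,                             # training time plus time spent waiting for Spot
    checkpoint_s3_uri="s3://my-bucket/typosquatting/checkpoints/",  # resume after interruption
)

estimator.fit({"train": "s3://my-bucket/typosquatting/train/"})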

Productizing

Euler Hermes’s cloud principles follow a serverless-first strategy, with an Infrastructure as Code DevOps practice. We systematically build serverless architectures based on Lambda whenever possible; when that isn’t feasible, we deploy to containers using AWS Fargate.

Amazon SageMaker allows us to deploy our ML models at scale within the same platform on a 100% serverless and scalable architecture. It creates a model endpoint that is ready to serve inference requests. To get inferences for an entire dataset, we use batch transform, which splits the dataset into smaller batches and gets predictions for each one. Batch transform manages all the compute resources required to get inferences, including launching instances and deleting them after the batch transform job is complete.
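
As a sketch, a batch transform job can be created directly from a trained estimator (here reusing the estimator from the training sketch above); the instance type, batching strategy, and paths are assumptions.

# Create a transformer from the trained estimator defined in the training sketch above
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m5.xlarge",
    strategy="MultiRecord",                    # send several records per request
    assemble_with="Line",                      # write one prediction per line
    output_path="s3://my-bucket/typosquatting/inferences/",
)

# Score the day's registered domains, one domain per line
transformer.transform(
    data="s3://my-bucket/typosquatting/daily-domains/",
    content_type="text/csv",
    split_type="Line",
)
transformer.wait()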

The following figure depicts the architecture deployed for the use case in this post.

First, a daily Amazon CloudWatch event triggers a Lambda function with two tasks: download all the publicly registered domains and store them in an Amazon S3 bucket subfolder, and start the batch transform job. Amazon SageMaker automatically saves the inferences in the S3 bucket specified when the batch transform job is created.
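
A minimal sketch of this first Lambda function, assuming a hypothetical bucket layout and model name (the real function also downloads and stores the daily domain feed), might look like the following.

import boto3
from datetime import datetime

sagemaker_client = boto3.client("sagemaker")

def lambda_handler(event, context):
    # Start the daily batch transform job (names and paths are placeholders)
    job_name = "typosquatting-" + datetime.utcnow().strftime("%Y-%m-%d-%H-%M-%S")
    sagemaker_client.create_transform_job(
        TransformJobName=job_name,
        ModelName="typosquatting-bilstm",
        TransformInput={
            "DataSource": {"S3DataSource": {"S3DataType": "S3Prefix",
                                            "S3Uri": "s3://my-bucket/typosquatting/daily-domains/"}},
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        TransformOutput={"S3OutputPath": "s3://my-bucket/typosquatting/inferences/",
                         "AssembleWith": "Line"},
        TransformResources={"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    )
    return {"transform_job_name": job_name}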

Finally, a second CloudWatch event monitors the Amazon SageMaker job status. If the job succeeds, it triggers a second Lambda function that retrieves the inferred domains, selects those with label 1 (related to Euler Hermes or its products), and stores them in another S3 bucket subfolder.
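
A sketch of this second Lambda function follows, assuming the batch transform output contains one score per line aligned with the input file; the bucket, object keys, and threshold are placeholders.

import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"                           # placeholder bucket name

def lambda_handler(event, context):
    # Read the day's domains and the aligned scores written by batch transform
    domains = s3.get_object(Bucket=BUCKET, Key="typosquatting/daily-domains/domains.csv")["Body"].read().decode().splitlines()
    scores = s3.get_object(Bucket=BUCKET, Key="typosquatting/inferences/domains.csv.out")["Body"].read().decode().splitlines()

    # Keep only domains predicted as related to Euler Hermes (label 1)
    flagged = [d for d, s in zip(domains, scores) if float(s) >= 0.5]

    s3.put_object(Bucket=BUCKET,
                  Key="typosquatting/flagged/domains.csv",
                  Body="\n".join(flagged).encode())
    return {"flagged_count": len(flagged)}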

Following Euler Hermes’s DevOps principles, all the infrastructure for this solution is coded in Terraform as part of an MLOps pipeline for deploying to production.

Conclusion

Amazon SageMaker provides the tools that our data scientists need to experiment and test quickly and securely while maintaining compliance with strict financial services standards. This allows us to bring new ideas into production very rapidly. Its flexibility and inherent programmability helped us tackle our main pain point: industrializing ML models at scale. After we train an ML model, we can use Amazon SageMaker to deploy it, and we can automate the entire pipeline following the same DevOps principles and tools we use for all other applications we run on AWS.

In under 7 months, we went from ideation to production with a new internal ML service and can now identify URL squatting fraud within 24 hours of the creation of a malicious domain.

Although our application is ready, we have some additional steps planned. First, we’ll feed the inferences currently stored in Amazon S3 into our SIEM platform. Second, we’ll implement a web interface to monitor the model and collect manual feedback for retraining.


About the Authors

Luis Leon is the IT Innovation Advisor responsible for the data science practice in IT at Euler Hermes. He is in charge of the ideation of digital projects as well as managing the design, build, and industrialization of at-scale machine learning products. His main interests are Natural Language Processing, Time Series Analysis, and unsupervised learning.

Hamza Benchekroun is a Data Scientist in the IT Innovation hub at Euler Hermes, focusing on deep learning solutions to increase productivity and guide decision making across teams. His research interests include Natural Language Processing, Time Series Analysis, Semi-Supervised Learning, and their applications.

Hatim Binani is a data scientist intern in the IT Innovation hub at Euler Hermes. He is an engineering student at INSA Lyon in the computer science department. His fields of interest are data science and machine learning. He contributed within the IT Innovation team to the deployment of Watson on Amazon SageMaker.

Guillaume Chambert is an IT security engineer at Euler Hermes. As SOC manager, he strives to stay ahead of new threats in order to protect Euler Hermes’ sensitive and mission-critical data. He is interested in developing innovative solutions to prevent critical information from being stolen, damaged, or compromised by hackers.

Read More

Holographic optics for thin and lightweight virtual reality

Facebook Reality Labs (FRL) is always exploring new optical architectures to improve form factor, comfort, and optical performance. Last fall, at Oculus Connect 6, FRL Chief Scientist Michael Abrash introduced new miniaturization progress in VR with Half Dome 2 and 3, two prototypes that examine how varifocal displays can improve visual and physical comfort. This year, at the virtual SIGGRAPH conference, we’re presenting another research milestone on this path: a new optical architecture that is significantly more compact and offers the potential for better visual performance.

In this work, “Holographic Optics for Thin and Lightweight Virtual Reality,” researchers Andrew Maimone and Junren Wang propose a new class of near-eye displays, which combine the power of holographic optics and polarization-based optical folding — an approach that could be used to develop future sunglasses-like VR hardware. These two methods help keep the optics as thin as possible while making the most efficient use of space. We anticipate that such lightweight and comfortable form factors may enable extended VR sessions and new use cases, including productivity.

The design is demonstrated in a proof-of-concept research device that uses only thin, flat films as optics to achieve a display thickness of less than 9 mm while supporting a field of view comparable to today’s consumer VR products. The work demonstrates the promise of better visual performance, as well: Laser illumination is used to deliver a much wider gamut of colors to VR displays, and progress is made toward scaling resolution to the limit of human vision.

This video demonstrates video game animation, as shown on our proof-of-concept research device.

The approach

This image shows our research device display modules mounted into a frame. This research device was used to capture the green image shown below (some components are mounted externally).

Today’s VR displays have three primary components: a source of light (e.g., LEDs), a display panel that brightens or dims the light to form an image (e.g., an LCD panel), and a viewing optic that focuses the image far enough away so that the viewer’s eyes can see it (e.g., a plastic lens). As the first two components can readily be formed into thin and flat modules, most of the weight and volume go into the viewing optics. To significantly reduce the overall size and weight of VR displays, we combine two techniques: holographic optics and polarization-based optical folding.

Most VR displays share a common viewing optic: a simple refractive lens composed of a thick, curved piece of glass or plastic. We propose replacing this bulky element with holographic optics. You may be familiar with holographic images seen at a science museum or on your credit card, which appear to be three-dimensional with realistic depth in or out of the page. Like these holographic images, our holographic optics are a recording of the interaction of laser light with objects, but in this case the object is a lens rather than a 3D scene. The result is a dramatic reduction in thickness and weight: The holographic optic bends light like a lens but looks like a thin, transparent sticker.

However, even if the lens itself is made thin, the viewing optics as a whole may still be large — a considerable amount of empty space must be placed between the display panel and the lens to properly focus the image. Ordinarily, light from the display panel propagates forward to the lens and then continues toward the eye. However, when we apply polarization-based optical folding, light can be controlled to move both forward and backward within the lens so that this empty space can be traversed multiple times, collapsing it to a fraction of the original volume.
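
As a rough, general illustration of the folding idea (not the specific figures from this work): if focusing the image requires an optical path length $L$ between the display panel and the lens, and polarization optics make the light cross the physical gap $N$ times before it exits, then the gap itself only needs a thickness of roughly $d \approx L / N$. A typical three-pass folded design therefore shrinks that empty space to about a third of its unfolded size.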

Wider color gamut

Shown on the left, a photograph captured with the proof-of-concept research device shown above. On the right, a photograph taken through a larger full-color benchtop prototype. We are currently working on achieving full color on the smaller research prototype.

When we apply holographic optics to a VR display, we must reevaluate all other optical components. Notably, holographic optics compel the use of laser light sources, which are more difficult to integrate but provide a much richer set of colors than the LEDs common in nearly all of today’s VR headsets, phones, computers, and televisions.

To illustrate the difference, the figure below shows the gamut of human-visible colors. A common set of colors reproducible on many displays today is the sRGB color space (illustrated by the smaller triangle). Note that it can capture only a small fraction of the colors that we can actually see. In contrast, the outer triangle represents the much larger set of colors that can be reproduced using the lasers on one of our research prototype displays. This allows the reproduction of vivid and saturated colors. Think of a brightly lit neon sign or the iridescent sheen of a butterfly wing.

This figure illustrates the gamut of human-visible colors. The sRGB space represents a common set of colors reproducible on many displays today. The outer triangle represents the larger set of colors reproducible on our research prototype.

What’s next

While it points toward the future development of lightweight, comfortable, and high-performance AR/VR technology, at present our work is purely research. In our technical paper, we identify the current limitations of our proposed display architecture and discuss future areas of research that will make the approach more practical. To our knowledge, our work demonstrates the thinnest VR display to date, and we’re excited to see what the future holds.

The post Holographic optics for thin and lightweight virtual reality appeared first on Facebook Research.

Read More