Anticipating heart failure with machine learning

Every year, roughly one out of eight U.S. deaths is caused at least in part by heart failure. One of acute heart failure’s most common warning signs is excess fluid in the lungs, a condition known as “pulmonary edema.” 

A patient’s exact level of excess fluid often dictates the doctor’s course of action, but making such determinations is difficult and requires clinicians to rely on subtle features in X-rays that sometimes lead to inconsistent diagnoses and treatment plans.

To better handle that kind of nuance, a group led by researchers at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) has developed a machine learning model that can look at an X-ray to quantify how severe the edema is, on a four-level scale ranging from 0 (healthy) to 3 (very, very bad). The system determined the right level more than half of the time, and correctly diagnosed level 3 cases 90 percent of the time.

Working with Beth Israel Deaconess Medical Center (BIDMC) and Philips, the team plans to integrate the model into BIDMC’s emergency-room workflow this fall.

“This project is meant to augment doctors’ workflow by providing additional information that can be used to inform their diagnoses as well as enable retrospective analyses,” says PhD student Ruizhi Liao, who was the co-lead author of a related paper with fellow PhD student Geeticka Chauhan and MIT professors Polina Golland and Peter Szolovits. 

The team says that better edema diagnosis would help doctors manage not only acute heart issues, but other conditions like sepsis and kidney failure that are strongly associated with edema. 

As part of a separate journal article, Liao and colleagues also took an existing public dataset of X-ray images and developed new annotations of severity labels that were agreed upon by a team of four radiologists. Liao’s hope is that these consensus labels can serve as a universal standard to benchmark future machine learning development.

An important aspect of the system is that it was trained not just on more than 300,000 X-ray images, but also on the corresponding text of reports about the X-rays that were written by radiologists. The team was pleasantly surprised that their system found such success using these reports, most of which didn’t have labels explaining the exact severity level of the edema.

“By learning the association between images and their corresponding reports, the method has the potential for a new way of automatic report generation from the detection of image-driven findings,” says Tanveer Syeda-Mahmood, a researcher not involved in the project who serves as chief scientist for IBM’s Medical Sieve Radiology Grand Challenge. “Of course, further experiments would have to be done for this to be broadly applicable to other findings and their fine-grained descriptors.”

Chauhan’s efforts focused on helping the system make sense of the text of the reports, which could often be as short as a sentence or two. Different radiologists write with varying tones and use a range of terminology, so the researchers had to develop a set of linguistic rules and substitutions to ensure that data could be analyzed consistently across reports. This was in addition to the technical challenge of designing a model that can jointly train the image and text representations in a meaningful manner.

“Our model can turn both images and text into compact numerical abstractions from which an interpretation can be derived,” says Chauhan. “We trained it to minimize the difference between the representations of the X-ray images and the text of the radiology reports, using the reports to improve the image interpretation.”
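The high-level training idea can be illustrated with a minimal PyTorch sketch: a generic image encoder and a generic text encoder are trained so that paired X-ray and report embeddings land close together. The architectures, dimensions, and loss below are illustrative assumptions, not the authors' actual model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageEncoder(nn.Module):
    # Maps an X-ray image to a compact embedding vector.
    def __init__(self, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class TextEncoder(nn.Module):
    # Maps a tokenized radiology report to the same embedding space.
    def __init__(self, vocab_size=5000, embed_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 64)
        self.rnn = nn.GRU(64, embed_dim, batch_first=True)

    def forward(self, tokens):
        _, h = self.rnn(self.embed(tokens))
        return h[-1]

image_enc, text_enc = ImageEncoder(), TextEncoder()
optimizer = torch.optim.Adam(list(image_enc.parameters()) + list(text_enc.parameters()), lr=1e-4)

images = torch.randn(8, 1, 224, 224)       # stand-in X-rays
reports = torch.randint(0, 5000, (8, 40))  # stand-in report token IDs

z_img, z_txt = image_enc(images), text_enc(reports)
loss = F.mse_loss(z_img, z_txt)            # pull paired image/report embeddings together
loss.backward()
optimizer.step()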

On top of that, the team’s system was also able to “explain” itself, by showing which parts of the reports and areas of X-ray images correspond to the model prediction. Chauhan is hopeful that future work in this area will provide more detailed lower-level image-text correlations, so that clinicians can build a taxonomy of images, reports, disease labels and relevant correlated regions. 
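One common way to surface which image regions drive a prediction is a gradient-based saliency map; the sketch below is a generic illustration of that idea, not necessarily the attribution method used in this work.

import torch

def saliency_map(model, image):
    # Gradient of the top predicted severity score with respect to the input pixels;
    # larger absolute gradients mark regions that influenced the prediction most.
    image = image.clone().requires_grad_(True)
    scores = model(image.unsqueeze(0))   # assume one score per severity level
    scores.max().backward()
    return image.grad.abs().squeeze(0)

# Toy stand-in model and image, just to show the call pattern.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(224 * 224, 4))
xray = torch.randn(1, 224, 224)
heatmap = saliency_map(model, xray)      # same spatial size as the input image

Overlaid on the X-ray, such a map gives a coarse picture of where the model "looked"; the report side can be inspected analogously by ranking which tokens contributed most to the text embedding.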

“These correlations will be valuable for improving search through a large database of X-ray images and reports, to make retrospective analysis even more effective,” Chauhan says.

Chauhan, Golland, Liao and Szolovits co-wrote the paper with MIT Assistant Professor Jacob Andreas, Professor William Wells of Brigham and Women’s Hospital, Xin Wang of Philips, and Seth Berkowitz and Steven Horng of BIDMC. The paper will be presented Oct. 5 (virtually) at the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI). 

The work was supported in part by the MIT Deshpande Center for Technological Innovation, the MIT Lincoln Lab, the National Institutes of Health, Philips, Takeda, and the Wistron Corporation.

Read More

Halloween-themed AWS DeepComposer Chartbusters Challenge: Track or Treat

We are back with a spooktacular AWS DeepComposer Chartbusters challenge, Track or Treat! In this challenge, you can interactively collaborate with the ghost in the machine (learning) and compose spooky music! Chartbusters is a global monthly challenge where you can use AWS DeepComposer to create original compositions on the console using machine learning techniques, compete to top the charts, and win prizes. This challenge launches today and participants can submit their compositions until October 23, 2020.

Participation is easy: you can generate spooky compositions using one of the supported generative AI techniques and models on the AWS DeepComposer console. You can add or remove notes and interactively collaborate with AI using the Edit melody feature, and can include spooky instruments in your composition.

How to compete

To participate in Track or Treat, just do the following:

  1. Go to AWS DeepComposer Music Studio and create a melody with the keyboard, import a melody, or choose a sample melody on the console.
  2. Under Generative AI technique, for Model parameters, choose Autoregressive.
  3. For Model, choose Autoregressive CNN Bach.

You have four advanced parameters that you can choose to adjust: Maximum notes to add, Maximum notes to remove, Sampling iterations, and Creative risk.

  4. After adjusting the values to your liking, choose Enhance input melody.
  5. Choose Edit melody to add or remove notes.
  6. You can also change the note duration and pitch.
  7. When finished, choose Apply changes.
  8. Repeat these steps until you’re satisfied with the generated music.
  9. To add accompaniments to your melody, switch the Generative AI technique to Generative Adversarial Networks, then choose Generate composition.
  10. Choose the right arrow next to an accompaniment track (green bar).
  11. For Instrument type, choose Spooky.
  12. When you’re happy with your composition, choose Download composition.

You can choose to post-process your composition; however, one of the judging criteria is how close your final submission is to the track generated using AWS DeepComposer.

  13. In the navigation pane, choose Chartbusters.
  14. Choose Submit a composition.
  15. Select Import a post-processed audio track and upload your composition.
  16. Provide a track name for your composition and choose Submit.

AWS DeepComposer then submits your composition to the Track or Treat playlist on SoundCloud.

Conclusion

You’ve successfully submitted your composition to the AWS DeepComposer Chartbusters challenge Track or Treat. Now you can invite your friends and family to listen to your creation on SoundCloud, vote for their favorite, and join the fun by participating in the competition.

Although you don’t need a physical keyboard to compete, you can buy the AWS DeepComposer keyboard for $99.00 to enhance your music generation experience. To learn more about the different generative AI techniques supported by AWS DeepComposer, check out the learning capsules available on the AWS DeepComposer console.


About the Author

Maryam Rezapoor is a Senior Product Manager with AWS AI Ecosystem team. As a former biomedical researcher and entrepreneur, she finds her passion in working backward from customers’ needs to create new impactful solutions. Outside of work, she enjoys hiking, photography, and gardening.

 

Read More

Audiovisual Speech Enhancement in YouTube Stories

Posted by Inbar Mosseri, Software Engineer and Michael Rubinstein, Research Scientist, Google Research

While tremendous efforts are invested in improving the quality of videos taken with smartphone cameras, the quality of audio in videos is often overlooked. For example, the speech of a subject in a video where there are multiple people speaking or where there is high background noise might be muddled, distorted, or difficult to understand. In an effort to address this, two years ago we introduced Looking to Listen, a machine learning (ML) technology that uses both visual and audio cues to isolate the speech of a video’s subject. By training the model on a large-scale collection of online videos, we are able to capture correlations between speech and visual signals such as mouth movements and facial expressions, which can then be used to separate the speech of one person in a video from another, or to separate speech from background sounds. We showed that this technology not only achieves state-of-the-art results in speech separation and enhancement (a noticeable 1.5dB improvement over audio-only models), but in particular can improve the results over audio-only processing when there are multiple people speaking, as the visual cues in the video help determine who is saying what.

We are now happy to make the Looking to Listen technology available to users through a new audiovisual Speech Enhancement feature in YouTube Stories (on iOS), allowing creators to take better selfie videos by automatically enhancing their voices and reducing background noise. Getting this technology into users’ hands was no easy feat. Over the past year, we worked closely with users to learn how they would like to use such a feature, in what scenarios, and what balance of speech and background sounds they would like to have in their videos. We heavily optimized the Looking to Listen model to make it run efficiently on mobile devices, overall reducing the running time from 10x real-time on a desktop when our paper came out, to 0.5x real-time performance on the phone. We also put the technology through extensive testing to verify that it performs consistently across different recording conditions and for people with different appearances and voices.

From Research to Product
Optimizing Looking to Listen to allow fast and robust operation on mobile devices required us to overcome a number of challenges. First, all processing needed to be done on-device within the client app in order to minimize processing time and to preserve the user’s privacy; no audio or video information would be sent to servers for processing. Further, the model needed to co-exist alongside other ML algorithms used in the YouTube app in addition to the resource-consuming video recording itself. Finally, the algorithm needed to run quickly and efficiently on-device while minimizing battery consumption.

The first step in the Looking to Listen pipeline is to isolate thumbnail images that contain the faces of the speakers from the video stream. By leveraging MediaPipe BlazeFace with GPU accelerated inference, this step is now able to be executed in just a few milliseconds. We then switched the model part that processes each thumbnail separately to a lighter weight MobileNet (v2) architecture, which outputs visual features learned for the purpose of speech enhancement, extracted from the face thumbnails in 10 ms per frame. Because the compute time to embed the visual features is short, it can be done while the video is still being recorded. This avoids the need to keep the frames in memory for further processing, thereby reducing the overall memory footprint. Then, after the video finishes recording, the audio and the computed visual features are streamed to the audio-visual speech separation model which produces the isolated and enhanced speech.
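As a rough illustration of the per-frame visual feature step, the sketch below uses torchvision's stock MobileNetV2 backbone as a stand-in for Google's speech-tuned variant; the thumbnail size and pooling are assumptions for the example.

import torch
from torchvision import models

backbone = models.mobilenet_v2(weights=None).features.eval()  # generic MobileNetV2 feature extractor

def frame_features(face_thumbnails):
    # face_thumbnails: (num_frames, 3, 96, 96) cropped face images.
    # Returns one visual feature vector per frame.
    with torch.no_grad():
        fmap = backbone(face_thumbnails)   # (N, 1280, H', W') feature maps
        return fmap.mean(dim=(2, 3))       # global average pool -> (N, 1280)

frames = torch.randn(16, 3, 96, 96)        # a short burst of face crops
feats = frame_features(frames)             # can be computed while recording continues
print(feats.shape)                         # torch.Size([16, 1280])

Because each frame is reduced to a small vector as it arrives, the raw frames do not need to be kept in memory, which mirrors the memory-footprint point above.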

We reduced the total number of parameters in the audio-visual model by replacing “regular” 2D convolutions with separable ones (1D in the frequency dimension, followed by 1D in the time dimension) with fewer filters. We then optimized the model further using TensorFlow Lite — a set of tools that enable running TensorFlow models on mobile devices with low latency and a small binary size. Finally, we reimplemented the model within the Learn2Compress framework in order to take advantage of built-in quantized training and QRNN support.
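The parameter savings from that factorization are easy to see in a small PyTorch sketch (layer sizes here are illustrative, not the production model's):

import torch
import torch.nn as nn

in_ch, out_ch, k = 64, 64, 5

# "Regular" 2D convolution over a (frequency, time) spectrogram-like input.
full2d = nn.Conv2d(in_ch, out_ch, kernel_size=(k, k), padding=k // 2)

# Separable alternative: 1D in the frequency dimension, then 1D in the time dimension.
separable = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=(k, 1), padding=(k // 2, 0)),  # frequency axis
    nn.Conv2d(out_ch, out_ch, kernel_size=(1, k), padding=(0, k // 2)),  # time axis
)

count = lambda m: sum(p.numel() for p in m.parameters())
x = torch.randn(1, in_ch, 128, 400)            # (batch, channels, freq bins, time frames)
assert full2d(x).shape == separable(x).shape   # same output shape
print(count(full2d), count(separable))         # 102464 vs. 41088 parameters

On top of this factorization, the production model also reduces the number of filters, which the sketch above leaves unchanged for a like-for-like shape comparison.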

Our Looking to Listen on-device pipeline for audiovisual speech enhancement

These optimizations and improvements reduced the running time from 10x real-time on a desktop using the original formulation of Looking to Listen, to 0.5x real-time performance using only an iPhone CPU; and brought the model size down from 120MB to 6MB now, which makes it easier to deploy. Since YouTube Stories videos are short — limited to 15 seconds — the result of the video processing is available within a couple of seconds after the recording is finished.

Finally, to avoid processing videos with clean speech (so as to avoid unnecessary computation), we first run our model only on the first two seconds of the video, then compare the speech-enhanced output to the original input audio. If there is sufficient difference (meaning the model cleaned up the speech), then we enhance the speech throughout the rest of the video.
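A rough sketch of that gating step is below; the two-second window comes from the description above, while the RMS-based notion of "sufficient difference" and the threshold are assumptions for illustration.

import numpy as np

def needs_enhancement(original, enhanced, sample_rate=16000, threshold=0.05):
    # Compare the first two seconds of the enhanced output to the original audio;
    # if the model changed the signal enough, enhance the rest of the video too.
    n = 2 * sample_rate
    a, b = original[:n], enhanced[:n]
    residual = np.sqrt(np.mean((a - b) ** 2))        # RMS of the difference
    reference = np.sqrt(np.mean(a ** 2)) + 1e-8
    return (residual / reference) > threshold

# Toy usage: a noisy clip and a stand-in "cleaned" version of it.
rng = np.random.default_rng(0)
original = 0.1 * rng.standard_normal(16000 * 4)
enhanced = 0.5 * original                            # pretend the model attenuated noise
print(needs_enhancement(original, enhanced))         # True -> process the full clip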

Researching User Needs
Early versions of Looking to Listen were designed to entirely isolate speech from the background noise. In a user study conducted together with YouTube, we found that users prefer to leave in some of the background sounds to give context and to retain some of the general ambiance of the scene. Based on this user study, we take a linear combination of the original audio and our produced clean speech channel: output_audio = 0.1 x original_audio + 0.9 x speech. The following video presents clean speech combined with different levels of the background sounds in the scene (10% background is the balance we use in practice).
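Applying that balance is a direct implementation of the stated formula; a minimal sketch:

import numpy as np

def mix_output(original_audio, speech, background_level=0.1):
    # output_audio = 0.1 x original_audio + 0.9 x speech (10% background, as used in practice).
    n = min(len(original_audio), len(speech))    # align lengths defensively
    return background_level * original_audio[:n] + (1.0 - background_level) * speech[:n]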

Below are additional examples of the enhanced speech results from the new Speech Enhancement feature in YouTube Stories. We recommend watching the videos with good speakers or headphones.

Fairness Analysis
Another important requirement is that the model be fair and inclusive. It must be able to handle different types of voices, languages and accents, as well as different visual appearances. To this end, we conducted a series of tests exploring the performance of the model with respect to various visual and speech/auditory attributes: the speaker’s age, skin tone, spoken language, voice pitch, visibility of the speaker’s face (% of video in which the speaker is in frame), head pose throughout the video, facial hair, presence of glasses, and the level of background noise in the (input) video.

For each of the above visual/auditory attributes, we ran our model on segments from our evaluation set (separate from the training set) and measured the speech enhancement accuracy, broken down according to the different attribute values. Results for some of the attributes are summarized in the following plots. Each data point in the plots represents hundreds (in most cases thousands) of videos fitting the criteria.

Speech enhancement quality (signal-to-distortion ratio, SDR, in dB) for different spoken languages, sorted alphabetically. The average SDR was 7.89 dB with a standard deviation of 0.42 dB — a deviation that human listeners would find hard to notice.
Left: Speech enhancement quality as a function of the speaker’s voice pitch. The fundamental voice frequency (pitch) of an adult male typically ranges from 85 to 180 Hz, and that of an adult female ranges from 165 to 255 Hz. Right: speech enhancement quality as a function of the speaker’s predicted age.
As our method utilizes facial cues and mouth movements to isolate the speech, we tested whether facial hair (e.g., a moustache, beard) may obstruct those visual cues and affect the method’s performance. Our evaluations show that the quality of speech enhancement is maintained well even in the presence of facial hair.

Using the Feature
YouTube creators who are eligible for YouTube Stories creation may record a video on iOS, and select “Enhance speech” from the volume controls editing tool. This will immediately apply speech enhancement to the audio track and will play back the enhanced speech in a loop. It is then possible to toggle the feature on and off multiple times to compare the enhanced speech with the original audio.

In parallel to this new feature in YouTube, we are also exploring additional venues for this technology. More to come later this year — stay tuned!

Acknowledgements
This feature is a collaboration across multiple teams at Google. Key contributors include: from Research-IL: Oran Lang; from VisCAM: Ariel Ephrat, Mike Krainin, JD Velasquez, Inbar Mosseri, Michael Rubinstein; from Learn2Compress: Arun Kandoor; from MediaPipe: Buck Bourdon, Matsvei Zhdanovich, Matthias Grundmann; from YouTube: Andy Poes, Vadim Lavrusik, Aaron La Lau, Willi Geiger, Simona De Rosa, and Tomer Margolin.

Read More

NVIDIA vGPU Software Accelerates Performance with Support for NVIDIA Ampere Architecture

From AI to VDI, NVIDIA virtual GPU products provide employees with powerful performance for any workflow.

vGPU technology helps IT departments easily scale the delivery of GPU resources, and allows professionals to collaborate and run advanced graphics and computing workflows from the data center or cloud.

Now, NVIDIA is expanding its vGPU software features with a new release that supports the NVIDIA A100 Tensor Core GPU with NVIDIA Virtual Compute Server (vCS) software. Based on NVIDIA vGPU technology, vCS enables AI and compute-intensive workloads to run in VMs.

With support for the NVIDIA A100, the latest NVIDIA vCS delivers significantly faster performance for AI and data analytics workloads.

Powered by the NVIDIA Ampere architecture, the A100 GPU provides strong scaling for GPU compute and deep learning applications running in single- and multi-GPU workstations, servers, clusters, cloud data centers, systems at the edge and supercomputers.

Enterprise data centers standardized on hypervisor-based virtualization can now deploy the A100 with vCS for all the operational benefits that virtualization brings with management and monitoring, without sacrificing performance. And with the workloads running in virtual machines, they can be managed, monitored and run remotely on any device, anywhere.

Graph shows normalized performance of MIG 2g.10gb running an inference workload on bare metal (dark green); performance is nearly the same when running a Virtual Compute Server VM on each MIG instance (light green).

Engineers, researchers, students, data scientists and others can now tackle compute-intensive workloads in a virtual environment, accessing the most powerful GPU in the world through virtual machines that can be securely provisioned in minutes. As NVIDIA A100 GPUs become available in vGPU-certified servers from NVIDIA’s partners, professionals across all industries can accelerate their workloads with powerful performance.

Also, IT professionals get the management, monitoring and multi-tenancy benefits from hypervisors like Red Hat RHV/RHEL.

“Our customers have an increasing need to manage multi-tenant workflows running on virtual machines while providing isolation and security benefits,” said Chuck Dubuque, senior director of product marketing at Red Hat. “The new multi-instance GPU capabilities on NVIDIA A100 GPUs enable a new range of AI-accelerated workloads that run on Red Hat platforms from the cloud to the edge.”

Additional new features of the NVIDIA vGPU September 2020 release include:

  1. Multi-Instance GPU (MIG) with VMs: MIG expands the performance and value of the NVIDIA A100 by partitioning the GPU into up to seven instances. Each MIG instance can be fully isolated with its own high-bandwidth memory, cache and compute cores. By combining MIG with vCS, enterprises can take advantage of the management, monitoring and operational benefits of hypervisor-based server virtualization, running a VM on each MIG partition.
  2. Heterogeneous Profiles and OSes: With the ability to create different-sized instances through MIG, heterogeneous vCS profiles can be used on an A100 GPU, allowing VMs of various sizes to run on a single A100 GPU. Additionally, with VMs running on NVIDIA GPUs with vCS, heterogeneous operating systems can also run on an A100 GPU, with different Linux distributions running simultaneously in different VMs.
  3. GPUDirect Remote Direct Memory Access: Now supported with NVIDIA vCS, GPUDirect RDMA enables network devices to directly access GPU memory, bypassing CPU host memory and decreasing GPU-to-GPU communication latency to completely offload the CPU in a virtualized environment.

Learn more about NVIDIA Virtual Compute Server, including how the technology was recognized as Disruptive Technology of the Year at VMworld, and see the latest announcement of VMware and NVIDIA partnering to develop enterprise AI solutions.

VMware vSphere support for vCS with A100 will be available next year. The NVIDIA virtual GPU portfolio also includes the Quadro Virtual Workstation for technical and creative professionals, and GRID vPC and vApps for knowledge workers.

GTC Brings the Latest in vGPU

Hear more about how NVIDIA Virtual Compute Server is being used in industries at the GPU Technology Conference, taking place October 5-9.

Adam Tetelman and Jeff Weiss from NVIDIA, joined by Timothy Dietrich from NetApp, will give an overview of NVIDIA Virtual Compute Server technology and discuss use cases and manageability.

As well, a panel of experts from NVIDIA, ManTech and Maxar will share how NVIDIA vGPU is used in their solutions to analyze large amounts of data, enable remote visualization and accelerate compute for video streams and images.

Register now for GTC and check out all the sessions available.

The post NVIDIA vGPU Software Accelerates Performance with Support for NVIDIA Ampere Architecture appeared first on The Official NVIDIA Blog.

Read More

What if you could turn your voice into any instrument?

Imagine whistling your favorite song. You might not sound like the real deal, right? Now imagine your rendition using auto-tune software. Better, sure, but the result is still your voice. What if there was a way to turn your voice into something like a violin, or a saxophone, or a flute? 

Google Research’s Magenta team, which has been focused on the intersection of machine learning and creative tools for musicians, has been experimenting with exactly this. The team recently created an open source technology called Differentiable Digital Signal Processing (DDSP). DDSP is a new approach to machine learning that enables models to learn the characteristics of a musical instrument and map them to a different sound. The process can lead to so many creative, quirky results. Try replacing a capella singing with a saxophone solo, or a dog barking with a trumpet performance. The options are endless. 

And so are the sounds you can make. This development is important because it enables music technologies to become more inclusive. Machine learning models inherit biases from the datasets they are trained on, and music models are no different. Many are trained on the structure of western musical scores, which excludes much of the music from the rest of the world. Rather than following the formal rules of western music, like the 12 notes on a piano, DDSP transforms sound by modeling frequencies in the audio itself. This opens up machine learning technologies to a wider range of musical cultures. 
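The core idea of working with continuous audio features rather than discrete notes can be shown with a toy additive synthesizer in NumPy; this is a conceptual sketch of a DDSP-style signal model, not the Magenta library's API.

import numpy as np

def harmonic_synth(f0_hz, amplitude, sample_rate=16000, n_harmonics=8):
    # f0_hz and amplitude are per-sample pitch and loudness curves, the kind of
    # controls a DDSP-style model extracts from input audio (e.g., your voice).
    phase = 2 * np.pi * np.cumsum(f0_hz) / sample_rate          # integrate frequency into phase
    audio = sum(np.sin(k * phase) / k for k in range(1, n_harmonics + 1))
    return amplitude * audio / n_harmonics

# A one-second glide from 220 Hz to 440 Hz with a gentle fade.
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
audio = harmonic_synth(f0_hz=220 + 220 * t, amplitude=1.0 - 0.5 * t)

In DDSP the synthesizer parameters for a target instrument are learned from data, so the same pitch and loudness curves taken from a voice can be re-rendered with, say, a violin's timbre.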

In fact, anyone can give it a try.  We created a tool called Tone Transfer to allow musicians and amateurs alike to tap into DDSP as a delightful creative tool. Play with the Tone Transfer showcase to sample sounds, or record your own, and listen to how they can be transformed into a myriad of instruments using DDSP technology. Check out our film that shows artists using Tone Transfer for the first time.

DDSP does not create music on its own; think of it like another instrument that requires skill and thought. It’s an experimental soundscape environment for music, and we’re so excited to see how the world uses it.

Read More

Get Trained, Go Deep: How Organizations Can Transform Their Workforce into an AI Powerhouse

Despite the pandemic putting in-person training on hold, organizations can still offer instructor-led courses to their staff to develop key skills in AI, data science and accelerated computing.

NVIDIA’s Deep Learning Institute offers many online courses that deliver hands-on training. One of its most popular — recently updated and retitled as The Fundamentals of Deep Learning — will be taken by hundreds of attendees at next week’s GPU Technology Conference, running Oct. 5-9.

Organizations interested in boosting the deep learning skills of their personnel can arrange to get their teams trained by requesting a workshop from the DLI Course Catalog.

“Technology professionals who take our revamped deep learning course will emerge with the basics they need to start applying deep learning to their most challenging AI and machine learning applications,” said Craig Clawson, director of Training Services at NVIDIA. “This course is a key building block for developing a cutting-edge AI skillset.”

Huge Demand for Deep Learning

Deep learning is at the heart of the fast-growing fields of machine learning and AI. This makes it a skill that’s in huge demand and has put companies across industries in a race to recruit talent. LinkedIn recently reported that the fastest-growing job category in the U.S. is AI specialist, with annual job growth of 74 percent and an average annual salary of $136,000.

For many organizations, especially those in the software, internet, IT, higher education and consumer electronics sectors, investing in upskilling current employees can be critical to their success while offering a path to career advancement and increasing worker retention.

Deep Learning Application Development

With interest in the field heating up, a recent article in Forbes highlighted that AI and machine learning, data science and IoT are among the most in-demand skills tech professionals should focus on. In other words, tech workers who lack these skills could soon find themselves at a professional disadvantage.

By developing needed skills, employees can make themselves more valuable to their organizations. And their employers benefit by embedding machine learning and AI functionality into their products, services and business processes.

“Organizations are looking closely at how AI and machine learning can improve their business,” Clawson said. “As they identify opportunities to leverage these technologies, they’re hustling to either develop or import the required skills.”

Get a glimpse of the DLI experience in this short video:

DLI Courses: An Invaluable Resource

The DLI has trained more than 250,000 developers globally. It has continued to deliver a wide range of training remotely via virtual classrooms during the COVID-19 pandemic.

Classes are taught by DLI-certified instructors who are experts in their fields, and breakout rooms support collaboration among students and interaction with the instructors.

And by completing select courses, students can earn an NVIDIA Deep Learning Institute certificate to demonstrate subject matter competency and support career growth.

It would be hard to exaggerate the potential that this new technology and the NVIDIA developer community hold for improving the world — and the community is growing faster than ever. It took 13 years for the number of registered NVIDIA developers to reach 1 million. Just two years later, it has grown to over 2 million.

Whether enabling new medical procedures, inventing new robots or joining the effort to combat COVID-19, the NVIDIA developer community is breaking new ground every day.

Courses like the re-imagined Fundamentals of Deep Learning are helping developers and data scientists deliver breakthrough innovations across a wide range of industries and application domains.

“Our courses are structured to give developers the skills they need to thrive as AI and machine learning leaders,” said Clawson. “What they take away from the courses, both for themselves and their organizations, is immeasurable.”

To get started on the journey of transforming your organization into an AI powerhouse, request a DLI workshop today.

What is deep learning? Read more about this core technology.

The post Get Trained, Go Deep: How Organizations Can Transform Their Workforce into an AI Powerhouse appeared first on The Official NVIDIA Blog.

Read More

Announcing the Winners of the 2020 Global PyTorch Summer Hackathon

More than 2,500 participants in this year’s Global PyTorch Summer Hackathon pushed the envelope to create unique new tools and applications for PyTorch developers and researchers.

Notice: None of the projects submitted to the hackathon are associated with or offered by Facebook, Inc.

This year’s projects fell into three categories:

  • PyTorch Developer Tools: a tool or library for improving productivity and efficiency for PyTorch researchers and developers.

  • Web/Mobile Applications Powered by PyTorch: a web or mobile interface and/or an embedded device built using PyTorch.

  • PyTorch Responsible AI Development Tools: a tool, library, or web/mobile app to support researchers and developers in creating responsible AI that factors in fairness, security, privacy, and more throughout its entire development process.

The virtual hackathon ran from June 22 to August 25, with more than 2,500 registered participants representing 114 countries, from the Republic of Azerbaijan to Zimbabwe to Japan, who submitted a total of 106 projects. Entrants were judged on their idea’s quality, originality, potential impact, and how well they implemented it.

Meet the winners of each category below.

PyTorch Developer Tools

1st place: DeMask

DeMask is an end-to-end model for enhancing speech while wearing face masks — offering a clear benefit during times when face masks are mandatory in many spaces and for workers who wear face masks on the job. Built with Asteroid, a PyTorch-based audio source separation toolkit, DeMask is trained to recognize distortions in speech created by the muffling from face masks and to adjust the speech to make it sound clearer.

This submission stood out in particular because it represents both a high-quality idea and an implementation that can be reproduced by other researchers.

Here is an example of how to train a speech separation model in less than 20 lines:

from torch import optim
from pytorch_lightning import Trainer

from asteroid import ConvTasNet
from asteroid.losses import PITLossWrapper
from asteroid.data import LibriMix
from asteroid.engine import System

train_loader, val_loader = LibriMix.loaders_from_mini(task='sep_clean', batch_size=4)  # mini LibriMix train/val loaders
model = ConvTasNet(n_src=2)                          # separate a mixture into two sources
optimizer = optim.Adam(model.parameters(), lr=1e-3)
loss = PITLossWrapper(
    lambda x, y: (x - y).pow(2).mean(-1),  # MSE
    pit_from="pw_pt",  # Point in the pairwise matrix.
)

system = System(model, optimizer, loss, train_loader, val_loader)  # bundles model, loss, optimizer and data for Lightning

trainer = Trainer(fast_dev_run=True)                 # quick single-batch run to verify the pipeline
trainer.fit(system)

2nd place: carefree-learn

A PyTorch-based automated machine learning (AutoML) solution, carefree-learn provides high-level APIs to make training models on tabular data sets simpler. It features an interface similar to scikit-learn and functions as an end-to-end pipeline for tabular data sets. It automatically detects feature column types and redundant feature columns, imputes missing values, encodes string columns and categorical columns, and preprocesses numerical columns, among other features.

3rd place: TorchExpo

TorchExpo is a collection of models and extensions that simplifies taking PyTorch from research to production on mobile devices. The project is more than a web and mobile application; it also comes with a Python library. The Python library is available via pip install and helps researchers convert a state-of-the-art model to TorchScript and ONNX format in just one line. Detailed docs are available here.

Web/Mobile Applications Powered by PyTorch

1st place: Q&Aid

Q&Aid is a conceptual health-care chatbot aimed at making health-care diagnoses and facilitating communication between patients and doctors. It relies on a series of machine learning models to filter, label, and answer medical questions, based on a medical image and/or questions in text provided by a patient. The transcripts from the chat app then can be forwarded to the local hospitals and the patient will be contacted by one of them to make an appointment to determine proper diagnosis and care. The team hopes that this concept application helps hospitals to work with patients more efficiently and provide proper care.

2nd place: Rasoee

Rasoee is an application that can take images as input and output the name of the dish. It also lists the ingredients and recipe, along with the link to the original recipe online. Additionally, users can choose a cuisine from the list of cuisines in the drop-down menu and describe the taste and/or method of preparation in text. The application will then return matching dishes from the list of 308 identifiable dishes. The team has put a significant amount of effort into gathering and cleaning various datasets to build more accurate and comprehensive models. You can check out the application here.

3rd place: Rexana the Robot — PyTorch

Rexana is an AI voice assistant meant to lay the foundation for a physical robot that can complete basic tasks around the house. The system is capable of autonomous navigation (knowing its position around the house relative to landmarks), recognizing voice commands, and object detection and recognition — meaning it can be commanded to perform various household tasks (e.g., “Rexana, water the potted plant in the lounge room.”). Rexana can be controlled remotely via a mobile device, and the robot itself features customizable hands (magnets, grippers, etc.) for taking on different jobs.

PyTorch Responsible AI Development Tools

1st place: FairTorch

FairTorch is a fairness library for PyTorch. It lets developers add constraints to their models to equalize metrics across subgroups by simply adding a few lines of code. Model builders can choose a metric definition of fairness for their context, and enforce it at time of training. The library offers a suite of metrics that measure an AI system’s performance among subgroups, and can apply to high-stakes examples where decision-making algorithms are deployed, such as hiring, school admissions, and banking.
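FairTorch's exact API isn't shown in the announcement, but the general pattern it describes, enforcing a fairness metric at training time with a few extra lines, can be sketched generically as a group-disparity penalty added to the task loss (an illustration, not FairTorch's implementation):

import torch
import torch.nn.functional as F

def demographic_parity_gap(scores, groups):
    # Absolute difference in mean predicted positive rate between subgroups 0 and 1.
    p = torch.sigmoid(scores)
    return (p[groups == 0].mean() - p[groups == 1].mean()).abs()

model = torch.nn.Linear(10, 1)                       # toy classifier
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(64, 10)
y = torch.randint(0, 2, (64, 1)).float()             # task labels
g = torch.randint(0, 2, (64,))                       # subgroup membership (0 or 1)

scores = model(x)
task_loss = F.binary_cross_entropy_with_logits(scores, y)
fairness_penalty = demographic_parity_gap(scores.squeeze(1), g)
loss = task_loss + 1.0 * fairness_penalty            # the weight sets the accuracy/fairness trade-off
loss.backward()
optimizer.step()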

2nd place: Fluence

Fluence is a PyTorch-based deep learning library for language research. It specifically addresses the large compute demands of natural language processing (NLP) research. Fluence aims to provide low-resource and computationally efficient algorithms for NLP, giving researchers algorithms that can enhance current NLP methods or help discover where current methods fall short.

3rd place: Causing: CAUSal INterpretation using Graphs

Causing (CAUSal INterpretation using Graphs) is a multivariate graphic analysis tool for bringing transparency to neural networks. It explains causality and helps researchers and developers interpret the causal effects of a given equation system to ensure fairness. Developers can input data and a model describing the dependencies between the variables within the data set into Causing, and Causing will output a colored graph of quantified effects acting between the model’s variables. In addition, it also allows developers to estimate these effects to validate whether data fits a model.

Thank you,

The PyTorch team

Read More