Researchers Use AI to Help Earbud Users Mute Background Noise

Thanks to earbuds, people can take calls anywhere, while doing anything. The problem: those on the other end of the call can hear all the background noise, too, whether it’s the roommate’s vacuum cleaner or neighboring conversations at a café.

Now, work by a trio of graduate students at the University of Washington, who spent the pandemic cooped up together in a noisy apartment, lets those on the other end of the call hear just the speaker — rather than all the surrounding sounds.

Users found that the system, dubbed “ClearBuds” — presented last month at the ACM International Conference on Mobile Systems, Applications and Services — suppressed background noise much better than a commercially available alternative.

AI Podcast host Noah Kravitz caught up with the team at ClearBuds to discuss the unlikely pandemic-time origin story behind a technology that promises to make calls clearer and easier, wherever we go.

You Might Also Like

Listen Up: How Audio Analytic Is Teaching Machines to Listen

Audio Analytic has been using machine learning that enables a vast array of devices to make sense of the world of sound. Dr. Chris Mitchell, CEO and founder of Audio Analytic, discusses the challenges and the fun involved in teaching machines to listen.

A Podcast With Teeth: How Overjet Brings AI to Dentists’ Offices

Overjet, a member of the NVIDIA Inception program for startups, is moving fast to bring AI to dentists’ offices. Dr. Wardah Inam, CEO of Overjet, talks about how her company improves patient care with AI-powered technology that analyzes and annotates X-rays for dentists and insurance providers.

Sing It, Sister! Maya Ackerman on LyricStudio, an AI-Based Writing Assistant

Maya Ackerman is the CEO of WaveAI, a Silicon Valley startup using AI and machine learning to, as the company motto puts it, “unlock new heights of human creative expression.” She discusses WaveAI’s LyricStudio software, an AI-based lyric and poetry writing assistant.

Meet the Omnivore: Ph.D. Student Lets Anyone Bring Simulated Bots to Life With NVIDIA Omniverse Extension

Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse to accelerate their 3D workflows and create virtual worlds.

Yizhou Zhao

When not engrossed in his studies toward a Ph.D. in statistics, conducting data-driven research on AI and robotics, or enjoying his favorite hobby of sailing, Yizhou Zhao is winning contests for developers who use NVIDIA Omniverse — a platform for connecting and building custom 3D pipelines and metaverse applications.

The fifth-year doctoral candidate at the University of California, Los Angeles, recently won first place in the inaugural #ExtendOmniverse contest, in which developers were invited to create their own Omniverse extension for a chance to win an NVIDIA RTX GPU.

Omniverse extensions are core building blocks that let anyone create and extend functions of Omniverse apps using the popular Python programming language.
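
Extensions are ordinary Python packages that the Omniverse Kit runtime discovers, enables and disables. As a hedged illustration (the class name and the single button are placeholders, not code from Zhao’s project), a minimal extension with a one-button window looks roughly like this:

    import omni.ext
    import omni.ui as ui

    class HelloSceneExtension(omni.ext.IExt):
        # Kit calls on_startup when the extension is enabled and on_shutdown when it is disabled
        def on_startup(self, ext_id):
            self._window = ui.Window("Hello Scene", width=300, height=120)
            with self._window.frame:
                with ui.VStack():
                    ui.Button("Load Scene", clicked_fn=lambda: print("load scene clicked"))

        def on_shutdown(self):
            self._window = None

An extension like “IndoorKit” presumably builds its scene-loading and recording buttons inside a window in a similar way, wiring each button’s clicked_fn to scene setup or robot control logic.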

Zhao’s winning entry, called “IndoorKit,” allows users to easily load and record robotics simulation tasks in indoor scenes. It sets up robotics manipulation tasks by automatically populating scenes with the indoor environment, the bot and other objects with just a few clicks.

“Typically, it’s hard to deploy a robotics task in simulation without a lot of skills in scene building, layout sampling and robot control,” Zhao said. “By bringing assets into Omniverse’s powerful user interface using the Universal Scene Description framework, my extension achieves instant scene setup and accurate control of the robot.”

Within “IndoorKit,” users can simply click “add object,” “add house,” “load scene,” “record scene” and other buttons to manipulate aspects of the environment and dive right into robotics simulation.

With Universal Scene Description (USD), an open-source, extensible file framework, Zhao seamlessly brought 3D models into his environments using Omniverse Connectors for Autodesk Maya and Blender software.
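
Under the hood, USD composition is what makes this kind of asset reuse possible. A brief sketch with the USD Python API (the file paths are hypothetical, not Zhao’s actual assets):

    from pxr import Usd, UsdGeom

    # Create a stage and reference an asset exported from Maya or Blender via an Omniverse Connector
    stage = Usd.Stage.CreateNew("indoor_scene.usda")
    UsdGeom.Xform.Define(stage, "/World")
    robot = stage.DefinePrim("/World/Robot", "Xform")
    robot.GetReferences().AddReference("./assets/robot.usd")
    stage.GetRootLayer().Save()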

The “IndoorKit” extension also relies on assets from the NVIDIA Isaac Sim robotics simulation platform and Omniverse’s built-in PhysX capabilities for accurate, articulated manipulation of the bots.

In addition, “IndoorKit” can randomize a scene’s lighting, room materials and more. One scene Zhao built with the extension is highlighted in the feature video above.

Omniverse for Robotics 

The “IndoorKit” extension bridges Omniverse and robotics research in simulation.

A view of Zhao’s “IndoorKit” extension

“I don’t see how accurate robot control was performed prior to Omniverse,” Zhao said. He gives four main reasons why Omniverse was the ideal platform on which to build this extension:

First, Python’s popularity means many developers can build extensions with it to unlock machine learning and deep learning research for a broader audience, he said.

Second, using NVIDIA RTX GPUs with Omniverse greatly accelerates robot control and training.

Third, Omniverse’s ray-tracing technology enables real-time, photorealistic rendering of his scenes. This saves 90% of the time Zhao used to spend on experiment setup and simulation, he said.

And fourth, Omniverse’s real-time advanced physics simulation engine, PhysX, supports an extensive range of features — including liquid, particle and soft-body simulation — which “land on the frontier of robotics studies,” according to Zhao.

“The future of art, engineering and research is in the spirit of connecting everything: modeling, animation and simulation,” he said. “And Omniverse brings it all together.”

Join In on the Creation

Creators and developers across the world can download NVIDIA Omniverse for free, and enterprise teams can use the platform for their 3D projects.

Discover how to build an Omniverse extension in less than 10 minutes.

For a deeper dive into developing on Omniverse, watch the on-demand NVIDIA GTC session, “How to Build Extensions and Apps for Virtual Worlds With NVIDIA Omniverse.”

Find additional documentation and tutorials in the Omniverse Resource Center, which details how developers like Zhao can build custom USD-based applications and extensions for the platform.

To discover more free tools, training and a community for developers, join the NVIDIA Developer Program.

Follow NVIDIA Omniverse on Instagram, Medium, Twitter and YouTube for additional resources and inspiration. Check out the Omniverse forums, and join our Discord server and Twitch channel to chat with the community.

Face-off Probability, part of NHL Edge IQ: Predicting face-off winners in real time during televised games

A photo showing a faceoff between two teams during an NHL game.

Face-off Probability is the National Hockey League’s (NHL) first advanced statistic using machine learning (ML) and artificial intelligence. It uses real-time Player and Puck Tracking (PPT) data to show viewers which player is likely to win a face-off before the puck is dropped, and provides broadcasters and viewers the opportunity to dive deeper into the importance of face-off matchups and the differences in player abilities. Based on 10 years of historical data, hundreds of thousands of face-offs were used to engineer over 70 features fed into the model to provide real-time probabilities. Broadcasters can now discuss how a key face-off win by a player led to a goal, or how the chances of winning a face-off decrease when a team’s face-off specialist is waved out of a draw. Fans can see visual, real-time predictions that show them the importance of a key part of the game.

In this post, we focus on how the ML model for Face-off Probability was developed and the services used to put the model into production. We also share the key technical challenges that were solved during construction of the Face-off Probability model.

How it works

Imagine the following scenario: It’s a tie game between two NHL teams that will determine who moves forward. We’re in the third period with 1:22 left to play. Two players from opposite teams line up to take the draw at the face-off dot closest to one of the nets. The linesman notices a defensive player encroaching into the face-off circle and waves their player out of the face-off due to the violation. A less experienced defensive player comes in to take the draw as his replacement. The attacking team wins the face-off, gets possession of the puck, and immediately scores to take the lead. The score holds up for the remaining minute of the game and decides who moves forward. Which player was favored to win the face-off before the initial duo was changed? How much did the defensive team’s probability of winning the face-off decrease because of the violation that forced a different player to take the draw? Face-off Probability, the newest NHL Edge IQ statistic powered by AWS, can now answer these questions.

When there is a stoppage in play, Face-off Probability generates predictions for who will win the upcoming face-off based on the players on the ice, the location of the face-off, and the current game situation. The predictions are generated throughout the stoppage until the game clock starts running again. Predictions occur at sub-second latency and are triggered any time there is a change in the players involved in the face-off.

An NHL faceoff shot from up top

Overcoming key obstacles for face-off probability

Predicting face-off probability in real-time broadcasts can be broken down into two specific sub-problems:

  • Modeling the face-off event as an ML problem, understanding the requirements and limitations, preparing the data, engineering the data signals, exploring algorithms, and ensuring reliability of results
  • Detecting a face-off event during the game from a stream of PPT events, collecting parameters needed for prediction, calling the model, and submitting results to broadcasters

Predicting the probability of a player winning a face-off in real time on a televised broadcast has several technical challenges that had to be overcome. These included determining the features required and modeling methods to predict an event that has a large amount of uncertainty, and determining how to use streaming PPT sensor data to identify where a face-off is occurring, the players involved, and the probability of each player winning the face-off, all within hundreds of milliseconds.

Players huddling in a shot of a Faceoff during a game

Building an ML model for difficult-to-predict events

Predicting events such as face-off winning probabilities during a live game is a complex task that requires a significant amount of quality historical data and data streaming capabilities. To identify and understand the important signals in such a rich data environment, the development of ML models requires extensive subject matter expertise. The Amazon Machine Learning Solutions Lab partnered with NHL hockey and data experts to work backward from their target goal of enhancing the fan experience. By continuously drawing on the NHL’s expertise and testing hypotheses, AWS’s scientists engineered over 100 features that correlate with the face-off event. The team classified this feature set into three categories:

  • Historical statistics on player performances such as the number of face-offs a player has taken and won in the last five seasons, the number of face-offs the player has taken and won in previous games, a player’s winning percentages over several time windows, and the head-to-head winning percentage for each player in the face-off
  • Player characteristics such as height, weight, handedness, and years in the league
  • In-game situational data that might affect a player’s performance, such as the score of the game, the elapsed time in the game to that point, where the face-off is located, the strength of each team, and which player has to put their stick down for the face-off first

AWS’s ML scientists considered the problem as a binary classification problem: either the home player wins the face-off or the away player wins the face-off. With data from more than 200,000 historical face-offs, they used a LightGBM model to predict which of the two players involved with a face-off event is likely to win.
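
As a hedged sketch of that formulation (the features and labels below are random placeholders, not the NHL’s engineered feature set), training a LightGBM binary classifier and reading out win probabilities might look like this:

    import lightgbm as lgb
    import numpy as np
    from sklearn.model_selection import train_test_split

    # Placeholder data: one row per historical face-off, label 1 if the home player won
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200_000, 10))      # stand-ins for win percentages, handedness, game state, ...
    y = rng.integers(0, 2, size=200_000)    # 1 = home player won, 0 = away player won

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = lgb.LGBMClassifier(objective="binary", n_estimators=500, learning_rate=0.05)
    model.fit(X_train, y_train)

    # Probability that the home player wins each face-off in the held-out set
    home_win_prob = model.predict_proba(X_test)[:, 1]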

Determining if a face-off is about to occur and which players are involved

When a whistle blows and the play is stopped, Face-off Probability begins to make predictions. However, Face-off Probability first has to determine where the face-off is occurring and which player from each team is involved in the face-off. The data stream indicates events as they occur but doesn’t provide information on when an event is likely to occur in the future. As such, the sensor data of the players on the ice is needed to determine if and where a face-off is about to happen.

The PPT system produces real-time locations and velocities for players on the ice at up to 60 events per second. These locations and velocities were used to determine where the face-off is happening on the ice and if it’s likely to happen soon. By knowing how close the players are to known face-off locations and how stationary the players were, Face-off Probability was able to determine that a face-off was likely to occur and the two players that would be involved in the face-off.

Determining the correct cut-off distance for proximity to a face-off location and the corresponding cut-off velocity for stationary players was accomplished using a decision tree model. With PPT data from the 2020-2021 season, we built a model to predict the likelihood that a face-off is occurring at a specified location given the average distance of each team to the location and the velocities of the players. The decision tree provided the cut-offs for each metric, which we included as rules-based logic in the streaming application.
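
A minimal sketch of that idea, assuming invented column names and a small labeled set of stoppage snapshots (faceoff_imminent is 1 when a face-off followed), could derive the cut-offs like this:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical training frame: one row per stoppage snapshot from the PPT feed
    df = pd.DataFrame({
        "avg_distance_to_dot": [1.2, 8.5, 0.9, 6.3, 1.5, 9.1],  # average team distance to the face-off dot
        "avg_player_velocity": [0.3, 2.4, 0.2, 1.9, 0.4, 3.0],  # average player velocity
        "faceoff_imminent":    [1,   0,   1,   0,   1,   0],
    })
    feature_cols = ["avg_distance_to_dot", "avg_player_velocity"]

    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    clf.fit(df[feature_cols], df["faceoff_imminent"])

    # The learned split thresholds become the rules-based cut-offs in the streaming application
    tree = clf.tree_
    for node in range(tree.node_count):
        if tree.children_left[node] != tree.children_right[node]:  # internal (splitting) node
            print(f"cut-off: {feature_cols[tree.feature[node]]} <= {tree.threshold[node]:.2f}")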

With the correct face-off location determined, the player taking the face-off for each team was identified by selecting the player from that team closest to the known location. This provided the application with the flexibility to identify the correct players while also being able to adjust to a new player having to take the face-off if a current player is waved out due to an infraction. Making and updating the prediction for the correct player was a key focus for the real-time usability of the model in broadcasts, which we describe further in the next section.

Model development and training

To develop the model, we used more than 200,000 historical face-off data points, along with the custom engineered feature set designed by working with the subject matter experts. We looked at features like in-game situations, historical performance of the players taking the face-off, player-specific characteristics, and head-to-head performances of the players taking the face-off, both in the current season and for their careers. Collectively, this resulted in over 100 features created using a combination of available and derived techniques.

To assess different features and how they might influence the model, we conducted extensive feature analysis as part of the exploratory phase. We used a mix of univariate and multivariate tests. For multivariate tests, we used decision tree visualization techniques for interpretability. To assess statistical significance, we used chi-squared and Kolmogorov–Smirnov (KS) tests to check for dependence or distribution differences.

A decision tree showing how the model estimates based on the underlying data and features

We explored classification techniques and models with the expectation that the raw probabilities would be treated as the predictions. In terms of algorithms, we explored nearest neighbors, decision trees, neural networks, and collaborative filtering, while trying different sampling strategies (filtering, random, stratified, and time-based sampling), and we evaluated performance on Area Under the Curve (AUC) and calibration distribution along with Brier score loss. In the end, we found that the LightGBM model worked best, with well-calibrated accuracy metrics.
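
Continuing the hedged LightGBM sketch above (reusing its placeholder y_test and home_win_prob arrays, not NHL data), those evaluation metrics can be computed with scikit-learn:

    from sklearn.calibration import calibration_curve
    from sklearn.metrics import brier_score_loss, roc_auc_score

    auc = roc_auc_score(y_test, home_win_prob)
    brier = brier_score_loss(y_test, home_win_prob)
    print(f"AUC: {auc:.3f}  Brier score: {brier:.3f}")

    # Calibration: within each probability bin, predicted and observed win rates should agree
    observed, predicted = calibration_curve(y_test, home_win_prob, n_bins=10)
    for p, o in zip(predicted, observed):
        print(f"predicted {p:.2f} -> observed {o:.2f}")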

To evaluate the performance of the models, we used multiple techniques. We used a test set that the trained model was never exposed to. Additionally, the teams conducted extensive manual assessments of the results, looking at edge cases and trying to understand the nuances of how the model determined why a certain player should have won or lost a face-off.

With information collected from manual reviewers, we would adjust the features when required, or run iterations on the model to see if the performance of the model was as expected.

Deploying Face-off Probability for real-time use during national television broadcasts

One of the goals of the project was not just to predict the winner of the face-off, but to build a foundation for solving a number of similar problems in a real-time and cost-efficient way. That goal helped determine which components to use in the final architecture.

architecture diagram for faceoff application

The first important component is Amazon Kinesis Data Streams, a serverless streaming data service that acts as a decoupler between the specific implementation of the PPT data provider and consuming applications, thereby protecting the latter from disruptive changes in the former. It also offers enhanced fan-out, which makes it possible to connect up to 20 parallel consumers while maintaining a low latency of 70 milliseconds and the same 2 MB/s per-shard throughput for all of them simultaneously.

PPT events don’t come for all players at once, but arrive discretely for each player as well as other events in the game. Therefore, to implement the upcoming face-off detection algorithm, the application needs to maintain a state.

The second important component of the architecture is Amazon Kinesis Data Analytics for Apache Flink. Apache Flink is a distributed, high-throughput, low-latency streaming dataflow engine with a convenient, easy-to-use DataStream API, and it supports stateful processing functions, checkpointing, and parallel processing out of the box. This helps speed up development while providing access to low-level routines and components, which allows for flexible design and implementation of applications.

Kinesis Data Analytics provides the underlying infrastructure for your Apache Flink applications. It eliminates a need to deploy and configure a Flink cluster on Amazon Elastic Compute Cloud (Amazon EC2) or Kubernetes, which reduces maintenance complexity and costs.

The third crucial component is Amazon SageMaker. Although we used SageMaker to build a model, we also needed to make a decision at the early stages of the project: should scoring be implemented inside the face-off detecting application itself and complicate the implementation, or should the face-off detecting application call SageMaker remotely and sacrifice some latency due to communication over the network? To make an informed decision, we performed a series of benchmarks to verify SageMaker latency and scalability, and validated that average latency was less than 100 milliseconds under the load, which was within our expectations.
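
A hedged sketch of such a latency benchmark (the endpoint name and payload format are hypothetical) using the SageMaker runtime API:

    import json
    import time
    import boto3

    runtime = boto3.client("sagemaker-runtime")

    def time_invocation(payload, endpoint_name="faceoff-probability-endpoint"):
        """Invoke the endpoint once and return the round-trip latency in milliseconds."""
        start = time.perf_counter()
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType="application/json",
            Body=json.dumps(payload),
        )
        _ = response["Body"].read()
        return (time.perf_counter() - start) * 1000.0

    # Example: average latency over 100 calls with a placeholder feature payload
    latencies = [time_invocation({"features": [0.0] * 70}) for _ in range(100)]
    print(f"average latency: {sum(latencies) / len(latencies):.1f} ms")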

With the main parts of high-level architecture decided, we started to work on the internal design of the face-off detecting application. A computation model of the application is depicted in the following diagram.

a diagram representing the flowchart/computation model of the faceoff application

The compute model of the face-off detecting application can be modeled as a simple finite-state machine, where each incoming message transitions the system from one state to another while performing some computation along with that transition. The application maintains several data structures to keep track of the following:

  • Changes in the game state – The current period number, status and value of the game clock, and score
  • Changes in the player’s state – If the player is currently on the ice or on the bench, the current coordinates on the field, and the current velocity
  • Changes in the player’s personal face-off stats – The success rate of one player vs. another, and so on

The algorithm checks each location update event of a player to decide whether a face-off prediction should be made and whether the result should be submitted to broadcasters. Taking into account that each player location is updated roughly every 80 milliseconds and that players move much more slowly during game pauses than during play, we can conclude that the situation between two updates doesn’t change drastically. If the application called SageMaker for predictions and sent them to broadcasters every time a new location update event was received and all conditions were satisfied, SageMaker and the broadcasters would be overwhelmed with duplicate requests.

To avoid all this unnecessary noise, the application keeps track of the combinations of parameters for which predictions were already made, along with the result of each prediction, and caches them in memory to avoid expensive duplicate requests to SageMaker. It also keeps track of which predictions were already sent to broadcasters and makes sure that only new predictions are sent, or that previously sent ones are resent only if necessary. Testing showed that this approach reduces the amount of outgoing traffic by a factor of more than 100.
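
For illustration only (the key fields and helper functions are invented, not the application’s actual code), the deduplication logic can be as simple as a dictionary keyed by the face-off parameters:

    # Cache of predictions already obtained, keyed by the parameter combination
    prediction_cache = {}
    last_sent_to_broadcasters = None

    def get_prediction(home_player_id, away_player_id, dot_id, game_state, score_fn):
        """Return a cached probability if this combination was already scored, otherwise call SageMaker."""
        key = (home_player_id, away_player_id, dot_id, game_state)
        if key not in prediction_cache:
            prediction_cache[key] = score_fn(key)  # the expensive remote call happens only once per key
        return prediction_cache[key]

    def maybe_send(prediction, send_fn):
        """Forward a prediction to broadcasters only if it differs from the last one sent."""
        global last_sent_to_broadcasters
        if prediction != last_sent_to_broadcasters:
            send_fn(prediction)
            last_sent_to_broadcasters = prediction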

Another optimization technique that we used was grouping requests to SageMaker and performing them asynchronously in parallel. For example, if we have four new combinations of face-off parameters for which we need to get predictions from SageMaker, we know that each request will take less than 100 milliseconds. If we perform each request synchronously one by one, the total response time will be under 400 milliseconds. But if we group all four requests, submit them asynchronously, and wait for the result for the entire group before moving forward, we effectively parallelize requests and the total response time will be under 100 milliseconds, just like for only one request.
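
A hedged example of that grouping pattern with a thread pool (score_fn stands in for the SageMaker call from the previous sketches):

    from concurrent.futures import ThreadPoolExecutor

    def score_batch(parameter_combinations, score_fn):
        """Submit all combinations at once and wait for the group; total time is roughly one slowest call."""
        with ThreadPoolExecutor(max_workers=len(parameter_combinations)) as pool:
            futures = [pool.submit(score_fn, combo) for combo in parameter_combinations]
            return [f.result() for f in futures]

    # Four new parameter combinations scored in parallel instead of one after another
    new_combinations = [("p1", "p2", "dot_5", "even_strength")] * 4        # placeholder data
    probabilities = score_batch(new_combinations, score_fn=lambda c: 0.5)  # placeholder scorer

Because the requests are independent, issuing them concurrently keeps the group’s total latency close to that of a single request rather than the sum of all of them.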

Summary

NHL Edge IQ, powered by AWS, is bringing fans closer to the action with advanced analytics and new ML stats. In this post, we showed insights into the building and deployment of the new Face-off Probability model, the first on-air ML statistic for the NHL. Be sure to keep an eye out for the probabilities generated by Face-off Probability in upcoming NHL games.

To find full examples of building custom training jobs for SageMaker, visit Bring your own training-completed model with SageMaker by building a custom container. For examples of using Amazon Kinesis for streaming, refer to Learning Amazon Kinesis Development.

To learn more about the partnership between AWS and the NHL, visit NHL Innovates with AWS Cloud Services. If you’d like to collaborate with experts to bring ML solutions to your organization, contact the Amazon ML Solutions Lab.


About the Authors

Ryan Gillespie is a Sr. Data Scientist with AWS Professional Services. He has an MSc from Northwestern University and an MBA from the University of Toronto. He has previous experience in the retail and mining industries.

Yash Shah is a Science Manager in the Amazon ML Solutions Lab. He and his team of applied scientists and machine learning engineers work on a range of machine learning use cases across healthcare, sports, automotive, and manufacturing.

Alexander Egorov is a Principal Data Architect, specializing in streaming technologies. He helps organizations to design and build platforms for processing and analyzing streaming data in real time.

Miguel Romero Calvo is an Applied Scientist at the Amazon ML Solutions Lab where he partners with AWS internal teams and strategic customers to accelerate their business through ML and cloud adoption.

Erick Martinez is a Sr. Media Application Architect with 25+ years of experience, with a focus on media and entertainment. He is experienced in all aspects of the systems development lifecycle, from discovery and requirements gathering through design, implementation, testing, deployment, and operation.

CircularNet: Reducing waste with Machine Learning

Posted by Sujit Sanjeev, Product Manager, Robert Little, Sustainability Program Manager, Umair Sabir, Machine Learning Engineer

Have you ever been confused about how to file your taxes? Perplexed when assembling furniture? Unsure about how to understand your partner? It turns out that many of us find the act of recycling more confusing than all of the above. As a result, we do a poor job of recycling right: less than 10% of our global resources are recycled, and roughly 1 of every 5 items (~17%) tossed into a recycling bin shouldn’t be there. That’s bad news for everyone — recycling facilities catch fire, we lose billions of dollars in recyclable material every year — and at an existential level, we miss an opportunity to leverage recycling as an impactful tool to combat climate change. With this context in mind, we asked ourselves: how might we use the power of technology to ensure that we recycle more and recycle right?

As the world population grows and urbanizes, waste production is estimated to reach 2.6 billion tons a year in 2030, an increase from its current level of around 2.1 billion tons. Efficient recycling strategies are critical to foster a sustainable future.

The facilities where our waste and recyclables are processed are called “Material Recovery Facilities” (MRFs). Each MRF processes tens of thousands of pounds of our societal “waste” every day, separating valuable recyclable materials like metals and plastics from non-recyclable materials. A key inefficiency within the current waste capture and sorting process is the inability to identify and segregate waste into high quality material streams. The accuracy of the sorting directly determines the quality of the recycled material; for high-quality, commercially viable recycling, the contamination levels need to be low. Even though the MRFs use various technologies alongside manual labor to separate materials into distinct and clean streams, the exceptionally cluttered and contaminated nature of the waste stream makes automated waste detection challenging to achieve, and the recycling rates and the profit margins stay at undesirably low levels.

Enter what we call “CircularNet”, a set of models that lowers barriers to AI/ML tech for waste identification and all the benefits this new level of transparency can offer.

Our goal with CircularNet is to develop a robust and data-efficient model for waste/recyclables detection, which can support the way we identify, sort, manage, and recycle materials across the waste management ecosystem. Models such as this could potentially help with:

  • Better understanding and capturing more value from recycling value chains
  • Increasing landfill diversion of materials
  • Identifying and reducing contamination in inbound and outbound material streams

Challenges

Processing tens of thousands of pounds of material every day, Material Recovery Facility waste streams present a unique and ever-changing challenge: a complex, cluttered, and diverse flow of materials at any given moment. Additionally, there is a lack of comprehensive and readily accessible waste imagery datasets to train and evaluate ML models.

The models should be able to accurately identify different types of waste in “real world” conditions of a MRF – meaning identifying items despite severe clutter and occlusions, high variability of foreground object shapes and textures, and severe object deformation.

In addition to these challenges, others that need to be addressed are visual diversity of foreground and background objects that are often severely deformed, and fine-grained differences between the object classes (e.g. brown paper vs. cardboard; or soft vs. rigid plastic).

There also needs to be consistency while tracking recyclables through the recycling value chain e.g. at point of disposal, within recycling bins and hauling trucks, and within material recovery facilities.

Solution

The CircularNet model is built to perform instance segmentation and was trained on thousands of images with the Mask R-CNN algorithm. Mask R-CNN was implemented from the TensorFlow Model Garden, a repository of models and modeling solutions for TensorFlow users.

By collaborating with experts in the recycling industry, we developed a customized and globally applicable taxonomy of material types (e.g., “paper”, “metal”, “plastic”) and material forms (e.g., “bag”, “bottle”, “can”), which is used to annotate training data for the model. Models were developed to identify material types, material forms, and plastic types (HDPE, PETE, etc.). Unique models were trained for different purposes, which helps achieve better accuracy when they are harmonized and gives the flexibility to cater to different applications. The models are trained with various backbones such as ResNet, MobileNet, and SpineNet.

To train the model on distinct waste and recyclable items, we have collaborated with several MRFs and have started to accumulate real-world images. We plan to continue growing the number and geographic locations of our MRF and waste management ecosystem partnerships in order to continue training the model across diverse waste streams.

Here are a few details on how our model was trained.

  • Data importing, cleaning and pre-processing
    • Once the data was collected, the annotation files had to be converted into COCO JSON format. All noise, errors and incorrect labels were removed from the COCO JSON file. Corrupt images were also removed both from the COCO JSON and dataset to ensure smooth training.
    • The final file was converted to the TFRecord format for faster training
  • Training
    • Mask RCNN was trained using the Model Garden repository on Google Cloud Platform.
    • Hyper parameter optimization was done by changing image size, batch size, learning rate, training steps, epochs and data augmentation steps
  • Model conversion 
    • Final checkpoints achieved after training the model were converted to both saved model and TFLite model formats to support server side and edge side deployments
  • Model deployment 
    • We are deploying the model on Google Cloud for server side inferencing and on edge computing devices
  • Visualization
    • Three ways in which the CircularNet model characterizes recyclables: Form, Material, & Plastic Type
      • Model identifying the material type (Ex. “Plastic”)
      • Model identifying the product form of the material (Ex. “Bottle”)
      • Model identifying the types of plastics (Ex. “HDPE”)

    How to use the CircularNet model

    All the models, along with guides and their respective Colab scripts for pre-processing, training, model conversion, inference, and visualization, are available in the TensorFlow Model Garden repository. Pre-trained models for direct use from servers, browsers, or mobile devices are available on TensorFlow Hub.
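
    As a hedged illustration of loading a pre-trained model from TensorFlow Hub and running it on one image (the model handle and file name below are placeholders, not the actual CircularNet URLs):

        import tensorflow as tf
        import tensorflow_hub as hub

        # Placeholder handle; substitute the CircularNet handle published on TensorFlow Hub
        detector = hub.load("https://tfhub.dev/<circularnet-model-handle>")

        # Read one image and add a batch dimension
        image = tf.io.decode_jpeg(tf.io.read_file("waste_item.jpg"), channels=3)
        image = tf.expand_dims(image, axis=0)

        # Mask R-CNN style detectors typically return a dict of boxes, classes, scores and masks
        outputs = detector(image)
        print({name: tensor.shape for name, tensor in outputs.items()})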

    Conclusion

    We hope the model can be deployed, tinkered with, and improved upon by various stakeholders across the waste management ecosystem. We are in the early days of model development. By collaborating with a diverse set of stakeholders throughout the material recovery value chain, we can create a more globally applicable model. If you are interested in collaborating with us on this journey, please reach out to waste-innovation-external@google.com.

    Acknowledgement

    A huge thank you to everyone whose hard work made this project possible! We couldn’t have done this without partnering with the recycling ecosystem.

    Special thanks to Mark McDonald, Fan Yang, Vighnesh Birodkar and Jeff Rechtman

    AI Esperanto: Large Language Models Read Data With NVIDIA Triton

    Julien Salinas wears many hats. He’s an entrepreneur, software developer and, until lately, a volunteer fireman in his mountain village an hour’s drive from Grenoble, a tech hub in southeast France.

    He’s nurturing a two-year-old startup, NLP Cloud, that’s already profitable, employs about a dozen people and serves customers around the globe. It’s one of many companies worldwide using NVIDIA software to deploy some of today’s most complex and powerful AI models.

    NLP Cloud is an AI-powered software service for text data. A major European airline uses it to summarize internet news for its employees. A small healthcare company employs it to parse patient requests for prescription refills. An online app uses it to let kids talk to their favorite cartoon characters.

    Large Language Models Speak Volumes

    It’s all part of the magic of natural language processing (NLP), a popular form of AI that’s spawning some of the planet’s biggest neural networks called large language models. Trained with huge datasets on powerful systems, LLMs can handle all sorts of jobs such as recognizing and generating text with amazing accuracy.

    NLP Cloud uses about 25 LLMs today; the largest has 20 billion parameters, a key measure of a model’s sophistication. And now it’s implementing BLOOM, an LLM with a whopping 176 billion parameters.

    Running these massive models in production efficiently across multiple cloud services is hard work. That’s why Salinas turns to NVIDIA Triton Inference Server.

    High Throughput, Low Latency

    “Very quickly the main challenge we faced was server costs,” Salinas said, proud his self-funded startup has not taken any outside backing to date.

    “Triton turned out to be a great way to make full use of the GPUs at our disposal,” he said.

    For example, NVIDIA A100 Tensor Core GPUs can process as many as 10 requests at a time — twice the throughput of alternative software —  thanks to FasterTransformer, a part of Triton that automates complex jobs like splitting up models across many GPUs.

    FasterTransformer also helps NLP Cloud spread jobs that require more memory across multiple NVIDIA T4 GPUs while shaving the response time for the task.

    Customers who demand the fastest response times can process 50 tokens — text elements like words or punctuation marks — in as little as half a second with Triton on an A100 GPU, about a third of the response time without Triton.

    “That’s very cool,” said Salinas, who’s reviewed dozens of software tools on his personal blog.

    Touring Triton’s Users

    Around the globe, other startups and established giants are using Triton to get the most out of LLMs.

    Microsoft’s Translate service helped disaster workers understand Haitian Creole while responding to a 7.0 earthquake. It was one of many use cases for the service that got a 27x speedup using Triton to run inference on models with up to 5 billion parameters.

    NLP provider Cohere was founded by one of the AI researchers who wrote the seminal paper that defined transformer models. It’s getting up to 4x speedups on inference using Triton on its custom LLMs, so users of customer support chatbots, for example, get swift responses to their queries.

    NLP Cloud and Cohere are among many members of the NVIDIA Inception program, which nurtures cutting-edge startups. Several other Inception startups also use Triton for AI inference on LLMs.

    Tokyo-based rinna created chatbots used by millions in Japan, as well as tools to let developers build custom chatbots and AI-powered characters. Triton helped the company achieve inference latency of less than two seconds on GPUs.

    In Tel Aviv, Tabnine runs a service that’s automated up to 30% of the code written by a million developers globally (see a demo below). Its service runs multiple LLMs on A100 GPUs with Triton to handle more than 20 programming languages and 15 code editors.

    Twitter uses the LLM service of Writer, based in San Francisco. It ensures the social network’s employees write in a voice that adheres to the company’s style guide. Writer’s service achieves a 3x lower latency and up to 4x greater throughput using Triton compared to prior software.

    If you want to put a face to those words, Inception member Ex-human, just down the street from Writer, helps users create realistic avatars for games, chatbots and virtual reality applications. With Triton, it delivers response times of less than a second on an LLM with 6 billion parameters while reducing GPU memory consumption by a third.

    A Full-Stack Platform

    Back in France, NLP Cloud is now using other elements of the NVIDIA AI platform.

    For inference on models running on a single GPU, it’s adopting NVIDIA TensorRT software to minimize latency. “We’re getting blazing-fast performance with it, and latency is really going down,” Salinas said.

    The company also started training custom versions of LLMs to support more languages and enhance efficiency. For that work, it’s adopting NVIDIA NeMo Megatron, an end-to-end framework for training and deploying LLMs with trillions of parameters.

    The 35-year-old Salinas has the energy of a 20-something for coding and growing his business. He describes plans to build private infrastructure to complement the four public cloud services the startup uses, as well as to expand into LLMs that handle speech and text-to-image to address applications like semantic search.

    “I always loved coding, but being a good developer is not enough: You have to understand your customers’ needs,” said Salinas, who posted code on GitHub nearly 200 times last year.

    If you’re passionate about software, learn the latest on Triton in this technical blog.

    Discovering novel algorithms with AlphaTensor

    In our paper, published today in Nature, we introduce AlphaTensor, the first artificial intelligence (AI) system for discovering novel, efficient, and provably correct algorithms for fundamental tasks such as matrix multiplication. This sheds light on a 50-year-old open question in mathematics about finding the fastest way to multiply two matrices. This paper is a stepping stone in DeepMind’s mission to advance science and unlock the most fundamental problems using AI. Our system, AlphaTensor, builds upon AlphaZero, an agent that has shown superhuman performance on board games, like chess, Go and shogi, and this work shows the journey of AlphaZero from playing games to tackling unsolved mathematical problems for the first time.

    Redact sensitive data from streaming data in near-real time using Amazon Comprehend and Amazon Kinesis Data Firehose

    Near-real-time delivery of data and insights enable businesses to rapidly respond to their customers’ needs. Real-time data can come from a variety of sources, including social media, IoT devices, infrastructure monitoring, call center monitoring, and more. Due to the breadth and depth of data being ingested from multiple sources, businesses look for solutions to protect their customers’ privacy and keep sensitive data from being accessed from end systems. You previously had to rely on personally identifiable information (PII) rules engines that could flag false positives or miss data, or you had to build and maintain custom machine learning (ML) models to identify PII in your streaming data. You also needed to implement and maintain the infrastructure necessary to support these engines or models.

    To help streamline this process and reduce costs, you can use Amazon Comprehend, a natural language processing (NLP) service that uses ML to find insights and relationships like people, places, sentiments, and topics in unstructured text. You can now use Amazon Comprehend ML capabilities to detect and redact PII in customer emails, support tickets, product reviews, social media, and more. No ML experience is required. For example, you can analyze support tickets and knowledge articles to detect PII entities and redact the text before you index the documents. After that, documents are free of PII entities and users can consume the data. Redacting PII entities helps you protect your customer’s privacy and comply with local laws and regulations.
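
    As a minimal local sketch of the two Amazon Comprehend calls used later in this post’s Lambda function (the sample text is invented), detecting and redacting PII in a string looks like this:

        import boto3

        comprehend = boto3.client("comprehend")
        text = "My name is Jane Doe and my phone number is 555-0100."  # invented sample

        # Only run the detailed detection when the text is flagged as containing PII
        if comprehend.contains_pii_entities(Text=text, LanguageCode="en")["Labels"]:
            entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
            redacted = text
            for entity in entities:
                begin, end = entity["BeginOffset"], entity["EndOffset"]
                # Replace each detected span with asterisks of the same length, so offsets stay valid
                redacted = redacted[:begin] + "*" * (end - begin) + redacted[end:]
            print(redacted)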

    In this post, you learn how to implement Amazon Comprehend into your streaming architectures to redact PII entities in near-real time using Amazon Kinesis Data Firehose with AWS Lambda.

    This post is focused on redacting data from select fields that are ingested into a streaming architecture using Kinesis Data Firehose, where you want to create, store, and maintain additional derivative copies of the data for consumption by end-users or downstream applications. If you’re using Amazon Kinesis Data Streams or have additional use cases outside of PII redaction, refer to Translate, redact and analyze streaming data using SQL functions with Amazon Kinesis Data Analytics, Amazon Translate, and Amazon Comprehend, where we show how you can use Amazon Kinesis Data Analytics Studio powered by Apache Zeppelin and Apache Flink to interactively analyze, translate, and redact text fields in streaming data.

    Solution overview

    The following figure shows an example architecture for performing PII redaction of streaming data in real time, using Amazon Simple Storage Service (Amazon S3), Kinesis Data Firehose data transformation, Amazon Comprehend, and AWS Lambda. Additionally, we use the AWS SDK for Python (Boto3) for the Lambda functions. As indicated in the diagram, the S3 raw bucket contains non-redacted data, and the S3 redacted bucket contains redacted data after using the Amazon Comprehend DetectPiiEntities API within a Lambda function.

    Costs involved

    In addition to Kinesis Data Firehose, Amazon S3, and Lambda costs, this solution will incur usage costs from Amazon Comprehend. The amount you pay is a factor of the total number of records that contain PII and the characters that are processed by the Lambda function. For more information, refer to Amazon Kinesis Data Firehose pricing, Amazon Comprehend Pricing, and AWS Lambda Pricing.

    As an example, let’s assume you have 10,000 logs records, and the key value you want to redact PII from is 500 characters. Out of the 10,000 log records, 50 are identified as containing PII. The cost details are as follows:

    Contains PII Cost:

    • Size of each key value = 500 characters (1 unit = 100 characters)
    • Number of units (100 characters) per record (minimum is 3 units) = 5
    • Total units = 10,000 (records) x 5 (units per record) x 1 (Amazon Comprehend requests per record) = 50,000
    • Price per unit = $0.000002
      • Total cost for identifying log records with PII using ContainsPiiEntities API = $0.1 [50,000 units x $0.000002] 

    Redact PII Cost:

    • Total units containing PII = 50 (records) x 5 (units per record) x 1 (Amazon Comprehend requests per record) = 250
    • Price per unit = $0.0001
      • Total cost for identifying location of PII using DetectPiiEntities API = [number of units] x [cost per unit] = 250 x $0.0001 = $0.025

    Total Cost for identification and redaction:

    • Total cost: $0.1 (validation if field contains PII) + $0.025 (redact fields that contain PII) = $0.125

    Deploy the solution with AWS CloudFormation

    For this post, we provide an AWS CloudFormation streaming data redaction template, which provides the full details of the implementation to enable repeatable deployments. Upon deployment, this template creates two S3 buckets: one to store the raw sample data ingested from the Amazon Kinesis Data Generator (KDG), and one to store the redacted data. Additionally, it creates a Kinesis Data Firehose delivery stream with DirectPUT as input, and a Lambda function that calls the Amazon Comprehend ContainsPiiEntities and DetectPiiEntities API to identify and redact PII data. The Lambda function relies on user input in the environment variables to determine what key values need to be inspected for PII.

    The Lambda function in this solution limits payload sizes to 100 KB. If a payload is provided where the text is greater than 100 KB, the Lambda function will skip it.

    To deploy the solution, complete the following steps:

    1. Launch the CloudFormation stack in US East (N. Virginia) us-east-1:
    2. Enter a stack name, and leave the other parameters at their defaults.
    3. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
    4. Choose Create stack.

    Deploy resources manually

    If you prefer to build the architecture manually instead of using AWS CloudFormation, complete the steps in this section.

    Create the S3 buckets

    Create your S3 buckets with the following steps:

    1. On the Amazon S3 console, choose Buckets in the navigation pane.
    2. Choose Create bucket.
    3. Create one bucket for your raw data and one for your redacted data.
    4. Note the names of the buckets you just created.

    Create the Lambda function

    To create and deploy the Lambda function, complete the following steps:

    1. On the Lambda console, choose Create function.
    2. Choose Author from scratch.
    3. For Function Name, enter AmazonComprehendPII-Redact.
    4. For Runtime, choose Python 3.9.
    5. For Architecture, select x86_64.
    6. For Execution role, select Create a new role with Lambda permissions.
    7. After you create the function, enter the following code:
      import json
      import boto3
      import os
      import base64
      import sys
      
      def lambda_handler(event, context):
          
          output = []
          
          for record in event['records']:
              
              # Gathers keys from environment variables and makes a list of desired keys to check for PII
              rawkeys = os.environ['keys']
              splitkeys = rawkeys.split(", ")
              print(splitkeys)
              #decode base64
              #Kinesis data is base64 encoded so decode here
              payloadraw=base64.b64decode(record["data"]).decode('utf-8')
              #Loads decoded payload into json
              payloadjsonraw = json.loads(payloadraw)
              
              # Creates Comprehend client
              comprehend_client = boto3.client('comprehend')
              
              
              # This code handles the logic to check for keys, identify if PII exists, and redact PII if available.
              for i in payloadjsonraw:
                  # checks if the key found in the message matches a key in the redaction list
                  if i in splitkeys:
                      print("Redact key found, checking for PII")
                      payload = str(payloadjsonraw[i])
                      # check if payload size is less than 100KB
                      if sys.getsizeof(payload) < 99999:
                          print('Size is less than 100KB checking if value contains PII')
                          # Runs Comprehend ContainsPiiEntities API call to see if key value contains PII
                          pii_identified = comprehend_client.contains_pii_entities(Text=payload, LanguageCode='en')
                          
                          # If PII is not found, skip over key
                          if (pii_identified['Labels']) == []:
                              print('No PII found')
                          else:
                          # if PII is found, run through redaction logic
                              print('PII found redacting')
                              # Runs Comprehend DetectPiiEntities call to find exact location of PII
                              response = comprehend_client.detect_pii_entities(Text=payload, LanguageCode='en')
                              entities = response['Entities']
                              # creates redacted_payload which will be redacted
                              redacted_payload = payload
                              # runs through a loop that gathers necessary values from Comprehend API response and redacts values
                              for entity in entities:
                                  char_offset_begin = entity['BeginOffset']
                                  char_offset_end = entity['EndOffset']
                                  redacted_payload = redacted_payload[:char_offset_begin] + '*'*(char_offset_end-char_offset_begin) + redacted_payload[char_offset_end:]
                              # replaces original value with redacted value
                              payloadjsonraw[i] = redacted_payload
                              print(str(payloadjsonraw[i]))
                      else:
                          print ('Size is more than 100KB, skipping inspection')
                  else:
                      print("Key value not found in redaction list")
              
              redacteddata = json.dumps(payloadjsonraw)
              
              # adds inspected record to record
              output_record = {
                  'recordId': record['recordId'],
                  'result': 'Ok',
                  'data' : base64.b64encode(redacteddata.encode('utf-8'))
              }
              output.append(output_record)
              print(output_record)
              
          print('Successfully processed {} records.'.format(len(event['records'])))
          
          return {'records': output}

    8. Choose Deploy.
    9. In the navigation pane, choose Configuration.
    10. Navigate to Environment variables.
    11. Choose Edit.
    12. For Key, enter keys.
    13. For Value, enter the key values you want to redact PII from, separated by a comma and space. For example, enter Tweet1, Tweet2 if you’re using the sample test data provided in the next section of this post.
    14. Choose Save.
    15. Navigate to General configuration.
    16. Choose Edit.
    17. Change the value of Timeout to 1 minute.
    18. Choose Save.
    19. Navigate to Permissions.
    20. Choose the role name under Execution Role.
      You’re redirected to the AWS Identity and Access Management (IAM) console.
    21. For Add permissions, choose Attach policies.
    22. Enter Comprehend into the search bar and choose the policy ComprehendFullAccess.
    23. Choose Attach policies.

    Create the Firehose delivery stream

    To create your Firehose delivery stream, complete the following steps:

    1. On the Kinesis Data Firehose console, choose Create delivery stream.
    2. For Source, select Direct PUT.
    3. For Destination, select Amazon S3.
    4. For Delivery stream name, enter ComprehendRealTimeBlog.
    5. Under Transform source records with AWS Lambda, select Enabled.
    6. For AWS Lambda function, enter the ARN for the function you created, or browse to the function AmazonComprehendPII-Redact.
    7. For Buffer Size, set the value to 1 MB.
    8. For Buffer Interval, leave it as 60 seconds.
    9. Under Destination Settings, select the S3 bucket you created for the redacted data.
    10. Under Backup Settings, select the S3 bucket that you created for the raw records.
    11. Under Permission, either create or update an IAM role, or choose an existing role with the proper permissions.
    12. Choose Create delivery stream.

    Deploy the streaming data solution with the Kinesis Data Generator

    You can use the Kinesis Data Generator (KDG) to ingest sample data to Kinesis Data Firehose and test the solution. To simplify this process, we provide a Lambda function and CloudFormation template to create an Amazon Cognito user and assign appropriate permissions to use the KDG.

    1. On the Amazon Kinesis Data Generator page, choose Create a Cognito User with CloudFormation. You’re redirected to the AWS CloudFormation console to create your stack.
    2. Provide a user name and password for the user with which you log in to the KDG.
    3. Leave the other settings at their defaults and create your stack.
    4. On the Outputs tab, choose the KDG UI link.
    5. Enter your user name and password to log in.

    Send test records and validate redaction in Amazon S3

    To test the solution, complete the following steps:

    1. Log in to the KDG URL you created in the previous step.
    2. Choose the Region where the AWS CloudFormation stack was deployed.
    3. For Stream/delivery stream, choose the delivery stream you created (if you used the template, it has the format accountnumber-awscomprehend-blog).
    4. Leave the other settings at their defaults.
    5. For the record template, you can create your own tests or use the following template. If you’re using the provided sample data below for testing, you should have updated the environment variables in the AmazonComprehendPII-Redact Lambda function to Tweet1, Tweet2. If deployed via CloudFormation, update the environment variables to Tweet1, Tweet2 within the created Lambda function. The sample test data is below:
      {"User":"12345", "Tweet1":" Good morning, everybody. My name is Van Bokhorst Serdar, and today I feel like sharing a whole lot of personal information with you. Let's start with my Email address SerdarvanBokhorst@dayrep.com. My address is 2657 Koontz Lane, Los Angeles, CA. My phone number is 818-828-6231.", "Tweet2": "My Social security number is 548-95-6370. My Bank account number is 940517528812 and routing number 195991012. My credit card number is 5534816011668430, Expiration Date 6/1/2022, my C V V code is 121, and my pin 123456. Well, I think that's it. You know a whole lot about me. And I hope that Amazon comprehend is doing a good job at identifying PII entities so you can redact my personal information away from this streaming record. Let's check"}

    6. Choose Send Data, and allow a few seconds for records to be sent to your stream.
    7. After a few seconds, stop the KDG and check your S3 buckets for the delivered files.

    The following is an example of the raw data in the raw S3 bucket:

    {"User":"12345", "Tweet1":" Good morning, everybody. My name is Van Bokhorst Serdar, and today I feel like sharing a whole lot of personal information with you. Let's start with my Email address SerdarvanBokhorst@dayrep.com. My address is 2657 Koontz Lane, Los Angeles, CA. My phone number is 818-828-6231.", "Tweet2": "My Social security number is 548-95-6370. My Bank account number is 940517528812 and routing number 195991012. My credit card number is 5534816011668430, Expiration Date 6/1/2022, my C V V code is 121, and my pin 123456. Well, I think that's it. You know a whole lot about me. And I hope that Amazon comprehend is doing a good job at identifying PII entities so you can redact my personal information away from this streaming record. Let's check"}

    The following is an example of the redacted data in the redacted S3 bucket:

    {"User":"12345", "Tweet1":"Good morning, everybody. My name is *******************, and today I feel like sharing a whole lot of personal information with you. Let's start with my Email address ****************************. My address is ********************************** My phone number is ************.", "Tweet"2: "My Social security number is ***********. My Bank account number is ************ and routing number *********. My credit card number is ****************, Expiration Date ********, my C V V code is ***, and my pin ******. Well, I think that's it. You know a whole lot about me. And I hope that Amazon comprehend is doing a good job at identifying PII entities so you can redact my personal information away from this streaming record. Let's check"}

    The sensitive information has been removed from the redacted messages, providing confidence that you can share this data with end systems.

    Cleanup

    When you’re finished experimenting with this solution, clean up your resources by using the AWS CloudFormation console to delete all the resources deployed in this example. If you followed the manual steps, you will need to manually delete the two buckets, the AmazonComprehendPII-Redact function, the ComprehendRealTimeBlog stream, the log group for the ComprehendRealTimeBlog stream, and any IAM roles that were created.

    Conclusion

    This post showed you how to integrate PII redaction into your near-real-time streaming architecture and reduce data processing time by performing redaction in flight. In this scenario, you provide the redacted data to your end-users and a data lake administrator secures the raw bucket for later use. You could also build additional processing with Amazon Comprehend to identify tone or sentiment, identify entities within the data, and classify each message.

    We provided individual steps for each service as part of this post, and also included a CloudFormation template that allows you to provision the required resources in your account. This template should be used for proof of concept or testing scenarios only. Refer to the developer guides for Amazon Comprehend, Lambda, and Kinesis Data Firehose for any service limits.

    To get started with PII identification and redaction, see Personally identifiable information (PII). With the example architecture in this post, you could integrate any of the Amazon Comprehend APIs with near-real-time data using Kinesis Data Firehose data transformation. To learn more about what you can build with your near-real-time data with Kinesis Data Firehose, refer to the Amazon Kinesis Data Firehose Developer Guide. This solution is available in all AWS Regions where Amazon Comprehend and Kinesis Data Firehose are available.


    About the authors

    Joe Morotti is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers across the Midwest US. He has held a wide range of technical roles and enjoys showing customers the art of the possible. In his free time, he enjoys spending quality time with his family, exploring new places, and overanalyzing his sports team’s performance.

    Sriharsh Adari is a Senior Solutions Architect at Amazon Web Services (AWS), where he helps customers work backwards from business outcomes to develop innovative solutions on AWS. Over the years, he has helped multiple customers with data platform transformations across industry verticals. His core areas of expertise include technology strategy, data analytics, and data science. In his spare time, he enjoys playing tennis, binge-watching TV shows, and playing the tabla.
