AI-Fueled Productivity: Generative AI Opens New Era of Efficiency Across Industries

A watershed moment on Nov. 30, 2022, was mostly virtual, yet it shook the foundations of nearly every industry on the planet.

On that day, OpenAI released ChatGPT, the most advanced artificial intelligence chatbot ever developed. This set off demand for generative AI applications that help businesses become more efficient, from providing consumers with answers to their questions to accelerating the work of researchers as they seek scientific breakthroughs, and much, much more.

Businesses that previously dabbled in AI are now rushing to adopt and deploy the latest applications. Generative AI — the ability of algorithms to create new text, images, sounds, animations, 3D models and even computer code — is moving at warp speed, transforming the way people work and play.

By employing large language models (LLMs) to handle queries, the technology can dramatically reduce the time people devote to manual tasks like searching for and compiling information.

The stakes are high. AI could contribute more than $15 trillion to the global economy by 2030, according to PwC. And the impact of AI adoption could be greater than the inventions of the internet, mobile broadband and the smartphone — combined.

The engine driving generative AI is accelerated computing. It uses GPUs, DPUs and networking along with CPUs to accelerate applications across science, analytics, engineering, as well as consumer and enterprise use cases.

Early adopters across industries — from drug discovery, financial services, retail and telecommunications to energy, higher education and the public sector — are combining accelerated computing with generative AI to transform business operations, service offerings and productivity.

[Infographic: Generating the Next Wave of AI Transformation]

Generative AI for Drug Discovery

Today, radiologists use AI to detect abnormalities in medical images, doctors use it to scan electronic health records to uncover patient insights, and researchers use it to accelerate the discovery of novel drugs.

Traditional drug discovery is a resource-intensive process that can require the synthesis of over 5,000 chemical compounds and yields an average success rate of just 10%. And it takes more than a decade for most new drug candidates to reach the market.

Researchers are now using generative AI models to read a protein’s amino acid sequence and accurately predict the structure of target proteins in seconds, rather than weeks or months.

Using NVIDIA BioNeMo models, Amgen, a global leader in biotechnology, has slashed the time it takes to customize models for molecule screening and optimization from three months to just a few weeks. This type of trainable foundation model enables scientists to create variants for research into specific diseases, allowing them to develop targeted treatments for rare conditions.

Whether predicting protein structures or securely training algorithms on large real-world and synthetic datasets, generative AI and accelerated computing are opening new areas of research that can help mitigate the spread of disease, enable personalized medical treatments and boost patient survival rates.

Generative AI for Financial Services

According to a recent NVIDIA survey, the top AI use cases in the financial services industry are customer services and deep analytics, where natural language processing and LLMs are used to better respond to customer inquiries and uncover investment insights. Another common application is in recommender systems that power personalized banking experiences, marketing optimization and investment guidance.

Advanced AI applications have the potential to help the industry better prevent fraud and transform every aspect of banking, from portfolio planning and risk management to compliance and automation.

Eighty percent of business-relevant information is in an unstructured format — primarily text — which makes it a prime candidate for generative AI. Bloomberg News produces 5,000 stories a day related to the financial and investment community. These stories represent a vast trove of unstructured market data that can be used to make timely investment decisions.

NVIDIA, Deutsche Bank, Bloomberg and others are creating LLMs trained on domain-specific and proprietary data to power finance applications.

Financial Transformers, or “FinFormers,” can learn context and understand the meaning of unstructured financial data. They can power Q&A chatbots, summarize and translate financial texts, provide early warning signs of counterparty risk, quickly retrieve data and identify data-quality issues.
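The components for building such tools are broadly available. Below is a minimal sketch of the summarization piece, assuming the Hugging Face transformers library and a general-purpose summarization model, not the proprietary FinFormers described above; the filing excerpt is invented for illustration.

from transformers import pipeline

# General-purpose summarizer standing in for a finance-tuned model.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

filing_excerpt = (
    "The company reported quarterly revenue of $2.1 billion, up 14% year over "
    "year, driven by growth in its wealth-management division. Management "
    "flagged rising counterparty exposure in its derivatives book and raised "
    "loan-loss provisions by $120 million."
)

summary = summarizer(filing_excerpt, max_length=40, min_length=10, do_sample=False)
print(summary[0]["summary_text"])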

These generative AI tools rely on frameworks that can integrate proprietary data into model training and fine-tuning, integrate data curation to prevent bias and use guardrails to keep conversations finance-specific.

Expect fintech startups and large international banks to expand their use of LLMs and generative AI to develop sophisticated virtual assistants to serve internal and external stakeholders, create hyper-personalized customer content, automate document summarization to reduce manual work, and analyze terabytes of public and private data to generate investment insights.

Generative AI for Retail

With 60% of all shopping journeys starting online and consumers more connected and knowledgeable than ever, AI has become a vital tool to help retailers match shifting expectations and differentiate from a rising tide of competition.

Retailers are using AI to improve customer experiences, power dynamic pricing, create customer segmentation, design personalized recommendations and perform visual search.

Generative AI can support customers and employees at every step through the buyer journey.

AI models trained on specific brand and product data can generate robust product descriptions that improve search engine optimization rankings and help shoppers find the exact product they’re looking for. For example, generative AI can use metatags containing product attributes to generate more comprehensive product descriptions that include terms like “low sugar” or “gluten free.”
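As a rough sketch of how this could work, the snippet below assembles a prompt from hypothetical product metatags and passes it to an off-the-shelf text-generation model; the model choice and attribute names are illustrative, not a specific retailer’s pipeline.

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in for a brand-tuned LLM

product = {
    "name": "Oat & Honey Granola Bar",
    "attributes": ["low sugar", "gluten free", "high fiber"],  # from metatags
}

prompt = (
    f"Write a one-sentence product description for '{product['name']}' "
    f"highlighting: {', '.join(product['attributes'])}.\nDescription:"
)

result = generator(prompt, max_new_tokens=40, num_return_sequences=1)
print(result[0]["generated_text"])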

AI virtual assistants can check enterprise resource planning systems and generate customer service messages to inform shoppers about which items are available and when orders will ship, and even assist customers with order change requests.

Fashable, a member of NVIDIA Inception’s global network of technology startups, is using generative AI to create virtual clothing designs, eliminating the need for physical fabric during product development. Because the models are trained on both proprietary and market data, the approach reduces the environmental impact of fashion design and helps retailers design clothes that match current market trends and tastes.

Expect retailers to use AI to capture and retain customer attention, deliver superior shopping experiences, and drive revenue by matching shoppers with the right products at the right time.

Generative AI for Telecommunications

In an NVIDIA survey covering the telecommunications industry, 95% of respondents reported that they were engaged with AI, while two-thirds believed that AI would be important to their company’s future success.

Whether improving customer service, streamlining network operations and design, supporting field technicians or creating new monetization opportunities, generative AI has the potential to reinvent the telecom industry.

Telcos can train diagnostic AI models with proprietary data on network equipment and services, performance, ticket issues, site surveys and more. These models can accelerate troubleshooting of technical performance issues, recommend network designs, check network configurations for compliance, predict equipment failures, and identify and respond to security threats.

Generative AI applications on handheld devices can support field technicians by scanning equipment and generating virtual tutorials to guide them through repairs. Virtual guides can then be enhanced with augmented reality, enabling technicians to analyze equipment in a 3D immersive environment or call on a remote expert for support.

New revenue opportunities will also open for telcos. With large edge infrastructure and access to vast datasets, telcos around the world are now offering generative AI as a service to enterprise and government customers.

As generative AI advances, expect telecommunications providers to use the technology to optimize network performance, improve customer support, detect security intrusions and enhance maintenance operations.

Generative AI for Energy

In the energy industry, AI is powering predictive maintenance and asset optimization, smart grid management, renewable energy forecasting, grid security and more.

To meet growing data needs across aging infrastructure and new government compliance regulations, energy operators are looking to generative AI.

In the U.S., electric utility companies spend billions of dollars every year to inspect, maintain and upgrade power generation and transmission infrastructure.

Until recently, using vision AI to support inspection required algorithms to be trained on thousands of manually collected and tagged photos of grid assets, with training data constantly updated for new components. Now, generative AI can do the heavy lifting.

With a small set of image training data, algorithms can generate thousands of physically accurate images to train computer vision models that help field technicians identify grid equipment corrosion, breakage, obstructions and even detect wildfires. This type of proactive maintenance enhances grid reliability and resiliency by reducing downtime, while diminishing the need to dispatch teams to the field.
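A hedged sketch of that augmentation step, assuming the Hugging Face diffusers library and a CUDA GPU; the prompts are illustrative, and a production pipeline would add labeling and quality filtering before training a detector on the output.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompts = [
    "photo of a corroded high-voltage insulator on a transmission tower, overcast sky",
    "photo of a cracked ceramic insulator on a utility pole, close up",
]

for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]  # one synthetic training sample
    image.save(f"synthetic_grid_asset_{i}.png")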

Generative AI can also reduce the need for manual research and analysis. According to McKinsey, employees spend up to 1.8 hours per day searching for information — nearly 20% of the work week. To increase productivity, energy companies can train LLMs on proprietary data, including meeting notes, SAP records, emails, field best practices and public data such as standard material data sheets.

With this type of knowledge repository connected to an AI chatbot, engineers and data scientists can get instant answers to highly technical questions. For example, a maintenance engineer troubleshooting pitch control issues on a turbine’s hydraulic system could ask a bot: “How should I adjust the hydraulic pressure or flow to rectify pitch control issues on a model turbine from company X?” A properly trained model would deliver specific instructions to the user, who wouldn’t have to look through a bulky manual to find answers.
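The retrieval step behind such a chatbot can be sketched in a few lines. The example below uses simple TF-IDF retrieval from scikit-learn; the documents are invented placeholders for a company’s proprietary knowledge base, and the retrieved passage would then be passed to the fine-tuned LLM.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder snippets standing in for meeting notes, SAP records and data sheets.
documents = [
    "Turbine X-200 pitch control: hydraulic pressure should be 180-210 bar.",
    "Meeting notes 2023-04-12: gearbox oil sampling intervals revised.",
    "Material data sheet: hydraulic fluid ISO VG 46 specifications.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

query = "How should I adjust hydraulic pressure to fix pitch control issues?"
scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
context = documents[scores.argmax()]  # best-matching passage

# The prompt below would be sent to the LLM to generate a grounded answer.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)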

With AI applications for new system design, customer service and automation, expect generative AI to enhance safety and energy efficiency, as well as reduce operational expenses in the energy industry.

Generative AI for Higher Education and Research

From intelligent tutoring systems to automated essay grading, AI has been employed in education for decades. As universities use AI to improve teacher and student experiences, they’re increasingly dedicating resources to build AI-focused research initiatives.

For example, researchers at the University of Florida have access to one of the world’s fastest supercomputers in academia. They’ve used it to develop GatorTron — a natural language processing model that enables computers to read and interpret medical language in clinical notes that are stored in electronic health records. With a model that understands medical context, AI developers can create numerous medical applications, such as speech-to-text apps that support doctors with automated medical charting.

In Europe, an industry-university collaboration involving the Technical University of Munich is demonstrating that LLMs trained on genomics data can generalize across a plethora of genomic tasks, unlike previous approaches that required specialized models. The genomics LLM is expected to help scientists understand the dynamics of how DNA is translated into RNA and proteins, unlocking new clinical applications that will benefit drug discovery and health.

To conduct this type of groundbreaking research and attract the most motivated students and qualified academic professionals, higher education institutes should consider a whole-university approach to pool budget, plan AI initiatives, and distribute AI resources and benefits across disciplines.

Generative AI for the Public Sector

Today, the biggest opportunity for AI in the public sector is helping public servants to perform their jobs more efficiently and save resources.

The U.S. federal government employs more than 2 million civilian workers — two-thirds of whom hold professional and administrative jobs.

These administrative roles often involve time-consuming manual tasks, including drafting, editing and summarizing documents, updating databases, recording expenditures for auditing and compliance, and responding to citizen inquiries.

To control costs and bring greater efficiency to routine job functions, government agencies can use generative AI.

Generative AI’s ability to summarize documents has great potential to boost the productivity of policymakers and staffers, civil servants, procurement officers and contractors. Consider a 756-page report recently released by the National Security Commission on Artificial Intelligence. With reports and legislation often spanning hundreds of pages of dense academic or legal text, AI-powered summaries generated in seconds can quickly break down complex content into plain language, saving the human resources otherwise needed to complete the task.

AI virtual assistants and chatbots powered by LLMs can instantly deliver relevant information to people online, taking the burden off of overstretched staff who work phone banks at agencies like the Treasury Department, IRS and DMV.

With simple text inputs, AI content generation can help public servants create and distribute publications, email correspondence, reports, press releases and public service announcements.

The analytical capabilities of AI can also help process documents to speed the delivery of vital services provided by organizations like Medicare, Medicaid, Veterans Affairs, USPS and the State Department.

Generative AI could be a pivotal tool to help government bodies work within budget constraints, deliver government services more quickly and achieve positive public sentiment.

Generative AI – A Key Ingredient for Business Success 

Across every field, organizations are transforming employee productivity, improving products and delivering higher-quality services with generative AI.

To put generative AI into practice, businesses need expansive amounts of data, deep AI expertise and sufficient compute power to deploy and maintain models quickly. Enterprises can fast-track adoption with the NeMo generative AI framework, part of NVIDIA AI Enterprise software, running on DGX Cloud. NVIDIA’s pretrained foundation models offer a simplified approach to building and running customized generative AI solutions for unique business use cases.

Learn more about powerful generative AI tools to help your business increase productivity, automate tasks, and unlock new opportunities for employees and customers. 

Full-Scale Gaming: ‘Dragon’s Dogma: Dark Arisen’ Comes to GeForce NOW

Arise, members! Capcom’s legendary role-playing game Dragon’s Dogma: Dark Arisen joins the GeForce NOW library today.

The RPG and THQ Nordic’s Jagged Alliance 3 are newly supported on GeForce NOW, playable on nearly any device.

From Dusk Till Pawn

It’s dangerous to go alone, so bring a Pawn along in “Dragon’s Dogma: Dark Arisen.”

Become the Arisen and take up the challenge in Capcom’s critically acclaimed RPG. Set in a huge open world, Dragon’s Dogma: Dark Arisen brings players on an epic adventure filled with challenging battles and action.

But there’s no need to go it alone: Adventure with up to three Pawns. These customizable AI companions fight independently, demonstrating prowess and ability they’ve developed based on traits learned from each player.

Players can share their Pawns online and reap rewards of treasures, tips and strategy hints for taking down terrifying enemies. Pawns can also be borrowed when specific skills are needed to complete various challenging quests.

Revisit Gransys or experience Dragon’s Dogma for the first time. Members can play the real Steam version of this RPG classic with support for stunning visuals and high-resolution graphics, even on devices like Macs, mobile devices and smart TVs. Priority members can adventure at up to 1080p 60 frames per second, or upgrade to an Ultimate membership for gameplay at up to 4K 120 fps, longer streaming sessions and RTX ON for supported games.

Game On

The cloud is locked and loaded.

Another week means new games.

THQ Nordic’s tactical RPG Jagged Alliance 3 joins the cloud this week. Chaos reigns when the elected president of Grand Chien — a nation of rich natural resources and deep political divides — goes missing and a paramilitary force known as “The Legion” seizes control of the countryside. Recruit from a large cast of unique mercenaries and make choices to impact the country’s fate.

Members can look forward to the following this week:

  • Jagged Alliance 3 (New release on Steam, July 14)
  • Dragon’s Dogma: Dark Arisen (Steam)

On top of that, in collaboration with EE, the U.K.’s biggest and fastest mobile network, GeForce NOW launched new cloud gaming bundles featuring Priority and Ultimate memberships. To celebrate, check out how streamer Leah ‘Leahviathan’ Alexandra showcased GeForce NOW in action at the U.K.’s highest-altitude gaming den on the slopes of Ben Nevis, 1,500 feet above sea level in the clouds of the Scottish Highlands.

What are you planning to play this weekend? Let us know on Twitter or in the comments below.

Thinking beyond audio: Augmenting headphones for everyday digital interactions

This research was accepted by and received a Best Paper Award during ACM Designing Interactive Systems (DIS) 2023, which is dedicated to advancing the field of user-centered system design.

Headphones are traditionally used to provide and manage audio experiences through physical controls and a range of sensors. Nonetheless, these controls and sensors have remained confined to audio input and output functionality, such as adjusting the volume or muting the microphone. Imagine if headphones could transcend their role as mere audio devices. 

Because headphones rank among the most popular wearables in the market, we have an exciting opportunity to expand their capabilities through integrating existing sensors with supplementary ones to enable a wide variety of experiences that go beyond traditional audio control. In our paper, “Beyond Audio: Towards a Design Space of Headphones as a Site for Interaction and Sensing,” we share a vision that explores this potential.

By using sensors such as microphones, proximity sensors, motion sensors, inertial measurement units (IMUs), and LiDARs, headphone designers can explore new avenues of input and interaction. The fact that headphones are worn on a person’s head allows for a wide range of applications, such as following head movements, body postures, and hand gestures. Furthermore, as wearable devices, headphones have the potential to provide wearers with context-rich information and enable more intuitive and immersive interactions with their devices and environment beyond traditional button-based controls.

Potential scenarios for sensor-enhanced headphones 

To explore this concept further, we propose augmenting headphones with additional sensors and input widgets. These include: 

  • IMUs to sense head orientation
  • Swappable sets of input controls  
  • A range-sensing LiDAR that enables the sensing of hand gestures

By incorporating these capabilities, we envision a wide range of applications where headphone input acts as a bridge between wearers and their environment, enabling more efficient and context-aware interactions across multiple devices and tasks. For example, headphones could assist people with applications like video games or help manage interruptions during a video call.

Let’s explore some scenarios to illustrate the potential of our headphone design concept. Consider a person engaged in a video call with teammates when they are suddenly interrupted by a colleague who approaches in person. In this situation, our headphones would be equipped to detect contextual cues, such as when the wearer rotates their head away from a video call, signaling a shift in attention. In response, the headphones could automatically blur the video feed and mute the microphone to protect the wearer’s privacy, as shown in Figure 1. This feature could also communicate to other participants that the wearer is temporarily engaged in another conversation or activity. When the wearer returns their attention to the call, the system removes the blur and reactivates the microphone.
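To make the logic concrete, here is a simplified sketch of such an attention-aware privacy loop. The thresholds, the IMU reader, and the call-control callback are hypothetical placeholders rather than our actual implementation.

import time

AWAY_YAW_DEGREES = 45      # head rotation treated as "looking away"
AWAY_GRACE_SECONDS = 3.0   # predefined period before privacy measures trigger

def monitor_attention(read_imu_yaw, set_privacy):
    """read_imu_yaw() -> yaw in degrees from screen center (placeholder);
    set_privacy(blur, mute) applies measures to the call (placeholder)."""
    away_since = None
    while True:
        yaw = read_imu_yaw()
        if abs(yaw) > AWAY_YAW_DEGREES:
            away_since = away_since or time.time()
            if time.time() - away_since > AWAY_GRACE_SECONDS:
                set_privacy(blur=True, mute=True)   # wearer engaged elsewhere
        else:
            away_since = None
            set_privacy(blur=False, mute=False)     # wearer re-engaged
        time.sleep(0.1)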

Figure 1. These videos illustrate a context-aware privacy control system implemented during a video conference. In this scenario, the wearer temporarily disengages from the video conference to engage in an in-person conversation. After a predefined period, the system detects the wearer’s continued attention directed away from any known device, taking into account the environment context. As a result, privacy measures are triggered, including video blurring, microphone muting, and notifying other participants on the call. Once the wearer re-engages with the screen, their video and microphone settings return to normal, ensuring a seamless experience.

In another privacy-focused scenario, imagine a person simultaneously conversing with multiple teammates in separate video call channels. Our headphone design allows the wearer to control to whom their speech is directed by simply looking at their intended audience, as shown in Figure 2. This directed speech interaction can extend beyond video calls and be applied to other contexts, such as sending targeted voice commands to teammates in a multiplayer video game.

Figure 2. Headphones track the wearer’s head pose, seamlessly facilitating the distribution of video and/or audio across multiple private chats. They effectively communicate the wearer’s availability to other participants, whether in a video conferencing scenario (left) or a gaming scenario (right).

In our paper, we also demonstrate how socially recognizable gestures can introduce new forms of audio-visual control instead of relying solely on on-screen controls. For example, wearers could interact with media through gestural actions, such as cupping their ear towards the audio source to increase the volume while simultaneously reducing ambient noise, as shown in Figure 3. These gestures, ingrained in social and cultural contexts, can serve as both control mechanisms and nonverbal communication signals.

Figure 3. Top: Raising the earcup, a commonly used gesture to address in-person interruptions, mutes both the sound and the microphone to ensure privacy. Bottom: Cupping the earcup, a gesture indicating difficulty hearing, increases the system volume.

Additionally, we can estimate the wearer’s head gaze through the use of an IMU. When combined with the physical location of computing devices in the wearer’s vicinity, it opens up possibilities for seamless interactions across multiple devices. For instance, during a video call, the wearer can share the screen of the device they are actively focusing on. In this scenario, the wearer shifts their attention from an external monitor to a tablet device. Even though this tablet is not directly connected to the main laptop, our system smoothly transitions the screen sharing for the wearer’s audience in the video call, as shown in Figure 4.
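As a toy illustration of this head-gaze resolution, the snippet below maps the wearer’s yaw to the nearest registered device; the device angles and tolerance are invented for the example.

devices = {"monitor": -30.0, "laptop": 0.0, "tablet": 35.0}  # yaw of each device, degrees

def focused_device(head_yaw: float, tolerance: float = 15.0):
    """Return the device the wearer is looking at, or None if gaze is elsewhere."""
    name, angle = min(devices.items(), key=lambda d: abs(d[1] - head_yaw))
    return name if abs(angle - head_yaw) <= tolerance else None

print(focused_device(33.0))  # -> "tablet"; the shared screen would switch accordingly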

Figure 4. A wearer delivers a presentation using a video conferencing tool. As the wearer looks at different devices, the streamed video dynamically updates to display the relevant source to participants.

Finally, in our paper we also show the use of embodied interactions, where the wearer’s body movements serve to animate a digital representation of themselves, such as an avatar in a video call, as shown in Figure 5. This feature can also be implemented as a gameplay mechanism. Take a racing game for instance, where the wearer’s body movements could control the vehicle’s steering, shown on the left in Figure 6. To extend this capability, these movements could enable a wearer to peek around obstacles in any first-person game, enhancing the immersion and gameplay experience, shown on the right in Figure 6.

Figure 5. Left: Headphones use an IMU to monitor and capture natural body movements, which are then translated into corresponding avatar movements. Right: Touch controls integrated into headphones enable wearers to evoke a range of emotions on the avatar, enhancing the user experience.
Figure 6. Leaning while wearing the headphone (with an integrated IMU) has a direct impact on gameplay action. On the left, it results in swerving the car to the side, while on the right, it enables the player to duck behind a wall.

Design space for headphone interactions 

We define a design space for interactive headphones through an exploration of two distinct concepts, which we discuss in depth in our paper.

First, we look at the type of input gesture for the interaction, which we further classify into three categories. The gestural input from the wearer might fall under one or more of these categories, which we outline in more detail below and illustrate in Figure 7.

  • Touch-based gestures that involve tangible inputs on the headphones, such as buttons or knobs, requiring physical contact by the wearer
  • Mid-air gestures, which the wearer makes with their hands in close proximity to the headphones, detected through LiDAR technology
  • Head orientation, indicating the direction of the wearer’s attention
Figure 7. Sensor-enhanced headphones can use touch-based gestures (left), head orientation (middle), or mid-air gestures (right) as types of input.

The second way that we define the design space is through the context within which the wearer executes the action. Here, design considerations for sensor-enhanced headphones go beyond user intentionality and observed motion. Context-awareness enables these headphones to understand the wearer’s activities, the applications they are engaged with, and the devices in their vicinity, as illustrated in Figure 8. This understanding enables the headphones to provide personalized experiences and seamlessly integrate with the wearer’s environment. The four categories that define this context-awareness are as follows:

  • Context-free actions, which produce similar results regardless of the active application, the wearer’s activity, or the social or physical environment.  
  • Context that is defined by the application with which the wearer is interacting. For example, are they listening to music, on a video call, or watching a movie?  
  • Context that is defined by the wearer’s body. For example, is the wearer’s gesture close to a body part that has an associated meaning? Eyes might relate to visual functions, ears to audio input, and the mouth to audio output. 
  • Context that is defined by the wearer’s environment. For example, are there other devices or people around the wearer with whom they might want to interact?
Figure 8. The system uses diverse contextual information to enable personalized responses to user input.

Looking ahead: Expanding the possibilities of HCI with everyday wearables  

Sensor-enhanced headphones offer a promising avenue for designers to create immersive and context-aware user experiences. By incorporating sensors, these headphones can capture subtle user behaviors, facilitating seamless interactions and enhancing the wearer’s overall experience.  

From safeguarding privacy to providing intuitive control mechanisms, the potential applications for sensor-enhanced headphones are vast and exciting. This exploration with headphones scratches the surface of what context-aware wearable technology can empower its wearers to achieve. Consider the multitude of wearables we use every day that could benefit from integrating similar sensing and interaction capabilities into these devices. For example, imagine a watch that can track your hand movements and detect gestures. By enabling communication between sensor-enhanced wearables, we can establish a cohesive ecosystem for human-computer interaction that spans across applications, devices, and social contexts.

The post Thinking beyond audio: Augmenting headphones for everyday digital interactions appeared first on Microsoft Research.

Score! Team NVIDIA Takes Trophy in Recommendation Systems

A crack NVIDIA team of five machine learning experts spread across four continents won all three tasks in a hotly contested, prestigious competition to build state-of-the-art recommendation systems.

The results reflect the group’s savvy applying the NVIDIA AI platform to real-world challenges for these engines of the digital economy. Recommenders serve up trillions of search results, ads, products, music and news stories to billions of people daily.

More than 450 teams of data scientists competed in the Amazon KDD Cup ‘23. The three-month challenge had its share of twists and turns and a nail-biter of a finish.

Shifting Into High Gear

In the first 10 weeks of the competition, the team had a comfortable lead. But in the final phase, organizers switched to new test datasets and other teams surged ahead.

The NVIDIANs shifted into high gear, working nights and weekends to catch up. They left a trail of round-the-clock Slack messages from team members living in cities from Berlin to Tokyo.

“We were working nonstop, it was pretty exciting,” said Chris Deotte, a team member in San Diego.

A Product by Any Other Name

The last of the three tasks was the hardest.

Participants had to predict which products users would buy based on data from their browsing sessions. But the training data didn’t include brand names of many possible choices.

“I knew from the beginning, this would be a very, very difficult test,” said Gilberto “Giba” Titericz.

KGMON to the Rescue

Based in Curitiba, Brazil, Titericz was one of four team members ranked as grandmasters in Kaggle competitions, the online Olympics of data science. They’re part of a team of machine learning ninjas who’ve won dozens of competitions. NVIDIA founder and CEO Jensen Huang calls them KGMON (Kaggle Grandmasters of NVIDIA), a playful takeoff on Pokémon.

In dozens of experiments, Titericz used large language models (LLMs) to build generative AIs to predict product names, but none worked.

In a creative flash, the team discovered a work-around. Predictions using their new hybrid ranking/classifier model were spot on.

Down to the Wire

In the last hours of the competition, the team raced to package all their models together for a few final submissions. They’d been running overnight experiments across as many as 40 computers.

Kazuki Onodera, a KGMON in Tokyo, was feeling jittery. “I really didn’t know if our actual scores would match what we were estimating,” he said.

KGMON pictures
The four KGMON (clockwise from upper left) Onodera, Titericz, Deotte and Puget.

Deotte, also a KGMON, remembered it as “something like 100 different models all working together to produce a single output … we submitted it to the leaderboard, and POW!”

The team inched ahead of its closest rival in the AI equivalent of a photo finish.

The Power of Transfer Learning

In another task, the team had to take lessons learned from large datasets in English, German and Japanese and apply them to meager datasets a tenth the size in French, Italian and Spanish. It’s the kind of real-world challenge many companies face as they expand their digital presence around the globe.

Jean-Francois Puget, a three-time Kaggle grandmaster based outside Paris, knew an effective approach to transfer learning. He used a pretrained multilingual model to encode product names, then fine-tuned the encodings.

“Using transfer learning improved the leaderboard scores enormously,” he said.
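A minimal sketch of that idea, assuming the sentence-transformers library; the model name and product strings are illustrative, not the team’s actual solution.

from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

products = [
    "wireless noise-cancelling headphones",  # English
    "kabellose Kopfhörer",                   # German
    "casque sans fil",                       # French
]

# One shared multilingual embedding space, so signal learned in high-resource
# languages transfers to low-resource ones; fine-tuning would sharpen it further.
embeddings = encoder.encode(products)
print(embeddings.shape)  # (3, 384)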

Blending Savvy and Smart Software

The KGMON efforts show the field known as recsys is sometimes more art than science, a practice that combines intuition and iteration.

It’s expertise that’s encoded into software products like NVIDIA Merlin, a framework to help users quickly build their own recommendation systems.

The Merlin framework provides an end-to-end solution for building recommendation systems.

Benedikt Schifferer, a Berlin-based teammate who helps design Merlin, used the software to train transformer models that crushed the competition’s classic recsys task.

“Merlin provides great results right out of the box, and the flexible design lets me customize models for the specific challenge,” he said.

Riding the RAPIDS

Like his teammates, he also used RAPIDS, a set of open-source libraries for accelerating data science on GPUs.

For example, Deotte accessed code from NGC, NVIDIA’s hub for accelerated software. Called DASK XGBoost, the code helped spread a large, complex task across eight GPUs and their memory.
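A hedged sketch of that pattern, assuming dask-cuda and XGBoost’s Dask API on a multi-GPU machine; the synthetic data stands in for the competition features.

import dask.array as da
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(n_workers=8)  # one Dask worker per GPU
client = Client(cluster)

# Synthetic features and binary labels, partitioned across workers.
X = da.random.random((1_000_000, 50), chunks=(125_000, 50))
y = da.random.randint(0, 2, size=1_000_000, chunks=125_000)

dtrain = xgb.dask.DaskDMatrix(client, X, y)
result = xgb.dask.train(
    client,
    {"tree_method": "gpu_hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=100,
)
print(result["booster"])  # trained model, fit across all eight GPUs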

For his part, Titericz used a RAPIDS library called cuML to search through millions of product comparisons in seconds.
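One way to do that kind of search with cuML is a GPU nearest-neighbors index, sketched below on synthetic embeddings; the shapes are illustrative.

import cupy as cp
from cuml.neighbors import NearestNeighbors

# Synthetic product embeddings standing in for the real ones.
product_embeddings = cp.random.random((1_000_000, 64)).astype(cp.float32)

nn = NearestNeighbors(n_neighbors=10)
nn.fit(product_embeddings)

# Top-10 most similar products for the first 1,000 queries, computed on the GPU.
distances, indices = nn.kneighbors(product_embeddings[:1000])
print(indices.shape)  # (1000, 10)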

The team focused on session-based recommenders that don’t require data from multiple user visits. It’s a best practice these days when many users want to protect their privacy.


MosaicML Helps AI Users Boost Accuracy, Cut Costs and Save Time

Startup MosaicML is on a mission to help the AI community improve prediction accuracy, decrease costs and save time by providing tools for easy training and deployment of large AI models.

In this episode of NVIDIA’s AI Podcast, host Noah Kravitz speaks with MosaicML CEO and co-founder Naveen Rao about how the company aims to democratize access to large language models.

MosaicML, a member of NVIDIA’s Inception program, has identified two key barriers to widespread adoption: the difficulty of coordinating a large number of GPUs to train a model and the costs associated with this process.

MosaicML was in the news earlier this month when Databricks announced an agreement to acquire MosaicML for $1.3 billion.

Making model training accessible is key for many companies that need to control model behavior, respect data privacy and iterate fast to develop new products based on AI.

You Might Also Like

Jules Anh Tuan Nguyen Explains How AI Lets Amputee Control Prosthetic Hand, Video Games

A postdoctoral researcher at the University of Minnesota discusses his efforts to allow amputees to control their prosthetic limb — right down to the finger motions — with their minds.

Overjet’s Wardah Inam on Bringing AI to Dentistry

Overjet, a member of NVIDIA Inception, is moving fast to bring AI to dentists’ offices. Dr. Wardah Inam, CEO of the company, discusses using AI to improve patient care.

Immunai CTO and Co-Founder Luis Voloch on Using Deep Learning to Develop New Drugs

Luis Voloch, co-founder and chief technology officer of Immunai, talks about tackling the challenges of the immune system with a machine learning and data science mindset.

Subscribe to the AI Podcast

The AI Podcast is now available through Amazon Music. You can also get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better. Have a few minutes to spare? Fill out this listener survey.

An open-source gymnasium for machine learning assisted computer architecture design

Computer Architecture research has a long history of developing simulators and tools to evaluate and shape the design of computer systems. For example, the SimpleScalar simulator was introduced in the late 1990s and allowed researchers to explore various microarchitectural ideas. Computer architecture simulators and tools, such as gem5, DRAMSys, and many more have played a significant role in advancing computer architecture research. Since then, these shared resources and infrastructure have benefited industry and academia and have enabled researchers to systematically build on each other’s work, leading to significant advances in the field.

Nonetheless, computer architecture research is evolving, with industry and academia turning towards machine learning (ML) optimization to meet stringent domain-specific requirements, such as ML for computer architecture, ML for TinyML acceleration, DNN accelerator datapaths, memory controllers, power consumption, security, and privacy. Although prior work has demonstrated the benefits of ML in design optimization, the lack of strong, reproducible baselines hinders fair and objective comparison across different methods and poses several challenges to their deployment. To ensure steady progress, it is imperative to understand and tackle these challenges collectively.

To alleviate these challenges, in “ArchGym: An Open-Source Gymnasium for Machine Learning Assisted Architecture Design”, accepted at ISCA 2023, we introduced ArchGym, which includes a variety of computer architecture simulators and ML algorithms. Enabled by ArchGym, our results indicate that with a sufficiently large number of samples, any of a diverse collection of ML algorithms are capable of finding the optimal set of architecture design parameters for each target problem; no one solution is necessarily better than another. These results further indicate that selecting the optimal hyperparameters for a given ML algorithm is essential for finding the optimal architecture design, but choosing them is non-trivial. We release the code and dataset across multiple computer architecture simulations and ML algorithms.

Challenges in ML-assisted architecture research

ML-assisted architecture research poses several challenges, including:

  1. For a specific ML-assisted computer architecture problem (e.g., finding an optimal solution for a DRAM controller), there is no systematic way to identify optimal ML algorithms or hyperparameters (e.g., learning rate, warm-up steps, etc.). There is a wide range of ML and heuristic methods, from random walk to reinforcement learning (RL), that can be employed for design space exploration (DSE). While these methods have shown noticeable performance improvement over their choice of baselines, it is not evident whether the improvements are because of the choice of optimization algorithms or hyperparameters.

    Thus, to ensure reproducibility and facilitate widespread adoption of ML-aided architecture DSE, it is necessary to outline a systematic benchmarking methodology.

  2. While computer architecture simulators have been the backbone of architectural innovations, there is an emerging need to address the trade-offs between accuracy, speed, and cost in architecture exploration. The accuracy and speed of performance estimation vary widely from one simulator to another, depending on the underlying modeling details (e.g., cycle-accurate vs. ML-based proxy models). While analytical or ML-based proxy models are nimble by virtue of discarding low-level details, they generally suffer from high prediction error. Also, due to commercial licensing, there can be strict limits on the number of runs collected from a simulator. Overall, these constraints exhibit distinct performance vs. sample efficiency trade-offs, affecting the choice of optimization algorithm for architecture exploration.

    It is challenging to delineate how to systematically compare the effectiveness of various ML algorithms under these constraints.

  3. Finally, the landscape of ML algorithms is rapidly evolving and some ML algorithms need data to be useful. Additionally, rendering the outcome of DSE into meaningful artifacts such as datasets is critical for drawing insights about the design space.

    In this rapidly evolving ecosystem, it is important to understand how to amortize the overhead of search algorithms for architecture exploration. It is not apparent, nor has it been systematically studied, how to leverage exploration data while remaining agnostic to the underlying search algorithm.

ArchGym design

ArchGym addresses these challenges by providing a unified framework for evaluating different ML-based search algorithms fairly. It comprises two main components: 1) the ArchGym environment and 2) the ArchGym agent. The environment is an encapsulation of the architecture cost model — which includes latency, throughput, area, energy, etc., to determine the computational cost of running the workload, given a set of architectural parameters — paired with the target workload(s). The ArchGym agent is an encapsulation of the ML algorithm used for the search and consists of hyperparameters and a guiding policy. The hyperparameters are intrinsic to the algorithm for which the model is to be optimized and can significantly influence performance. The policy, on the other hand, determines how the agent selects a parameter iteratively to optimize the target objective.

Notably, ArchGym also includes a standardized interface that connects these two components, while also saving the exploration data as the ArchGym Dataset. At its core, the interface entails three main signals: hardware state, hardware parameters, and metrics. These signals are the bare minimum to establish a meaningful communication channel between the environment and the agent. Using these signals, the agent observes the state of the hardware and suggests a set of hardware parameters to iteratively optimize a (user-defined) reward. The reward is a function of hardware performance metrics, such as performance, energy consumption, etc. 
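As a self-contained toy version of this loop, the snippet below pairs a stand-in analytic cost model (playing the environment’s role) with a random-walk agent, the simplest of the search algorithms ArchGym supports; the parameter names and cost function are invented for illustration.

import random

def env_step(params):
    """Toy cost model for a (cache_kb, prefetch_depth) design, minimized at (64, 4);
    a real ArchGym environment would wrap a simulator such as DRAMSys."""
    cache_kb, prefetch = params
    return {"latency": (cache_kb - 64) ** 2 + (prefetch - 4) ** 2 + 10}

def reward_fn(metrics):
    return -metrics["latency"]  # user-defined reward: lower latency is better

dataset = []  # plays the role of the ArchGym Dataset
best_params, best_reward = None, float("-inf")
for _ in range(1000):  # random-walk agent: sample a design, observe, keep the best
    params = (random.randint(1, 128), random.randint(1, 8))
    metrics = env_step(params)
    reward = reward_fn(metrics)
    dataset.append((params, metrics))
    if reward > best_reward:
        best_params, best_reward = params, reward

print(best_params, best_reward)  # converges near (64, 4)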

ArchGym comprises two main components: the ArchGym environment and the ArchGym agent. The ArchGym environment encapsulates the cost model and the agent is an abstraction of a policy and hyperparameters. With a standardized interface that connects these two components, ArchGym provides a unified framework for evaluating different ML-based search algorithms fairly while also saving the exploration data as the ArchGym Dataset.

ML algorithms could be equally favorable to meet user-defined target specifications

Using ArchGym, we empirically demonstrate that across different optimization objectives and DSE problems, at least one set of hyperparameters exists that results in the same hardware performance as other ML algorithms. A poorly selected (random selection) hyperparameter for the ML algorithm or its baseline can lead to a misleading conclusion that a particular family of ML algorithms is better than another. We show that with sufficient hyperparameter tuning, different search algorithms, even random walk (RW), are able to identify the best possible normalized reward. However, note that finding the right set of hyperparameters may require exhaustive search or even luck to make it competitive.

With a sufficient number of samples, there exists at least one set of hyperparameters that results in the same performance across a range of search algorithms. Here the dashed line represents the maximum normalized reward. Cloud-1, cloud-2, stream, and random indicate four different memory traces for DRAMSys (DRAM subsystem design space exploration framework).

Dataset construction and high-fidelity proxy model training

Creating a unified interface using ArchGym also enables the creation of datasets that can be used to design better data-driven ML-based proxy architecture cost models to improve the speed of architecture simulation. To evaluate the benefits of datasets in building an ML model to approximate architecture cost, we leverage ArchGym’s ability to log the data from each run from DRAMSys to create four dataset variants, each with a different number of data points. For each variant, we create two categories: (a) Diverse Dataset (DD), which represents the data collected from different agents (ACO, GA, RW, and BO), and (b) ACO only, which shows the data collected exclusively from the ACO agent, both of which are released along with ArchGym. We train a proxy model on each dataset using random forest regression with the objective to predict the latency of designs for a DRAM simulator. Our results show that:

  1. As we increase the dataset size, the average normalized root mean squared error (RMSE) slightly decreases.
  2. However, as we introduce diversity in the dataset (e.g., collecting data from different agents), we observe 9× to 42× lower RMSE across different dataset sizes.

Diverse dataset collection across different agents using ArchGym interface.
The impact of a diverse dataset and dataset size on the normalized RMSE.
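As a hedged sketch of the proxy-model training described above, the snippet below fits a random forest regressor and reports RMSE; the synthetic data stands in for logged DRAMSys runs.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((5000, 10))                                 # 10 design parameters per run
y = X @ rng.random(10) + 0.1 * rng.standard_normal(5000)   # surrogate latency target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

rmse = mean_squared_error(y_test, model.predict(X_test), squared=False)
print(f"proxy-model RMSE: {rmse:.4f}")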

The need for a community-driven ecosystem for ML-assisted architecture research

While ArchGym is an initial effort towards creating an open-source ecosystem that (1) connects a broad range of search algorithms to computer architecture simulators in a unified and easy-to-extend manner, (2) facilitates research in ML-assisted computer architecture, and (3) forms the scaffold to develop reproducible baselines, there are many open challenges that need community-wide support. Below we outline some of the open challenges in ML-assisted architecture design. Addressing these challenges requires a well-coordinated effort and a community-driven ecosystem.

Key challenges in ML-assisted architecture design.

We call this ecosystem Architecture 2.0. We outline the key challenges and a vision for building an inclusive ecosystem of interdisciplinary researchers to tackle the long-standing open problems in applying ML for computer architecture research. If you are interested in helping shape this ecosystem, please fill out the interest survey.

Conclusion

ArchGym is an open-source gymnasium for ML-assisted architecture DSE that provides a standardized interface that can be readily extended to suit different use cases. Additionally, ArchGym enables fair and reproducible comparison between different ML algorithms and helps to establish stronger baselines for computer architecture research problems.

We invite the computer architecture community as well as the ML community to actively participate in the development of ArchGym. We believe that the creation of a gymnasium-type environment for computer architecture research would be a significant step forward in the field and provide a platform for researchers to use ML to accelerate research and lead to new and innovative designs.

Acknowledgements

This blogpost is based on joint work with several co-authors at Google and Harvard University. We would like to acknowledge and highlight Srivatsan Krishnan (Harvard) who contributed several ideas to this project in collaboration with Shvetank Prakash (Harvard), Jason Jabbour (Harvard), Ikechukwu Uchendu (Harvard), Susobhan Ghosh (Harvard), Behzad Boroujerdian (Harvard), Daniel Richins (Harvard), Devashree Tripathy (Harvard), and Thierry Thambe (Harvard).  In addition, we would also like to thank James Laudon, Douglas Eck, Cliff Young, and Aleksandra Faust for their support, feedback, and motivation for this work. We would also like to thank John Guilyard for the animated figure used in this post. Amir Yazdanbakhsh is now a Research Scientist at Google DeepMind and Vijay Janapa Reddi is an Associate Professor at Harvard.

Access private repos using the @remote decorator for Amazon SageMaker training workloads

As more and more customers are looking to put machine learning (ML) workloads in production, there is a large push in organizations to shorten the development lifecycle of ML code. Many organizations prefer writing their ML code in a production-ready style in the form of Python methods and classes as opposed to an exploratory style (writing code without using methods or classes) because this helps them ship production-ready code faster.

With Amazon SageMaker, you can use the @remote decorator to run a SageMaker training job simply by annotating your Python code with an @remote decorator. The SageMaker Python SDK will automatically translate your existing workspace environment and any associated data processing code and datasets into a SageMaker training job that runs on the SageMaker training platform.

Running a Python function locally often requires several dependencies, which may not come with the local Python runtime environment. You can install them via package and dependency management tools like pip or conda.

However, organizations operating in regulated industries like banking, insurance, and healthcare operate in environments that have strict data privacy and networking controls in place. These controls often mandate having no internet access available to any of their environments. The reason for such restrictions is to have full control over egress and ingress traffic, which reduces the chances of unscrupulous actors sending or receiving non-verified information through the network. Such network isolation is also often mandated as part of audit and industry compliance rules. When it comes to ML, this restricts data scientists from downloading any package from public repositories like PyPI, Anaconda, or Conda-Forge.

To provide data scientists access to the tools of their choice while also respecting the restrictions of the environment, organizations often set up their own private package repository hosted in their own environment. You can set up private package repositories on AWS in multiple ways:

In this post, we focus on the first option: using CodeArtifact.

Solution overview

The following architecture diagram shows the solution architecture.

Solution-Architecture-vpc-no-internet

The high-level steps to implement the solution are as follows:

  • Set up a virtual private cloud (VPC) with no internet access using an AWS CloudFormation template.
  • Use a second CloudFormation template to set up CodeArtifact as a private PyPI repository and provide connectivity to the VPC, and set up an Amazon SageMaker Studio environment to use the private PyPI repository.
  • Train a classification model based on the MNIST dataset using an @remote decorator from the open-source SageMaker Python SDK. All the dependencies will be downloaded from the private PyPI repository.

Note that using SageMaker Studio in this post is optional. You can choose to work in any integrated development environment (IDE) of your choice. You just need to set up your AWS Command Line Interface (AWS CLI) credentials correctly. For more information, refer to Configure the AWS CLI.

Prerequisites

You need an AWS account with an AWS Identity and Access Management (IAM) role with permissions to manage resources created as part of the solution. For details, refer to Creating an AWS account.

Set up a VPC with no internet connection

Create a new CloudFormation stack using the vpc.yaml template. This template creates the following resources:

  • A VPC with two private subnets across two Availability Zones with no internet connectivity
  • A Gateway VPC endpoint for accessing Amazon S3
  • Interface VPC endpoints for SageMaker, CodeArtifact, and a few other services to allow the resources in the VPC to connect to AWS services via AWS PrivateLink

Provide a stack name, such as No-Internet, and complete the stack creation process.

vpc-no-internet-stack

Wait for the stack creation process to complete.

Set up a private repository and SageMaker Studio using the VPC

The next step is to deploy another CloudFormation stack using the sagemaker_studio_codeartifact.yaml template. This template sets up CodeArtifact as a private PyPI repository and a SageMaker Studio environment configured to use it.

Provide a stack name and keep the default values or adjust the parameters for the CodeArtifact domain name, private repository name, user profile name for SageMaker Studio, and name for the upstream public PyPI repository. You also need to provide the VPC stack name created in the previous step.

Studio-CodeArtifact-stack

When the stack creation is complete, the SageMaker domain should be visible on the SageMaker console.

studio-domain

To verify there is no internet connection available in SageMaker Studio, launch SageMaker Studio. Choose File, New, and Terminal to launch a terminal and try to curl any internet resource. It should fail to connect, as shown in the following screenshot.

terminal-showing-no-internet

Train an image classifier using an @remote decorator with the private PyPI repository

In this section, we use the @remote decorator to run a PyTorch training job that produces a MNIST image classification model. To achieve this, we set up a configuration file, develop the training script, and run the training code.

Set up a configuration file

We set up a config.yaml file and provide the configurations needed to do the following:

  • Run a SageMaker training job in the no-internet VPC created earlier
  • Download the required packages by connecting to the private PyPI repository created earlier

The file looks like the following code:

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: '../config/requirements.txt'
        InstanceType: 'ml.m5.xlarge'
        PreExecutionCommands:
            - 'aws codeartifact login --tool pip --domain <domain-name> --domain-owner <AWS account number> --repository <private repository name> --endpoint-url <VPC-endpoint-url-prefixed with https://>'
        RoleArn: '<execution role ARN for running training job>'
        S3RootUri: '<s3 bucket to store the job output>'
        VpcConfig:
            SecurityGroupIds: 
            - '<security group id used by SageMaker Studio>'
            Subnets: 
            - '<VPC subnet id 1>'
            - '<VPC subnet id 2>'

The Dependencies field contains the path to requirements.txt, which contains all the dependencies needed. Note that all the dependencies will be downloaded from the private repository. The requirements.txt file contains the following code:

torch
torchvision
sagemaker>=2.156.0,<3

The PreExecutionCommands section contains the command to connect to the private PyPI repository. To get the CodeArtifact VPC endpoint URL, use the following code:

import boto3

boto3_session = boto3.session.Session()
ec2 = boto3_session.client("ec2")

response = ec2.describe_vpc_endpoints(
    Filters=[
        {
            'Name': 'service-name',
            'Values': [
                f'com.amazonaws.{boto3_session.region_name}.codeartifact.api'
            ]
        },
    ]
)

code_artifact_api_vpc_endpoint = response['VpcEndpoints'][0]['DnsEntries'][0]['DnsName']

endpoint_url = f'https://{code_artifact_api_vpc_endpoint}'
endpoint_url

Generally, we get two VPC endpoints for CodeArtifact, and we can use any of them in the connection commands. For more details, refer to Use CodeArtifact from a VPC.

Additionally, configurations like execution role, output location, and VPC configurations are provided in the config file. These configurations are needed to run the SageMaker training job. To know more about all the configurations supported, refer to Configuration file.

It’s not mandatory to use the config.yaml file in order to work with the @remote decorator. This is just a cleaner way to supply all configurations to the @remote decorator. All the configs could also be supplied directly in the decorator arguments, but that reduces readability and maintainability of changes in the long run. Also, the config file can be created by an admin and shared with all the users in an environment.
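For comparison, here is a hedged sketch of supplying the same settings as decorator arguments; the values are placeholders for your own resources, and the config file approach above remains the cleaner option for shared environments.

from sagemaker.remote_function import remote

@remote(
    instance_type="ml.m5.xlarge",                # same instance type as in config.yaml
    dependencies="../config/requirements.txt",   # packages resolved from the private repo
    include_local_workdir=True,
)
def perform_train(train_data, test_data):
    # pytorch native training code........
    ...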

Develop the training script

Next, we prepare the training code in simple Python files. We have divided the code into three files:

  • load_data.py – Contains the code to download the MNIST dataset
  • model.py – Contains the code for the neural network architecture for the model
  • train.py – Contains the code for training the model by using load_data.py and model.py

In train.py, we need to decorate the main training function as follows:

@remote(include_local_workdir=True)
def perform_train(train_data,
                  test_data,
                  *,
                  batch_size: int = 64,
                  test_batch_size: int = 1000,
                  epochs: int = 3,
                  lr: float = 1.0,
                  gamma: float = 0.7,
                  no_cuda: bool = True,
                  no_mps: bool = True,
                  dry_run: bool = False,
                  seed: int = 1,
                  log_interval: int = 10,
                  ):
    # pytorch native training code........

Now we’re ready to run the training code.

Run the training code with an @remote decorator

We can run the code from a terminal or from any executable prompt. In this post, we use a SageMaker Studio notebook cell to demonstrate this:

!python ./train.py

Running the preceding command triggers the training job. In the logs, we can see that it’s downloading the packages from the private PyPI repository.

training-job-logs

This concludes the implementation of an @remote decorator working with a private repository in an environment with no internet access.

Clean up

To clean up the resources, follow the instructions in CLEANUP.md.

Conclusion

In this post, we learned how to effectively use the @remote decorator’s capabilities while working in restrictive environments without any internet access. We also learned how we can integrate CodeArtifact private repository capabilities with the help of configuration file support in SageMaker. This solution makes iterative development much simpler and faster. Another added advantage is that you can still continue to write the training code in a more natural, object-oriented way and still use SageMaker capabilities to run training jobs on a remote cluster with minimal changes in your code. All the code shown as part of this post is available in the GitHub repository.

As a next step, we encourage you to check out the @remote decorator functionality and Python SDK API and use it in your choice of environment and IDE. Additional examples are available in the amazon-sagemaker-examples repository to get you started quickly. You can also check out the post Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes for more details.


About the author

Vikesh Pandey is a Machine Learning Specialist Solutions Architect at AWS, helping customers from financial industries design and build solutions on generative AI and ML. Outside of work, Vikesh enjoys trying out different cuisines and playing outdoor sports.
