The Creative AI: NVIDIA Studio Unveils New RTX- and AI-Accelerated Tools and Systems for Creators

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

At CES, NVIDIA Studio is debuting powerful new software and hardware upgrades to elevate content creation.

This includes the release of powerful NVIDIA Studio laptops and desktops from Acer, ASUS, Dell, HP, Lenovo, MSI and Samsung, as well as the launch of the new GeForce RTX 40 SUPER Series GPUs — including the GeForce RTX 4080 SUPER, GeForce RTX 4070 Ti SUPER and GeForce RTX 4070 SUPER — to supercharge creating, gaming and AI tasks.

Generative AI by iStock from Getty Images is a new image generation tool built with NVIDIA Picasso that uses the NVIDIA Edify model architecture and is trained on licensed artwork to ensure that generated assets are commercially safe.

RTX Video HDR, coming Jan. 24, transforms standard dynamic range video playing in internet browsers into stunning high dynamic range (HDR). By pairing it with RTX Video Super Resolution, NVIDIA RTX and GeForce RTX GPU owners can achieve dramatic video quality improvements on their HDR10 displays.

Twitch, OBS and NVIDIA are enhancing livestreaming technology with the new Twitch Enhanced Broadcasting beta, powered by GeForce RTX GPUs. Available later this month, the beta will enable users to stream multiple encodes concurrently, providing optimal viewing experiences for a broad range of device types and connections.

And NVIDIA RTX Remix — a free modding platform for quickly remastering classic games with RTX — releases in open beta later this month. It provides full ray tracing, NVIDIA DLSS, NVIDIA Reflex and generative AI texture tools.

This week’s In the NVIDIA Studio installment also features NVIDIA artists Ashlee Martino-Tarr, a 3D content specialist, and Daniela Flamm Jackson, a technical product marketer, who transform 2D illustrations into dynamic 3D scenes using AI and Adobe Firefly — powered by NVIDIA in the cloud and natively with GeForce RTX GPUs.

New Year, New NVIDIA Studio Laptops

The new NVIDIA Studio laptops and desktops level up power and efficiency with exclusive software like Studio Drivers preinstalled — enhancing creative features, reducing time-consuming tasks and speeding workflows.

The Acer Predator Triton Neo 16 features several 16-inch screen options with up to a 3.2K resolution at a 165Hz refresh rate and 16:10 aspect ratio. It provides DCI-P3 100% color gamut and support for NVIDIA Optimus and NVIDIA G-SYNC technology for sharp color hues and tear-free frames. It’s expected to be released in March.

The Acer Predator Triton Neo 16, with up to the GeForce RTX 4070 Laptop GPU.

The ASUS ROG Zephyrus G14 features a Nebula Display with an OLED panel and G-SYNC support, running at 240Hz. It’s expected to release on Feb. 6.

The ASUS ROG Zephyrus G14 with up to the GeForce RTX 4070 Laptop GPU.

The XPS 16 is Dell’s most powerful laptop. It features a large 16.3-inch InfinityEdge display, available with a 4K+ OLED touch panel and true-to-life color, delivers up to 80W of sustained performance, and comes in tone-on-tone finishes for an elegant, minimalistic design. Stay tuned for an update on release timing.

Dell’s XPS 16 with up to the GeForce RTX 4070 Laptop GPU.

Lenovo’s Yoga Pro 9i sports a 16-inch 3.2K PureSight Pro display, delivering a grid of over 1,600 mini-LED dimming zones, expertly calibrated colors accurate to Delta E < 1 and up to 165Hz. With Microsoft’s Auto Color Management feature, its display toggles automatically between 100% P3, 100% sRGB and 100% Adobe RGB color to ensure the highest-quality color. It’s expected to be released in April.

Lenovo Yoga Pro 9i with up to the GeForce RTX 4070 Laptop GPU.

HP’s OMEN 14 Transcend features a 14-inch OLED WQXGA screen with micro-edge, edge-to-edge glass, 100% DCI-P3 coverage and a 240Hz refresh rate. NVIDIA DLSS 3 technology helps unlock more efficient content creation and gaming sessions using only one-third of the expected battery power. It’s targeting a Jan. 19 release.

HP’s OMEN 14 Transcend with up to the GeForce RTX 4070 Laptop GPU.

Samsung’s Galaxy Book4 Ultra includes an upgraded Dynamic AMOLED 2X display for high contrast and vivid color, as well as a convenient touchscreen. Its Vision Booster feature uses an Intelligent Outdoor Algorithm to automatically enhance visibility and color reproduction in bright conditions.

Samsung’s Galaxy Book4 Ultra with up to the GeForce RTX 4070 Laptop GPU.

Check back for more information on the new line of Studio systems, including updates to release dates.

A SUPER Debut for New GeForce RTX 40 Series Graphics Cards

The GeForce RTX 40 Series has been supercharged with the new GeForce RTX 4080 SUPER, GeForce RTX 4070 Ti SUPER and GeForce RTX 4070 SUPER graphics cards. This trio is faster than its predecessors, with RTX platform superpowers that enhance creating, gaming and AI tasks.

The GeForce RTX 4080 SUPER sports more CUDA cores than the GeForce RTX 4080 and includes the world’s fastest GDDR6X video memory at 23 Gbps. In 3D apps like Blender, it can run up to 70% faster than previous generations. In generative AI apps like Stable Diffusion XL or Stable Video Diffusion, it can produce 1,024×1,024 images 1.7x faster and video 1.5x faster. Or play fully ray-traced games, including Alan Wake 2, Cyberpunk 2077: Phantom Liberty and Portal with RTX, in stunning 4K. The RTX 4080 SUPER will be available Jan. 31 as a Founders Edition and as custom boards for partners starting at $999.

The GeForce RTX 4070 Ti SUPER is equipped with more CUDA cores than the RTX 4070, a frame buffer increased to 16GB, and a 256-bit bus. It’s suited for video editing and rendering large 3D scenes and runs up to 1.6x faster than the RTX 3070 Ti and 2.5x faster with DLSS 3 in the most graphics-intensive games. Gamers can max out high-refresh 1440p panels or even game at 4K. The RTX 4070 Ti SUPER will be available Jan. 24 from custom board partners in stock-clocked and factory-overclocked configurations starting at $799.

The GeForce RTX 4070 SUPER has 20% more CUDA cores than the GeForce RTX 4070 and is great for 1440p creating. With DLSS 3, it’s 1.5x faster than a GeForce RTX 3090 while using a fraction of the power.

Read more in the GeForce article.

Creative Vision Meets Reality With Getty Images and NVIDIA

Content creators using the new Generative AI by iStock from Getty Images tool powered by NVIDIA Picasso can now safely, affordably use AI-generated images with full protection.

Generative AI by iStock is trained on Getty Images’ vast creative library of high-quality licensed content, including millions of exclusive photos, illustrations and videos. Users can enter prompts to generate photo-quality images at up to 4K for social media promotion, digital advertisements and more.

Getty Images is also making advanced inpainting and outpainting features available via application programming interfaces. Developers can seamlessly integrate the new APIs with creative applications to add people and objects to images, replace specific elements and expand images to a wide range of aspect ratios.

Customers can use Generative AI by iStock online today. Advanced editing features are coming soon to the iStock website.

RTX Video HDR Brings AI Video Upgrades

RTX Video HDR is a new AI-enhanced feature that instantly converts any standard dynamic range video playing in internet browsers into vibrant HDR.

HDR delivers stunning video quality, but it is not widely available because of the production effort and hardware limitations involved.

RTX Video HDR allows NVIDIA RTX and GeForce RTX GPU owners to maximize their HDR panel’s ability to display more vivid, dynamic colors, helping preserve intricate details that may be lost in standard dynamic range.

The feature requires an HDR10-compatible display or TV connected to an RTX-powered PC and works with Chromium-based browsers such as Google Chrome or Microsoft Edge.

RTX Video HDR and RTX Video Super Resolution can be used together to produce the clearest livestreamed video.

RTX Video HDR is coming to all NVIDIA RTX and GeForce RTX GPUs as part of a driver update later this month. Once the update is installed, navigate to the NVIDIA Control Panel and switch it on.

Enhanced Broadcasting Beta Enables Multi-Encode Livestreaming

With Twitch Enhanced Broadcasting beta, GeForce RTX GPU owners will be able to broadcast up to three resolutions simultaneously at up to 1080p. In the coming months, Twitch plans to roll out support for up to five concurrent encodes to further optimize viewer experiences.

As part of the beta, Twitch will test higher input bit rates as well as new codecs, which are expected to further improve visual quality. The new codecs include the latest-generation AV1 for GeForce RTX 40 Series GPUs, which provides 40% more encoding efficiency than H.264, and HEVC for previous-generation GeForce GPUs.

To simplify the setup process, Enhanced Broadcasting will automatically configure all Open Broadcaster Software (OBS) encoder settings, including resolution, bit rate and encoding parameters.

Sign up for the Twitch Enhanced Broadcasting beta today.

A Righteous RTX Remix

Built on NVIDIA Omniverse, RTX Remix allows modders to easily capture game assets, automatically enhance materials with generative AI tools, reimagine assets via Omniverse-connected apps and Universal Scene Description (OpenUSD), and quickly create stunning RTX remasters of classic games with full ray tracing and NVIDIA DLSS technology.

The RTX Remix open beta releases later this month.

RTX Remix has already delivered stunning remasters in Portal with RTX and the modder-made Portal: Prelude RTX. Now, Orbifold Studios is using RTX Remix to develop Half-Life 2 RTX: An RTX Remix Project, a community remaster of one of the highest-rated games of all time. Check out the new Half-Life 2 RTX gameplay trailer, showcasing Orbifold Studios’ latest updates to Ravenholm:

AI and RTX Bring Illustrations to Life

NVIDIA artists Ashlee Martino-Tarr and Daniela Flamm Jackson, featured in this week’s In the NVIDIA Studio installment, are passionate about illustration, whether at work or at play.

They used Adobe Firefly’s generative AI features, powered by NVIDIA GPUs in the cloud and accelerated with Tensor Cores in GeForce RTX GPUs, to animate a 2D illustration with special effects.

To begin, the pair separated the 2D image into multiple layers and expanded the canvas. Firefly’s Generative Expand feature automatically filled the added space with AI-generated content.

 

Next, the team separated select elements — starting with the character — and used the AI Object Select feature to automatically mask the layer. The Generative Fill feature then created new content to fill in the background, saving even more time.

 

This process continued until all distinct layers were separated and imported into Adobe After Effects. Next, they used the Mercury 3D Engine on local RTX GPUs to accelerate playback, unlocking smoother movement in the viewport. Previews and adjustments like camera shake and depth of field were also GPU-accelerated.

 

Firefly’s Style Match feature then took the existing illustration and created new imagery in its likeness — in this case, a vibrant butterfly sporting similar colors and tones. The duo also used Adobe Illustrator’s Generative Recolor feature, which enables artists to explore a wide variety of colors and themes without having to manually recolor their work.

 

Martino-Tarr and Jackson then chose their preferred assets and animated them in Adobe After Effects. Firefly’s powerful AI effects helped speed up or entirely eliminate tedious tasks such as patching holes, handpainting set extensions and caching animation playbacks.

A variety of high-quality images to choose from.

The artists concluded post-production work by putting the finishing touches on their AI animation in After Effects.

 

Firefly’s powerful AI capabilities were developed with the creative community in mind — guided by AI ethics principles of content and data transparency — to ensure morally responsible output. NVIDIA technology continues to power these features from the cloud for photographers, illustrators, designers, video editors, 3D artists and more.

NVIDIA artists Ashlee Martino-Tarr and Daniela Flamm Jackson.

Check out Martino-Tarr’s portfolio on ArtStation and Jackson’s on IMDb.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Twitch, OBS and NVIDIA to Release Multi-Encode Livestreaming

Twitch, OBS and NVIDIA are leveling up livestreaming technology with the new Twitch Enhanced Broadcasting beta, powered by GeForce RTX GPUs. Available in a few days, the beta will let streamers broadcast multiple encodes concurrently, providing optimal viewing experiences for all viewers.

Twitch Enhanced Broadcasting

Today, many streamers must choose between higher resolution and reliable streaming. High-quality video provides more enjoyable viewing experiences but causes streams to buffer for viewers with low bandwidth or older viewing devices. Streaming lower-bitrate video allows more people to watch the content seamlessly, but introduces artifacts.

Twitch — the interactive livestreaming platform — provides server-side transcoding for top-performing channels, meaning it will create different versions of the same stream for different bandwidth levels, improving the viewing experience. But the audiences of many channels are left with a single stream option.

Twitch, OBS and NVIDIA have collaborated on a new feature to address this — Twitch Enhanced Broadcasting, releasing in beta later this month. Using the high-quality dedicated encoder (NVENC) in modern GeForce RTX and GTX GPUs, streamers will be able to broadcast up to three resolutions simultaneously at up to 1080p.

In the coming months, Enhanced Broadcasting beta testers will be able to experiment with higher-input bit rates, up to 4K resolutions, up to 5 concurrent streams, as well as new codecs. The new codecs include the latest-generation AV1 for GeForce RTX 40 Series GPUs, which provides 40% more encoding efficiency than H.264, and HEVC for previous-generation GeForce GPUs.

To simplify setup, Enhanced Broadcasting will automatically configure all OBS encoder settings, including resolution, bit rate and encoding parameters. A server-side algorithm will return the best possible configuration for OBS Studio based on the streamer’s setup, taking the headaches out of tuning settings for the best viewer experiences.

Using the dedicated NVENC hardware encoder, streamers can achieve the highest quality video across streaming bitrates, with minimal impact to app and game performance.

Sign up for the Twitch Enhanced Broadcasting beta today at twitch.tv/broadcast. Twitch will enroll participants on a first-come, first-served basis, starting later this month. Once a creator has been enrolled in the beta, they’ll receive an email with additional instructions.

To further elevate livestreams, download the NVIDIA Broadcast app, free for RTX GPU owners and powered by dedicated AI Tensor Cores, to augment broadcast capabilities for microphones and cameras.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Picture This: Getty Images Releases Generative AI By iStock Powered by NVIDIA Picasso

Getty Images, a global visual content creator and marketplace, today at CES released Generative AI by iStock, an affordable and commercially safe image generation service trained on the company’s creative library of licensed, proprietary data.

Built on NVIDIA Picasso, a foundry for custom AI models, Generative AI by iStock provides designers and businesses with a text-to-image generation tool to create ready-to-license visuals, with legal protection and usage rights for generated images included.

Alongside the release of the service on the iStock website, Getty Images is also making advanced inpainting and outpainting features available via application programming interfaces, launching on iStock.com and Gettyimages.com soon. Developers can seamlessly integrate the new APIs with creative applications to add people and objects to images, replace specific elements and expand images in a wide range of aspect ratios.

Create With Im-AI-gination

Generative AI by iStock is trained with NVIDIA Picasso on Getty Images’ vast creative library — including exclusive photos, illustrations and videos — providing users with a commercially safe way to generate visuals. Users can enter simple text prompts to generate photo-quality images at up to 4K resolution.

Generative AI by iStock Powered by Picasso Editing APIs
Inpainting and outpainting APIs, with a Replace feature coming soon.

New editing APIs give customers powerful control over their generated images.

The Inpainting feature allows users to mask a region of an image, then fill in the region with a person or object described via a text prompt.

Outpainting enables users to expand images to fit various aspect ratios, filling in new areas based on the context of the original image. This is a powerful tool to create assets with unique aspect ratios for advertising or social media promotion.

And coming soon, a Replace feature provides similar capabilities to Inpainting but with stricter adherence to the mask.
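This post doesn’t show the actual request format for these APIs. As a purely hypothetical sketch of how an inpainting-style call might look from Python, the endpoint URL, field names and authentication scheme below are invented placeholders, not Getty Images’ real API:

```python
import base64
import requests

# Hypothetical endpoint and field names -- placeholders only, not the real Getty Images API.
API_URL = "https://api.example.com/v1/generative/inpaint"
API_KEY = "YOUR_API_KEY"  # assumption: some form of token-based authentication

with open("product_shot.png", "rb") as image, open("mask.png", "rb") as mask:
    payload = {
        # The region to regenerate is described by a mask plus a text prompt.
        "image": base64.b64encode(image.read()).decode(),
        "mask": base64.b64encode(mask.read()).decode(),
        "prompt": "a person holding the product, studio lighting",
    }

response = requests.post(
    API_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=120,
)
response.raise_for_status()

# Assumed response shape: a list of generated images returned as base64 strings.
for i, item in enumerate(response.json().get("images", [])):
    with open(f"inpainted_{i}.png", "wb") as out:
        out.write(base64.b64decode(item))
```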

Transforming Visual Design

The NVIDIA Picasso foundry enables developers and service providers to seamlessly train, fine-tune, optimize and deploy generative AI models tailored to their visual design requirements. Developers can use their own AI models or train new ones using the NVIDIA Edify model architecture to generate images, videos, 3D assets, 360-degree high-dynamic-range imaging and physically based rendering materials from simple text prompts.

Using NVIDIA Picasso, Getty Images trained a bespoke Edify image generator based on its catalog of licensed images and videos to power the Generative AI by iStock service.

Customers can use Generative AI by iStock online today. Advanced editing features are now available via APIs and coming soon to the iStock website.

NVIDIA Omniverse Adopted by Global Automotive-Configurator Developer Ecosystem

Whether building a super-capable truck or conjuring up a dream sports car, it’s easy to spend hours playing with online car configurators.

With auto industry insiders predicting that most new vehicle purchases will move online by 2030, these configurators are more than just toys.

They’re crucial to the future of the world’s automakers — essential in showing off what their brand is all about, boosting average selling prices and helping customers select and personalize their vehicles.

Configurators are also a natural use case for the sophisticated simulation capabilities of NVIDIA Omniverse, a software platform for developing and deploying advanced 3D applications and pipelines based on OpenUSD. The platform makes it possible to instantly visualize changes to a car’s color or customize its interior with luxurious finishes.

Studies show that 80% of shoppers are drawn to brands that give them a personal touch while shopping.

Aiming to meet these customer demands, a burgeoning ecosystem of partners and customers is putting elements of Omniverse to work.

Key creative partners and developers like BITONE, Brickland, Configit, Katana Studio Ltd. (serving Craft Detroit), WPP and ZeroLight are pioneering Omniverse-powered configurators. And leading automakers such as Lotus are adopting these advanced solutions.

That’s because traditional auto configurators, often limited by pre-rendered images, struggle to deliver personalization and dynamic environment representation.

They use different kinds of data in various tools, such as static images of what users see on the website, lists of available options based on location, product codes and personal information.

These challenges extend from the consumer experience — often characterized by limited interactivity and realism — to back-end processes for original equipment manufacturers (OEMs) and agencies, where inflexibility and inefficiencies in updating configurators and repurposing assets are common.

Reconfiguring Configurators With NVIDIA Omniverse

Omniverse helps software developers and service providers streamline their work.

Service providers can now access the platform to craft state-of-the-art 3D experiences and showcase lifelike graphics and high-end, immersive environments with advanced lighting and textures.

And OEMs can benefit from a unified asset pipeline that simplifies the integration of design and engineering data for marketing purposes. Omniverse’s enhanced tools also allow them to quickly produce diverse marketing materials, boosting customer engagement through customized content.

Independent software vendors, or ISVs, can use the native OpenUSD platform as a foundation for creating scene construction tools — or to help develop tools for managing configuration variants.

With the NVIDIA Graphics Delivery Network (GDN) software development kit, high-quality, real-time NVIDIA RTX viewports can be embedded into web applications, ensuring seamless operation on nearly any device.

This, along with support for large-scale scenes and physically accurate graphics, allows developers to concentrate on enhancing the user experience without compromising quality on lower-spec machines.

Omniverse Cloud taps GDN, which uses NVIDIA’s global cloud-streaming infrastructure to deliver seamless access to high-fidelity 3D interactive experiences.

Configurators, when run on GDN, can be easily published at scale using the same GPU architecture on which they were developed and streamed to nearly any device.

All this means less redundancy in data prep, aggregated and accessible data, fewer manual pipeline updates and instant access for the entire intended audience.

Global Adoption by Innovators and Industry Leaders

Omniverse is powering a new era in automotive design and customer interaction, heralded by a vibrant ecosystem of partners and customers.

Lotus is at the forefront, developing an interactive dealership user experience using Omniverse and generative AI tools including NVIDIA Avatar Cloud Engine and NVIDIA Omniverse Audio2Face.

To dive deeper into the world of advanced car configurators, read more on Omniverse and GDN.

Three’s a Cloud: New Activision and Blizzard Games, Day Passes, G-SYNC Technology Coming to GeForce NOW

NVIDIA is bringing more games, membership options and innovative tech to its GeForce NOW cloud gaming service.

The next Activision and Blizzard titles to join the cloud, Diablo IV and Overwatch 2, will be coming soon. They’ll be joined by a host of top titles, including Capcom’s Exoprimal, HoYoverse’s Honkai: Star Rail and Mainframe Industries’ Pax Dei.

Available starting in February, new day passes for Ultimate and Priority memberships will offer full premium benefits one day at a time.

NVIDIA is also bringing G-SYNC technology to the cloud, raising cloud streaming performance while lowering latency and minimizing stuttering for the smoothest gameplay. Paired with new 60 and 120 fps streaming options for GFN Reflex mode, it makes cloud gaming experiences nearly indistinguishable from local ones.

Plus, mobile gamers are getting a boost to 1440p resolution on Android phones. And Japan is the newest region to be operated by NVIDIA, which will soon enable gamers across the country to play their favorite PC games in the cloud with Ultimate performance.

Here Come the Games

The GeForce NOW catalog features many of the most popular PC games — over 1,800 titles from Steam, Xbox and supported PC Game Pass titles, Epic Games Store, Ubisoft, GOG.com and other digital stores. Backed by up to GeForce RTX 4080 GPU-class graphics, GeForce NOW is bringing even more top titles to the cloud from celebrated publishers.

The latest games from top developer Blizzard Entertainment — Diablo IV and Overwatch 2 — are coming soon to GeForce NOW. They join the recent release of Call of Duty, the first Activision game in the cloud, as part of a 10-year NVIDIA and Microsoft partnership.

Diablo IV coming soon to GeForce NOW
Join the fight for Sanctuary.

Fight the forces of hell while discovering countless abilities to master, legendary loot to gather and nightmarish dungeons full of evil enemies to vanquish in Diablo IV. Experience the campaign solo or with friends in a shared open world as the dark, gripping story unfolds.

Overwatch 2 coming soon to GeForce NOW
Team up and answer the call of heroes in “Overwatch 2.”

Team up and answer the call of heroes in Overwatch 2, a free-to-play shooter featuring 30+ epic heroes, each with game-changing abilities. Join the battle across dozens of futuristic maps inspired by real-world locations and master unique game modes in the always-on, ever-evolving, live game.

Members will soon be able to stream the Steam versions of Diablo IV and Overwatch 2 on nearly any device with the power of a GeForce RTX 4080 rig in the cloud, with support for the Battle.net launcher to follow.

Honkai Star Rail coming soon to GeForce NOW
The Astral Express is coming to GeForce NOW.

GeForce NOW also brings top role-playing games to the cloud. The immensely popular Honkai: Star Rail from HoYoverse will soon join Genshin Impact in the cloud. The space-fantasy RPG is set in a diverse universe filled with wonder, adventure and thrills, and expands the library of hit free-to-play titles for members. Plus, members can experience all the latest updates without worrying about download times.

Dinosaurs? Oh my.

Top publisher Capcom is working with NVIDIA to bring more of its hit titles to the cloud, including Exoprimal, an online, team-based action game that pits humanity’s cutting-edge exosuit technology against history’s most ferocious beasts: dinosaurs. Look forward to seeing it in the cloud on Jan. 18.

Ghosts do exist!

Mainframe Industries’ Pax Dei is a highly anticipated social sandbox massively multiplayer online game inspired by legends of the medieval era. It’s planned to release on GeForce NOW when it launches for PC.

Get ready to play these titles and more at high performance coming soon. Ultimate members will be able to stream at up to 4K resolution and 120 frames per second with support for NVIDIA DLSS and Reflex technology, and experience the action even on low-powered devices. Keep an eye out on GFN Thursdays for the latest on their release dates in the cloud.

Don’t Pass This Up

Day Passes, available in early February, will give gamers a fast pass to try out premium membership benefits before committing to one- or six-month memberships that offer better value. The passes provide access to all the same features as Priority and Ultimate members for 24 hours.

Day Pass users can experience RTX ON for supported games with Priority and Ultimate Day Passes. And Ultimate Day Pass users gain exclusive access to innovative technologies like NVIDIA DLSS 3.5, full ray tracing and NVIDIA Reflex.

CES 2024 GeForce NOW
Pssst, pass it on.

These new membership options let gamers freely choose when to tap into the cloud.

The Ultimate Day Pass will be available for $7.99 and the Priority Day Pass for $3.99. The 24 hours of continuous play will begin at purchase. Day Passes can be combined for continued access to GeForce NOW high-performance cloud streaming.

Let That Sync In

NVIDIA continues to push the boundaries for cloud gaming. The Ultimate membership tier introduced many cloud gaming firsts, from 240 fps to ultra-wide streaming, making gameplay with GeForce NOW — streaming from GeForce RTX 4080-powered servers — nearly identical to a local gaming experience.

Cloud GSYNC coming to GeForce NOW
Get in sync.

Coming soon, cloud G-SYNC technology will raise the bar even further, minimizing stutter and latency with support for variable refresh rate displays, fully optimized for G-SYNC-compatible monitors. With Cloud G-SYNC enabled, GeForce NOW will vary the display’s refresh rate to match the streaming rate, for the smoothest gameplay experience available from the cloud.

Ultimate members can also soon take advantage of expanded NVIDIA Reflex support. Building on 240 fps 1080p streaming from last year, Ultimate members will soon be able to use Reflex in supported titles at up to 4K resolution in 60 or 120 fps streaming modes, for low-latency gaming on nearly any device. NVIDIA Reflex support is available in the top PC games on GeForce NOW, including Call of Duty: Modern Warfare III, Cyberpunk 2077, Diablo IV, Overwatch 2, The Witcher 3: Wild Hunt, Alan Wake 2 and more.

With both Cloud G-SYNC and Reflex, members will feel as if they’re connected directly to GeForce NOW’s RTX 4080 SuperPODs, making their visual experiences smoother, clearer and more immersive than ever.

Mobile Phones Are Now PC Gaming Rigs

Mobile gamers will soon have the option to set streaming resolution to 1440p on Android devices, providing richer graphics on larger screens. Members will be able to turn an Android device into a portable gaming rig with support for quad-high-definition resolution (2,560 x 1,440 pixels), as well as improved keyboard and mouse support.

This offers a glimpse into the future of game streaming, with external displays connected to a mobile device. Using a USB-C docking station, gamers can connect an Android phone to a 1080p or 1440p gaming monitor or TV, with a keyboard and mouse or gamepad.

Paired with a GeForce NOW Ultimate membership, Android phones become portable gaming rigs on which to play the latest triple-A PC games, such as Baldur’s Gate 3, The Finals, and Monster Hunter: World. Now anything, even a phone, can be a high-performance gaming rig.

Up to 1440p for Android devices on GeForce NOW
GeForce NOW improves on-the-go streaming, one device at a time.

The above was on display this week at the CES trade show. The demo streamed Cyberpunk 2077 and Alan Wake 2 from GeForce NOW servers in Los Angeles to a Samsung Galaxy S23 Ultra phone connected to a 1440p monitor in Las Vegas.

Clouds in Japan

GeForce NOW Expansion
The cloud’s drifting into Japan.

NVIDIA will begin operating GeForce NOW in Japan in the spring, working alongside GeForce NOW Alliance partner KDDI.

Gamers in the region can look forward to Ultimate memberships for the first time, along with all the new games and advancements announced at CES. Visit the page to learn more and sign up for notifications.

With a steady drumbeat of quality games from top publishers, new membership options and the latest NVIDIA technology in the cloud, GeForce NOW is poised to bring another ultimate year of gaming to members.

Following the Prompts: Generative AI Powers Smarter Robots With NVIDIA Isaac Platform

Generative AI is reshaping trillion-dollar industries, and NVIDIA, a front-runner in smart robotics, is seizing the moment.

Speaking today as part of a special address ahead of CES, NVIDIA Vice President of Robotics and Edge Computing Deepu Talla detailed how NVIDIA and its partners are bringing generative AI and robotics together.

It’s a natural fit, with a growing roster of partners — including Boston Dynamics, Collaborative Robotics, Covariant, Sanctuary AI, Unitree Robotics and others — embracing GPU-accelerated large language models to bring unprecedented levels of intelligence and adaptability to machines of all kinds.

The timing couldn’t be better.

“Autonomous robots powered by artificial intelligence are being increasingly utilized for improving efficiency, decreasing costs and tackling labor shortages,” Talla said.

Present at the Creation

NVIDIA has been central to the generative AI revolution from the beginning.

A decade ago, NVIDIA founder and CEO Jensen Huang hand-delivered the first NVIDIA DGX AI supercomputer to OpenAI. Now, thanks to OpenAI’s ChatGPT, generative AI has become one of the fastest-growing technologies of our time.

And it’s just getting started.

The impact of generative AI will go beyond text and image generation — and into homes and offices, farms and factories, hospitals and laboratories, Talla predicted.

The key: LLMs, akin to the brain’s language center, will let robots understand and respond to human instructions more naturally.

Such machines will be able to learn continuously from humans, from each other and from the world around them.

“Given these attributes, generative AI is well-suited for robotics,” Talla said.

How Robots Are Using Generative AI

Agility Robotics, NTT, and others are incorporating generative AI into their robots to help them understand text or voice commands. Robot vacuum cleaners from Dreame Technology are being trained in simulated living spaces created by generative AI models. And Electric Sheep is developing a world model for autonomous lawn mowing.

NVIDIA technologies such as the NVIDIA Isaac and Jetson platforms, which facilitate the development and deployment of AI-powered robots, are already relied on by more than 1.2 million developers and 10,000 customers and partners.

Many of them are at CES this week, including Analog Devices, Aurora Labs, Canonical, Dreame Innovation Technology, DriveU, e-con Systems, Ecotron, Enchanted Tools, GlüxKind, Hesai Technology, Leopard Imaging, Segway-Ninebot (Willand (Beijing) Technology Co., Ltd.), Nodar, Orbbec, QT Group, Robosense, Spartan Radar, TDK Corporation, Telit, Unitree Robotics, Voyant Photonics and ZVISION Technologies Co., Ltd.

Two Brains Are Better Than One

In his talk at CES, Talla showed the dual-computer model (below) essential for deploying AI in robotics, demonstrating NVIDIA’s comprehensive approach to AI development and application.


The first computer, referred to as an “AI factory,” is central to the creation and continuous improvement of AI models.

AI factories use NVIDIA’s data center compute infrastructure along with its AI and NVIDIA Omniverse platforms for the simulation and training of AI models.

The second computer represents the runtime environment of the robot.

This varies depending on the application: It could be in the cloud or a data center; in an on-premises server for tasks like defect inspection in semiconductor manufacturing; or within an autonomous machine equipped with multiple sensors and cameras.

Generating Quality Assets and Scenes

Talla also highlighted the role of LLMs in breaking down technical barriers, turning typical users into technical artists capable of creating complex robotics workcells or entire warehouse simulations.

With generative AI tools like NVIDIA Picasso, users can generate realistic 3D assets from simple text prompts and add them to digital scenes for dynamic and comprehensive robot training environments.

The same capability extends to creating diverse and physically accurate scenarios in Omniverse, enhancing the testing and training of robots to ensure real-world applicability.

This dovetails with the transformative potential of generative AI in reconfiguring the deployment of robots.

Traditionally, robots are purpose-built for specific tasks, and modifying them for different ones is a time-consuming process.

But advancements in LLMs and vision language models are eliminating this bottleneck, enabling more intuitive interactions with robots through natural language, Talla explained.

Such machines — adaptable and aware of the environment around them — will soon spill out across the world.

To learn more, attend a virtual CES session and watch Talla’s full talk below.

Modernizing data science lifecycle management with AWS and Wipro

This post was written in collaboration with Bhajandeep Singh and Ajay Vishwakarma from Wipro’s AWS AI/ML Practice.

Many organizations have been using a combination of on-premises and open source data science solutions to create and manage machine learning (ML) models.

Data science and DevOps teams may face challenges managing these isolated tool stacks and systems. Integrating multiple tool stacks to build a compact solution might involve building custom connectors or workflows. Managing different dependencies based on the current version of each stack and maintaining those dependencies with the release of new updates of each stack complicates the solution. This increases the cost of infrastructure maintenance and hampers productivity.

Artificial intelligence (AI) and machine learning (ML) offerings from Amazon Web Services (AWS), along with integrated monitoring and notification services, help organizations achieve the required level of automation, scalability, and model quality at optimal cost. AWS also helps data science and DevOps teams to collaborate and streamlines the overall model lifecycle process.

The AWS portfolio of ML services includes a robust set of services that you can use to accelerate the development, training, and deployment of machine learning applications. The suite of services can be used to support the complete model lifecycle including monitoring and retraining ML models.

In this post, we discuss model development and MLOps framework implementation for one of Wipro’s customers that uses Amazon SageMaker and other AWS services.

Wipro is an AWS Premier Tier Services Partner and Managed Service Provider (MSP). Its AI/ML solutions drive enhanced operational efficiency, productivity, and customer experience for many of their enterprise clients.

Current challenges

Let’s first understand a few of the challenges the customer’s data science and DevOps teams faced with their current setup. We can then examine how the integrated SageMaker AI/ML offerings helped solve those challenges.

  • Collaboration – Data scientists each worked on their own local Jupyter notebooks to create and train ML models. They lacked an effective method for sharing and collaborating with other data scientists.
  • Scalability – Training and re-training ML models was taking more and more time as models became more complex while the allocated infrastructure capacity remained static.
  • MLOps – Model monitoring and ongoing governance wasn’t tightly integrated and automated with the ML models. There were also dependencies and complexities in integrating third-party tools into the MLOps pipeline.
  • Reusability – Without reusable MLOps frameworks, each model must be developed and governed separately, which adds to the overall effort and delays model operationalization.

This diagram summarizes the challenges and how Wipro’s implementation on SageMaker addressed them with built-in SageMaker services and offerings.

SageMaker offerings for ML workload migration

Figure 1 – SageMaker offerings for ML workload migration

Wipro defined an architecture that addresses the challenges in a cost-optimized and fully automated way.

The following is the use case and model used to build the solution:

  • Use case: Price prediction based on the used car dataset
  • Problem type: Regression
  • Models used: XGBoost and Linear Learner (SageMaker built-in algorithms)

Solution architecture

Wipro consultants conducted a deep-dive discovery workshop with the customer’s data science, DevOps, and data engineering teams to understand the current environment as well as their requirements and expectations for a modern solution on AWS. By the end of the consulting engagement, the team had implemented the following architecture that effectively addressed the core requirements of the customer team, including:

Code Sharing – SageMaker notebooks enable data scientists to experiment and share code with other team members. Wipro further accelerated their ML model journey by implementing Wipro’s code accelerators and snippets to expedite feature engineering, model training, model deployment, and pipeline creation.

Continuous integration and continuous delivery (CI/CD) pipeline – Using the customer’s GitHub repository enabled code versioning and automated scripts to launch pipeline deployment whenever new versions of the code are committed.

MLOps – The architecture implements a SageMaker model monitoring pipeline for continuous model quality governance by validating data and model drift as required by the defined schedule. Whenever drift is detected, an event is launched to notify the respective teams to take action or initiate model retraining.

Event-driven architecture – The pipelines for model training, model deployment, and model monitoring are well integrated using Amazon EventBridge, a serverless event bus. When defined events occur, EventBridge can invoke a pipeline to run in response. This provides a loosely coupled set of pipelines that can run as needed in response to the environment.
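As a rough sketch of this wiring (not the customer’s actual configuration; the bucket, rule and state machine names are made-up placeholders), an EventBridge rule can match S3 object-created events for a prefix and target a Step Functions state machine:

```python
import json
import boto3

events = boto3.client("events")

# Match S3 "Object Created" events for the training-data prefix.
# Requires EventBridge notifications to be enabled on the bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["example-ml-data-bucket"]},
        "object": {"key": [{"prefix": "training/input/"}]},
    },
}

events.put_rule(
    Name="start-training-pipeline-on-data-push",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

# Invoke the training state machine whenever the rule matches.
events.put_targets(
    Rule="start-training-pipeline-on-data-push",
    Targets=[
        {
            "Id": "training-pipeline",
            "Arn": "arn:aws:states:us-east-1:111122223333:stateMachine:training-pipeline",
            "RoleArn": "arn:aws:iam::111122223333:role/EventBridgeInvokeStepFunctions",
        }
    ],
)
```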

Event Driven MLOps architecture with SageMaker

Figure 2 – Event Driven MLOps architecture with SageMaker

Solution components

This section describes the various solution components of the architecture.

Experiment notebooks

  • Purpose: The customer’s data science team wanted to experiment with various datasets and multiple models to come up with the optimal features, using those as further inputs to the automated pipeline.
  • Solution: Wipro created SageMaker experiment notebooks with code snippets for each reusable step, such as reading and writing data, model feature engineering, model training, and hyperparameter tuning. Feature engineering tasks can also be prepared in Data Wrangler, but the client specifically asked for SageMaker processing jobs and AWS Step Functions because they were more comfortable using those technologies. We used the AWS Step Functions Data Science SDK to create a step function—for flow testing—directly from the notebook instance to enable well-defined inputs for the pipelines (see the sketch after this list). This has helped the data science team to create and test pipelines at a much faster pace.
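For illustration only, a minimal notebook-driven workflow built with the Step Functions Data Science SDK might look like the following; the estimator configuration, IAM roles and S3 paths are placeholder assumptions, and the customer’s actual pipeline contains many more steps (processing, baselining and endpoint management):

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from stepfunctions.steps import Chain, TrainingStep
from stepfunctions.workflow import Workflow

session = sagemaker.Session()
region = session.boto_region_name

# Built-in XGBoost container; hyperparameters, roles and paths are placeholders.
xgb = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", region, "1.7-1"),
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-ml-bucket/models/",
    hyperparameters={"objective": "reg:squarederror", "num_round": 100},
)

train_step = TrainingStep(
    "Train used-car price model",
    estimator=xgb,
    data={
        "train": TrainingInput("s3://example-ml-bucket/train/", content_type="text/csv"),
        "validation": TrainingInput("s3://example-ml-bucket/validation/", content_type="text/csv"),
    },
    # In practice, pass a unique job name per execution (for example, via ExecutionInput).
    job_name="used-car-price-training",
)

workflow = Workflow(
    name="used-car-price-training-pipeline",
    definition=Chain([train_step]),
    role="arn:aws:iam::111122223333:role/StepFunctionsWorkflowRole",
)
workflow.create()   # provisions the state machine
workflow.execute()  # kicks off a test run directly from the notebook
```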

Automated training pipeline

  • Purpose: To enable an automated training and re-training pipeline with configurable parameters such as instance type, hyperparameters, and an Amazon Simple Storage Service (Amazon S3) bucket location. The pipeline should also be launched by the data push event to S3.
  • Solution: Wipro implemented a reusable training pipeline using the Step Functions SDK, SageMaker processing and training jobs, a SageMaker model monitor container for baseline generation, AWS Lambda, and EventBridge services. Using AWS event-driven architecture, the pipeline is configured to launch automatically when a new data event is pushed to the mapped S3 bucket. Notifications are configured to be sent to the defined email addresses. At a high level, the training flow looks like the following diagram:
Training pipeline step machine

Figure 3 – Training pipeline step machine.

Flow description for the automated training pipeline

The diagram above shows an automated training pipeline built using Step Functions, Lambda, and SageMaker. It’s a reusable pipeline for setting up automated model training, generating predictions, creating a baseline for model monitoring and data monitoring, and creating or updating an endpoint based on the previous model’s threshold value.

  1. Pre-processing: This step takes data from an Amazon S3 location as input and uses the SageMaker SKLearn container to perform necessary feature engineering and data pre-processing tasks, such as the train, test, and validate split.
  2. Model training: Using the SageMaker SDK, this step runs the training code with the respective model image on the datasets produced by the pre-processing step and generates the trained model artifacts.
  3. Save model: This step creates a model from the trained model artifacts. The model name is stored for reference in another pipeline using the AWS Systems Manager Parameter Store.
  4. Query training results: This step calls a Lambda function to fetch the metrics of the completed training job from the earlier model training step (a hedged sketch of such a function follows this list).
  5. RMSE threshold: This step verifies the trained model metric (RMSE) against a defined threshold to decide whether to proceed towards endpoint deployment or reject this model.
  6. Model accuracy too low: At this step, the model accuracy is checked against the previous best model. If the model fails metric validation, a Lambda function sends a notification to the target topic registered in Amazon Simple Notification Service (Amazon SNS), and the flow exits because the new trained model didn’t meet the defined threshold.
  7. Baseline job data drift: If the trained model passes the validation steps, baseline stats are generated for this trained model version to enable monitoring and the parallel branch steps are run to generate the baseline for the model quality check.
  8. Create model endpoint configuration: This step creates an endpoint configuration for the model evaluated in the previous step, with data capture enabled.
  9. Check endpoint: This step checks if the endpoint exists or needs to be created. Based on the output, the next step is to create or update the endpoint.
  10. Export configuration: This step exports the model name, endpoint name, and endpoint configuration parameters to the AWS Systems Manager Parameter Store.
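To make steps 4 and 5 concrete, here is a hedged sketch of the metric-query Lambda function; the job name, metric name and threshold are placeholders, and the actual function and Choice-state logic may differ:

```python
import boto3

sm = boto3.client("sagemaker")

def lambda_handler(event, context):
    """Fetch the final RMSE of a completed training job and compare it to a threshold.

    The returned flag can drive a Step Functions Choice state that either
    continues toward endpoint deployment or exits the flow.
    """
    job_name = event["TrainingJobName"]                     # passed in by the state machine
    threshold = float(event.get("RmseThreshold", 3000.0))   # placeholder default

    job = sm.describe_training_job(TrainingJobName=job_name)
    metrics = {m["MetricName"]: m["Value"] for m in job.get("FinalMetricDataList", [])}
    rmse = metrics.get("validation:rmse")                   # metric name depends on the algorithm

    return {
        "TrainingJobName": job_name,
        "Rmse": rmse,
        "MeetsThreshold": rmse is not None and rmse <= threshold,
    }
```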

Alerts and notifications are sent to the configured SNS topic email when the state machine succeeds or fails. The same pipeline configuration is reused for the XGBoost model.

Automated batch scoring pipeline

  • Purpose: Launch batch scoring as soon as scoring input batch data is available in the respective Amazon S3 location. The batch scoring should use the latest registered model to do the scoring.
  • Solution: Wipro implemented a reusable scoring pipeline using the Step Functions SDK, SageMaker batch transformation jobs, Lambda, and EventBridge. The pipeline is automatically triggered when new scoring batch data becomes available in the respective S3 location.
Scoring pipeline step machine for linear learner and XGBoost model

Figure 4 – Scoring pipeline step machine for linear learner and XGBoost model

Flow description for the automated batch scoring pipeline:

  1. Pre-processing: This step takes a data file from the respective S3 location as input and performs the required pre-processing before calling the SageMaker batch transformation job.
  2. Scoring: This step runs the batch transformation job to generate inferences, calling the latest version of the registered model and storing the scoring output in an S3 bucket. Wipro used the input filter and join functionality of the SageMaker batch transform API, which helped enrich the scoring data for better decision making (a hedged sketch appears after this flow description).
Input filter and join flow for batch transformation

Figure 5 – Input filter and join flow for batch transformation

The state machine pipeline is launched when a new data file arrives in the S3 bucket.

A notification is sent to the configured SNS topic email when the state machine succeeds or fails.
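The following is a hedged sketch of the scoring step using the SageMaker Python SDK’s batch transform filter and join options; the model name, S3 paths and filter expressions are illustrative only:

```python
from sagemaker.transformer import Transformer

# Placeholders: the model name and S3 locations are illustrative.
transformer = Transformer(
    model_name="used-car-price-linear-learner",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-ml-bucket/scoring/output/",
    accept="text/csv",
    assemble_with="Line",
)

transformer.transform(
    data="s3://example-ml-bucket/scoring/input/batch.csv",
    content_type="text/csv",
    split_type="Line",
    # Drop the record ID (first column) before invoking the model...
    input_filter="$[1:]",
    # ...then join the prediction back onto the full input record...
    join_source="Input",
    # ...and keep only the ID and the predicted value in the output.
    output_filter="$[0,-1]",
)
transformer.wait()
```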

Real-time inference pipeline

  • Purpose: To enable real-time inferences from both the models’ (Linear Learner and XGBoost) endpoints and get the maximum predicted value (or by using any other custom logic that can be written as a Lambda function) to be returned to the application.
  • Solution: The Wipro team implemented a reusable architecture using Amazon API Gateway, Lambda, and SageMaker endpoints, as shown in Figure 6:
Real-time inference pipeline

Figure 6 – Real-time inference pipeline

Flow description for the real-time inference pipeline shown in Figure 6:

  1. The payload is sent from the application to Amazon API Gateway, which routes it to the respective Lambda function.
  2. A Lambda function (with an integrated SageMaker custom layer) does the required pre-processing and JSON or CSV payload formatting, and invokes the respective endpoints (a hedged sketch of this function follows the list).
  3. The response is returned to Lambda and sent back to the application through API Gateway.
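A hedged sketch of such a Lambda function follows; the endpoint names are placeholders, and it assumes each endpoint returns a single numeric prediction as plain text (real response formats vary by algorithm, and the actual function also performs pre-processing and error handling):

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Placeholder endpoint names for the two deployed models.
ENDPOINTS = ["used-car-price-linear-learner", "used-car-price-xgboost"]

def lambda_handler(event, context):
    """Invoke both model endpoints with the same CSV payload and return the maximum prediction."""
    payload = event["body"]  # assumed to be a pre-formatted CSV row from API Gateway

    predictions = []
    for endpoint in ENDPOINTS:
        response = runtime.invoke_endpoint(
            EndpointName=endpoint,
            ContentType="text/csv",
            Body=payload,
        )
        # Simplified parsing: assumes the endpoint returns a single numeric value.
        predictions.append(float(response["Body"].read().decode().strip()))

    return {
        "statusCode": 200,
        "body": json.dumps({"predicted_price": max(predictions)}),
    }
```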

The customer used this pipeline for small and medium scale models, which included using various types of open-source algorithms. One of the key benefits of SageMaker is that various types of algorithms can be brought into SageMaker and deployed using a bring your own container (BYOC) technique. BYOC involves containerizing the algorithm and registering the image in Amazon Elastic Container Registry (Amazon ECR), and then using the same image to create a container to do training and inference.

Scaling is one of the biggest issues in the machine learning cycle. SageMaker comes with the necessary tools for scaling a model during inference. In the preceding architecture, users need to enable SageMaker auto-scaling, which then handles the workload. To enable auto-scaling, users must provide an auto-scaling policy that specifies the target throughput per instance and the maximum and minimum instance counts. With the policy in place, SageMaker automatically handles the workload for real-time endpoints and switches between instances when needed.
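For reference, here is a minimal sketch of registering such a policy through the Application Auto Scaling API; the endpoint name, variant name, capacity limits and target value are placeholder assumptions:

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Placeholder endpoint and production variant names.
resource_id = "endpoint/used-car-price-xgboost/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="invocations-per-instance-target",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Throughput per instance that the policy tries to maintain.
        "TargetValue": 200.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)
```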

Custom model monitor pipeline

  • Purpose: The customer team wanted to have automated model monitoring to capture both data drift and model drift. The Wipro team used SageMaker model monitoring to enable both data drift and model drift detection with a reusable pipeline for real-time inferences and batch transformation. Note that during the development of this solution, SageMaker model monitoring didn’t support detecting data or model drift for batch transformation, so customizations were implemented to use the model monitor container for the batch transformation payload.
  • Solution: The Wipro team implemented a reusable model-monitoring pipeline for real-time and batch inference payloads using AWS Glue to capture the incremental payload and invoke the model monitoring job according to the defined schedule.
Model monitor step machine

Figure 7 – Model monitor step machine

Flow description for the custom model monitor pipeline:
The pipeline runs according to the defined schedule configured through EventBridge.

  1. CSV consolidation – It uses the AWS Glue bookmark feature to detect the presence of incremental payload in the defined S3 bucket of real-time data capture and response and batch data response. It then aggregates that data for further processing (a hedged sketch of this step appears after the figures below).
  2. Evaluate payload – If there is incremental data or payload present for the current run, it invokes the monitoring branch. Otherwise, it bypasses without processing and exits the job.
  3. Post processing – The monitoring branch is designed to have two parallel sub branches—one for data drift and another for model drift.
  4. Monitoring (data drift) – The data drift branch runs whenever there is a payload present. It uses the latest trained model baseline constraints and statistics files generated through the training pipeline for the data features and runs the model monitoring job.
  5. Monitoring (model drift) – The model drift branch runs only when ground truth data is supplied, along with the inference payload. It uses trained model baseline constraints and statistics files generated through the training pipeline for the model quality features and runs the model monitoring job.
  6. Evaluate drift – The outcome of both data and model drift is a constraint violation file that’s evaluated by the evaluate drift Lambda function, which sends a notification with details of the drift to the respective Amazon SNS topics. Drift data is enriched further with additional attributes for reporting purposes. The drift notification emails will look similar to the examples in Figures 8 and 9.
SageMaker model drift monitor email

Figure 8 – Data and model drift notification message

SageMaker model drift monitor email

Figure 9 – Data and model drift notification message
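As a hedged sketch of the CSV consolidation step (step 1 above), a Glue job with bookmarks enabled reads only the data-capture files that arrived since the previous run; the bucket paths and transformation context name are placeholders:

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)  # job bookmarks must be enabled on the Glue job

# With bookmarks enabled, only data-capture files not seen in previous runs are read.
payload = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={
        "paths": ["s3://example-ml-bucket/datacapture/"],
        "recurse": True,
    },
    format="json",
    transformation_ctx="incremental_payload",  # key used by the bookmark
)

# Consolidate the incremental payload for the downstream monitoring branches.
if payload.count() > 0:
    glue_context.write_dynamic_frame.from_options(
        frame=payload,
        connection_type="s3",
        connection_options={"path": "s3://example-ml-bucket/monitoring/consolidated/"},
        format="csv",
    )

job.commit()  # advances the bookmark so the same files aren't processed again
```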

Insights with Amazon QuickSight visualization:

  • Purpose: The customer wanted insights about data and model drift, to relate the drift data to the respective model monitoring jobs, and to understand trends in the inference data.
  • Solution: The Wipro team enriched the drift data by connecting input data with the drift result, which enables tracing from a drift result to the corresponding monitoring job and scoring data. Visualizations and dashboards were created using Amazon QuickSight with Amazon Athena as the data source (using the Amazon S3 CSV scoring and drift data).
Model monitoring visualization architecture

Figure 10 – Model monitoring visualization architecture

Design considerations:

  1. Use QuickSight SPICE datasets for better in-memory performance.
  2. Use the QuickSight dataset refresh APIs to automate SPICE data refreshes (see the sketch after this list).
  3. Implement group-based security for dashboard and analysis access control.
  4. Across accounts, automate deployment using export and import dataset, data source, and analysis API calls provided by QuickSight.
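A hedged sketch of automating the SPICE refresh through the QuickSight API follows; the account ID and dataset ID are placeholders:

```python
import time
import boto3

quicksight = boto3.client("quicksight")

ACCOUNT_ID = "111122223333"            # placeholder AWS account ID
DATASET_ID = "model-monitoring-drift"  # placeholder QuickSight dataset ID

# Each ingestion needs a unique ID; a timestamp keeps scheduled runs distinct.
ingestion_id = f"drift-refresh-{int(time.time())}"

quicksight.create_ingestion(
    AwsAccountId=ACCOUNT_ID,
    DataSetId=DATASET_ID,
    IngestionId=ingestion_id,
)

# Optionally check the status of the SPICE refresh.
status = quicksight.describe_ingestion(
    AwsAccountId=ACCOUNT_ID,
    DataSetId=DATASET_ID,
    IngestionId=ingestion_id,
)["Ingestion"]["IngestionStatus"]
print(f"SPICE refresh {ingestion_id}: {status}")
```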

Model monitoring dashboard:

To enable an effective outcome and meaningful insights of the model monitoring jobs, custom dashboards were created for the model monitoring data. The input data points are combined in parallel with inference request data, jobs data, and monitoring output to create a visualization of trends revealed by the model monitoring.

This has helped the customer team visualize various data features along with the predicted outcome of each batch of inference requests.

Model monitor dashboard with selection prompts

Figure 11 – Model monitor dashboard with selection prompts

Model monitor drift analysis

Figure 12 – Model monitor drift analysis

Conclusion

The implementation explained in this post enabled Wipro to effectively migrate their on-premises models to AWS and build a scalable, automated model development framework.

The use of reusable framework components empowers the data science team to effectively package their work as deployable AWS Step Functions JSON components. Simultaneously, the DevOps teams used and enhanced the automated CI/CD pipeline to facilitate the seamless promotion and retraining of models in higher environments.

The model monitoring component has enabled continuous monitoring of model performance, and users receive alerts and notifications whenever data or model drift is detected.

The customer’s team is using this MLOps framework to migrate or develop more models and increase their SageMaker adoption.

By harnessing the comprehensive suite of SageMaker services in conjunction with our meticulously designed architecture, customers can seamlessly onboard multiple models, significantly reducing deployment time and mitigating complexities associated with code sharing. Moreover, our architecture simplifies code versioning maintenance, ensuring a streamlined development process.

This architecture handles the entire machine learning cycle, encompassing automated model training, real-time and batch inference, proactive model monitoring, and drift analysis. This end-to-end solution empowers customers to achieve optimal model performance while maintaining rigorous monitoring and analysis capabilities to ensure ongoing accuracy and reliability.

To create this architecture, begin by creating essential resources like Amazon Virtual Private Cloud (Amazon VPC), SageMaker notebooks, and Lambda functions. Make sure to set up appropriate AWS Identity and Access Management (IAM) policies for these resources.

Next, focus on building the components of the architecture—such as training and preprocessing scripts—within SageMaker Studio or Jupyter Notebook. This step involves developing the necessary code and configurations to enable the desired functionalities.

After the architecture’s components are defined, you can proceed with building the Lambda functions for generating inferences or performing post-processing steps on the data.

Finally, use Step Functions to connect the components and establish a workflow that coordinates the running of each step, as sketched below.
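The following is a minimal, hypothetical sketch of that final step using boto3: it registers a two-state workflow that runs a SageMaker training job and then invokes a post-processing Lambda function. The role ARNs, image URI, S3 paths, and function name are placeholders, not the values used in this implementation.

import json
import boto3

# Hypothetical Amazon States Language definition: train a model, then post-process.
# All ARNs, image URIs, and S3 paths below are placeholders.
definition = {
    "StartAt": "TrainModel",
    "States": {
        "TrainModel": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            "Parameters": {
                "TrainingJobName.$": "$.training_job_name",
                "RoleArn": "arn:aws:iam::111122223333:role/SageMakerExecutionRole",
                "AlgorithmSpecification": {
                    "TrainingImage": "<training-image-uri>",
                    "TrainingInputMode": "File",
                },
                "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/models/"},
                "ResourceConfig": {
                    "InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 1,
                    "VolumeSizeInGB": 30,
                },
                "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
            },
            "Next": "PostProcess",
        },
        "PostProcess": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "post-process-results"},
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="model-training-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::111122223333:role/StepFunctionsExecutionRole",
)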


About the Authors

Stephen Randolph is a Senior Partner Solutions Architect at Amazon Web Services (AWS). He enables and supports Global Systems Integrator (GSI) partners on the latest AWS technology as they develop industry solutions to solve business challenges. Stephen is especially passionate about Security and Generative AI, and helping customers and partners architect secure, efficient, and innovative solutions on AWS.

Bhajandeep Singh has served as the AWS AI/ML Center of Excellence Head at Wipro Technologies, leading customer engagements to deliver data analytics and AI solutions. He holds the AWS AI/ML Specialty certification and authors technical blogs on AI/ML services and solutions. With experience leading AWS AI/ML solutions across industries, Bhajandeep has enabled clients to maximize the value of AWS AI/ML services through his expertise and leadership.

Ajay Vishwakarma is an ML engineer for the AWS wing of Wipro’s AI solution practice. He has solid experience building bring-your-own-model (BYOM) solutions for custom algorithms in SageMaker, deploying end-to-end ETL pipelines, building chatbots with Amazon Lex, sharing QuickSight resources across accounts, and building CloudFormation templates for deployments. He enjoys exploring AWS, treating every customer problem as a challenge to explore further and solve.

Read More

Generating value from enterprise data: Best practices for Text2SQL and generative AI

Generating value from enterprise data: Best practices for Text2SQL and generative AI

Generative AI has opened up significant potential in the field of AI. We are seeing numerous uses, including text generation, code generation, summarization, translation, chatbots, and more. One such evolving area is using natural language processing (NLP) to unlock new opportunities for accessing data through intuitive SQL queries. Instead of dealing with complex technical code, business users and data analysts can ask questions about data and insights in plain language. The primary goal is to automatically generate SQL queries from natural language text. To do this, the text input is transformed into a structured representation, and from this representation, a SQL query that can be used to access a database is created.

In this post, we provide an introduction to text to SQL (Text2SQL) and explore use cases, challenges, design patterns, and best practices. Specifically, we discuss the following:

  • Why do we need Text2SQL
  • Key components for Text to SQL
  • Prompt engineering considerations for natural language or Text to SQL
  • Optimizations and best practices
  • Architecture patterns

Why do we need Text2SQL?

Today, a large amount of data is stored in traditional data analytics platforms, data warehouses, and databases, which may not be easy to query or understand for the majority of organization members. The primary goal of Text2SQL is to make querying databases more accessible to non-technical users, who can provide their queries in natural language.

NLP SQL enables business users to analyze data and get answers by typing or speaking questions in natural language, such as the following:

  • “Show total sales for each product last month”
  • “Which products generated more revenue?”
  • “What percentage of customers are from each region?”

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) through a single API, making it straightforward to build and scale generative AI applications. It can be used to generate SQL queries from questions like those listed above, query an organization’s structured data, and generate natural language responses from the query results.
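As a minimal sketch of this idea (not the implementation referenced in this post), the following calls the Amazon Bedrock Converse API to ask a model for a SQL query; the model ID, schema text, and question are illustrative placeholders.

import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative placeholders: swap in your model ID and real schema description.
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"
SCHEMA = "Table sales, columns = [product_id, region, amount, sale_date]"
QUESTION = "Show total sales for each product last month"

prompt = (
    f"Given the following schema:\n{SCHEMA}\n\n"
    f"Write a single SQL query that answers: {QUESTION}\n"
    "Return only the SQL."
)

response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
generated_sql = response["output"]["message"]["content"][0]["text"]
print(generated_sql)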

Key components for text to SQL

Text-to-SQL systems involve several stages to convert natural language queries into runnable SQL (a minimal sketch of these stages follows the list):

  • Natural language processing:
    • Analyze the user’s input query
    • Extract key elements and intent
    • Convert to a structured format
  • SQL generation:
    • Map extracted details into SQL syntax
    • Generate a valid SQL query
  • Database query:
    • Run the AI-generated SQL query on the database
    • Retrieve results
    • Return results to the user
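The following is a minimal, hypothetical sketch of these three stages wired together. The generate_sql function stands in for any LLM call (such as the Bedrock snippet above), and SQLite is used purely for illustration.

import sqlite3

def generate_sql(question: str, schema: str) -> str:
    """Placeholder for an LLM call (for example, the Bedrock snippet above)."""
    raise NotImplementedError

def run_text2sql(question: str, schema: str, db_path: str):
    # 1. Natural language processing and 2. SQL generation, delegated to the LLM.
    sql = generate_sql(question, schema)

    # Basic guardrail: only allow read-only SELECT statements.
    if not sql.strip().lower().startswith("select"):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")

    # 3. Database query: run the AI-generated SQL and return the results.
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()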

One remarkable capability of large language models (LLMs) is code generation, including Structured Query Language (SQL) for databases. An LLM can be used to understand a natural language question and generate a corresponding SQL query as output, and its accuracy improves further with in-context learning and fine-tuning as more data is provided.

The following diagram illustrates a basic Text2SQL flow.

Text2SQL high-level process flow

Prompt engineering considerations for natural language to SQL

The prompt is crucial when using LLMs to translate natural language into SQL queries, and there are several important considerations for prompt engineering.

Effective prompt engineering is key to developing natural language to SQL systems. Clear, straightforward prompts provide better instructions for the language model. Providing context that the user is requesting a SQL query along with relevant database schema details enables the model to translate the intent accurately. Including a few annotated examples of natural language prompts and corresponding SQL queries helps guide the model to produce syntax-compliant output. Additionally, incorporating Retrieval Augmented Generation (RAG), where the model retrieves similar examples during processing, further improves the mapping accuracy. Well-designed prompts that give the model sufficient instruction, context, examples, and retrieval augmentation are crucial for reliably translating natural language into SQL queries.

The following is an example of a baseline prompt with code representation of the database from the whitepaper Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies.

/* Given the following database schema : */
CREATE TABLE IF NOT EXISTS "gymnast" (
  "Gymnast_ID" int,
  "Floor_Exercise_Points" real,
  "Pommel_Horse_Points" real,
  "Rings_Points" real,
  "Vault_Points" real,
  "Parallel_Bars_Points" real,
  "Horizontal_Bar_Points" real,
  "Total_Points" real,
  PRIMARY KEY ("Gymnast_ID"),
  FOREIGN KEY ("Gymnast_ID") REFERENCES "people" ("People_ID")
);
CREATE TABLE IF NOT EXISTS "people" (
  "People_ID" int,
  "Name" text,
  "Age" real,
  "Height" real,
  "Hometown" text,
  PRIMARY KEY ("People_ID")
);

/* Answer the following : Return the total points of the gymnast with the lowest age. */

select t1.total_points from gymnast as t1 join people as t2 on t1.gymnast_id = t2.people_id order by t2.age asc limit 1

As illustrated in this example, prompt-based few-shot learning provides the model with a handful of annotated examples in the prompt itself. This demonstrates the target mapping between natural language and SQL for the model. Typically, the prompt would contain around 2–3 pairs showing a natural language query and the equivalent SQL statement. These few examples guide the model to generate syntax-compliant SQL queries from natural language without requiring extensive training data.
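As a simple illustration of this approach (the schema and example pairs below are made up for demonstration, not taken from a benchmark), a few-shot prompt can be assembled programmatically before being sent to the model:

# Hypothetical few-shot prompt assembly; the schema and example pairs are illustrative.
SCHEMA = 'CREATE TABLE "sales" ("product" text, "region" text, "amount" real, "sale_date" text);'

EXAMPLES = [
    ("Show total sales for each product last month",
     "select product, sum(amount) from sales "
     "where sale_date >= date('now', 'start of month', '-1 month') "
     "and sale_date < date('now', 'start of month') group by product;"),
    ("Which region generated the most revenue?",
     "select region from sales group by region order by sum(amount) desc limit 1;"),
]

def build_few_shot_prompt(question: str) -> str:
    parts = [f"/* Given the following database schema: */\n{SCHEMA}\n"]
    for example_question, example_sql in EXAMPLES:
        parts.append(f"/* Answer the following: {example_question} */\n{example_sql}\n")
    parts.append(f"/* Answer the following: {question} */")
    return "\n".join(parts)

print(build_few_shot_prompt("What percentage of sales came from each region?"))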

Fine-tuning vs. prompt engineering

When building natural language to SQL systems, we often discuss whether fine-tuning the model is the right technique or whether effective prompt engineering is sufficient. Both approaches can be considered and selected based on the requirements:

    • Fine-tuning – The baseline model is pre-trained on a large general text corpus and then can use instruction-based fine-tuning, which uses labeled examples to improve the performance of a pre-trained foundation model on text-SQL. This adapts the model to the target task. Fine-tuning directly trains the model on the end task but requires many text-SQL examples. You can use supervised fine-tuning based on your LLM to improve the effectiveness of text-to-SQL. For this, you can use several datasets like Spider, WikiSQL, CHASE, BIRD-SQL, or CoSQL.
    • Prompt engineering – The model is prompted to complete carefully designed prompts that target the desired SQL syntax. When generating SQL from natural language using LLMs, providing clear instructions in the prompt is important for controlling the model’s output. In the prompt, annotate the different components, such as pointing to the columns and schema, and then instruct the model on which type of SQL to create. These annotations act like instructions that tell the model how to format the SQL output. The following prompt shows an example where you point to table columns and instruct the model to create a MySQL query:
Table offices, columns = [OfficeId, OfficeName]
Table employees, columns = [OfficeId, EmployeeId,EmployeeName]
Create a MySQL query for all employees in the Machine Learning Department

An effective approach for text-to-SQL models is to first start with a baseline LLM without any task-specific fine-tuning. Well-crafted prompts can then be used to adapt and drive the base model to handle the text-to-SQL mapping. This prompt engineering allows you to develop the capability without needing to do fine-tuning. If prompt engineering on the base model doesn’t achieve sufficient accuracy, fine-tuning on a small set of text-SQL examples can then be explored along with further prompt engineering.

The combination of fine-tuning and prompt engineering may be required if prompt engineering on the raw pre-trained model alone doesn’t meet requirements. However, it’s best to initially attempt prompt engineering without fine-tuning, because this allows rapid iteration without data collection. If this fails to provide adequate performance, fine-tuning alongside prompt engineering is a viable next step. This overall approach maximizes efficiency while still allowing customization if purely prompt-based methods are insufficient.
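If that next step is needed, the following is a hedged sketch of fine-tuning a JumpStart model on a text-to-SQL dataset with the SageMaker Python SDK; the model ID, hyperparameter names, channel name, and S3 path are placeholders, not values prescribed by this post.

from sagemaker.jumpstart.estimator import JumpStartEstimator

# Hypothetical sketch: the model ID, hyperparameters, and S3 path are placeholders.
# The training data could be prepared from datasets such as Spider or WikiSQL.
estimator = JumpStartEstimator(model_id="huggingface-llm-falcon-7b-instruct-bf16")
estimator.set_hyperparameters(epoch="3", learning_rate="2e-5")
estimator.fit({"training": "s3://my-bucket/text2sql-training-data/"})

# Host the fine-tuned model behind an endpoint for SQL generation.
predictor = estimator.deploy()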

Optimization and best practices

Optimization and best practices are essential for making sure resources are used efficiently and the right results are achieved. These techniques help improve performance, control costs, and produce better-quality outcomes.

When developing text-to-SQL systems using LLMs, optimization techniques can improve performance and efficiency. The following are some key areas to consider:

  • Caching – To improve latency, cost control, and standardization, you can cache the parsed SQL and recognized query prompts from the text-to-SQL LLM. This avoids reprocessing repeated queries (see the sketch after this list).
  • Monitoring – Logs and metrics around query parsing, prompt recognition, SQL generation, and SQL results should be collected to monitor the text-to-SQL LLM system. This provides visibility for optimization, for example, updating the prompt or revisiting fine-tuning with an updated dataset.
  • Materialized views vs. tables – Materialized views can simplify SQL generation and improve performance for common text-to-SQL queries. Querying tables directly may produce complex SQL and cause performance issues, including the repeated need for tuning techniques like indexes. Materialized views also help you avoid contention when the same tables are used by other parts of the application at the same time.
  • Refreshing data – Materialized views need to be refreshed on a schedule to keep data current for text-to-SQL queries. You can use batch or incremental refresh approaches to balance the overhead.
  • Central data catalog – Creating a centralized data catalog provides a single-pane-of-glass view of an organization’s data sources and helps LLMs select the appropriate tables and schemas to provide more accurate responses. Vector embeddings created from a central data catalog can be supplied to an LLM along with the requested information to generate relevant and precise SQL responses.
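As a minimal, hypothetical sketch of the caching consideration above, generated SQL can be keyed by a normalized form of the question so repeated requests skip the LLM call entirely:

import hashlib

# Hypothetical in-memory cache; a production system might use Amazon ElastiCache or DynamoDB.
sql_cache: dict[str, str] = {}

def cache_key(question: str) -> str:
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def get_sql(question: str, generate_sql) -> str:
    """Return cached SQL when available; otherwise call the LLM and cache the result."""
    key = cache_key(question)
    if key not in sql_cache:
        sql_cache[key] = generate_sql(question)
    return sql_cache[key]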

By applying optimization best practices like caching, monitoring, materialized views, scheduled refreshing, and a central catalog, you can significantly improve the performance and efficiency of text-to-SQL systems using LLMs.

Architecture patterns

Let’s look at some architecture patterns that can be implemented for a text to SQL workflow.

Prompt engineering

The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering.

illustrates the architecture for generating queries with an LLM using prompt engineering

In this pattern, the user creates a prompt with few-shot learning that provides the model with annotated examples in the prompt itself, including the table and schema details and some sample queries with their results. The LLM uses the provided prompt to return the AI-generated SQL, which is validated and then run against the database to get the results. This is the most straightforward pattern to get started with prompt engineering. For this, you can use Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI, or Amazon SageMaker JumpStart foundation models, which offer state-of-the-art foundation models for use cases such as content writing, code generation, question answering, copywriting, summarization, classification, information retrieval, and more.

Prompt engineering and fine-tuning

The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering and fine-tuning.

illustrates the architecture for generating queries with an LLM using prompt engineering and fine-tuning

This flow is similar to the previous pattern, which mostly relies on prompt engineering, but with an additional step of fine-tuning on a domain-specific dataset. The fine-tuned LLM is used to generate the SQL queries with minimal in-context examples in the prompt. For this, you can use SageMaker JumpStart to fine-tune an LLM on a domain-specific dataset in the same way you would train and deploy any model on Amazon SageMaker.

Prompt engineering and RAG

The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering and RAG.

illustrates the architecture for generating queries with an LLM using prompt engineering and RAG

In this pattern, we use Retrieval Augmented Generation with vector embedding stores, like Amazon Titan Embeddings or Cohere Embed on Amazon Bedrock, built from a central data catalog, like the AWS Glue Data Catalog, of the databases within an organization. The vector embeddings are stored in vector databases like the vector engine for Amazon OpenSearch Serverless, Amazon Relational Database Service (Amazon RDS) for PostgreSQL with the pgvector extension, or Amazon Kendra. LLMs use the vector embeddings to select the right database, tables, and columns faster when creating SQL queries. Using RAG is helpful when the data and relevant information that the LLM needs to retrieve are stored in multiple separate database systems and the LLM must be able to search or query data across all of them. Providing vector embeddings of a centralized or unified data catalog to the LLM results in more accurate and comprehensive responses.
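As a minimal, hypothetical sketch of the retrieval step (the table descriptions, model ID, and similarity logic are illustrative and not this post’s implementation), schema descriptions from a data catalog can be embedded with Amazon Titan Embeddings and the closest match added to the prompt:

import json
import math
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    # Amazon Titan Embeddings G1 - Text; the model ID is used here for illustration.
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Illustrative catalog entries; in practice these come from a central data catalog.
table_descriptions = [
    "sales(product, region, amount, sale_date)",
    "customers(customer_id, name, region, signup_date)",
]
catalog = {desc: embed(desc) for desc in table_descriptions}

question = "What percentage of customers are from each region?"
question_embedding = embed(question)
best_table = max(catalog, key=lambda desc: cosine(question_embedding, catalog[desc]))
print("Most relevant schema to include in the prompt:", best_table)

In a production setting, the embeddings would be stored in one of the vector databases listed above rather than computed in memory on each request.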

Conclusion

In this post, we discussed how to generate value from enterprise data using natural language to SQL generation. We looked into key components, optimization, and best practices. We also examined architecture patterns ranging from basic prompt engineering to fine-tuning and RAG. To learn more, refer to Amazon Bedrock to easily build and scale generative AI applications with foundation models.


About the Authors

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Arghya Banerjee is a Sr. Solutions Architect at AWS in the San Francisco Bay Area focused on helping customers adopt and use AWS Cloud. Arghya is focused on Big Data, Data Lakes, Streaming, Batch Analytics and AI/ML services and technologies.

Read More