Three’s a Cloud: New Activision and Blizzard Games, Day Passes, G-SYNC Technology Coming to GeForce NOW

NVIDIA is bringing more games, membership options and innovative tech to its GeForce NOW cloud gaming service.

The next Activision and Blizzard titles to join the cloud, Diablo IV and Overwatch 2, will be coming soon. They’ll be joined by a host of top titles, including Capcom’s Exoprimal, HoYoverse’s Honkai: Star Rail and Mainframe Industries’ Pax Dei.

Available starting in February, new day passes for Ultimate and Priority memberships will offer full premium benefits one day at a time.

NVIDIA is also bringing G-SYNC technology to the cloud, raising cloud streaming performance while lowering latency and minimizing stuttering for the smoothest gameplay. Paired with new 60 and 120 fps streaming options for GFN Reflex mode, it makes cloud gaming experiences nearly indistinguishable from local ones.

Plus, mobile gamers are getting a boost to 1440p resolution on Android phones. And Japan is the newest region to be operated by NVIDIA, which will soon enable gamers across the country to play their favorite PC games in the cloud with Ultimate performance.

Here Come the Games

The GeForce NOW catalog features many of the most popular PC games — over 1,800 titles from Steam, Xbox (including supported PC Game Pass titles), Epic Games Store, Ubisoft, GOG.com and other digital stores. Backed by up to GeForce RTX 4080-class GPU graphics, GeForce NOW is bringing even more top titles to the cloud from celebrated publishers.

The latest games from top developer Blizzard Entertainment — Diablo IV and Overwatch 2 — are coming soon to GeForce NOW. They join the recent release of Call of Duty, the first Activision game in the cloud, as part of a 10-year NVIDIA and Microsoft partnership.

Diablo IV coming soon to GeForce NOW
Join the fight for Sanctuary.

Fight the forces of hell while discovering countless abilities to master, legendary loot to gather and nightmarish dungeons full of evil enemies to vanquish in Diablo IV. Experience the campaign solo or with friends in a shared open world as the dark, gripping story unfolds.

Overwatch 2 coming soon to GeForce NOW
Team up and answer the call of heroes in “Overwatch 2.”

Team up and answer the call of heroes in Overwatch 2, a free-to-play shooter featuring 30+ epic heroes, each with game-changing abilities. Join the battle across dozens of futuristic maps inspired by real-world locations and master unique game modes in the always-on, ever-evolving, live game.

Members will soon be able to stream the Steam versions of Diablo IV and Overwatch 2 on nearly any device with the power of a GeForce RTX 4080 rig in the cloud, with support for the Battle.net launcher to follow.

Honkai Star Rail coming soon to GeForce NOW
The Astral Express is coming to GeForce NOW.

GeForce NOW also brings top role-playing games to the cloud. The immensely popular Honkai: Star Rail from HoYoverse will soon join Genshin Impact in the cloud. The space-fantasy RPG is set in a diverse universe filled with wonder, adventure and thrills, and expands the library of hit free-to-play titles for members. Plus, members can experience all the latest updates without worrying about download times.

Dinosaurs? Oh my.

Top publisher Capcom is working with NVIDIA to bring more of its hit titles to the cloud, including Exoprimal, an online, team-based action game that pits humanity’s cutting-edge exosuit technology against history’s most ferocious beasts: dinosaurs. Look forward to seeing it in the cloud on Jan. 18.

Ghosts do exist!

Mainframe Industries’ Pax Dei is a highly anticipated social sandbox massively multiplayer online game inspired by legends of the medieval era. It’s planned to release on GeForce NOW when it launches for PC.

Get ready to play these titles and more in the cloud soon, at high performance. Ultimate members will be able to stream at up to 4K resolution and 120 frames per second with support for NVIDIA DLSS and Reflex technologies, and experience the action even on low-powered devices. Keep an eye out on GFN Thursdays for the latest on their cloud release dates.

Don’t Pass This Up

Day Passes, available in early February, will give gamers a fast pass to try out premium membership benefits before committing to one- or six-month memberships that offer better value. The passes provide access to all the same features as Priority and Ultimate members for 24 hours.

Day Pass users can experience RTX ON for supported games with Priority and Ultimate Day Passes. And Ultimate Day Pass users gain exclusive access to innovative technologies like NVIDIA DLSS 3.5, full ray tracing and NVIDIA Reflex.

CES 2024 GeForce NOW
Pssst, pass it on.

These new membership options let gamers freely choose when to tap into the cloud.

The Ultimate Day Pass will be available for $7.99 and the Priority Day Pass for $3.99. The 24 hours of continuous play will begin at purchase. Day Passes can be combined for continued access to GeForce NOW high-performance cloud streaming.

Let That Sync In

NVIDIA continues to push the boundaries for cloud gaming. The Ultimate membership tier introduced many cloud gaming firsts, from 240 fps to ultra-wide streaming, making gameplay with GeForce NOW — streaming from GeForce RTX 4080-powered servers — nearly identical to a local gaming experience.

Cloud G-SYNC coming to GeForce NOW
Get in sync.

Coming soon, cloud G-SYNC technology will raise the bar even further, minimizing stutter and latency, with support for variable-refresh-rate displays and full optimization for G-SYNC-compatible monitors. With cloud G-SYNC enabled, GeForce NOW will vary the display’s refresh rate to match the streaming rate, for the smoothest gameplay experience available from the cloud.

Ultimate members can also soon take advantage of expanded NVIDIA Reflex support. Building on last year’s 240 fps 1080p streaming, Ultimate members will soon be able to use Reflex in supported titles at up to 4K resolution in 60 or 120 fps streaming modes, for low-latency gaming on nearly any device. NVIDIA Reflex support is available in top PC games on GeForce NOW, including Call of Duty: Modern Warfare III, Cyberpunk 2077, Diablo IV, Overwatch 2, The Witcher 3: Wild Hunt, Alan Wake 2 and more.

With both Cloud G-SYNC and Reflex, members will feel as if they’re connected directly to GeForce NOW’s RTX 4080 SuperPODs, making their visual experiences smoother, clearer and more immersive than ever.

Mobile Phones Are Now PC Gaming Rigs

Mobile gamers will soon have the option to set streaming resolution to 1440p on Android devices, providing richer graphics on larger screens. Members will be able to turn an Android device into a portable gaming rig with support for quad-high-definition resolution (2,560 x 1,440 pixels), as well as improved keyboard and mouse support.

This offers a glimpse into the future of game streaming, with external displays connected to a mobile device. Using a USB-C docking station, gamers can connect an Android phone to a 1080p or 1440p gaming monitor or TV, with a keyboard and mouse or gamepad.

Paired with a GeForce NOW Ultimate membership, Android phones become portable gaming rigs on which to play the latest triple-A PC games, such as Baldur’s Gate 3, The Finals, and Monster Hunter: World. Now anything, even a phone, can be a high-performance gaming rig.

Up to 1440p for Android devices on GeForce NOW
GeForce NOW improves on-the-go streaming, one device at a time.

The above was on display this week at the CES trade show. The demo streams Cyberpunk 2077 and Alan Wake 2 from GeForce NOW servers in Los Angeles to a Samsung Galaxy S23 Ultra phone connected to a 1440p monitor in Las Vegas.

Clouds in Japan

GeForce NOW Expansion
The cloud’s drifting into Japan.

NVIDIA will begin operating GeForce NOW in Japan in the spring, working alongside GeForce NOW Alliance partner KDDI.

Gamers in the region can look forward to Ultimate memberships for the first time, along with all the new games and advancements announced at CES. Visit the page to learn more and sign up for notifications.

With a steady drumbeat of quality games from top publishers, new membership options and the latest NVIDIA technology in the cloud, GeForce NOW is poised to bring another ultimate year of gaming to members.

Following the Prompts: Generative AI Powers Smarter Robots With NVIDIA Isaac Platform

Generative AI is reshaping trillion-dollar industries, and NVIDIA, a front-runner in smart robotics, is seizing the moment.

Speaking today as part of a special address ahead of CES, NVIDIA Vice President of Robotics and Edge Computing Deepu Talla detailed how NVIDIA and its partners are bringing generative AI and robotics together.

It’s a natural fit, with a growing roster of partners — including Boston Dynamics, Collaborative Robotics, Covariant, Sanctuary AI, Unitree Robotics and others — embracing GPU-accelerated large language models to bring unprecedented levels of intelligence and adaptability to machines of all kinds.

The timing couldn’t be better.

“Autonomous robots powered by artificial intelligence are being increasingly utilized for improving efficiency, decreasing costs and tackling labor shortages,” Talla said.

Present at the Creation

NVIDIA has been central to the generative AI revolution from the beginning.

A decade ago, NVIDIA founder and CEO Jensen Huang hand-delivered the first NVIDIA DGX AI supercomputer to OpenAI. Now, thanks to OpenAI’s ChatGPT, generative AI has become one of the fastest-growing technologies of our time.

And it’s just getting started.

The impact of generative AI will go beyond text and image generation — and into homes and offices, farms and factories, hospitals and laboratories, Talla predicted.

The key: LLMs, akin to the brain’s language center, will let robots understand and respond to human instructions more naturally.

Such machines will be able to learn continuously from humans, from each other and from the world around them.

“Given these attributes, generative AI is well-suited for robotics,” Talla said.

How Robots Are Using Generative AI

Agility Robotics, NTT, and others are incorporating generative AI into their robots to help them understand text or voice commands. Robot vacuum cleaners from Dreame Technology are being trained in simulated living spaces created by generative AI models. And Electric Sheep is developing a world model for autonomous lawn mowing.

NVIDIA technologies such as the NVIDIA Isaac and Jetson platforms, which facilitate the development and deployment of AI-powered robots, are already relied on by more than 1.2 million developers and 10,000 customers and partners.

Many of them are at CES this week, including Analog Devices, Aurora Labs, Canonical, Dreame Innovation Technology, DriveU, e-con Systems, Ecotron, Enchanted Tools, GlüxKind, Hesai Technology, Leopard Imaging, Segway-Ninebot (Willand (Beijing) Technology Co., Ltd.), Nodar, Orbbec, QT Group, Robosense, Spartan Radar, TDK Corporation, Telit, Unitree Robotics, Voyant Photonics and ZVISION Technologies Co., Ltd.

Two Brains Are Better Than One

In his talk at CES, Talla showed the dual-computer model (below) essential for deploying AI in robotics, demonstrating NVIDIA’s comprehensive approach to AI development and application.


The first computer, referred to as an “AI factory,” is central to the creation and continuous improvement of AI models.

AI factories use NVIDIA’s data center compute infrastructure along with its AI and NVIDIA Omniverse platforms for the simulation and training of AI models.

The second computer represents the runtime environment of the robot.

This varies depending on the application: It could be in the cloud or a data center; in an on-premises server for tasks like defect inspection in semiconductor manufacturing; or within an autonomous machine equipped with multiple sensors and cameras.

Generating Quality Assets and Scenes

Talla also highlighted the role of LLMs in breaking down technical barriers, turning typical users into technical artists capable of creating complex robotics workcells or entire warehouse simulations.

With generative AI tools like NVIDIA Picasso, users can generate realistic 3D assets from simple text prompts and add them to digital scenes for dynamic and comprehensive robot training environments.

The same capability extends to creating diverse and physically accurate scenarios in Omniverse, enhancing the testing and training of robots to ensure real-world applicability.

This dovetails with the transformative potential of generative AI in reconfiguring the deployment of robots.

Traditionally, robots are purpose-built for specific tasks, and modifying them for different ones is a time-consuming process.

But advancements in LLMs and vision language models are eliminating this bottleneck, enabling more intuitive interactions with robots through natural language, Talla explained.

Such machines — adaptable and aware of the environment around them — will soon spill out across the world.

To learn more, attend a virtual CES session and watch Talla’s full talk below.

Modernizing data science lifecycle management with AWS and Wipro

This post was written in collaboration with Bhajandeep Singh and Ajay Vishwakarma from Wipro’s AWS AI/ML Practice.

Many organizations have been using a combination of on-premises and open source data science solutions to create and manage machine learning (ML) models.

Data science and DevOps teams may face challenges managing these isolated tool stacks and systems. Integrating multiple tool stacks to build a compact solution might involve building custom connectors or workflows. Managing different dependencies based on the current version of each stack and maintaining those dependencies with the release of new updates of each stack complicates the solution. This increases the cost of infrastructure maintenance and hampers productivity.

Artificial intelligence (AI) and machine learning (ML) offerings from Amazon Web Services (AWS), along with integrated monitoring and notification services, help organizations achieve the required level of automation, scalability, and model quality at optimal cost. AWS also helps data science and DevOps teams to collaborate and streamlines the overall model lifecycle process.

The AWS portfolio of ML services includes a robust set of services that you can use to accelerate the development, training, and deployment of machine learning applications. The suite of services can be used to support the complete model lifecycle including monitoring and retraining ML models.

In this post, we discuss model development and MLOps framework implementation for one of Wipro’s customers that uses Amazon SageMaker and other AWS services.

Wipro is an AWS Premier Tier Services Partner and Managed Service Provider (MSP). Its AI/ML solutions drive enhanced operational efficiency, productivity, and customer experience for many of their enterprise clients.

Current challenges

Let’s first understand a few of the challenges the customer’s data science and DevOps teams faced with their current setup. We can then examine how the integrated SageMaker AI/ML offerings helped solve those challenges.

  • Collaboration – Data scientists each worked on their own local Jupyter notebooks to create and train ML models. They lacked an effective method for sharing and collaborating with other data scientists.
  • Scalability – Training and re-training ML models was taking more and more time as models became more complex while the allocated infrastructure capacity remained static.
  • MLOps – Model monitoring and ongoing governance weren’t tightly integrated and automated with the ML models. There were dependencies and complexities with integrating third-party tools into the MLOps pipeline.
  • Reusability – Without reusable MLOps frameworks, each model must be developed and governed separately, which adds to the overall effort and delays model operationalization.

This diagram summarizes the challenges and how Wipro’s implementation on SageMaker addressed them with built-in SageMaker services and offerings.

SageMaker offerings for ML workload migration

Figure 1 – SageMaker offerings for ML workload migration

Wipro defined an architecture that addresses the challenges in a cost-optimized and fully automated way.

The following is the use case and model used to build the solution:

  • Use case: Price prediction based on the used car dataset
  • Problem type: Regression
  • Models used: XGBoost and Linear Learner (SageMaker built-in algorithms)

Solution architecture

Wipro consultants conducted a deep-dive discovery workshop with the customer’s data science, DevOps, and data engineering teams to understand the current environment as well as their requirements and expectations for a modern solution on AWS. By the end of the consulting engagement, the team had implemented the following architecture that effectively addressed the core requirements of the customer team, including:

Code Sharing – SageMaker notebooks enable data scientists to experiment and share code with other team members. Wipro further accelerated their ML model journey by implementing Wipro’s code accelerators and snippets to expedite feature engineering, model training, model deployment, and pipeline creation.

Continuous integration and continuous delivery (CI/CD) pipeline – Using the customer’s GitHub repository enabled code versioning and automated scripts to launch pipeline deployment whenever new versions of the code are committed.

MLOps – The architecture implements a SageMaker model monitoring pipeline for continuous model quality governance by validating data and model drift as required by the defined schedule. Whenever drift is detected, an event is launched to notify the respective teams to take action or initiate model retraining.

Event-driven architecture – The pipelines for model training, model deployment, and model monitoring are well integrated by using Amazon EventBridge, a serverless event bus. When defined events occur, EventBridge can invoke a pipeline to run in response. This provides a loosely coupled set of pipelines that can run as needed in response to the environment.
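As a hedged sketch of how this wiring might look with boto3, the following builds an EventBridge rule that matches S3 "Object Created" events and targets a Step Functions state machine. The bucket name, rule name, and ARNs are placeholders, not details from the engagement:

```python
import json

# Placeholder names -- substitute your own bucket, role, and state machine ARN.
BUCKET = "ml-training-data"
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:training-pipeline"

# EventBridge pattern matching new objects landing in the training bucket.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": [BUCKET]}},
}

def rule_requests(pattern, target_arn, role_arn):
    """Build the put_rule / put_targets payloads for boto3's `events` client."""
    put_rule = {
        "Name": "on-new-training-data",
        "EventPattern": json.dumps(pattern),
        "State": "ENABLED",
    }
    put_targets = {
        "Rule": "on-new-training-data",
        "Targets": [{"Id": "training-sm", "Arn": target_arn, "RoleArn": role_arn}],
    }
    return put_rule, put_targets

# To apply: client = boto3.client("events")
# client.put_rule(**put_rule); client.put_targets(**put_targets)
```

Keeping the rule definitions as plain data like this makes them easy to review and to reuse across the training, scoring, and monitoring pipelines.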

Event Driven MLOps architecture with SageMaker

Figure 2 – Event Driven MLOps architecture with SageMaker

Solution components

This section describes the various solution components of the architecture.

Experiment notebooks

  • Purpose: The customer’s data science team wanted to experiment with various datasets and multiple models to come up with the optimal features, using those as further inputs to the automated pipeline.
  • Solution: Wipro created SageMaker experiment notebooks with code snippets for each reusable step, such as reading and writing data, model feature engineering, model training, and hyperparameter tuning. Feature engineering tasks can also be prepared in Data Wrangler, but the client specifically asked for SageMaker processing jobs and AWS Step Functions because they were more comfortable using those technologies. We used the AWS Step Functions Data Science SDK to create a step function—for flow testing—directly from the notebook instance to enable well-defined inputs for the pipelines. This has helped the data science team create and test pipelines at a much faster pace.
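Under the hood, the Data Science SDK compiles such a flow into an Amazon States Language document. A minimal sketch of what a pre-processing-then-training chain might compile to is shown below; the state names are illustrative and the job parameters are abbreviated to input-path references:

```python
import json

# Minimal Amazon States Language sketch of a notebook-built flow:
# pre-processing followed by training, using Step Functions' direct
# SageMaker service integrations (".sync" waits for job completion).
definition = {
    "StartAt": "PreProcess",
    "States": {
        "PreProcess": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createProcessingJob.sync",
            # Job parameters abbreviated; taken from the execution input.
            "Parameters": {"ProcessingJobName.$": "$.preprocess_job_name"},
            "Next": "Train",
        },
        "Train": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
            "Parameters": {"TrainingJobName.$": "$.training_job_name"},
            "End": True,
        },
    },
}

# Workflow.create() in the Data Science SDK submits an equivalent JSON document.
asl_json = json.dumps(definition, indent=2)
```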

Automated training pipeline

  • Purpose: To enable an automated training and re-training pipeline with configurable parameters such as instance type, hyperparameters, and an Amazon Simple Storage Service (Amazon S3) bucket location. The pipeline should also be launched by the data push event to S3.
  • Solution: Wipro implemented a reusable training pipeline using the Step Functions SDK, SageMaker processing and training jobs, a SageMaker model monitor container for baseline generation, AWS Lambda, and EventBridge services. Using AWS event-driven architecture, the pipeline is configured to launch automatically when a new data event is pushed to the mapped S3 bucket. Notifications are configured to be sent to the defined email addresses. At a high level, the training flow looks like the following diagram:
Training pipeline step machine

Figure 3 – Training pipeline step machine.

Flow description for the automated training pipeline

The above diagram shows an automated training pipeline built using Step Functions, Lambda, and SageMaker. It’s a reusable pipeline for setting up automated model training, generating predictions, creating baselines for model and data monitoring, and creating or updating an endpoint based on the previous model’s threshold value.

  1. Pre-processing: This step takes data from an Amazon S3 location as input and uses the SageMaker SKLearn container to perform necessary feature engineering and data pre-processing tasks, such as the train, test, and validate split.
  2. Model training: Using the SageMaker SDK, this step runs training code with the respective model image and trains datasets from pre-processing scripts while generating the trained model artifacts.
  3. Save model: This step creates a model from the trained model artifacts. The model name is stored for reference in another pipeline using the AWS Systems Manager Parameter Store.
  4. Query training results: This step calls the Lambda function to fetch the metrics of the completed training job from the earlier model training step.
  5. RMSE threshold: This step verifies the trained model metric (RMSE) against a defined threshold to decide whether to proceed towards endpoint deployment or reject this model.
  6. Model accuracy too low: At this step the model accuracy is checked against the previous best model. If the model fails at metric validation, the notification is sent by a Lambda function to the target topic registered in Amazon Simple Notification Service (Amazon SNS). If this check fails, the flow exits because the new trained model didn’t meet the defined threshold.
  7. Baseline job data drift: If the trained model passes the validation steps, baseline stats are generated for this trained model version to enable monitoring and the parallel branch steps are run to generate the baseline for the model quality check.
  8. Create model endpoint configuration: This step creates an endpoint configuration for the model evaluated in the previous step, with data capture enabled.
  9. Check endpoint: This step checks if the endpoint exists or needs to be created. Based on the output, the next step is to create or update the endpoint.
  10. Export configuration: This step exports the parameter’s model name, endpoint name, and endpoint configuration to the AWS Systems Manager Parameter Store.
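The metric check in steps 4 and 5 can be sketched as a small Lambda handler. The metric name, the threshold value, and the event shape here are assumptions; in the real pipeline the metrics would be fetched with `describe_training_job` rather than injected through the event:

```python
# Sketch of the "query training results" Lambda (step 4) feeding the RMSE
# threshold choice (step 5). The 3.0 threshold is an illustrative value.
RMSE_THRESHOLD = 3.0

def extract_rmse(metrics):
    """Pull the validation RMSE out of a DescribeTrainingJob FinalMetricDataList."""
    for m in metrics:
        if m["MetricName"] == "validation:rmse":
            return m["Value"]
    raise KeyError("validation:rmse not reported by the training job")

def lambda_handler(event, context):
    # In the real pipeline: sm = boto3.client("sagemaker");
    # metrics = sm.describe_training_job(
    #     TrainingJobName=event["training_job_name"])["FinalMetricDataList"]
    metrics = event["FinalMetricDataList"]  # injected here for the sketch
    rmse = extract_rmse(metrics)
    # The Step Functions Choice state branches on the "deploy" flag.
    return {"rmse": rmse, "deploy": rmse <= RMSE_THRESHOLD}
```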

Alerts and notifications are configured to be sent to the configured SNS topic email on failure or success of the state machine run. The same pipeline configuration is reused for the XGBoost model.

Automated batch scoring pipeline

  • Purpose: Launch batch scoring as soon as scoring input batch data is available in the respective Amazon S3 location. The batch scoring should use the latest registered model to do the scoring.
  • Solution: Wipro implemented a reusable scoring pipeline using the Step Functions SDK, SageMaker batch transformation jobs, Lambda, and EventBridge. The pipeline is auto triggered based on the new scoring batch data availability to the respective S3 location.
Scoring pipeline step machine for linear learner and XGBoost model

Figure 4 – Scoring pipeline step machine for linear learner and XGBoost model

Flow description for the automated batch scoring pipeline:

  1. Pre-processing: This step takes a data file from the respective S3 location as input and performs the required pre-processing before calling the SageMaker batch transformation job.
  2. Scoring: This step runs the batch transformation job to generate inferences, calling the latest version of the registered model and storing the scoring output in an S3 bucket. Wipro used the input filter and join functionality of the SageMaker batch transformation API, which helped enrich the scoring data for better decision-making.
Input filter and join flow for batch transformation

Figure 5 – Input filter and join flow for batch transformation

  3. Trigger: In this step, the state machine pipeline is launched by the arrival of a new data file in the S3 bucket.

The notification is configured to be sent to the configured SNS topic email on failure or success of the state machine run.
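The input filter and join behavior shown in Figure 5 maps to the `DataProcessing` section of the SageMaker CreateTransformJob request. A hedged request skeleton follows; the job, model, and bucket names and the filter expressions are illustrative, not the customer's actual values:

```python
# Request skeleton for a batch transform with input filter and join.
transform_job = {
    "TransformJobName": "used-car-scoring-run",
    "ModelName": "used-car-price-xgboost",  # latest registered model (placeholder)
    "TransformInput": {
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://scoring-input/batch/",
        }},
        "ContentType": "text/csv",
    },
    "TransformOutput": {"S3OutputPath": "s3://scoring-output/batch/"},
    "TransformResources": {"InstanceType": "ml.m5.xlarge", "InstanceCount": 1},
    # Send only the feature columns to the model, then join the prediction
    # back onto the full input record so the output is enriched for reporting.
    "DataProcessing": {
        "InputFilter": "$[1:]",   # drop a leading record-id column before inference
        "JoinSource": "Input",    # append predictions to the input rows
        "OutputFilter": "$",      # keep the whole joined record
    },
}
# To run: boto3.client("sagemaker").create_transform_job(**transform_job)
```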

Real-time inference pipeline

  • Purpose: To enable real-time inferences from both models’ endpoints (Linear Learner and XGBoost) and return the maximum predicted value (or the result of any other custom logic written as a Lambda function) to the application.
  • Solution: The Wipro team has implemented reusable architecture using Amazon API Gateway, Lambda, and SageMaker endpoint as shown in Figure 6:
Real-time inference pipeline

Figure 6 – Real-time inference pipeline

Flow description for the real-time inference pipeline shown in Figure 6:

  1. The payload is sent from the application to Amazon API Gateway, which routes it to the respective Lambda function.
  2. A Lambda function (with an integrated SageMaker custom layer) does the required pre-processing, JSON or CSV payload formatting, and invokes the respective endpoints.
  3. The response is returned to Lambda and sent back to the application through API Gateway.
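The routing logic in step 2 can be sketched as follows. The endpoint names are assumptions, and the `invoke` function is injected so the selection logic can be exercised without AWS credentials; in Lambda it would wrap `boto3.client("sagemaker-runtime").invoke_endpoint` and read the response body:

```python
# Sketch of the routing Lambda from Figure 6: invoke both endpoints and
# return the larger prediction. Endpoint names are placeholders.
ENDPOINTS = ("price-linear-learner", "price-xgboost")

def predict(payload_csv, invoke):
    """Call each endpoint with the CSV payload and keep the maximum prediction."""
    predictions = {}
    for name in ENDPOINTS:
        body = invoke(EndpointName=name, ContentType="text/csv", Body=payload_csv)
        predictions[name] = float(body)
    best = max(predictions, key=predictions.get)
    return {"model": best, "prediction": predictions[best]}
```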

The customer used this pipeline for small and medium scale models, which included using various types of open-source algorithms. One of the key benefits of SageMaker is that various types of algorithms can be brought into SageMaker and deployed using a bring your own container (BYOC) technique. BYOC involves containerizing the algorithm and registering the image in Amazon Elastic Container Registry (Amazon ECR), and then using the same image to create a container to do training and inference.

Scaling is one of the biggest issues in the machine learning cycle. SageMaker comes with the necessary tools for scaling a model during inference. In the preceding architecture, users need to enable SageMaker auto-scaling, which then handles the workload. To enable auto-scaling, users must provide an auto-scaling policy that specifies the target throughput per instance and the maximum and minimum instance counts. With the policy in place, SageMaker automatically handles the workload for real-time endpoints and switches between instances when needed.
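A target-tracking policy of the kind described might look like the following; the endpoint and variant names, capacity bounds, and the 70-invocations target are assumptions for illustration:

```python
# Target-tracking auto scaling for a SageMaker endpoint variant.
resource_id = "endpoint/price-xgboost/variant/AllTraffic"  # placeholder names

scalable_target = {
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "MinCapacity": 1,
    "MaxCapacity": 4,
}

scaling_policy = {
    "PolicyName": "invocations-per-instance",
    "ServiceNamespace": "sagemaker",
    "ResourceId": resource_id,
    "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,  # desired invocations per instance per minute
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
}
# To apply: client = boto3.client("application-autoscaling")
# client.register_scalable_target(**scalable_target)
# client.put_scaling_policy(**scaling_policy)
```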

Custom model monitor pipeline

  • Purpose: The customer team wanted automated model monitoring to capture both data drift and model drift. The Wipro team used SageMaker model monitoring to detect both data drift and model drift with a reusable pipeline for real-time inferences and batch transformation. Note that during the development of this solution, SageMaker model monitoring didn’t support detecting data or model drift for batch transformation. We implemented customizations to use the model monitor container for batch transformation payloads.
  • Solution: The Wipro team implemented a reusable model-monitoring pipeline for real-time and batch inference payloads using AWS Glue to capture the incremental payload and invoke the model monitoring job according to the defined schedule.
Model monitor step machine

Figure 7 – Model monitor step machine

Flow description for the custom model monitor pipeline:
The pipeline runs according to the defined schedule configured through EventBridge.

  1. CSV consolidation – It uses the AWS Glue bookmark feature to detect the presence of an incremental payload in the defined S3 buckets for real-time data capture responses and batch data responses. It then aggregates that data for further processing.
  2. Evaluate payload – If there is incremental data or payload present for the current run, it invokes the monitoring branch. Otherwise, it bypasses without processing and exits the job.
  3. Post-processing – The monitoring branch is designed to have two parallel sub-branches—one for data drift and another for model drift.
  4. Monitoring (data drift) – The data drift branch runs whenever there is a payload present. It uses the latest trained model baseline constraints and statistics files generated through the training pipeline for the data features and runs the model monitoring job.
  5. Monitoring (model drift) – The model drift branch runs only when ground truth data is supplied along with the inference payload. It uses the trained model baseline constraints and statistics files generated through the training pipeline for the model quality features and runs the model monitoring job.
  6. Evaluate drift – The outcome of both data and model drift checks is a constraint violation file that’s evaluated by the evaluate drift Lambda function, which sends a notification to the respective Amazon SNS topics with details of the drift. Drift data is enriched further with additional attributes for reporting purposes. The drift notification emails will look similar to the examples in Figure 8.
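The evaluate drift step can be sketched as a Lambda that parses the model monitor's constraint_violations.json output. The event shape and topic ARN are assumptions; in the pipeline the violations file would be read from S3 rather than passed in the event:

```python
import json

# Sketch of the "evaluate drift" Lambda (step 6): summarize the monitor's
# constraint_violations.json and notify SNS when drift is detected.
def summarize_violations(violations_doc):
    """Reduce a constraint_violations.json document to a notification summary."""
    violations = violations_doc.get("violations", [])
    return {
        "drift_detected": bool(violations),
        "count": len(violations),
        "features": sorted({v["feature_name"] for v in violations}),
    }

def lambda_handler(event, context):
    # In the pipeline the file is fetched from S3; here it arrives in the event.
    summary = summarize_violations(event["constraint_violations"])
    if summary["drift_detected"]:
        message = json.dumps(summary)
        # Placeholder topic ARN; uncomment in the deployed function:
        # boto3.client("sns").publish(TopicArn=DRIFT_TOPIC_ARN, Message=message)
    return summary
```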
SageMaker model drift monitor email

Figure 8 – Data and model drift notification message

SageMaker model drift monitor email

Figure 9 – Data and model drift notification message

Insights with Amazon QuickSight visualization:

  • Purpose: The customer wanted insights about the data and model drift, to relate the drift data to the respective model monitoring jobs, and to understand the trends in the inference data.
  • Solution: The Wipro team enriched the drift data by connecting input data with the drift result, which enables triage from drift to monitoring and respective scoring data. Visualizations and dashboards were created using Amazon QuickSight with Amazon Athena as the data source (using the Amazon S3 CSV scoring and drift data).
Model monitoring visualization architecture

Figure 10 – Model monitoring visualization architecture

Design considerations:

  1. Use the QuickSight SPICE dataset for better in-memory performance.
  2. Use the QuickSight refresh dataset APIs to automate the SPICE data refresh.
  3. Implement group-based security for dashboard and analysis access control.
  4. Automate cross-account deployment using the export and import API calls for datasets, data sources, and analyses provided by QuickSight.
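Design point 2 maps to the QuickSight CreateIngestion API, which triggers a SPICE refresh for a dataset. A minimal sketch with placeholder account and dataset IDs:

```python
import uuid

# SPICE refresh request for QuickSight's CreateIngestion API.
# Account and dataset IDs are placeholders.
ingestion_request = {
    "AwsAccountId": "123456789012",
    "DataSetId": "model-drift-dataset",
    # Each refresh needs a unique ingestion ID.
    "IngestionId": f"refresh-{uuid.uuid4()}",
}
# To trigger the refresh:
# boto3.client("quicksight").create_ingestion(**ingestion_request)
```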

Model monitoring dashboard:

To enable an effective outcome and meaningful insights of the model monitoring jobs, custom dashboards were created for the model monitoring data. The input data points are combined in parallel with inference request data, jobs data, and monitoring output to create a visualization of trends revealed by the model monitoring.

This has really helped the customer team to visualize the aspects of various data features along with the predicted outcome of each batch of inference requests.

Model monitor dashboard with selection prompts

Figure 11 – Model monitor dashboard with selection prompts

Model monitor dashboard with selection prompts

Figure 12 – Model monitor drift analysis

Conclusion

The implementation explained in this post enabled Wipro to effectively migrate their on-premises models to AWS and build a scalable, automated model development framework.

The use of reusable framework components empowers the data science team to effectively package their work as deployable AWS Step Functions JSON components. Simultaneously, the DevOps teams used and enhanced the automated CI/CD pipeline to facilitate the seamless promotion and retraining of models in higher environments.

The model monitoring component enables continuous monitoring of model performance, and users receive alerts and notifications whenever data or model drift is detected.

The customer’s team is using this MLOps framework to migrate or develop more models and increase their SageMaker adoption.

By combining the comprehensive suite of SageMaker services with this architecture, customers can seamlessly onboard multiple models, significantly reducing deployment time and the complexities associated with code sharing. Moreover, the architecture simplifies code version maintenance, ensuring a streamlined development process.

This architecture handles the entire machine learning cycle, encompassing automated model training, real-time and batch inference, proactive model monitoring, and drift analysis. This end-to-end solution empowers customers to achieve optimal model performance while maintaining rigorous monitoring and analysis capabilities to ensure ongoing accuracy and reliability.

To create this architecture, begin by creating essential resources like Amazon Virtual Private Cloud (Amazon VPC), SageMaker notebooks, and Lambda functions. Make sure to set up appropriate AWS Identity and Access Management (IAM) policies for these resources.

Next, focus on building the components of the architecture—such as training and preprocessing scripts—within SageMaker Studio or Jupyter Notebook. This step involves developing the necessary code and configurations to enable the desired functionalities.

After the architecture’s components are defined, you can proceed with building the Lambda functions for generating inferences or performing post-processing steps on the data.

At the end, use Step Functions to connect the components and establish a smooth workflow that coordinates the running of each step.
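As a sketch of that last step, the Step Functions workflow chaining the components can be expressed in Amazon States Language. The state names and Lambda ARNs below are illustrative placeholders, not the actual pipeline definition:

```python
import json


def build_pipeline_definition(preprocess_arn: str, train_arn: str,
                              inference_arn: str) -> str:
    """Return an Amazon States Language (JSON) definition that chains the
    preprocessing, training, and inference Lambda functions in sequence."""
    definition = {
        "Comment": "Illustrative ML pipeline",
        "StartAt": "Preprocess",
        "States": {
            "Preprocess": {"Type": "Task", "Resource": preprocess_arn,
                           "Next": "Train"},
            "Train": {"Type": "Task", "Resource": train_arn,
                      "Next": "Inference"},
            "Inference": {"Type": "Task", "Resource": inference_arn,
                          "End": True},
        },
    }
    return json.dumps(definition, indent=2)
```

The resulting JSON can then be passed to the Step Functions create_state_machine API, or stored as one of the reusable JSON components described above.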


About the Authors

Stephen Randolph is a Senior Partner Solutions Architect at Amazon Web Services (AWS). He enables and supports Global Systems Integrator (GSI) partners on the latest AWS technology as they develop industry solutions to solve business challenges. Stephen is especially passionate about security and generative AI, and helping customers and partners architect secure, efficient, and innovative solutions on AWS.

Bhajandeep Singh has served as the AWS AI/ML Center of Excellence Head at Wipro Technologies, leading customer engagements to deliver data analytics and AI solutions. He holds the AWS AI/ML Specialty certification and authors technical blogs on AI/ML services and solutions. With experience leading AWS AI/ML solutions across industries, Bhajandeep has enabled clients to maximize the value of AWS AI/ML services through his expertise and leadership.

Ajay Vishwakarma is an ML engineer for the AWS wing of Wipro’s AI solution practice. He has extensive experience building BYOM solutions for custom algorithms in SageMaker, deploying end-to-end ETL pipelines, building chatbots using Amazon Lex, sharing QuickSight resources across accounts, and building CloudFormation templates for deployments. He enjoys exploring AWS, treating every customer problem as a challenge to explore further and solve.

Read More

Generating value from enterprise data: Best practices for Text2SQL and generative AI

Generating value from enterprise data: Best practices for Text2SQL and generative AI

Generative AI has opened up a lot of potential in the field of AI. We are seeing numerous uses, including text generation, code generation, summarization, translation, chatbots, and more. One such area that is evolving is using natural language processing (NLP) to unlock new opportunities for accessing data through intuitive SQL queries. Instead of dealing with complex technical code, business users and data analysts can ask questions related to data and insights in plain language. The primary goal is to automatically generate SQL queries from natural language text. To do this, the text input is transformed into a structured representation, and from this representation, a SQL query that can be used to access a database is created.

In this post, we provide an introduction to text to SQL (Text2SQL) and explore use cases, challenges, design patterns, and best practices. Specifically, we discuss the following:

  • Why do we need Text2SQL
  • Key components for Text to SQL
  • Prompt engineering considerations for natural language or Text to SQL
  • Optimizations and best practices
  • Architecture patterns

Why do we need Text2SQL?

Today, a large amount of data is available in traditional data analytics, data warehousing, and databases, which may not be easy for the majority of organization members to query or understand. The primary goal of Text2SQL is to make querying databases more accessible to non-technical users, who can provide their queries in natural language.

NLP SQL enables business users to analyze data and get answers by typing or speaking questions in natural language, such as the following:

  • “Show total sales for each product last month”
  • “Which products generated more revenue?”
  • “What percentage of customers are from each region?”

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) via a single API, making it straightforward to build and scale generative AI applications. It can be used to generate SQL queries from questions like those above, query an organization’s structured data, and generate natural language responses from the query results.

Key components for text to SQL

Text-to-SQL systems involve several stages to convert natural language queries into runnable SQL:

  • Natural language processing:
    • Analyze the user’s input query
    • Extract key elements and intent
    • Convert to a structured format
  • SQL generation:
    • Map extracted details into SQL syntax
    • Generate a valid SQL query
  • Database query:
    • Run the AI-generated SQL query on the database
    • Retrieve results
    • Return results to the user
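The three stages above can be sketched end to end. In this minimal example the model call is stubbed out with a pluggable `generate` callable; in a real system it would be an LLM invocation (for example, through Amazon Bedrock), and the schema and data here are toy stand-ins:

```python
import sqlite3


def text_to_sql(question: str, schema: str, generate, conn):
    """Run the three stages: build a prompt from the question and schema,
    ask the model (a pluggable callable) for SQL, then execute it."""
    prompt = f"/* Schema: {schema} */\n/* Question: {question} */"
    sql = generate(prompt)  # in practice an LLM call, e.g. via Amazon Bedrock
    return conn.execute(sql).fetchall()


# Demo with an in-memory database and a canned "model" response.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE sales (product TEXT, amount REAL);"
    "INSERT INTO sales VALUES ('pizza', 10), ('burger', 7);"
)
rows = text_to_sql(
    "Show total sales for each product",
    "sales(product, amount)",
    lambda _prompt: "SELECT product, SUM(amount) FROM sales GROUP BY product",
    conn,
)
```

In production, the generated SQL should be validated (and ideally restricted to read-only statements) before the database query stage runs it.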

One remarkable capability of large language models (LLMs) is the generation of code, including Structured Query Language (SQL) for databases. These LLMs can be used to understand a natural language question and generate a corresponding SQL query as output. The LLMs benefit from in-context learning and fine-tuning as more data is provided.

The following diagram illustrates a basic Text2SQL flow.

Text 2 SQL high level process flow

Prompt engineering considerations for natural language to SQL

The prompt is crucial when using LLMs to translate natural language into SQL queries, and there are several important considerations for prompt engineering.

Effective prompt engineering is key to developing natural language to SQL systems. Clear, straightforward prompts provide better instructions for the language model. Providing context that the user is requesting a SQL query along with relevant database schema details enables the model to translate the intent accurately. Including a few annotated examples of natural language prompts and corresponding SQL queries helps guide the model to produce syntax-compliant output. Additionally, incorporating Retrieval Augmented Generation (RAG), where the model retrieves similar examples during processing, further improves the mapping accuracy. Well-designed prompts that give the model sufficient instruction, context, examples, and retrieval augmentation are crucial for reliably translating natural language into SQL queries.

The following is an example of a baseline prompt with code representation of the database from the whitepaper Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies.

/* Given the following database schema: */
CREATE TABLE IF NOT EXISTS "gymnast" (
  "Gymnast_ID" int,
  "Floor_Exercise_Points" real,
  "Pommel_Horse_Points" real,
  "Rings_Points" real,
  "Vault_Points" real,
  "Parallel_Bars_Points" real,
  "Horizontal_Bar_Points" real,
  "Total_Points" real,
  PRIMARY KEY ("Gymnast_ID"),
  FOREIGN KEY ("Gymnast_ID") REFERENCES "people" ("People_ID")
);
CREATE TABLE IF NOT EXISTS "people" (
  "People_ID" int,
  "Name" text,
  "Age" real,
  "Height" real,
  "Hometown" text,
  PRIMARY KEY ("People_ID")
);

/* Answer the following: Return the total points of the gymnast with the lowest age. */

SELECT t1.Total_Points
FROM gymnast AS t1
JOIN people AS t2 ON t1.Gymnast_ID = t2.People_ID
ORDER BY t2.Age ASC
LIMIT 1;

As illustrated in this example, prompt-based few-shot learning provides the model with a handful of annotated examples in the prompt itself. This demonstrates the target mapping between natural language and SQL for the model. Typically, the prompt would contain around 2–3 pairs showing a natural language query and the equivalent SQL statement. These few examples guide the model to generate syntax-compliant SQL queries from natural language without requiring extensive training data.
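A helper that assembles such a few-shot prompt might look like the following. The comment-based layout mirrors the whitepaper example above; the exact format and function name are illustrative assumptions:

```python
def build_few_shot_prompt(schema: str, examples, question: str) -> str:
    """Assemble a few-shot Text2SQL prompt: schema first, then annotated
    (question, SQL) example pairs, then the new question to answer."""
    parts = [f"/* Given the following database schema: */\n{schema}"]
    for q, sql in examples:
        parts.append(f"/* Answer the following: {q} */\n{sql}")
    parts.append(f"/* Answer the following: {question} */")
    return "\n\n".join(parts)
```

The LLM then completes the prompt with the SQL for the final question, following the pattern set by the examples.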

Fine-tuning vs. prompt engineering

When building natural language to SQL systems, we often get into the discussion of whether fine-tuning the model is the right technique or whether effective prompt engineering is the way to go. Both approaches can be considered and selected based on the requirements:

    • Fine-tuning – The baseline model is pre-trained on a large general text corpus and can then use instruction-based fine-tuning, which uses labeled examples to improve the performance of a pre-trained foundation model on text-to-SQL. This adapts the model to the target task. Fine-tuning trains the model directly on the end task but requires many text-to-SQL examples. You can use supervised fine-tuning on your LLM to improve the effectiveness of text-to-SQL. For this, you can use several datasets like Spider, WikiSQL, CHASE, BIRD-SQL, or CoSQL.
    • Prompt engineering – The model is prompted to complete prompts designed to elicit the target SQL syntax. When generating SQL from natural language using LLMs, providing clear instructions in the prompt is important for controlling the model’s output. Annotate different components in the prompt, such as pointing to columns and the schema, and then instruct the model on which type of SQL to create. These annotations act like instructions that tell the model how to format the SQL output. The following prompt shows an example where you point to table columns and instruct the model to create a MySQL query:
Table offices, columns = [OfficeId, OfficeName]
Table employees, columns = [OfficeId, EmployeeId,EmployeeName]
Create a MySQL query for all employees in the Machine Learning Department

An effective approach for text-to-SQL models is to first start with a baseline LLM without any task-specific fine-tuning. Well-crafted prompts can then be used to adapt and drive the base model to handle the text-to-SQL mapping. This prompt engineering allows you to develop the capability without needing to do fine-tuning. If prompt engineering on the base model doesn’t achieve sufficient accuracy, fine-tuning on a small set of text-SQL examples can then be explored along with further prompt engineering.

The combination of fine-tuning and prompt engineering may be required if prompt engineering on the raw pre-trained model alone doesn’t meet requirements. However, it’s best to initially attempt prompt engineering without fine-tuning, because this allows rapid iteration without data collection. If this fails to provide adequate performance, fine-tuning alongside prompt engineering is a viable next step. This overall approach maximizes efficiency while still allowing customization if purely prompt-based methods are insufficient.

Optimization and best practices

Optimization and best practices are essential for enhancing effectiveness, using resources efficiently, and achieving the right results. These techniques help improve performance, control costs, and achieve a better-quality outcome.

When developing text-to-SQL systems using LLMs, optimization techniques can improve performance and efficiency. The following are some key areas to consider:

  • Caching – To improve latency, cost control, and standardization, you can cache the parsed SQL and recognized query prompts from the text-to-SQL LLM. This avoids reprocessing repeated queries.
  • Monitoring – Logs and metrics around query parsing, prompt recognition, SQL generation, and SQL results should be collected to monitor the text-to-SQL LLM system. This provides the visibility needed for optimization, such as updating the prompt or revisiting fine-tuning with an updated dataset.
  • Materialized views vs. tables – Materialized views can simplify SQL generation and improve performance for common text-to-SQL queries. Querying tables directly may result in complex SQL and in performance issues, including the constant need for performance tuning such as creating indexes. Materialized views also help you avoid performance issues when the same table is used by other parts of the application at the same time.
  • Refreshing data – Materialized views need to be refreshed on a schedule to keep data current for text-to-SQL queries. You can use batch or incremental refresh approaches to balance overhead.
  • Central data catalog – Creating a centralized data catalog provides a single pane of glass view to an organization’s data sources and will help LLMs select appropriate tables and schemas in order to provide more accurate responses. Vector embeddings created from a central data catalog can be supplied to an LLM along with information requested to generate relevant and precise SQL responses.

By applying optimization best practices like caching, monitoring, materialized views, scheduled refreshing, and a central catalog, you can significantly improve the performance and efficiency of text-to-SQL systems using LLMs.
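As an illustration of the caching point, a thin wrapper around the SQL generator can memoize results keyed on the normalized question, so repeated queries skip the LLM call. The normalization rule and cache policy here are simplifying assumptions:

```python
def make_cached_generator(generate):
    """Wrap a Text2SQL generator so repeated questions reuse cached SQL
    instead of calling the LLM again (saving latency and cost)."""
    cache = {}

    def cached(question: str) -> str:
        key = " ".join(question.lower().split())  # normalize case/whitespace
        if key not in cache:
            cache[key] = generate(question)
        return cache[key]

    cached.cache = cache  # exposed so callers can inspect or evict entries
    return cached
```

A production cache would also need an eviction policy and invalidation when the underlying schema changes.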

Architecture patterns

Let’s look at some architecture patterns that can be implemented for a text to SQL workflow.

Prompt engineering

The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering.

illustrates the architecture for generating queries with an LLM using prompt engineering

In this pattern, the user builds a prompt for few-shot learning that provides the model with annotated examples in the prompt itself, including the table and schema details and some sample queries with their results. The LLM uses the provided prompt to return the AI-generated SQL, which is validated and then run against the database to get the results. This is the most straightforward pattern for getting started with prompt engineering. For this, you can use Amazon Bedrock or foundation models in Amazon SageMaker JumpStart.


Prompt engineering and fine-tuning

The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering and fine-tuning.

illustrates the architecture for generating queries with an LLM using prompt engineering and fine-tuning

This flow is similar to the previous pattern, which relies mostly on prompt engineering, but adds a flow for fine-tuning on a domain-specific dataset. The fine-tuned LLM is used to generate the SQL queries with minimal in-context examples in the prompt. For this, you can use SageMaker JumpStart to fine-tune an LLM on a domain-specific dataset in the same way you would train and deploy any model on Amazon SageMaker.

Prompt engineering and RAG

The following diagram illustrates the architecture for generating queries with an LLM using prompt engineering and RAG.

illustrates the architecture for generating queries with an LLM using prompt engineering and RAG

In this pattern, we use Retrieval Augmented Generation with vector embedding stores, like Amazon Titan Embeddings or Cohere Embed, on Amazon Bedrock, built from a central data catalog, like AWS Glue Data Catalog, of databases within an organization. The vector embeddings are stored in vector databases like Vector Engine for Amazon OpenSearch Serverless, Amazon Relational Database Service (Amazon RDS) for PostgreSQL with the pgvector extension, or Amazon Kendra. LLMs use the vector embeddings to select the right database, tables, and columns faster when creating SQL queries. Using RAG is helpful when the data and relevant information that LLMs need to retrieve are stored in multiple separate database systems, and the LLM needs to be able to search or query data across all of them. Providing vector embeddings of a centralized or unified data catalog to the LLMs results in more accurate and comprehensive information being returned.
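To make the retrieval step concrete, here is a toy sketch of selecting candidate tables by cosine similarity between a question embedding and catalog embeddings. The vectors are hand-made stand-ins; in practice they would come from an embeddings model such as Amazon Titan Embeddings, and the catalog from AWS Glue Data Catalog:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def select_tables(question_vec, catalog, top_k=1):
    """Rank tables in a {name: embedding} catalog by similarity to the
    question embedding and return the top-k table names."""
    ranked = sorted(catalog, key=lambda t: cosine(question_vec, catalog[t]),
                    reverse=True)
    return ranked[:top_k]
```

The selected tables’ schemas are then included in the prompt, keeping it focused on the data the question is actually about.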

Conclusion

In this post, we discussed how we can generate value from enterprise data using natural language to SQL generation. We looked into key components, optimization, and best practices. We also learned about architecture patterns ranging from basic prompt engineering to fine-tuning and RAG. To learn more, refer to Amazon Bedrock to easily build and scale generative AI applications with foundation models.


About the Authors

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. He entered the big data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences, including Strata and GlueCon.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Arghya Banerjee is a Sr. Solutions Architect at AWS in the San Francisco Bay Area focused on helping customers adopt and use AWS Cloud. Arghya is focused on Big Data, Data Lakes, Streaming, Batch Analytics and AI/ML services and technologies.

Read More

Splitwise improves GPU usage by splitting LLM inference phases

Splitwise improves GPU usage by splitting LLM inference phases

The recent surge in large language model (LLM) use is causing significant challenges for cloud providers, requiring them to deploy more GPUs at an unprecedented rate. However, the capacity to provision the power needed to run these GPUs is limited, and with demand for computation surpassing supply, it is not uncommon for user queries to be denied. Therefore, any approach to making the existing infrastructure more efficient—enabling it to serve more queries faster under the same power budget—can have very tangible benefits to both cloud providers and users.

One aspect of LLM inference that currently limits efficient use of resources is that it has two distinct phases with different characteristics: the prompt phase and the token-generation phase. During the prompt phase, LLMs process all user input, or prompts, in parallel, efficiently utilizing GPU compute. However, during the token-generation phase, LLMs generate each output token sequentially and are limited by GPU memory bandwidth. Even when employing state-of-the-art batching mechanisms, the discrepancy between these two phases results in low overall hardware utilization, leading to much higher costs when offering LLMs to users. Figure 1 illustrates the differences between these two phases.
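The asymmetry between the two phases can be sketched in toy form: the prompt phase is one batched call over all input tokens, while the token phase is a serial loop in which each step consumes the state produced by the previous one. The function names and model stubs below are illustrative only, not the actual Splitwise implementation:

```python
def prompt_phase(input_tokens, forward):
    """All prompt tokens go through the model in one batched call
    (compute-bound); returns the state (a KV-cache stand-in) and the
    first output token. `forward` is a toy model function."""
    return forward(input_tokens)


def token_phase(state, step, max_tokens):
    """Output tokens are produced one at a time; each step depends on the
    state from the previous step, which makes this phase memory-bound."""
    tokens = []
    for _ in range(max_tokens):
        state, token = step(state)
        tokens.append(token)
    return tokens
```

The serial dependency in `token_phase` is why batching alone cannot close the utilization gap between the two phases.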

An example of the generative LLM inference process and the two phases associated with it. The initial prompt is “Which is better, pizza or burger?” and it generates the word “Pizza”. The token generation phase generates the words/tokens: “is”, “better”, and “.”. The prompt phase has the following properties: (1) all input tokens are processed in parallel to generate the first output token, (2) compute intensive, and (3) is a smaller part of the end-to-end latency. The token phase is: (1) serialized, (2) memory intensive, and (3) tends to be the majority of the end-to-end latency.
Figure 1. An example of the generative LLM inference process and the two phases associated with it. The prompt phase is computationally intensive, while the token phase is memory intensive.

Splitting the phases with Splitwise

At Azure Research – Systems, we tackled this by creating Splitwise, a technique designed to optimally utilize available hardware by separating the prompt computation and token-generation phases onto separate machines. This approach is underpinned by the insight that prompt processing and token-generation are distinct in their computational, memory, and power requirements. By separating these two phases, we can enhance hardware utilization during both phases. Our paper, “Splitwise: Efficient Generative LLM Inference Using Phase Splitting,” details our methods for developing and testing this technique, including an exploration of how different types of GPUs perform during each phase.   

To create a sustainable approach for GPU provisioning, we used Splitwise to design GPU clusters with three primary objectives: maximizing throughput, minimizing costs, and reducing power. In addition to separating the two LLM inference phases into two distinct machine pools, we include a third machine pool for mixed batching across the prompt and token phases, sized dynamically based on real-time computational demands. Lastly, we transferred the state context (i.e., KV-cache in the LLM transformer attention layers) from the prompt to the token machines over InfiniBand without any perceivable latency impact to the user. This high-level system architecture is illustrated in Figure 2.

Figure 2. A high-level diagram of the Splitwise architecture. Machines maintained in different pools are dedicated to the two distinct LLM inference phases. The mixed pool grows and reduces according to runtime demand. KV-cache encompassing the state of the query after the prompt phase is transferred from the prompt machines to the token machines over InfiniBand with very low latency.

Tests show Splitwise maximizes throughput while lowering costs

To evaluate its performance, we used Splitwise to design clusters with different types of GPUs, including NVIDIA DGX-A100 and DGX-H100, while optimizing cost, power, and throughput under specific latency service level agreements (SLAs) for each query. Table 1 shows the machine types we used for each cluster design. Our application of Splitwise encompassed two use cases: code and conversation using the Llama-2-70B (opens in new tab) and BLOOM-176B (opens in new tab) LLMs.

Table 1. Details for the prompt and token machines we used for each cluster design, evaluated with Splitwise. All values are normalized to a baseline of DGX-A100. DGX-H100 capped is a system with all GPUs power-capped to half the maximum power.

Our findings demonstrate that Splitwise successfully achieves our three goals of maximizing throughput, minimizing costs, and reducing power. Through our evaluation, we observed that the Splitwise cluster design can maximize throughput at the same cost compared with an A100 baseline cluster. Moreover, Splitwise delivers much higher throughput while operating within the same provisioned power constraints as the baseline cluster. Figure 3 shows that compared with Baseline-H100, we can achieve 1.4x higher throughput at 20 percent lower cost. Alternatively, we can achieve 2.35x more throughput with the same cost and power budgets.

Results from baseline and Splitwise clusters optimized for throughput, all with the same power constraints. Splitwise-HH requires the least number of machines. Splitwise-HHcap provides the best throughput. Splitwise-AA is the cheapest option.
Figure 3. Results from baseline and Splitwise clusters optimized for throughput, all with the same power constraints.

Looking forward

Splitwise marks a leap toward efficient, high-performance LLM deployments. By separating the prompt and token phases, we can unlock new potential in GPU use. Looking forward, we at Microsoft Azure envision tailored machine pools driving maximum throughput, reduced costs, and power efficiency, and we will continue to focus on making LLM inference efficient and sustainable.

Our approach is now part of vLLM (opens in new tab) and can also be implemented with other frameworks.

Acknowledgements

This work was done in collaboration with our intern, Pratyush Patel from the University of Washington. We also appreciate the help and guidance of Suriya Kalivardhan, Gopi Kumar, and Chetan Bansal.

The post Splitwise improves GPU usage by splitting LLM inference phases appeared first on Microsoft Research.

Read More

A New Year of Gaming: GeForce NOW Adds More Than 20 New Titles in January

A New Year of Gaming: GeForce NOW Adds More Than 20 New Titles in January

Celebrate the new year with more cloud gaming. Experience the power and performance of the cloud with more than 20 new games to be added to GeForce NOW in January.

Start with five games available this week, including The Finals from Embark Studios.

And tune in to the NVIDIA Special Address at CES on Monday, Jan. 8, at 8 a.m. PT for the latest on gaming, AI-related news and more.

It’s the Final Countdown

Fight for glory, fame and survival.

Fight for fame on the world’s biggest stage with Embark Studios’ The Finals. The free-to-play, multiplayer, first-person shooter is newly supported in the cloud this week, with RTX ON for the most cinematic lighting and visuals for GeForce NOW Ultimate and Priority members.

In The Finals, take part in a deadly TV game show that pits contestants against each other as they battle for a huge reward. Fight alongside teammates in virtual arenas that can be altered, exploited and even destroyed. Manipulate the environment as a weapon itself and use it to take down other players. Drive viewers wild with thrilling combat and flair, using tricks like crashing a wrecking ball into opponents.

Harness the power of the cloud and reach the finals anywhere with the ability to stream across devices. Ultimate members can fight for glory with the advantage of longer gaming sessions, the highest frame rates, ray tracing and ultra-low latency.

In With the New

Spotlight games on GeForce NOW
Flame on! ‘Enshrouded’ launches in the cloud Jan. 24.

In Enshrouded, become Flameborn, the last ember of hope of a dying race. Awaken, survive the terror of a corrupting fog and reclaim the lost beauty of the kingdom. Venture into a vast world, vanquish punishing bosses, build grand halls and forge a path in this co-op survival action role-playing game for up to 16 players, launching in the cloud Jan. 24.

Don’t miss the five newly supported games joining the GeForce NOW library this week:

  • Dishonored, for Hungary, Czech Republic and Poland (Steam)
  • The Finals (Steam)
  • Redmatch 2 (Steam)
  • Scorn (Xbox, available for PC Game Pass)
  • Sniper Elite 5 (Xbox, available for PC Game Pass)

And here’s what’s coming throughout the rest of January:

  • War Hospital (New release on Steam, Jan. 11)
  • Prince of Persia: The Lost Crown (New release on Ubisoft, Jan. 18)
  • Turnip Boy Robs a Bank (New release on Steam and Xbox, available for PC Game Pass, Jan. 18)
  • Stargate: Timekeepers (New release on Steam, Jan. 23)
  • Enshrouded (New release on Steam, Jan. 24)
  • Bang-On Balls: Chronicles (Steam)
  • Firefighting Simulator – The Squad (Steam)
  • Jected – Rivals (Steam)
  • The Legend of Nayuta: Boundless Trails (Steam)
  • RAILGRADE (Steam)
  • Redmatch 2 (Steam)
  • Shadow Tactics: Blades of the Shogun (Steam)
  • Shadow Tactics: Blades of the Shogun – Aiko’s Choice (Steam)
  • Solasta: Crown of the Magister (Steam)
  • Survivalist: Invisible Strain (Steam)
  • Witch It (Steam)
  • Wobbly Life (Steam)

Doubled in December

In addition to the 70 games announced last month, an extra 34 joined GeForce NOW in December:

  • Avatar: Frontiers of Pandora (New release on Ubisoft, Dec. 7)
  • Goat Simulator 3 (New release on Xbox, available on PC Game Pass, Dec. 7)
  • LEGO Fortnite (New release on Epic Games Store, Dec. 7)
  • Against the Storm (New release on Xbox, available on PC Game Pass, Dec. 8)
  • Rocket Racing (New release on Epic Games Store, Dec. 8)
  • Fortnite Festival (New release on Epic Games Store, Dec. 9)
  • Stellaris Nexus (New release on Steam, Dec. 12)
  • Tin Hearts (New release on Xbox, available on PC Game Pass, Dec. 12)
  • Amazing Cultivation Simulator (Xbox, available on the Microsoft Store)
  • Blasphemous 2 (Epic Games Store)
  • Century: Age of Ashes (Xbox, available on the Microsoft Store)
  • Chorus (Xbox, available on the Microsoft Store)
  • Dungeons 4 (Xbox, available on PC Game Pass)
  • Edge of Eternity (Xbox, available on the Microsoft Store)
  • Farming Simulator 17 (Xbox, available on the Microsoft Store)
  • Farming Simulator 22 (Xbox, available on PC Game Pass)
  • Flashback 2 (Steam)
  • Forza Horizon 4 (Steam)
  • Forza Horizon 5 (Steam, Xbox and available on PC Game Pass)
  • Hollow Knight (Xbox, available on PC Game Pass)
  • The Front (Steam)
  • Martha Is Dead (Xbox, available on the Microsoft Store)
  • Minecraft Dungeons (Steam, Xbox and available on PC Game Pass)
  • Monster Hunter: World (Steam)
  • Neon Abyss (Xbox, available on PC Game Pass)
  • Ori and the Will of the Wisps (Steam, Xbox and available on PC Game Pass)
  • Ori and the Blind Forest: Definitive Edition (Steam)
  • Raji: An Ancient Epic (Xbox, available on the Microsoft Store)
  • Remnant: From the Ashes (Xbox, available on PC Game Pass)
  • Remnant II (Xbox, available on PC Game Pass)
  • Richman 10 (Xbox, available on the Microsoft Store)
  • Spirittea (Xbox, available on PC Game Pass)
  • Surgeon Simulator 2 (Xbox, available on the Microsoft Store)
  • Sword and Fairy 7 (Xbox, available on PC Game Pass)

Terminator: Dark Fate – Defiance didn’t make it in December due to a change in its release date. Stay tuned to GFN Thursday for updates.

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More