Livestreaming Bliss: Wander Warwick’s World This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

The GeForce RTX 4060 Ti 8GB GPU — part of the GeForce RTX 4060 family announced last week — is now available, starting at $399, from top add-in card providers including ASUS, Colorful, Galax, GIGABYTE, INNO3D, MSI, Palit, PNY and ZOTAC, as well as from system integrators and builders worldwide.

GeForce RTX 4060 Ti 8GB is available now from a range of providers.

GeForce RTX 40 Series GPUs come backed by NVIDIA Studio technologies, including hardware acceleration for 3D, video and AI workflows; optimizations for RTX hardware in over 110 of the most popular creative apps; and exclusive Studio apps like Omniverse, Broadcast and Canvas.

Plus, enhancements for NVIDIA Studio-powered creator apps keep coming in. MAGIX VEGAS Pro software for video editing is receiving a major AI overhaul that will boost performance for all GeForce RTX users.

And prepare to be inspired by U.K.-based livestreamer Warwick, equal parts insightful and inspirational, as they share their AI-based workflow powered by a GeForce RTX GPU and the NVIDIA Broadcast app, this week In the NVIDIA Studio.

At the Microsoft Build conference today, NVIDIA unveiled new tools for developers that will make it easier and faster to train and deploy advanced AI on Windows 11 PCs with RTX GPUs.

In addition, the Studio team wants to see how creators #SetTheScene, whether for an uncharted virtual world or a small interior diorama of a room.

Enter the #SetTheScene Studio community challenge. Post original environment art on Facebook, Twitter or Instagram, and use the hashtag #SetTheScene for a chance to be featured on the @NVIDIAStudio or @NVIDIAOmniverse social channels.

VEGAS Pro Gets an AI Assist Powered by RTX

NVIDIA Studio collaborated with MAGIX VEGAS Pro to accelerate AI model performance on Windows PCs with extraordinary results.

VEGAS Pro 20 update 3, released this month, increases the speed of AI effects — such as style transfer, AI upscaling and colorization — with NVIDIA RTX GPUs.

Shorter times are better. Tested on GeForce RTX 4090 GPU, Intel Core i9-12900K with UHD 770.

Style transfer, for example, uses AI to instantly bring the styles of famous artists such as Picasso or van Gogh to a piece, with a staggering 219% performance increase over the previous version.

Warwick’s World

As this week’s featured In the NVIDIA Studio artist would say, “Welcome to the channnnnnnel!” Warwick is a U.K.-based content streamer who enjoys coffee, Daft Punk, tabletop role-playing games and cats. Alongside their immense talent and wildly entertaining persona lies an extraordinary superpower: empathy.

 

Warwick, like the rest of the world, had to find new ways to connect with people during the pandemic. They decided to pursue streaming as a way to build a community. Their vision was to create a channel that provides laughter and joy, escapism during stressful times and a safe haven for love and expression.

“It’s okay not to be okay,” stressed Warwick. “I’ve lived a lot of my life being told I couldn’t feel a certain way, show emotion or let things get me down. I was told that those were weaknesses that I needed to fight, when in reality they’re our truest strengths: being true to ourselves, feeling and being honest with our emotions.”

Warwick finds inspiration in making a positive contribution to other people’s lives. The thousands of subs speak for themselves.

 

But there are always ways to improve the quality of streams — plus, working and streaming full time can be challenging, as “it can be tough to get all your ideas to completion,” Warwick said.

For maximum efficiency, Warwick deploys their GeForce RTX 3080 GPU, taking advantage of the seventh-generation NVIDIA encoder (NVENC) to independently encode video, which frees up the graphics card to focus on livestreaming.

“NVIDIA is highly regarded in content-creation circles. Using OBS, Adobe Photoshop and Premiere Pro is made better by GeForce GPUs!” — Warwick

“I honestly can’t get enough of it!” said the streamer. “Being able to stream with OBS Studio software using NVENC lets me play the games I want at the quality I want, with other programs running to offer quality content to my community.”

Warwick has also experimented with the NVIDIA Broadcast app, which magically transforms dorms, home offices and more into home studios. They said the Eye Contact effect had “near-magical” results.

“Whenever I need to do ad reads, I find it incredible how well Eye Contact works, considering it’s in beta!” said Warwick. “I love the other Broadcast features that are offered for content creators and beyond.”

Warwick will be a panelist at an event hosted by Top Tier Queer (TTQ), an initiative that celebrates queer advocates in the creator space.

Sponsored by NVIDIA Studio and organized by In the NVIDIA Studio artist WATCHOLLIE, the TTQ event in June will serve as an avenue for queer visibility and advocacy, as well as an opportunity to award one participant with prizes, including a GeForce RTX 3090 GPU, to help amplify their voice even further. Apply for the TTQ initiative now.

Streaming is deeply personal for Warwick. “In my streams and everything I create, I aim to inspire others to know their feelings are valid,” they said. “And because of that, I feel the community that I have really appreciates me and the space that I give them.”

Livestreamer Warwick.

Subscribe to Warwick’s Twitch channel for more content.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

Index your Confluence content using the new Confluence connector V2 for Amazon Kendra

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.

Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to pull together data across several structured and unstructured repositories to index and search on.

One such unstructured data repository is Confluence. Confluence is a team workspace that gives knowledge worker teams a place to create, capture, and collaborate on any project or idea. Team spaces help teams structure, organize, and share work, so every team member has visibility into institutional knowledge and access to the information they need.

There are two Confluence offerings:

  • Cloud – This is offered as a software as a service (SaaS) product. It’s always on, continuously updated, and highly secure.
  • Data Center (self-managed) – Here, you host Confluence on your infrastructure, which could be on premises or the cloud. This allows you to keep data within your network and manage it yourself.

We’re excited to announce that you can now use the new Amazon Kendra connector V2 for Confluence to search information stored in your Confluence account, both in the cloud and in your data center. In this post, we show how to index information stored in Confluence and use the Amazon Kendra intelligent search function. In addition, the ML-powered intelligent search can accurately find information in unstructured documents with natural language narrative content, for which keyword search is not very effective.

What’s new for this version

This version supports OAuth 2.0 authentication in addition to basic authentication for the Cloud edition. For the Data Center (on-premises) edition, we have added OAuth2 in addition to basic authentication and personal access tokens for showing search results based on user access rights. You can benefit from the following features:

  • You can now crawl comments in addition to spaces, pages, blogs, and attachments
  • You now have fine-grained choices for your sync scope—you can specify pages, blogs, comments, and attachments
  • You can choose to import identities (or not)
  • This version offers regex support for choosing entity titles as well as file types
  • You have the choice of multiple Sync modes

Solution overview

With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index a Confluence repository using the Amazon Kendra connector for Confluence. The solution consists of the following steps:

  1. Choose an authentication mechanism.
  2. Configure an app on Confluence and get the connection details.
  3. Store the details in AWS Secrets Manager.
  4. Create a Confluence data source V2 via the Amazon Kendra console.
  5. Index the data in the Confluence repository.
  6. Run a sample query to test the solution.

Prerequisites

To try out the Amazon Kendra connector for Confluence, you need the following:

Choose an authentication mechanism

Choose your preferred authentication method:

  • Basic – This works on both the Cloud and Data Center editions. You need a user ID and a password to configure this method.
  • Personal access token – This option only works for the Data Center edition.
  • OAuth2 – This is more involved and works for both Cloud and Data Center editions.

Gather authentication details

In this section, we show the steps to gather your authentication details depending on your authentication method.

Basic authentication

For basic authentication with the Data Center edition, all you need is your login and password. Make sure your login has privileges to gather all content.

For Cloud edition, your user ID serves as your user login. For your password, you need to get a token. Complete the following steps:

  1. Log in to https://id.atlassian.com/manage-profile/security/api-tokens and choose Create API token.
  2. For Label, enter a name for the token.
  3. Choose Create.
  4. Copy the value and save it to use as your password.

Personal access token

This authentication method works for on premises (Data Center) only. Complete the following steps to acquire authentication details:

  1. Log in to your Confluence URL using the user ID and password that you want Amazon Kendra to use while retrieving content.
  2. Choose the profile icon and choose Settings.
  3. Choose Personal Access Tokens in the navigation pane, then choose Create token.
  4. For Token name, enter a name.
  5. For Expiry date, deselect Automatic expiry.
  6. Choose Create.
  7. Copy the token and save it in a safe place.

To configure Secrets Manager, we use the login URL and this value.

OAuth2 authentication for Confluence Cloud edition

This authentication method follows the full OAuth2.0 (3LO) documentation from Confluence. We first create and configure an app on Confluence and enable it for OAuth2. The process is slightly different for the Cloud and Data Center editions. We then get an authorization token and exchange this for an access token. Finally, we get the client ID, client secret, and client code. Complete the following steps:

  1. Log in to the Confluence app.
  2. Navigate to https://developer.atlassian.com/.
  3. Next to My apps, choose Create and choose OAuth2 Integration.
  4. For Name, enter a name.
  5. Choose Create.
  6. Choose Authorization in the navigation pane.
  7. Choose Add next to your authorization type.
  8. For Callback URL, enter the URL you use to log in to Confluence.
  9. Choose Save changes.
  10. Under Authorization URL generator, choose Add APIs.
  11. Next to User identity API, choose Add, then choose Configure.
  12. Choose Edit Scopes to configure read scopes for the app.
  13. Select View active user profile and View user profiles.
  14. Choose Permissions in the navigation pane.
  15. Next to Confluence API, choose Add, then choose Configure.
  16. On the Classic scopes tab, choose Edit Scopes.
  17. Select all read, search, and download scopes.
  18. Choose Save.
  19. On the Granular scopes tab, choose Edit Scopes.
  20. Search for read and select all the scopes found.
  21. Choose Save.
  22. Choose Authorization in the navigation pane.
  23. Next to your authorization type, choose Configure.

You should see three URLs listed.

  24. Copy the code for Granular Confluence API authorization URL.

The following is example code:

https://auth.atlassian.com/authorize?
audience=api.atlassian.com
&client_id=YOUR_CLIENT_ID
&scope=REQUESTED_SCOPE%20REQUESTED_SCOPE_TWO
&redirect_uri=https://YOUR_APP_CALLBACK_URL
&state=YOUR_USER_BOUND_VALUE
&response_type=code
&prompt=consent

  25. If you want to generate a refresh token so that you don’t have to repeat this process, add offline_access (or %20offline_access) to the end of all the scopes in the URL (for example, &scope=REQUESTED_SCOPE%20REQUESTED_SCOPE_TWO%20offline_access).
  26. If you’re okay generating a new token each time, just enter the URL in your browser.
  27. Choose Accept.


You’re redirected to your Confluence home page.

  28. Inspect the browser URL and locate code=xxxxx.
  29. Copy this code and save it.

This is the authorization code that we use to exchange with the access token.


  30. Return to the Atlassian developer console and choose Settings in the navigation pane.
  31. Copy the values of the client ID and secret ID and save them.

We need these values to make a call to exchange the authorization token with the access token.


Next, we use the Postman utility to post the authorization code to get the access token. You can use alternate tools like curl to do this as well.

  32. The URL to post the authorization code is https://auth.atlassian.com/oauth/token.
  33. The JSON body to post is as follows:
    {"grant_type": "authorization_code",
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "code": "YOUR_AUTHORIZATION_CODE",
    "redirect_uri": "https://YOUR_APP_CALLBACK_URL"}

The grant_type parameter is hard-coded. We collected the values for client_id and client_secret in a previous step. The value for code is the authorization code we collected earlier.

A successful response will return the access token. If you added offline access to the URL earlier, you also get a refresh token.
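If you prefer to script this exchange instead of using Postman or curl, the following is a minimal Python sketch (assuming the requests library is installed; the endpoint and field names come from the steps above, and the placeholder values are the ones you collected earlier):

import requests

# Exchange the authorization code for an access token (Confluence Cloud OAuth2).
# The placeholder values are assumptions; replace them with the client ID,
# client secret, authorization code, and callback URL you saved earlier.
response = requests.post(
    "https://auth.atlassian.com/oauth/token",
    json={
        "grant_type": "authorization_code",
        "client_id": "YOUR_CLIENT_ID",
        "client_secret": "YOUR_CLIENT_SECRET",
        "code": "YOUR_AUTHORIZATION_CODE",
        "redirect_uri": "https://YOUR_APP_CALLBACK_URL",
    },
    timeout=30,
)
response.raise_for_status()
tokens = response.json()
print(tokens["access_token"])
# If you added offline_access to the scopes, the response also contains a refresh_token.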


  34. Save the access token to use when setting up Secrets Manager.

The access token is valid only for 1 hour. If you need a new token, you can start all over again. However, if you have the refresh token, use Postman (as before) to post to the following URL: https://auth.atlassian.com/oauth/token. Use the following JSON format for the request body:

{"grant_type": "refresh_token",
"client_id": "YOUR_CLIENT_ID",
"client_secret": "YOUR_CLIENT_SECRET",
"refresh_token": "YOUR_REFRESH_TOKEN"}

The call will return a new access token.


OAuth2 authentication for Confluence Data Center edition

If using the Data Center edition with OAuth2 authentication, complete the following steps:

  1. Log in to Confluence Data Center edition.
  2. Choose the gear icon, then choose General configuration.
  3. In the navigation pane, choose Application links, then choose Create link.
  4. In the Create link pop-up window, select External application and Incoming, then choose Continue.
  5. For Name, enter a name.
  6. For Redirect URL, enter https://httpbin.org/.
  7. Choose Save.
  8. Copy and save the values for the client ID and client secret.
  9. On a separate browser tab, open the URL https://example-app.com/pkce.
  10. Choose Generate Random String and Calculate Hash.
  11. Copy the value under Code Challenge.

  12. Return to your original tab.
  13. Use the following URL to get the authorization code:
    https://<confluence url>/rest/oauth2/latest/authorize
    ?client_id=CLIENT_ID
    &redirect_uri=REDIRECT_URI
    &response_type=code
    &scope=SCOPE
    &code_challenge=CODE_CHALLENGE
    &code_challenge_method=S256

Use the client ID you copied earlier, and https://httpbin.org for the redirect URI. For CODE_CHALLENGE, enter the code challenge you copied earlier (or one you generate yourself, as in the sketch that follows).
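If you prefer not to use the helper site from steps 9-11, you can also generate the code verifier and code challenge yourself. The following is a minimal Python sketch of the standard PKCE S256 derivation (the variable names are illustrative):

import base64
import hashlib
import secrets

# Generate a random code verifier, then derive the S256 code challenge:
# the base64url-encoded SHA-256 digest of the verifier, without padding.
code_verifier = secrets.token_urlsafe(64)
digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

print("code_verifier: ", code_verifier)   # keep this for the token request
print("code_challenge:", code_challenge)  # use this in the authorization URL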

  14. Choose Allow.

You’re redirected to httpbin.org.

  15. Save the code to use in the next step.

  16. To get the access token and refresh token, use a tool such as curl or Postman to post the following values to https://<your confluence URL>/rest/oauth2/latest/token:
    grant_type: authorization_code
    client_id: YOUR_CLIENT_ID
    client_secret: YOUR_CLIENT_SECRET
    code: YOUR_AUTHORIZATION_CODE
    code_verifier: CODE_VERIFIER
    redirect_uri: YOUR_REDIRECT_URL

Use the client ID, client secret, and authorization code you saved earlier. For CODE_VERIFIER, enter the value from when you generated the code challenge.

  17. Copy the access token and refresh token to use later.


The access token and refresh token are valid only for 1 hour. To refresh the token, post the following code to the same URL to get new values:

grant_type: refresh_token
client_id: YOUR_CLIENT_ID
client_secret: YOUR_CLIENT_SECRET
refresh_token: REFRESH_TOKEN
redirect_uri: YOUR_REDIRECT_URL

The new tokens are valid for 1 hour.


Store Confluence credentials in Secrets Manager

To store your Confluence credentials in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Store a new secret.
  2. Select Other type of secret.


  3. Depending on the type of secret, enter the key-value pairs as follows:
    • For Confluence Cloud basic authentication, enter the following key-value pairs (note that the password is not the login password, but the token you created earlier):
      "username" : "<your login username>"
      "password" : "<your token value>"
    • For Confluence Cloud OAuth authentication, enter the following key-value pairs:
      "confluenceAppKey" : "<your client ID>"
      "confluenceAppSecret" : "<your client secret>"
      "confluenceAccessToken" : "<your access token>"
      "confluenceRefreshToken" : "<your refresh token>"
    • For Confluence Data Center basic authentication, enter the following key-value pairs:
      "username" : "<login username>"
      "password" : "<login password>"
    • For Confluence Data Center personal access token authentication, enter the following key-value pair:
      "patToken" : "<your personal access token>"
    • For Confluence Data Center OAuth authentication, enter the following key-value pairs:
      "confluenceAppKey" : "<your client ID>"
      "confluenceAppSecret" : "<your client secret>"
      "confluenceAccessToken" : "<your access token>"
      "confluenceRefreshToken" : "<your refresh token>"

  4. Choose Next.


  5. For Secret name, enter a name (for example, AmazonKendra-my-confluence-secret).
  6. Enter an optional description.
  7. Choose Next.


  8. In the Configure rotation section, keep all settings at their defaults and choose Next.


  9. On the Review page, choose Store.
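If you prefer to create the secret programmatically instead of on the console, the following is a minimal boto3 sketch for the Confluence Cloud basic authentication case (the secret name is an example, and the key names match the pairs listed above):

import json
import boto3

secretsmanager = boto3.client("secretsmanager")

# Store the Confluence Cloud basic-authentication credentials as a JSON secret.
# The secret name is an example; the key names match the pairs described above.
secretsmanager.create_secret(
    Name="AmazonKendra-my-confluence-secret",
    Description="Confluence credentials for the Amazon Kendra connector",
    SecretString=json.dumps({
        "username": "<your login username>",
        "password": "<your token value>",
    }),
)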

Configure the Amazon Kendra connector for Confluence

To configure the Amazon Kendra connector, complete the following steps:

  1. On the Amazon Kendra console, choose Create an Index.


  2. For Index name, enter a name for the index (for example, my-confluence-index).
  3. Enter an optional description.
  4. For Role name, enter an IAM role name.
  5. Configure optional encryption settings and tags.
  6. Choose Next.


  7. In the Configure user access control section, leave the settings at their defaults and choose Next.


  8. In the Specify provisioning section, select Developer edition and choose Next.


  9. On the review page, choose Create.

This creates and propagates the IAM role and then creates the Amazon Kendra index, which can take up to 30 minutes.
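You can also create the index programmatically. The following is a minimal boto3 sketch (the index name and role ARN are example values, and creation remains asynchronous, so the index can still take up to 30 minutes to become active):

import boto3

kendra = boto3.client("kendra")

# Create a Developer edition index; the name and role ARN are example values.
response = kendra.create_index(
    Name="my-confluence-index",
    Edition="DEVELOPER_EDITION",
    RoleArn="arn:aws:iam::111122223333:role/AmazonKendra-index-role",
)
print(response["Id"])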


Create a Confluence data source

Complete the following steps to create your data source:

  1. On the Amazon Kendra console, choose Data sources in the navigation pane.
  2. Under Confluence connector V2.0, choose Add connector.


  3. For Data source name, enter a name (for example, my-Confluence-data-source).
  4. Enter an optional description.
  5. Choose Next.


  6. Choose either Confluence Cloud or Confluence Server depending on your data source.
  7. For Authentication, choose your authentication option.
  8. Select Identity crawler is on.
  9. For IAM role, choose Create a new role.
  10. For Role name, enter a name (for example, AmazonKendra-my-confluence-datasource-role).
  11. Choose Next.


For Confluence Data Center and Cloud editions, we can add additional optional information (not shown), such as the VPC. For Data Center edition only, we can add additional information for the web proxy. There is also an additional authentication option using a personal access token, which is valid only for the Data Center edition and not the Cloud edition.

  12. For Sync scope, select all the content to sync.
  13. For Sync mode, select Full sync.
  14. For Frequency, choose Run on demand.
  15. Choose Next.


  16. Optionally, you can set mapping fields.

Mapping fields is a useful exercise in which you can substitute field names with values that are user-friendly and fit your organization’s vocabulary.

  17. For this post, keep all defaults and choose Next.


  18. Review the settings and choose Add data source.
  19. To sync the data source, choose Sync now.


A banner message appears when the sync is complete.

Test the solution

Now that you have ingested the content from your Confluence account into your Amazon Kendra index, you can test some queries. For the purposes of our test, we have created a Confluence website with two teams: team1 with the member Analyst1 and team2 with the member Analyst2.

  1. On the Amazon Kendra console, navigate to your index and choose Search indexed content.
  2. Enter a sample search query and review your search results (your results will vary based on the contents of your account).


The Confluence connector also crawls local identity information from Confluence. You can use this feature to narrow down your query by user. Confluence offers comprehensive visibility options. Users can choose their content to be seen by other users, at a space level, or by groups. When you filter your searches by users, the query returns only those documents that the user has access to at the time of ingestion.

  3. To use this feature, expand Test query with user name or groups and choose Apply user name or groups.
  4. Enter the user name of your user and choose Apply.

Note that for Confluence Data Center edition, the user name is the email ID.


Rerun your search query.

This brings you a filtered set of results. Notice we bring back just 62 results.


We now go back, restrict Bob Straham to be able to access only his workspace, and run the search again.


Notice that we get just a subset of the results because the search is restricted to just Bob’s content.

When fronting Amazon Kendra with an application, such as one built using Experience Builder, you can pass the user identity (in the form of the email ID for Cloud edition or the user name for Data Center edition) to Amazon Kendra to ensure that each user only sees content specific to their user ID. Alternatively, you can use AWS IAM Identity Center (successor to AWS Single Sign-On) to control the user context being passed to Amazon Kendra to limit queries by user.
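For example, a minimal boto3 sketch of a query that passes a user context might look like the following (the index ID, query text, and user identity are example values; pass the email ID or user name as described above):

import boto3

kendra = boto3.client("kendra")

# Query the index and restrict results to documents the given user can access.
# The index ID, query text, and user identity below are example values.
response = kendra.query(
    IndexId="11111111-2222-3333-4444-555555555555",
    QueryText="What is our vacation policy?",
    UserContext={"UserId": "analyst1@example.com"},
)

for item in response["ResultItems"]:
    print(item["Type"], item.get("DocumentTitle", {}).get("Text"))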

Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Confluence account.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Confluence V2, delete that data source.

Conclusion

With the new Confluence connector V2 for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.

To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Confluence, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the author

Ashish Lagwankar is a Senior Enterprise Solutions Architect at AWS. His core interests include AI/ML, serverless, and container technologies. Ashish is based in the Boston, MA, area and enjoys reading, outdoors, and spending time with his family.

Accelerate machine learning time to value with Amazon SageMaker JumpStart and PwC’s MLOps accelerator

This is a guest blog post co-written with Vik Pant and Kyle Bassett from PwC.

With organizations increasingly investing in machine learning (ML), ML adoption has become an integral part of business transformation strategies. A recent PwC CEO survey unveiled that 84% of Canadian CEOs agree that artificial intelligence (AI) will significantly change their business within the next 5 years, making this technology more critical than ever. However, implementing ML into production comes with various considerations, notably being able to navigate the world of AI safely, strategically, and responsibly. One of the first steps and notably a great challenge to becoming AI powered is effectively developing ML pipelines that can scale sustainably in the cloud. Thinking of ML in terms of pipelines that generate and maintain models rather than models by themselves helps build versatile and resilient prediction systems that are better able to withstand meaningful changes in relevant data over time.

Many organizations start their journey into the world of ML with a model-centric viewpoint. In the early stages of building an ML practice, the focus is on training supervised ML models, which are mathematical representations of relationships between inputs (independent variables) and outputs (dependent variables) that are learned from data (typically historical). Models are mathematical artifacts that take input data, perform calculations and computations on them, and generate predictions or inferences.

Although this approach is a reasonable and relatively simple starting point, it isn’t inherently scalable or intrinsically sustainable due to the manual and ad hoc nature of model training, tuning, testing, and trialing activities. Organizations with greater maturity in the ML domain adopt an ML operations (MLOps) paradigm that incorporates continuous integration, continuous delivery, continuous deployment, and continuous training. Central to this paradigm is a pipeline-centric viewpoint for developing and operating industrial-strength ML systems.

In this post, we start with an overview of MLOps and its benefits, describe a solution to simplify its implementations, and provide details on the architecture. We finish with a case study highlighting the benefits realized by a large AWS and PwC customer who implemented this solution.

Background

An MLOps pipeline is a set of interrelated sequences of steps that are used to build, deploy, operate, and manage one or more ML models in production. Such a pipeline encompasses the stages involved in building, testing, tuning, and deploying ML models, including but not limited to data preparation, feature engineering, model training, evaluation, deployment, and monitoring. As such, an ML model is the product of an MLOps pipeline, and a pipeline is a workflow for creating one or more ML models. Such pipelines support structured and systematic processes for building, calibrating, assessing, and implementing ML models, and the models themselves generate predictions and inferences. By automating the development and operationalization of stages of pipelines, organizations can reduce the time to delivery of models, increase the stability of the models in production, and improve collaboration between teams of data scientists, software engineers, and IT administrators.

Solution overview

AWS offers a comprehensive portfolio of cloud-native services for developing and running MLOps pipelines in a scalable and sustainable manner. Amazon SageMaker comprises a comprehensive portfolio of capabilities as a fully managed MLOps service to enable developers to create, train, deploy, operate, and manage ML models in the cloud. SageMaker covers the entire MLOps workflow, from collecting and preparing data to training models with built-in high-performance algorithms and sophisticated automated ML (AutoML) experiments, so that companies can choose specific models that fit their business priorities and preferences. SageMaker enables organizations to collaboratively automate the majority of their MLOps lifecycle so that they can focus on business results without risking project delays or escalating costs. In this way, SageMaker allows businesses to focus on results without worrying about infrastructure, development, and maintenance associated with powering industrial-strength prediction services.

SageMaker includes Amazon SageMaker JumpStart, which offers out-of-the-box solution patterns for organizations seeking to accelerate their MLOps journey. Organizations can start with pre-trained and open-source models that can be fine-tuned to meet their specific needs through retraining and transfer learning. Additionally, JumpStart provides solution templates designed to tackle common use cases, as well as example Jupyter notebooks with prewritten starter code. These resources can be accessed by simply visiting the JumpStart landing page within Amazon SageMaker Studio.

PwC has built a pre-packaged MLOps accelerator that further speeds up time to value and increases return on investment for organizations that use SageMaker. This MLOps accelerator enhances the native capabilities of JumpStart by integrating complementary AWS services. With a comprehensive suite of technical artifacts, including infrastructure as code (IaC) scripts, data processing workflows, service integration code, and pipeline configuration templates, PwC’s MLOps accelerator simplifies the process of developing and operating production-class prediction systems.

Architecture overview

The architecture of the PwC MLOps accelerator prioritizes cloud-native serverless services from AWS. The entry point into this accelerator is any collaboration tool, such as Slack, that a data scientist or data engineer can use to request an AWS environment for MLOps. Such a request is parsed and then fully or semi-automatically approved using workflow features in that collaboration tool. After a request is approved, its details are used for parameterizing IaC templates. The source code for these IaC templates is managed in AWS CodeCommit. These parameterized IaC templates are submitted to AWS CloudFormation for modeling, provisioning, and managing stacks of AWS and third-party resources.

The following diagram illustrates the workflow.

After AWS CloudFormation provisions an environment for MLOps on AWS, the environment is ready for use by data scientists, data engineers, and their collaborators. The PwC accelerator includes predefined AWS Identity and Access Management (IAM) roles that are related to MLOps activities and tasks. These roles specify the services and resources in the MLOps environment that can be accessed by various users based on their job profiles. After accessing the MLOps environment, users can access any of the modalities on SageMaker to perform their duties. These include SageMaker notebook instances, Amazon SageMaker Autopilot experiments, and Studio. You can benefit from all SageMaker features and functions, including model training, tuning, evaluation, deployment, and monitoring.

The accelerator also includes connections with Amazon DataZone for sharing, searching, and discovering data at scale across organizational boundaries to generate and enrich models. Similarly, data for training, testing, validating, and detecting model drift can be sourced from a variety of services, including Amazon Redshift, Amazon Relational Database Service (Amazon RDS), Amazon Elastic File System (Amazon EFS), and Amazon Simple Storage Service (Amazon S3). Prediction systems can be deployed in many ways, including as SageMaker endpoints directly, SageMaker endpoints wrapped in AWS Lambda functions, and SageMaker endpoints invoked through custom code on Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Compute Cloud (Amazon EC2). Amazon CloudWatch is used to monitor the environment for MLOps on AWS in a comprehensive manner to observe alarms, logs, and events data from across the complete stack (applications, infrastructure, network, and services).

The following diagram illustrates this architecture.

Case study

In this section, we share an illustrative case study from a large insurance company in Canada. It focuses on the transformative impact of the implementation of PwC Canada’s MLOps accelerator and JumpStart templates.

This client partnered with PwC Canada and AWS to address challenges with inefficient model development and ineffective deployment processes, lack of consistency and collaboration, and difficulty in scaling ML models. The implementation of this MLOps Accelerator in concert with JumpStart templates achieved the following:

  • End-to-end automation – Automation nearly halved the amount of time for data preprocessing, model training, hyperparameter tuning, and model deployment and monitoring
  • Collaboration and standardization – Standardized tools and frameworks to promote consistency across the organization nearly doubled the rate of model innovation
  • Model governance and compliance – They implemented a model governance framework to ensure that all ML models met regulatory requirements and adhered to the company’s ethical guidelines, which reduced risk management costs by 40%
  • Scalable cloud infrastructure – They invested in scalable infrastructure to effectively manage massive data volumes and deploy multiple ML models simultaneously, reducing infrastructure and platform costs by 50%
  • Rapid deployment – The prepackaged solution reduced time to production by 70%

By delivering MLOps best practices through rapid deployment packages, our client was able to de-risk their MLOps implementation and unlock the full potential of ML for a range of business functions, such as risk prediction and asset pricing. Overall, the synergy between the PwC MLOps accelerator and JumpStart enabled our client to streamline, scale, secure, and sustain their data science and data engineering activities.

It should be noted that the PwC and AWS solution is not industry specific and is relevant across industries and sectors.

Conclusion

SageMaker and its accelerators allow organizations to enhance the productivity of their ML program. There are many benefits, including but not limited to the following:

  • Collaboratively create IaC, MLOps, and AutoML use cases to realize business benefits from standardization
  • Enable efficient experimental prototyping, with and without code, to turbocharge AI from development to deployment with IaC, MLOps, and AutoML
  • Automate tedious, time-consuming tasks such as feature engineering and hyperparameter tuning with AutoML
  • Employ a continuous model monitoring paradigm to align the risk of ML model usage with enterprise risk appetite

Please contact the authors of this post, AWS Advisory Canada, or PwC Canada to learn more about JumpStart and PwC’s MLOps accelerator.


About the Authors

Vik Pant is a Partner in the Cloud & Data practice at PwC Canada. He earned a PhD in Information Science from the University of Toronto. He is convinced that there is a telepathic connection between his biological neural network and the artificial neural networks that he trains on SageMaker. Connect with him on LinkedIn.

Kyle Bassett is a Partner in the Cloud & Data practice at PwC Canada. Along with his crack team of tech alchemists, he weaves enchanting MLOps solutions that mesmerize clients with accelerated business value. Armed with the power of artificial intelligence and a sprinkle of wizardry, Kyle turns complex challenges into digital fairy tales, making the impossible possible. Connect with him on LinkedIn.

Francois Chevallier is a Principal Advisory Consultant with AWS Professional Services Canada and the Canadian practice lead for Data and Innovation Advisory. He guides customers to establish and implement their overall cloud journey and their data programs, focusing on vision, strategy, business drivers, governance, target operating models, and roadmaps. Connect with him on LinkedIn.

Deploy generative AI models from Amazon SageMaker JumpStart using the AWS CDK

The seeds of a machine learning (ML) paradigm shift have existed for decades, but with the ready availability of virtually infinite compute capacity, a massive proliferation of data, and the rapid advancement of ML technologies, customers across industries are rapidly adopting and using ML technologies to transform their businesses.

Just recently, generative AI applications have captured everyone’s attention and imagination. We are truly at an exciting inflection point in the widespread adoption of ML, and we believe every customer experience and application will be reinvented with generative AI.

Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. Like all AI, generative AI is powered by ML models—very large models that are pre-trained on vast corpora of data and commonly referred to as foundation models (FMs).

The size and general-purpose nature of FMs make them different from traditional ML models, which typically perform specific tasks, like analyzing text for sentiment, classifying images, and forecasting trends.

With traditional ML models, in order to achieve each specific task, you need to gather labeled data, train a model, and deploy that model. With foundation models, instead of gathering labeled data for each model and training multiple models, you can adapt the same pre-trained FM to various tasks. You can also customize FMs to perform domain-specific functions that are differentiating to your businesses, using only a small fraction of the data and compute required to train a model from scratch.

Generative AI has the potential to disrupt many industries by revolutionizing the way content is created and consumed. Original content production, code generation, customer service enhancement, and document summarization are typical use cases of generative AI.

Amazon SageMaker JumpStart provides pre-trained, open-source models for a wide range of problem types to help you get started with ML. You can incrementally train and tune these models before deployment. JumpStart also provides solution templates that set up infrastructure for common use cases, and executable example notebooks for ML with Amazon SageMaker.

With over 600 pre-trained models available and growing every day, JumpStart enables developers to quickly and easily incorporate cutting-edge ML techniques into their production workflows. You can access the pre-trained models, solution templates, and examples through the JumpStart landing page in Amazon SageMaker Studio. You can also access JumpStart models using the SageMaker Python SDK. For information about how to use JumpStart models programmatically, see Use SageMaker JumpStart Algorithms with Pretrained Models.

In April 2023, AWS unveiled Amazon Bedrock, which provides a way to build generative AI-powered apps via pre-trained models from startups including AI21 Labs, Anthropic, and Stability AI. Amazon Bedrock also offers access to Titan foundation models, a family of models trained in-house by AWS. With the serverless experience of Amazon Bedrock, you can easily find the right model for your needs, get started quickly, privately customize FMs with your own data, and easily integrate and deploy them into your applications using the AWS tools and capabilities you’re familiar with (including integrations with SageMaker ML features like Amazon SageMaker Experiments to test different models and Amazon SageMaker Pipelines to manage your FMs at scale) without having to manage any infrastructure.

In this post, we show how to deploy image and text generative AI models from JumpStart using the AWS Cloud Development Kit (AWS CDK). The AWS CDK is an open-source software development framework to define your cloud application resources using familiar programming languages like Python.

We use the Stable Diffusion model for image generation and the FLAN-T5-XL model for natural language understanding (NLU) and text generation from Hugging Face in JumpStart.

Solution overview

The web application is built on Streamlit, an open-source Python library that makes it easy to create and share beautiful, custom web apps for ML and data science. We host the web application using Amazon Elastic Container Service (Amazon ECS) with AWS Fargate and it is accessed via an Application Load Balancer. Fargate is a technology that you can use with Amazon ECS to run containers without having to manage servers or clusters or virtual machines. The generative AI model endpoints are launched from JumpStart images in Amazon Elastic Container Registry (Amazon ECR). Model data is stored on Amazon Simple Storage Service (Amazon S3) in the JumpStart account. The web application interacts with the models via Amazon API Gateway and AWS Lambda functions as shown in the following diagram.

API Gateway provides the web application and other clients a standard RESTful interface, while shielding the Lambda functions that interface with the model. This simplifies the client application code that consumes the models. The API Gateway endpoints are publicly accessible in this example, allowing for the possibility to extend this architecture to implement different API access controls and integrate with other applications.
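As an illustration of how a client might consume one of these endpoints, the following is a hypothetical Python sketch; the endpoint URL and the payload and response shapes are placeholders, because the actual contract is defined by the Lambda functions in this project (for example, code/lambda_txt2img):

import requests

# Hypothetical call to the image generation REST endpoint exposed by API Gateway.
# The URL and the JSON payload/response shapes below are placeholders, not the
# project's actual contract, which is defined by the Lambda behind the API.
api_url = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/"

response = requests.post(api_url, json={"prompt": "a cottage in a snowy forest"}, timeout=60)
response.raise_for_status()
print(response.json())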

In this post, we walk you through the following steps:

  1. Install the AWS Command Line Interface (AWS CLI) and AWS CDK v2 on your local machine.
  2. Clone and set up the AWS CDK application.
  3. Deploy the AWS CDK application.
  4. Use the image generation AI model.
  5. Use the text generation AI model.
  6. View the deployed resources on the AWS Management Console.

We provide an overview of the code in this project in the appendix at the end of this post.

Prerequisites

You must have the following prerequisites:

You can deploy the infrastructure in this tutorial from your local computer or you can use AWS Cloud9 as your deployment workstation. AWS Cloud9 comes pre-loaded with AWS CLI, AWS CDK and Docker. If you opt for AWS Cloud9, create the environment from the AWS console.

The estimated cost to complete this post is $50, assuming you leave the resources running for 8 hours. Make sure you delete the resources you create in this post to avoid ongoing charges.

Install the AWS CLI and AWS CDK on your local machine

If you don’t already have the AWS CLI on your local machine, refer to Installing or updating the latest version of the AWS CLI and Configuring the AWS CLI.

Install the AWS CDK Toolkit globally using the following node package manager command:

$ npm install -g aws-cdk

Run the following command to verify the correct installation and print the version number of the AWS CDK:

$ cdk --version

Make sure you have Docker installed on your local machine. Issue the following command to verify the version:

$ docker --version

Clone and set up the AWS CDK application

On your local machine, clone the AWS CDK application with the following command:

$ git clone https://github.com/aws-samples/generative-ai-sagemaker-cdk-demo.git

Navigate to the project folder:

$ cd generative-ai-sagemaker-cdk-demo

Before we deploy the application, let’s review the directory structure:

.
├── LICENSE
├── README.md
├── app.py
├── cdk.json
├── code
│   ├── lambda_txt2img
│   │   └── txt2img.py
│   └── lambda_txt2nlu
│       └── txt2nlu.py
├── construct
│   └── sagemaker_endpoint_construct.py
├── images
│   ├── architecture.png
│   ├── ...
├── requirements-dev.txt
├── requirements.txt
├── source.bat
├── stack
│   ├── __init__.py
│   ├── generative_ai_demo_web_stack.py
│   ├── generative_ai_txt2img_sagemaker_stack.py
│   ├── generative_ai_txt2nlu_sagemaker_stack.py
│   └── generative_ai_vpc_network_stack.py
├── tests
│   ├── __init__.py
│   └── ...
└── web-app
    ├── Dockerfile
    ├── Home.py
    ├── configs.py
    ├── img
    │   └── sagemaker.png
    ├── pages
    │   ├── 2_Image_Generation.py
    │   └── 3_Text_Generation.py
    └── requirements.txt

The stack folder contains the code for each stack in the AWS CDK application. The code folder contains the code for the Lambda functions. The repository also contains the web application located under the folder web-app.

The cdk.json file tells the AWS CDK Toolkit how to run your application.

This application was tested in the us-east-1 Region, but it should work in any Region that has the required services and inference instance type ml.g4dn.4xlarge specified in app.py.

Set up a virtual environment

This project is set up like a standard Python project. Create a Python virtual environment using the following code:

$ python3 -m venv .venv

Use the following command to activate the virtual environment:

$ source .venv/bin/activate

If you’re on a Windows platform, activate the virtual environment as follows:

% .venv\Scripts\activate.bat

After the virtual environment is activated, upgrade pip to the latest version:

$ python3 -m pip install --upgrade pip

Install the required dependencies:

$ pip install -r requirements.txt

Before you deploy any AWS CDK application, you need to bootstrap a space in your account and the Region you’re deploying into. To bootstrap in your default Region, issue the following command:

$ cdk bootstrap

If you want to deploy into a specific account and Region, issue the following command:

$ cdk bootstrap aws://ACCOUNT-NUMBER/REGION

For more information about this setup, visit Getting started with the AWS CDK.

AWS CDK application stack structure

The AWS CDK application contains multiple stacks, as shown in the following diagram.

You can list the stacks in your AWS CDK application with the following command:

$ cdk list

GenerativeAiTxt2imgSagemakerStack
GenerativeAiTxt2nluSagemakerStack
GenerativeAiVpcNetworkStack
GenerativeAiDemoWebStack

The following are other useful AWS CDK commands:

  • cdk ls – Lists all stacks in the app
  • cdk synth – Emits the synthesized AWS CloudFormation template
  • cdk deploy – Deploys this stack to your default AWS account and Region
  • cdk diff – Compares the deployed stack with current state
  • cdk docs – Opens the AWS CDK documentation

The next section shows you how to deploy the AWS CDK application.

Deploy the AWS CDK application

The AWS CDK application will be deployed to the default Region based on your workstation configuration. If you want to force the deployment in a specific Region, set your AWS_DEFAULT_REGION environment variable accordingly.

At this point, you can deploy the AWS CDK application. First you launch the VPC network stack:

$ cdk deploy GenerativeAiVpcNetworkStack

If you are prompted, enter y to proceed with the deployment. You should see a list of AWS resources that are being provisioned in the stack. This step takes around 3 minutes to complete.

Then you launch the web application stack:

$ cdk deploy GenerativeAiDemoWebStack

After analyzing the stack, the AWS CDK will display the resource list in the stack. Enter y to proceed with the deployment. This step takes around 5 minutes.

Note down the WebApplicationServiceURL from the output to use later. You can also retrieve it on the AWS CloudFormation console, under the GenerativeAiDemoWebStack stack outputs.

Now, launch the image generation AI model endpoint stack:

$ cdk deploy GenerativeAiTxt2imgSagemakerStack

This step takes around 8 minutes. The image generation model endpoint is now deployed and ready to use.

Use the image generation AI model

The first example demonstrates how to utilize Stable Diffusion, a powerful generative modeling technique that enables the creation of high-quality images from text prompts.

  1. Access the web application using the WebApplicationServiceURL from the output of GenerativeAiDemoWebStack in your browser.
  2. In the navigation pane, choose Image Generation.
  3. The SageMaker Endpoint Name and API GW Url fields will be pre-populated, but you can change the prompt for the image description if you’d like.
  4. Choose Generate image.
  5. The application will make a call to the SageMaker endpoint. It takes a few seconds. A picture with the characteristics in your image description will be displayed.

Use the text generation AI model

The second example centers around using the FLAN-T5-XL model, which is a foundation or large language model (LLM), to achieve in-context learning for text generation while also addressing a broad range of natural language understanding (NLU) and natural language generation (NLG) tasks.

Some environments might limit the number of endpoints you can launch at a time. If this is the case, you can launch one SageMaker endpoint at a time. To stop a SageMaker endpoint in the AWS CDK app, you have to destroy the deployed endpoint stack before launching the other endpoint stack. To take down the image generation AI model endpoint, issue the following command:

$ cdk destroy GenerativeAiTxt2imgSagemakerStack

Then launch the text generation AI model endpoint stack:

$ cdk deploy GenerativeAiTxt2nluSagemakerStack

Enter y at the prompts.

After the text generation model endpoint stack is launched, complete the following steps:

  1. Go back to the web application and choose Text Generation in the navigation pane.
  2. The Input Context field is pre-populated with a conversation between a customer and an agent regarding an issue with the customer’s phone, but you can enter your own context if you’d like.
  3. Below the context, you will find some pre-populated queries on the drop-down menu. Choose a query and choose Generate Response.
  4. You can also enter your own query in the Input Query field and then choose Generate Response.

View the deployed resources on the console

On the AWS CloudFormation console, choose Stacks in the navigation pane to view the stacks deployed.

On the Amazon ECS console, you can see the clusters on the Clusters page.

On the AWS Lambda console, you can see the functions on the Functions page.

On the API Gateway console, you can see the API Gateway endpoints on the APIs page.

On the SageMaker console, you can see the deployed model endpoints on the Endpoints page.

When the stacks are launched, some parameters are generated. These are stored in the AWS Systems Manager Parameter Store. To view them, choose Parameter Store in the navigation pane on the AWS Systems Manager console.

Clean up

To avoid unnecessary cost, clean up all the infrastructure created with the following command on your workstation:

$ cdk destroy --all

Enter y at the prompt. This step takes around 10 minutes. Check if all resources are deleted on the console. Also delete the assets S3 buckets created by the AWS CDK on the Amazon S3 console as well as the assets repositories on Amazon ECR.

Conclusion

As demonstrated in this post, you can use the AWS CDK to deploy generative AI models in JumpStart. We showed an image generation example and a text generation example using a user interface powered by Streamlit, Lambda, and API Gateway.

You can now build your generative AI projects using pre-trained AI models in JumpStart. You can also extend this project to fine-tune the foundation models for your use case and control access to API Gateway endpoints.

We invite you to test the solution and contribute to the project on GitHub. Share your thoughts on this tutorial in the comments!

License summary

This sample code is made available under a modified MIT license. See the LICENSE file for more information. Also, review the respective licenses for the Stable Diffusion and FLAN-T5-XL models on Hugging Face.


About the authors

Hantzley Tauckoor is an APJ Partner Solutions Architecture Leader based in Singapore. He has 20 years’ experience in the ICT industry spanning multiple functional areas, including solutions architecture, business development, sales strategy, consulting, and leadership. He leads a team of Senior Solutions Architects that enable partners to develop joint solutions, build technical capabilities, and steer them through the implementation phase as customers migrate and modernize their applications to AWS.

Kwonyul Choi is a CTO at BABITALK, a Korean beauty care platform startup, based in Seoul. Prior to this role, Kwonyul worked as a Software Development Engineer at AWS with a focus on AWS CDK and Amazon SageMaker.

Arunprasath Shankar is a Senior AI/ML Specialist Solutions Architect with AWS, helping global customers scale their AI solutions effectively and efficiently in the cloud. In his spare time, Arun enjoys watching sci-fi movies and listening to classical music.

Satish Upreti is a Migration Lead PSA and Security SME in the partner organization in APJ. Satish has 20 years of experience spanning on-premises private cloud and public cloud technologies. Since joining AWS in August 2020 as a migration specialist, he provides extensive technical advice and support to AWS partners to plan and implement complex migrations.


Appendix: Code walkthrough

In this section, we provide an overview of the code in this project.

AWS CDK application

The main AWS CDK application is contained in the app.py file in the root directory. The project consists of multiple stacks, so we have to import the stacks:

#!/usr/bin/env python3
import aws_cdk as cdk

from stack.generative_ai_vpc_network_stack import GenerativeAiVpcNetworkStack
from stack.generative_ai_demo_web_stack import GenerativeAiDemoWebStack
from stack.generative_ai_txt2nlu_sagemaker_stack import GenerativeAiTxt2nluSagemakerStack
from stack.generative_ai_txt2img_sagemaker_stack import GenerativeAiTxt2imgSagemakerStack

We define our generative AI models and get the related URIs from SageMaker:

from script.sagemaker_uri import *
import boto3

region_name = boto3.Session().region_name
env={"region": region_name}

#Text to Image model parameters
TXT2IMG_MODEL_ID = "model-txt2img-stabilityai-stable-diffusion-v2-1-base"
TXT2IMG_INFERENCE_INSTANCE_TYPE = "ml.g4dn.4xlarge" 
TXT2IMG_MODEL_TASK_TYPE = "txt2img"
TXT2IMG_MODEL_INFO = get_sagemaker_uris(model_id=TXT2IMG_MODEL_ID,
                                        model_task_type=TXT2IMG_MODEL_TASK_TYPE, 
                                        instance_type=TXT2IMG_INFERENCE_INSTANCE_TYPE,
                                        region_name=region_name)

#Text to NLU image model parameters
TXT2NLU_MODEL_ID = "huggingface-text2text-flan-t5-xl"
TXT2NLU_INFERENCE_INSTANCE_TYPE = "ml.g4dn.4xlarge" 
TXT2NLU_MODEL_TASK_TYPE = "text2text"
TXT2NLU_MODEL_INFO = get_sagemaker_uris(model_id=TXT2NLU_MODEL_ID,
                                        model_task_type=TXT2NLU_MODEL_TASK_TYPE,
                                        instance_type=TXT2NLU_INFERENCE_INSTANCE_TYPE,
                                        region_name=region_name)

The function get_sagemaker_uris retrieves all the model information from JumpStart. See script/sagemaker_uri.py.
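For illustration, here is a rough sketch of what get_sagemaker_uris could look like using the SageMaker SDK's JumpStart retrieval helpers; the repo's actual implementation may differ:

from sagemaker import image_uris, model_uris

def get_sagemaker_uris(model_id, model_task_type, instance_type, region_name):
    # model_task_type is kept for parity with the callers above
    model_version = "*"  # use the latest available model version

    # Inference container image for the JumpStart model
    model_docker_image = image_uris.retrieve(
        region=region_name,
        framework=None,
        image_scope="inference",
        model_id=model_id,
        model_version=model_version,
        instance_type=instance_type,
    )

    # S3 URI of the pre-trained model artifacts, split into bucket and key
    model_uri = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope="inference"
    )
    model_bucket_name = model_uri.split("/")[2]
    model_bucket_key = "/".join(model_uri.split("/")[3:])

    return {
        "model_docker_image": model_docker_image,
        "model_bucket_name": model_bucket_name,
        "model_bucket_key": model_bucket_key,
        "instance_type": instance_type,
        "region_name": region_name,
    }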

Then, we instantiate the stacks:

app = cdk.App()

network_stack = GenerativeAiVpcNetworkStack(app, "GenerativeAiVpcNetworkStack", env=env)
GenerativeAiDemoWebStack(app, "GenerativeAiDemoWebStack", vpc=network_stack.vpc, env=env)

GenerativeAiTxt2nluSagemakerStack(app, "GenerativeAiTxt2nluSagemakerStack", env=env, model_info=TXT2NLU_MODEL_INFO)
GenerativeAiTxt2imgSagemakerStack(app, "GenerativeAiTxt2imgSagemakerStack", env=env, model_info=TXT2IMG_MODEL_INFO)

app.synth()

The first stack to launch is the VPC stack, GenerativeAiVpcNetworkStack. The web application stack, GenerativeAiDemoWebStack, depends on the VPC stack; this dependency is expressed by passing the VPC as a parameter, vpc=network_stack.vpc.

See app.py for the full code.

VPC network stack

In the GenerativeAiVpcNetworkStack stack, we create a VPC with a public subnet and a private subnet spanning two Availability Zones:

        self.output_vpc = ec2.Vpc(self, "VPC",
            nat_gateways=1,
            ip_addresses=ec2.IpAddresses.cidr("10.0.0.0/16"),
            max_azs=2,
            subnet_configuration=[
                ec2.SubnetConfiguration(name="public",subnet_type=ec2.SubnetType.PUBLIC,cidr_mask=24),
                ec2.SubnetConfiguration(name="private",subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,cidr_mask=24)
            ]
        )

See /stack/generative_ai_vpc_network_stack.py for the full code.
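Note that app.py references the VPC as network_stack.vpc while the stack stores it in self.output_vpc. The sketch below (an assumption, not the repo's exact code) shows one way the stack could expose it to dependent stacks through a read-only vpc property:

from aws_cdk import Stack
from aws_cdk import aws_ec2 as ec2
from constructs import Construct

class GenerativeAiVpcNetworkStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        self.output_vpc = ec2.Vpc(self, "VPC",
            nat_gateways=1,
            ip_addresses=ec2.IpAddresses.cidr("10.0.0.0/16"),
            max_azs=2,
            subnet_configuration=[
                ec2.SubnetConfiguration(name="public", subnet_type=ec2.SubnetType.PUBLIC, cidr_mask=24),
                ec2.SubnetConfiguration(name="private", subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS, cidr_mask=24)
            ]
        )

    @property
    def vpc(self) -> ec2.IVpc:
        # Lets dependent stacks consume the VPC as network_stack.vpc
        return self.output_vpc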

Demo web application stack

In the GenerativeAiDemoWebStack stack, we launch Lambda functions and respective API Gateway endpoints through which the web application interacts with the SageMaker model endpoints. See the following code snippet:

        # Defines an AWS Lambda function for Image Generation service
        lambda_txt2img = _lambda.Function(
            self, "lambda_txt2img",
            runtime=_lambda.Runtime.PYTHON_3_9,
            code=_lambda.Code.from_asset("code/lambda_txt2img"),
            handler="txt2img.lambda_handler",
            role=role,
            timeout=Duration.seconds(180),
            memory_size=512,
            vpc_subnets=ec2.SubnetSelection(
                subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS
            ),
            vpc=vpc
        )
        
        # Defines an Amazon API Gateway endpoint for Image Generation service
        txt2img_apigw_endpoint = apigw.LambdaRestApi(
            self, "txt2img_apigw_endpoint",
            handler=lambda_txt2img
        )
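
The Lambda function itself bridges API Gateway and the SageMaker endpoint. The following is an illustrative sketch of what code/lambda_txt2img/txt2img.py might do; the payload format and response keys are assumptions, not taken from the repo:

import json
import boto3

ssm = boto3.client("ssm")
sm_runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # The endpoint name is published to Parameter Store by the SageMaker stack
    endpoint_name = ssm.get_parameter(Name="txt2img_sm_endpoint")["Parameter"]["Value"]
    prompt = json.loads(event["body"])["prompt"]

    # Forward the prompt to the SageMaker endpoint and return its response
    response = sm_runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-text",
        Body=prompt.encode("utf-8"),
    )
    result = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}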

The web application is containerized and hosted on Amazon ECS with Fargate. See the following code snippet:

        # Create Fargate service
        fargate_service = ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "WebApplication",
            cluster=cluster,            # Required
            cpu=2048,                   # Default is 256 (512 is 0.5 vCPU, 2048 is 2 vCPU)
            desired_count=1,            # Default is 1
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=image, 
                container_port=8501,
                ),
            #load_balancer_name="gen-ai-demo",
            memory_limit_mib=4096,      # Default is 512
            public_load_balancer=True)  # Default is True

See /stack/generative_ai_demo_web_stack.py for the full code.

Image generation SageMaker model endpoint stack

The GenerativeAiTxt2imgSagemakerStack stack creates the image generation model endpoint from JumpStart and stores the endpoint name in Systems Manager Parameter Store. This parameter will be used by the web application. See the following code:

        endpoint = SageMakerEndpointConstruct(self, "TXT2IMG",
                                    project_prefix = "GenerativeAiDemo",
                                    
                                    role_arn= role.role_arn,

                                    model_name = "StableDiffusionText2Img",
                                    model_bucket_name = model_info["model_bucket_name"],
                                    model_bucket_key = model_info["model_bucket_key"],
                                    model_docker_image = model_info["model_docker_image"],

                                    variant_name = "AllTraffic",
                                    variant_weight = 1,
                                    instance_count = 1,
                                    instance_type = model_info["instance_type"],

                                    environment = {
                                        "MMS_MAX_RESPONSE_SIZE": "20000000",
                                        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
                                        "SAGEMAKER_PROGRAM": "inference.py",
                                        "SAGEMAKER_REGION": model_info["region_name"],
                                        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
                                    },

                                    deploy_enable = True
        )
        
        ssm.StringParameter(self, "txt2img_sm_endpoint", parameter_name="txt2img_sm_endpoint", string_value=endpoint.endpoint_name)

See /stack/generative_ai_txt2img_sagemaker_stack.py for the full code.

NLU and text generation SageMaker model endpoint stack

The GenerativeAiTxt2nluSagemakerStack stack creates the NLU and text generation model endpoint from JumpStart and stores the endpoint name in Systems Manager Parameter Store. This parameter will also be used by the web application. See the following code:

        endpoint = SageMakerEndpointConstruct(self, "TXT2NLU",
                                    project_prefix = "GenerativeAiDemo",
                                    
                                    role_arn= role.role_arn,

                                    model_name = "HuggingfaceText2TextFlan",
                                    model_bucket_name = model_info["model_bucket_name"],
                                    model_bucket_key = model_info["model_bucket_key"],
                                    model_docker_image = model_info["model_docker_image"],

                                    variant_name = "AllTraffic",
                                    variant_weight = 1,
                                    instance_count = 1,
                                    instance_type = model_info["instance_type"],

                                    environment = {
                                        "MODEL_CACHE_ROOT": "/opt/ml/model",
                                        "SAGEMAKER_ENV": "1",
                                        "SAGEMAKER_MODEL_SERVER_TIMEOUT": "3600",
                                        "SAGEMAKER_MODEL_SERVER_WORKERS": "1",
                                        "SAGEMAKER_PROGRAM": "inference.py",
                                        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code/",
                                        "TS_DEFAULT_WORKERS_PER_MODEL": "1"
                                    },

                                    deploy_enable = True
        )
        
        ssm.StringParameter(self, "txt2nlu_sm_endpoint", parameter_name="txt2nlu_sm_endpoint", string_value=endpoint.endpoint_name)

See /stack/generative_ai_txt2nlu_sagemaker_stack.py for the full code.

Web application

The web application is located in the /web-app directory. It is a Streamlit application, containerized according to the following Dockerfile:

FROM python:3.9
EXPOSE 8501
WORKDIR /app
COPY requirements.txt ./requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD streamlit run Home.py \
    --server.headless true \
    --browser.serverAddress="0.0.0.0" \
    --server.enableCORS false \
    --browser.gatherUsageStats false

To learn more about Streamlit, see Streamlit documentation.
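As a rough illustration (not the repo's Home.py), a Streamlit page for the image generation feature could call the API Gateway endpoint like this; the endpoint URL and the response key are placeholders:

import base64
import requests
import streamlit as st

# Placeholder: replace with the txt2img API Gateway URL output by the CDK deployment
API_URL = "https://<api-id>.execute-api.<region>.amazonaws.com/prod/"

st.title("Text to Image")
prompt = st.text_input("Enter a prompt")

if st.button("Generate") and prompt:
    resp = requests.post(API_URL, json={"prompt": prompt}, timeout=180)
    resp.raise_for_status()
    # Assumes the Lambda returns a base64-encoded image under "generated_image"
    image_bytes = base64.b64decode(resp.json()["generated_image"])
    st.image(image_bytes, caption=prompt)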

Read More

Resolving code review comments with ML

Resolving code review comments with ML

Code-change reviews are a critical part of the software development process at scale, taking a significant amount of the code authors’ and the code reviewers’ time. As part of this process, the reviewer inspects the proposed code and asks the author for code changes through comments written in natural language. At Google, we see millions of reviewer comments per year, and authors require an average of ~60 minutes active shepherding time between sending changes for review and finally submitting the change. In our measurements, the required active work time that the code author must do to address reviewer comments grows almost linearly with the number of comments. However, with machine learning (ML), we have an opportunity to automate and streamline the code review process, e.g., by proposing code changes based on a comment’s text.

Today, we describe how we applied recent advances in large sequence models in a real-world setting to automatically resolve code review comments in the day-to-day development workflow at Google (publication forthcoming). As of today, code-change authors at Google address a substantial share of reviewer comments by applying an ML-suggested edit. We expect this to reduce time spent on code reviews by hundreds of thousands of hours annually at Google scale. Unsolicited, very positive feedback highlights that ML-suggested code edits increase Googlers’ productivity and allow them to focus on more creative and complex tasks.

Predicting the code edit

We started by training a model that predicts code edits needed to address reviewer comments. The model is pre-trained on various coding tasks and related developer activities (e.g., renaming a variable, repairing a broken build, editing a file). It’s then fine-tuned for this specific task with reviewed code changes, the reviewer comments, and the edits the author performed to address those comments.

An example of an ML-suggested edit of refactorings that are spread within the code.

Google uses a monorepo, a single repository for all of its software artifacts, which allows our training dataset to include all unrestricted code used to build Google’s most recent software, as well as previous versions.

To improve the model quality, we iterated on the training dataset. For example, we compared the model performance for datasets with a single reviewer comment per file to datasets with multiple comments per file, and experimented with classifiers to clean up the training data based on a small, curated dataset to choose the model with the best offline precision and recall metrics.

Serving infrastructure and user experience

We designed and implemented the feature on top of the trained model, focusing on the overall user experience and developer efficiency. As part of this, we explored different user experience (UX) alternatives through a series of user studies. We then refined the feature based on insights from an internal beta (i.e., a test of the feature in development) including user feedback (e.g., a “Was this helpful?” button next to the suggested edit).

The final model was calibrated for a target precision of 50%. That is, we tuned the model and the suggestion filtering so that 50% of suggested edits on our evaluation dataset are correct. In general, increasing the target precision reduces the number of suggested edits shown, while decreasing it leads to more incorrect suggested edits. Incorrect suggested edits cost developers time and reduce their trust in the feature. We found that a target precision of 50% provides a good balance.
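As a toy illustration of this kind of calibration (not Google's implementation), one can pick the lowest confidence cutoff that still reaches the target precision on a labeled evaluation set:

def threshold_for_target_precision(scored_suggestions, target_precision=0.5):
    """scored_suggestions: list of (confidence, is_correct) pairs from an eval set."""
    best_cutoff = None
    # Scan cutoffs from strictest to loosest; keep the loosest one that still
    # meets the target precision, which maximizes how many suggestions are shown.
    for cutoff in sorted({c for c, _ in scored_suggestions}, reverse=True):
        kept = [ok for c, ok in scored_suggestions if c >= cutoff]
        if sum(kept) / len(kept) >= target_precision:
            best_cutoff = cutoff
    return best_cutoff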

At a high level, for every new reviewer comment, we generate the model input in the same format that is used for training, query the model, and generate the suggested code edit. If the model is confident in the prediction and a few additional heuristics are satisfied, we send the suggested edit to downstream systems. The downstream systems, i.e., the code review frontend and the integrated development environment (IDE), expose the suggested edits to the user and log user interactions, such as preview and apply events. A dedicated pipeline collects these logs and generates aggregate insights, e.g., the overall acceptance rates as reported in this blog post.

Architecture of the ML-suggested edits infrastructure. We process code and infrastructure from multiple services, get the model predictions and surface the predictions in the code review tool and IDE.

The developer interacts with the ML-suggested edits in the code review tool and the IDE. Based on insights from the user studies, the integration into the code review tool is most suitable for a streamlined review experience. The IDE integration provides additional functionality and supports 3-way merging of the ML-suggested edits (left in the figure below) in case of conflicting local changes on top of the reviewed code state (right) into the merge result (center).

3-way-merge UX in IDE.

Results

Offline evaluations indicate that the model addresses 52% of comments with a target precision of 50%. The online metrics of the beta and the full internal launch confirm these offline metrics, i.e., we see model suggestions above our target model confidence for around 50% of all relevant reviewer comments. 40% to 50% of all previewed suggested edits are applied by code authors.

We used the “not helpful” feedback during the beta to identify recurring failure patterns of the model. We implemented serving-time heuristics to filter these and, thus, reduce the number of shown incorrect predictions. With these changes, we traded quantity for quality and observed an increased real-world acceptance rate.

Code review tool UX. The suggestion is shown as part of the comment and can be previewed, applied and rated as helpful or not helpful.

Our beta launch showed a discoverability challenge: code authors only previewed ~20% of all generated suggested edits. We modified the UX and introduced a prominent “Show ML-edit” button (see the figure above) next to the reviewer comment, leading to an overall preview rate of ~40% at launch. We additionally found that suggested edits in the code review tool are often not applicable due to conflicting changes that the author did during the review process. We addressed this with a button in the code review tool that opens the IDE in a merge view for the suggested edit. We now observe that more than 70% of these are applied in the code review tool and fewer than 30% are applied in the IDE. All these changes allowed us to increase the overall fraction of reviewer comments that are addressed with an ML-suggested edit by a factor of 2 from beta to the full internal launch. At Google scale, these results help automate the resolution of hundreds of thousands of comments each year.

Suggestions filtering funnel.

We see ML-suggested edits addressing a wide range of reviewer comments in production. This includes simple localized refactorings and refactorings that are spread within the code, as shown in the examples throughout the blog post above. The feature addresses longer and less formally worded comments that require code generation, refactorings and imports.

Example of a suggestion for a longer and less formally worded comment that requires code generation, refactorings and imports.

The model can also respond to complex comments and produce extensive code edits (shown below). The generated test case follows the existing unit test pattern, while changing the details as described in the comment. Additionally, the edit suggests a comprehensive name for the test reflecting the test semantics.

Example of the model’s ability to respond to complex comments and produce extensive code edits.

Conclusion and future work

In this post, we introduced an ML-assistance feature to reduce the time spent on code review related changes. At the moment, a substantial amount of all actionable code review comments on supported languages are addressed with applied ML-suggested edits at Google. A 12-week A/B experiment across all Google developers will further measure the impact of the feature on the overall developer productivity.

We are working on improvements throughout the whole stack. This includes increasing the quality and recall of the model and building a more streamlined experience for the developer with improved discoverability throughout the review process. As part of this, we are investigating the option of showing suggested edits to the reviewer while they draft comments and expanding the feature into the IDE to enable code-change authors to get suggested code edits for natural-language commands.

Acknowledgements

This is the work of many people in Google Core Systems & Experiences team, Google Research, and DeepMind. We’d like to specifically thank Peter Choy for bringing the collaboration together, and all of our team members for their key contributions and useful advice, including Marcus Revaj, Gabriela Surita, Maxim Tabachnyk, Jacob Austin, Nimesh Ghelani, Dan Zheng, Peter Josling, Mariana Stariolo, Chris Gorgolewski, Sascha Varkevisser, Katja Grünwedel, Alberto Elizondo, Tobias Welp, Paige Bailey, Pierre-Antoine Manzagol, Pascal Lamblin, Chenjie Gu, Petros Maniatis, Henryk Michalewski, Sara Wiltberger, Ambar Murillo, Satish Chandra, Madhura Dudhgaonkar, Niranjan Tulpule, Zoubin Ghahramani, Juanjo Carin, Danny Tarlow, Kevin Villela, Stoyan Nikolov, David Tattersall, Boris Bokowski, Kathy Nix, Mehdi Ghissassi, Luis C. Cobo, Yujia Li, David Choi, Kristóf Molnár, Vahid Meimand, Amit Patel, Brett Wiltshire, Laurent Le Brun, Mingpan Guo, Hermann Loose, Jonas Mattes, Savinee Dancs.

Read More

NVIDIA and Microsoft Drive Innovation for Windows PCs in New Era of Generative AI

NVIDIA and Microsoft Drive Innovation for Windows PCs in New Era of Generative AI

Generative AI — in the form of large language model (LLM) applications like ChatGPT, image generators such as Stable Diffusion and Adobe Firefly, and game rendering techniques like NVIDIA DLSS 3 Frame Generation — is rapidly ushering in a new era of computing for productivity, content creation, gaming and more.

At the Microsoft Build developer conference, NVIDIA and Microsoft today showcased a suite of advancements in Windows 11 PCs and workstations with NVIDIA RTX GPUs to meet the demands of generative AI.

More than 400 Windows apps and games already employ AI technology, accelerated by dedicated processors on RTX GPUs called Tensor Cores. Today’s announcements, which include tools to develop AI on Windows PCs, frameworks to optimize and deploy AI, and driver performance and efficiency improvements, will empower developers to build the next generation of Windows apps with generative AI at their core.

“AI will be the single largest driver of innovation for Windows customers in the coming years,” said Pavan Davuluri, corporate vice president of Windows silicon and system integration at Microsoft. “By working in concert with NVIDIA on hardware and software optimizations, we’re equipping developers with a transformative, high-performance, easy-to-deploy experience.”

Develop Models With Windows Subsystem for Linux

AI development has traditionally taken place on Linux, requiring developers to either dual-boot their systems or use multiple PCs to work in their AI development OS while still accessing the breadth and depth of the Windows ecosystem.

Over the past few years, Microsoft has been building a powerful capability to run Linux directly within the Windows OS, called Windows Subsystem for Linux (WSL). NVIDIA has been working closely with Microsoft to deliver GPU acceleration and support for the entire NVIDIA AI software stack inside WSL. Now developers can use Windows PC for all their local AI development needs with support for GPU-accelerated deep learning frameworks on WSL.
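A quick sanity check that a GPU-accelerated framework inside WSL can see the RTX GPU (assuming PyTorch with CUDA support is installed in the WSL environment):

import torch

if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device visible; check the Windows NVIDIA driver and WSL GPU support.")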

With NVIDIA RTX GPUs delivering up to 48GB of RAM in desktop workstations, developers can now work with models on Windows that were previously only available on servers. The large memory also improves the performance and quality for local fine-tuning of AI models, enabling designers to customize them to their own style or content. And because the same NVIDIA AI software stack runs on NVIDIA data center GPUs, it’s easy for developers to push their models to Microsoft Azure Cloud for large training runs.

Rapidly Optimize and Deploy Models

With trained models in hand, developers need to optimize and deploy AI for target devices.

Microsoft released the Microsoft Olive toolchain for optimization and conversion of PyTorch models to ONNX, enabling developers to automatically tap into GPU hardware acceleration such as RTX Tensor Cores. Developers can optimize models via Olive and ONNX, and deploy Tensor Core-accelerated models to PC or cloud. Microsoft continues to invest in making PyTorch and related tools and frameworks work seamlessly with WSL to provide the best AI model development experience.
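For a sense of the underlying flow that Olive automates, the sketch below exports a PyTorch model to ONNX and runs it with ONNX Runtime's DirectML execution provider. It is a generic example, not the Olive API itself, and assumes the onnxruntime-directml package is installed:

import torch
import torchvision
import onnxruntime as ort

# Any PyTorch model works here; ResNet-18 is used purely as a stand-in
model = torchvision.models.resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Export the PyTorch model to ONNX
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Run inference on the GPU through DirectML
session = ort.InferenceSession("model.onnx", providers=["DmlExecutionProvider"])
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0].shape)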

Improved AI Performance, Power Efficiency

Once deployed, generative AI models demand incredible inference performance. RTX Tensor Cores deliver up to 1,400 Tensor TFLOPS for AI inferencing. Over the last year, NVIDIA has worked to improve DirectML performance to take full advantage of RTX hardware.

On May 24, we’ll release our latest optimizations in Release 532.03 drivers that combine with Olive-optimized models to deliver big boosts in AI performance. Using an Olive-optimized version of the Stable Diffusion text-to-image generator with the popular Automatic1111 distribution, performance is improved over 2x with the new driver.

Chart showing performance improvements in Stable Diffusion with updated NVIDIA drivers.
Stable Diffusion performance tested on GeForce RTX 4090 using Automatic1111 and Text-to-Image function.

With AI coming to nearly every Windows application, efficiently delivering inference performance is critical — especially for laptops. Coming soon, NVIDIA will introduce new Max-Q low-power inferencing for AI-only workloads on RTX GPUs. It optimizes Tensor Core performance while keeping power consumption of the GPU as low as possible, extending battery life and maintaining a cool, quiet system. The GPU can then dynamically scale up for maximum AI performance when the workload demands it.

Join the PC AI Revolution Now

Top software developers — like Adobe, DxO, ON1 and Topaz — have already incorporated NVIDIA AI technology with more than 400 Windows applications and games optimized for RTX Tensor Cores.

“AI, machine learning and deep learning power all Adobe applications and drive the future of creativity. Working with NVIDIA we continuously optimize AI model performance to deliver the best possible experience for our Windows users on RTX GPUs.” — Ely Greenfield, CTO of digital media at Adobe

“NVIDIA is helping to optimize our WinML model performance on RTX GPUs, which is accelerating the AI in DxO DeepPRIME, as well as providing better denoising and demosaicing, faster.” — Renaud Capolunghi, senior vice president of engineering at DxO

“Working with NVIDIA and Microsoft to accelerate our AI models running in Windows on RTX GPUs is providing a huge benefit to our audience. We’re already seeing 1.5x performance gains in our suite of AI-powered photography editing software.” — Dan Harlacher, vice president of products at ON1

“Our extensive work with NVIDIA has led to improvements across our suite of photo- and video-editing applications. With RTX GPUs, AI performance has improved drastically, enhancing the experience for users on Windows PCs.” — Suraj Raghuraman, head of AI engine development at Topaz Labs

NVIDIA and Microsoft are making several resources available for developers to test drive top generative AI models on Windows PCs. An Olive-optimized version of the Dolly 2.0 large language model is available on Hugging Face. And a PC-optimized version of NVIDIA NeMo large language model for conversational AI is coming soon to Hugging Face.

Developers can also learn how to optimize their applications end-to-end to take full advantage of GPU-acceleration via the NVIDIA AI for accelerating applications developer site.

The complementary technologies behind Microsoft’s Windows platform and NVIDIA’s dynamic AI hardware and software stack will help developers quickly and easily develop and deploy generative AI on Windows 11.

Microsoft Build runs through Thursday, May 25. Tune in to learn more about shaping the future of work with AI.

Read More

No Programmers? No Problem: READY Robotics Simplifies Robot Coding, Rollouts

No Programmers? No Problem: READY Robotics Simplifies Robot Coding, Rollouts

Robotics hardware traditionally requires programmers to deploy it. READY Robotics wants to change that with its “no code” software, aimed at people working in manufacturing who don’t have programming skills.

The Columbus, Ohio, startup is a spinout of robotics research from Johns Hopkins University. Kel Guerin was a PhD candidate there leading this research when he partnered with Benjamin Gibbs, then at Johns Hopkins Technology Ventures, to secure funding and launch the company, now led by Gibbs as CEO.

“There was this a-ha moment where we figured out that we could take these types of visual languages that are very easy to understand and use them for robotics,” said Guerin, who’s now chief innovation officer at the startup.

READY’s “no code” ForgeOS operating system is designed to enable anyone to program any type of robot hardware or automation device. ForgeOS works with plug-ins for most major robot hardware and, like other operating systems such as Android, runs third-party apps and plug-ins, supporting a robust ecosystem of partners and developers working to make robots more capable, says Guerin.

Bringing apps to robotics allows new capabilities to be added to a robotic system in a few clicks, improving user experience and usability. Users can install their own apps, such as Task Canvas, which provides an intuitive building-block programming interface modeled on Scratch, the simple block-based visual language for kids developed at the MIT Media Lab.

Task Canvas allows users to show the actions of the robot, as well as all the other devices in an automation cell (such as grippers, programmable logic controllers, and machine tools) as blocks in a flow chart. The user can easily create powerful logic by tying these blocks together — without writing a single line of code. The interface offers nonprogrammers a more “drag-and-drop” experience for programming and deploying robots, whether working directly on the factory floor with real robots on a tablet device or with access to simulation from Isaac Sim, powered by NVIDIA Omniverse.

 

Robot System Design in Simulation for Real-World Deployments 

READY is making robotics system design easier for nonprogrammers, helping to validate robots and systems for accelerated deployments.

The company is developing Omniverse Extensions — Omniverse kit applications based on Isaac Sim — and can deploy them on the cloud. It uses Omniverse Nucleus — the platform’s database and collaboration engine — in the cloud as well.

Isaac Sim is an application framework that enables simulation training for testing out robots in virtual manufacturing lines before deployment into the real world.

“Bigger companies are moving to a sim-first approach to automation because these systems cost a lot of money to install. They want to simulate them first to make sure it’s worth the investment,” said Guerin.

The startup licenses its platform per software seat and also offers support services to help customers roll out and develop systems.

It’s a huge opportunity: roughly 90 percent of the world’s factories haven’t yet embraced automation, a trillion-dollar market.

READY is a member of NVIDIA Inception, a free program that provides startups with technical training, go-to-market support and AI platform guidance.

From Industrial Automation Giants to Stanley Black & Decker

The startup operates in an ecosystem of world-leading industrial automation providers, and these global partners are actively developing integrations with platforms like NVIDIA Omniverse and are investing in READY, said Guerin.

“Right now we are starting to work with large enterprise customers who want to automate but they can’t find the expertise to do it,” he said.

Stanley Black & Decker, a global supplier of tools, is relying on READY to automate machines, including CNC lathes and mills.

Robotic automation had been hard to deploy in Stanley Black & Decker’s factories until the company started using READY’s ForgeOS with its Station setup, which makes it possible to deploy robots in a day.

Creating Drag-and-Drop Robotic Systems in Simulation 

READY is putting simulation capabilities into the hands of nonprogrammers, who can learn its Task Canvas interface for drag-and-drop programming of industrial robots in about an hour, according to the company.

The company also runs READY Academy, which offers a catalog of free training for manufacturing professionals to learn the skills to design, deploy, manage and troubleshoot robotic automation systems.

“For potential customers interested in our technology, being able to try it out with a robot simulated in Omniverse before they get their hands on the real thing — that’s something we’re really excited about,” said Guerin.

Learn more about NVIDIA Isaac Sim, Jetson Orin, Omniverse Enterprise.

 

Read More

GPT-4 + Stable-Diffusion = ?: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

GPT-4 + Stable-Diffusion = ?: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

TL;DR: Text Prompt -> LLM -> Intermediate Representation (such as an image layout) -> Stable Diffusion -> Image.

Recent advancements in text-to-image generation with diffusion models have yielded remarkable results synthesizing highly realistic and diverse images. However, despite their impressive capabilities, diffusion models, such as Stable Diffusion, often struggle to accurately follow the prompts when spatial or common sense reasoning is required.

The following figure lists four scenarios in which Stable Diffusion falls short in generating images that accurately correspond to the given prompts: negation, numeracy, attribute assignment and spatial relationships. In contrast, our method, LLM-grounded Diffusion (LMD), delivers much better prompt understanding in text-to-image generation in these scenarios.

Figure 1: LLM-grounded Diffusion enhances the prompt understanding ability of text-to-image diffusion models.
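In pseudocode form, the two-stage pipeline from the TL;DR looks roughly like the sketch below; the function names are placeholders rather than the released LMD code:

def lmd_generate(prompt, llm, layout_grounded_diffusion):
    # Stage 1: the LLM turns the text prompt into an intermediate representation,
    # e.g., a list of (object caption, bounding box) pairs plus a background caption.
    layout = llm.generate_layout(prompt)

    # Stage 2: a layout-grounded Stable Diffusion pass renders an image that
    # respects the objects, counts and spatial relations encoded in the layout.
    return layout_grounded_diffusion(prompt, layout)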