April 2024 – Vedere AI

Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center

As of April 30, 2024 Amazon Q Business is generally available. Amazon Q Business is a conversational assistant powered by generative artificial intelligence (AI) that enhances workforce productivity by answering questions and completing tasks based on information in your enterprise systems. Your employees can access enterprise content securely and privately using web applications built with Amazon Q Business. The success of these applications depends on two key factors: first, that an end-user of the application is only able to see responses generated from documents they have been granted access to, and second, that each user’s conversation history is private, secure, and accessible only to the user.

Amazon Q Business operationalizes this by validating the identity of the user every time they access the application so that the application can use the end-user’s identity to restrict tasks and answers to documents that the user has access to. This outcome is achieved with a combination of AWS IAM Identity Center and Amazon Q Business. IAM Identity Center stores the user identity, is the authoritative source of identity information for Amazon Q Business applications, and validates the user’s identity when they access an Amazon Q Business application. You can configure IAM Identity Center to use your enterprise identity provider (IdP)—such as Okta or Microsoft Entra ID—as the identity source. Amazon Q Business makes sure that access control lists (ACLs) for enterprise documents being indexed are matched to the user identities provided by IAM Identity Center, and that these ACLs are honored every time the application calls Amazon Q Business APIs to respond to user queries.

In this post, we show how IAM Identity Center acts as a gateway to steer user identities created by your enterprise IdP as the identity source, for Amazon Q Business, and how Amazon Q Business uses these identities to respond securely and confidentially to user queries. We use an example of a generative AI employee assistant built with Amazon Q Business, demonstrate how to set it up to only respond using enterprise content that each employee has permissions to access, and show how employees are able to converse securely and privately with this assistant.

Solution overview

The following diagram shows a high-level architecture of how the enterprise IdP, IAM Identity Center instance, and Amazon Q Business application interact with each other to enable an authenticated user to securely and privately interact with an Amazon Q Business application using an Amazon Q Business web experience from their web browser.

When using an external IdP such as Okta, users and groups are first provisioned in the IdP and then automatically synchronized with the IAM Identity Center instance using the SCIM protocol. When a user starts the Amazon Q Business web experience, they are authenticated with their IdP using single sign-on, and the tokens obtained from the IdP are used by Amazon Q Business to validate the user with IAM Identity Center. After validation, a chat session is started with the user.

The sample use case in this post uses an IAM Identity Center account instance with its identity source configured as Okta, which is used as the IdP. Then we ingest content from Atlassian Confluence. The Amazon Q Business built-in connector for Confluence ingests the local users and groups configured in Confluence, as well as ACLs for the spaces and documents, to the Amazon Q Business application index. These users from the data source are matched with the users configured in the IAM Identity Center instance, and aliases are created in Amazon Q Business User Store for correct ACL enforcement.

Prerequisites

To implement this solution for the sample use case of this post, you need an IAM Identity Center instance and Okta identity provider as identity source. We provide more information about these resources in this section.

IAM Identity Center instance

An Amazon Q Business application requires an IAM Identity Center instance to be associated with it. There are two types of IAM Identity Center instances: an organization instance and an account instance. Amazon Q Business applications can work with either type of instance. These instances store the user identities that are created by an IdP, as well as the groups to which the users belong.

For production use cases, an IAM Identity Center organization instance is recommended. The advantage of an organization instance is that it can be used by an Amazon Q Business application in any AWS account in AWS Organizations, and you only pay once for a user in your company, if you have multiple Amazon Q Business applications spread across several AWS accounts and you use organization instance. Many AWS enterprise customers use Organizations, and have IAM Identity Center organization instances associated with them.

For proof of concept and departmental use cases, or in situations when an AWS account is not part of an AWS Organization and you don’t want to create a new AWS organization, you can use an IAM Identity Center account instance to enable an Amazon Q Business application. In this case, only the Amazon Q Business application configured in the AWS account in which the account instance is created will be able to use that instance.

Amazon Q Business implements a per-user subscription fee. A user is billed only one time if they are uniquely identifiable across different accounts and different Amazon Q Business applications. For example, if multiple Amazon Q Business applications are within a single AWS account, a user that is uniquely identified by an IAM Identity Center instance tied to this account will only be billed one time for using these applications. If your organization has two accounts, and you have an organization-level IAM Identity Center instance, a user who is uniquely identified in the organization-level instance will be billed only one time even though they access applications in both accounts. However, if you have two account-level IAM Identity Center instances, a user in one account can’t be identified as the same user in another account because there is no central identity. This means that the same user will be billed twice. We therefore recommend using organization-level IAM Identity Center instances for production use cases to optimize costs.

In both these cases, the Amazon Q Business application needs to be in the same AWS Region as the IAM Identity Center instance.

Identity source

If you already use an IdP such as Okta or Entra ID, you can continue to use your preferred IdP with Amazon Q Business applications. In this case, the IAM Identity Center instance is configured to use the IdP as its identity source. The users and user groups from the IdP can be automatically synced to the IAM Identity Center instance using SCIM. Many AWS enterprise customers already have this configured for their IAM Identity Center organization instance. For more information about all the supported IdPs, see Getting started tutorials. The process is similar for IAM Identity Center organization instances and account instances.

AWS IAM Identity Center instance configured with Okta as the identity source

The following screenshot shows the IAM Identity Center application configured in Okta, and the users and groups from the Okta configuration assigned to this application.

The following screenshot shows the IAM Identity Center instance user store after configuring Okta as the identity source. Here the user and group information is automatically provisioned (synchronized) from Okta into IAM Identity Center using the System for Cross-domain Identity Management (SCIM) v2.0 protocol.

Configure an Amazon Q Business application with IAM Identity Center enabled

Complete the following steps to create an Amazon Q Business application and enable IAM Identity Center:

On the Amazon Q Business console, choose Create application.
For Application name, enter a name.
Unless you need to change the AWS Identity and Access Management (IAM) role for the application or customize encryption settings, keep the default settings.
Choose Create.
On the Select retriever page, unless you want to configure a preexisting Amazon Kendra index as a retriever, or you need to configure storage units for more than 20,000 documents, you can continue with the default settings.
Choose Next.

For more information about Amazon Q Business retrievers, refer to Creating and selecting a retriever for an Amazon Q Business application.

On the Connect data sources page, for Data sources, choose Confluence.

The following instructions demonstrate how to configure the Confluence data source. These may differ for other data sources.

For Data source name, enter a name.
For Source¸ select Confluence Cloud.
For Confluence URL, enter the Confluence URL.
For Authentication, select Basic authentication.
For AWS Secrets Manager secret, choose an AWS Secrets Manager secret.
For Virtual Private Cloud, choose No VPC.
For IAM role, choose Create a new service role.
For Role name¸ either go with the provided name or edit it for your new role.
For Sync scope, select the contents to sync.
For Sync mode, select Full sync.
For Frequency, choose Run on demand.
For Field mappings, leave the defaults.
Choose Add data source.
Choose Next.
On the Add groups and users page, choose Add groups and users.
In the pop-up window, choose Get started.
Search for users based on their display name or groups, then choose the user or group you want to add to the application.
Add more users as needed.
Choose Assign.
You will see the following screen:
Choose subscription for each user by clicking on the Choose subscription pull down and then selecting the check mark.
After choosing subscription for all the users, your screen will look as below. Unless you want to change the service role, choose Create application.

After the application is created, you will see the application settings page, as shown in the following screenshot.

Employee AI assistant use case

To illustrate how you can build a secure and private generative AI assistant for your employees using Amazon Q Business applications, let’s take a sample use case of an employee AI assistant in an enterprise corporation. Two new employees, Mateo Jackson and Mary Major, have joined the company on two different projects, and have finished their employee orientation. They have been given corporate laptops, and their accounts are provisioned in the corporate IdP. They have been told to get help from the employee AI assistant for any questions related to their new team member activities and their benefits.

The company uses Confluence to manage their enterprise content. The sample Amazon Q application used to run the scenarios for this post is configured with a data source using the built-in connector for Confluence to index the enterprise Confluence spaces used by employees. The example uses three Confluence spaces: AnyOrgApp Project, ACME Project Space, and AJ-DEMO-HR-SPACE. The access permissions for these spaces are as follows:

AJ-DEMO-HR-SPACE – All employees, including Mateo and Mary
AnyOrgApp Project – Employees assigned to the project including Mateo
ACME Project Space – Employees assigned to the project including Mary

Let’s look at how Mateo and Mary experience their employee AI assistant.

Both are provided with the URL of the employee AI assistant web experience. They use the URL and sign in to the IdP from the browsers of their laptops. Mateo and Mary both want to know about their new team member activities and their fellow team members. They ask the same questions to the employee AI assistant but get different responses, because each has access to separate projects. In the following screenshots, the browser window on the left is for Mateo Jackson and the one on the right is for Mary Major. Mateo gets information about the AnyOrgApp project and Mary gets information about the ACME project.

Mateo chooses Sources under the question about team members to take a closer look at the team member information, and Mary choosing Sources under the question for new team member onboarding activities. The following screenshots show their updated views.

Mateo and Mary want to find out more about the benefits their new job offers and how the benefits are applicable to their personal and family situations.

The following screenshot shows that Mary asks the employee AI assistant questions about her benefits and eligibility.

Mary can also refer to the source documents.

The following screenshot shows that Mateo asks the employee AI assistant different questions about his eligibility.

Mateo looks at the following source documents.

Both Mary and Mateo first want to know their eligibility for benefits. But after that, they have different questions to ask. Even though the benefits-related documents are accessible by both Mary and Mateo, their conversations with employee AI assistant are private and personal. The assurance that their conversation history is private and can’t be seen by any other user is critical for the success of a generative AI employee productivity assistant.

Clean up

If you created a new Amazon Q Business application to try out the integration with IAM Identity Center, and don’t plan to use it further, unsubscribe and remove assigned users from the application and delete it so that your AWS account does not accumulate costs.

To unsubscribe and remove users go to the application details page and select Manage access and subscriptions.

Select all the users, and then use the Edit button to choose Unsubscribe and remove as shown below.

Delete the application after removing the users, going back to the application details page and selecting Delete.

Conclusion

For enterprise generative AI assistants such as the one shown in this post to be successful, they must respect access control as well as assure the privacy and confidentiality of every employee. Amazon Q Business and IAM Identity Center provide a solution that authenticates each user and validates the user identity at each step to enforce access control along with privacy and confidentiality.

To achieve this, IAM Identity Center acts as a gateway to sync user and group identities from an IdP (such as Okta), and Amazon Q Business uses IAM Identity Center-provided identities to uniquely identify a user of an Amazon Q Business application (in this case, an employee AI assistant). Document ACLs and local users set up in the data source (such as Confluence) are matched up with the user and group identities provided by IAM Identity Center. At query time, Amazon Q Business answers questions from users utilizing only those documents that they are provided access to by the document ACLs.

If you want to know more, take a look at the Amazon Q Business launch blog post on AWS News Blog, and refer to Amazon Q Business User Guide. For more information on IAM Identity Center, refer to the AWS IAM Identity Center User Guide.

About the Authors

Abhinav Jawadekar is a Principal Solutions Architect in the Amazon Q Business service team at AWS. Abhinav works with AWS customers and partners to help them build generative AI solutions on AWS.

Venky Nagapudi is a Senior Manager of Product Management for Q Business, Amazon Comprehend and Amazon Translate. His focus areas on Q Business include user identity management, and using offline intelligence from documents to improve Q Business accuracy and helpfulness.

Enhance customer service efficiency with AI-powered summarization using Amazon Transcribe Call Analytics

In the fast-paced world of customer service, efficiency and accuracy are paramount. After each call, contact center agents often spend up to a third of the total call time summarizing the customer conversation. Additionally, manual summarization can lead to inconsistencies in the style and level of detail due to varying interpretations of note-taking guidelines. This post-contact work can not only add to customer wait times, but also can put pressure on some agents to avoid taking notes altogether. Supervisors also spend a considerable amount of time listening to call recordings or reading transcripts to understand the gist of a customer conversation when investigating customer issues or evaluating an agent’s performance. This can make it challenging to scale quality management within the contact center.

To address these issues, we launched a generative artificial intelligence (AI) call summarization feature in Amazon Transcribe Call Analytics. Transcribe Call Analytics is a generative AI-powered API for generating highly accurate call transcripts and extracting conversation insights to improve customer experience, agent productivity, and supervisor productivity. Powered by Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) through a single API, generative call summarization in Transcribe Call Analytics produces call summaries that reduce the time agents spend capturing and summarizing notes after each conversation. This reduces customer wait times and improves agent productivity. Generative call summarization also provides supervisors with quick insight into a conversation without the need to listen to the entire call recording or read the entire transcript.

As Praphul Kumar, Chief Product Officer at SuccessKPI, noted,

“Generative call summarization in the Amazon Transcribe Call Analytics API has enabled us to add generative AI capabilities to our platform faster. With this feature, we are able to improve productivity in our customer’s contact center by automatically summarizing calls and removing the need for agents to write after call notes. We are looking forward to bringing this valuable capability into the hands of many more large enterprises.”

We previously published Use generative AI to increase agent productivity through automated call summarization. This new generative call summarization feature automatically integrates with multiple services and handles necessary configurations, making it simple and seamless to start using and realizing the benefits. You don’t need to manually integrate with services or perform additional configurations. Simply turn the feature on from the Amazon Transcribe console or using the start_call_analytics_job API. You can also use generative call summarization through Amazon Transcribe Post Call Analytics Solution for post-call summaries.

In this post, we show you how to use the new generative call summarization feature.

Solution overview

The following diagram illustrates the solution architecture.

You can upload a call recording in Amazon S3 and start a Transcribe Call Analytics job. The summary is generated and uploaded back to S3 along with the transcript and analytics as a single JSON.

We show you how to use the generative call summarization feature with a call sample inquiring about a used car through the following high-level steps:

Create a new Post Call Analytics job and turn on the generative call summarization feature.
Review the generative call summarization results.

Prerequisites

To get started, upload your recorded file or the sample file provided to an Amazon Simple Storage Service (Amazon S3) bucket.

Create a new Post call analytics job

Complete the following steps to create a new Post call analytics job:

On the Amazon Transcribe console, choose Post-call Analytics in the navigation pane under Amazon Transcribe Call Analytics.
Choose Create job.
For Name, enter summarysample.
In the Language settings and Model type sections, leave the default settings.
For Input file location on S3, browse to the S3 bucket containing the uploaded audio file and choose Choose.
In the Output data section, leave as default.
Create a new AWS Identity and Access Management (IAM) role named summarysamplerole that provides Amazon Transcribe service permissions to read the audio files from the S3 bucket.
In the Role permissions details section, leave as default and choose Next.
Toggle Generative call summarization on and choose Create job.

Review the transcription and summary

When the status of the job is Complete, you can review the transcription and summary by choosing the job name summarysample. The Text tab shows the Agent and Customer sentences clearly separated.

The Generative call summarization tab provides a concise summary of the call.

Choose Download transcript for the JSON output containing the transcript and summary.

Conclusion

The world of customer service is constantly evolving, and organizations must adapt to meet the growing demands of their clients. Amazon Transcribe Call Analytics introduces an innovative solution to streamline the post-call process and enhance productivity. With generative call summarization, contact center agents can devote more time to engage with customers, and supervisors can gain insights quickly without extensive call reviews. This feature improves efficiency and empowers enterprises to scale their quality management efforts, enabling them to deliver exceptional customer experiences.

Generative call summarization in Amazon Transcribe Call Analytics is generally available today in English in US East (N. Virginia) and US West (Oregon). We invite you to share your thoughts and questions in the comments section.

Learn more:

About the Authors

Ami Dani is a Senior Technical Program Manager at AWS focusing on AI/ML services. During her career, she has focused on delivering transformative software development projects for the federal government and large companies in industries as diverse as advertising, entertainment, and finance. Ami has experience driving business growth, implementing innovative training programs and successfully managing complex, high-impact projects. She is a strategic problem-solver and collaborative partner, consistently delivering results that exceed expectations.

Gopikrishnan Anilkumar is a Senior Technical Product Manager on the Amazon Transcribe team. He has 10 years of product management experience across a variety of domains and is passionate about AI/ML. Outside of work, Gopikrishnan loves to travel and enjoys playing cricket.

Evaluating the helpfulness of AI-enhanced catalogue data

Using causal random forests and Bayesian structural time series to extrapolate from sparse data ensures that customers get the most useful information as soon as possible.Read More

Accelerate software development and leverage your business data with generative AI assistance from Amazon Q

We believe generative artificial intelligence (AI) has the potential to transform virtually every customer experience. To make this possible, we’re rapidly innovating to provide the most comprehensive set of capabilities across the three layers of the generative AI stack. This includes the bottom layer with infrastructure to train Large Language Models (LLMs) and other Foundation Models (FMs) and produce inferences or predictions, the middle layer with tools to easily and rapidly build generative AI applications, and the top-layer where we’re investing in game-changing applications. While all of these layers are important for the advancement of generative AI, I’m excited today to share more on our investments in the top application layer.

With the assistance of generative AI, everyone from developers and business analysts to employees in specialized areas like customer service or supply chain operations, can be more productive, creative, and data-driven than ever before. But for generative AI apps and assistants to be truly useful at work, they must know an organization’s data, their customers, their operations, and their business. Many of today’s assistants can’t be easily personalized and they weren’t designed to meet the data privacy and security requirements companies need.

That’s why we invented Amazon Q, the most capable generative AI-powered assistant for accelerating software development and leveraging business data. I’m excited to share that Amazon Q Developer, Amazon Q Business, and Amazon Q in QuickSight are available today along with several new features. You can see a quick breakdown of today’s announcement and demos in this video from Dr. Matt Wood.

Amazon Q is the most capable work assistant available today, and we built it with security and privacy in mind from the start- if an employee can’t access a data source normally, they can’t access it through Q either.

We are bringing you a ton of great capabilities and experiences today, and I thought I’d call out a just a few.

Amazon Q Developer – your assistant for the entire software development lifecycle

Amazon Q Developer helps developers and IT professionals (IT pros) with all of their tasks—from coding, testing, and upgrading, to troubleshooting, performing security scanning and fixes, optimizing AWS resources, and creating data engineering pipelines. Customers are already seeing the value in Amazon Q Developer: Eviden, a digital transformation services company, is experiencing 20-40% in productivity gains; Switchboard MD, a healthcare company, has reduced their time to deploy new features for products by 25%; and Datapel Systems, a warehouse management and inventory stock solutions company, is achieving a remarkable efficiency improvement of at least 70%.

Amazon Q Developer helps developers build faster and more securely by generating code suggestions and recommendations in near real time. In fact, Amazon Q Developer has the highest reported code acceptance rates in the industry for assistants that perform multi-line code suggestions, with BT Group recently reporting they accepted 37% of Q’s code suggestions and National Australia Bank reporting a 50% acceptance rate. Previously, some of these coding assistant capabilities were provided by Amazon CodeWhisperer, and are now part of Amazon Q Developer.

Amazon Q Developer agent capabilities can autonomously perform a range of tasks–everything from implementing features, documenting, and refactoring code, to performing software upgrades. You can ask Amazon Q Developer to add a new checkout feature to your e-commerce app, and it will analyze your existing codebase, map out the implementation plan spanning multiple files, and upon your approval, execute all the required code changes and tests in minutes. Check out this feature in action, Implement an API with Amazon Q Developer Agent for Software Development. Carrying out these tasks, the agent for software development achieved the highest scores of 13.4% on the SWE-Bench Leaderboard and 20.5% on the SWE-bench Leaderboard (Lite), a dataset that benchmarks coding capabilities. These updates will be available to customers in the coming days.

Q can also automate app upgrades, reducing days of work to minutes. Recently, a five-person Amazon team used the Amazon Q Code Transformation agent to upgrade over 1,000 production applications from Java 8 to Java 17 in just two days (the average time per application was less than 10 minutes), saving months of time and improving application performance. Today, Amazon Q performs Java language upgrades, and cross-platform .NET upgrades are coming soon to accelerate migrations to Linux saving customers millions in licensing fees. Check out the code transformation agent in action, Upgrade a Java App with Amazon Q Developer Agent for Code Transformation.

Starting today, you can also ask Q Developer questions about your AWS account like “What instances are currently running in US East 1?” or “What’s my S3 bucket encryption?” or “What were my EC2 costs by region last month?” and Amazon Q Developer will list the resources and details, in a summarized answer with links to learn more.

Learn more about Amazon Q Developer at the AWS News Blog.

Amazon Q Business empowers employees to be more creative, data-driven, efficient, prepared, and productive

Our vision for Amazon Q Business is to make the power of generative AI accessible to every business to get insights from all their data (unstructured and structured), take actions, and build applications.

Most companies have troves of valuable data that is hard to access and parse through. With Amazon Q Business, employees can get answers to questions across business data such as company policies, product information, business results, code base, people, and many other topics by connecting to enterprise data repositories to summarize the data logically, analyze trends, and engage in dialog about the data. To make this possible, Amazon Q Business has more built-in, managed and secure data connectors to connect your enterprise data than any other generative AI assistant. This includes commonly used business tools, such as wikis, intranets, Atlassian, Gmail, Microsoft Exchange, Salesforce, ServiceNow, Slack, and Amazon Simple Storage Service (Amazon S3). You can even build custom plugins enabling Amazon Q Business to directly take actions like submitting time-off requests. Need the latest sales figures summarized? Looking for competitive intel on a prospective client? Amazon Q Business’s advanced language capabilities will quickly synthesize relevant info from scattered documents, databases and chat logs, into a coherent response.

One of the most exciting announcements today is a feature of Amazon Q Business called Amazon Q Apps. Amazon Q Apps helps every employee go from conversation to building generative AI-powered app in seconds, making it much easier to streamline and automate daily tasks. Creating an application with Amazon Q Apps is straightforward—employees can describe the type of app they want in natural language, or just tell Amazon Q Apps to do it from a conversation where Amazon Q helped solve a problem. For instance, a marketer could ask Q Apps to create an app that generates compelling customer stories by just inputting the customer’s name, products used, business challenge, and business impact. In seconds, Q creates the app that is then shareable to other marketers throughout the organization. heck out more examples at Introducing Amazon Q Apps (Preview).

Recently, I sat down with Praerit Garg, President of Product & Innovation at Smartsheet, to discuss how Amazon Q Business is helping their employees get answers faster and ultimately improve productivity, as well as onboard new hires more quickly.

Learn more about Amazon Q Business at the Amazon Machine Learning Blog.

Generative BI allows analysts to build detailed dashboards in minutes and business users to get insights fast

Historically, businesses have stored large amounts of their valuable structured data in their databases and data warehouses, which are usually accessible only through business intelligence (BI) tools. When business executives needed to extract information from the data, they had to rely on over-taxed business analysts to build dashboards, which could often take days or weeks. Even when dashboards were created it was difficult to extract and share important insights from these dashboards. Now, Amazon Q brings its advanced generative AI technology to Amazon QuickSight, AWS’s unified BI service built for the cloud. With Amazon Q in QuickSight, customers get a generative BI assistant that allows business analysts to use natural language to reduce the time to build a BI dashboard from hours to minutes. And it helps everyone in an organization become more data-driven by making data more accessible. It is the only BI product where business users can get AI-driven executive summaries of dashboards, ask questions of data beyond what is presented in the dashboards–for instant answers, and create detailed and customizable data stories highlighting key insights, trends, and drivers. All you have to do is say what you want in natural language.

Showpad, a leading provider of sales enablement solutions, is able to give customers the ability to query data without the need for a complex user interface or the need-to-know SQL. Integration took just a little over a week, and Showpad was able to customize the experience so it blends seamlessly into their own experience.

Clinigence Health is leveraging Amazon Q in QuickSight to identify insights and trends within data in minutes, a process that previously took hours.

See more about Amazon Q in QuickSight at the Business Intelligence Blog.

New pricing including Amazon Q Developer Free Tier

Along with these new features, we’re also announcing new pricing tiers that make it even easier to use Amazon Q. We are offering an Amazon Q Developer Free Tier, which provides free coding to individuals in the IDE and command line, as well as free limited usage of most advanced capabilities, like Amazon Q Developer Agents. For customers who want organizational license management, the ability to customize Amazon Q Developer to their codebase for more relevant coding suggestions, as well as higher limits on advanced capabilities, we are offering the Amazon Q Developer Pro tier for $19 per user per month. The Amazon Q Business Pro subscription at $20 per user/per month provides users access to the full suite of Amazon Q Business capabilities, including access to Amazon Q Apps and Amazon Q in QuickSight (Reader Pro). Finally, you can access a free trial of Amazon Q Developer Pro and Amazon Q Business Pro until June 30, 2024.

We’re excited about bringing you these new capabilities across Amazon Q Developer, Amazon Q Business, and Amazon Q in QuickSight. We are just getting started in helping you be more productive, better leverage your organizational data, and create new ways of working.

Check out the following resources to learn more about this announcement:

Visit community.aws to find deep-dive technical content and to discover how our builder communities are using Amazon Q in their solutions
Learn more about Generative AI on AWS
Amazon Q Introduction (15-minute course)
Amazon Q Business Getting Started (one-hour course that introduces developers and technical audiences to Amazon Q Business’s features and use cases, and teaches how to build a chatbot using Amazon Q)

About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, and visualize, and predict.

Amazon Q Business and Amazon Q in QuickSight empowers employees to be more data-driven and make better, faster decisions using company knowledge

Today, we announced the General Availability of Amazon Q, the most capable generative AI powered assistant for accelerating software development and leveraging companies’ internal data. “During the preview, early indications signaled Amazon Q could help our customers’ employees become more than 80% more productive at their jobs; and with the new features we’re planning on introducing in the future, we think this will only continue to grow,” shared Dr. Swami Sivasubramanian, vice president of Artificial Intelligence and Data at AWS. Employees across every organization collectively spend hours every week searching internal sources for information, piecing together analyses, writing reports, building presentations, creating and searching for insights in dashboards, or adapting content for different customers or audiences. We built Amazon Q Business and Amazon Q in QuickSight to make this much simpler.

Amazon Q Business is a generative AI–powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.

Amazon Q Business unites more data sources than any other generative AI assistant available today

Amazon Q Business easily and securely connects to 40+ commonly used business tools, such as wikis, intranets, Atlassian, Gmail, Microsoft Exchange, Salesforce, ServiceNow, Slack, and Amazon Simple Storage Service (Amazon S3)–more than any other generative AI assistant available today. Simply point Q at your enterprise data repositories, and it will search all of your data, summarize logically, analyze trends, and engage in dialog with end users about the data. This helps business users to access all of their data, no matter where it resides in their organization. Watch use cases of Amazon Q Business through its simple web base interface.

Built from the ground up with security and privacy in mind

Amazon Q Business is built to be secure and private by design. It seamlessly integrates with a customer’s existing identities, roles, and access permissions to personalize the interactions for each individual user, while maintaining the highest levels of security. It generates accurate responses based on enterprise information, and customers can restrict sensitive topics, block keywords, and filter out inappropriate content and does not use customer content to train the underlying model for anybody else. If you want to learn more about how to set up and administer Q Business, check out the News Blog: Amazon Q Business.

Generative BI allows analysts and business users to build detailed dashboards in minutes

Amazon QuickSight is AWS’s unified Business Intelligence (BI) service built for the cloud. With Amazon Q in QuickSight, customers get a Generative BI assistant that allows business analysts to use natural language to build BI dashboards in minutes and easily create visualizations and complex calculations. It is also the only BI product where business users can get AI-driven executive summaries of dashboards, ask questions of data beyond what is presented in the dashboards–for instant answers, and create detailed and customizable data stories highlighting key insights, trends, and drivers. Business users can ask to “build a story about how the business has changed over the last month for a business review with leadership” and in seconds Amazon Q creates a story in multiple parts explaining different aspects of their data with specific insights and supporting visuals, including specific ideas of how to improve the business. Users can choose to layout content in an easy to share document or presentation where they can customize text, images, and themes, and use Amazon Q to rewrite and improve the text. You can read more about all the updates to Amazon QuickSight at the AWS Business Intelligence Blog, and watch the Unlock the power of Generative BI with Amazon Q in QuickSight on creating and sharing content based on your own data.

First-of-its-kind capability that helps every employee go from conversation to generative AI-powered app in seconds

Today, we announced a new capability of Amazon Q Business, called Amazon Q Apps (in preview) that allows employees to easily and quickly create generative AI-powered apps based on their company data, without requiring any prior coding experience. With Amazon Q Apps, employees simply describe the app they want, in natural language, or they can take an existing conversation where Amazon Q Business helped them solve a problem and, with one click, Amazon Q will instantly generate an app that accomplishes their desired task that can be easily shared across their organization.

For example, generating employee onboarding plans for new recruits can be a long and laborious process. They require many hours of searching through different data stores and documents to find the appropriate content for the new employee and oftentimes the content is out of date or not specific enough to their role. With Amazon Q, an HR professional can simply describe an app that could pull together a personalized onboarding plan for a new employee, simply by inputting their name and employee ID. In a matter of seconds, Amazon Q Apps will build an app that can automatically generate a personalized onboarding plan tailored to the employee, their role, and the department using the latest data. The HR professional can then share the app with hiring managers across the company to instantly build personalized onboarding plans for their own teams. Now, with Amazon Q Apps, business users can easily, quickly, and securely build an app based on enterprise information to improve their work productivity. Watch the Introducing Amazon Q Apps to see how easy it is to implement.

Bani Bedi, senior vice president, Corporate Development and Strategy at Smartsheet, said:

“Amazon Q Business is streamlining knowledge management and accelerating employee productivity at Smartsheet. Previously, it was too difficult for our 3,300 employees to find the information they needed across public help documents, training courses, and hundreds of all-employee Slack help channels. We have consolidated our organizational knowledge into a single AI engine to give our workforce immediate answers, significantly boosting employee productivity.”

You can hear more in the interview AWS Fireside Chat with Smartsheet.

We’re really excited to share Amazon Q Business and Amazon Q in QuickSight with you. If you want more information on generative AI at AWS, you can find it AWS Generative AI.

About the Authors

Mukesh Karki is GM of Amazon Q Business.

Tracy Daugherty is GM of Amazon Quicksight.

ExecuTorch Alpha: Taking LLMs and AI to the Edge with Our Community and Partners

We are excited to announce the release of ExecuTorch alpha, focused on deploying large language models (LLMs) and large ML models to the edge, stabilizing the API surface, and improving our installation processes. It has been an exciting few months from our 0.1 (preview) release in collaboration with our partners at Arm, Apple, and Qualcomm Technologies, Inc.

In this post we’ll discuss our full support for Meta’s Llama 2, early support for Meta’s Llama 3, broad model support in ExecuTorch, and highlight the important work our partners have done to move us forward.

Large Language Models on Mobile

Mobile devices are highly constrained for compute, memory, and power. To bring LLMs to these devices, we heavily leverage quantization and other techniques to pack these models appropriately.

ExecuTorch alpha supports 4-bit post-training quantization using GPTQ. We’ve provided broad device support on CPU by landing dynamic shape support and new dtypes in XNNPack. We’ve also made significant improvements in export and lowering, reduced memory overhead and improved runtime performance. This enables running Llama 2 7B efficiently on iPhone 15 Pro, iPhone 15 Pro Max, Samsung Galaxy S22, S23, and S24 phones and other edge devices. Early support for Llama 3 8B is also included. We are always improving the token/sec on various edge devices and you can visit GitHub for the latest performance numbers.

We’re working closely with our partners at Apple, Arm, and Qualcomm Technologies to delegate to GPU and NPU for performance through Core ML, MPS, TOSA, and Qualcomm AI Stack backends respectively.

Supported Models

We remain committed to supporting an ever-expanding list of models with ExecuTorch. Since preview, we have significantly expanded our tested models across NLP, vision and speech, with full details in our release notes. Although support for on-device LLMs is early, we anticipate most traditional models to function seamlessly out of the box, with delegation to XNNPACK, Core ML, MPS, TOSA, and HTP for performance. If you encounter any problems please open a GitHub issue with us.

Productivity

Deploying performant models tuned for specific platforms often require deep visualization into the on-device runtime data to determine the right changes to make in the original PyTorch model. With ExecuTorch alpha, we provide a powerful SDK with observability throughout the process from model authoring to deployment, including delegate and hardware-level information.

The ExecuTorch SDK was enhanced to include better debugging and profiling tools. Because ExecuTorch is built on PyTorch, the debugging capabilities include the ability to map from operator nodes back to original Python source code for more efficient anomaly resolution and performance tuning for both delegated and non-delegated model instances. You can learn more about the ExecuTorch SDK here.

Partnerships

ExecuTorch has only been possible because of strong collaborations across Arm, Apple, and Qualcomm Technologies. The collaboration for the initial launch of ExecuTorch continues as we support LLMs and large AI models on the edge for PyTorch. As we’ve seen with this early work for ExecuTorch alpha, there are unique challenges with these larger models and we’re excited to develop in the open.

We also want to highlight the great partnership with Google on XNNPACK for CPU performance. The teams continue to work together upstreaming our changes and across the TensorFlow and PyTorch teams to make sure we can all support generative AI models on the edge with SOTA performance.

Lastly, our hardware partner MediaTek has been doing work enabling the Llama collection of models with ExecuTorch on their SoCs. We’ll have more to share in the future.

Alpha and Production Usage

With our alpha release, we have production-tested ExecuTorch. Meta is using ExecuTorch for hand tracking on Meta Quest 3 and a variety of models on Ray-Ban Meta Smart Glasses. In addition, we have begun the rollout of ExecuTorch with Instagram and are integrating with other Meta products. We are excited to see how ExecuTorch can be used for other edge experiences.

Community

We are excited to see various efforts in the community to adopt or contribute to ExecuTorch. For instance, Unity recently shared their work at the Game Developers Conference (GDC) on leveraging ExecuTorch and Edge IR to run PyTorch models with their neural network inference library Sentis. Leveraging ExecuTorch’s hackability and extensibility, Unity introduced their own custom backend that serializes ExecuTorch’s Edge Dialect IR into Sentis’ native serialized format enabling developers to begin using PyTorch models easily in their games and apps.

We’ve been building and innovating with ExecuTorch in the open. Our north star is to empower the community to deploy any ML model on edge devices painlessly and efficiently. Whether you are a hobbyist or this is your day job, we’d love for you to jump in to bring your ML models to the edge. We are looking for your help to:

Use ExecuTorch to run your LLM models locally on various deployment targets and share your feedback
Expand our supported models, including bug reports
Expand our quantization schemes
Help us build out delegates to GPU and NPU

To all individual contributors and early adopters of ExecuTorch, a big thank you as well. We can’t wait to have more of you join us!

Microsoft at ASPLOS 2024: Advancing hardware and software for high-scale, secure, and efficient modern applications

Modern computer systems and applications, with unprecedented scale, complexity, and security needs, require careful co-design and co-evolution of hardware and software. The ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (opens in new tab), is the main forum where researchers bridge the gap between architecture, programming languages, and operating systems to advance the state of the art.

ASPLOS 2024 is taking place in San Diego between April 27 and May 1, and Microsoft researchers and collaborators have a strong presence, with members of our team taking on key roles in organizing the event. This includes participation in the program and external review committees and leadership as the program co-chair.

We are pleased to share that eight papers from Microsoft researchers and their collaborators have been accepted to the conference, spanning a broad spectrum of topics. In the field of AI and deep learning, subjects include power and frequency management for GPUs and LLMs, the use of Process-in-Memory for deep learning, and instrumentation frameworks. Regarding infrastructure, topics include memory safety with CHERI, I/O prefetching in modern storage, and smart oversubscription of burstable virtual machines. This post highlights some of this work.

Paper highlights

Characterizing Power Management Opportunities for LLMs in the Cloud

The rising popularity of LLMs and generative AI has led to an unprecedented demand for GPUs. However, the availability of power is a key limiting factor in expanding a GPU fleet. This paper characterizes the power usage in LLM clusters, examines the power consumption patterns across multiple LLMs, and identifies the differences between inference and training power consumption patterns. This investigation reveals that the average and peak power consumption in inference clusters is not very high, and that there is substantial headroom for power oversubscription. Consequently, the authors propose POLCA: a framework for power oversubscription that is robust, reliable, and readily deployable for GPU clusters. It can deploy 30% more servers in the same GPU clusters for inference tasks, with minimal performance degradation.

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization

PIM-DL is the first deep learning framework specifically designed for off-the-shelf processing-in-memory (PIM) systems, capable of offloading most computations in neural networks. Its goal is to surmount the computational limitations of PIM hardware by replacing traditional compute-heavy matrix multiplication operations with Lookup Tables (LUTs). PIM-DL first enables neural networks to operate efficiently on PIM architectures, significantly reducing the need for complex arithmetic operations. PIM-DL demonstrates significant speed improvements, achieving up to ~37x faster performance than traditional GEMM-based systems and showing competitive speedups against CPUs and GPUs.

Cornucopia Reloaded: Load Barriers for CHERI Heap Temporal Safety

Memory safety bugs have persistently plagued software for over 50 years and underpin some 70% of common vulnerabilities and exposures (CVEs) every year. The CHERI capability architecture (opens in new tab) is an emerging technology (opens in new tab) (especially through Arm’s Morello (opens in new tab) and Microsoft’s CHERIoT (opens in new tab) platforms) for spatial memory safety and software compartmentalization. In this paper, the authors demonstrate the viability of object-granularity heap temporal safety built atop CHERI with considerably lower overheads than prior work.

AUDIBLE: A Convolution-Based Resource Allocator for Oversubscribing Burstable Virtual Machines

Burstable virtual machines (BVMs) are a type of virtual machine in the cloud that allows temporary increases in resource allocation. This paper shows how to oversubscribe BVMs. It first studies the characteristics of BVMs on Microsoft Azure and explains why traditional approaches based on using a fixed oversubscription ratio or based on the Central Limit Theorem do not work well for BVMs: they lead to either low utilization or high server capacity violation rates. Based on the lessons learned from the workload study, the authors developed a new approach, called AUDIBLE, using a nonparametric statistical model. This makes the approach lightweight and workload independent. This study shows that AUDIBLE achieves high system utilization while enforcing stringent requirements on server capacity violations.

Complete list of accepted publications by Microsoft researchers

Amanda: Unified Instrumentation Framework for Deep Neural Networks
Yue Guan, Yuxian Qiu, and Jingwen Leng; Fan Yang, Microsoft Research; Shuo Yu, Shanghai Jiao Tong University; Yunxin Liu, Tsinghua University; Yu Feng and Yuhao Zhu, University of Rochester; Lidong Zhou, Microsoft Research; Yun Liang, Peking University; Chen Zhang, Chao Li, and Minyi Guo, Shanghai Jiao Tong University

AUDIBLE: A Convolution-Based Resource Allocator for Oversubscribing Burstable Virtual Machines
Seyedali Jokar Jandaghi and Kaveh Mahdaviani, University of Toronto; Amirhossein Mirhosseini, University of Michigan; Sameh Elnikety, Microsoft Research; Cristiana Amza and Bianca Schroeder, University of Toronto, Cristiana Amza and Bianca Schroeder, University of Toronto

Characterizing Power Management Opportunities for LLMs in the Cloud
(opens in new tab)Pratyush Patel, Microsoft Azure and University of Washington; Esha Choukse (opens in new tab), Chaojie Zhang (opens in new tab), and Íñigo Goiri (opens in new tab), Azure Research; Brijesh Warrier (opens in new tab), Nithish Mahalingam, Ricardo Bianchini (opens in new tab), Microsoft AzureResearch

Cornucopia Reloaded: Load Barriers for CHERI Heap Temporal Safety
Nathaniel Wesley Filardo, University of Cambridge and Microsoft Research; Brett F. Gutstein, Jonathan Woodruff, Jessica Clarke, and Peter Rugg, University of Cambridge; Brooks Davis, SRI International; Mark Johnston, University of Cambridge; Robert Norton, Microsoft Research; David Chisnall, SCI Semiconductor; Simon W. Moore, University of Cambridge; Peter G. Neumann, SRI International; Robert N. M. Watson, University of Cambridge

CrossPrefetch: Accelerating I/O Prefetching for Modern Storage
Shaleen Garg and Jian Zhang, Rutgers University; Rekha Pitchumani, Samsung; Manish Parashar, University of Utah; Bing Xie, Microsoft; Sudarsun Kannan, Rutgers University

Kimbap: A Node-Property Map System for Distributed Graph Analytics
Hochan Lee, University of Texas at Austin; Roshan Dathathri, Microsoft Research; Keshav Pingali, University of Texas at Austin

PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
Cong Li and Zhe Zhou, Peking University; Yang Wang, Microsoft Research; Fan Yang, Nankai University; Ting Cao and Mao Yang, Microsoft Research; Yun Liang and Guangyu Sun, Peking University

Predict; Don’t React for Enabling Efficient Fine-Grain DVFS in GPUs
Srikant Bharadwaj, Microsoft Research; Shomit Das, Qualcomm; Kaushik Mazumdar and Bradford M. Beckmann, AMD; Stephen Kosonocky, Uhnder

Conference organizers from Microsoft

Career opportunities

Microsoft welcomes talented individuals across various roles at Microsoft Research, Azure Research, and other departments. We are always pushing the boundaries of computer systems to improve the scale, efficiency, and security of all our offerings. You can review our open research-related positions here.

The post Microsoft at ASPLOS 2024: Advancing hardware and software for high-scale, secure, and efficient modern applications appeared first on Microsoft Research.

Develop and train large models cost-efficiently with Metaflow and AWS Trainium

This is a guest post co-authored with Ville Tuulos (Co-founder and CEO) and Eddie Mattia (Data Scientist) of Outerbounds.

To build a production-grade AI system today (for example, to do multilingual sentiment analysis of customer support conversations), what are the primary technical challenges? Historically, natural language processing (NLP) would be a primary research and development expense. In 2024, however, organizations are using large language models (LLMs), which require relatively little focus on NLP, shifting research and development from modeling to the infrastructure needed to support LLM workflows.

For AWS and Outerbounds customers, the goal is to build a differentiated machine learning and artificial intelligence (ML/AI) system and reliably improve it over time. This often means the method of using a third-party LLM API won’t do for security, control, and scale reasons. Owning the infrastructural control and knowhow to run workflows that power AI systems is a requirement.

Returning to the original question, three MLOps challenges may arise:

You need high-quality data to train and fine-tune models
You need a diverse cloud infrastructure for experimentation, training, tracking, and orchestrating the production system
You need a significant amount of compute to power the system

In this post, we highlight a collaboration between Outerbounds and AWS that takes a step towards addressing the last two challenges. First, the AWS Trainium accelerator provides a high-performance, cost-effective, and readily available solution for training and fine-tuning large models. Second, open source Metaflow provides the necessary software infrastructure to build production-grade ML/AI systems in a developer-friendly manner. It provides an approachable, robust Python API for the full infrastructure stack of ML/AI, from data and compute to workflows and observability.

In the following sections, we first introduce Metaflow and the Trainium integration. We then show how to set up the infrastructure stack you need to take your own data assets and pre-train or fine-tune a state-of-the-art Llama2 model on Trainium hardware.

Metaflow overview

Metaflow was originally developed at Netflix to enable data scientists and ML engineers to build ML/AI systems quickly and deploy them on production-grade infrastructure. Netflix open sourced the framework in 2019 with integrations to AWS services like AWS Batch, AWS Step Functions (see Unbundling Data Science Workflows with Metaflow and AWS Step Functions), Kubernetes, and throughput-optimized Amazon Simple Storage Service (Amazon S3), so you can build your own Netflix-scale ML/AI environment in your AWS account.

The key motivation of Metaflow is to address the typical needs of all ML/AI projects with a straightforward, human-centric API, from prototype to production (and back). The following figure illustrates this workflow.

Metaflow’s coherent APIs simplify the process of building real-world ML/AI systems in teams. Metaflow helps scientists and engineers access, move, and manipulate data efficiently; track and version experiments and models; orchestrate and integrate workflows to surrounding systems; and scale compute to the cloud easily. Moreover, it has first-class support for teams, such as namespacing and deploying workflows in versioned production branches.

Now, with today’s announcement, you have another straightforward compute option for workflows that need to train or fine-tune demanding deep learning models: running them on Trainium.

How Metaflow integrates with Trainium

From a Metaflow developer perspective, using Trainium is similar to other accelerators. After a Metaflow deployment is configured to access Trainium chips through the compute platform customers use with Metaflow (which we discuss later in this post), ML engineers and data scientists can operate autonomously in the land of deep learning code. Scientists can write PyTorch, Hugging Face, and use the AWS Neuron SDK along with the NeuronX Distributed SDK to optimize these frameworks to target Trainium devices, and Metaflow integrates with the underlying AWS services to separate concerns about how to actually run the code at scale.

As illustrated by the following figure, you can declare the following in a few lines of Python code:

How many nodes to launch
How many Trainium devices to use per node
How the nodes are interconnected (Elastic Fabric Adapter)
How often to check the resource utilization
What training script the torchrun process should run on each node

You can initialize the training process in the start step, which directs the next train step to run on two parallel instances (num_parallel=2). The decorators of the train step configure your desired training setup:

@torchrun – Sets up PyTorch Distributed across two instances
@batch – Configures the Trainium nodes, managed by AWS Batch
@neuron_monitor – Activates the monitoring UI that allows you to monitor the utilization of the Trainium cores

Metaflow allows you to configure all this functionality in a few lines of code. However, the main benefit is that you can embed Trainium-based training code inside a larger production system, using the scaffolding provided by Metaflow.

Benefits of using Trainium with Metaflow

Trainium and Metaflow work together to solve problems like what we discussed earlier in this post. The Trainium devices and Neuron software stack make it straightforward for teams to access and effectively use the high-performance hardware needed for cutting-edge AI.

Trainium provides a few key benefits for building real-world AI systems:

Trainium instances can help reduce generative AI model training and fine-tuning costs by up to 50% over comparable instances on AWS
It is readily available in many AWS Regions, is often more available than GPU-based instance types, and scaling is available in the most popular Regions worldwide
The hardware and software are mature and actively developed by AWS

If you have been struggling with GPU availability and cost, you’ll surely appreciate these benefits. Using Trainium effectively can require a bit of infrastructure effort and knowledge, which is a key motivation for this integration. Through Metaflow and the deployment scripts provided in this post, you should be able to get started with Trainium with ease.

Besides easy access, using Trainium with Metaflow brings a few additional benefits:

Infrastructure accessibility

Metaflow is known for its developer-friendly APIs that allow ML/AI developers to focus on developing models and applications, and not worry about infrastructure. Metaflow helps engineers manage the infrastructure, making sure it integrates with existing systems and policies effortlessly.

Data, model, and configuration management

Metaflow provides built-in, seamless artifact persistence, tracking, and versioning, which covers the full state of the workflows, making sure you’ll follow MLOps best practices. Thanks to Metaflow’s high-throughput S3 client, you can load and save datasets and model checkpoints very quickly, without having to worry about extra infrastructure such as shared file systems. You can use artifacts to manage configuration, so everything from hyperparameters to cluster sizing can be managed in a single file, tracked alongside the results.

Observability

Metaflow comes with a convenient UI, which you can customize to observe metrics and data that matter to your use cases in real time. In the case of Trainium, we provide a custom visualization that allows you to monitor utilization of the NeuronCores inside Trainium instances, making sure that resources are used efficiently. The following screenshot shows an example of the visualization for core (top) and memory (bottom) utilization.

Multi-node compute

Finally, a huge benefit of Metaflow is that you can use it to manage advanced multi-instance training clusters, which would take a lot of involved engineering otherwise. For instance, you can train a large PyTorch model, sharded across Trainium instances, using Metaflow’s @torchrun and @batch decorators.

Behind the scenes, the decorators set up a training cluster using AWS Batch multi-node with a specified number of Trainium instances, configured to train a PyTorch model across the instances. By using the launch template we provide in this post, the setup can benefit from low-latency, high-throughput networking via Elastic Fabric Adapter (EFA) networking interfaces.

Solution overview

As a practical example, let’s set up the complete stack required to pre-train Llama2 for a few epochs on Trainium using Metaflow. The same recipe applies to the fine-tuning examples in the repository.

Deploy and configure Metaflow

If you already use a Metaflow deployment, you can skip to the next step to deploy the Trainium compute environment.

Deployment

To deploy a Metaflow stack using AWS CloudFormation, complete the following steps:

Download the CloudFormation template.
On the CloudFormation console, choose Stacks in the navigation pane.
Choose Create new stack.
For Prepare template¸ select Template is ready.
For Template source, select Upload a template file.
Upload the template.
Choose Next.

If you are brand new to Metaflow, or are trying this recipe as a proof of concept, we suggest you change the APIBasicAuth parameter to false and leave all other default parameter settings.
Complete the stack creation process.

After you create the CloudFormation stack and configure Metaflow to use the stack resources, there is no additional setup required. For more information about the Metaflow components that AWS CloudFormation deploys, see AWS Managed with CloudFormation.

Configuration

To use the stack you just deployed from your laptop or cloud workstation, complete the following steps:

Prepare a Python environment and install Metaflow in it:

pip install metaflow

Run metaflow configure aws in a terminal.

metaflow configure aws

After the CloudFormation stack deployment is complete, the Outputs on the stack details page will contain a list of resource names and their values, which you can use in the Metaflow AWS configuration prompts.

Deploy a Trainium compute environment

The default Metaflow deployment from the previous step has an AWS Batch compute environment, but it will not be able to schedule jobs to run on Amazon Elastic Compute Cloud (Amazon EC2) instances with Trainium devices. To deploy an AWS Batch compute environment for use with Trainium accelerators, you can use the following CloudFormation template. Complete the following steps:

Download the CloudFormation template.
On the CloudFormation console, choose Stacks in the navigation pane.
Choose Create new stack.
For Prepare template¸ select Template is ready.
For Template source, select Upload a template file.
Upload the template.
Choose Next.
Complete the stack creation process.

Take note of the name of the AWS Batch job queue that you created to use in a later step.

Prepare a base Docker image to run Metaflow tasks

Metaflow tasks run inside Docker containers when AWS Batch is used as a compute backend. To run Trainium jobs, developers need to build a custom image and specify it in the @batch decorator Metaflow developers use to declare task resources:

@batch(trainium=16, efa=8, image=”YOUR_IMAGE_IN_ECR” )
@step
def train_llama2(self):
    # neuron distributed training code

To make the image, complete the following steps:

Create an Amazon Elastic Container Registry (Amazon ECR) registry to store your image in.
Create and log in to an EC2 instance with sufficient memory. For this post, we used Ubuntu x86 OS on a C5.4xlarge instance.
Install Docker.
Copy the following Dockerfile to your instance.
Authenticate with the upstream base image provider:

aws ecr get-login-password 
--region $REGION | docker login 
--username AWS 
--password-stdin 763104351884.dkr.ecr.$REGION.amazonaws.com

Build the image:

docker build . -t $YOUR_IMAGE_NAME:$YOUR_IMAGE_TAG

On the Amazon ECR console, navigate to the ECR registry you created, and you will find the commands needed to authenticate from the EC2 instance and push your image.

Clone the repository on your workstation

Now you’re ready to verify the infrastructure is working properly, after which you can run complex distributed training code like Llama2 training. To get started, clone the examples repository to the workstation where you configured Metaflow with AWS:

git clone https://github.com/outerbounds/metaflow-trainium

Verify the infrastructure with an allreduce example

To validate your infrastructure configuration, complete the following steps:

Navigate to the allreduce example:

cd allreduce-trn

Open the flow.py file and make sure to set the job queue and image to the name of the queue you deployed with AWS CloudFormation and the image you pushed to Amazon ECR, respectively.
To run the allreduce code, run the following Metaflow command:

python flow.py --package-suffixes=.sh run

You can find the logs (truncated in the following code snippet for readability) in the Metaflow UI:

Task is starting (status SUBMITTED)...
Task is starting (status RUNNABLE)... (parallel node status: [SUBMITTED:3])
Task is starting (status STARTING)... (parallel node status: [SUBMITTED:3])
Task is starting (status RUNNING)... (parallel node status: [SUBMITTED:3])
Setting up task environment.
Downloading code package...
Code package downloaded.
Task is starting.
...
Compiler status PASS
result OK step 0: tensor([[64., 64., 64.],
[64., 64., 64.]], device='xla:1')
...
result OK step 900: tensor([[64., 64., 64.],
[64., 64., 64.]], device='xla:1')
Before final rendezvous
Waiting for batch secondary tasks to finish

Configure and run any Neuron distributed code

If the allreduce test runs successfully, you are ready to move on to meaningful workloads. To complete this onboarding, complete the following steps:

Navigate to the llama2-7b-pretrain-trn directory.
Similar to the all reduce example, before using this code, you need to modify the config.py file so that it matches the AWS Batch job queue and ECR image that you created. Open the file, find these lines, and modify them to your values:

class BatchJobConfig:
    # <snip>
    image: str = "YOUR_IMAGE"
    job_queue: str = "YOUR_QUEUE"

After modifying these values, and any others you want to experiment with, run the following command:

python config.py

Then run the workflow to pre-train your own Llama2 model from scratch:

python flow.py run --config-file config.yaml

This will train the model on however many nodes you specify in the config.py file, and will push the trained model result to Amazon S3 storage, versioned by Metaflow’s data store using the flow name and run ID.

Logs will appear like the following (truncated from a sample run of five steps for readability):

Task is starting (status SUBMITTED)...
Task is starting (status RUNNABLE)... (parallel node status: [SUBMITTED:3])
Task is starting (status STARTING)... (parallel node status: [SUBMITTED:3])
Task is starting (status RUNNING)... (parallel node status: [SUBMITTED:3])
Setting up task environment.
Downloading code package...
Code package downloaded.
Task is starting.
...
initializing tensor model parallel with size 8
initializing pipeline model parallel with size 1
initializing data parallel with size 16
...
Epoch 0 begin Fri Mar 15 21:19:10 2024
...
Compiler status PASS
...
(0, 3) step_loss : 15.4375 learning_rate : 3.00e-04 throughput : 4.38
(0, 4) step_loss : 12.1250 learning_rate : 1.50e-04 throughput : 5.47
(0, 5) step_loss : 11.8750 learning_rate : 0.00e+00 throughput : 6.44
...
Writing data to the provided results file: /metaflow/metaflow/metrics.json
...
Waiting for batch secondary tasks to finish

Clean up

To clean up resources, delete the CloudFormation stacks for your Metaflow deployment and Trainium compute environment:

aws cloudformation delete-stack --stack-name metaflow
aws cloudformation delete-stack --stack-name trn1-batch

Conclusion

You can get started experimenting with the solution presented in this post in your environment today. Follow the instructions in the GitHub repository to pre-train a Llama2 model on Trainium devices. Additionally, we have prepared examples for fine-tuning Llama2 and BERT models, demonstrating how you can use the Optimum Neuron package to use the integration from this post with any Hugging Face model.

We are happy to help you get started. Join the Metaflow community Slack for support, to provide feedback, and share experiences!

About the authors

Ville Tuulos is a co-founder and CEO of Outerbounds, a developer-friendly ML/AI platform. He has been developing infrastructure for ML and AI for over two decades in academia and as a leader at a number of companies. At Netflix, he led the ML infrastructure team that created Metaflow, a popular open-source, human-centric foundation for ML/AI systems. He is also the author of a book, Effective Data Science Infrastructure, published by Manning.

Eddie Mattia is in scientific computing and more recently building machine learning developer tools. He has worked as a researcher in academia, in customer-facing and engineering roles at MLOps startups, and as a product manager at Intel. Currently, Eddie is working to improve the open-source Metaflow project and is building tools for AI researchers and MLOps developers at Outerbounds.

Vidyasagar specializes in high performance computing, numerical simulations, optimization techniques and software development across industrial and academic environments. At AWS, Vidyasagar is a Senior Solutions Architect developing predictive models, generative AI and simulation technologies. Vidyasagar has a PhD from the California Institute of Technology.

Diwakar Bansal is an AWS Senior Specialist focused on business development and go-to-market for GenAI and Machine Learning accelerated computing services. Diwakar has led product definition, global business development, and marketing of technology products in the fields of IOT, Edge Computing, and Autonomous Driving focusing on bringing AI and Machine leaning to these domains. Diwakar is passionate about public speaking and thought leadership in the Cloud and GenAI space.

Sadaf Rasool is a Machine Learning Engineer with the Annapurna ML Accelerator team at AWS. As an enthusiastic and optimistic AI/ML professional, he holds firm to the belief that the ethical and responsible application of AI has the potential to enhance society in the years to come, fostering both economic growth and social well-being.

Scott Perry is a Solutions Architect on the Annapurna ML accelerator team at AWS. Based in Canada, he helps customers deploy and optimize deep learning training and inference workloads using AWS Inferentia and AWS Trainium. His interests include large language models, deep reinforcement learning, IoT, and genomics.

Cohere Command R and R+ are now available in Amazon SageMaker JumpStart

This blog post is co-written with Pradeep Prabhakaran from Cohere.

Today, we are excited to announce that Cohere Command R and R+ foundation models are available through Amazon SageMaker JumpStart to deploy and run inference. Command R/R+ are the state-of-the-art retrieval augmented generation (RAG)-optimized models designed to tackle enterprise-grade workloads.

In this post, we walk through how to discover and deploy Cohere Command R/R+ via SageMaker JumpStart.

What are Cohere Command R and Command R+?

Cohere Command R is a family of highly scalable language models that balance high performance with strong accuracy. Command R family – include Command R and Command R+ models – are optimized for RAG based workflows such as conversational interaction and long context tasks, enabling companies to move beyond proof of concept and into production. These powerful models are designed to handle complex tasks with high performance and strong accuracy, making them suitable for real-world applications.

Command R boasts high precision on RAG and tool use tasks, low latency and high throughput, a long 128,000-token context length, and strong capabilities across 10 key languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, and Chinese.

Command R+ is the newest model, optimized for extremely performant conversational interaction and long-context tasks. It is recommended for workflows that lean on complex RAG functionality and multi-step tool use (agents), while Cohere R is well-suited for simpler RAG and single-step tool use tasks, as well as applications where price is a major consideration.

What is SageMaker JumpStart

With SageMaker JumpStart, you can choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances from a network-isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Cohere Command R/R+ models with a few choices in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK. Doing so enables you to derive model performance and machine learning operations (MLOps) controls with SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs.

The model is deployed in an AWS secure environment and under your virtual private cloud (VPC) controls, helping provide data security. Cohere Command R/R+ models are available today for deployment and inferencing in Amazon SageMaker Studio in us-east-1 (N. Virginia), us-east-2 (Ohio), us-west-1 (N. California), us-west-2 (Oregon), Canada (Central), eu-central-1 (Frankfurt), eu-west-1 (Ireland), eu-west-2 (London), eu-west-3 (Paris), eu-north-1 (Stockholm), ap-southeast-1 (Singapore), ap-southeast-2 (Sydney), ap-northeast-1 (Tokyo) , ap-northeast-2 (Seoul), ap-south-1 (Mumbai), and sa-east-1 (Sao Paulo).

Discover models

You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

From the SageMaker JumpStart landing page, you can easily discover various models by browsing through different hubs, which are named after model providers. The Cohere Command R and R+ models are available in the Cohere hub. If you don’t see these models, ensure you have the latest SageMaker Studio version by shutting down and restarting Studio Classic Apps.

To find the Command R and R+ models, search for “Command R” in the search box located at the top left of the SageMaker JumpStart landing page. Each model can be deployed on Amazon Elastic Compute Cloud (EC2) P5 instances powered by NVIDIA H100 Tensor Core GPUs (p5.48xlarge) and Amazon EC2 P4de instances powered by NVIDIA A100 Tensor Core GPUs (ml.p4de.24xlarge).

Deploy a model

To illustrate model deployment, we’ll deploy Cohere Command R+ on NVIDIA H100. Choose the model card to open the corresponding model detail page.

When you choose Deploy, a window appears prompting you to subscribe to the model on AWS Marketplace. Choose Subscribe, which redirects you to the AWS Marketplace listing for Cohere Command R+ (H100). Follow the on-screen instructions to complete the subscription process.

Once subscribed, return to the model detail page and choose Deploy in the window. The deployment process initiates.

Alternatively, you can choose Notebooks on the model card and open the example notebook in JupyterLab. This notebook provides end-to-end guidance on deploying the model for inference and cleaning up resources. You can also find this example notebook in the Cohere SageMaker GitHub repository. To ensure the security of the endpoint, you can configure AWS Key Management Service (KMS) key for a SageMaker endpoint configuration.

If an endpoint has already been created, you can simply connect to it:

co = Client(region_name=region)

co.connect_to_endpoint(endpoint_name="cohere-command-r-plus")

Real-time inference

Once your endpoint has been connected, you can perform real-time inference using the co.chat endpoint.

message = "Write a LinkedIn post about starting a career in tech:"
response = co.chat(message=message, stream=False)

Multilingual capabilities

Command R/R+ is optimized to perform well in 10 key languages, as listed in the introduction. Additionally, pre-training data have been included for the following 13 languages: Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, Persian.

The model has been trained to respond in the language of the user. Here’s an example in Spanish:

co.chat(
  message="Écris une description de produit pour une voiture électrique en 50 à 75 mots"
)

Here’s what the response might look like:

Découvrez la voiture électrique qui va révolutionner votre façon de conduire.
Avec son design élégant, cette voiture offre une expérience de conduit unique avec une accélération puissante et une autonomie impressionnante. Sa technologie avancée vous garantit une charge rapide et une fiabilité inégalée. Avec sa conception innovante et durable, cette voiture est parfaite pour les trajets urbains et les longues distances. Profitez d'une conduite silencieuse et vivez l'expérience de la voiture électrique!

Command R/R+ can also perform cross-lingual tasks, such as translation or answering questions about content in other languages.

Chat with documents (RAG)

Command R/R+ can ground its generations. This means that it can generate responses based on a list of supplied document snippets, and it includes citations in its response indicating the source of the information.

For example, the code snippet that follows produces an answer to “How deep is the Mariana Trench” along with inline citations based on the provided on-line documents.

Request:

message="How deep is the Mariana Trench"
documents = [
    {
       "id": "national_geographic_everest",
       "title": "Height of Mount Everest",
       "snippet": "The height of Mount Everest is 29,035 feet",
       "url": "https://education.nationalgeographic.org/resource/mount-everest/",
    },
    {
        "id": "national_geographic_mariana",
        "title": "Depth of the Mariana Trench",
        "snippet": "The depth of the Mariana Trench is 36,070 feet",
        "url": "https://www.nationalgeographic.org/activity/mariana-trench-deepest-place-earth",
    }
]

response = co.chat(message=message, documents=documents, stream=False)

Response:

{
   text: “The depth of the Mariana Trench is 36,070 feet.”,
   citations: [
      {'start': 35, 'end': 47, 'text': '36,070 feet.', 'document_ids': ['national_geographic_mariana']}
   ],
   documents: [
      {'id': 'national_geographic_mariana', 
       'snippet': 'The depth of the Mariana Trench is 36,070 feet', 
       'title': 'Depth of the Mariana Trench'
	'url':'https://www.nationalgeographic.org/activity/mariana-trench-deepest-place-earth'}
   ]
}

Single-Step & Multi-Step Tool Use

Command R/R+, comes with a Tool Use API that enables the language model to interact with user-defined tools to automate highly sophisticated tasks. Command R/R+ in Tool Use mode creates API payloads (JSONs with specific parameters) based on user interactions and conversational history. These can be used to instruct any other application or tool.

For example, an application can be instructed to automatically categorize and route support tickets to the appropriate individual, change a status in customer relationship management software (CRM), or retrieve relevant snippets from a vector database. It comes in two variants; single-step and multi-step:

Single-step tool use enables a richer set of behaviors by leveraging data stored in tools, taking actions through APIs, interacting with a vector database, querying a search engine, etc.
Multi-step tool use is an extension of this basic idea and allows the model to call more than one tool in a sequence of steps, using the results from one tool call in a subsequent step. This process allows the language model to reason, perform dynamic actions, and quickly adapt based on information coming from external sources.

To explore these capabilities further, you can refer to the provided Jupyter notebook and Cohere’s AWS GitHub repository, which offer additional examples showcasing various use cases and applications.

Clean Up

After you’ve finished running the notebook and exploring the Cohere Command R and R+ models, it’s essential to clean up the resources you’ve created to avoid incurring unnecessary charges. Follow these steps to delete the resources and stop the billing:

co.delete_endpoint()
co.close()

Conclusion

In this post, we explored how to leverage the powerful capabilities of Cohere’s Command R and R+ models on Amazon SageMaker JumpStart. These state-of-the-art large language models are specifically designed to excel at real-world enterprise use cases, offering unparalleled performance and scalability. With their availability on SageMaker JumpStart and AWS Marketplace, you now have seamless access to these cutting-edge models, enabling you to unlock new levels of productivity and innovation in your natural language processing projects.

About the authors

Pradeep Prabhakaran is a Customer Solutions Architect at Cohere. In his current role at Cohere, Pradeep acts as a trusted technical advisor to customers and partners, providing guidance and strategies to help them realize the full potential of Cohere’s cutting-edge Generative AI platform. Prior to joining Cohere, Pradeep was a Principal Customer Solutions Manager at Amazon Web Services, where he led Enterprise Cloud transformation programs for large enterprises. Prior to AWS, Pradeep has held various leadership positions at consulting companies such as Slalom, Deloitte, and Wipro. Pradeep holds a Bachelor’s degree in Engineering and is based in Dallas, TX.

James Yi is a Senior AI/ML Partner Solutions Architect at Amazon Web Services. He spearheads AWS’s strategic partnerships in Emerging Technologies, guiding engineering teams to design and develop cutting-edge joint solutions in GenAI. He enables field and technical teams to seamlessly deploy, operate, secure, and integrate partner solutions on AWS. James collaborates closely with business leaders to define and execute joint Go-To-Market strategies, driving cloud-based business growth. Outside of work, he enjoys playing soccer, traveling, and spending time with his family.

SIGMA: An open-source mixed-reality system for research on physical task assistance

Blue, purple, pink gradient background with three images: a five item checklist on the left, a sound wave in the middle, and goggles on the right.

Imagine if every time you needed to complete a complex physical task, like building a bicycle, fixing a broken water heater, or cooking risotto for the first time, you had a world-class expert standing over your shoulder and guiding you through the process. In addition to telling you the steps to follow, this expert would also tune the instructions to your skill set, deliver them with the right timing, and adapt to any mistakes, confusions, or distractions that might arise along the way.

What would it take to build an interactive AI system that could assist you with any task in the physical world, just as a real-time expert would? To begin exploring the core competencies that such a system would require, we developed and released the Situated Interactive Guidance, Monitoring, and Assistance (SIGMA) system, an open-source research platform and testbed prototype (opens in new tab) for studying mixed-reality task assistance. SIGMA provides a basis for researchers to explore, understand, and develop the capabilities required to enable in-stream task assistance in the physical world.

Left: Stock photo of a man with glasses fixing a bicycle. Middle: Stock photo of a man cooking a meal in a kitchen. Right: Stock photo of a woman fixing the plumbing of a kitchen sink with a wrench while lying on the floor with other tools scattered around.

Recent advances in generative AI and large language, vision, and multimodal models can provide a foundation of open-domain knowledge, inference, and generation capabilities to help enable such open-ended task assistance scenarios. However, building AI systems that collaborate with people in the physical world—including not just mixed-reality task assistants but also interactive robots, smart factory floors, autonomous vehicles, and so on—requires going beyond the ability to generate relevant instructions and content. To be effective, these systems also require physical and social intelligence.

Physical and social intelligence

For AI systems to fluidly collaborate with people in the physical world, they must continuously perceive and reason multimodally, in stream, about their surrounding environment. This requirement goes beyond just detecting and tracking objects. Effective collaboration in the physical world necessitates an understanding of which objects are relevant for the task at hand, what their possible uses may be, how they relate to each other, what spatial constraints are in play, and how all these aspects evolve over time.

Just as important as reasoning about the physical environment, these systems also need to reason about people. This reasoning should include not only lower-level inferences about body pose, speech and actions, but also higher-level inferences about cognitive states and the social norms of real-time collaborative behavior. For example, the AI assistant envisioned above would need to consider questions such as: Is the user confused or frustrated? Are they about to make a mistake? What’s their level of expertise? Are they still pursuing the current task, or have they started doing something else in parallel? Is it a good time to interrupt them or provide the next instruction? And so forth.

Situated Interactive Guidance, Monitoring, and Assistance

We developed SIGMA as a platform to investigate these challenges and evaluate progress in developing new solutions.

Left: A person using SIGMA running on a HoloLens 2 to perform a procedural task. Middle: First-person view showing SIGMA’s task-guidance panel and task-specific holograms. Right: 3D visualization of the system's scene understanding showing the egocentric camera view, depth map, detected objects, gaze, hand and head pose. — **Left**: A person using SIGMA running on a HoloLens 2 to perform a procedural task. **Middle**: First-person view showing SIGMA’s task-guidance panel and task-specific holograms. **Right**: 3D visualization of the system’s scene understanding showing the egocentric camera view, depth map, detected objects, gaze, hand and head pose.

SIGMA is an interactive application that currently runs on a HoloLens 2 device and combines a variety of mixed-reality and AI technologies, including large language and vision models, to guide a user through procedural tasks. Tasks are structured as a sequence of steps, which can either be predefined manually in a task library or generated on the fly using a large language model like GPT-4. Throughout the interaction, SIGMA can leverage large language models to answer open-ended questions that a user might have along the way. Additionally, SIGMA can use vision models like Detic and SEEM to detect and track task-relevant objects in the environment and point them out to the user as appropriate. This video (opens in new tab) provides a first-person view of someone using SIGMA to perform a couple of example procedural tasks.

Enabling research at the intersection of AI and mixed reality

SIGMA was designed to serve as a research platform. Our goal in open-sourcing the system is to help other researchers leapfrog the basic engineering challenges of putting together a full-stack interactive application and allow them to directly focus on the interesting research challenges ahead.

Several design choices support these research goals. For example, the system is implemented as a client-server architecture: a lightweight client application runs on the HoloLens 2 device (configured in Research Mode (opens in new tab)), which captures and sends a variety of multimodal data streams—including RGB (red-green-blue), depth, audio, head, hand, and gaze tracking information—live to a more powerful desktop server. The desktop server implements the core functionality of the application and streams information and commands to the client app for what to render on the device. This architecture enables researchers to bypass current compute limitations on the headset and creates opportunities for porting the application to other mixed-reality devices.

SIGMA is built on top of Platform for Situated Intelligence (opens in new tab) (also known as psi), an open-source framework that provides the fabric, tools, and components for developing and researching multimodal integrative-AI systems. The underlying psi framework enables fast prototyping and provides a performant streaming and logging infrastructure. The framework provides infrastructure for data replay, enabling data-driven development and tuning at the application level. Finally, Platform for Situated Intelligence Studio provides extensive support for visualization, debugging, tuning and maintenance.

An animated gif depicting the Platform for Situated Intelligence Studio visualization tool. Various 2D, 3D, and timeline streams are shown over a 10-second clip of a user interacting with SIGMA, such as the egocentric camera view, depth map, head pose, audio, speech recognition results, etc. — Platform for Situated Intelligence Studio is a tool that enables researchers to visualize various data streams collected and debug the application.

SIGMA’s current functionality is relatively simple, but the system provides an important starting point for discovering and exploring research challenges at the intersection of mixed reality and AI. From computer vision to speech recognition, many research problems, especially when it comes to perception, can and have been investigated based on collected datasets. The recently increased interest in egocentric data and associated challenges provides important fuel for advancing the state of the art. Yet, numerous problems that have to do with interaction and with real-time collaboration are only surfaced by real-time end-to-end systems and are best studied and understood in an interactive context with actual users.

As a testament to Microsoft’s continued commitment to the space, SIGMA provides a research platform and reflects just one part of the company’s work to explore new AI and mixed-reality technologies. Microsoft also offers an enterprise-ready, mixed-reality solution for frontline workers: Dynamics 365 Guides. With Copilot in Dynamics 365 Guides, which is currently being used by customers in private preview, AI and mixed reality together empower frontline workers with step-by-step procedural guidance and relevant information in the flow of work. Dynamics 365 Guides is a richly featured product for enterprise customers, geared toward frontline workers who perform complex tasks. In comparison, SIGMA is an open-source testbed for exploratory research purposes only.

We hope that SIGMA can provide a solid foundation for researchers to build on. Although the system targets the specific scenario of mixed-reality task assistance, it can help illuminate the challenges of developing social and physical intelligence that arise for any computing systems that are designed to operate in the physical world and interact with people, from virtual agents to physical robots and devices.

If you are interested in learning more and using SIGMA in your own research, check it out at https://aka.ms/psi-sigma (opens in new tab). We are excited to collaborate with and work alongside the open-source research community to make faster progress in this exciting and challenging space.

Acknowledgements / Contributors

Ishani Chakraborty, Neel Joshi, Ann Paradiso, Mahdi Rad, Nick Saw, Vibhav Vineet, Xin Wang.

Responsible AI considerations

SIGMA was designed as an experimental prototype for research purposes only and is not intended for use in developing commercial applications. The primary use case is as a research tool to enable academic and industry researchers to push the state of the art in the space of procedural task assistance at the intersection of mixed reality and AI. As such, the system has been open-sourced under a research-only license (opens in new tab). Researchers that wish to make use of SIGMA in their own work should first familiarize themselves with the system and its limitations and risks involved with using the system in a user-study context and should undergo a full IRB or ethical board review as appropriate for their institution. Limitations, risks and additional considerations for using the system are described in a Transparency Note (opens in new tab) available in SIGMA’s open-source repository (opens in new tab).

The post SIGMA: An open-source mixed-reality system for research on physical task assistance appeared first on Microsoft Research.

Solution overview

Prerequisites

IAM Identity Center instance

Identity source

AWS IAM Identity Center instance configured with Okta as the identity source

Configure an Amazon Q Business application with IAM Identity Center enabled

Employee AI assistant use case

Clean up

Conclusion

About the Authors

Solution overview

Prerequisites

Create a new Post call analytics job

Review the transcription and summary

Conclusion

Learn more:

About the Authors

Amazon Q Developer – your assistant for the entire software development lifecycle

Amazon Q Business empowers employees to be more creative, data-driven, efficient, prepared, and productive

Generative BI allows analysts to build detailed dashboards in minutes and business users to get insights fast

New pricing including Amazon Q Developer Free Tier

Check out the following resources to learn more about this announcement:

About the author

Amazon Q Business unites more data sources than any other generative AI assistant available today

Built from the ground up with security and privacy in mind

Generative BI allows analysts and business users to build detailed dashboards in minutes

First-of-its-kind capability that helps every employee go from conversation to generative AI-powered app in seconds

About the Authors

Large Language Models on Mobile

Supported Models

Productivity

Partnerships

Alpha and Production Usage

Community

Microsoft research copilot experience

Paper highlights

Complete list of accepted publications by Microsoft researchers

Conference organizers from Microsoft

Program Co-Chair

Submission Chairs

Program Committee

External Review Committee

Career opportunities

Metaflow overview

How Metaflow integrates with Trainium

Benefits of using Trainium with Metaflow

Infrastructure accessibility

Data, model, and configuration management

Observability

Multi-node compute

Solution overview

Deploy and configure Metaflow

Deployment

Configuration

Deploy a Trainium compute environment

Prepare a base Docker image to run Metaflow tasks

Clone the repository on your workstation

Verify the infrastructure with an allreduce example

Configure and run any Neuron distributed code

Clean up

Conclusion

About the authors

What are Cohere Command R and Command R+?

What is SageMaker JumpStart

Discover models

Deploy a model

Real-time inference

Multilingual capabilities

Chat with documents (RAG)

Request:

Response:

Single-Step & Multi-Step Tool Use

Clean Up

Conclusion

About the authors

Physical and social intelligence

Situated Interactive Guidance, Monitoring, and Assistance

Enabling research at the intersection of AI and mixed reality

Acknowledgements / Contributors

Responsible AI considerations