Deliver your first ML use case in 8–12 weeks

Do you need help moving your organization’s machine learning (ML) journey from pilot to production? You’re not alone. Most executives think ML can apply to any business decision, but on average only half of ML projects make it to production.

This post describes how to implement your first ML use case using Amazon SageMaker in just 8–12 weeks by leveraging a methodology called Experience-based Acceleration (EBA).

Challenges

Customers may face several challenges when implementing ML solutions:

  • You may struggle to connect your ML technology efforts to your business value proposition, making it difficult for IT and business leadership to justify the investment it requires to operationalize models.
  • You may often select low-value use cases as proof of concept rather than solving a meaningful business or customer problem.
  • You may have gaps in skills and technologies, including operationalizing ML solutions, implementing ML services, and managing ML projects for rapid iterations.
  • Ensuring data quality, governance, and security may slow down or stall ML projects.

Solution overview: Machine Learning Experience-based Acceleration (ML EBA)

Machine learning EBA is a 3-day, sprint-based, interactive workshop (called a party) that uses SageMaker to accelerate business outcomes by guiding you through an accelerated, prescriptive ML lifecycle. It starts with identifying business goals and ML problem framing, and takes you through data processing, model development, production deployment, and monitoring.

The following visual illustrates a sample ML lifecycle.

Sample Machine Learning Lifecycle

Two primary customer scenarios apply. The first is using low-code or no-code ML services, such as Amazon SageMaker Canvas, Amazon SageMaker Data Wrangler, Amazon SageMaker Autopilot, and Amazon SageMaker JumpStart, to help data analysts prepare data, build models, and generate predictions. The second is using SageMaker to help data scientists and ML engineers build, train, and deploy custom ML models.

We recognize that customers have different starting points. If you’re starting from scratch, it’s often simpler to begin with low-code or no-code solutions and gradually transition to developing custom models. In contrast, if you have an existing on-premises ML infrastructure, you can begin directly by using SageMaker to alleviate challenges with your current solution.

Through ML EBA, experienced AWS ML subject matter experts work side by side with your cross-functional team to provide prescriptive guidance, remove blockers, and build organizational capability for continued ML adoption. This party steers you toward solving a compelling business problem as opposed to thinking in terms of data and ML technology environments. Additionally, the party gets you started on driving material business value from untapped data.

ML EBA helps you to think big, start small, and scale fast. Although it creates a minimum viable ML model in 3 days, there are 4–6 weeks of preparation leading up to the EBA. Furthermore, you spend 4–6 weeks post-EBA to fine-tune the model with additional feature engineering and hyperparameter optimization before production deployment.

Let’s dive into what the whole process looks like and how you can use the ML EBA methodology to address the common blockers.

EBA prep (4–6 weeks)

In this section, we detail the 4–6 weeks of preparation leading up to the EBA.

6 weeks before the party: Problem framing and qualification

The first step is to frame and qualify the ML problem, which includes the following:

  • Identify the right business outcome – You must have a clear understanding of the problem you are trying to solve and the desired outcome you hope to achieve through the use of ML. You must be able to measure the business value gained against specific objectives and success criteria. Furthermore, you must be able to identify what should be observed, and what should be predicted. AWS works with you to help answer the following important questions before embarking on the ML EBA:
    • Does the ML use case solve a meaningful business problem?
    • Is it important enough to get the attention of business leadership?
    • Do you already have data to solve the ML use case?
    • Can the use case eventually be operationalized into production?
    • Does it really require ML?
    • Are there organizational processes in place for the business to use the model’s output?

The AI Use Case Explorer is a good starting point to explore the right use cases by industry, business function, or desired business outcome and discover relevant customer success stories.

  • Executive sponsorship – To help you move faster than you would have organically, AWS meets with the executive sponsor to confirm buy-in, remove internal obstacles, and commit resources. Additionally, AWS can offer financial incentives to help offset the costs for your first ML use case.
  • Meeting you where you are in your ML journey – AWS assesses your current state—people, process, and technology. We help you detail requirements and dependencies; specifically, what teams and data are required to begin the journey successfully. Additionally, we provide recommendations on the technical path: starting with low-code or no-code services, or building a custom model using SageMaker.

5 weeks before the party: Workstream configuration and transition into action

The next step is to identify the teams needed to support the EBA effort. Commonly, the work is split up between the following workstreams:

  • Cloud engineering (infrastructure and security) – Focuses on verifying that the AWS accounts and infrastructure are set up and secure ahead of EBA. This includes AWS Identity and Access Management (IAM) or single sign-on (SSO) access, security guardrails, Amazon SageMaker Studio provisioning, automated stop/start to save costs, and Amazon Simple Storage Service (Amazon S3) set up.
  • Data engineering – Identifies the data sources, sets up data ingestion and pipelines, and prepares data using Data Wrangler.
  • Data science – The heart of ML EBA and focuses on feature engineering, model training, hyperparameter tuning, and model validation.
  • MLOps engineering – Focuses on automating the DevOps pipelines for operationalizing the ML use case. This may often be the same team as cloud engineering.
  • Leadership team – Responsible for orchestrating the effort, removing blockers, aligning with the executive sponsors, and is ultimately accountable for delivering the expected outcomes.

After these efforts have been completed, we must transition into action. A standard baseline 4-week timeline should be strictly adhered to so that the EBA stays on track. Experienced AWS subject matter experts will guide and coach you through this preparation leading up to the EBA party.

4 weeks before the party: Inspire builders and curate a technical plan

Every customer is different; AWS helps you curate a technical plan of activities to be completed in the next 4 weeks leading up to the party.

AWS conducts Immersion Days to inspire your builders and build momentum for the party. An Immersion Day is a half or full day workshop with the right mix of presentation, hands-on labs, and Q&A to introduce AWS services or solutions. AWS will help you select the right Immersion Days from the AI/ML Workshops catalog.

We recognize that every builder in your organization is at a different level. We recommend that your builders use the ML ramp-up guide resources or digital or classroom training to start where they are and build the necessary skills for the party.

3 weeks before the party: Tech prep focused on cloud and data engineering

Your cloud and data engineering teams should work on the following with guidance from AWS:

  • Create AWS accounts with network and security set up
  • Set up Amazon SageMaker Studio
  • Create Amazon S3 buckets to store data
  • Identify data sources (or producers)
  • Integrate external sources to dump data into S3 buckets
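
To make the storage part of this checklist concrete, the following is a minimal boto3 sketch; the Region, bucket name, file name, and prefix are hypothetical placeholders, and account, networking, and SageMaker Studio setup would still go through your standard provisioning process.

import boto3

REGION = "us-west-2"        # assumption: replace with your Region
BUCKET = "my-eba-ml-data"   # hypothetical bucket name

s3 = boto3.client("s3", region_name=REGION)

# Create a bucket to hold raw and prepared training data
s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},
)

# Land an extract from an external data source in the bucket
s3.upload_file("customer_churn.csv", BUCKET, "raw/customer_churn.csv")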

2 weeks before the party: Tech prep focused on data science

Your data science team should work on the following with guidance from AWS:

1 week before the party: Assess readiness (go/no-go)

AWS works with you to assess go/no-go readiness for technical activities, skills, and momentum for the party. Then we solidify the scope for the 3-day party, prioritizing progress over perfection.

EBA (3-day party)

Although the EBA party itself is customized for your organization, the recommended agenda for the 3 days is outlined below by workstream. You will learn by doing during the EBA with guidance from AWS subject matter experts.

Data science

  • Day 1 – AM: Try SageMaker Autopilot or JumpStart models. PM: Pick 1–2 models based on the Autopilot outcomes to experiment with further.
  • Day 2 – Improve model accuracy through in-depth feature engineering (for example, PCA) and hyperparameter optimization (HPO). Perform quality assurance and validation with test data.
  • Day 3 – Deploy to production (inference endpoint). Set up monitoring (model and data drift).

Data engineering

  • Explore using a feature store for future ML use cases.
  • Create a backlog of items for data governance and associated guardrails.

Cloud/MLOps engineering

  • Day 1 – Evaluate the MLOps framework solution library and assess whether it can be used for a repeatable MLOps framework. Identify gaps and create a backlog of items to enhance the solution library or create your own MLOps framework.
  • Day 2 – Implement backlog items to create a repeatable MLOps framework.
  • Day 3 – Continue implementing backlog items to create a repeatable MLOps framework.
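
As a rough sketch of the Day 1 data science activity, the following uses the SageMaker Python SDK’s AutoML class to launch a SageMaker Autopilot job; the execution role, S3 locations, target column, and job name are hypothetical placeholders.

import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # hypothetical execution role

automl = AutoML(
    role=role,
    target_attribute_name="churn",  # hypothetical label column in the prepared dataset
    output_path=f"s3://{session.default_bucket()}/eba/autopilot-output",
    max_candidates=10,              # keep the first exploratory pass small
    sagemaker_session=session,
)

# Launch an Autopilot job against prepared training data in Amazon S3
automl.fit(
    inputs=f"s3://{session.default_bucket()}/eba/train/train.csv",
    job_name="eba-day1-autopilot",
    wait=False,
    logs=False,
)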

Post-EBA

ML involves extensive experimentation, and it’s common to not reach your desired model accuracy during the 3-day EBA. Therefore, creating a well-defined backlog or path to production is essential, including improving model accuracy through experimentation, feature engineering, hyperparameter optimization, and production deployment. AWS will continue to assist you through production deployment.

Conclusion

By complementing ML EBA methodology with SageMaker, you can achieve the following results:

  • Move from pilot to production value in 8–12 weeks – Bring together business and technology teams to deploy the first ML use case to production in 8–12 weeks.
  • Build the organizational capability to speed up and scale ML across lines of business – The ML EBA inspires and up-skills builders with real work experience. It establishes a successful working model (a collaboration and iteration model) to sustain and scale ML initiatives across lines of business. It also creates reusable assets to speed up and scale ML in a repeatable way.
  • Reduce technical debt, pain points, and cost from existing on-premises ML models – The on-premises solutions may have challenges related to higher costs, inability to scale infrastructure, undifferentiated infrastructure management, and lack of advanced feature sets such as hyperparameter optimization, explainability for predictions, and more. Adoption of AWS ML services such as SageMaker reduces these issues.

Contact your AWS account team (Account Manager or Customer Solutions Manager) to learn more and get started.


About the Authors

Ritesh Shah is Senior Customer Solutions Manager at Amazon Web Services. He helps large US-Central enterprises accelerate their cloud-enabled transformation and build modern cloud-native solutions. He is passionate about accelerating customers’ ML journeys. In his free time, Ritesh enjoys spending time with his daughter, cooking, and learning something new, while also evangelizing cloud and ML. Connect with him on LinkedIn.

Nicholaus Lawson is a Solution Architect at AWS and part of the AIML specialty group. He has a background in software engineering and AI research. Outside of work, Nicholaus is often coding, learning something new, or woodworking. Connect with him on LinkedIn.

Read More

Collaborative Machine Learning Model Building with Families Using Co-ML

Existing novice-friendly machine learning (ML) modeling tools center around a solo user experience, where a single user collects only their own data to build a model. However, solo modeling experiences limit valuable opportunities for encountering alternative ideas and approaches that can arise when learners work together; consequently, they often preclude encountering critical issues in ML around data representation and diversity that can surface when different perspectives are manifested in a group-constructed data set. To address this issue, we created Co-ML – a tablet-based app for learners… (Apple Machine Learning Research)

Research Focus: Week of April 24, 2023

Microsoft Research Focus 14 edition, week of April 24, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

AWARD

Microsoft researcher Kalai awarded 2022 ACM Prize in Computing

Yael Tauman Kalai, a senior principal researcher at Microsoft Research, has been awarded the 2022 ACM Prize in Computing. Kalai was recognized for breakthroughs in verifiable delegation of computation and fundamental contributions to cryptography. According to the award announcement, “Kalai’s contributions have helped shape modern cryptographic practices and provided a strong foundation for further advancements.”

The ACM Prize in Computing recognizes early-to-mid-career computer scientists whose research contributions have fundamental impact and broad implications.

Among the multiple accomplishments cited for the award, Kalai has developed methods for producing succinct proofs that certify the correctness of any computation. This method enables a weak device to offload any computation to a stronger device in a way that enables the results to be efficiently checked for correctness. Such succinct proofs have been used by blockchain companies to certify transaction validity, thereby overcoming key obstacles in blockchain scalability and enabling faster and more reliable transactions.

Kalai was also cited for her breakthrough work on the security of the “Fiat-Shamir paradigm,” a general technique for eliminating interaction from interactive protocols. This paradigm is extensively utilized in real-world applications, including the most prevalent digital signature scheme (ECDSA), which is used by all iOS and Android mobile devices.


NEW RESEARCH

Empowering Azure Storage with RDMA

High performance and highly reliable storage are fundamental requirements of public clouds. Given the wide adoption of disaggregated storage in the cloud, networking is essential for enabling high performance and high reliability. Microsoft’s Azure cloud service uses remote direct memory access (RDMA) as its transport and aims to enable it for both storage frontend traffic (between compute virtual machines and storage clusters) and backend traffic (within a storage cluster) to fully realize its benefits. As compute and storage clusters may be located in different datacenters within an Azure region, RDMA needs to be supported at regional scale.

In a new paper: Empowering Azure Storage with RDMA, Microsoft Azure and Microsoft Research report on their intra-region RDMA deployment to support storage workloads in Azure. The high complexity and heterogeneity of Azure infrastructure creates challenges, such as the problem of interoperability between different types of RDMA network interface cards. Several changes were made to the network infrastructure to address these challenges. Today, around 70% of traffic in Azure is RDMA and intra-region RDMA is supported in all Azure public regions. This helps achieve significant disk I/O performance improvements and CPU core savings.


NEW RESEARCH

LIDA: Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models

Systems that support users in the automatic creation of visualizations must address several subtasks—understand the semantics of data; enumerate relevant visualization goals; and generate visualization specifications. In a new paper: LIDA: Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models, researchers from Microsoft pose visualization generation as a multi-stage generation problem and argue that well-orchestrated pipelines based on large language models (LLMs) and image generation models (IGMs) are suitable to addressing these tasks.

LIDA is a novel tool for generating grammar-agnostic visualizations and infographics. It comprises four modules—a summarizer that converts data into a rich but compact natural language summary; a goal explorer that enumerates visualization goals given the data; a visgenerator that generates, evaluates, refines, executes, and filters visualization code; and an infographer module that yields data-faithful stylized graphics using IGMs. LIDA provides a Python API and a hybrid user interface (direct manipulation and multilingual natural language) for interactive chart, infographic, and data story generation.


NEW RELEASE

Announcing DeepSpeed-Chat: Easy, fast, affordable RLHF Training of ChatGPT-like models at all scales

Microsoft’s AI at Scale initiative has released DeepSpeed-Chat, an easy, fast, and low-cost open-source solution for reinforcement learning from human feedback (RLHF) training that can create high-quality ChatGPT-like models ranging in size from a few to hundreds of billions of parameters. DeepSpeed-Chat provides a complete RLHF training experience with a single click. It combines the prowess of DeepSpeed-Inference and DeepSpeed-Training to offer 15x faster throughput than the previous state of the art, while also supporting model sizes that are up to 8x larger on the same hardware. With DeepSpeed-Chat, practitioners can train an OPT-13B ChatGPT-like model in under 1.5 hours or a massive 175B model in a day on a modest GPU cluster. For those who don’t have a GPU cluster handy, DeepSpeed-Chat enables practitioners to train up to a 13B model on a single GPU, or for $300 on Azure Cloud.


NEWS

Gov4git: Decentralized community governance to fuel open-source projects

Communal open-source projects have helped build countless applications for sourcing and sharing information like bug details and scientific data, as well as decentralized planning, design and policymaking. 

But the lack of a standardized and secure governance solution prevents many open-source projects from getting started—and holds them back when they get too big to be managed through ad-hoc methods. These small communities often resort to external mechanisms to manage their projects and protect them from malicious actors.

Microsoft Research and Protocol Labs, an open-source R&D company, are collaborating to develop Gov4git, a decentralized, git-native protocol with configurable governance rules to help launch more open-source projects and communities and support their growth.

Gov4git comes with many of the transparency, decentralization, and security benefits of blockchains while also harnessing the power of formal governance to avoid costly approaches to validation and dispute resolution. 

Git is the worldwide standard for version control and management of collaborative software development projects. Gov4git is designed as a secure and cost-effective framework solution which can be tailored to the specific needs of any one community and deployed by non-technical users anywhere where access to git is present. Gov4git can strengthen the security of such communities against the risks of malicious actors posing as collaborators with the intent to negatively impact community maintenance.

The post Research Focus: Week of April 24, 2023 appeared first on Microsoft Research.

Read More

Viral NVIDIA Broadcast Demo Drops Hammer on Imperfect Audio This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows.

Content creators in all fields can benefit from free, AI-powered technology available from NVIDIA Studio.

The Studio platform delivers RTX acceleration in over 110 popular creative apps plus an exclusive suite of AI-powered Studio software. NVIDIA Omniverse interconnects 3D workflows, Canvas turns simple brushstrokes into realistic landscape images and RTX Remix helps modders create stunning RTX remasters of classic PC games.

Spotlighted by this week’s In the NVIDIA Studio featured artist Unmesh Dinda, NVIDIA Broadcast transforms the homes, apartments and dorm rooms of content creators, livestreamers and people working from home through the power of AI — all without the need for specialized equipment.

Host of the widely watched YouTube channel PiXimperfect, Dinda takes the noise-canceling and echo-removal AI features in Broadcast to extremes. He turned the perfect demo into a viral hit, editing it faster with RTX acceleration in his go-to video-editing software, Adobe Premiere Pro.

It’s Hammer Time

NVIDIA Broadcast has several popular features, including virtual background, auto-framing, video noise removal, eye contact and vignette effects.

Two of the most frequently used features, noise and echo removal, caught the attention of Dinda, who saw Broadcast’s potential and wanted to show creators how to instantly improve their content.

The foundation of Dinda’s tutorial style came from his childhood. “My father would sit with me every day to help me with schoolwork,” he said. “He always used to explain with examples which were crystal clear to me, so now I do the same with my channel.”

Dinda contemplated how to demonstrate this incredible technology in a quick, relatable way.

“Think of a crazy idea that grabs attention instantly,” said Dinda. “Concepts like holding a drill in both hands or having a friend play drums right next to me.”

Dinda took the advice of famed British novelist William Golding, who once said, “The greatest ideas are the simplest.” Dinda’s final concept ended up as a scene of a hammer hitting a helmet on his head.

It turns out that seeing — and hearing — is believing.

Even with an electric fan whirring directly into his microphone and intense hammering on his helmet, Dinda can be heard crystal clear with Broadcast’s noise-removal feature turned on. To help emphasize the sorcery, Dinda briefly turns the feature off in the demo to reveal the painful sound his viewers would hear without it.

The demo launched on Instagram a few months ago and went viral overnight. Across social media platforms, the video now has over 12 million views and counting.

Dinda wasn’t harmed in the making of this video.

Views are fantastic, but the real gratification of Dinda’s work comes from a genuine desire to improve his followers’ skillsets, he said.

“The biggest inspiration comes from viewers,” said Dinda. “When they comment, message or meet me at an event to say how much the content has helped their career, it inspires me to create more and reach more creatives.”

 

Learn more and download Broadcast, free for all GeForce RTX GPU owners.

Hammer Out the Details

Dinda uses Adobe Premiere Pro to edit his videos, and his GeForce RTX 3080 Ti plays a major part in accelerating his creative workflow.

“I work with and render high-resolution videos on a daily basis, especially with Adobe Premiere Pro. Having a GPU like the GeForce RTX 3080 Ti helps me render and publish in time.” — Unmesh Dinda

He uses the GPU-accelerated decoder, called NVDEC, to unlock smooth playback and scrubbing of the high-resolution video footage he often works in.

As his hammer-filled Broadcast demo launched on several social media platforms, Dinda had the option to deploy the AI-powered, RTX-accelerated auto reframe feature. It automatically and intelligently tracks objects, and crops landscape video to social-media-friendly aspect ratios, saving even more time.

Dinda also used Adobe Photoshop to add graphical overlays to the video. With more than 30 GPU-accelerated features at his disposal — such as super resolution, blur gallery, object selection, smart sharpen and perspective warp — he can improve and adjust footage, quickly and easily.

 

Dinda used the GPU-accelerated NVIDIA encoder, aka NVENC, to speed up video exports up to 5x faster with his RTX GPU, leading to more time saved on the project.

Though he’s a full-time, successful video creator, Dinda stressed, “I have a normal life outside Adobe Photoshop, I promise!”

Streamer Unmesh Dinda.

Check out Dinda’s PiXimperfect channel, a free resource for learning Adobe Photoshop — another RTX-accelerated Studio app.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

Read More

The Future of Intelligent Vehicle Interiors: Building Trust with HMI & AI

Imagine a future where your vehicle’s interior offers personalized experiences and builds trust through human-machine interfaces (HMI) and AI. In this episode of the NVIDIA AI Podcast, Andreas Binner, chief technology officer at Rightware, delves into this fascinating topic with host Katie Burke Washabaugh.

Rightware is a Helsinki-based company at the forefront of developing in-vehicle HMI. Its platform, Kanzi, works in tandem with NVIDIA DRIVE IX to provide a complete toolchain for designing personalized vehicle interiors for the next generation of transportation, including detailed visualizations of the car’s AI.

Binner touches on his journey into automotive technology and HMI, the evolution of infotainment in the automotive industry over the past decade, and surprising trends in HMI. They explore the influence of AI on HMI, novel AI-enabled features and the importance of trust in new technologies.

Other topics include the role of HMI in fostering trust between vehicle occupants and the vehicle, the implications of autonomous vehicle visualization, balancing larger in-vehicle screens with driver distraction risks, additional features for trust-building between autonomous vehicles and passengers, and predictions for intelligent cockpits in the next decade.

Tune in to learn about the innovations that Rightware’s Kanzi platform and NVIDIA DRIVE IX bring to the automotive industry and how they contribute to developing intelligent vehicle interiors.

Read more on the NVIDIA Blog:  NVIDIA DRIVE Ecosystem Creates Pioneering In-Cabin Features With NVIDIA DRIVE IX

You Might Also Like

Driver’s Ed: How Waabi Uses AI, Simulation to Teach Autonomous Vehicles to Drive

Teaching the AI brains of autonomous vehicles to understand the world as humans do requires billions of miles of driving experience. The road to achieving this astronomical level of driving leads to the virtual world. Learn how Waabi uses powerful high-fidelity simulations to train and develop production-level autonomous vehicles.

Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans

Driving enjoyment and autonomous driving capabilities can complement one another in intelligent, sustainable vehicles. Learn about the automaker’s plans to unveil its third vehicle, the Polestar 3, the tech inside it, and what the company’s racing heritage brings to the intersection of smarts and sustainability.

GANTheftAuto: Harrison Kinsley on AI-Generated Gaming Environments

Humans playing games against machines is nothing new, but now computers can develop their own games for people to play. Programming enthusiast and social media influencer Harrison Kinsley created GANTheftAuto, an AI-based neural network that generates a playable chunk of the classic video game Grand Theft Auto V.

Subscribe to the AI Podcast: Now Available on Amazon Music

The AI Podcast is now available through Amazon Music.

In addition, get the AI Podcast through iTunes, Google Podcasts, Google Play, Castbox, DoggCatcher, Overcast, PlayerFM, Pocket Casts, Podbay, PodBean, PodCruncher, PodKicker, Soundcloud, Spotify, Stitcher and TuneIn.

Make the AI Podcast better: Have a few minutes to spare? Fill out this listener survey.

Read More

Run your local machine learning code as Amazon SageMaker Training jobs with minimal code changes

We recently introduced a new capability in the Amazon SageMaker Python SDK that lets data scientists run their machine learning (ML) code authored in their preferred integrated development environment (IDE) and notebooks along with the associated runtime dependencies as Amazon SageMaker training jobs with minimal code changes to the experimentation done locally. Data scientists typically carry out several iterations of experimentation in data processing and training models while working on any ML problem. They want to run this ML code and carry out the experimentation with ease of use and minimal code change. Amazon SageMaker Model Training helps data scientists run fully managed large-scale training jobs on AWS’s compute infrastructure. SageMaker Training also helps data scientists with advanced tools such as Amazon SageMaker Debugger and Profiler to debug and analyze their large-scale training jobs.

For customers with small budgets, small teams, and tight timelines, every single new concept and line of code rewritten to run on SageMaker makes them less productive towards their core tasks, namely data processing and training ML models. They want to write code once in the framework of their choice and be able to move seamlessly from running code in their notebooks or laptops to running code at scale using SageMaker capabilities.

With this new capability of the SageMaker Python SDK, data scientists can onboard their ML code to the SageMaker Training platform in a few minutes. You just need to add a single line of code to your ML code, and SageMaker intelligently comprehends your code along with the datasets and workspace environment setup and runs it as a SageMaker Training job. You can then take advantage of the key capabilities of the SageMaker Training platform, like the ability to scale jobs easily, and other associated tools like Debugger and Profiler. In this release, you can run your local Python ML code as a single-node Amazon SageMaker training job or as multiple parallel jobs. Distributed training jobs (across multiple nodes) are not supported by remote functions.

In this post, we show you how to use this new capability to run local ML code as a SageMaker Training job.

Solution overview

You can now run your ML code written in your IDE or notebook as a SageMaker Training job by annotating the function, which acts as an entry point to the user’s code base, with a simple decorator. Upon invocation, this capability automatically takes a snapshot of all the associated variables, functions, packages, environment variables, and other runtime requirements from your ML code, serializes them, and submits them as a SageMaker Training job. It integrates with the recently announced SageMaker Python SDK feature for setting default values for parameters. This capability simplifies the SageMaker constructs that you need to learn to be able to run code using SageMaker Training. Data scientists can write, debug, and iterate their code in any preferred IDE (such as Amazon SageMaker Studio, notebooks, VS Code, or PyCharm). When ready, you can annotate your Python function with the @remote decorator and run it as a SageMaker job at scale.

This capability takes familiar open-source Python objects as arguments and outputs. Furthermore, you don’t need to understand container lifecycle management and can simply run your workloads across different compute contexts (such as a local IDE, Studio, or training jobs) with minimal configuration overheads. To run any local code as a SageMaker Training job, this capability infers the configurations required to run jobs, such as the AWS Identity and Access Management (IAM) role, encryption key, and network configuration, from the Studio or IDE settings (which can be the default settings) and passes them to the platform by default. You have the flexibility to customize your runtime in the SageMaker managed infrastructure using the inferred configuration or override them at the SDK-level by passing them as arguments to the decorator.

This new capability of the SageMaker Python SDK transforms your ML code in an existing workspace environment, along with any associated data processing code and datasets, into a SageMaker Training job. It looks for ML code wrapped inside a @remote decorator and automatically translates it into a training job, whether the code was written in Studio or in a local IDE such as PyCharm.

In the following sections, we walk through the features of this new capability and how to launch Python functions as SageMaker Training jobs.

Prerequisites

To use this new SageMaker Python SDK capability and run the code associated with this post, you need the following prerequisites:

  • An AWS account that will contain all your AWS resources
  • An IAM role to access SageMaker
  • Access to Studio or a SageMaker notebook instance or an IDE such as PyCharm

Use the SDK from Studio and SageMaker notebooks

You can use this capability from Studio by launching a notebook and wrapping your code with a @remote decorator inside the notebook. You first need to import the remote function using the following code:

from sagemaker.remote_function import remote

When you use the decorator, this capability automatically interprets your function’s code and runs it as a SageMaker Training job.

You can also use this capability from a SageMaker notebook instance. You first need to start a notebook instance, open Jupyter or Jupyter Lab on it, and launch a notebook. Then import the remote function as shown in the preceding code and wrap your code with the @remote decorator. We include an example of how to use the decorator function and the associated settings later in this post.

Use the SDK from your local environment

You can also use this capability from your local IDE. As a prerequisite, you must have the AWS Command Line Interface (AWS CLI), SageMaker Python SDK, and AWS SDK for Python (Boto3) installed in your local environment. You need to import these libraries in your code, set the SageMaker session, specify settings, and decorate your function with the @remote decorator. In the following example code, we run a simple divide function as a SageMaker Training job:

import boto3
import sagemaker
from sagemaker.remote_function import remote

sm_session = sagemaker.Session(boto_session=boto3.session.Session(region_name="us-west-2"))
settings = dict(
    sagemaker_session=sm_session,
    role=<IAM_ROLE_NAME>,
    instance_type="ml.m5.xlarge",
)
@remote(**settings)
def divide(x, y):
    return x / y
if __name__ == "__main__":
    print(divide(2, 3.0))

We can use a similar methodology to run advanced functions as training jobs, as shown in the next section.

Launch Python functions as SageMaker jobs

The new SageMaker Python SDK feature allows you to run Python functions as SageMaker Training jobs. Any Python code, ML training code developed by data scientists using their preferred local IDEs (PyCharm, VS Code), SageMaker notebooks, or Studio notebooks can be launched as a managed SageMaker job.

In ML workloads using this capability, the associated datasets, dependencies, and workspace environment setup are serialized along with the ML code and run as a SageMaker job, either synchronously or asynchronously.

You can add a @remote decorator annotation to any Python code including a local ML processing or training function to launch it as a managed SageMaker Training job, thereby taking advantage of the scale, performance, and cost benefits of SageMaker. This can be achieved with minimal code changes by adding a decorator to the Python function code. Invocation to the decorated function is run synchronously, and the function run waits until the SageMaker job is complete.

In the following example, we use the @remote decorator to launch SageMaker jobs in decorator mode using an ml.m5.large instance. SageMaker uses training jobs to launch this function as a managed job.

from sagemaker.remote_function import remote
import numpy as np

@remote(instance_type="ml.m5.large")
def matrix_multiply(a, b):
    return np.matmul(a, b)

a = np.array([[1, 0], [0, 1]])
b = np.array([1, 2])

assert (matrix_multiply(a, b) == np.array([1, 2])).all()

You can also use decorator mode to launch SageMaker jobs with your Python packages and dependencies specified in an environment.yml file. Network and environment settings such as the VPC, subnets, and security groups used to launch SageMaker training jobs can be configured centrally, which allows ML engineers and admins to manage these settings so data scientists can focus on ML model building and iterate faster. See the following code:

from sagemaker.remote_function import remote

@remote(instance_type="ml.g4dn.xlarge",dependencies = "./environment.yml")
def train_hf_model(
    train_input_path,test_input_path,s3_output_path = None,
    *,epochs = 1, train_batch_size = 32, eval_batch_size = 64,
    warmup_steps = 500,learning_rate = 5e-5
    ):  
    model_name = "distilbert-base-uncased"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    ... <TRUNCATED>
    return os.path.join(s3_output_path, model_dir), eval_result

You can use RemoteExecutor to launch Python functions as SageMaker jobs asynchronously. The executor asynchronously polls SageMaker Training jobs to update the status of the job. The RemoteExecutor class is an implementation of the concurrent.futures.Executor, which is used to submit SageMaker Training jobs asynchronously. See the following code:

from sagemaker.remote_function import RemoteExecutor

def train_hf_model(
    train_input_path,test_input_path,s3_output_path = None,
    *,epochs = 1, train_batch_size = 32, eval_batch_size = 64,
    warmup_steps = 500,learning_rate = 5e-5
    ):  
    model_name = "distilbert-base-uncased"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    ...<TRUNCATED>
    return os.path.join(s3_output_path, model_dir), eval_result


with RemoteExecutor(instance_type="ml.g4dn.xlarge", dependencies = './requirements.txt') as e:
    future = e.submit(train_hf_model, train_input_path, test_input_path, s3_output_path,
                      epochs, train_batch_size, eval_batch_size, warmup_steps, learning_rate)
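
Because RemoteExecutor follows the concurrent.futures.Executor interface, the returned future can be used to wait for the SageMaker Training job and retrieve the function’s outputs, as in this minimal sketch:

# Block until the training job completes and return the remote function's outputs
model_s3_path, eval_result = future.result()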

Customize the runtime environment

Decorator mode and RemoteExecutor allow you to define and customize the runtime environments for the SageMaker job. The runtime dependencies, including Python packages and environment variables for SageMaker jobs, can be specified to customize the runtime. In order to run local Python code as SageMaker managed jobs, the Python package and dependencies need to be made available to SageMaker. ML engineers or data science administrators can configure networking and security configurations such as VPC, subnets, and security groups for SageMaker jobs, so data scientists can use these centrally managed configurations while launching SageMaker jobs. You can use either a requirements.txt file or a Conda environment.yaml file.

When dependencies are defined with requirements.txt, the packages will be installed using pip in the job runtime. If the image used for running the job comes with Conda environments, packages will be installed in the Conda environment declared to use for jobs. The following code shows an example requirements.txt file:

datasets
transformers
torch
scikit-learn
s3fs==0.4.2
sagemaker>=2.148.0

You can pass your Conda environment.yaml file to create the Conda environment you would like your code to run in during the training job. If the image used for running the job declares a Conda environment to run the code under, we will update the declared Conda environment with the given specification. The following code is an example of a Conda environment.yaml file:

name: sagemaker_example
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas
  - pip:
      - sagemaker

Alternatively, you can set dependencies="auto_capture" to let the SageMaker Python SDK capture the installed dependencies in the active Conda environment. You must have an active Conda environment for auto_capture to work. Note that there are additional prerequisites for auto_capture; we recommend that you pass in your dependencies as a requirements.txt or Conda environment.yml file as described in the previous section.
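
The following is a minimal sketch of auto_capture in decorator mode; the instance type and function body are illustrative placeholders.

from sagemaker.remote_function import remote

# Capture the packages installed in the currently active Conda environment
@remote(instance_type="ml.m5.xlarge", dependencies="auto_capture")
def process(records):
    # Illustrative placeholder for your data processing or training logic
    return len(records)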

For more details, refer to Run your local code as a SageMaker Training job.

Configurations for SageMaker jobs

Infrastructure-related settings can be offloaded to a configuration file that admin users help set up; you only need to set it up one time. Infrastructure settings cover the network configuration, IAM roles, Amazon Simple Storage Service (Amazon S3) folders for input and output data, and tags. Refer to Configuring and using defaults with the SageMaker Python SDK for more details. The following is an example configuration file:

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        Dependencies: path/to/requirements.txt
        EnvironmentVariables: {"EnvVarKey": "EnvVarValue"}
        ImageUri: 366666666666.dkr.ecr.us-west-2.amazonaws.com/my-image:latest
        InstanceType: ml.m5.large
        RoleArn: arn:aws:iam::366666666666:role/MyRole
        S3KmsKeyId: somekmskeyid
        S3RootUri: s3://my-bucket/my-project
        SecurityGroupIds:
          - sg123
        Subnets:
          - subnet-1234
        Tags:
          - {"Key": "someTagKey", "Value": "someTagValue"}
        VolumeKmsKeyId: somekmskeyid

Implementation

Deep learning models like PyTorch or TensorFlow can also be run within Studio by running the code as a training job within the notebook. To showcase this capability in Studio, you can clone this repo into your Studio and run the notebook located in the GitHub repository.

This example demonstrates an end-to-end binary text classification use case. We use the Hugging Face transformers and datasets libraries to fine-tune a pre-trained transformer on binary text classification. In particular, the pre-trained model will be fine-tuned using the IMDb dataset.

When you clone the repository, you should locate the following files:

  • config.yaml – Most of the decorator arguments can be offloaded to the configuration file in order to separate out the infrastructure-related settings from the code base
  • huggingface.ipynb – This contains the code to fine-tune a pre-trained Hugging Face model using the IMDb dataset
  • requirements.txt – This file contains all the dependencies for the function used in this notebook, so the training can run remotely on a GPU instance as a training job

When you open the notebook, you will be prompted to set up the notebook environment. You can select the Data Science 3.0 image with the Python 3 kernel and ml.m5.large as the fast launch instance type for running the notebook code. This instance type is significantly faster in spinning up an environment.

The training job will be run in an ml.g4dn.xlarge instance as defined in the config.yaml file:

SchemaVersion: '1.0'
SageMaker:
  PythonSDK:
    Modules:
      RemoteFunction:
        # role arn is not required if in SageMaker Notebook instance or SageMaker Studio
        # Uncomment the following line and replace with the right execution role if in a local IDE
        # RoleArn: <IAM_ROLE_ARN>
        InstanceType: ml.g4dn.xlarge
        Dependencies: ./requirements.txt

The requirements.txt file dependencies to run the function for training the Hugging Face model include the following:

datasets
transformers
torch
scikit-learn
# lock s3fs to this specific version as more recent ones introduce dependency on aiobotocore, which is not compatible with botocore
s3fs==0.4.2
sagemaker>=2.148.0,<3

The Hugging Face notebook showcases how to run the training remotely via the @remote function, which is run synchronously. Therefore, the function run for training the model will wait until the SageMaker Training job is complete. The training will be run remotely with a GPU instance wherein the instance type is defined in the preceding configuration file.

from sagemaker.remote_function import remote

@remote(s3_root_uri=s3_root_folder, keep_alive_period_in_seconds=600)
def train_hf_model(
    train_input_path,
    test_input_path,
    s3_output_path = None,
    *,
    epochs = 1,
    train_batch_size = 32,
    eval_batch_size = 64,
    warmup_steps = 500,
    learning_rate = 5e-5
):  
    model_dir = 'model'

    train_dataset = load_from_disk(train_input_path, keep_in_memory=True)
    test_dataset = load_from_disk(test_input_path, keep_in_memory=True)
    
    model_name = 'distilbert-base-uncased'
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    
    training_args = TrainingArguments(
        output_dir=model_dir,
        num_train_epochs=epochs,
        per_device_train_batch_size=train_batch_size,
        per_device_eval_batch_size=eval_batch_size,
        warmup_steps=warmup_steps,
        evaluation_strategy="epoch",
        logging_dir="logs/",
        learning_rate=float(learning_rate),
    )

    # create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        tokenizer=tokenizer,
    )
    
    print("Starting model training..")
    trainer.train()
        
    trainer.save_model(model_dir)
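
For reference, kicking off this training job might look like the following sketch; the S3 paths are hypothetical placeholders for the tokenized datasets and output location prepared earlier in the notebook.

# Hypothetical S3 locations for the tokenized train/test datasets and the model output
train_input_path = "s3://my-bucket/imdb/train"
test_input_path = "s3://my-bucket/imdb/test"
s3_output_path = "s3://my-bucket/imdb/output"

# Runs synchronously as a SageMaker Training job on the instance type from config.yaml
train_hf_model(train_input_path, test_input_path, s3_output_path, epochs=1)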

After you run the training job, you can run the rest of the cells in the notebook to inspect the evaluation metrics and classify the text on our trained model.

You can also view the training job status that got remotely triggered in the GPU instance on the SageMaker dashboard by navigating back to the SageMaker console.

As soon as the training job is complete, it continues to run the instructions in the notebook for evaluation and classification. Similar jobs can be trained and run via the remote executor function embedded within Studio notebooks to carry out the runs asynchronously.

Integration with SageMaker experiments inside a @remote function

You can pass your experiment name, run name, and other parameters into your remote function to create a SageMaker experiments run. The following code example passes in the experiment name, the run name, and the parameters to log for each run:

from sagemaker.remote_function import remote
from sagemaker.experiments.run import Run

# Define your remote function
@remote
def train(value_1, value_2, exp_name, run_name):
    ...
    # Creates the experiment
    with Run(
        experiment_name=exp_name,
        run_name=run_name,
        sagemaker_session=sagemaker_session
    ) as run:
        ...
        # Define values for the parameters to log
        run.log_parameter("param_1", value_1)
        run.log_parameter("param_2", value_2)
        ...
        # Define metrics to log
        run.log_metric("metric_a", 0.5)
        run.log_metric("metric_b", 0.1)

# Invoke your remote function
train(1.0, 2.0, "my-exp-name", "my-run-name")

In the preceding example, the parameters param_1 and param_2 are logged for the run; common parameters may include batch size or epochs. The metrics metric_a and metric_b are logged over time inside a training loop; common metrics may include accuracy or loss. For more information, see Create an Amazon SageMaker Experiment.

Conclusion

In this post, we introduced a new SageMaker Python SDK capability that enables data scientists to run their ML code in their preferred IDE as SageMaker Training jobs. We discussed the prerequisites needed to use this capability along with its features. We also showed how to use this capability in Studio, SageMaker notebook instances, and your local IDE. In addition, we provided sample code examples to demonstrate how to use this capability. As a next step, we recommend trying this capability in your IDE or SageMaker by following the code examples referenced in this post.


About the Authors

Dipankar Patro is a Software Development Engineer at AWS SageMaker, innovating and building MLOps solutions to help customers adopt AI/ML solutions at scale. He has an MS in Computer Science and his areas of interest are Computer Security, Distributed Systems and AI/ML.

Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

Manoj Ravi is a Senior Product Manager for Amazon SageMaker. He is passionate about building next-gen AI products and works on software and tools to make large-scale machine learning easier for customers. He holds an MBA from Haas School of Business and a Masters in Information Systems Management from Carnegie Mellon University. In his spare time, Manoj enjoys playing tennis and pursuing landscape photography.

Shikhar Kwatra is an AI/ML Specialist Solutions Architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.

Vikram Elango is a Sr. AI/ML Specialist Solutions Architect at AWS, based in Virginia, US. He is currently focused on generative AI, LLMs, prompt engineering, large model inference optimization, and scaling ML across enterprises. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. In his spare time, he enjoys traveling, hiking, cooking, and camping.

Read More

LayerNAS: Neural Architecture Search in Polynomial Complexity

Every byte and every operation matters when trying to build a faster model, especially if the model is to run on-device. Neural architecture search (NAS) algorithms design sophisticated model architectures by searching through a larger model space than is possible manually. Different NAS algorithms, such as MNasNet and TuNAS, have been proposed and have discovered several efficient model architectures, including MobileNetV3 and EfficientNet.

Here we present LayerNAS, an approach that reformulates the multi-objective NAS problem within the framework of combinatorial optimization to greatly reduce the complexity, which results in an order of magnitude reduction in the number of model candidates that must be searched, less computation required for multi-trial searches, and the discovery of model architectures that perform better overall. Using a search space built on backbones taken from MobileNetV2 and MobileNetV3, we find models with top-1 accuracy on ImageNet up to 4.9% better than current state-of-the-art alternatives.

Problem formulation

NAS tackles a variety of different problems on different search spaces. To understand what LayerNAS is solving, let’s start with a simple example: You are the owner of GBurger and are designing the flagship burger, which is made up of three layers, each of which has four options with different costs. Burgers taste different with different mixtures of options. You want to make the most delicious burger you can that comes in under a certain budget.

Make up your burger with different options available for each layer, each of which has different costs and provides different benefits.

Just like the architecture for a neural network, the search space for the perfect burger follows a layerwise pattern, where each layer has several options with different changes to costs and performance. This simplified model illustrates a common approach for setting up search spaces. For example, for models based on convolutional neural networks (CNNs), like MobileNet, the NAS algorithm can select between a different number of options — filters, strides, or kernel sizes, etc. — for the convolution layer.

Method

We base our approach on search spaces that satisfy two conditions:

  • An optimal model can be constructed using one of the model candidates generated from searching the previous layer and applying those search options to the current layer.
  • If we set a FLOP constraint on the current layer, we can set constraints on the previous layer by reducing the FLOPs of the current layer.

Under these conditions it is possible to search linearly, from layer 1 to layer n, knowing that when searching for the best option for layer i, a change in any previous layer will not improve the performance of the model. We can then bucket candidates by their cost, so that only a limited number of candidates are stored per layer. If two models have the same FLOPs, but one has better accuracy, we only keep the better one, and assume this won’t affect the architecture of following layers. Whereas the search space of a full treatment would expand exponentially with layers since the full range of options is available at each layer, our layerwise cost-based approach allows us to significantly reduce the search space, while being able to rigorously reason over the polynomial complexity of the algorithm. Our experimental evaluation shows that within these constraints we are able to discover top-performance models.

NAS as a combinatorial optimization problem

By applying a layerwise-cost approach, we reduce NAS to a combinatorial optimization problem. That is, for layer i, we can compute the cost and reward after training with a given component S_i. This implies the following combinatorial problem: how can we get the best reward if we select one choice per layer within a cost budget? This problem can be solved with many different methods, one of the most straightforward of which is dynamic programming, as described in the following pseudocode:

while True:
	# select a candidate to search in Layer i
	candidate = select_candidate(layeri)
	if searchable(candidate):
		# Use the layerwise structural information to generate the children.
		children = generate_children(candidate)
		reward = train(children)
		bucket = bucketize(children)
		if memorial_table[i][bucket] < reward:
			memorial_table[i][bucket] = children
		move to next layer
Pseudocode of LayerNAS.
Illustration of the LayerNAS approach for the example of trying to create the best burger within a budget of $7–$9. We have four options for the first layer, which results in four burger candidates. By applying four options on the second layer, we have 16 candidates in total. We then bucket them into ranges from $1–$2, $3–$4, $5–$6, and $7–$8, and only keep the most delicious burger within each of the buckets, i.e., four candidates. Then, for those four candidates, we build 16 candidates using the pre-selected options for the first two layers and four options for each candidate for the third layer. We bucket them again, select the burgers within the budget range, and keep the best one.
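
To make the bucketed, layerwise search concrete, the following is a minimal Python sketch of the burger example; the option costs and deliciousness scores are made up for illustration, and the additive reward is a knapsack-style stand-in for the trained-model accuracy used in the real algorithm.

from itertools import product

# Hypothetical (cost, deliciousness) pairs for the four options at each of the three layers
LAYER_OPTIONS = [
    [(1, 2.0), (2, 3.5), (3, 4.0), (4, 5.5)],  # layer 1
    [(1, 1.0), (2, 2.5), (3, 3.0), (4, 4.5)],  # layer 2
    [(1, 1.5), (2, 2.0), (3, 3.5), (4, 4.0)],  # layer 3
]
BUDGET = 9       # maximum total cost
BUCKET_SIZE = 2  # width of each cost bucket

def layerwise_search(layer_options, budget, bucket_size):
    # Each candidate is (total_cost, total_reward, chosen_option_indices)
    candidates = [(0, 0.0, [])]
    for options in layer_options:
        best_per_bucket = {}
        for (cost, reward, chosen), (idx, (opt_cost, opt_reward)) in product(
            candidates, enumerate(options)
        ):
            new_cost = cost + opt_cost
            if new_cost > budget:
                continue  # prune candidates that already exceed the budget
            new_reward = reward + opt_reward
            bucket = new_cost // bucket_size
            # Keep only the best candidate per cost bucket before moving to the next layer
            if bucket not in best_per_bucket or best_per_bucket[bucket][1] < new_reward:
                best_per_bucket[bucket] = (new_cost, new_reward, chosen + [idx])
        candidates = list(best_per_bucket.values())
    # Return the best complete candidate found within the budget
    return max(candidates, key=lambda c: c[1])

if __name__ == "__main__":
    cost, reward, choices = layerwise_search(LAYER_OPTIONS, BUDGET, BUCKET_SIZE)
    print(f"options per layer: {choices}, total cost: {cost}, reward: {reward:.1f}")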

Experimental results

When comparing NAS algorithms, we evaluate the following metrics:

  • Quality: What is the most accurate model that the algorithm can find?
  • Stability: How stable is the selection of a good model? Can high-accuracy models be consistently discovered in consecutive trials of the algorithm?
  • Efficiency: How long does it take for the algorithm to find a high-accuracy model?

We evaluate our algorithm on the standard benchmark NATS-Bench using 100 NAS runs, and we compare against other NAS algorithms, previously described in the NATS-Bench paper: random search, regularized evolution, and proximal policy optimization. Below, we visualize the differences between these search algorithms for the metrics described above. For each comparison, we record the average accuracy and variation in accuracy (variation is noted by a shaded region corresponding to the 25% to 75% interquartile range).

NATS-Bench size search defines a 5-layer CNN model, where each layer can choose from eight different options, each with different channels on the convolution layers. Our goal is to find the best model with 50% of the FLOPs required by the largest model. LayerNAS performance stands apart because it formulates the problem in a different way, separating the cost and reward to avoid searching a significant number of irrelevant model architectures. We found that model candidates with fewer channels in earlier layers tend to yield better performance, which explains how LayerNAS discovers better models much faster than other algorithms, as it avoids spending time on models outside the desired cost range. Note that the accuracy curve drops slightly after searching longer due to the lack of correlation between validation accuracy and test accuracy, i.e., some model architectures with higher validation accuracy have a lower test accuracy in NATS-Bench size search.

Top: NATS-Bench size search test accuracy on Cifar10; Middle: On Cifar100; Bottom: On ImageNet16-120. Average on 100 runs compared with random search (random), Regularized Evolution (evolution), and Proximal Policy Optimization (PPO).

We construct search spaces based on MobileNetV2, MobileNetV2 1.4x, MobileNetV3 Small, and MobileNetV3 Large and search for an optimal model architecture under different #MAdds (number of multiply-adds per image) constraints. Among all settings, LayerNAS finds a model with better accuracy on ImageNet. See the paper for details.

Comparison on models under different #MAdds.

Conclusion

In this post, we demonstrated how to reformulate NAS as a combinatorial optimization problem, and proposed LayerNAS as a solution that requires only polynomial search complexity. We compared LayerNAS with existing popular NAS algorithms and showed that it can find improved models on NATS-Bench. We also used the method to find better architectures based on MobileNetV2 and MobileNetV3.

Acknowledgements

We would like to thank Jingyue Shen, Keshav Kumar, Daiyi Peng, Mingxing Tan, Esteban Real, Peter Young, Weijun Wang, Qifei Wang, Xuanyi Dong, Xin Wang, Yingjie Miao, Yun Long, Zhuo Wang, Da-Cheng Juan, Deqiang Chen, Fotis Iliopoulos, Han-Byul Kim, Rino Lee, Andrew Howard, Erik Vee, Rina Panigrahy, Ravi Kumar and Andrew Tomkins for their contribution, collaboration and advice.

Read More

Perform intelligent search across emails in your Google workspace using the Gmail connector for Amazon Kendra

Perform intelligent search across emails in your Google workspace using the Gmail connector for Amazon Kendra

Many organizations use Gmail for their business email needs. Gmail for Business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Google Docs, Google Sheets, and more. For any organization, emails contain a wealth of information, which could be within the subject of an email, the message content, or even email attachments. Performing an intelligent search on email interactions with coworkers can help find answers to questions, thereby improving employee productivity and enhancing the overall customer experience for the organization.

Amazon Kendra is a highly accurate and intelligent search service that allows your users to search unstructured and structured data using natural language processing (NLP) and advanced search algorithms. You can now use the Gmail connector for Amazon Kendra to index emails and email attachments in Gmail, and search for answers to your questions on this content using intelligent search in Amazon Kendra, powered by machine learning (ML).

This post walks you through the process of configuring the Gmail connector for Amazon Kendra for your organization’s Google Workspace, allowing you to index emails based on a defined scope and take advantage of the intelligent search capabilities of Amazon Kendra.

Solution overview

A data source is a data repository or location that Amazon Kendra connects to in order to index your documents or content. After you create an Amazon Kendra index, you can create one or more data sources and configure them to start ingesting documents. In our solution, we ingest emails and attachments from Gmail by configuring the new Gmail data source connector to filter for emails that meet certain criteria. After the configuration is complete, we synchronize the data source to index the documents, allowing you to perform intelligent search on the Amazon Kendra index.

Prerequisites

To enable the Gmail connector for Amazon Kendra, you need the following:

  • An AWS account
  • A Google Workspace account and an organization for your business with one or more users who have access to Gmail
  • Administrator account credentials to Google Workspace and the Google Cloud console

Configure Google Workspace

To enable Amazon Kendra to access and index emails from Gmail accounts within the organization and perform intelligent search on them, it’s essential to configure your organization’s Google Workspace. In the steps that follow, we create a service account that the Gmail connector uses to index emails. The service account is provided with authorization scopes to allow access to certain Gmail APIs. The authorization scopes express the permissions you request users to authorize for your app and are applicable for all emails within your organization’s Google Workspace.

  1. Log in to your organization’s Google Cloud account.
  2. Create a new project with an appropriate name and assign it to your organization. In our example, we name the project KendraGmailConnector.
  3. Choose Create.

  1. Monitor the progress of creation of the new project on the Notifications menu on the top right of the Google Cloud console.

  1. After the project is created, choose the options menu, choose API & Services, and choose Library to view the API Library.

  1. On the API Library page, search for Admin SDK API and choose Enable. The Admin SDK API lets you manage Google Workspace account resources and audit usage.

  1. Similarly, search for Gmail API on the API Library page and choose Enable. The Gmail API is used to view and manage Gmail mailbox data such as threads, messages, and labels.

We now create a service account, which the Gmail connector for Amazon Kendra uses to access your organization’s emails based on the allowed API scope.

  1. On the options menu, choose IAM & Admin, then choose Service Accounts.

  1. Choose Create service account.

  1. Enter a name for your service account. For this post, we name our service account AmazonKendraGmailConnector.
  2. Enter your service account ID and account description.
  3. Skip the optional steps Grant this service account access to project and Grant users access to this service account and choose Done.

  1. Choose the service account you created to open the service account details page.
  2. Note the unique ID for the service account (also known as a client ID), to use in a later step.

Next, we create keys for the service account, which allows it to be used by the Gmail connector for Amazon Kendra.

  1. On the Keys tab, choose Add key.

  1. For Key type, select JSON.
  2. Choose Create.

This step downloads the private key to your computer. Keep this key safe; you need it later to configure the connector on the Amazon Kendra console.

  1. Choose Close.

The following screenshot shows an example of the credentials JSON file.

  1. On the Details tab, expand the Advanced settings section.
  2. Under Domain-wide delegation, choose View Google Workspace admin console.

Granting the service account domain-wide delegation to your organization’s data must be done with caution. The delegation can be reversed by disabling or deleting the service account, or by removing access through the Google Workspace admin console.

  1. Log in to the admin console using your Google Workspace admin credentials.
  2. In the navigation pane, under Security, choose Access and data control, then choose API controls.
  3. In the Domain-wide delegation section, choose Manage domain-wide delegation.

  1. Choose Add new.

This brings up the Add a new client ID dialog.

  1. Enter the unique ID for the service account you created earlier, and enter the following scopes to allow the service account to access the emails from Gmail:
    1. https://www.googleapis.com/auth/gmail.readonly
    2. https://www.googleapis.com/auth/admin.directory.user.readonly
  2. Choose Authorize.

This concludes the configuration within the Google Cloud console and Google Workspace admin console.
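
Before moving to Amazon Kendra, you can optionally verify that the domain-wide delegation works by impersonating a Workspace user with the service account key and the two scopes above. The following Python sketch uses the google-auth and google-api-python-client libraries; the key file name and user email are placeholders.

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = [
    "https://www.googleapis.com/auth/gmail.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
]

# Placeholders: path to the downloaded JSON key and a user in your Workspace.
credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json", scopes=SCOPES
).with_subject("user@example.com")  # domain-wide delegation: impersonate this user

gmail = build("gmail", "v1", credentials=credentials)
labels = gmail.users().labels().list(userId="me").execute()
print([label["name"] for label in labels.get("labels", [])])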

Configure the Gmail connector for Amazon Kendra

In this section, we walk through the configuration steps for the Gmail connector for Amazon Kendra:

  1. On the Amazon Kendra console, create a new index or open an existing index. For this post, we use the existing index EnterpriseKendraIndex.

  1. Under Data management in the navigation pane, choose Data sources.
  2. Choose Add data source.

  1. On the list of data sources, find the Gmail connector and choose Add connector.

  1. On the Specify data source details page, complete the following steps:
    1. For Data source name, enter a name.
    2. For Description, enter an optional description.
    3. Leave the language as the default setting, English (en).

    Amazon Kendra supports a select set of languages with full semantic search. These languages include Spanish, Japanese, French, and others. For more information, see Adding documents in languages other than English.

    1. Add any tags to the index, then choose Next.

Next, we create an AWS Secrets Manager secret to store the Gmail authentication details, using the values in the credentials JSON file that we downloaded earlier (a programmatic alternative is sketched after the following steps).

  1. On the Define access and security page, complete the following steps:
      1. In the Authentication section, choose Create and add new secret, which opens the Create an AWS Secrets Manager secret dialog.
      2. For Secret name, enter a name.
      3. For Client email, enter the client email ID from the credentials JSON file.
      4. For Admin account email, enter the admin email for the Google Cloud console.
      5. For Private key, enter the private key from the credentials JSON file.
      6. Choose Save to return to the Define access and security page.

      1. In the Configure VPC and security group section, you can choose a VPC, the subnets for the data source, and a security group that grants access to the host. For our configuration, we choose No VPC.
      2. In the IAM role section, choose Create a new role and enter a role name.
      3. Choose Next.
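
As a programmatic alternative to the console dialog, you can create the secret with the AWS SDK. The following Python (boto3) sketch mirrors the console field labels; the secret name, email addresses, key material, and JSON key names are placeholders, so confirm the exact format expected by the Gmail connector in the Amazon Kendra documentation.

import json
import boto3

secretsmanager = boto3.client("secretsmanager")

# Placeholder values that mirror the console fields shown above.
secret_value = {
    "clientEmail": "kendra-gmail@your-project.iam.gserviceaccount.com",
    "adminAccountEmail": "admin@example.com",
    "privateKey": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
}

response = secretsmanager.create_secret(
    Name="AmazonKendra-gmail-connector-secret",  # placeholder secret name
    SecretString=json.dumps(secret_value),
)
print(response["ARN"])  # reference this secret when configuring the data source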

  1. On the Configure sync settings page, set the following parameters to sync all emails and email attachments sent from the admin email address:
    1. In the Sync scope section, select Message attachments.
    2. Under Additional configuration, configure filters for the emails to ingest into the Amazon Kendra index:
      1. For Date range, enter the start and end dates for emails to be crawled. Emails received on or after the start date and before the end date are included in the sync scope.
      2. For Email domains, enter the email from domains, email to domains, subject, CC, and BCC emails you wish to include or exclude in your index. For this post, we set the email from domain as the admin email address.
      3. For Keywords in subjects, include or exclude any documents with at least one keyword mentioned in their subjects.
      4. For Labels, add regular expression patterns to include or exclude certain labels or attachment types (up to 100 patterns).
      5. For Attachments, add regular expression patterns to include or exclude certain attachments (up to 100 patterns).

    1. In the Sync mode section, you can either specify a full sync to sync and index all contents in all entities regardless of the previous sync status, or only sync new, modified, or deleted content. For this post, we select Full sync.

    1. Lastly, we set an appropriate frequency for the sync. For this post, we choose Run on demand.
    2. Choose Next.
  1. On the Set field mappings page, you map the required data source fields to fields in your index, and you can also create mappings for custom index fields. You can specify mappings for both messages and message attachments. For this post, we add field mappings in the Message section:
    1. Select the Gmail field mappings subject, from, and to.
    2. Choose Next.

  1. On the Review and create page, review all the steps and choose Add data source to create your Gmail connector data source.
  2. After the data source is created, on the Data sources page, select the data source (kendra-gmail-connector) and choose Sync now.

The amount of time the sync takes depends on the number of emails that match the sync scope and the size of the attachments that need to be indexed. You can check the status of the sync operation for the Gmail data source by choosing the data source and scrolling down to the Sync run history section. Choose the status of an individual sync to view more details.

This section shows the start and end times of the sync and the number of documents that were added, deleted, failed, or modified during the sync. A status of Completed denotes a sync with no failures. If a document being ingested is blank, the sync status is set to Completed with Errors, and the number of failed documents is listed under Failed, as shown in the following screenshot. In case of a sync failure, you can investigate the reason either by choosing the number of failed documents or by choosing the entry in the Details column, which opens the Amazon CloudWatch logs. In the following example, two documents failed ingestion because they were blank.
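
If you prefer to trigger and monitor syncs programmatically, the AWS SDK exposes the same operations. Here is a short Python (boto3) sketch; the index and data source IDs are placeholders.

import boto3

kendra = boto3.client("kendra")

INDEX_ID = "your-kendra-index-id"        # placeholder
DATA_SOURCE_ID = "your-data-source-id"   # placeholder

# Kick off an on-demand sync of the Gmail data source.
job = kendra.start_data_source_sync_job(Id=DATA_SOURCE_ID, IndexId=INDEX_ID)
print("Started sync job:", job["ExecutionId"])

# List recent sync runs and their status (for example, SYNCING, SUCCEEDED, or FAILED).
history = kendra.list_data_source_sync_jobs(Id=DATA_SOURCE_ID, IndexId=INDEX_ID)
for run in history["History"]:
    print(run["ExecutionId"], run["Status"], run.get("ErrorMessage", ""))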

After the sync is successful, you can perform a search on the Amazon Kendra index.

Search indexed content

To search on the indexed content, choose Search indexed content in the navigation pane on the Amazon Kendra console.

On the search console, enter any natural language question. In our example, we ask “What is SageMaker?” Amazon Kendra performs an intelligent search on the emails ingested into the index based on the scope of the sync and finds an answer, as shown in the following screenshot.

In this example, the Document fields section shows the field mappings that we specified while configuring our data source connector.
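
You can issue the same query through the AWS SDK once the sync has completed. A short Python (boto3) sketch with a placeholder index ID:

import boto3

kendra = boto3.client("kendra")
INDEX_ID = "your-kendra-index-id"  # placeholder

response = kendra.query(IndexId=INDEX_ID, QueryText="What is SageMaker?")
for item in response["ResultItems"]:
    print(item["Type"])                           # for example, ANSWER or DOCUMENT
    print(item["DocumentExcerpt"]["Text"][:200])  # snippet of the matched email
    # Mapped fields (such as subject, from, and to) are returned as document attributes.
    for attribute in item.get("DocumentAttributes", []):
        print(attribute["Key"], attribute["Value"])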

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Gmail connector, delete the added data source.

Conclusion

In this post, we showed how organizations can now use the Gmail connector for Amazon Kendra to allow users to perform intelligent search on emails and email attachments, thereby improving employee productivity and customer satisfaction.

Additionally, we walked through how to define field mappings to the Amazon Kendra data source, allowing users to refine their search results.

To learn more about the Gmail connector for Amazon Kendra, refer to Gmail data source connector for Amazon Kendra.


About the Author

Roshan Thomas is a Senior Solutions Architect at Amazon Web Services. He is based in Melbourne, Australia, and works closely with power and utilities customers to accelerate their journey in the cloud. He is passionate about technology and helping customers architect and build solutions on AWS.

Read More

Right on Track: NVIDIA Open-Source Software Helps Developers Add Guardrails to AI Chatbots

Right on Track: NVIDIA Open-Source Software Helps Developers Add Guardrails to AI Chatbots

Newly released open-source software can help developers guide generative AI applications to create impressive text responses that stay on track.

NeMo Guardrails will help ensure smart applications powered by large language models (LLMs) are accurate, appropriate, on topic and secure. The software includes all the code, examples and documentation businesses need to add safety to AI apps that generate text.

Today’s release comes as many industries are adopting LLMs, the powerful engines behind these AI apps. They’re answering customers’ questions, summarizing lengthy documents, even writing software and accelerating drug design.

NeMo Guardrails is designed to help users keep this new class of AI-powered applications safe.

Powerful Models, Strong Rails

Safety in generative AI is an industry-wide concern. NVIDIA designed NeMo Guardrails to work with all LLMs, such as OpenAI’s ChatGPT.

The software lets developers align LLM-powered apps so they’re safe and stay within the domains of a company’s expertise.

NeMo Guardrails enables developers to set up three kinds of boundaries:

  • Topical guardrails prevent apps from veering off into undesired areas. For example, they keep customer service assistants from answering questions about the weather.
  • Safety guardrails ensure apps respond with accurate, appropriate information. They can filter out unwanted language and enforce that references are made only to credible sources.
  • Security guardrails restrict apps to making connections only to external third-party applications known to be safe.

Virtually every software developer can use NeMo Guardrails — no need to be a machine learning expert or data scientist. They can create new rules quickly with a few lines of code.
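
To illustrate what those few lines can look like, here is a minimal Python sketch based on the nemoguardrails package’s documented API; the Colang rule content and the model settings are placeholders rather than an official example.

from nemoguardrails import LLMRails, RailsConfig

# Placeholder Colang rules: keep a customer-service bot from answering weather questions.
colang_content = """
define user ask about weather
  "What's the weather like today?"
  "Will it rain tomorrow?"

define bot refuse weather question
  "I can only help with questions about our products and services."

define flow weather guardrail
  user ask about weather
  bot refuse weather question
"""

# Placeholder model configuration; point this at the LLM you actually use.
yaml_content = """
models:
  - type: main
    engine: openai
    model: gpt-3.5-turbo-instruct
"""

config = RailsConfig.from_content(colang_content=colang_content, yaml_content=yaml_content)
rails = LLMRails(config)

reply = rails.generate(messages=[{"role": "user", "content": "Will it rain tomorrow?"}])
print(reply["content"])  # the on-topic refusal defined in the rules above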

Riding Familiar Tools

Since NeMo Guardrails is open source, it can work with all the tools that enterprise app developers use.

For example, it can run on top of LangChain, an open-source toolkit that developers are rapidly adopting to plug third-party applications into the power of LLMs.

“Users can easily add NeMo Guardrails to LangChain workflows to quickly put safe boundaries around their AI-powered apps,” said Harrison Chase, who created the LangChain toolkit and a startup that bears its name.

In addition, NeMo Guardrails is designed to be able to work with a broad range of LLM-enabled applications, such as Zapier. Zapier is an automation platform used by over 2 million businesses, and it’s seen first-hand how users are integrating AI into their work.

“Safety, security, and trust are the cornerstones of responsible AI development, and we’re excited about NVIDIA’s proactive approach to embed these guardrails into AI systems,” said Reid Robinson, lead product manager of AI at Zapier.

“We look forward to the good that will come from making AI a dependable and trusted part of the future.”

Available as Open Source and From NVIDIA

NVIDIA is incorporating NeMo Guardrails into the NVIDIA NeMo framework, which includes everything users need to train and tune language models using a company’s proprietary data.

Much of the NeMo framework is already available as open-source code on GitHub. Enterprises can also get it as a complete and supported package, part of the NVIDIA AI Enterprise software platform.

NeMo is also available as a service. It’s part of NVIDIA AI Foundations, a family of cloud services for businesses that want to create and run custom generative AI models based on their own datasets and domain knowledge.

Using NeMo, South Korea’s leading mobile operator built an intelligent assistant that’s had 8 million conversations with its customers. A research team in Sweden employed NeMo to create LLMs that can automate text functions for the country’s hospitals, government and business offices.

An Ongoing Community Effort

Building good guardrails for generative AI is a hard problem that will require lots of ongoing research as AI evolves.

NVIDIA made NeMo Guardrails — the product of several years’ research — open source to contribute to the developer community’s tremendous energy and work on AI safety.

Together, our efforts on guardrails will help companies keep their smart services aligned with safety, privacy and security requirements so these engines of innovation stay on track.

For more details on NeMo Guardrails and to get started, see our technical blog.

Read More