Why GPUs Are Great for AI

GPUs have been called the rare earth metals — even the gold — of artificial intelligence, because they’re foundational for today’s generative AI era.

Three technical reasons, and many stories, explain why that’s so. Each reason has multiple facets well worth exploring, but at a high level:

  • GPUs employ parallel processing.
  • GPU systems scale up to supercomputing heights.
  • The GPU software stack for AI is broad and deep.

The net result is GPUs perform technical calculations faster and with greater energy efficiency than CPUs. That means they deliver leading performance for AI training and inference as well as gains across a wide array of applications that use accelerated computing.

In its recent report on AI, Stanford’s Human-Centered AI group provided some context. GPU performance “has increased roughly 7,000 times” since 2003 and price per performance is “5,600 times greater,” it reported.

A 2023 report captured the steep rise in GPU performance and price/performance.

The report also cited analysis from Epoch, an independent research group that measures and forecasts AI advances.

“GPUs are the dominant computing platform for accelerating machine learning workloads, and most (if not all) of the biggest models over the last five years have been trained on GPUs … [they have] thereby centrally contributed to the recent progress in AI,” Epoch said on its site.

A 2020 study assessing AI technology for the U.S. government drew similar conclusions.

“We expect [leading-edge] AI chips are one to three orders of magnitude more cost-effective than leading-node CPUs when counting production and operating costs,” it said.

NVIDIA GPUs have increased performance on AI inference 1,000x in the last ten years, said Bill Dally, the company’s chief scientist, in a keynote at Hot Chips, an annual gathering of semiconductor and systems engineers.

ChatGPT Spread the News

ChatGPT provided a powerful example of how GPUs are great for AI. The large language model (LLM), trained and run on thousands of NVIDIA GPUs, runs generative AI services used by more than 100 million people.

Since its 2018 launch, MLPerf, the industry-standard benchmark for AI, has provided numbers that detail the leading performance of NVIDIA GPUs on both AI training and inference.

For example, NVIDIA Grace Hopper Superchips swept the latest round of inference tests. NVIDIA TensorRT-LLM, inference software released since that test, delivers up to an 8x boost in performance and more than a 5x reduction in energy use and total cost of ownership. Indeed, NVIDIA GPUs have won every round of MLPerf training and inference tests since the benchmark’s launch.

In February, NVIDIA GPUs delivered leading results for inference, serving up thousands of inferences per second on the most demanding models in the STAC-ML Markets benchmark, a key technology performance gauge for the financial services industry.

A Red Hat software engineering team put it succinctly in a blog: “GPUs have become the foundation of artificial intelligence.”

AI Under the Hood

A brief look under the hood shows why GPUs and AI make a powerful pairing.

An AI model, also called a neural network, is essentially a mathematical lasagna, made from layer upon layer of linear algebra equations. Each equation represents the likelihood that one piece of data is related to another.

For their part, GPUs pack thousands of cores, tiny calculators working in parallel to slice through the math that makes up an AI model. This, at a high level, is how AI computing works.
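
To make that parallelism concrete, here is a minimal PyTorch sketch (ours, not from the article): a single neural-network layer reduces to a matrix multiply, and the GPU spreads those multiply-add operations across its cores. The sizes are arbitrary, and the code falls back to the CPU if no GPU is present.

    import torch

    # One layer of a neural network is, at its core, a matrix multiply.
    batch, d_in, d_out = 4096, 1024, 1024
    device = "cuda" if torch.cuda.is_available() else "cpu"

    x = torch.randn(batch, d_in, device=device)   # a batch of input activations
    w = torch.randn(d_in, d_out, device=device)   # the layer's weight matrix

    y = x @ w    # millions of multiply-adds, executed in parallel on the GPU
    print(y.shape, "computed on", device)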

Highly Tuned Tensor Cores

Over time, NVIDIA’s engineers have tuned GPU cores to the evolving needs of AI models. The latest GPUs include Tensor Cores that are 60x more powerful than the first-generation designs for processing the matrix math neural networks use.

In addition, NVIDIA Hopper Tensor Core GPUs include a Transformer Engine that can automatically adjust to the optimal precision needed to process transformer models, the class of neural networks that spawned generative AI.

Along the way, each GPU generation has packed more memory and optimized techniques to store an entire AI model in a single GPU or set of GPUs.

Models Grow, Systems Expand

The complexity of AI models is expanding a whopping 10x a year.

The current state-of-the-art LLM, GPT-4, packs more than a trillion parameters, a metric of its mathematical density. That’s up from less than 100 million parameters for a popular LLM in 2018.

In a recent talk at Hot Chips, NVIDIA Chief Scientist Bill Dally described how single-GPU performance on AI inference expanded 1,000x in the last decade.

GPU systems have kept pace by ganging up on the challenge. They scale up to supercomputers, thanks to their fast NVLink interconnects and NVIDIA Quantum InfiniBand networks.

For example, the DGX GH200, a large-memory AI supercomputer, combines up to 256 NVIDIA GH200 Grace Hopper Superchips into a single data-center-sized GPU with 144 terabytes of shared memory.

Each GH200 Superchip is a single server with 72 Arm Neoverse CPU cores and four petaflops of AI performance. A new four-way Grace Hopper configuration packs a whopping 288 Arm cores and 16 petaflops of AI performance, along with up to 2.3 terabytes of high-speed memory, into a single compute node.

And NVIDIA H200 Tensor Core GPUs announced in November pack up to 288 gigabytes of the latest HBM3e memory technology.

Software Covers the Waterfront

An expanding ocean of GPU software has evolved since 2007 to enable every facet of AI, from deep-tech features to high-level applications.

The NVIDIA AI platform includes hundreds of software libraries and apps. The CUDA programming language and the cuDNN-X library for deep learning provide a base on top of which developers have created software like NVIDIA NeMo, a framework to let users build, customize and run inference on their own generative AI models.

Many of these elements are available as open-source software, the grab-and-go staple of software developers. More than a hundred of them are packaged into the NVIDIA AI Enterprise platform for companies that require full security and support. Increasingly, they’re also available from major cloud service providers as APIs and services on NVIDIA DGX Cloud.

SteerLM, one of the latest AI software updates for NVIDIA GPUs, lets users fine-tune models during inference.

A 70x Speedup in 2008

Success stories date back to a 2008 paper from AI pioneer Andrew Ng, then a Stanford researcher. Using two NVIDIA GeForce GTX 280 GPUs, his three-person team achieved a 70x speedup over CPUs processing an AI model with 100 million parameters, finishing in a single day work that previously took several weeks.

“Modern graphics processors far surpass the computational capabilities of multicore CPUs, and have the potential to revolutionize the applicability of deep unsupervised learning methods,” they reported.

Andrew Ng described his experiences using GPUs for AI in a GTC 2015 talk.

In a 2015 talk at NVIDIA GTC, Ng described how he continued using more GPUs to scale up his work, running larger models at Google Brain and Baidu. Later, he helped found Coursera, an online education platform where he taught hundreds of thousands of AI students.

Ng counts Geoff Hinton, one of the godfathers of modern AI, among the people he influenced. “I remember going to Geoff Hinton saying check out CUDA, I think it can help build bigger neural networks,” he said in the GTC talk.

The University of Toronto professor spread the word. “In 2009, I remember giving a talk at NIPS [now NeurIPS], where I told about 1,000 researchers they should all buy GPUs because GPUs are going to be the future of machine learning,” Hinton said in a press report.

Fast Forward With GPUs

AI’s gains are expected to ripple across the global economy.

A McKinsey report in June estimated that generative AI could add the equivalent of $2.6 trillion to $4.4 trillion annually across the 63 use cases it analyzed in industries like banking, healthcare and retail. So, it’s no surprise Stanford’s 2023 AI report said that a majority of business leaders expect to increase their investments in AI.

Today, more than 40,000 companies use NVIDIA GPUs for AI and accelerated computing, attracting a global community of 4 million developers. Together they’re advancing science, healthcare, finance and virtually every industry.

Among the latest achievements, NVIDIA described a whopping 700,000x speedup using AI to ease climate change by keeping carbon dioxide out of the atmosphere (see video below). It’s one of many ways NVIDIA is applying the performance of GPUs to AI and beyond.

Learn how GPUs put AI into production.

How Getir reduced model training durations by 90% with Amazon SageMaker and AWS Batch

This is a guest post co-authored by Nafi Ahmet Turgut, Hasan Burak Yel, and Damla Şentürk from Getir.

Established in 2015, Getir has positioned itself as the trailblazer in the sphere of ultrafast grocery delivery. This innovative tech company has revolutionized the last-mile delivery segment with its compelling offering of “groceries in minutes.” With a presence across Turkey, the UK, the Netherlands, Germany, and the United States, Getir has become a multinational force to be reckoned with. Today, the Getir brand represents a diversified conglomerate encompassing nine different verticals, all working synergistically under a singular umbrella.

In this post, we explain how we built an end-to-end product category prediction pipeline to help commercial teams by using Amazon SageMaker and AWS Batch, reducing model training duration by 90%.

Understanding our existing product assortment in a detailed manner is a crucial challenge that we, along with many businesses, face in today’s fast-paced and competitive market. An effective solution to this problem is the prediction of product categories. A model that generates a comprehensive category tree allows our commercial teams to benchmark our existing product portfolio against that of our competitors, offering a strategic advantage. Therefore, our central challenge is the creation and implementation of an accurate product category prediction model.

We capitalized on the powerful tools provided by AWS to tackle this challenge and effectively navigate the complex field of machine learning (ML) and predictive analytics. Our efforts led to the successful creation of an end-to-end product category prediction pipeline, which combines the strengths of SageMaker and AWS Batch.

This capability of predictive analytics, particularly the accurate forecast of product categories, has proven invaluable. It provided our teams with critical data-driven insights that optimized inventory management, enhanced customer interactions, and strengthened our market presence.

The methodology we explain in this post ranges from the initial phase of feature set gathering to the final implementation of the prediction pipeline. An important aspect of our strategy has been the use of SageMaker and AWS Batch to refine pre-trained BERT models for seven different languages. Additionally, our seamless integration with AWS’s object storage service Amazon Simple Storage Service (Amazon S3) has been key to efficiently storing and accessing these refined models.

SageMaker is a fully managed ML service. With SageMaker, data scientists and developers can quickly and effortlessly build and train ML models, and then directly deploy them into a production-ready hosted environment.

As a fully managed service, AWS Batch helps you run batch computing workloads of any scale. AWS Batch automatically provisions compute resources and optimizes the workload distribution based on the quantity and scale of the workloads. With AWS Batch, there’s no need to install or manage batch computing software, so you can focus your time on analyzing results and solving problems. We used GPU jobs, which let us run workloads on an instance’s GPUs.
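
As a rough sketch of how such a GPU job can be submitted with the AWS SDK for Python (boto3), the example below uses placeholder names for the job queue, job definition, and training command rather than Getir's actual resources.

    import boto3

    batch = boto3.client("batch")

    # Placeholder names for illustration; substitute your own queue and job definition.
    response = batch.submit_job(
        jobName="bert-finetune-tr",
        jobQueue="gpu-job-queue",
        jobDefinition="bert-finetune-job-def",
        containerOverrides={
            "command": ["python", "train.py", "--language", "tr"],
            "resourceRequirements": [
                {"type": "GPU", "value": "1"},  # request one GPU on the instance
            ],
        },
    )
    print("Submitted job:", response["jobId"])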

Overview of solution

Five people from Getir’s data science team and infrastructure team worked together on this project. The project was completed in a month and deployed to production after a week of testing.

The following diagram shows the solution’s architecture.

The model pipeline is run separately for each country. The architecture includes two AWS Batch GPU cron jobs for each country, running on defined schedules.

We overcame some challenges by strategically deploying SageMaker and AWS Batch GPU resources. The process used to address each difficulty is detailed in the following sections.

Fine-tuning multilingual BERT models with AWS Batch GPU jobs

We sought a solution to support multiple languages for our diverse user base. BERT models were an obvious choice due to their established ability to handle complex natural language tasks effectively. In order to tailor these models to our needs, we harnessed the power of AWS by using single-node GPU instance jobs. This allowed us to fine-tune pre-trained BERT models for each of the seven languages we required support for. Through this method, we ensured high precision in predicting product categories, overcoming any potential language barriers.
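
The post does not include Getir's training code, but a single-language fine-tuning run of this kind might look roughly like the Hugging Face Transformers sketch below. The checkpoint name, toy examples, label count, and hyperparameters are illustrative assumptions, not the production configuration.

    import torch
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    # Fine-tune a multilingual BERT checkpoint to map a product name to a category.
    model_name = "bert-base-multilingual-cased"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

    # Toy examples; in practice the data comes from the product catalog.
    texts = ["organik süt 1L", "çikolatalı gofret", "bulaşık deterjanı"]
    labels = [0, 1, 2]
    enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

    class CatalogDataset(torch.utils.data.Dataset):
        def __init__(self, encodings, labels):
            self.encodings, self.labels = encodings, labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: v[i] for k, v in self.encodings.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=1,
                               per_device_train_batch_size=2),
        train_dataset=CatalogDataset(enc, labels),
    )
    trainer.train()
    trainer.save_model("model-tr")  # later uploaded to Amazon S3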

Efficient model storage using Amazon S3

Our next step was to address model storage and management. For this, we selected Amazon S3, known for its scalability and security. Storing our fine-tuned BERT models on Amazon S3 enabled us to provide easy access to different teams within our organization, thereby significantly streamlining our deployment process. This was a crucial aspect in achieving agility in our operations and a seamless integration of our ML efforts.
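
A minimal sketch of that storage step with boto3 follows; the bucket and key names are hypothetical, with one prefix per language so teams can pull exactly the model they need.

    import boto3

    s3 = boto3.client("s3")

    # Hypothetical bucket and per-language prefix.
    s3.upload_file("model-tr/pytorch_model.bin",
                   "example-category-models",
                   "bert/tr/pytorch_model.bin")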

Creating an end-to-end prediction pipeline

An efficient pipeline was required to make the best use of our pre-trained models. We first deployed these models on SageMaker, an action that allowed for real-time predictions with low latency, thereby enhancing our user experience. For larger-scale batch predictions, which were equally vital to our operations, we utilized AWS Batch GPU jobs. This ensured the optimal use of our resources, providing us with a perfect balance of performance and efficiency.
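
For the real-time path, a hedged sketch with the SageMaker Python SDK is shown below; the model artifact path, IAM role, framework versions, and instance type are placeholders rather than the configuration Getir used.

    from sagemaker.huggingface import HuggingFaceModel

    # Placeholder artifact, role, and versions for illustration only.
    model = HuggingFaceModel(
        model_data="s3://example-category-models/bert/tr/model.tar.gz",
        role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
    )

    # Deploy to a real-time endpoint and run a quick smoke test.
    predictor = model.deploy(initial_instance_count=1, instance_type="ml.g4dn.xlarge")
    print(predictor.predict({"inputs": "organik süt 1L"}))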

Exploring future possibilities with SageMaker MMEs

As we continue to evolve and seek efficiencies in our ML pipeline, one avenue we are keen to explore is using SageMaker multi-model endpoints (MMEs) for deploying our fine-tuned models. With MMEs, we can potentially streamline the deployment of various fine-tuned models, ensuring efficient model management while also benefiting from the native capabilities of SageMaker like shadow variants, auto scaling, and Amazon CloudWatch integration. This exploration aligns with our continuous pursuit of enhancing our predictive analytics capabilities and providing superior experiences to our customers.

Conclusion

Our successful integration of SageMaker and AWS Batch has not only addressed our specific challenges but also significantly boosted our operational efficiency. Through the implementation of a sophisticated product category prediction pipeline, we are able to empower our commercial teams with data-driven insights, thereby facilitating more effective decision-making.

Our results speak volumes about our approach’s effectiveness. We have achieved an 80% prediction accuracy across all four levels of category granularity, which plays an important role in shaping the product assortments for each country we serve. This level of precision extends our reach beyond language barriers and ensures we cater to our diverse user base with the utmost accuracy.

Moreover, by strategically using scheduled AWS Batch GPU jobs, we’ve been able to reduce our model training durations by 90%. This efficiency has further streamlined our processes and bolstered our operational agility. Efficient model storage using Amazon S3 has played a critical role in this achievement, balancing both real-time and batch predictions.

For more information about how to get started building your own ML pipelines with SageMaker, see Amazon SageMaker resources. AWS Batch is an excellent option if you are looking for a low-cost, scalable solution for running batch jobs with low operational overhead. To get started, see Getting Started with AWS Batch.


About the Authors

Nafi Ahmet Turgut finished his master’s degree in Electrical & Electronics Engineering and worked as a graduate research scientist. His focus was building machine learning algorithms to simulate nervous network anomalies. He joined Getir in 2019 and currently works as a Senior Data Science & Analytics Manager. His team is responsible for designing, implementing, and maintaining end-to-end machine learning algorithms and data-driven solutions for Getir.

Hasan Burak Yel received his bachelor’s degree in Electrical & Electronics Engineering at Boğaziçi University. He worked at Turkcell, mainly focused on time series forecasting, data visualization, and network automation. He joined Getir in 2021 and currently works as a Data Science & Analytics Manager with the responsibility of Search, Recommendation, and Growth domains.

Damla Şentürk received her bachelor’s degree in Computer Engineering from Galatasaray University and is continuing her master’s degree in Computer Engineering at Boğaziçi University. She joined Getir in 2022 and has been working as a Data Scientist. She has worked on commercial, supply chain, and discovery-related projects.

Esra Kayabalı is a Senior Solutions Architect at AWS, specialized in the analytics domain, including data warehousing, data lakes, big data analytics, batch and real-time data streaming, and data integration. She has 12 years of software development and architecture experience. She is passionate about learning and teaching cloud technologies.

Fast Optimal Locally Private Mean Estimation via Random Projections

We study the problem of locally private mean estimation of high-dimensional vectors in the Euclidean ball. Existing algorithms for this problem either incur sub-optimal error or have high communication and/or run-time complexity. We propose a new algorithmic framework, ProjUnit, for private mean estimation that yields algorithms that are computationally efficient, have low communication complexity, and incur optimal error up to a (1 + o(1)) factor. Our framework is deceptively simple: each randomizer projects its input to a random low-dimensional subspace, normalizes the result, and then runs an… (Apple Machine Learning Research)

Manifold Diffusion Fields

This paper was accepted at the Diffusion Models workshop at NeurIPS 2023.
Score-based models have quickly become the de facto choice for generative modeling of images, text and, more recently, molecules. However, to adapt score-based generative modeling to these domains, the score network needs to be carefully designed, hampering its applicability to arbitrary data domains. In this paper we tackle this problem by taking a functional view of data. This functional view allows us to cast seemingly different domains to a common shared representation. We then re-formulate the score function to… (Apple Machine Learning Research)

Generating Molecular Conformers with Manifold Diffusion Fields

This paper was accepted at Generative AI and Biology workshop at NeurIPS 2023.
In this paper we tackle the problem of generating a molecule conformation in 3D space given its 2D structure. We approach this problem through the lens of a diffusion model for functions in Riemannian Manifolds. Our approach is simple and scalable, and obtains results that are on par with state-of-the-art while making no assumptions about the explicit structure of molecules. (Apple Machine Learning Research)

Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation

This paper was accepted at the workshop I Can’t Believe It’s Not Better! (ICBINB) at NeurIPS 2023.
Recent advances in image tokenizers, such as VQ-VAE, have enabled text-to-image generation using auto-regressive methods, similar to language modeling. However, these methods have yet to leverage pre-trained language models, despite their adaptability to various downstream tasks. In this work, we explore this gap, and find that pre-trained language models offer limited help in auto-regressive text-to-image generation. We provide a two-fold explanation by analyzing tokens from each modality… (Apple Machine Learning Research)

Controllable Music Production with Diffusion Models and Guidance Gradients

This paper was accepted at the NeurIPS 2023 workshop on Diffusion Models.
We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer of desired stylistic characteristics to existing audio clips. We achieve this by applying guidance at sampling time in a simple framework that… (Apple Machine Learning Research)

Boosting developer productivity: How Deloitte uses Amazon SageMaker Canvas for no-code/low-code machine learning

The ability to quickly build and deploy machine learning (ML) models is becoming increasingly important in today’s data-driven world. However, building ML models requires significant time, effort, and specialized expertise. From data collection and cleaning to feature engineering, model building, tuning, and deployment, ML projects often take months for developers to complete. And experienced data scientists can be hard to come by.

This is where the AWS suite of low-code and no-code ML services becomes an essential tool. With just a few clicks using Amazon SageMaker Canvas, you can take advantage of the power of ML without needing to write any code.

As a strategic systems integrator with deep ML experience, Deloitte utilizes the no-code and low-code ML tools from AWS to efficiently build and deploy ML models for Deloitte’s clients and for internal assets. These tools allow Deloitte to develop ML solutions without needing to hand-code models and pipelines. This can help speed up project delivery timelines and enable Deloitte to take on more client work.

The following are some specific reasons why Deloitte uses these tools:

  • Accessibility for non-programmers – No-code tools open up ML model building to non-programmers. Team members with just domain expertise and very little coding skills can develop ML models.
  • Rapid adoption of new technology – Availability and constant improvement on ready-to-use models and AutoML helps ensure that users are constantly using leading-class technology.
  • Cost-effective development – No-code tools help reduce the cost and time required for ML model development, making it more accessible to clients, which can help them achieve a higher return on investment.

Additionally, these tools provide a comprehensive solution for faster workflows, enabling the following:

  • Faster data preparation – SageMaker Canvas has over 300 built-in transformations and the ability to use natural language, which accelerate data preparation and make data ready for model building.
  • Faster model building – SageMaker Canvas offers ready-to-use models or Amazon AutoML technology that enables you to build custom models on enterprise data with just a few clicks. This helps speed up the process compared to coding models from the ground up.
  • Easier deployment – SageMaker Canvas offers the ability to deploy production-ready models to an Amazon SageMaker endpoint in a few clicks while also registering them in Amazon SageMaker Model Registry.

Vishveshwara Vasa, Cloud CTO for Deloitte, says:

“Through AWS’s no-code ML services such as SageMaker Canvas and SageMaker Data Wrangler, we at Deloitte Consulting have unlocked new efficiencies, enhancing the speed of development and deployment productivity by 30–40% across our client-facing and internal projects.”

In this post, we demonstrate the power of building an end-to-end ML model with no code using SageMaker Canvas by showing you how to build a classification model for predicting if a customer will default on a loan. By predicting loan defaults more accurately, the model can help a financial services company manage risk, price loans appropriately, improve operations, provide additional services, and gain a competitive advantage. We demonstrate how SageMaker Canvas can help you rapidly go from raw data to a deployed binary classification model for loan default prediction.

SageMaker Canvas offers comprehensive data preparation capabilities powered by Amazon SageMaker Data Wrangler in the SageMaker Canvas workspace. This enables you to go through all the phases of a standard ML workflow, from data preparation to model building and deployment, on a single platform.

Data preparation is typically the most time-intensive phase of the ML workflow. To reduce time spent on data preparation, SageMaker Canvas allows you to prepare your data using over 300 built-in transformations. Alternatively, you can write natural language prompts, such as “drop the rows for column c that are outliers,” and be presented with the code snippet necessary for this data preparation step. You can then add this to your data preparation workflow in a few clicks. We show you how to use that in this post as well.
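
For illustration only, the generated code for a prompt like that might resemble the pandas transform below; SageMaker Canvas produces its own snippet, so treat this as a sketch of the idea rather than its actual output.

    import pandas as pd

    def drop_outliers(df: pd.DataFrame, column: str = "c") -> pd.DataFrame:
        # Keep only the rows whose value in `column` lies within 3 standard deviations of the mean.
        mean, std = df[column].mean(), df[column].std()
        return df[(df[column] - mean).abs() <= 3 * std]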

Solution overview

The following diagram describes the architecture for a loan default classification model using SageMaker low-code and no-code tools.

Starting with a dataset that has details about loan default data in Amazon Simple Storage Service (Amazon S3), we use SageMaker Canvas to gain insights about the data. We then perform feature engineering to apply transformations such as encoding categorical features, dropping features that are not needed, and more. Next, we store the cleansed data back in Amazon S3. We use the cleaned dataset to create a classification model for predicting loan defaults. Then we have a production-ready model for inference.

Prerequisites

Make sure that the following prerequisites are complete and that you have enabled the Canvas Ready-to-use models option when setting up the SageMaker domain. If you have already set up your domain, edit your domain settings and go to Canvas settings to enable the Enable Canvas Ready-to-use models option. Additionally, set up and create the SageMaker Canvas application, then request and enable Anthropic Claude model access on Amazon Bedrock.

Dataset

We use a public dataset from Kaggle that contains information about financial loans. Each row in the dataset represents a single loan, and the columns provide details about each transaction. Download this dataset and store it in an S3 bucket of your choice. The following table lists the fields in the dataset.

Column Name | Data Type | Description
Person_age | Integer | Age of the person who took a loan
Person_income | Integer | Income of the borrower
Person_home_ownership | String | Home ownership status (own or rent)
Person_emp_length | Decimal | Number of years employed
Loan_intent | String | Reason for loan (personal, medical, educational, and so on)
Loan_grade | String | Loan grade (A–E)
Loan_int_rate | Decimal | Interest rate
Loan_amnt | Integer | Total amount of the loan
Loan_status | Integer | Target (whether they defaulted or not)
Loan_percent_income | Decimal | Loan amount as a percentage of income
Cb_person_default_on_file | Integer | Previous defaults (if any)
Cb_person_credit_history_length | String | Length of their credit history

Simplify data preparation with SageMaker Canvas

Data preparation can take up to 80% of the effort in ML projects. Proper data preparation leads to better model performance and more accurate predictions. SageMaker Canvas allows interactive data exploration, transformation, and preparation without writing any SQL or Python code.

Complete the following steps to prepare your data:

  1. On the SageMaker Canvas console, choose Data preparation in the navigation pane.
  2. On the Create menu, choose Document.
  3. For Dataset name, enter a name for your dataset.
  4. Choose Create.
  5. Choose Amazon S3 as the data source and connect it to the dataset.
  6. After the dataset is loaded, create a data flow using that dataset.
  7. Switch to the analyses tab and create a Data Quality and Insights Report.

This is a recommended step to analyze the quality of the input dataset. The output of this report produces instant ML-powered insights such as data skew, duplicates in the data, missing values, and much more. The following screenshot shows a sample of the generated report for the loan dataset.

By generating these insights on your behalf, SageMaker Canvas provides you with a set of issues in the data that need remediation in the data preparation phase. To address the top two issues identified by SageMaker Canvas, you need to encode the categorical features and remove the duplicate rows so your model quality is high. You can do both of these and more in a visual workflow with SageMaker Canvas.

  1. First, one-hot encode the loan_intent, loan_grade, and person_home_ownership columns.
  2. You can drop the cb_person_cred_history_length column because it has the least predictive power, as shown in the Data Quality and Insights Report.

    SageMaker Canvas recently added a Chat with data option. This feature uses the power of foundation models to interpret natural language queries and generate Python-based code to apply feature engineering transformations. This feature is powered by Amazon Bedrock, and can be configured to run entirely in your VPC so that data never leaves your environment.
  3. To use this feature to remove duplicate rows, choose the plus sign next to the Drop column transform, then choose Chat with data.
  4. Enter your query in natural language (for example, “Remove duplicate rows from the dataset”).
  5. Review the generated transformation and choose Add to steps to add the transformation to the flow.
  6. Finally, export the output of these transformations to Amazon S3 or optionally Amazon SageMaker Feature Store to use these features across multiple projects.

You can also add another step to create an Amazon S3 destination for the dataset to scale the workflow for a large dataset. The following diagram shows the SageMaker Canvas data flow after adding visual transformations.

You have completed the entire data processing and feature engineering step using visual workflows in SageMaker Canvas. This helps reduce the time a data engineer spends on cleaning and making the data ready for model development from weeks to days. The next step is to build the ML model.

Build a model with SageMaker Canvas

Amazon SageMaker Canvas provides a no-code end-to-end workflow for building, analyzing, testing, and deploying this binary classification model. Complete the following steps:

  1. Create a dataset in SageMaker Canvas.
  2. Specify either the S3 location that was used to export the data or the S3 location that’s on the destination of the SageMaker Canvas job.

    Now you’re ready to build the model.
  3. Choose Models in the navigation pane and choose New model.
  4. Name the model and select Predictive analysis as the model type.
  5. Choose the dataset created in the previous step.

    The next step is configuring the model type.
  6. Choose the target column and the model type will be automatically set as 2 category prediction.
  7. Choose your build type, Standard build or Quick build.

    SageMaker Canvas displays the expected build time as soon as you start building the model. Standard build usually takes 2–4 hours; you can use the Quick build option for smaller datasets, which only takes 2–15 minutes. For this particular dataset, it should take around 45 minutes to complete the model build. SageMaker Canvas keeps you informed of the progress of the build process.
  8. After the model is built, you can look at the model performance.

    SageMaker Canvas provides various metrics like accuracy, precision, and F1 score depending on the type of the model. The following screenshot shows the accuracy and a few other advanced metrics for this binary classification model.
  9. The next step is to make test predictions.
    SageMaker Canvas allows you to make batch predictions on multiple inputs or a single prediction to quickly verify the model quality. The following screenshot shows a sample inference.
  10. The last step is to deploy the trained model.
    SageMaker Canvas deploys the model on SageMaker endpoints, and now you have a production model ready for inference. The following screenshot shows the deployed endpoint.

After the model is deployed, you can call it through the AWS SDK or AWS Command Line Interface (AWS CLI) or make API calls to any application of your choice to confidently predict the risk of a potential borrower. For more information about testing your model, refer to Invoke real-time endpoints.
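
As a sketch of that call with boto3, assuming a hypothetical endpoint name and a CSV payload whose fields follow the loan dataset's feature columns (check the content type your Canvas endpoint expects before using it):

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    # One illustrative loan application, as a CSV row of feature values.
    payload = "22,59000,RENT,1.0,PERSONAL,D,16.02,35000,0.59,1,3"

    response = runtime.invoke_endpoint(
        EndpointName="canvas-loan-default-endpoint",  # hypothetical name
        ContentType="text/csv",
        Body=payload,
    )
    print(response["Body"].read().decode("utf-8"))  # predicted loan status and probability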

Clean up

To avoid incurring additional charges, log out of SageMaker Canvas or delete the SageMaker domain that was created. Additionally, delete the SageMaker model endpoint and delete the dataset that was uploaded to Amazon S3.

Conclusion

No-code ML accelerates development, simplifies deployment, doesn’t require programming skills, increases standardization, and reduces cost. These benefits made no-code ML attractive to Deloitte for improving its ML service offerings, and the firm has shortened its ML model build timelines by 30–40%.

Deloitte is a strategic global systems integrator with over 17,000 certified AWS practitioners across the globe. It continues to raise the bar through participation in the AWS Competency Program with 25 competencies, including Machine Learning. Connect with Deloitte to start bringing AWS no-code and low-code solutions to your enterprise.


About the authors

Chida Sadayappan leads Deloitte’s Cloud AI/Machine Learning practice. He brings strong thought leadership experience to engagements and thrives in supporting executive stakeholders achieve performance improvement and modernization goals across industries using AI/ML. Chida is a serial tech entrepreneur and an avid community builder in the startup and developer ecosystems.

Kuldeep Singh, a Principal Global AI/ML leader at AWS with over 20 years in tech, skillfully combines his sales and entrepreneurship expertise with a deep understanding of AI, ML, and cybersecurity. He excels in forging strategic global partnerships, driving transformative solutions and strategies across various industries with a focus on generative AI and GSIs.

Kasi Muthu is a senior partner solutions architect focusing on data and AI/ML at AWS based out of Houston, TX. He is passionate about helping partners and customers accelerate their cloud data journey. He is a trusted advisor in this field and has plenty of experience architecting and building scalable, resilient, and performant workloads in the cloud. Outside of work, he enjoys spending time with his family.

Unsupervised speech-to-speech translation from monolingual data

Speech-to-speech translation (S2ST) is a type of machine translation that converts spoken language from one language to another. This technology has the potential to break down language barriers and facilitate communication between people from different cultures and backgrounds.

Previously, we introduced Translatotron 1 and Translatotron 2, the first ever models that were able to directly translate speech between two languages. However, they were trained in supervised settings with parallel speech data. The scarcity of parallel speech data is a major challenge in this field, so much so that most public datasets are semi- or fully-synthesized from text. This adds additional hurdles to learning translation and reconstruction of speech attributes that are not represented in the text and are thus not reflected in the synthesized training data.

Here we present Translatotron 3, a novel unsupervised speech-to-speech translation architecture. In Translatotron 3, we show that it is possible to learn a speech-to-speech translation task from monolingual data alone. This method opens the door not only to translation between more language pairs but also to translation of non-textual speech attributes such as pauses, speaking rates, and speaker identity. Our method does not include any direct supervision of target languages, and therefore we believe it is the right direction for preserving paralinguistic characteristics (e.g., tone and emotion) of the source speech across translation. To enable speech-to-speech translation, we use back-translation, a technique from unsupervised machine translation (UMT) in which a synthetic translation of the source language is used to translate texts without bilingual text datasets. Experimental results in speech-to-speech translation tasks between Spanish and English show that Translatotron 3 outperforms a baseline cascade system.

Translatotron 3

Translatotron 3 addresses the problem of unsupervised S2ST, which can eliminate the requirement for bilingual speech datasets. To do this, Translatotron 3’s design incorporates three key aspects:

  1. Pre-training the entire model as a masked autoencoder with SpecAugment, a simple data augmentation method for speech recognition that operates on the logarithmic mel spectrogram of the input audio (instead of the raw audio itself) and is shown to effectively improve the generalization capabilities of the encoder (a minimal masking sketch follows this list).
  2. Unsupervised embedding mapping based on multilingual unsupervised embeddings (MUSE), which is trained on unpaired languages but allows the model to learn an embedding space that is shared between the source and target languages.
  3. A reconstruction loss based on back-translation, to train an encoder-decoder direct S2ST model in a fully unsupervised manner.
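
As a minimal illustration of the SpecAugment-style masking mentioned in the first item, the sketch below applies torchaudio's frequency and time masking to a log-mel spectrogram of random audio; it is not the authors' implementation, and the masking parameters are arbitrary.

    import torch
    import torchaudio

    waveform = torch.randn(1, 16000)  # one second of placeholder 16 kHz audio
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000)(waveform)
    log_mel = torch.log(mel + 1e-6)   # logarithmic mel spectrogram, the encoder's input

    # Mask random frequency bands and time steps, SpecAugment-style.
    freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=27)
    time_mask = torchaudio.transforms.TimeMasking(time_mask_param=40)
    augmented = time_mask(freq_mask(log_mel))
    print(augmented.shape)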

The model is trained using a combination of the unsupervised MUSE embedding loss, reconstruction loss, and S2S back-translation loss. During inference, the shared encoder is utilized to encode the input into a multilingual embedding space, which is subsequently decoded by the target language decoder.
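
A rough PyTorch-style sketch of how these three terms could be combined is shown below; the specific distance functions and the unweighted sum are our assumptions for illustration, since the post does not spell out the exact formulation.

    import torch.nn.functional as F

    def total_loss(enc_semantic, muse_embedding,   # first half of encoder output vs. MUSE target
                   recon_spec, input_spec,         # auto-encoded spectrogram vs. original input
                   back_translated_spec):          # source -> target -> source spectrogram
        l_muse = F.mse_loss(enc_semantic, muse_embedding)    # MUSE embedding loss
        l_recon = F.l1_loss(recon_spec, input_spec)          # reconstruction loss
        l_bt = F.l1_loss(back_translated_spec, input_spec)   # back-translation loss
        return l_muse + l_recon + l_bt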

Architecture

Translatotron 3 employs a shared encoder to encode both the source and target languages. The decoder is composed of a linguistic decoder, an acoustic synthesizer (responsible for acoustic generation of the translation speech), and a singular attention module, like Translatotron 2. However, for Translatotron 3 there are two decoders, one for the source language and another for the target language. During training, we use monolingual speech-text datasets (i.e., these data are made up of speech-text pairs; they are not translations).

Encoder

The encoder has the same architecture as the speech encoder in Translatotron 2. The output of the encoder is split into two parts: the first part incorporates semantic information whereas the second part incorporates acoustic information. By using the MUSE loss, the first half of the output is trained to be the MUSE embeddings of the text of the input speech spectrogram. The latter half is updated without the MUSE loss. It is important to note that the same encoder is shared between source and target languages. Furthermore, the MUSE embedding is multilingual in nature. As a result, the encoder is able to learn a multilingual embedding space across source and target languages. This allows a more efficient and effective encoding of the input, as the encoder is able to encode speech from both languages into a common embedding space, rather than maintaining a separate embedding space for each language.

Decoder

Like Translatotron 2, the decoder is composed of three distinct components, namely the linguistic decoder, the acoustic synthesizer, and the attention module. To effectively handle the different properties of the source and target languages, however, Translatotron 3 has two separate decoders, for the source and target languages.

Two part training

The training methodology consists of two parts: (1) auto-encoding with reconstruction and (2) a back-translation term. In the first part, the network is trained to auto-encode the input to a multilingual embedding space using the MUSE loss and the reconstruction loss. This phase aims to ensure that the network generates meaningful multilingual representations. In the second part, the network is further trained to translate the input spectrogram by utilizing the back-translation loss. To mitigate catastrophic forgetting and to enforce a multilingual latent space, the MUSE loss and the reconstruction loss are also applied in this second part of training. To ensure that the encoder learns meaningful properties of the input, rather than simply reconstructing the input, we apply SpecAugment to the encoder input in both phases. SpecAugment has been shown to effectively improve the generalization capabilities of the encoder by augmenting the input data.

Training objective

During the back-translation training phase (illustrated in the section below), the network is trained to translate the input spectrogram to the target language and then back to the source language. The goal of back-translation is to enforce the latent space to be multilingual. To achieve this, the following losses are applied:

  • MUSE loss: The MUSE loss measures the similarity between the multilingual embedding of the input spectrogram and the multilingual embedding of the back-translated spectrogram.
  • Reconstruction loss: The reconstruction loss measures the similarity between the input spectrogram and the back-translated spectrogram.

In addition to these losses, SpecAugment is applied to the encoder input at both phases. Before the back-translation training phase, the network is trained to auto-encode the input to a multilingual embedding space using the MUSE loss and reconstruction loss.

MUSE loss

To ensure that the encoder generates multilingual representations that are meaningful for both decoders, we employ a MUSE loss during training. The MUSE loss forces the encoder to generate such a representation by using pre-trained MUSE embeddings. During the training process, given an input text transcript, we extract the corresponding MUSE embeddings from the embeddings of the input language. The error between MUSE embeddings and the output vectors of the encoder is then minimized. Note that the encoder is indifferent to the language of the input during inference due to the multilingual nature of the embeddings.

The training and inference in Translatotron 3. Training includes the reconstruction loss via the auto-encoding path and employs the reconstruction loss via back-translation.

Audio samples

Following are examples of direct speech-to-speech translation from Translatotron 3:

Spanish-to-English (on Conversational dataset): input (Spanish), TTS-synthesized reference (English), Translatotron 3 output (English)

Spanish-to-English (on CommonVoice11 Synthesized dataset): input (Spanish), TTS-synthesized reference (English), Translatotron 3 output (English)

Spanish-to-English (on CommonVoice11 dataset): input (Spanish), TTS reference (English), Translatotron 3 output (English)

Performance

To empirically evaluate the performance of the proposed approach, we conducted experiments on English and Spanish using various datasets, including the Common Voice 11 dataset, as well as two synthesized datasets derived from the Conversational and Common Voice 11 datasets.

The translation quality was measured by BLEU (higher is better) on ASR (automatic speech recognition) transcriptions of the translated speech, compared to the corresponding reference translation text. Speech quality is measured by the MOS score (higher is better), and speaker similarity is measured by the average cosine similarity (higher is better).
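
For readers who want to compute similar metrics on their own outputs, here is an illustrative sketch (not the authors' evaluation code) of BLEU with sacrebleu and an average cosine speaker similarity; the transcripts and embeddings are placeholders.

    import numpy as np
    import sacrebleu

    # BLEU between ASR transcripts of the translated speech and reference translations.
    hypotheses = ["the cat sat on the mat"]
    references = [["the cat is sitting on the mat"]]
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU: {bleu.score:.1f}")

    # Speaker similarity: cosine similarity between input and output speaker embeddings.
    emb_in, emb_out = np.random.rand(256), np.random.rand(256)
    cos = float(emb_in @ emb_out / (np.linalg.norm(emb_in) * np.linalg.norm(emb_out)))
    print(f"cosine similarity: {cos:.3f}")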

Because Translatotron 3 is an unsupervised method, as a baseline we used a cascaded S2ST system that combines ASR, unsupervised machine translation (UMT), and TTS (text-to-speech). Specifically, we employ UMT that uses the nearest neighbor in the embedding space to create the translation.

Translatotron 3 outperforms the baseline by large margins in every aspect we measured: translation quality, speaker similarity, and speech quality. It particularly excelled on the conversational corpus. Moreover, Translatotron 3 achieves speech naturalness similar to that of the ground truth audio samples (measured by MOS, higher is better).

Translation quality (measured by BLEU, where higher is better) evaluated on three Spanish-English corpora.
Speech similarity (measured by average cosine similarity between input speaker and output speaker, where higher is better) evaluated on three Spanish-English corpora.
Mean-opinion-score (measured by average MOS metric, where higher is better) evaluated on three Spanish-English corpora.

Future work

As future work, we would like to extend the work to more languages and investigate whether zero-shot S2ST can be applied with the back-translation technique. We would also like to examine the use of back-translation with different types of speech data, such as noisy speech and low-resource languages.

Acknowledgments

The direct contributors to this work include Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayutsh Asawaroengchai, Heiga Zhen, and Michelle Tadmor Ramanovich. We also thank Yu Zhang, Yuma Koizumi, Soroosh Mariooryad, RJ Skerry-Ryan, Neil Zeghidour, Christian Frank, Marco Tagliasacchi, Nadav Bar, Benny Schlesinger and Yonghui Wu.

PwR: Using representations for AI-powered software development

This research is being presented at the Agami Summit 2023, an annual forum in Maharashtra, India, for innovation in the field of law and justice.


In one scenario of the future, such as the one imagined by Matt Welsh for the Association for Computing Machinery (ACM), AI will take the lead in coding while humans oversee the process. This shift will require people to take a supervisory role, focusing on high-level tasks while leaving the code details to AI. As we envision this transformation, we face a critical question: How can we reimagine software development to not just improve developer productivity but also to ensure software safety, reliability, and maintainability while keeping it personalized to developer preferences?

Realizing this outcome relies on AI and developers establishing a common understanding. While natural language can facilitate AI-developer interaction, it also introduces the potential for misinterpreting tasks. Existing solutions address this gap, prompting AI to communicate its understanding in a structured natural-language document. This document can then be inspected, edited, and approved by the developer. While effective, the developer still needs to vet the resulting AI-generated code for safety and reliability, requiring both domain and coding expertise. Our goal is to decouple this requirement, paving the way for numerous organizations and individuals, including those without coding expertise, to develop software. 

PwR approach

Programming with Representations (PwR, pronounced “power”), which we are presenting at the Agami Summit 2023, is a software development approach that relies on a domain-specific language (DSL), or representation, defined by a developer specializing in a specific domain. This representation includes built-in guardrails that are automatically implemented throughout the software development process. Once a representation is defined for a domain, PwR enables any developer interested in that domain to translate their intentions using natural language into a program in that representation. This process is illustrated in Figure 1.

Flowchart showing natural language is transformed into a program in domain specific language using an LLM. This step is called Intent formalization. The user is able to modify, repair and query. The Program in DSL is then converted into natural language representation that can be in text or visual formats. The Program in DSL is also separatedly converted into Code via the Code Generation pipeline. This step is called Robust Code Generation.
Figure 1. The PwR approach converts an ambiguous conversation in natural language into a program in a custom DSL. The DSL program is then transformed into executable code. Not only can the developer provide instructions and requirements, they can also inquire into the current state of the program, receive feedback, and update their instructions accordingly.

PwR uses large language models (LLMs) to interpret user conversations and transform them into DSL programs. This process involves traversing a code-generation pipeline to ultimately derive executable code. However, despite advancements in LLMs’ code generation, these models still grapple with limitations like hallucinations and limited context windows. Using a DSL reduces the amount of code that LLMs need to generate, as most code can be generated from the DSL, increasing accuracy throughout the process.

The DSL incorporates guardrails, ensuring that the essential components are there, such as the starting state of a workflow, clearly defined transitions, and error handling protocols. These guardrails can be automatically examined and communicated back to the developer in natural language, allowing for necessary corrections. While certain guardrails may enhance safety preferences, developers still must confirm that the intended functionality was implemented. PwR simply acts as a facilitator within the cycle involving the developer, the LLM, and the DSL checker. 
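
As a loose illustration of what such guardrail checks could look like, the Python sketch below models a tiny workflow representation and verifies that a start state exists, that transitions target declared states, and that an error handler is defined; the actual PwR DSL is not published in this post, so every name here is hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class Workflow:
        start: str
        states: set[str]
        transitions: dict[str, str] = field(default_factory=dict)  # state -> next state
        error_handler: str | None = None

        def check_guardrails(self) -> list[str]:
            problems = []
            if self.start not in self.states:
                problems.append(f"start state '{self.start}' is not defined")
            for src, dst in self.transitions.items():
                if dst not in self.states:
                    problems.append(f"transition from '{src}' targets unknown state '{dst}'")
            if self.error_handler is None:
                problems.append("no error-handling state declared")
            return problems  # findings would be reported back to the developer in natural language

    wf = Workflow(start="search_schemes",
                  states={"search_schemes", "authenticate", "apply"},
                  transitions={"search_schemes": "authenticate", "authenticate": "apply"})
    print(wf.check_guardrails())  # -> ['no error-handling state declared']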

PwR does not require developers to learn a custom DSL. Instead, it generates a natural-language representation (NLR) of the DSL. Developers can inspect this NLR, essentially programming in a natural language representation while the underlying DSL remains concealed. This approach grants developers the flexibility and ease of interacting with a natural-language representation while preserving the precision of their intent within the DSL. Additionally, developers can access a live test environment where their code can be hosted and tested for functionality. These capabilities are integrated into the PwR Studio tool, making it easy to get started.

PwR lowers the programming barrier, empowering nontechnical domain experts like teachers to create software tailored to their specific needs. Additionally, it can improve productivity for complex, multidisciplinary software engineering teams, enabling them to efficiently handle large volumes of changes.

Creating a welfare scheme application with PwR

Let’s take an example of how PwR can be applied. In a scenario where a nongovernment organization (NGO) aims to develop an application facilitating citizen access to government welfare schemes—enabling search, identification, and application processes involving authentication and deposits—the orchestration of multiple components is crucial. It is vital to accurately set up these components before deploying them at scale, given the program’s involvement with user data and monetary transactions. 

Reliable orchestration of these types of components can significantly enhance all types of applications. We initiate this process by building a custom DSL, encoding interconnected workflows comprising various tasks. Each task represents a singular action that might involve calling an external API or another workflow. This DSL seamlessly interacts with external APIs through plugins available through the PwR Studio store.

The following video demonstrates how PwR Studio, configured with the DSL workflow, constructs the NGO application. A developer augments the initial version of the application by incorporating the payment feature. This is accomplished by conversing with PwR Studio, understanding the specific requirements, and implementing necessary modifications. Additionally, the developer gains access to a test environment where they can launch and interact with the application in a controlled setting. 

Video: Step-by-step workflow of a developer building a bot in PwR Studio.

Looking forward

We intend to provide PwR Studio as an open-source integrated development environment (IDE) for creating software through conversations. Our initial aim is to facilitate workflow-based applications for NGOs and social enterprises that have little access to technical expertise. However, our ambitions stretch far beyond this scope.

With the recent success of GitHub Copilot for conversational code generation and recent announcements surrounding OpenAI’s GPTs framework for programming ChatGPT-like bots, it’s evident that AI is poised to democratize software development, granting everyone the ability to create software. With that, it’s imperative to prioritize safety and reliability. PwR is an approach that incorporates these priorities, where the insights of a few technical experts guide a large community of developers through the power of representation. We encourage the software development community to experiment with PwR, and similar ideas, to build safe and reliable AI-powered software.

Learn more on the PwR project page.

Acknowledgements

PwR is the result of a joint collaboration with several of our colleagues, including Sriram Rajamani, B. Ashok, Mohit Jain, Vageesh D C, Dinesh KA, and Sanoop Menon. We would also like to thank Vyshak Jain, Drishti Goel, Hamna, and Sanoop Menon for their help in creating the video.
