Beyond forecasting: The delicate balance of serving customers and growing your business

Companies use time series forecasting to make core planning decisions that help them navigate through uncertain futures. This post addresses supply chain stakeholders, who share a common need: determining how many finished goods are needed over a variety of planning time horizons. In addition to planning how many units of goods are needed, businesses often need to know where they will be needed, to create a geographically optimal inventory.

The delicate balance of oversupply and undersupply

If manufacturers produce too few parts or finished goods, the resulting undersupply can force tough choices about rationing available resources among trading partners or business units. As a result, purchase orders may have lower acceptance rates, and less profit is realized. Further down the supply chain, if a retailer has too few products to sell relative to demand, they can disappoint shoppers with out-of-stocks. When the retail shopper has an immediate need, these shortfalls can result in a purchase from an alternate retailer or a substitutable brand. This substitution becomes a churn risk if the alternate becomes the new default.

On the other end of the supply pendulum, an oversupply of goods also incurs penalties. Surplus items must be carried in inventory until sold. Some degree of safety stock is expected to help navigate demand uncertainty; however, excess inventory leads to inefficiencies that can dilute an organization’s bottom line. Especially when products are perishable, an oversupply can lead to the loss of all or part of the initial investment made to acquire the sellable finished good.

Even when products are not perishable, during storage they effectively become an idle resource that could be available on the balance sheet as free cash or used to pursue other investments. Balance sheets aside, storage and carrying costs are not free. Organizations typically have a finite amount of arranged warehouse and logistics capabilities. They must operate within these constraints, using available resources efficiently.

Faced with the choice between oversupply and undersupply, most organizations deliberately err toward oversupply. The measurable cost of undersupply is often higher, sometimes by several multiples, than the cost of oversupply, as we discuss in the sections that follow.

The main reason for the bias towards oversupply is to avoid the intangible cost of losing goodwill with customers whenever products are unavailable. Manufacturers and retailers think about long-term customer value and want to foster brand loyalty—this mission helps inform their supply chain strategy.

In this section, we examined inequities resulting from allocating too many or too few resources following a demand planning process. Next, we investigate time series forecasting and how demand predictions can be optimally matched with item-level supply strategies.

Classical approaches to sales and operations planning cycles

Historically, forecasting has been achieved with statistical methods that produce point forecasts, which provide a single most-likely value for the future. This approach is often based on forms of moving averages or on linear regression, which fits a model using an ordinary least squares approach. A point forecast consists of a single mean prediction value. Because the point forecast is centered on a mean, the true value is expected to fall above it approximately 50% of the time, and below it the remaining 50% of the time.
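
To make the idea concrete, the following minimal sketch (in Python, with hypothetical demand numbers) produces a point forecast two common ways: a simple moving average and an ordinary least squares trend line extrapolated one period ahead.

    import numpy as np

    # Hypothetical weekly demand history for a single item
    demand = np.array([102, 98, 110, 95, 105, 99, 108, 101])

    # Point forecast 1: simple moving average of the last 4 periods
    moving_avg_forecast = demand[-4:].mean()

    # Point forecast 2: ordinary least squares trend line, extrapolated one period ahead
    t = np.arange(len(demand))
    slope, intercept = np.polyfit(t, demand, deg=1)  # least squares fit
    ols_forecast = intercept + slope * len(demand)

    print(f"Moving average point forecast: {moving_avg_forecast:.1f}")
    print(f"OLS trend point forecast: {ols_forecast:.1f}")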

Point forecasts may be interesting, but if followed without expert review they can result in retailers running out of must-have items about 50% of the time. To prevent underserving customers, supply and demand planners apply manual judgment overrides or adjust point forecasts with a safety stock formula. Companies may use their own interpretation of a safety stock formula, but the idea is to help ensure product supply is available through an uncertain short-term horizon. Ultimately, planners need to decide whether to inflate or deflate the mean point forecast, according to their rules, interpretations, and subjective view of the future.
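
As an illustration only, the following sketch applies one common textbook safety stock formulation (a z-score times the demand standard deviation times the square root of the lead time); the inputs are hypothetical, and your organization’s formula and service-level targets may differ.

    from math import sqrt
    from scipy.stats import norm

    # Hypothetical inputs for one item
    mean_weekly_demand = 100.0    # point forecast, units per week
    demand_std = 20.0             # standard deviation of weekly demand
    lead_time_weeks = 4           # replenishment lead time
    service_level = 0.95          # desired probability of not stocking out

    # One common textbook safety stock formula
    z = norm.ppf(service_level)                        # ~1.645 for a 95% target
    safety_stock = z * demand_std * sqrt(lead_time_weeks)

    # Planner's inflated order-up-to quantity for the lead-time horizon
    order_up_to = mean_weekly_demand * lead_time_weeks + safety_stock
    print(f"Safety stock: {safety_stock:.0f} units, order-up-to: {order_up_to:.0f} units")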

Modern, state-of-the-art time series forecasting enables choice

To meet real-world forecasting needs, AWS provides a broad and deep set of capabilities that deliver a modern approach to time series forecasting. We offer machine learning (ML) services that include but are not limited to Amazon SageMaker Canvas (for details, refer to Train a time series forecasting model faster with Amazon SageMaker Canvas Quick build), Amazon Forecast (Start your successful journey with time series forecasting with Amazon Forecast), and Amazon SageMaker built-in algorithms (Deep demand forecasting with Amazon SageMaker). In addition, AWS developed an open-source software package, AutoGluon, which supports diverse ML tasks, including those in the time series domain. For more information, refer to Easy and accurate forecasting with AutoGluon-TimeSeries.

Consider the point forecast discussed in the prior section. Real-world data is more complicated than can be expressed with an average or a straight regression line. In addition, because of the imbalance between over- and undersupply, you need more than a single point estimate. AWS services address this need through ML models coupled with quantile regression. Quantile regression enables you to select from a wide range of planning scenarios, expressed as quantiles, rather than rely on a single point forecast. It is these quantiles that offer choice, which we describe in more detail in the next section.
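
Under the hood, quantile forecasts are trained against the quantile (pinball) loss, which penalizes under- and over-prediction asymmetrically. The following illustrative sketch (with made-up numbers) shows why a p90 forecast is pushed higher than a p50 forecast.

    import numpy as np

    def pinball_loss(y_true, y_pred, q):
        """Quantile (pinball) loss: under-prediction is weighted by q,
        over-prediction by (1 - q)."""
        diff = y_true - y_pred
        return np.mean(np.maximum(q * diff, (q - 1) * diff))

    actuals = np.array([120, 95, 130, 110])
    low_forecast = np.full(4, 100)    # candidate p50-style forecast
    high_forecast = np.full(4, 140)   # candidate p90-style forecast

    # At q=0.9, under-forecasting costs 9x more than over-forecasting,
    # so the higher forecast achieves the lower loss.
    print(pinball_loss(actuals, low_forecast, 0.9))   # higher loss
    print(pinball_loss(actuals, high_forecast, 0.9))  # lower loss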

Forecasts designed to serve customers and generate business growth

The following figure provides a visual of a time series forecast with multiple outcomes, made possible through quantile regression. The red line, denoted p05, indicates that the real number, whatever it may be, is expected to fall below the p05 line about 5% of the time. Conversely, this means that 95% of the time, the true number will likely fall above the p05 line.

Next, observe the green line, denoted with p70. The true value will fall below the p70 line about 70% of the time, leaving a 30% chance it will exceed the p70. The p50 line provides a mid-point perspective about the future, with a 50/50 chance values will fall above or below the p50, on average. These are examples, but any quantile can be interpreted in the same manner.

In the following section, we examine how to measure if the quantile predictions produce an over or undersupply by item.

Measuring oversupply and undersupply from historic data

The previous section demonstrated a graphical way to observe predictions; another way to view them is in tabular form, as shown in the following table. When creating time series models, part of the data is held back from the training operation, which allows accuracy metrics to be generated. Although the future is uncertain, the main idea is that accuracy during a holdback period is the best approximation of how tomorrow’s predictions will perform, all other things being equal.

The table doesn’t show accuracy metrics; rather, it shows true values known from the past alongside several quantile predictions, from p50 through p90 in steps of 10. Over the five most recent historical time periods, the true demand was 218 units. The quantile predictions offer a range of values, from a low of 189 units to a high of 314 units. With the following table, it’s easy to see that p50 and p60 result in an undersupply, and the last three quantiles result in an oversupply.
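
The same comparison can be computed directly from backtest data. The following pandas sketch uses the 218-unit actual and the 189- and 314-unit endpoints mentioned above; the intermediate quantile values are illustrative.

    import pandas as pd

    backtest = pd.DataFrame(
        {"quantile": ["p50", "p60", "p70", "p80", "p90"],
         "predicted_units": [189, 205, 241, 275, 314]}  # p60-p80 values are illustrative
    )
    actual_units = 218

    # Positive values indicate oversupply at that quantile; negative, undersupply
    backtest["over_under_units"] = backtest["predicted_units"] - actual_units
    print(backtest)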

We previously pointed out that there is an asymmetry in over- and undersupply. Most businesses that make a conscious choice to oversupply do so to avoid disappointing customers. The critical question becomes: “For the future ahead, which quantile prediction number should the business plan against?” Given the asymmetry that exists, a weighted decision needs to be made. This need is addressed in the next section, where forecasted quantities, as units, are converted to their respective financial meaning.

Automatically selecting correct quantile points based on maximizing profit or customer service goals

To convert quantile values to business values, we must find the penalty associated with each unit of overstock and with each unit of understock, because these are rarely equal. A solution for this need is well documented and studied in the field of operations research as the newsvendor problem. Whitin (1955) was the first to formulate a demand model with pricing effects included. The newsvendor problem is named for a time when news sellers had to decide how many newspapers to purchase for the day. If they chose a number too low, they would sell out early and not reach their income potential for the day. If they chose a number too high, they were stuck with “yesterday’s news” and risked losing part of their early morning speculative investment.

To compute the per-unit over and under penalties, a few pieces of data are necessary for each item you wish to forecast (a sketch of the computation follows the list). You may also increase the complexity by specifying the data as an item+location pair, item+customer pair, or other combination according to business need.

  • Expected sales value for the item.
  • All-in cost of goods to purchase or manufacture the item.
  • Estimated holding costs associated with carrying the item in inventory, if unsold.
  • Salvage value of the item, if unsold. If highly perishable, the salvage value could approach zero, resulting in a full loss of the original cost of goods investment. When shelf stable, the salvage value can fall anywhere under the expected sales value for the item, depending on the nature of a stored and potentially aged item.
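
The sketch below shows one way to turn these four inputs into per-unit penalties, along with the classical newsvendor critical ratio that balances them; the prices are hypothetical.

    # Hypothetical financial inputs for one item (the four data points listed above)
    sales_price = 10.00     # expected sales value per unit
    cost_of_goods = 6.00    # all-in cost to purchase or manufacture one unit
    holding_cost = 0.50     # estimated holding cost per unsold unit
    salvage_value = 2.00    # recoverable value per unsold unit

    # Per-unit penalty of under-forecasting: the margin lost on a missed sale
    under_penalty = sales_price - cost_of_goods                   # 4.00

    # Per-unit penalty of over-forecasting: sunk cost less salvage, plus holding
    over_penalty = cost_of_goods - salvage_value + holding_cost   # 4.50

    # Classical newsvendor critical ratio: the service level (quantile) that
    # balances the two penalties
    critical_ratio = under_penalty / (under_penalty + over_penalty)
    print(f"Under penalty: {under_penalty:.2f}, over penalty: {over_penalty:.2f}")
    print(f"Critical ratio (target quantile): {critical_ratio:.2f}")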

The following table demonstrates how the quantile points were self-selected from among the available forecast points in known historical periods. Consider the example of item 3, which had a true demand of 1,578 units in prior periods. A p50 estimate of 1,288 units would have undersupplied, whereas a p90 value of 2,578 units would have produced a surplus. Among the observed quantiles, the p70 value produces a maximum profit of $7,301. Knowing this, you can see how a p50 selection would result in a nearly $1,300 penalty compared to the p70 value. This is only one example, but each item in the table has a unique story to tell.

Solution overview

The following diagram illustrates a proposed workflow. First, Amazon SageMaker Data Wrangler consumes backtest predictions produced by a time series forecaster. Next, backtest predictions and known actuals are joined with financial metadata on an item basis. At this point, using backtest predictions, a SageMaker Data Wrangler transform computes the unit cost for under and over forecasting per item.

SageMaker Data Wrangler translates the unit forecast into a financial context and automatically selects the item-specific quantile that provides the highest profit among the quantiles examined. The output is a tabular set of data, stored in Amazon Simple Storage Service (Amazon S3), and is conceptually similar to the table in the previous section.
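
Conceptually, the per-item selection logic resembles the following pandas sketch. This is an illustration of the idea, not the actual SageMaker Data Wrangler transform; the prices and the intermediate quantile values are hypothetical, while the p50 and p90 figures echo the item 3 example above.

    import pandas as pd

    def realized_profit(actual, stocked, price, cost, salvage, holding):
        """Profit if `stocked` units had been supplied and true demand was `actual`."""
        sold = min(actual, stocked)
        unsold = max(stocked - actual, 0)
        return sold * (price - cost) - unsold * (cost - salvage + holding)

    # Backtest quantile forecasts for one item
    quantile_forecasts = {"p50": 1288, "p60": 1450, "p70": 1700, "p80": 2100, "p90": 2578}
    actual = 1578
    price, cost, salvage, holding = 9.00, 4.00, 1.00, 0.25

    profits = pd.Series({q: realized_profit(actual, f, price, cost, salvage, holding)
                         for q, f in quantile_forecasts.items()})
    print(profits)
    print("Selected quantile for this item:", profits.idxmax())   # p70 with these inputs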

Finally, a time series forecaster is used to produce forecasts for future periods. Here, you may also choose to drive inference operations, or act on inference data, according to which quantile was chosen. This can reduce computational costs while removing the burden of manually reviewing every single item. Experts in your company can have more time to focus on high-value items, while thousands of items in your catalog receive automatic adjustments. As a point of consideration, the future carries some degree of uncertainty; however, all other things being equal, a mixed selection of quantiles should optimize outcomes across an overall set of time series. Here at AWS, we advise you to use two holdback prediction cycles to quantify the degree of improvement found with mixed quantile selection.

Solution guidance to accelerate your implementation

If you wish to recreate the quantile selection solution discussed in this post and adapt it to your own dataset, we provide a synthetic sample set of data and a sample SageMaker Data Wrangler flow file to get you started on GitHub. The entire hands-on experience should take you less than an hour to complete.

We provide this post and sample solution guidance to help accelerate your time to market. The primary enabler for recommending specific quantiles is SageMaker Data Wrangler, a purpose-built AWS service meant to reduce the time it takes to prepare data for ML use cases. SageMaker Data Wrangler provides a visual interface to design data transformations, analyze data, and perform feature engineering.

If you are new to SageMaker Data Wrangler, refer to Get Started with Data Wrangler to understand how to launch the service through Amazon SageMaker Studio. Independently, we have more than 150 blog posts that help you discover diverse sample data transformations addressed by the service.

Conclusion

In this post, we discussed how quantile regression enables multiple business decision points in time series forecasting. We also discussed the imbalanced cost penalties associated with over and under forecasting—often the penalty of undersupply is several multiples of the oversupply penalty, not to mention undersupply can cause the loss of goodwill with customers.

The post discussed how organizations can evaluate multiple quantile prediction points with a consideration for the over and undersupply costs of each item to automatically select the quantile likely to provide the most profit in future periods. When necessary, you can override the selection when business rules desire a fixed quantile over a dynamic one.

The process is designed to help meet business and financial goals while removing the friction of having to manually apply judgment calls to each item forecasted. SageMaker Data Wrangler helps the process run on an ongoing basis because quantile selection must be dynamic with changing real-world data.

It should be noted that quantile selection is not a one-time event. The process should be evaluated during each forecasting cycle to account for changes such as increased cost of goods, inflation, seasonal adjustments, new product introductions, shifting consumer demand, and more. The proposed optimization process is positioned after time series model generation, referred to as the model training step. Quantile selections are made and used with the future forecast generation step, sometimes called the inference step.

If you have any questions about this post or would like a deeper dive into your unique organizational needs, please reach out to your AWS account team, your AWS Solutions Architect, or open a new case in our support center.

References

  • DeYong, G. D. (2020). The price-setting newsvendor: review and extensions. International Journal of Production Research, 58(6), 1776–1804.
  • Liu, C., Letchford, A. N., & Svetunkov, I. (2022). Newsvendor problems: An integrated method for estimation and optimisation. European Journal of Operational Research, 300(2), 590–601.
  • Punia, S., Singh, S. P., & Madaan, J. K. (2020). From predictive to prescriptive analytics: A data-driven multi-item newsvendor model. Decision Support Systems, 136.
  • Trapero, J. R., Cardós, M., & Kourentzes, N. (2019). Quantile forecast optimal combination to enhance safety stock estimation. International Journal of Forecasting, 35(1), 239–250.
  • Whitin, T. M. (1955). Inventory control and price theory. Management Science, 2(1), 61–68.

About the Author

Charles Laughlin is a Principal AI/ML Specialist Solutions Architect and works in the Amazon SageMaker service team at AWS. He helps shape the service roadmap and collaborates daily with diverse AWS customers to help transform their businesses using cutting-edge AWS technologies and thought leadership. Charles holds an M.S. in Supply Chain Management and a Ph.D. in Data Science.

Read More

Kicking Games Up a Notch: Startup Sports Vision AI to Broadcast Athletics Across the Globe

Pixellot is scoring with vision AI — making it easier for organizations to deliver real-time sports broadcasting and analytics to viewers across the globe.

A member of the NVIDIA Metropolis vision AI partner ecosystem, the company based near Tel Aviv offers an AI-powered platform that automates the capturing, streaming and analysis of sporting events.

It’s changing the game for fans, coaches and players of nearly 20 different sports — not just basketball and soccer but also rugby and handball — as it broadcasts events and provides analytics from more than 30,000 venues across 70+ countries. In the U.S., Pixellot powers the broadcasting of over a million games every year through its partnership with the NFHS Network, a leader in streaming live and on-demand high school sports.

Through its broadcasting partners like the NFHS Network, MLB and others, Pixellot provides professional analytics, post-match breakdowns and highlights based on jersey numbers with shot charts and heat maps — which can be especially useful for coaches and players of school and pro sports alike as they study their moves to up their game. It also enables interactive experiences for users, who can manipulate viewframes and cut their own highlights for a game.

Recently, SuperSport Schools, a company based in Cape Town, South Africa, deployed the Pixellot platform to power an app that broadcasts student athletics across the nation, where more than 1,500 high schools are active in sports.

“Our goal is to democratize the coverage of sports with the help of AI and automation,” said Yossi Tarablus, who leads marketing at Pixellot, a member of the NVIDIA Inception program for cutting-edge startups. “Using the NVIDIA Jetson platform for edge AI, Pixellot brings powerful technology for sports broadcasting and analytics to some of the world’s most remote areas.”

How Pixellot Works

During peak sports seasons, about 200,000 games a month are broadcast across the globe using the Pixellot platform, according to Tarablus.

Lightweight Pixellot cameras powered by NVIDIA Jetson capture high-quality video of games, matches and even practices — and livestream them in high definition to users through an app in real time with an overlaid scoreboard, live stats, commentary and more.

The platform creates an automatic viewframe that simulates a camera operator, optimizes videos and corrects scene lighting using NVIDIA RTX ray-tracing technology.

In addition, the platform helps organizations and companies monetize sports while making them more accessible to viewers, as it enables over-the-top, or OTT, streaming — direct streaming over the internet without the need for a traditional cable or satellite TV provider.

In all of its camera setups, the Metropolis member runs the NVIDIA DeepStream software development kit for AI-powered video streaming analytics. And the company relies on the NVIDIA TensorRT SDK for high-performance deep learning inference.

“NVIDIA Jetson made it possible for Pixellot to create the most accurate and affordable AI-powered camera solution for broadcasting live sporting events,” said Gal Oz, chief technology officer and cofounder of Pixellot. “The versatility of Jetson modules in terms of camera pipeline, encoders and AI capabilities enabled Pixellot to develop multiple products based on the same hardware and software platform.”

Broadcasting South African School Sports

High-quality, real-time broadcasts of athletics are difficult to produce without access to a slew of graphics and data.

As the NVIDIA Jetson Orin NX module enables AI-powered video processing and GPU-accelerated computing right at the edge — on the field or at courtside — Pixellot lets organizations broadcast sports from anywhere.

“It’s amazing how many people have told us stories about a moment they were empowered to share with their children thanks to SuperSport Schools and Pixellot, because they couldn’t be there physically but were present through live or on-demand video,” said Kelvin Watt, managing director of Capitalize Media and SuperSport Schools, on the Pixellot deployment in South Africa.

The SuperSport Schools app, which is free and recently reached 600,000 subscribers, was the first to broadcast a junior nationals track race in the country.

At the event last year, a student named Viwe Jingqi broke 50-plus-year national records for both the 100- and 200-meter races for South African girls under 18 years old. People all over the world could easily witness these historic victories through the SuperSport Schools app, powered by Pixellot.

Building a Smart Sports City in China

In China, tech giant Baidu and the Chengdu Sports Authority are using Pixellot technology in an initiative to develop a smart sports city, with an initial focus on broadcasting community soccer.

Chengdu, the capital of southwestern China’s Sichuan province, is a sports-oriented city and was the host of this year’s World University Games, an event sanctioned by the International University Sports Federation.

“Pixellot’s AI-driven sports production solutions are a perfect fit for our strategic vision of delivering innovative technology solutions to communities,” said Liu Chuan, solution director of the intelligent cloud sports industry at Baidu.

“Broadcasting community soccer with vision AI is part of the Chengdu initiative’s efforts to emphasize the health benefits of engaging in sports recreationally,” said Tarablus. “It moves the spotlight from pro or Olympic sports to the importance of athletics for all.”

Learn more about the NVIDIA Metropolis application framework, developer tools and partner ecosystem.

Read More

Announcing New Tools to Help Every Business Embrace Generative AI

From startups to enterprises, organizations of all sizes are getting started with generative AI. They want to capitalize on generative AI and translate the momentum from betas, prototypes, and demos into real-world productivity gains and innovations. But what do organizations need to bring generative AI into the enterprise and make it real? When we talk to customers, they tell us they need security and privacy, scale and price-performance, and most importantly tech that is relevant to their business. We are excited to announce new capabilities and services today to allow organizations big and small to use generative AI in creative ways, building new applications and improving how they work. At AWS, we are hyper-focused on helping our customers in a few ways:

  • Making it easy to build generative AI applications with security and privacy built in
  • Focusing on the most performant, low cost infrastructure for generative AI so you can train your own models and run inference at scale
  • Providing generative AI-powered applications for the enterprise to transform how work gets done
  • Enabling data as your differentiator to customize foundation models (FMs) and make them an expert on your business, your data, and your company

To help a broad range of organizations build differentiated generative AI experiences, AWS has been working hand-in-hand with our customers, including BBVA, Thomson Reuters, Philips, and LexisNexis Legal & Professional. And with the new capabilities launched today, we look forward to enhanced productivity, improved customer engagement, and more personalized experiences that will transform how companies get work done.

Announcing the general availability of Amazon Bedrock, the easiest way to build generative AI applications with security and privacy built in

Customers are excited and optimistic about the value that generative AI can bring to the enterprise. They are diving deep into the technology to learn the steps they need to take to build a generative AI system in production. While recent advancements in generative AI have captured widespread attention, many businesses have not been able to take part in this transformation. Customers tell us they need a choice of models, security and privacy assurances, a data-first approach, cost-effective ways to run models, and capabilities like prompt engineering, retrieval augmented generation (RAG), agents, and more to create customized applications. That is why on April 13, 2023, we announced Amazon Bedrock, the easiest way to build and scale generative AI applications with foundation models. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models from leading providers like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon, along with a broad set of capabilities that customers need to build generative AI applications, simplifying development while maintaining privacy and security. Additionally, as part of a recently announced strategic collaboration, all future FMs from Anthropic will be available within Amazon Bedrock with early access to unique features for model customization and fine-tuning capabilities.

Since April, we have seen firsthand how startups like Coda, Hurone AI, and Nexxiot; large enterprises like adidas, GoDaddy, and Broadridge; and partners like Accenture, BCG, Leidos, and Mission Cloud are already using Amazon Bedrock to securely build generative AI applications across industries. Independent software vendors (ISVs) like Salesforce are now securely integrating with Amazon Bedrock to enable their customers to power generative AI applications. Customers are applying generative AI to new use cases; for example, Lonely Planet, a premier travel media company, worked with our Generative AI Innovation Center to introduce a scalable AI platform that organizes book content in minutes to deliver cohesive, highly accurate travel recommendations, reducing itinerary generation costs by nearly 80%. And since then, we have continued to add new capabilities, like agents for Amazon Bedrock, as well as support for new models, like Cohere and the latest models from Anthropic, to offer our customers more choice and make it easier to create generative AI-based applications. Agents for Bedrock are a game changer, allowing LLMs to complete complex tasks based on your own data and APIs, privately, securely, with setup in minutes (no training or fine tuning required).

Today, we are excited to share new announcements that make it easier to bring generative AI to your organization:

  • General availability of Amazon Bedrock to help even more customers build and scale generative AI applications
  • Expanded model choice with Llama 2 (coming in the next few weeks) and Amazon Titan Embeddings gives customers greater choice and flexibility to find the right model for each use case and power RAG for better results
  • Amazon Bedrock is a HIPAA eligible service and can be used in compliance with GDPR, allowing even more customers to benefit from generative AI
  • Provisioned throughput to ensure a consistent user experience even during peak traffic times

With the general availability of Amazon Bedrock, more customers will have access to Bedrock’s comprehensive capabilities. Customers can easily experiment with a variety of top FMs, customize them privately with their data using techniques such as fine tuning and RAG, and create managed agents that execute complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without writing any code. Since Amazon Bedrock is serverless, customers don’t have to manage any infrastructure, and they can securely integrate and deploy generative AI capabilities into their applications using the AWS services they are already familiar with.

Second, model choice has been a cornerstone of what makes Amazon Bedrock a unique, differentiated service for our customers. This early in the adoption of generative AI, there is no single model that unlocks all the value of generative AI, and customers need the ability to work with a range of high-performing models. We are excited to announce the general availability of Amazon Titan Embeddings and, coming in the next few weeks, the availability of Llama 2, Meta’s next-generation large language model (LLM), joining existing model providers AI21 Labs, Anthropic, Cohere, Stability AI, and Amazon in further expanding choice and flexibility for customers. Amazon Bedrock is the first fully managed generative AI service to offer Llama 2, Meta’s next-generation LLM, through a managed API. Llama 2 models come with significant improvements over the original Llama models, including being trained on 40% more data and having a longer context length of 4,000 tokens to work with larger documents. Optimized to provide a fast response on AWS infrastructure, the Llama 2 models available via Amazon Bedrock are ideal for dialogue use cases. Customers can now build generative AI applications powered by Llama 2 13B and 70B parameter models, without the need to set up and manage any infrastructure.

Amazon Titan FMs are a family of models created and pretrained by AWS on large datasets, making them powerful, general purpose capabilities built to support a variety of use cases. The first of these models generally available to customers, Amazon Titan Embeddings, is an LLM that converts text into numerical representations (known as embeddings) to power RAG use cases. FMs are well suited for a wide variety of tasks, but they can only respond to questions based on learnings from the training data and contextual information in a prompt, limiting their effectiveness when responses require timely knowledge or proprietary data. Data is the difference between a general generative AI application and one that truly knows your business and your customer. To augment FM responses with additional data, many organizations turn to RAG, a popular model-customization technique where an FM connects to a knowledge source that it can reference to augment its responses. To get started with RAG, customers first need access to an embedding model to convert their data into vectors that allow the FM to more easily understand the semantic meaning and relationships between data. Building an embeddings model requires massive amounts of data, resources, and ML expertise, putting RAG out of reach for many organizations. Amazon Titan Embeddings makes it easier for customers to get started with RAG to extend the power of any FM using their proprietary data. Amazon Titan Embeddings supports more than 25 languages and a context length of up to 8,192 tokens, making it well suited to work with single words, phrases, or entire documents based on the customer’s use case. The model returns output vectors of 1,536 dimensions, giving it a high degree of accuracy, while also optimizing for low-latency, cost-effective results. With new models and capabilities, it’s easy to use your organization’s data as a strategic asset to customize foundation models and build more differentiated experiences.
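
As a minimal sketch of what getting started with Amazon Titan Embeddings can look like, the following Python code calls the model through the Bedrock runtime API with boto3. The model identifier and request/response shape shown here are assumptions based on the launch documentation and may differ by SDK version and Region.

    import json
    import boto3

    # Assumes credentials and a Region where Amazon Bedrock is available,
    # with access to the Titan Embeddings model enabled.
    bedrock_runtime = boto3.client("bedrock-runtime")

    def embed(text):
        """Return the embedding vector for a piece of text."""
        response = bedrock_runtime.invoke_model(
            modelId="amazon.titan-embed-text-v1",   # assumed model identifier
            contentType="application/json",
            accept="application/json",
            body=json.dumps({"inputText": text}),
        )
        payload = json.loads(response["body"].read())
        return payload["embedding"]

    vector = embed("What is our return policy for damaged items?")
    print(len(vector))   # 1,536 dimensions, per the announcement above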

Third, because the data customers want to use for customization is such valuable IP, they need it to remain secure and private. With security and privacy built in since day one, Amazon Bedrock customers can trust that their data remains protected. None of the customer’s data is used to train the original base FMs. All data is encrypted at rest and in transit. And you can expect the same AWS access controls that you have with any other AWS service. Today, we are excited to build on this foundation and introduce new security and governance capabilities – Amazon Bedrock is now a HIPAA eligible service and can be used in compliance with GDPR, allowing even more customers to benefit from generative AI. New governance capabilities include integration with Amazon CloudWatch to track usage metrics and build customized dashboards and integration with AWS CloudTrail to monitor API activity and troubleshoot issues. These new governance and security capabilities help organizations unlock the potential of generative AI, even in highly regulated industries, and ensure that data remains protected.

Finally, certain periods of the year, like the holidays, are critical for customers to make sure their users can get uninterrupted service from applications powered by generative AI. During these periods, customers want to ensure their service is available to all of their customers regardless of demand. Amazon Bedrock now allows customers to reserve throughput (in terms of tokens processed per minute) to maintain a consistent user experience even during peak traffic times.

Together, the new capabilities and models we announced today for Amazon Bedrock will accelerate how quickly enterprises can build more personalized applications and enhance employee productivity. In concert with our ongoing investments in ML infrastructure, Amazon Bedrock is the best place for customers to build and scale generative AI applications.

To help customers get started quickly with these new features, we are adding a new generative AI training for Amazon Bedrock to our collection of digital, on-demand training courses. Amazon Bedrock – Getting Started is a free, self-paced digital course that introduces learners to the service. This 60-minute course will introduce developers and technical audiences to Amazon Bedrock’s benefits, features, use cases, and technical concepts.

Announcing Amazon CodeWhisperer customization capability to generate more relevant code recommendations informed by your organization’s code base

At AWS, we are building powerful new applications that transform how our customers get work done with generative AI. In April 2023, we announced the general availability of Amazon CodeWhisperer, an AI coding companion that helps developers build software applications faster by providing code suggestions across 15 languages, based on natural language comments and code in a developer’s integrated development environment (IDE). CodeWhisperer has been trained on billions of lines of publicly available code to help developers be more productive across a wide range of tasks. We have specially trained CodeWhisperer on high-quality Amazon code, including AWS APIs and best practices, to help developers be even faster and more accurate when generating code that interacts with AWS services like Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), and AWS Lambda. Customers from Accenture to Persistent to Bundesliga have been using CodeWhisperer to help make their developers more productive.

Many customers also want CodeWhisperer to include their own internal APIs, libraries, best practices, and architectural patterns in its suggestions, so they can speed up development even more. Today, AI coding companions are not able to include these APIs in their code suggestions because they are typically trained on publicly available code, and so aren’t aware of a company’s internal code. For example, to build a feature for an ecommerce website that lists items in a shopping cart, developers have to find and understand existing internal code, such as the API that provides the description of items, so they can display the description in the shopping cart. Without a coding companion capable of suggesting the correct, internal code for them, developers have to spend hours digging through their internal code base and documentation to complete their work. Even after developers are able to find the right resources, they have to spend more time reviewing the code to make sure it follows their company’s best practices.

Today, we are excited to announce a new Amazon CodeWhisperer customization capability, which enables CodeWhisperer to generate even better suggestions than before, because it can now include your internal APIs, libraries, best practices, and architectural patterns. This capability uses the latest model and context customization techniques and will be available in preview soon as part of a new CodeWhisperer Enterprise Tier. With this capability, you can securely connect your private repositories to CodeWhisperer, and with a few clicks, customize CodeWhisperer to generate real-time recommendations that include your internal code base. For example, with a CodeWhisperer customization, a developer working in a food delivery company can ask CodeWhisperer to provide recommendations that include specific code related to the company’s internal services, such as “Process a list of unassigned food deliveries around the driver’s current location.” Previously, CodeWhisperer would not know the correct internal APIs for “unassigned food deliveries” or “driver’s current location” because this isn’t publicly available information. Now, once customized on the company’s internal code base, CodeWhisperer understands the intent, determines which internal and public APIs are best suited to the task, and generates code recommendations for the developer. The CodeWhisperer customization capability can save developers hours spent searching and modifying sparsely documented code, and helps onboard developers who are new to the company faster.

In the following example, after creating a private customization, AnyCompany (a food delivery company) developers get CodeWhisperer code recommendations that include their internal APIs and libraries.

We conducted a recent study with Persistent, a global services and solutions company delivering digital engineering and enterprise modernization services to customers, to measure the productivity benefits of the CodeWhisperer customization capability. Persistent found that developers using the customization capability were able to complete their coding tasks up to 28% faster, on average, than developers using standard CodeWhisperer.

We designed this customization capability with privacy and security at the forefront. Administrators can easily manage access to a private customization from the AWS Management Console, so that only specific developers have access. Administrators can also ensure that only repositories that meet their standards are eligible for use in a CodeWhisperer customization. Using high-quality repositories helps CodeWhisperer make suggestions that promote security and code quality best practices. Each customization is completely isolated from other customers and none of the customizations built with this new capability will be used to train the FM underlying CodeWhisperer, protecting customers’ valuable intellectual property.

Announcing the preview of Generative BI authoring capabilities in Amazon QuickSight to help business analysts easily create and customize visuals using natural-language commands

AWS has been on a mission to democratize access to insights for all users in the organization. Amazon QuickSight, our unified business intelligence (BI) service built for the cloud, allows insights to be shared across all users in the organization. With QuickSight, we’ve been using generative models since 2020 to power Amazon QuickSight Q, which enables any user to ask questions of their data using natural language, without having to write SQL queries or learn a BI tool. In July 2023, we announced that we are furthering the early innovation in QuickSight Q with new LLM capabilities to provide Generative BI capabilities in QuickSight. Current QuickSight customers like BMW Group and Traeger Grills are looking forward to further increasing the productivity of their analysts using the Generative BI authoring experience.

Today, we are excited to make these LLM capabilities available in preview with Generative BI dashboard authoring capabilities for business analysts. The new Generative BI authoring capabilities extend the natural-language querying of QuickSight Q beyond answering well-structured questions (such as “what are the top 10 products sold in California?”) to help analysts quickly create customizable visuals from question fragments (such as “top 10 products”), clarify the intent of a query by asking follow-up questions, refine visualizations, and complete complex calculations. Business analysts simply describe the desired outcome, and QuickSight generates compelling visuals that can be easily added to a dashboard or report with a single click. QuickSight Q also offers related questions to help analysts clarify ambiguous cases when multiple data fields match their query. When the analyst has the initial visualization, they can add complex calculations, change chart types, and refine visuals using natural language prompts. The new Generative BI authoring capabilities in QuickSight Q make it fast and easy for business analysts to create compelling visuals and reduce the time to deliver the insights needed to inform data-driven decisions at scale.

Creating visuals using Generative BI capabilities in Amazon QuickSight

Generative AI tools and capabilities for every business

Today’s announcements open generative AI up to any customer. With enterprise-grade security and privacy, choice of leading FMs, a data-first approach, and a highly performant, cost-effective infrastructure, organizations trust AWS to power their innovations with generative AI solutions at every layer of the stack. We have seen exciting innovation from Bridgewater Associates to Omnicom to Rocket Mortgage, and with these new announcements, we look forward to new use cases and applications of the technology to boost productivity. This is just the beginning—across the technology stack, we are innovating with new services and capabilities built for your organization to help tackle some of your largest challenges and change how we work.

About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.

Read More

V for Victory: ‘Cyberpunk 2077: Phantom Liberty’ Comes to GeForce NOW

The wait is over. GeForce NOW Ultimate members can experience Cyberpunk 2077: Phantom Liberty on GOG.com at full GeForce RTX 4080 quality, with support for NVIDIA DLSS 3.5 technology.

It’s part of an action-packed GFN Thursday, with 26 more games joining the cloud gaming platform’s library, including Quake II from id Software.

A New Look for Night City

Cyberpunk 2077: Phantom Liberty on GeForce NOW
Experience NVIDIA DLSS 3.5 in Cyberpunk 2077’s spy-thriller expansion.

Take on a thrilling challenge with Phantom Liberty, an all-new adventure for Cyberpunk 2077. When the orbital shuttle of the President of the New United States of America is shot down over the deadliest district of Night City, there’s only one person who can save her. Become V, a cyberpunk for hire, and dive into a tangled web of espionage and political intrigue, unraveling a story that connects the highest echelons of power with the brutal world of black-market mercenaries.

Ultimate members can return to the neon lights of Night City and experience the benefits of NVIDIA DLSS 3.5 and its new Ray Reconstruction technology. These updates enhance the quality of full ray tracing in Cyberpunk 2077’s Ray Tracing: Overdrive Mode, as part of the game’s 2.0 update available for the base game for free and included with the Phantom Liberty expansion. Upgrade to a GeForce NOW Ultimate membership today to see Night City at its best.

Prepare for War

Quake II on GeForce NOW
id Software’s classic first-person shooter is better than ever, streaming from the cloud.

Experience the authentic, enhanced and complete version of id Software’s critically acclaimed first-person shooter, Quake II, now streaming from the cloud.

Humankind is at war with the Strogg, a hostile alien race that’s attacked Earth. In response, humanity launched a strike on the Strogg homeworld — which failed. Outnumbered and outgunned, battle through fortified military installations to shut down the enemy’s war machine. Only then will the fate of humanity be decided.

Quake II includes a new, enhanced version of id Software’s classic, along with both original mission packs: “The Reckoning” and “Ground Zero.” Plus, battle through 28 campaign levels in MachineGames’ all-new “Call of the Machine” expansion and play through the exclusive levels from Quake II 64 for the first time on PC. Blast the Stroggs or friends in classic multiplayer modes at up to 4K and 120 frames per second or with ultrawide resolutions for GeForce NOW Ultimate members.

Nice Shootin’, Tex

GeForce NOW Kovaaks Ultimate Challenge
Ultimate leads the way.

The GeForce NOW Ultimate KovaaK’s challenge is complete, and the results are in: Ultimate power means more ultimate wins. Members worked to sharpen their skills and win amazing prizes during the challenge while experiencing the power of Ultimate for themselves. With up to 240 fps streaming at ultra-low latency, gamers are playing up to their ultimate potential, just by upgrading to an Ultimate membership.

The proof is in the data. Check out some powerful stats showing what Ultimate members accomplished:

  • Nearly 15,000 people took on the challenge, playing over 120,000 sessions.
  • Members saw a 2x boost in scores when playing on Ultimate over a Free membership.
  • All of the leaderboard’s top 25 slots were filled with those playing on Ultimate.

But don’t just take our word for it. Here’s what it felt like for TinooQ, who placed third overall in the challenge:

“As a long-time KovaaK’s user, transitioning to this platform was seamless, as the precision and responsiveness was nothing short of extraordinary. 

“The minimal latency and the consistent 240 fps made me think that many people could rely solely on the GeForce NOW Ultimate plan and a monitor, that’s all. I found it perfect for top gaming, saving a lot of money and the PC hardware headaches I suffered when building mine.” — TinooQ

And check out what the press had to say about the Ultimate membership tier:

“GeForce NOW KovaaK’s Challenge Proves Gaming At Glorious 240 FPS Matters” – Hot Hardware

“Hot dang does GeForce Now Ultimate ever deliver.” – Tom’s Guide

“NVIDIA GeForce NOW has held the crown for cloud gaming performance for a while now, and it’s just getting better.” – 9to5Google

“Unsurprisingly, NVIDIA says 98% of the users who tried Kovaak’s Challenge on the GeForce NOW Ultimate tier have seen improvements in their test results over the free tier.” – Wccftech

“Nvidia has taken a different approach to cloud gaming: Instead of boosting their library and settling for 1080p 60fps, Nvidia’s GeForce Now service prioritizes performance, implementing faster graphics cards for players to use.” – SlashGear

“One of the distinguishing factors of GeForce Now is its superior image quality and lack of noticeable input delay.” – Game Is Hard

“NVIDIA’s GeForce NOW is widely considered one of the best cloud gaming platforms in terms of latency, visual fidelity, and overall experience.” — TweakTown

Everyone’s a winner when they play on Ultimate. Upgrade today for the best performance in the cloud, even when streaming popular shooters like Counter Strike, Destiny 2, Tom Clancy’s Rainbow Six Siege and more, where every frame counts.

Even better, Ultimate members get a free copy of KovaaK’s – the world’s best aim trainer. Don’t miss this chance to claim the reward, available only to Ultimate members for a limited time. Be on the lookout for an email starting today.

Challenge Accepted

Infinity Strash DRAGON QUEST The Adventure of Dai on GeForce NOW
Be a hero. Be Dai.

Square Enix’s Infinity Strash: DRAGON QUEST The Adventure of Dai leads 26 new titles in the GeForce NOW library this week. In this action role-playing game based on the popular anime and manga series of the same name, Dai and his friends must fight the Dark Lord Hadlar and his evil army of monsters. Fulfill Dai’s dream of becoming a hero in this game, which features fast-paced, dynamic combat, stunning anime-style graphics and a rich storyline.

Here’s the full list of what’s joining this week:

  • These Doomed Isles (New release on Steam, Sept. 25)
  • Paleo Pines (New release on Steam, Sept. 26)
  • Infinity Strash: DRAGON QUEST The Adventure of Dai (New release on Steam, Sept. 28)
  • Pizza Possum (New release on Steam, Sept. 28)
  • Wildmender (New release on Steam, Sept. 28)
  • Overpass 2 (New release on Steam, Sept. 28)
  • Soulstice (New release on Epic Games Store, Free on Sept. 28)
  • Amnesia: Rebirth (Xbox, available on PC Game Pass)
  • BlazBlue: Cross Tag Battle (Xbox, available on PC Game Pass)
  • Bramble: The Mountain King (Xbox, available on PC Game Pass)
  • Broforce (Steam)
  • Don Duality (Steam)
  • Doom Eternal (Xbox, available on PC Game Pass)
  • Dordogne (Xbox, available on PC Game Pass)
  • Dust Fleet (Steam)
  • Eastern Exorcist (Xbox, available on PC Game Pass)
  • Figment 2: Creed Valley (Xbox, available on PC Game Pass)
  • I Am Fish (Xbox)
  • Necesse (Steam)
  • A Plague Tale: Innocence (Xbox)
  • Quake II (Steam, Epic Games Store and Xbox, available on PC Game Pass)
  • Road 96 (Xbox)
  • Spacelines from the Far Out (Xbox)
  • Totally Reliable Delivery Service (Xbox, available on PC Game Pass)
  • Warhammer 40,000: Battlesector (Xbox, available on PC Game Pass)
  • Yooka-Laylee and the Impossible Lair (Xbox)

What are you planning to play this weekend? Let us know on Twitter or in the comments below.

Read More

A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction

The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon Warehouses across Europe and the MENA region. The design and deployment processes of projects involve many types of Requests for Information (RFIs) about engineering requirements regarding Amazon and project-specific guidelines. These requests range from simple retrieval of baseline design values, to review of value engineering proposals, to analysis of reports and compliance checks. Today, these are addressed by a Central Technical Team, composed of subject matter experts (SMEs) who can answer such highly technical specialized questions, and who provide this service to all stakeholders and teams throughout the project lifecycle. The team is looking for a generative AI question answering solution to quickly retrieve information and proceed with their engineering design. Notably, these use cases are not limited to the Amazon D&C team; they apply to the broader scope of Global Engineering Services involved in project deployment. The entire range of stakeholders and teams engaged in the project lifecycle can benefit from a generative AI question answering solution, because it enables quick access to critical information, streamlining the engineering design and project management processes.

The existing generative AI solutions for question answering are mainly based on Retrieval Augmented Generation (RAG). RAG searches documents using large language model (LLM) embeddings and vector search, builds context from the search results through clustering, and uses that context as an augmented prompt for inference against a foundation model to get the answer. This method is less efficient for the highly technical documents from Amazon D&C, which contain significant unstructured data such as Excel sheets, tables, lists, figures, and images. In this case, the question answering task works better by fine-tuning the LLM with the documents. Fine-tuning adjusts and adapts the weights of the pre-trained LLM to improve the model quality and accuracy.

To address these challenges, we present a new framework with RAG and fine-tuned LLMs. The solution uses Amazon SageMaker JumpStart as the core service for the model fine-tuning and inference. In this post, we not only provide the solution, but also discuss the lessons learned and best practices when implementing the solution in real-world use cases. We compare and contrast how different methodologies and open-source LLMs performed in our use case and discuss how to find the trade-off between model performance and compute resource costs.

Solution overview

The solution has the following components, as shown in the architecture diagram:

  1. Content repository – The D&C contents include a wide range of human-readable documents with various formats, such as PDF files, Excel sheets, wiki pages, and more. In this solution, we stored these contents in an Amazon Simple Storage Service (Amazon S3) bucket and used them as a knowledge base for information retrieval as well as inference. In the future, we will build integration adapters to access the contents directly from where they live.
  2. RAG framework with a fine-tuned LLM – This consists of the following subcomponents:
    1. RAG framework – This retrieves the relevant data from documents, augments the prompts by adding the retrieved data in context, and passes it to a fine-tuned LLM to generate outputs.
    2. Fine-tuned LLM – We constructed the training dataset from the documents and contents and conducted fine-tuning on the foundation model. After the tuning, the model learned the knowledge from the D&C contents, and therefore can respond to the questions independently.
    3. Prompt validation module – This measures the semantic match between the user’s prompt and the dataset used for fine-tuning. If the LLM has been fine-tuned to answer this question, you can run inference against the fine-tuned model for a response. If not, you can use RAG to generate the response.
    4. LangChain – We use LangChain to build a workflow to respond to the incoming questions.
  3. End-user UI – This is the chatbot UI to capture users’ questions and queries, and present the answer from the RAG and LLM response.

overall_architecture

In the next sections, we demonstrate how to create the RAG workflow and build the fine-tuned models.

RAG with foundation models by SageMaker JumpStart

RAG combines the powers of pre-trained dense retrieval and sequence-to-sequence (seq2seq) foundation models. For question answering from Amazon D&C documents, we need to prepare the following in advance:

  • Embedding and indexing the documents using an LLM embedding model – We split the documents into small chunks based on the document chapter and section structure, tested with the GPT-J-6B embedding model on SageMaker JumpStart to generate the indexes, and stored the indexes in a FAISS vector store
  • A pre-trained foundation model to generate responses from prompts – We tested with Flan-T5 XL, Flan-T5 XXL, and Falcon-7B models on SageMaker JumpStart

The question answering process is implemented with LangChain, which is a framework for developing applications powered by language models. The workflow in the chain contains the following steps (a minimal code sketch follows the list):

  1. Get a question from the user.
  2. Perform semantic search on the indexed documents through FAISS to get the top K most-relevant document chunks.
  3. Define the prompt template, such as
    """Answer based on context:nn{context}nn{question}"""

  4. Augment the retrieved document chunks as the {context} and the user question as the {question} in the prompt.
  5. Prompt the foundation model with the constructed zero-shot prompt.
  6. Return the model output to the user.
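
The following minimal LangChain sketch illustrates the steps above. The embeddings and llm arguments are assumed to be LangChain wrappers around the SageMaker JumpStart embedding and text-generation endpoints (for example, SagemakerEndpointEmbeddings and SagemakerEndpoint with custom content handlers); their setup is omitted here.

    from langchain.prompts import PromptTemplate
    from langchain.vectorstores import FAISS

    PROMPT = PromptTemplate(
        input_variables=["context", "question"],
        template="""Answer based on context:\n\n{context}\n\n{question}""",
    )

    def build_qa_chain(chunks, embeddings, llm, k=3):
        """chunks: chapter/section-based document splits.
        embeddings, llm: LangChain wrappers around the SageMaker endpoints."""
        vector_store = FAISS.from_texts(chunks, embeddings)   # index once, then reuse

        def answer(question):
            # Step 2: semantic search for the top K most relevant chunks
            docs = vector_store.similarity_search(question, k=k)
            context = "\n\n".join(doc.page_content for doc in docs)
            # Steps 3-5: build the zero-shot prompt and call the foundation model
            return llm(PROMPT.format(context=context, question=question))

        return answer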

We tested 125 questions on Amazon D&C requirements and specifications, and RAG returned a good response for some questions. In the following example, RAG with the Flan-T5-XXL model provided accurate response from the right sections of the documents.

Question: What are the PLUMBING requirements for Drinking Fountains and Water Coolers? (Section 22 47 00)
Expected answer: The Developer shall provide water dispensing bars, floor standing at the Warehouse wall space immediately adjacent to the entrance of each toilet block serving the Warehouse and in the equivalent location adjacent to the Lockers Room toilet block. Units shall be in accordance with Amazon standard. Final numbers and placement to be agreed with Amazon prior to installation. Minimum of two units dispensing cooled and room temperature water to be provided at each location. Water Coolers or Water Bars – The Developer shall provide potable water supplies as required for remote water coolers or water bars throughout the facility. The Developer shall provide water points, drainage and any electrical services required in the Break Rooms for the Water Bars which will be provided by the Catering Contractor.
RAG answer (GPT-J-6B embedding with Flan-T5-XXL): The Developer shall provide water dispensing bars, floor standing at the Warehouse wall space immediately adjacent to the entrance of each toilet block serving the Warehouse and in the equivalent location adjacent to the Lockers Room toilet block. Units shall be in accordance with Amazon standard. Final numbers and placement to be agreed with Amazon prior to installation. Minimum of two units dispensing cooled and room temperature water to be provided at each location. Water Coolers or Water Bars – The Developer shall provide potable water supplies as required for remote water coolers or water bars throughout the facility. The Developer shall provide water points, drainage and any electrical services required in the Break Rooms for the Water Bars which will be provided by the Catering Contractor.

However, we observed that RAG doesn’t provide clear and crisp answers to the questions that are more specific and highly technical, such as guidelines, numbers, and figures. As shown in the following example, when a technical dimension output is expected, RAG can’t provide the specific answer, mainly because the embedding and searching process can’t return a concise context.

Question: What is the Floor Heights requirement for P2 First Floor level Concrete?
Expected answer: 7,50 m AFFL
RAG answer (GPT-J-6B embedding with Flan-T5-XXL): P2 First Floor level Concrete

This needs to be improved, because hallucination cannot be tolerated given the criticality of the consequences in this use case.

Fine-tune LLMs on SageMaker

To address this challenge and improve the response quality, we take a new approach: fine-tuning the LLM on the documents for the question answering task. The model is trained to learn the corresponding knowledge from the documents directly. Unlike RAG, it doesn’t depend on the documents being properly embedded and indexed, or on the semantic search algorithm being effective enough to return the most relevant content from the vector database.

To prepare the training dataset for fine-tuning, we extract the information from the D&C documents and construct the data in the following format:

  • Instruction – Describes the task and provides partial prompt
  • Input – Provides further context to be consolidated into the prompt
  • Response – The output of the model

During the training process, we add an instruction key, input key, and response key to each part, combine them into the training prompt, and tokenize it. Then the data is fed to a trainer in SageMaker to generate the fine-tuned model.
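As an illustration, a single training record could be assembled into a prompt and tokenized roughly as follows; the template wording, section markers, and maximum length are assumptions for the sketch rather than the exact format we used.

    from transformers import AutoTokenizer

    PROMPT_TEMPLATE = (
        "### Instruction:\n{instruction}\n\n"
        "### Input:\n{input}\n\n"
        "### Response:\n{response}"
    )

    record = {
        "instruction": "Answer the question based on the Amazon D&C requirements.",
        "input": "What is the Floor Heights requirement for P2 First Floor level Concrete?",
        "response": "7,50 m AFFL",
    }

    tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")   # any of the tested base models
    training_prompt = PROMPT_TEMPLATE.format(**record)
    tokens = tokenizer(training_prompt, truncation=True, max_length=512)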

To accelerate the training process and reduce the cost of compute resources, we employed Parameter-Efficient Fine-Tuning (PEFT) with the Low-Rank Adaptation (LoRA) technique. PEFT allows us to fine-tune only a small number of extra model parameters, and LoRA represents the weight updates with two smaller matrices through low-rank decomposition. With PEFT and LoRA on 8-bit quantization (a compression operation that further reduces the memory footprint of the model and accelerates training and inference performance), we are able to fit the training of 125 question-answer pairs on a g4dn.xlarge instance with a single GPU.
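The following sketch shows how such a LoRA adapter could be configured on top of an 8-bit base model with the Hugging Face peft and bitsandbytes libraries (assuming a recent peft version); the rank, dropout, and target module names are illustrative and would be adjusted per model architecture.

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Load the base model with 8-bit quantization to reduce the memory footprint
    base_model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b",
        load_in_8bit=True,            # requires the bitsandbytes package
        device_map="auto",
        trust_remote_code=True,
    )
    base_model = prepare_model_for_kbit_training(base_model)

    # LoRA represents the weight updates as two small low-rank matrices
    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        bias="none",
        target_modules=["query_key_value"],   # attention projection for Falcon; differs per model
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()        # typically a small fraction of the full weights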

To prove the effectiveness of the fine-tuning, we tested with multiple LLMs on SageMaker. We selected five small-size models: Bloom-7B, Flan-T5-XL, GPT-J-6B, and Falcon-7B on SageMaker JumpStart, and Dolly-3B from Hugging Face on SageMaker.

Through 8-bit LoRA-based training, we are able to reduce the trainable parameters to no more than 5% of the full weights of each model. The training takes 10–20 epochs to converge, as shown in the following figure. For each model, the fine-tuning process fits on the single GPU of a g4dn.xlarge instance, which keeps compute costs low.

training_process

Run inference on the fine-tuned model deployed on SageMaker

We deployed the fine-tuned model along with the RAG framework on a single-GPU g4dn.xlarge node on SageMaker and compared the inference results for the 125 questions. The model performance is measured by two metrics. One is the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score, a popular natural language processing (NLP) model evaluation method that computes the ratio of matching words to the total number of words in the reference sentence. The other is the semantic (textual) similarity score, which measures how close the meanings of two pieces of text are by using a transformer model to encode the sentences into embeddings and then computing their cosine similarity. Across the experiments, these two metrics are fairly consistent in reflecting the quality of answers to the questions.
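Both metrics can be computed with off-the-shelf libraries. The following sketch uses the rouge-score and sentence-transformers packages; the embedding model name is an illustrative choice rather than the one used in our evaluation.

    from rouge_score import rouge_scorer
    from sentence_transformers import SentenceTransformer, util

    reference = "A Lithium based sealer/hardener will be used post any grinding/sanding procedures."
    candidate = "A Lithium based sealer/hardener will be used after grinding procedures."

    # ROUGE: ratio of matching words to the words in the reference sentence (recall)
    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
    rouge = scorer.score(reference, candidate)["rouge1"].recall

    # Semantic similarity: cosine similarity between transformer sentence embeddings
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    emb = encoder.encode([reference, candidate], convert_to_tensor=True)
    similarity = util.cos_sim(emb[0], emb[1]).item()

    print(f"ROUGE-1 recall: {rouge:.4f}, semantic similarity: {similarity:.4f}")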

In the following table and figure, we can see that the fine-tuned Falcon-7B model provides the best answer quality, and the fine-tuned Flan-T5-XL and Dolly-3B models also generate accurate responses.

Model configuration | ROUGE Score | Semantic Similarity Score
RAG (GPT-J-6B embedding + Flan-T5-XL) | 0.1938 | 0.4121
RAG (GPT-J-6B embedding + Flan-T5-XXL) | 0.2747 | 0.4786
Fine-tuned Bloom-7B1 | 0.3914 | 0.5666
Fine-tuned Flan-T5-XL | 0.4977 | 0.7205
Fine-tuned GPT-J-6B | 0.3740 | 0.5414
Fine-tuned Dolly-3B | 0.6991 | 0.8147
Fine-tuned Falcon-7B | 0.9459 | 0.9774

training_results

The following is one example answer from the models. RAG with the Flan-T5-XL model produced a heavily hallucinated response, whereas the response from RAG with the larger Flan-T5-XXL model was more relevant but still did not cover the full expected message. Comparatively, the fine-tuned models (Flan-T5-XL, Dolly-3B, and Falcon-7B) all returned responses matching the expected answer.

Question: What are the requirements for Post Grinding Sealer/Hardener? (TS0004 – section 2.8)
Expected answer: A Lithium based sealer/hardener will be used post any grinding/sanding procedures undertaken on the AR Floor. Please refer to TGC /0007 AR Floor SCA Procedures. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.
Answer from RAG (GPT-J-6B embedding with Flan-T5-XL): The Penetrating Concrete Hardener must be discussed at the pre-pour meeting and must be approved by the Flooring Consultant and Amazon
Answer from RAG (GPT-J-6B embedding with Flan-T5-XXL): The proposed compound must be discussed at the pre-pour meeting and must be approved by the Flooring Consultant and Amazon. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.
Answer from fine-tuned Flan-T5-XL: A Lithium based sealer/hardener will be used post any grinding/sanding procedures undertaken on the AR Floor. Please refer to TGC /0007 AR Floor SCA Procedures. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.
Answer from fine-tuned Dolly-3B: A Lithium based sealer/hardener will be used post any grinding/sanding procedures undertaken on the AR Floor. Please refer to TGC /0007 AR Floor SCA Procedures. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.
Answer from fine-tuned Falcon-7B: A Lithium based sealer/hardener will be used post any grinding/sanding procedures undertaken on the AR Floor. Please refer to TGC /0007 AR Floor SCA Procedures. Alternatives will be considered but must be approved by Amazon and the Flooring Consultant.

Solution prototype and outcome

We developed a prototype based on the presented architecture and conducted a proof of concept to demonstrate the outcome. To take advantage of both the RAG framework and the fine-tuned LLM, and also to reduce hallucination, we first semantically validate the incoming question. If the question is among the training data for fine-tuning (that is, the fine-tuned model already has the knowledge to provide a high-quality answer), we direct the question as a prompt to the fine-tuned model for inference. Otherwise, the question goes through LangChain and gets its response from RAG. The following diagram illustrates this workflow.

RAG_LLM_validate
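A minimal sketch of this prompt validation step, assuming the fine-tuning questions are embedded ahead of time with a sentence-transformer model, might look like the following; the encoder choice, similarity threshold, and helper functions are illustrative assumptions.

    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    training_questions = [
        "What is the Floor Heights requirement for P2 First Floor level Concrete?",
        # ... the remaining questions used for fine-tuning
    ]
    training_embeddings = encoder.encode(training_questions, convert_to_tensor=True)

    SIMILARITY_THRESHOLD = 0.85   # illustrative cut-off, calibrated against the training set

    def answer(question: str) -> str:
        """Route the question to the fine-tuned model or to the RAG workflow."""
        query_embedding = encoder.encode(question, convert_to_tensor=True)
        best_match = util.cos_sim(query_embedding, training_embeddings).max().item()
        if best_match >= SIMILARITY_THRESHOLD:
            # The fine-tuned model already holds this knowledge
            return query_finetuned_endpoint(question)   # hypothetical SageMaker endpoint helper
        # Otherwise fall back to the LangChain RAG chain built earlier
        return qa_chain.run(question)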

We tested the architecture with a test dataset of 166 questions, which contains the 125 questions used to fine-tune the model and an additional 41 questions that the fine-tuned model wasn’t trained with. The RAG framework with the embedding model and fine-tuned Falcon-7B model provided high-quality results with a ROUGE score of 0.7898 and a semantic similarity score of 0.8781. As shown in the following examples, the framework is able to generate responses to users’ questions that are well matched with the D&C documents.

The following image is our first example document.

bot1

The following screenshot shows the bot output.

bot2

The bot is also able to respond with data from a table or list and display figures for the corresponding questions. For example, we use the following document.

bot3

The following screenshot shows the bot output.

bot4

We can also use a document with a figure, as in the following example.

bot5

The following screenshot shows the bot output with text and the figure.

bot6-1

The following screenshot shows the bot output with just the figure.

bot7-1

Lessons learned and best practices

Through the solution design and experiments with multiple LLMs, we learned how to ensure the quality and performance for the question answering task in a generative AI solution. We recommend the following best practices when you apply the solution to your question answering use cases:

  • RAG provides reasonable responses to engineering questions. The performance is heavily dependent on document embedding and indexing. For highly unstructured documents, you may need some manual work to properly split and augment the documents before LLM embedding and indexing.
  • The index search is important to determine the RAG final output. You should properly tune the search algorithm to achieve a good level of accuracy and ensure RAG generates more relevant responses.
  • Fine-tuned LLMs are able to learn additional knowledge from highly technical and unstructured documents, and they retain that knowledge within the model with no dependency on the documents after training. This is especially useful for use cases where hallucination is not tolerated.
  • To ensure the quality of the model response, the training dataset format for fine-tuning should use a properly defined, task-specific prompt template. The inference pipeline should follow the same template in order to generate human-like responses.
  • Training and hosting LLMs demands considerable compute resources and can be costly. You can use PEFT, LoRA, and quantization techniques to reduce the required compute power and avoid high training and inference costs.
  • SageMaker JumpStart provides easy-to-access pre-trained LLMs for fine-tuning, inference, and deployment. It can significantly accelerate your generative AI solution design and implementation.
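As an example of that last point, a pre-trained JumpStart LLM can be deployed behind a real-time endpoint with a few lines of the SageMaker Python SDK; the model ID, instance type, and payload below are illustrative.

    from sagemaker.jumpstart.model import JumpStartModel

    # Deploy a pre-trained JumpStart model to a real-time SageMaker endpoint
    model = JumpStartModel(model_id="huggingface-text2text-flan-t5-xl")
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.2xlarge",   # illustrative instance type
    )

    response = predictor.predict({"text_inputs": "What is retrieval augmented generation?"})
    print(response)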

Conclusion

With the RAG framework and fine-tuned LLMs on SageMaker, we are able to provide human-like responses to users’ questions and prompts, thereby enabling users to efficiently retrieve accurate information from a large volume of highly unstructured and unorganized documents. We will continue to develop the solution, for example by providing more contextual responses based on previous interactions and by further fine-tuning the models with human feedback.

Your feedback is always welcome; please leave your thoughts and questions in the comments section.


About the authors

Yunfei Bai is a Senior Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.

Burak Gozluklu is a Principal ML Specialist Solutions Architect located in Boston, MA. Burak has over 15 years of industry experience in simulation modeling, data science, and ML technology. He helps global customers adopt AWS technologies and specifically AI/ML solutions to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. Burak is passionate about yoga and meditation.

Elad Dwek is a Construction Technology Manager at Amazon. With a background in construction and project management, Elad helps teams adopt new technologies and data-based processes to deliver construction projects. He identifies needs and solutions, and facilitates the development of the bespoke attributes. Elad has an MBA and a BSc in Structural Engineering. Outside of work, Elad enjoys yoga, woodworking, and traveling with his family.


MDaudit uses AI to improve revenue outcomes for healthcare customers

MDaudit provides a cloud-based billing compliance and revenue integrity software as a service (SaaS) platform to more than 70,000 healthcare providers and 1,500 healthcare facilities, ensuring healthcare customers maintain regulatory compliance and retain revenue. Working with the top 60+ US healthcare networks, MDaudit needs to be able to scale its artificial intelligence (AI) capabilities to improve end-user productivity to meet growing demand and adapt to the changing healthcare landscape. MDaudit recognized that in order to meet its healthcare customers’ unique business challenges, it would benefit from automating its external auditing workflow (EAW) using AI to reduce dependencies on legacy IT frameworks and reduce manual activities needed to manage external payer audits. The end goal was to empower its customers to quickly respond to a large volume of external audit requests and improve revenue outcomes with AI-driven automation. MDaudit also recognized the opportunity to evolve its existing architecture into a solution that could scale with the growing demand for its EAW module.

In this post, we discuss MDaudit’s solution to this challenge, the benefits for their customers, and the architecture involved.

Solution overview

MDaudit built an intelligent document processing (IDP) solution, SmartScan.ai. The solution automates the extraction and formatting of data elements from unstructured PDFs that are part of the Additional Documentation Requests (ADR) service for Payment Review that customers of MDaudit receive from commercial and federal payers across the country.

The solution is designed with client-level isolation at the document level. MDaudit customers start by uploading their ADR documents via a web portal to Amazon Simple Storage Service (Amazon S3).

A diagram of the customer's architecture

This prompts an AWS Lambda function to initiate Amazon Textract. Using Amazon Textract for optical character recognition (OCR) to convert text images into machine-readable text, MDaudit’s SmartScan.ai can process scanned PDFs without manual review. The solution also uses Amazon Comprehend, which uses natural language processing (NLP) to identify and extract key entities from the ADR documents, such as name, date of birth, and date of service. The OCR extract from Amazon Textract and the output from Amazon Comprehend are then compared against preexisting configurations of data objects stored in Amazon DynamoDB. If the format isn’t recognized, the solution conducts a generalized search to extract relevant data points from the PDFs uploaded by the customer. The new configuration is then sent to the human-in-the-loop using Amazon Augmented AI (Amazon A2I). After the configuration has been approved, it’s stored and made available for future scans, thus enhancing security. By using Amazon CloudWatch in the solution, MDaudit monitors metrics, events, and logs throughout the end-to-end solution.
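MDaudit’s implementation is proprietary, but the core service calls in such a pipeline could look roughly like the following sketch, assuming the asynchronous Textract API for multi-page PDFs; the function split, entity handling, and the names shown are assumptions for illustration only.

    import boto3

    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")

    def start_ocr(event, context):
        # Triggered by the ADR document upload to Amazon S3
        record = event["Records"][0]["s3"]
        job = textract.start_document_text_detection(
            DocumentLocation={
                "S3Object": {
                    "Bucket": record["bucket"]["name"],
                    "Name": record["object"]["key"],
                }
            }
        )
        return {"JobId": job["JobId"]}

    def extract_entities(job_id):
        # Collect the OCR output once the Textract job has completed
        lines, token = [], None
        while True:
            kwargs = {"JobId": job_id}
            if token:
                kwargs["NextToken"] = token
            result = textract.get_document_text_detection(**kwargs)
            lines.extend(b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE")
            token = result.get("NextToken")
            if not token:
                break
        text = " ".join(lines)

        # Identify key entities such as names, dates of birth, and dates of service
        entities = comprehend.detect_entities(Text=text[:5000], LanguageCode="en")
        return entities["Entities"]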

Benefits

In the post-pandemic era, the healthcare sector is still grappling with financial hardship, characterized by thin margins resulting from staffing shortages, reduced patient volumes, and the upsurge in inflation. Simultaneously, payers’ post-payment recovery audits have skyrocketed by more than 900%, and, aggravating the situation further, revenue cycle management (RCM) workforce reductions of 50–70% have put healthcare organizations in a precarious position to defend against the overwhelming impact of these post-payment audits. The external audit workflow offered by MDaudit streamlines the management of and response to external audits through automated workflows, successfully safeguarding millions of dollars in revenue. With the integration of AI-driven capabilities using AWS AI/ML services, their innovative solution SmartScan.ai introduces further time savings and enhanced data accuracy by automatically extracting pertinent patient information from lengthy audit letters, which can vary from tens to hundreds of pages. As a result, customers are now capable of managing a much higher volume of demand letters from payers, increasing their productivity by an estimated tenfold. These advancements lead to improved efficiencies, significant cost savings, faster responses to external audits, and the timely retention of revenue.

Initial adoption statistics indicate that the average processing time for an ADR letter is approximately 40 seconds, with accuracy rates approaching 90%. Within the first couple of months of launching SmartScan.ai, MDaudit’s customers have successfully responded to audit requests and safeguarded approximately $3 million in revenue.

“Our approach to innovation centers on collaboration with our ecosystem partners, and AWS has proven to be a valuable strategic ally in our healthcare transformation mission,” says Nisheet Goenka, VP of Engineering at MDaudit. “Our close cooperation with AWS and our extended account team not only expedited the development process but also spared us four months of dedicated engineering efforts. This has resulted in the creation of a solution that provides us with meaningful data to support our healthcare customers.”

Summary

This post discussed the unique business challenges faced by customers in the healthcare industry. We also reviewed how MDaudit is solving those challenges, the architecture MDaudit used, and how AI and machine learning played a part in their solution. To start exploring ML and AI today, refer to Machine Learning on AWS, and see where it can help you in your next solution.


About the Authors

Jake Bernstein is a Solutions Architect at Amazon Web Services with a passion for modernization and serverless-first architecture, and a focus on helping customers optimize their architecture and accelerate their cloud journey.

Guy Loewy is a Senior Solutions Architect at Amazon Web Services with a focus on serverless and event-driven architecture.

Justin Leto is a Senior Solutions Architect at Amazon Web Services with a focus on machine learning and analytics.


DENZA Unwraps Smart Driving Options for N7 Model Lineup, Powered by NVIDIA DRIVE Orin

DENZA, the luxury electric-vehicle brand and joint venture between BYD and Mercedes-Benz, is debuting new intelligent driving features for its entire N7 model lineup, powered by the NVIDIA DRIVE Orin system-on-a-chip (SoC).

The N7 series was introduced earlier this year as a family of spacious five-seater SUVs for commuters looking to sport a deluxe EV with advanced driving functionality.

All N7 models can be equipped with the NVIDIA DRIVE Orin SoC for high-performance compute to simultaneously run in-vehicle applications and deep neural networks for automated driving.

NVIDIA DRIVE Orin serves as the brain behind DENZA’s proprietary Commuter Smart Driving system, which offers an array of smart features, including:

  • Navigate on autopilot for high-speed, all-scenario assisted driving.
  • Intelligent speed-limit control and emergency lane-keeping aid, for safer commutes on urban roads and highways.
  • Enhanced automatic emergency braking and front cross-traffic alert for increased safety at intersections and on narrow streets.
  • Automated parking assist, which scouts for parking spots, identifying horizontal, vertical and diagonal spaces to ease the challenge of parking in crowded areas.

Next-Gen Car Configuration

In addition to adopting accelerated computing in the car, DENZA is one of the flagship automotive trailblazers using the NVIDIA Omniverse Cloud platform to build and deploy next-generation car configurators to deliver greater personalization options for the consumer’s vehicle-purchasing experience.

Learn more about the DENZA N7 3D configurator.


Research Focus: Week of September 25, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


NEW RESEARCH

SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills

Large Language Model (LLM) inference consists of two distinct phases – prefill phase, which processes the input prompt, and decode phase, which generates output tokens autoregressively. While the prefill phase effectively saturates graphics processing unit (GPU) compute at small batch sizes, the decode phase results in low compute utilization as it generates one token at a time per request. The varying prefill and decode times also lead to imbalance across micro-batches when using pipeline parallelism, resulting in further inefficiency due to bubbles.

In a new paper: SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills, researchers from Microsoft present a solution to these challenges that yields significant improvements in inference performance across models and hardware. SARATHI employs chunked-prefills, which splits a prefill request into equal sized chunks, and decode-maximal batching, which constructs a batch using a single prefill chunk and populates the remaining slots with decodes. Chunked-prefills allow constructing multiple decode-maximal batches from a single prefill request, maximizing coverage of decodes that can piggyback. Furthermore, the uniform compute design of these batches ameliorates the imbalance between micro-batches, significantly reducing pipeline bubbles.
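As a purely conceptual illustration of the scheduling idea (not the authors’ implementation), a prefill can be split into fixed-size chunks and each chunk batched together with as many pending decodes as fit; the chunk and batch sizes below are arbitrary.

    def chunk_prefill(prompt_tokens, chunk_size):
        """Split a prefill request into equal-sized chunks of prompt tokens."""
        return [
            prompt_tokens[i : i + chunk_size]
            for i in range(0, len(prompt_tokens), chunk_size)
        ]

    def decode_maximal_batches(prompt_tokens, decode_requests, chunk_size, batch_size):
        """Build batches containing one prefill chunk plus as many decodes as fit."""
        batches = []
        for chunk in chunk_prefill(prompt_tokens, chunk_size):
            decodes = decode_requests[: batch_size - 1]   # one slot reserved for the prefill chunk
            batches.append({"prefill_chunk": chunk, "decodes": decodes})
        return batches

    # Toy example: a 12-token prompt split into 3 chunks, piggybacking 3 decode requests
    for batch in decode_maximal_batches(list(range(12)), ["req-a", "req-b", "req-c"],
                                        chunk_size=4, batch_size=4):
        print(batch)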



NEW RESEARCH

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

Controllable video generation has gained significant attention in recent years. However, two main limitations persist: Firstly, most existing works focus on either text, image, or trajectory-based control, leading to an inability to achieve fine-grained control in videos. Secondly, trajectory control research is still in its early stages, with most experiments being conducted on simple datasets like Human3.6M. This constraint limits the models’ capability to process open-domain images and effectively handle complex curved trajectories.

In a new paper: DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory, researchers from Microsoft propose an open-domain diffusion-based video generation model. To tackle the issue of insufficient control granularity in existing works, DragNUWA simultaneously introduces text, image, and trajectory information to provide fine-grained control over video content from semantic, spatial, and temporal perspectives. To resolve the problem of limited open-domain trajectory control in current research, the researchers propose trajectory modeling with three aspects: a trajectory sampler (TS) to enable open-domain control of arbitrary trajectories, a multiscale fusion (MF) to control trajectories in different granularities, and an adaptive training (AT) strategy to generate consistent videos following trajectories. Their experiments demonstrate DragNUWA’s superior performance in fine-grained control in video generation.

DragNUWA is purely a research project and there are no current plans to incorporate DragNUWA into a product. Any further research will continue to follow Microsoft AI principles.

NEW RESEARCH

Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals

Understanding cortical responses to human visual perception has emerged as a research hotspot. Yet the underlying mechanism of how human visual perceptions are intertwined with our cognition is still a mystery. Thanks to recent advances in both neuroscience and artificial intelligence, researchers have been able to record visually evoked brain activity and mimic the visual perception ability through computational approaches.

In a new paper: Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals, researchers from Microsoft reconstruct observed images based on portably accessible brain signals, i.e., electroencephalography (EEG) data. Since EEG signals are dynamic in the time-series format and are notoriously noisy, processing and extracting useful information requires more dedicated efforts. The researchers propose a comprehensive pipeline, named NeuroImagen, to incorporate a novel multi-level perceptual information decoding to draw multi-grained and heterogeneous outputs from the given EEG data. A pretrained latent diffusion model then leverages the extracted semantic information to reconstruct the high-resolution visual stimuli images. The experimental results illustrate the effectiveness of image reconstruction and superior quantitative performance of the proposed method.



The Fastest Path: Healthcare Startup Uses AI to Analyze Cancer Cells in the Operating Room

Medical-device company Invenio Imaging is developing technology that enables surgeons to evaluate tissue biopsies in the operating room, immediately after samples are collected — providing in just three minutes AI-accelerated insights that would otherwise take weeks to obtain from a pathology lab.

In a surgical biopsy, a medical professional removes samples of cells or tissue that pathologists analyze for diseases such as cancer. By delivering these capabilities through a compact, AI-powered imaging system within the treatment room, Invenio aims to support rapid clinical decision-making.

“This technology will help surgeons make intraoperative decisions when performing a biopsy or surgery,” said Chris Freudiger, chief technology officer of Silicon Valley-based Invenio. “They’ll be able to rapidly evaluate whether the tissue sample contains cancerous cells, decide whether they need to take another tissue sample and, with the AI models Invenio is developing, potentially make a molecular diagnosis for personalized medical treatment within minutes.”

Quicker diagnosis enables quicker treatment. It’s especially critical for aggressive types of cancer that could grow or spread significantly in the weeks it takes for biopsy results to return from a dedicated pathology lab.

Invenio is a member of NVIDIA Inception, a program that provides cutting-edge startups with technological support and AI platform guidance. The company accelerates AI training and inference using NVIDIA GPUs and software libraries.

Laser Focus on Cancer Care

The NIO Laser Imaging System accelerates the imaging of fresh tissue biopsies.

Invenio’s NIO Laser Imaging System is a digital pathology tool that accelerates the imaging of fresh tissue biopsies. It’s been used in thousands of procedures in the U.S. and Europe. In 2021, it received the CE Mark of regulatory approval in Europe.

The company plans to adopt the NVIDIA Jetson Orin series of edge AI modules for its next-generation imaging system, which will feature near real-time AI inference accelerated by the NVIDIA TensorRT SDK.

“We’re building a layer of AI models on top of our imaging capabilities to provide physicians with not just the diagnostic image but also an analysis of what they’re seeing,” Freudiger said. “With the AI performance provided by NVIDIA Jetson at the edge, they’ll be able to quickly determine what kinds of cancer cells are present in a biopsy image.”

Invenio uses a cluster of NVIDIA RTX A6000 GPUs to train neural networks with tens of millions of parameters on pathologist-annotated images. The models were developed using the TensorFlow deep learning framework and trained on images acquired with NIO imaging systems.

“The most powerful capability for us is the expanded VRAM on the RTX A6000 GPUs, which allows us to load large batches of images and capture the variability of features,” Freudiger said. “It makes a big difference for AI training.”

On the Path to Clinical Deployment

One of Invenio’s AI products, NIO Glioma Reveal, is approved for clinical use in Europe and available for research use in the U.S. to help identify areas of cancerous cells in brain tissue.

A team of Invenio’s collaborators from the University of Michigan, New York University, University of California San Francisco, the Medical University of Vienna and University Hospital of Cologne recently developed a deep learning model that can find biomarkers of cancerous tumors with 93% accuracy in 90 seconds.

With this ability to analyze different molecular subtypes of cancer within a tissue sample, doctors can predict how well a patient will respond to chemotherapy — or determine whether a tumor has been successfully removed during surgery.

Beyond its work on brain tissue analysis, Invenio this year announced a clinical research collaboration with Johnson & Johnson’s Lung Cancer Initiative to develop and validate an AI solution that can help evaluate lung biopsies. The AI model will help doctors rapidly determine whether collected tissue samples contain cancer.

Lung cancer is the world’s deadliest form of cancer, and in the U.S. alone, lung nodules are found in over 1.5 million patients each year. Once approved for clinical use, Invenio’s NIO Lung Cancer Reveal tool aims to shorten the time needed to analyze tissue biopsies for these patients.

As part of this initiative, Invenio will run a clinical study before submitting the NVIDIA Jetson-powered AI solution for FDA approval.

