How Deloitte Italy built a digital payments fraud detection solution using quantum machine learning and Amazon Braket

As digital commerce expands, fraud detection has become critical in protecting businesses and consumers engaging in online transactions. Implementing machine learning (ML) algorithms enables real-time analysis of high-volume transactional data to rapidly identify fraudulent activity. This advanced capability helps mitigate financial risks and safeguard customer privacy within expanding digital markets.

Deloitte is a strategic global systems integrator with over 19,000 certified AWS practitioners across the globe. It continues to raise the bar through participation in the AWS Competency Program with 29 competencies, including Machine Learning.

This post demonstrates the potential for quantum computing algorithms paired with ML models to revolutionize fraud detection within digital payment platforms. We share how Deloitte built a hybrid quantum neural network solution with Amazon Braket to demonstrate the possible gains coming from this emerging technology.

The promise of quantum computing

Quantum computers have the potential to radically overhaul financial systems, enabling much faster and more precise solutions. Compared to classical computers, quantum computers are expected in the long run to have advantages in the areas of simulation, optimization, and ML. Whether quantum computers can provide a meaningful speedup to ML is an active topic of research.

Quantum computing could enable efficient, near real-time simulations in critical areas such as pricing and risk management. Optimization is a key activity in financial institutions, used to determine the best investment strategy for a portfolio of assets, allocate capital, or achieve productivity improvements. Some of these optimization problems are nearly impossible for traditional computers to solve exactly, so approximations are used to solve them in a reasonable amount of time. Quantum computers could perform faster and more accurate optimizations, potentially without relying on these approximations.

Despite the long-term horizon, the potentially disruptive nature of this technology means that financial institutions are looking to get an early foothold in it by building in-house quantum research teams, expanding their existing ML centers of excellence (CoEs) to include quantum computing, or engaging with partners such as Deloitte.

At this early stage, customers seek access to a choice of different quantum hardware and simulation capabilities in order to run experiments and build expertise. Braket is a fully managed quantum computing service that lets you explore quantum computing. It provides access to quantum hardware from IonQ, OQC, QuEra, Rigetti, and IQM; a variety of local and on-demand simulators, including GPU-enabled simulations; and infrastructure for running hybrid quantum-classical algorithms such as quantum ML. Braket is fully integrated with AWS services such as Amazon Simple Storage Service (Amazon S3) for data storage and AWS Identity and Access Management (IAM) for identity management, and you pay only for what you use.

In this post, we demonstrate how to implement a quantum neural network-based fraud detection solution using Braket and AWS native services. Although quantum computers can’t be used in production today, our solution provides a workflow that will seamlessly adapt and function as a plug-and-play system in the future, when commercially viable quantum devices become available.

Solution overview

The goal of this post is to explore the potential of quantum ML and present a conceptual workflow that could serve as a plug-and-play system when the technology matures. Quantum ML is still in its early stages, and this post aims to showcase the art of the possible without delving into specific security considerations. As quantum ML technology advances and becomes ready for production deployments, robust security measures will be essential. However, for now, the focus is on outlining a high-level conceptual architecture that can seamlessly adapt and function in the future when the technology is ready.

The following diagram shows the solution architecture for the implementation of a neural network-based fraud detection solution using AWS services. The solution is implemented using a hybrid quantum neural network. The neural network is built using the Keras library; the quantum component is implemented using PennyLane.

The workflow includes the following key components for inference (steps 1–6) and training (steps 7–9):

  1. Ingestion – Real-time financial transactions are ingested through Amazon Kinesis Data Streams
  2. Preprocessing – AWS Glue streaming extract, transform, and load (ETL) jobs consume the stream to do preprocessing and light transforms
  3. Storage – Amazon S3 is used to store output artifacts
  4. Endpoint deployment – We use an Amazon SageMaker endpoint to deploy the models
  5. Analysis – Transactions along with the model inferences are stored in Amazon Redshift
  6. Data visualization – Amazon QuickSight is used to visualize the results of fraud detection
  7. Training data – Amazon S3 is used to store the training data
  8. Modeling – A Braket environment produces a model for inference
  9. Governance – Amazon CloudWatch, IAM, and AWS CloudTrail are used for observability, governance, and auditability, respectively
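As an illustration of step 1 (Ingestion), the following minimal sketch shows how a producer might send one transaction to Kinesis Data Streams. It assumes a boto3 Kinesis client, whose put_record call takes StreamName, Data, and PartitionKey; the stream name, the transaction_id field, and the helper names are hypothetical.

```python
import json

def build_put_record_args(transaction, stream_name):
    """Build keyword arguments for the Kinesis Data Streams PutRecord call.

    The stream name and the transaction field names are hypothetical.
    """
    return {
        "StreamName": stream_name,
        # Kinesis records carry bytes; JSON keeps the payload easy to parse downstream
        "Data": json.dumps(transaction).encode("utf-8"),
        # The partition key determines which shard receives the record
        "PartitionKey": str(transaction["transaction_id"]),
    }

def ingest(kinesis_client, transaction, stream_name="payments-transactions"):
    """Send one transaction; kinesis_client would typically be boto3.client("kinesis")."""
    return kinesis_client.put_record(**build_put_record_args(transaction, stream_name))
```

With AWS credentials configured, calling ingest(boto3.client("kinesis"), txn) would write a real record; passing the client in as an argument keeps the function easy to test with a stand-in.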

Dataset

For training the model, we used open source data available on Kaggle. The dataset contains transactions made by credit cards in September 2013 by European cardholders. This dataset records transactions that occurred over a span of 2 days, during which there were 492 instances of fraud detected out of a total of 284,807 transactions. The dataset exhibits a significant class imbalance, with fraudulent transactions accounting for just 0.172% of the entire dataset. Because the data is highly imbalanced, various measures have been taken during data preparation and model development.
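The imbalance figures follow directly from the counts quoted above; this short check uses only those two numbers.

```python
# Class counts reported for the Kaggle credit card fraud dataset
total_transactions = 284_807
fraud_transactions = 492

fraud_share = fraud_transactions / total_transactions
legit_per_fraud = (total_transactions - fraud_transactions) / fraud_transactions

print(f"fraud share: {fraud_share:.4%}")           # fraud share: 0.1727%
print(f"legitimate per fraud: {legit_per_fraud:.1f}")  # legitimate per fraud: 577.9
```

Roughly 578 legitimate transactions for every fraudulent one is what makes class weighting and precision-oriented evaluation necessary later in the workflow.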

The dataset contains only numerical input features. Most of them (V1–V28) are the result of a Principal Component Analysis (PCA) transformation, applied for confidentiality reasons. The data also includes three key fields:

  • Time – Seconds elapsed between each transaction and the first transaction in the dataset
  • Amount – Transaction amount
  • Class – Target variable, 1 for fraud or 0 for non-fraud

Data preparation

We split the data into training, validation, and test sets, and we define the target and the features sets, where Class is the target variable:

y_train = df_train['Class']
x_train = df_train.drop(['Class'], axis=1)
y_validation = df_validation['Class']
x_validation = df_validation.drop(['Class'], axis=1)
y_test = df_test['Class']
x_test = df_test.drop(['Class'], axis=1)
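The split that produces df_train, df_validation, and df_test is not shown above. One common way to do it on heavily imbalanced data is a stratified split, so each set keeps the same fraud ratio. The following is a minimal sketch using scikit-learn's train_test_split; the 70/15/15 proportions, the seed, and the split_dataset helper are assumptions, not the authors' settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def split_dataset(df, label="Class", val_size=0.15, test_size=0.15, seed=42):
    """Split df into train/validation/test sets, stratified on the label column.

    The proportions and the seed are illustrative assumptions.
    """
    # First carve out the test set, preserving the fraud/non-fraud ratio
    df_rest, df_test = train_test_split(
        df, test_size=test_size, stratify=df[label], random_state=seed)
    # Rescale val_size so it is a fraction of the full dataset, then split the rest
    rel_val = val_size / (1.0 - test_size)
    df_train, df_validation = train_test_split(
        df_rest, test_size=rel_val, stratify=df_rest[label], random_state=seed)
    return df_train, df_validation, df_test

# Synthetic stand-in for the Kaggle data: 1,000 rows, 2% fraud
rng = np.random.default_rng(0)
demo = pd.DataFrame({"Amount": rng.random(1000), "Class": [1] * 20 + [0] * 980})
df_train, df_validation, df_test = split_dataset(demo)
print(len(df_train), len(df_validation), len(df_test))
```

Without stratification, a random 15% slice of such a small minority class could end up with very few, or even zero, fraud cases.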

The Class field takes the values 0 and 1. Because the network has two output units, we label-encode the y sets and then convert them to one-hot (categorical) vectors:

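from sklearn.preprocessing import LabelEncoder
import tensorflow as tf

lbl_clf = LabelEncoder()
y_train = tf.keras.utils.to_categorical(lbl_clf.fit_transform(y_train))
y_validation = tf.keras.utils.to_categorical(lbl_clf.transform(y_validation))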

The encoding maps 0 to [1, 0] and 1 to [0, 1].
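For two classes, the same mapping can be written in plain NumPy by indexing rows of an identity matrix. This sketch only mirrors what to_categorical produces for 0/1 labels; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes=2):
    """Map integer class labels to one-hot rows: 0 -> [1, 0], 1 -> [0, 1]."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes)[labels]

print(one_hot([0, 1, 1, 0]))
```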

Finally, we apply scaling that standardizes the features by removing the mean and scaling to unit variance:

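from sklearn.preprocessing import StandardScaler

std_clf = StandardScaler()
# Fit on the training set only, then reuse the same statistics for the other splits
x_train = std_clf.fit_transform(x_train)
x_validation = std_clf.transform(x_validation)
x_test = std_clf.transform(x_test)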

LabelEncoder and StandardScaler are classes from the scikit-learn Python library; to_categorical is a Keras utility.
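Under the hood, standardization is (x - mean) / std, with the mean and standard deviation learned from the training set and then reused for every split. The following NumPy sketch reproduces that arithmetic for illustration (population standard deviation, and a scale of 1 for constant features, as StandardScaler does); the helper names are hypothetical.

```python
import numpy as np

def fit_standardizer(x_train):
    """Learn per-feature mean and standard deviation from the training set only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)  # population standard deviation (ddof=0)
    std[std == 0] = 1.0        # leave constant features unscaled instead of dividing by zero
    return mean, std

def standardize(x, mean, std):
    """Apply the training-set statistics to any split (train, validation, or test)."""
    return (x - mean) / std

x_tr = np.array([[1.0, 10.0], [3.0, 30.0]])
mean, std = fit_standardizer(x_tr)
print(standardize(np.array([[2.0, 40.0]]), mean, std))
```

Reusing the training statistics keeps validation and test data on the same scale the model was trained on, and prevents information from those splits from leaking into preprocessing.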

After all the transformations are applied, the dataset is ready to be the input of the neural network.

Neural network architecture

Based on several empirical tests, we composed the neural network from the following layers:

  • A first dense layer with 32 nodes
  • A second dense layer with 9 nodes (one for each input of the quantum circuit)
  • A quantum layer as the neural network output
  • Dropout layers with a rate of 0.3

We apply an L2 regularization on the first layer and both L1 and L2 regularization on the second one, to avoid overfitting. We initialize all the kernels using the he_normal function. The dropout layers are meant to reduce overfitting as well.

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout

hidden = Dense(32, activation="relu", kernel_initializer="he_normal", kernel_regularizer=tf.keras.regularizers.l2(0.01))
out_2 = Dense(9, activation="relu", kernel_initializer="he_normal", kernel_regularizer=tf.keras.regularizers.l1_l2(l1=0.001, l2=0.001))
do = Dropout(0.3)

Quantum circuit

The first step to obtain the layer is to build the quantum circuit (or the quantum node). To accomplish this task, we used the Python library PennyLane.

PennyLane is an open source library that seamlessly integrates quantum computing with ML. It allows you to create and train quantum-classical hybrid models, where quantum circuits act as layers within classical neural networks. By harnessing the power of quantum mechanics and merging it with classical ML frameworks like PyTorch, TensorFlow, and Keras, PennyLane empowers you to explore the exciting frontier of quantum ML. You can unlock new realms of possibility and push the boundaries of what’s achievable with this cutting-edge technology.

The design of the circuit is the most important part of the overall solution, because the predictive power of the model depends largely on how the circuit is built.

Qubits, the fundamental units of information in quantum computing, are entities that behave quite differently from classical bits. Unlike classical bits that can only represent 0 or 1, qubits can exist in a superposition of both states simultaneously, enabling quantum parallelism and faster calculations for certain problems.

We decide to use only three qubits, a small number but sufficient for our case.

We instantiate the qubits as follows:

import pennylane as qml

num_wires = 3
dev = qml.device("default.qubit", wires=num_wires)

default.qubit is PennyLane's built-in state-vector simulator. To run the circuit on a real quantum computer through Braket, you can replace the device definition with the following code:

device_arn = "arn:aws:braket:us-east-1::device/qpu/ionq/Aria-1"
dev = qml.device("braket.aws.qubit", device_arn=device_arn, wires=num_wires)

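device_arn can be the ARN of any device supported by Braket (for a list of supported devices, refer to Amazon Braket supported devices). The braket.aws.qubit device is provided by the Amazon Braket PennyLane plugin (amazon-braket-pennylane-plugin), which must be installed. Note that diff_method="backprop", used in the quantum node definition, works only on simulators such as default.qubit; on QPU hardware, use a method such as diff_method="parameter-shift" instead.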

We defined the quantum node as follows:

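@qml.qnode(dev, interface="tf", diff_method="backprop")
def quantum_nn(inputs, weights):
    # inputs[..., i] selects feature i whether KerasLayer passes one sample or a batch
    # Encode three features directly as RY rotation angles
    qml.RY(inputs[..., 0], wires=0)
    qml.RY(inputs[..., 1], wires=1)
    qml.RY(inputs[..., 2], wires=2)
    # Trainable rotations whose angles combine weights with the remaining six features
    qml.Rot(weights[0] * inputs[..., 3], weights[1] * inputs[..., 4], weights[2] * inputs[..., 5], wires=1)
    qml.Rot(weights[3] * inputs[..., 6], weights[4] * inputs[..., 7], weights[5] * inputs[..., 8], wires=2)
    # Entangle the qubits, with one more trainable rotation on qubit 2
    qml.CNOT(wires=[1, 2])
    qml.RY(weights[6], wires=2)
    qml.CNOT(wires=[0, 2])
    qml.CNOT(wires=[1, 2])
    # Pauli-Z expectation values on qubits 0 and 2 form the layer's two outputs
    return [qml.expval(qml.PauliZ(0)), qml.expval(qml.PauliZ(2))]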

The inputs are the values yielded as output from the previous layer of the neural network, and the weights are the actual weights of the quantum circuit.

RY and Rot are rotation gates applied to single qubits; CNOT is a controlled bit-flip (controlled-NOT) gate that entangles the qubits.

qml.expval(qml.PauliZ(0)) and qml.expval(qml.PauliZ(2)) are the measurements (expectation values of the Pauli-Z operator) applied to qubits 0 and 2, respectively; these two values are the neural network output.

Diagrammatically, with illustrative inputs [1, 2, ..., 9] and weights [1, 2, ..., 7], the circuit can be displayed as:

0: ──RY(1.00)──────────────────────────────────────╭●────┤  <Z>
1: ──RY(2.00)──Rot(4.00,10.00,18.00)──╭●───────────│──╭●─┤
2: ──RY(3.00)──Rot(28.00,40.00,54.00)─╰X──RY(7.00)─╰X─╰X─┤  <Z>

Fewer transformations are applied to qubit 0 than to qubit 2. We made this choice to separate the states of the two measured qubits: applying different transformations puts them in distinct states, so their measurements yield different values. This behavior stems from the principles of superposition and entanglement inherent in quantum mechanics.
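To make what quantum_nn computes concrete, here is a from-scratch NumPy state-vector re-implementation of the same gate sequence for a single sample. It is a sketch for intuition, not how PennyLane simulates circuits internally; it assumes PennyLane's documented gate conventions (RY, RZ, and Rot(phi, theta, omega) = RZ(omega) RY(theta) RZ(phi)) and stores the state as a (2, 2, 2) tensor whose axis i is wire i. All function names are hypothetical.

```python
import numpy as np

def ry(theta):
    """RY rotation matrix (PennyLane convention)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def rz(theta):
    """RZ rotation matrix (PennyLane convention)."""
    return np.diag([np.exp(-0.5j * theta), np.exp(0.5j * theta)])

def rot(phi, theta, omega):
    """General rotation Rot(phi, theta, omega) = RZ(omega) RY(theta) RZ(phi)."""
    return rz(omega) @ ry(theta) @ rz(phi)

def apply_1q(state, gate, wire):
    """Apply a 2x2 gate to one qubit of a (2, 2, 2) state tensor."""
    state = np.tensordot(gate, state, axes=([1], [wire]))
    return np.moveaxis(state, 0, wire)  # put the qubit axis back in place

def apply_cnot(state, control, target):
    """Flip the target qubit in the part of the state where the control qubit is 1."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    axis = target if target < control else target - 1  # the control axis is indexed away
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=axis).copy()
    return state

def expval_z(state, wire):
    """<Z> on one wire: P(qubit = 0) - P(qubit = 1)."""
    probs = np.abs(state) ** 2
    others = tuple(a for a in range(state.ndim) if a != wire)
    marginal = probs.sum(axis=others)
    return float(marginal[0] - marginal[1])

def quantum_nn_reference(inputs, weights):
    """NumPy re-implementation of the quantum_nn gate sequence (single sample)."""
    state = np.zeros((2, 2, 2), dtype=complex)
    state[0, 0, 0] = 1.0  # start in |000>
    for wire in range(3):
        state = apply_1q(state, ry(inputs[wire]), wire)
    state = apply_1q(state, rot(weights[0] * inputs[3], weights[1] * inputs[4], weights[2] * inputs[5]), 1)
    state = apply_1q(state, rot(weights[3] * inputs[6], weights[4] * inputs[7], weights[5] * inputs[8]), 2)
    state = apply_cnot(state, 1, 2)
    state = apply_1q(state, ry(weights[6]), 2)
    state = apply_cnot(state, 0, 2)
    state = apply_cnot(state, 1, 2)
    return [expval_z(state, 0), expval_z(state, 2)]
```

For example, setting inputs[1] = π and everything else to 0 returns [1.0, 1.0]: qubit 1 is flipped to |1⟩, so the two CNOT(1, 2) gates flip qubit 2 and then flip it back, leaving both measured qubits in |0⟩. Both outputs always lie between -1 and 1.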

After we define the quantum circuit, we define the quantum hybrid neural network:

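import tensorflow as tf
from tensorflow.keras.layers import Dropout

def build_hybrid_model():
    weight_shapes = {"weights": (7,)}  # the circuit uses 7 trainable weights
    qlayer = qml.qnn.KerasLayer(quantum_nn, weight_shapes, output_dim=2)
    # A Sequential model cannot contain the same layer instance twice, so the second dropout is a new layer
    return tf.keras.Sequential([hidden, do, out_2, Dropout(0.3), qlayer])

model = build_hybrid_model()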

KerasLayer is the PennyLane function that turns the quantum circuit into a Keras layer.

Model training

After we have preprocessed the data and defined the model, it’s time to train the network.

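A preliminary step is needed to deal with the imbalanced dataset. We define a weight for each class according to the inverse square root rule:

import numpy as np
y_train_labels = np.argmax(y_train, axis=1)  # recover 0/1 labels from the one-hot targets
class_counts = np.bincount(y_train_labels)
class_frequencies = class_counts / float(len(y_train_labels))
# Keras expects class_weight as a {class_index: weight} dictionary
class_weights = dict(enumerate(1 / np.sqrt(class_frequencies)))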

Each weight is the inverse of the square root of the relative frequency of one of the two possible target values.
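Plugging in the class counts from the dataset description gives a sense of the resulting weights. This sketch uses the full-dataset counts, not the training split, so the training weights will differ slightly; the helper name is hypothetical.

```python
import math

def inverse_sqrt_class_weights(counts):
    """Weight each class by 1 / sqrt(class frequency), keyed by class index."""
    total = sum(counts)
    return {cls: 1 / math.sqrt(n / total) for cls, n in enumerate(counts)}

# Class 0 = non-fraud, class 1 = fraud (counts from the dataset description)
weights = inverse_sqrt_class_weights([284_807 - 492, 492])
print({cls: round(w, 2) for cls, w in weights.items()})  # {0: 1.0, 1: 24.06}
```

The square root softens the correction: plain inverse-frequency weighting would give the fraud class a weight of about 579, whereas the inverse square root gives about 24.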

We compile the model next:

model.compile(optimizer='adam', loss = 'MSE', metrics = [custom_metric])

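custom_metric is a modified version of the precision metric: a custom subroutine that postprocesses the outputs of the quantum layer into class predictions before computing precision. Like any Keras metric, it is only reported during training; the optimizer minimizes the MSE loss.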

For evaluating model performance on imbalanced data, precision is a more informative metric than accuracy, so we use precision as our target metric. Precision measures the proportion of fraud alerts that are true positives, so high precision means few legitimate transactions are incorrectly flagged as fraud (false positives). The opposite error, incorrectly predicting a fraudulent transaction as valid (a false negative), can have serious financial consequences and risks; it is measured by recall, which remains important to monitor.
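The following small sketch illustrates why accuracy is misleading here and how precision and recall capture different errors. The helper function and the toy labels are illustrative, not taken from the experiment.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed fraud
    precision = tp / (tp + fp) if tp + fp else 0.0  # undefined case reported as 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# A model that never predicts fraud still scores very high accuracy on this dataset
total, fraud = 284_807, 492
print(f"never-fraud accuracy: {(total - fraud) / total:.4f}")  # never-fraud accuracy: 0.9983

# A cautious model: its single alert is correct, but it misses two of three frauds
print(precision_recall([1, 1, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0]))  # (1.0, 0.3333333333333333)
```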

Finally, we fit the model:

history = model.fit(x_train, y_train, epochs=30, batch_size=200, validation_data=(x_validation, y_validation), class_weight=class_weights, shuffle=True)

At each training step, the weights of both the classical and quantum layers are updated to minimize the loss. At the end of the training, the network showed a loss of 0.0353 on the training set and 0.0119 on the validation set. When the fit is complete, the trained model is saved in .h5 format.

Model results and analysis

Evaluating the model is vital to gauge its capabilities and limitations, providing insights into the predictive quality and value derived from the quantum techniques.

To test the model, we make predictions on the test set:

preds = model.predict(x_test)

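Because the network is trained as a regression model (MSE loss against one-hot targets), it yields for each record of x_test a two-element array of continuous values; for the hybrid model, these are the Pauli-Z expectation values measured on qubits 0 and 2, which lie between -1 and 1. Because we're essentially dealing with a binary classification problem, the target outputs are as follows: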

  • [1,0] – No fraud
  • [0,1] – Fraud

To convert the continuous values into binary classification, a threshold is necessary. Predictions that are equal to or above the threshold are assigned 1, and those below the threshold are assigned 0.

To align with our goal of optimizing precision, we chose the threshold value that results in the highest precision.
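A minimal sketch of this threshold selection, assuming the threshold is applied to the fraud component (the second column) of each prediction and that y_true holds 0/1 labels. The candidate values match those in the precision table; the function names and the toy data are hypothetical.

```python
import numpy as np

def fraud_precision_at(preds, y_true, threshold):
    """Fraud-class precision when preds[:, 1] >= threshold raises a fraud alert."""
    preds, y_true = np.asarray(preds), np.asarray(y_true)
    alerts = preds[:, 1] >= threshold
    if not alerts.any():
        return float("nan")  # no alerts: precision is undefined
    true_positives = np.sum(alerts & (y_true == 1))
    return float(true_positives / alerts.sum())

def pick_threshold(preds, y_true, candidates=(0.65, 0.70, 0.75)):
    """Return the candidate with the highest fraud precision, plus all scores."""
    scores = {t: fraud_precision_at(preds, y_true, t) for t in candidates}
    valid = [t for t in candidates if not np.isnan(scores[t])]
    best = max(valid, key=lambda t: scores[t])  # ties keep the lower threshold
    return best, scores

toy_preds = np.array([[0.1, 0.90], [0.3, 0.72], [0.2, 0.68], [0.9, 0.10], [0.4, 0.66]])
toy_labels = [1, 1, 0, 0, 0]
print(pick_threshold(toy_preds, toy_labels))
```

Resolving ties toward the lower threshold keeps more alerts, and therefore more recall, at the same precision. In practice, choosing the threshold on the validation set rather than the test set keeps the final test precision unbiased.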

The following table summarizes the mapping between various threshold values and the precision.

Class       Threshold = 0.65   Threshold = 0.70   Threshold = 0.75
No fraud    1.00               1.00               1.00
Fraud       0.87               0.89               0.92

The model performs almost flawlessly on the predominant non-fraud class, with precision and recall scores close to a perfect 1. Despite far fewer examples, the model achieves a precision of 0.87 on the minority fraud class at a 0.65 threshold, rising to 0.92 at 0.75, which underscores its performance even on sparse data. To identify fraud efficiently while minimizing incorrect fraud reports, we decided to prioritize precision over recall.

We also wanted to compare this model with a purely classical neural network, to check whether the quantum layer provides any gain. We built and trained an identical model in which the quantum layer is replaced by the following:

Dense(2,activation = "softmax")

In the last epoch, the loss was 0.0119 and the validation loss was 0.0051.

The following table summarizes the mapping between various threshold values and the precision for the classic neural network model.

Class       Threshold = 0.65   Threshold = 0.70   Threshold = 0.75
No fraud    1.00               1.00               1.00
Fraud       0.83               0.84               0.86

Like the quantum hybrid model, the model performance is almost perfect for the majority class and very good for the minority class.

The hybrid neural network has 1,296 parameters, whereas the classic one has 1,329. When comparing precision values, we observe that the quantum solution provides better results in this experiment. A plausible explanation is that the quantum layer explores a high-dimensional state space and adds non-linearity, which lets the hybrid model generalize better while using fewer parameters.
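The hybrid model's parameter count can be reproduced by hand, assuming the 30 input features of the Kaggle dataset (every column except Class). This is a back-of-the-envelope check, not output from Keras.

```python
def dense_params(n_in, n_units):
    """Trainable parameters of a Dense layer: a weight matrix plus one bias per unit."""
    return n_in * n_units + n_units

N_FEATURES = 30      # assumption: Time, V1-V28, and Amount after dropping Class
QUANTUM_WEIGHTS = 7  # size of the "weights" tensor in quantum_nn

hybrid_params = (
    dense_params(N_FEATURES, 32)  # first dense layer: 992
    + dense_params(32, 9)         # second dense layer: 297
    + QUANTUM_WEIGHTS             # dropout layers add no parameters
)
print(hybrid_params)  # 1296
```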

Challenges of a quantum solution

Although the adoption of quantum technology shows promise in providing organizations numerous benefits, practical implementation on large-scale, fault-tolerant quantum computers is a complex task and is an active area of research. Therefore, we should be mindful of the challenges that it poses:

  • Sensitivity to noise – Quantum computers are extremely sensitive to external factors (such as atmospheric temperature) and require more attention and maintenance than traditional computers, and their behavior can drift over time. One way to minimize the effects of drift is parametric compilation: the ability to compile a parametric circuit, such as the one used here, only one time and feed it fresh parameters at runtime, avoiding repeated compilation steps. Braket does this automatically for you on supported devices.
  • Dimensional complexity – The inherent nature of qubits, the fundamental units of quantum computing, introduces a higher level of intricacy compared to traditional binary bits employed in conventional computers. By harnessing the principles of superposition and entanglement, qubits possess an elevated degree of complexity in their design. This intricate architecture renders the evaluation of computational capacity a formidable challenge, because the multidimensional aspects of qubits demand a more nuanced approach to assessing their computational prowess.
  • Computational errors – Increased calculation errors are intrinsic to quantum computing’s probabilistic nature during the sampling phase. These errors could impact accuracy and reliability of the results obtained through quantum sampling. Techniques such as error mitigation and error suppression are actively being developed in order to minimize the effects of errors resulting from noisy qubits. To learn more about error mitigation, see Enabling state-of-the-art quantum algorithms with Qedma’s error mitigation and IonQ, using Braket Direct.

Conclusion

The results discussed in this post suggest that quantum computing holds substantial promise for fraud detection in the financial services industry. In our simulated experiments, the hybrid quantum neural network identified fraudulent transactions with higher precision than a comparable classical network, highlighting the potential gains offered by quantum technology. As quantum computing continues to advance, its role in transforming fraud detection and other critical financial processes is likely to become increasingly evident. You can extend the results of the simulation by using real qubits and testing various outcomes on real hardware available on Braket, such as those from IQM, IonQ, and Rigetti, all on demand, with pay-as-you-go pricing and no upfront commitments.

To prepare for the future of quantum computing, organizations must stay informed on the latest advancements in quantum technology. Adopting quantum-ready cloud solutions now is a strategic priority, allowing a smooth transition to quantum when hardware reaches commercial viability. This forward-thinking approach will provide both a technological edge and rapid adaptation to quantum computing’s transformative potential across industries. With an integrated cloud strategy, businesses can proactively get quantum-ready, primed to capitalize on quantum capabilities at the right moment. To accelerate your learning journey and earn a digital badge in quantum computing fundamentals, see Introducing the Amazon Braket Learning Plan and Digital Badge.

Connect with Deloitte to pilot this solution for your enterprise on AWS.


About the authors

Federica Marini is a Manager in Deloitte Italy AI & Data practice with strong experience as a business advisor and technical expert in the field of AI, Gen AI, ML and Data. She addresses research and customer business needs with tailored data-driven solutions providing meaningful results. She is passionate about innovation and believes digital disruption will require a human-centered approach to achieve its full potential.

Matteo Capozi is a Data and AI expert in Deloitte Italy, specializing in the design and implementation of advanced AI and GenAI models and quantum computing solutions. With a strong background in cutting-edge technologies, Matteo excels in helping organizations harness the power of AI to drive innovation and solve complex problems. His expertise spans industries, where he collaborates closely with executive stakeholders to achieve strategic goals and performance improvements.

Kasi Muthu is a senior partner solutions architect focusing on generative AI and data at AWS based out of Dallas, TX. He is passionate about helping partners and customers accelerate their cloud journey. He is a trusted advisor in this field and has plenty of experience architecting and building scalable, resilient, and performant workloads in the cloud. Outside of work, he enjoys spending time with his family.

Kuldeep Singh is a Principal Global AI/ML leader at AWS with over 20 years in tech. He skillfully combines his sales and entrepreneurship expertise with a deep understanding of AI, ML, and cybersecurity. He excels in forging strategic global partnerships, driving transformative solutions and strategies across various industries with a focus on generative AI and GSIs.

Amazon SageMaker unveils the Cohere Command R fine-tuning model

AWS announced the availability of the Cohere Command R fine-tuning model on Amazon SageMaker. This latest addition to the SageMaker suite of machine learning (ML) capabilities empowers enterprises to harness the power of large language models (LLMs) and unlock their full potential for a wide range of applications.

Cohere Command R is a scalable, frontier LLM designed to handle enterprise-grade workloads with ease. Cohere Command R is optimized for conversational interaction and long context tasks. It targets the scalable category of models that balance high performance with strong accuracy, enabling companies to move beyond proof of concept and into production. The model boasts high precision on Retrieval Augmented Generation (RAG) and tool use tasks, low latency and high throughput, a long 128,000-token context length, and strong capabilities across 10 key languages.

In this post, we explore the reasons for fine-tuning a model and the process of how to accomplish it with Cohere Command R.

Fine-tuning: Tailoring LLMs for specific use cases

Fine-tuning is an effective technique to adapt LLMs like Cohere Command R to specific domains and tasks, leading to significant performance improvements over the base model. Evaluations of the fine-tuned Cohere Command R model have demonstrated performance improvements of over 20% across various enterprise use cases in industries such as financial services, technology, retail, healthcare, and legal. Because of its smaller size, a fine-tuned Cohere Command R model can be served more efficiently than much larger models.

The recommendation is to use a dataset that contains at least 100 examples.

Cohere Command R is optimized for RAG, in which relevant context is retrieved from an external knowledge base to improve outputs. Fine-tuning allows you to specialize the model even further, which is crucial for achieving the best performance in several scenarios:

  •  Domain-specific adaptation – RAG models may not perform optimally in highly specialized domains like finance, law, or medicine. Fine-tuning allows you to adapt the model to these domains’ nuances for improved accuracy.
  • Data augmentation – Fine-tuning enables incorporating additional data sources or techniques, augmenting the model’s knowledge base for increased robustness, especially with sparse data.
  • Fine-grained control – Although RAG offers impressive general capabilities, fine-tuning permits fine-grained control over model behavior, tailoring it precisely to your desired task for ultimate precision.

The combined power of RAG and fine-tuned LLMs empowers you to tackle diverse challenges with unparalleled versatility and effectiveness. With the introduction of Cohere Command R fine-tuning on SageMaker, enterprises can now customize and optimize the model’s performance for their unique requirements. By fine-tuning on domain-specific data, businesses can enhance Cohere Command R’s accuracy, relevance, and effectiveness for their use cases, such as natural language processing, text generation, and question answering.

By combining the scalability and robustness of Cohere Command R with the ability to fine-tune its performance on SageMaker, AWS empowers enterprises to navigate the complexities of AI adoption and use its transformative power to drive innovation and growth across various industries and domains.

Customer data, including prompts, completions, custom models, and data used for fine-tuning or continued pre-training, remains private to customer AWS accounts and is never shared with third-party model providers.

Solution overview

In the following sections, we walk through the steps to fine-tune the Cohere Command R model on SageMaker. This includes preparing the data, deploying a model, preparing for fine-tuning, creating an endpoint for inference, and performing inference.

Prepare the fine-tuning data

Before you can start a fine-tuning job, you need to upload a dataset with training and (optionally) evaluation data.

First, make sure your data is in JSON Lines (.jsonl) format, with the following structure:

  • messages – A list of the messages in the conversation. Each message consists of the following parts:
      • role – The current speaker. You can pick from System, User, or Chatbot.
      • content – The content of the message.

The following is an example that trains a chatbot to answer questions. For the sake of readability, the document spans over multiple lines. For your dataset, make sure that each line contains one whole example.

{
  "messages": [
    {
      "role": "System",
      "content": "You are a large language model trained by Cohere."
    },
    {
      "role": "User",
      "content": "Hi! What were Time magazine's top 10 cover stories in the last 10 years?"
    },
    {
      "role": "Chatbot",
      "content": "Time magazine's top 10 cover stories in the last 10 years were:\n\n1. Volodymyr Zelenskyy\n2. Elon Musk\n3. Martin Luther King Jr.\n4. How Earth Survived\n5. Her Lasting Impact\n6. Nothing to See Here\n7. Meltdown\n8. Deal With It\n9. The Top of America\n10. Bitter Pill"
    },
    {
      "role": "User",
      "content": "Who is Volodymyr Zelenskyy?"
    },
    {
      "role": "Chatbot",
      "content": "Volodymyr Zelenskyy is a Ukrainian politician who was elected President of Ukraine in 2019. Before entering politics, he was an actor and comedian."
    },
    {
      "role": "User",
      "content": "Thank you!"
    }
  ]
}
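Before uploading, it can help to check each example and serialize it so that every line holds exactly one whole example. The following is a minimal sketch using only the standard library; it checks just the messages/role/content structure described above, and the function names are hypothetical. json.dumps escapes embedded newlines, so even the multi-line Chatbot answer in the example stays on a single line.

```python
import json

ALLOWED_ROLES = {"System", "User", "Chatbot"}

def validate_example(example):
    """Check one training example against the messages/role/content structure."""
    messages = example.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("example must have a non-empty 'messages' list")
    for msg in messages:
        if msg.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            raise ValueError("each message needs a string 'content'")

def to_jsonl(examples):
    """Serialize examples as JSON Lines: one whole example per line."""
    lines = []
    for example in examples:
        validate_example(example)
        lines.append(json.dumps(example, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

For example, writing to_jsonl(examples) to a file opened with encoding="utf-8" produces a training file in the expected layout.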

Deploy a model

Complete the following steps to subscribe to the model and create a training job:

  1. On AWS Marketplace, subscribe to the Cohere Command R model.
  2. After you subscribe to the model, choose View in Amazon SageMaker.
  3. Follow the instructions in the UI to configure the model and create a training job.

Alternatively, you can use the following example notebook to create the training job.

Prepare for fine-tuning

To fine-tune the model, you need the following:

  • Product ARN – This will be provided to you after you subscribe to the product.
  • Training dataset and evaluation dataset – Prepare your datasets for fine-tuning.
  • Amazon S3 location – Specify the Amazon Simple Storage Service (Amazon S3) location that stores the training and evaluation datasets.
  • Hyperparameters – Fine-tuning typically involves adjusting various hyperparameters like learning rate, batch size, number of epochs, and so on. You need to specify the appropriate hyperparameter ranges or values for your fine-tuning task.

Create an endpoint for inference

When the fine-tuning is complete, you can create an endpoint for inference with the fine-tuned model. To create the endpoint, use the create_endpoint method. If the endpoint already exists, you can connect to it using the connect_to_endpoint method.

Perform inference

You can now perform real-time inference using the endpoint. The following is the sample message that you use for input:

# co is the Cohere client connected to the endpoint in the previous step
message = "Classify the following text as either very negative, negative, neutral, positive or very positive: mr. deeds is , as comedy goes , very silly -- and in the best way."
result = co.chat(message=message)
print(result)

The following screenshot shows the output of the fine-tuned model.


Optionally, you can also test the accuracy of the model using the evaluation data (sample_finetune_scienceQA_eval.jsonl).
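A minimal sketch of such an evaluation, assuming each line of the evaluation file uses the messages format described earlier and ends with the reference Chatbot reply. predict is a hypothetical callable that you would implement on top of co.chat; how you extract the reply text from the response depends on your SDK version. Exact match is a deliberately simple metric, suitable for short answers such as classification labels.

```python
import json

def load_eval_pairs(jsonl_text):
    """Split each example into (conversation history, reference answer).

    Assumes the last message of every example is the Chatbot's reference reply.
    """
    pairs = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        messages = json.loads(line)["messages"]
        *history, last = messages
        if last["role"] != "Chatbot":
            raise ValueError("expected the final message to be the Chatbot reference")
        pairs.append((history, last["content"]))
    return pairs

def exact_match_accuracy(pairs, predict):
    """Share of examples where predict(history) matches the reference (case-insensitive)."""
    if not pairs:
        return 0.0
    hits = sum(
        predict(history).strip().lower() == reference.strip().lower()
        for history, reference in pairs
    )
    return hits / len(pairs)
```

For instance, reading sample_finetune_scienceQA_eval.jsonl into a string and passing it to load_eval_pairs yields the pairs to score.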

Clean up

After you have completed running the notebook and experimenting with the Cohere Command R fine-tuned model, it is crucial to clean up the resources you have provisioned. Failing to do so may result in unnecessary charges accruing on your account. To prevent this, use the following code to delete the resources and stop the billing process:

co.delete_endpoint()
co.close()

Summary

Cohere Command R with fine-tuning allows you to customize your models to be performant for your business, domain, and industry. Alongside the fine-tuned model, users additionally benefit from Cohere Command R’s proficiency in the most commonly used business languages (10 languages) and RAG with citations for accurate and verified information. Cohere Command R with fine-tuning achieves high levels of performance with less resource usage on targeted use cases. Enterprises can see lower operational costs, improved latency, and increased throughput without extensive computational demands.

Start building with Cohere’s fine-tuning model in SageMaker today.


About the Authors

Shashi Raina is a Senior Partner Solutions Architect at Amazon Web Services (AWS), where he specializes in supporting generative AI (GenAI) startups. With close to 6 years of experience at AWS, Shashi has developed deep expertise across a range of domains, including DevOps, analytics, and generative AI.

James Yi is a Senior AI/ML Partner Solutions Architect in the Emerging Technologies team at Amazon Web Services. He is passionate about working with enterprise customers and partners to design, deploy and scale AI/ML applications to derive their business values. Outside of work, he enjoys playing soccer, traveling and spending time with his family.

Pradeep Prabhakaran is a Customer Solutions Architect at Cohere. In his current role at Cohere, Pradeep acts as a trusted technical advisor to customers and partners, providing guidance and strategies to help them realize the full potential of Cohere's cutting-edge Generative AI platform. Prior to joining Cohere, Pradeep was a Principal Customer Solutions Manager at Amazon Web Services, where he led Enterprise Cloud transformation programs for large enterprises. Prior to AWS, Pradeep held various leadership positions at consulting companies such as Slalom, Deloitte, and Wipro. Pradeep holds a Bachelor's degree in Engineering and is based in Dallas, TX.


Derive meaningful and actionable operational insights from AWS Using Amazon Q Business


As a customer, you rely on Amazon Web Services (AWS) expertise that is available when you need it and that understands your specific environment and operations. Today, you might rely on manual processes to summarize lessons learned, obtain recommendations, or expedite the resolution of an incident. These processes can be time-consuming and inconsistent, and their results are not readily accessible.

This post shows how to use AWS generative artificial intelligence (AI) services, such as Amazon Q Business, with AWS Support cases, AWS Trusted Advisor, and AWS Health data to derive actionable insights from common patterns, issues, and resolutions, while applying the AWS recommendations and best practices captured in your support data. It also demonstrates how you can integrate these insights with your IT service management (ITSM) system (such as ServiceNow, Jira, or Zendesk) so that you can implement recommendations and keep your AWS operations healthy.

Amazon Q Business is a fully managed, secure, generative-AI powered enterprise chat assistant that enables natural language interactions with your organization’s data. Ingesting data for support cases, Trusted Advisor checks, and AWS Health notifications into Amazon Q Business enables interactions through natural language conversations, sentiment analysis, and root cause analysis without needing to fully understand the underlying data models or schemas. The AI assistant provides answers along with links that point directly to the data sources. This allows you to easily identify and reference the underlying information sources that informed the AI’s response, providing more context and enabling further exploration of the topic if needed. Amazon Q Business integrates with ITSM solutions, allowing recommendations to be tracked and actioned within your existing workflows.

AWS Support offers a range of capabilities powered by technology and subject matter experts that support the success and operational health of your AWS environments. AWS Support provides you with proactive planning and communications, advisory, automation, and cloud expertise to help you achieve business outcomes with increased speed and scale in the cloud. These capabilities enable proactive planning for upcoming changes, expedited recovery from operational disruptions, and recommendations to optimize the performance and reliability of your AWS IT infrastructure.

This solution will demonstrate how to deploy Amazon Q Business and ingest data from AWS Support cases, AWS Trusted Advisor, and AWS Health using the provided code sample to generate insights based on your support data.

Overview of solution

At the time of writing, Amazon Q Business provides 43 connectors that natively integrate with multiple data sources. In this post, we use the AWS Support, AWS Trusted Advisor, and AWS Health APIs to programmatically access the support datasets, and we use the native Amazon Q Business Amazon Simple Storage Service (Amazon S3) connector to index the support data and provide a prebuilt chatbot web experience. The AWS Support, AWS Trusted Advisor, and AWS Health APIs are available to customers with Enterprise Support, Enterprise On-Ramp, or Business Support plans.

Q Support Insights (QSI) is the name of the solution provided in the code sample repository. QSI enables insights on your AWS Support datasets across your AWS accounts. The following diagram describes at a high level the QSI solution and components.


Figure 1: Overview of the QSI solution

There are two major components in the QSI solution. First, as illustrated in the Linked Accounts group in Figure 1, the solution collects datasets from linked accounts and aggregates the data using the various APIs, AWS Lambda, and Amazon EventBridge. Second, the support datasets from linked accounts are stored in a central S3 bucket that you own, as shown in the Data Collection Account group in Figure 1. These datasets are then indexed using the Amazon Q Business S3 connector.
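As a concrete illustration of the second component, the following minimal sketch shows how a collector in a linked account might write one dataset to the central bucket. It assumes a boto3 S3 client is passed in (for example, `boto3.client("s3")`); the function names, the dataset labels, and the `<dataset>/<account_id>/<date>.json` key layout are hypothetical and are not taken from the QSI repository.

```python
import json
from datetime import date

# Hypothetical dataset labels; the QSI repository may name these differently.
ALLOWED_DATASETS = {"support-cases", "trusted-advisor", "health"}


def build_object_key(dataset: str, account_id: str, day: date) -> str:
    """Build a hypothetical S3 key that groups objects by dataset and source account."""
    if dataset not in ALLOWED_DATASETS:
        raise ValueError(f"unknown dataset: {dataset}")
    return f"{dataset}/{account_id}/{day.isoformat()}.json"


def upload_dataset(s3_client, bucket: str, dataset: str, account_id: str,
                   records: list, day: date) -> str:
    """Serialize records as JSON and upload them to the central bucket.

    s3_client is assumed to be a boto3 S3 client (boto3.client("s3")).
    Returns the object key that was written.
    """
    key = build_object_key(dataset, account_id, day)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        # default=str keeps datetime values returned by AWS APIs serializable.
        Body=json.dumps(records, default=str).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

Prefixing keys by dataset and account keeps each linked account's data separate inside the shared bucket, which can make it easier to include or exclude specific accounts later.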

Under the hood, the Amazon Q Business S3 connector creates a searchable index of your AWS Support datasets and captures relevant details such as case titles, descriptions, best practices, keywords, and dates. The generative AI capabilities of Amazon Q Business then synthesize insights and generate natural language responses for users in the Amazon Q Business web chat experience. Amazon Q Business also supports plugins and actions, so users can create tickets in the ITSM system directly without leaving the chat experience.

By default, Amazon Q Business produces responses using only the data you index. This behavior aligns with the use cases for our solution. If needed, you can change this response setting to allow Amazon Q Business to fall back to large language model (LLM) knowledge.

Walkthrough

The high-level steps to deploy the solution are the following:

  1. Create the necessary buckets to contain the support cases exports and deployment resources.
  2. Upload the support datasets (AWS Support cases, AWS Trusted Advisor, and AWS Health) to the S3 data source bucket.
  3. Create the Amazon Q Business application, the data source, and required components using deployment scripts.
  4. Optionally, configure ITSM integration by using one of the available Amazon Q Business built-in plugins.
  5. Synchronize the data source to index the data.
  6. Test the solution through chat.

The full guidance and deployment options are available in the aws-samples GitHub repository. The solution can be deployed in a single account or across an organization in AWS Organizations. In addition to the data security and protection that Amazon Q Business supports, this solution integrates with your identity provider and respects access control lists (ACLs), so users get answers based on their individual permissions. The solution also provides additional controls to include or exclude specific accounts.

Prerequisites

For this solution to work, the following prerequisites are needed:

Create the Amazon Q Business application using the deployment scripts

Using the Amazon Q Business application creation module, you can set up and configure an Amazon Q Business application and its key components in an automated manner. These components include an Amazon S3 data source connector, the required IAM roles, and an Amazon Q Business web experience.

Deploy the Amazon Q Business application

As stated in the preceding prerequisites section, IAM Identity Center must be configured in the same Region (us-east-1 or us-west-2) as your Amazon Q Business application.

To deploy and use the Amazon Q Business application, follow the steps described in the Amazon Q Business application creation module. The steps can be summarized as:

  1. Launch an AWS CloudShell in either the us-east-1 or us-west-2 Region in your data collection central account and clone the repository from GitHub.
  2. Navigate to the repository directory and run the deployment script, providing the required inputs when prompted. As stated in the prerequisites, an S3 bucket name is required in the data collection central account.
  3. After deployment, synchronize the data source, assign access to users and groups, and use the deployed web experience URL to interact with the Amazon Q Business application.
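If you prefer to trigger the data source synchronization in step 3 from a script instead of the console, a minimal sketch with the AWS SDK for Python (Boto3) might look like the following. It assumes `qbusiness_client` is created with `boto3.client("qbusiness")` in the application's Region and that you substitute the application, index, and data source IDs created by the deployment script; the `start_sync` helper name is hypothetical.

```python
def start_sync(qbusiness_client, application_id: str, index_id: str,
               data_source_id: str) -> str:
    """Start a sync job for the S3 data source and return its execution ID.

    qbusiness_client is assumed to be boto3.client("qbusiness"); the three IDs
    are placeholders for the resources created by the deployment script.
    """
    for name, value in (("application_id", application_id),
                        ("index_id", index_id),
                        ("data_source_id", data_source_id)):
        if not value:
            raise ValueError(f"{name} must be a non-empty string")
    response = qbusiness_client.start_data_source_sync_job(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    # The execution ID identifies this sync run, for example when checking its status.
    return response["executionId"]
```

Sync jobs run asynchronously, so the call returns before indexing finishes; you can follow progress in the Amazon Q Business console.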

[Optional] Integrate your ITSM system

To integrate with your ITSM system, follow these steps:

  1. Within the Amazon Q Business application page, choose Plugins in the navigation pane and choose Add plugin.
  2. From the list of available plugins, select the one that matches your system (for example, Jira, ServiceNow, or Zendesk).
  3. On the next screen (see Figure 2), enter the details that the Amazon Q Business application needs to make the connection. With this integration, tickets based on data within the Amazon Q Business application can be logged directly from Amazon Q Business to your IT teams.

Figure 2: The Amazon Q Business plugin creation page

Support Collector

You can use the Support Collector module to set up and configure Amazon EventBridge to collect support-related data. This data includes information from AWS Support cases, AWS Trusted Advisor, and AWS Health. The collected data is then uploaded to a designated S3 bucket in the data collection account. By default, the solution retrieves up to 6 months of data, and you can extend the timeframe to a maximum of 12 months.
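The default 6-month window and the 12-month ceiling can be sketched as follows. This minimal example assumes a boto3 AWS Support client (`boto3.client("support")`) and pages through `describe_cases` using `afterTime`, `beforeTime`, and `nextToken`; the helper names, the page size, and the choice to reject (rather than clamp) windows longer than 12 months are assumptions, not the Support Collector's actual code.

```python
import calendar
from datetime import datetime, timezone
from typing import List, Optional, Tuple

DEFAULT_LOOKBACK_MONTHS = 6
MAX_LOOKBACK_MONTHS = 12


def months_before(moment: datetime, months: int) -> datetime:
    """Step back whole calendar months, clamping the day (e.g. Aug 31 -> Feb 29)."""
    year, month = moment.year, moment.month - months
    while month <= 0:
        month += 12
        year -= 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def collection_window(months: int = DEFAULT_LOOKBACK_MONTHS,
                      now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (afterTime, beforeTime) as ISO 8601 strings for the lookback window."""
    if not 1 <= months <= MAX_LOOKBACK_MONTHS:
        raise ValueError(f"months must be between 1 and {MAX_LOOKBACK_MONTHS}")
    if now is None:
        now = datetime.now(timezone.utc)
    return months_before(now, months).isoformat(), now.isoformat()


def fetch_support_cases(support_client, months: int = DEFAULT_LOOKBACK_MONTHS,
                        now: Optional[datetime] = None) -> List[dict]:
    """Collect all cases (including resolved ones) in the window, following nextToken."""
    after_time, before_time = collection_window(months, now)
    cases: List[dict] = []
    next_token = None
    while True:
        params = {
            "afterTime": after_time,
            "beforeTime": before_time,
            "includeResolvedCases": True,
            "maxResults": 100,
        }
        if next_token:
            params["nextToken"] = next_token
        page = support_client.describe_cases(**params)
        cases.extend(page.get("cases", []))
        next_token = page.get("nextToken")
        if not next_token:
            return cases
```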

Additionally, the Support Collector can synchronize with the latest updates on a daily basis, ensuring that your support data is always up to date. The Support Collector is configured through an AWS Lambda function and EventBridge, offering flexibility in terms of the data sources (AWS Support cases, AWS Trusted Advisor, and AWS Health) you want to include or exclude. You can choose data from one, two, or all three of these sources by configuring the appropriate scheduler.
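Choosing one, two, or all three sources can be pictured as a small dispatch step inside the scheduled Lambda function. The following sketch assumes a hypothetical event payload of the form `{"sources": ["support_cases", "health"]}` and hypothetical collector callables; the actual Support Collector configures source selection through its EventBridge schedules, and its payload format may differ.

```python
from typing import Callable, Dict, List

# Hypothetical source names used only in this sketch.
ALL_SOURCES = ("support_cases", "trusted_advisor", "health")


def select_sources(event: dict) -> List[str]:
    """Return the requested sources in a stable order; default to all three."""
    requested = event.get("sources") or list(ALL_SOURCES)
    unknown = set(requested) - set(ALL_SOURCES)
    if unknown:
        raise ValueError(f"unsupported sources: {sorted(unknown)}")
    return [source for source in ALL_SOURCES if source in requested]


def run_collection(event: dict,
                   collectors: Dict[str, Callable[[], int]]) -> Dict[str, int]:
    """Invoke only the selected collectors and report how many records each wrote."""
    return {source: collectors[source]() for source in select_sources(event)}


def lambda_handler(event, context):
    """Hypothetical Lambda entry point; real collectors would call the AWS APIs."""
    collectors = {source: (lambda: 0) for source in ALL_SOURCES}  # placeholder collectors
    return run_collection(event, collectors)
```

With this shape, a custom test payload such as `{"sources": ["health"]}` would collect only AWS Health data, which is also a convenient way to exercise the function in isolation.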

Deploy the Support Collector

To deploy and use the Support Collector, follow the steps described in the Support Collector module.

The repository contains scripts and resources to automate the deployment of Lambda functions in designated member accounts. The deployed Lambda functions collect and upload AWS Support data (Support Cases, Health Events, and Trusted Advisor Checks) to an S3 bucket in the data collection central account. The collected data can be analyzed using Amazon Q Business.

There are two deployment options:

  1. AWS Organizations (StackSet): Use this option if you have AWS Organizations set up and want to deploy in accounts under organizational units. It creates a CloudFormation StackSet in the central account to deploy resources (IAM roles, Lambda functions, and EventBridge) across member accounts.
  2. Manual deployment of individual accounts (CloudFormation): Use this option if you don’t want to use AWS Organizations and want to target a few accounts. It creates a CloudFormation stack in a member account to deploy resources (IAM roles, Lambda functions, and EventBridge).

After deployment, an EventBridge scheduler periodically invokes the Lambda function to collect support data and store it in the data collection S3 bucket. You can test the Lambda function with a custom payload. The deployment steps are fully automated using a shell script. The Q Support Insights (QSI) – AWS Support Collection Deployment guide, located in the src/support_collector subdirectory, outlines the steps to deploy the resources.

Amazon Q Business web experience

You can ask support-related questions using the Amazon Q Business web experience after you have the relevant support data collected in the S3 bucket and successfully indexed. For steps to configure and collect the data, see the preceding Support Collector section. Using the web experience, you can then ask questions as shown in the following demonstration.

Using Amazon Q Business web experience to get troubleshooting recommendations

Using Amazon Q Business web experience to get performance recommendations

Using Amazon Q Business web experience to get operational recommendations


Figure 3: Using Amazon Q Business web experience to get performance recommendations

Sample prompts

Try some of the following sample prompts:

  • I am having trouble with EKS add-on installation failures. It is giving ConfigurationConflict errors. Based on past support cases, please provide a resolution.
  • List AWS Account IDs with insufficient IPs
  • List health events with increased error rates
  • List services being deprecated this year
  • My Lambda function is running slow. How can I speed it up?
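The same prompts can also be sent programmatically. The following minimal sketch assumes a boto3 Amazon Q Business client (`boto3.client("qbusiness")`) whose credentials are authorized to chat with the application (with IAM Identity Center, this may require identity-aware credentials); the `ask` helper and the shape of its returned dictionary are hypothetical.

```python
from typing import Dict, List


def ask(qbusiness_client, application_id: str, prompt: str) -> Dict[str, object]:
    """Send one prompt with ChatSync and return the answer plus its cited sources."""
    response = qbusiness_client.chat_sync(
        applicationId=application_id,
        userMessage=prompt,
    )
    # Keep only the fields needed to show where the answer came from.
    sources: List[Dict[str, str]] = [
        {"title": item.get("title", ""), "url": item.get("url", "")}
        for item in response.get("sourceAttributions", [])
    ]
    return {"answer": response.get("systemMessage", ""), "sources": sources}
```

Returning the source attributions alongside the answer preserves the links back to the support cases, Trusted Advisor checks, or Health events that informed the response.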

Clean up

After you’re done testing the solution, you can delete the resources to avoid incurring additional charges. See the Amazon Q Business pricing page for more information. Follow the instructions in the GitHub repository to delete the resources and corresponding CloudFormation templates.

Conclusion

In this post, you deployed a solution that indexes your AWS Support datasets stored in Amazon S3, including data from other AWS sources such as AWS Trusted Advisor and AWS Health. It demonstrates how to use generative AI services such as Amazon Q Business to find patterns across your most frequent issues and to author new content, such as internal documentation or an FAQ. Support data presents a valuable opportunity to proactively address and prevent recurring issues in your AWS environment by using insights gained from past experiences. Embracing these insights enables a more resilient and optimized AWS experience tailored to your specific needs.

This solution can be extended to other internal data sources your company uses, so your teams can use natural language to identify optimization opportunities they can implement.


About the authors

Chitresh Saxena is a Sr. Technical Account Manager specializing in generative AI solutions and dedicated to helping customers successfully adopt AI/ML on AWS. He excels at understanding customer needs and provides technical guidance to build, launch, and scale AI solutions that solve complex business problems.

Jonathan Delfour is a Principal Technical Account Manager supporting Energy customers, providing top-notch support as part of the AWS Enterprise Support team. His technical guidance and unwavering commitment to excellence ensure that customers can leverage the full potential of AWS, optimizing their operations and driving success.

Krishna Atluru is an Enterprise Support Lead at AWS. He provides customers with in-depth guidance on improving security posture and operational excellence for their workloads. Outside of work, Krishna enjoys cooking, swimming and travelling.

Arish Labroo is a Principal Specialist Technical Account Manager – Builder supporting large AWS customers. He is focused on building strategic tools that help customers get the most value out of Enterprise Support.

Manik Chopra is a Principal Technical Account Manager at AWS. He helps customers adopt AWS services and provides guidance in various areas around Data Analytics and Optimization. His areas of expertise include delivering solutions using Amazon QuickSight, Amazon Athena, and various other automation techniques. Outside of work, he enjoys spending time outdoors and traveling.


Research Focus: Week of July 15, 2024


Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model

Diffusion probabilistic models have the capacity to generate high-fidelity samples for generative time series forecasting. However, they also present issues of instability due to their stochastic nature. In a recent article: MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model, researchers from Microsoft present MG-TSD, a novel approach aimed at tackling this challenge.

The MG-TSD model uses multiple granularity levels within the data to guide the learning process of diffusion models, without requiring additional data. For long-term forecasting, the researchers report a new state of the art, with relative improvements ranging from 4.7% to 35.8% across six benchmarks.
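To make the idea of granularity levels concrete, one common way to derive coarser views of a series is to average non-overlapping windows of the fine-grained data. The sketch below assumes this simple averaging and hypothetical window sizes (1, 2, 4); it illustrates granularity levels only and is not MG-TSD's guided diffusion training.

```python
from typing import Dict, List, Sequence


def coarsen(series: Sequence[float], window: int) -> List[float]:
    """Average non-overlapping windows; a trailing partial window is dropped."""
    if window < 1:
        raise ValueError("window must be >= 1")
    usable = len(series) // window * window
    return [sum(series[i:i + window]) / window for i in range(0, usable, window)]


def multi_granularity(series: Sequence[float],
                      windows: Sequence[int] = (1, 2, 4)) -> Dict[int, List[float]]:
    """Return one view of the series per granularity level, finest first."""
    return {window: coarsen(series, window) for window in windows}
```

For example, `multi_granularity([1, 2, 3, 4, 5, 6, 7, 8])[4]` is `[2.5, 6.5]`: coarser views smooth out short-term fluctuations while keeping the overall trend.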

The paper introducing this research, MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process, was presented at ICLR 2024.


Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Machine learning applications based on large language models (LLMs) have been widely deployed in consumer products. Growth in model size and training data has played an important role in this progress. Because larger models tend to be more accurate, future models are likely to keep growing, which vastly increases the computational and memory requirements of LLMs.

Mixture-of-Experts (MoE) architecture, which can increase model size without proportionally increasing computational requirements, was designed to address this challenge. Unfortunately, MoE’s high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE’s memory-hungry expert parameters to central processing unit (CPU) memory fall short.

In a recent paper: Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference, researchers from Microsoft address these challenges using algorithm-system co-design. Pre-gated MoE alleviates the dynamic nature of sparse expert activation, addressing the large memory footprint of MoEs while also sustaining high performance. The researchers demonstrate that pre-gated MoE improves performance, reduces graphics processing unit (GPU) memory consumption, and maintains model quality.
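A back-of-the-envelope cost model helps show why selecting experts one layer early matters when expert parameters are offloaded to CPU memory. The sketch below rests on idealized assumptions, not the paper's evaluation: every layer's selected experts must be fetched (time `fetch`), each layer then computes for time `compute`, and the gate's own cost is ignored. With conventional gating, a layer's fetch can start only after that layer's gate runs; with a pre-gate evaluated during the previous layer, the fetch overlaps that layer's computation.

```python
from typing import List, Sequence


def top_k_experts(gate_logits: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring experts (sparse top-k routing)."""
    return sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]


def conventional_latency(num_layers: int, compute: float, fetch: float) -> float:
    """Gate, then fetch the chosen experts, then compute; nothing overlaps."""
    return num_layers * (fetch + compute)


def pregated_latency(num_layers: int, compute: float, fetch: float) -> float:
    """Experts for layer i+1 are chosen during layer i, so their fetch overlaps
    layer i's compute; only the first layer's fetch is fully exposed."""
    if num_layers <= 0:
        return 0.0
    return fetch + (num_layers - 1) * max(compute, fetch) + compute
```

For example, with 4 layers, compute = 2, and fetch = 1 (arbitrary units), the conventional schedule takes 4 × (1 + 2) = 12 units, while the pre-gated schedule takes 1 + 3 × 2 + 2 = 9.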



What Matters in a Measure? A Perspective from Large-Scale Search Evaluation

Evaluation is a crucial aspect of information retrieval (IR) and has been thoroughly studied by academic and professional researchers for decades. Much of the research literature discusses techniques to produce a single number, reflecting the system’s performance: precision or cumulative gain, for example, or dozens of alternatives. Those techniques—metrics—are themselves evaluated, commonly by reference to sensitivity and validity.
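As a concrete example of metrics that reduce a ranked result list to a single number, the sketch below computes precision at rank k and discounted cumulative gain (DCG), a widely used member of the cumulative-gain family. It assumes graded relevance judgments are already available for each ranked result and uses the common linear-gain, log2-discount form of DCG; other variants exist.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """Fraction of the top-k results judged relevant (any grade above 0)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for grade in relevance[:k] if grade > 0) / k


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: each grade is divided by log2(rank + 1)."""
    return sum(grade / math.log2(rank + 1)
               for rank, grade in enumerate(relevance[:k], start=1))


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ranking, in [0, 1]."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0
```

Precision ignores where in the top k a relevant result appears, whereas DCG rewards placing the most relevant results first; differences like this shape what a system is incentivized to do when the metric becomes a goal.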

To measure search in industry settings, many other aspects must be considered. For example, how much a metric costs; how robust it is to the happenstance of sampling; whether it is debuggable; and what is incentivized when a metric is taken as a goal. In a recent paper: What Matters in a Measure? A Perspective from Large-Scale Search Evaluation, researchers from Microsoft discuss what makes a search metric successful in large-scale settings, including factors which are not often canvassed in IR research, but which are important in “real-world” use. The researchers illustrate this discussion with examples from industrial settings and elsewhere and offer suggestions for metrics as part of a working system.


LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data

Partial differential equations (PDEs) are ubiquitous in mathematically oriented scientific fields, such as physics and engineering. The ability to solve PDEs accurately and efficiently can deepen our understanding of the physical world. However, for many complex PDE systems, traditional solvers are too time-consuming. Recently, deep learning-based methods, including neural operators, have been used successfully to provide faster PDE solvers by approximating or enhancing conventional ones. However, these methods require a large amount of simulated data, which can be costly to collect. This requirement can be avoided by learning the physics from a physics-constrained loss, also known as the mean squared residual (MSR) loss, which is constructed from the discretized PDE.
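To illustrate what an MSR loss built from a discretized PDE looks like, the sketch below computes it for a toy problem: the 1D Poisson equation −u″(x) = f(x) on [0, 1], discretized with central finite differences. The problem, the grid, and the function name are assumptions chosen for illustration; LordNet's experiments use their own PDEs and discretizations. In physics-constrained training, a network would output the grid values `u`, and this loss would be minimized directly, with no simulated solutions required.

```python
from typing import Callable, Sequence


def msr_loss_poisson_1d(u: Sequence[float], f: Callable[[float], float]) -> float:
    """Mean squared residual of -u'' = f on [0, 1] using central differences.

    u holds the candidate solution at n + 1 evenly spaced grid points,
    including both boundaries; only interior points contribute residuals.
    """
    n = len(u) - 1
    if n < 2:
        raise ValueError("need at least one interior grid point")
    h = 1.0 / n
    residuals = []
    for i in range(1, n):
        # Discrete second derivative at grid point i.
        laplacian = (u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h)
        residuals.append(-laplacian - f(i * h))
    return sum(r * r for r in residuals) / len(residuals)
```

Because central differences are exact for quadratics, the true solution u(x) = x(1 − x) with f(x) = 2 drives the loss to (numerically) zero, while a wrong guess such as u = 0 gives a loss of 4. Boundary conditions are not part of this residual; in practice they would be enforced separately, for example by fixing the boundary values.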

In a recent paper: LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data, researchers from Microsoft investigate the physical information in the MSR loss, which they call long-range entanglements. They identify the core challenge: the neural network must model long-range entanglements in the spatial domain of the PDE, whose patterns vary. To tackle this challenge, they propose LordNet, a tunable and efficient neural network for modeling various entanglements. Their tests show that LordNet can be 40× faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in accuracy and efficiency while having the smallest parameter size.


FXAM: A unified and fast interpretable model for predictive analytics

The generalized additive model (GAM) is a standard for interpretability. However, because one-to-many and many-to-one phenomena appear commonly in real-world scenarios, existing GAMs are limited in both accuracy and training efficiency when serving predictive analytics. In a recent paper: FXAM: A unified and fast interpretable model for predictive analytics, researchers from Microsoft propose FXAM (Fast and eXplainable Additive Model), a unified and fast interpretable model for predictive analytics. FXAM extends GAM’s modeling capability with a unified additive model for numerical, categorical, and temporal features. FXAM uses a novel training procedure called three-stage iteration (TSI), whose three stages correspond to learning over numerical, categorical, and temporal features, respectively. Each stage learns a local optimum while the parameters of the other stages are held fixed. The researchers design joint learning over categorical features and partial learning over temporal features to achieve high accuracy and training efficiency. They show that TSI is mathematically guaranteed to converge to the global optimum. They further propose a set of optimization techniques to speed up FXAM’s training algorithm to meet the needs of interactive analysis.
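Learning one group of features while the others are held fixed is the classic backfitting scheme for additive models. The sketch below shows a two-stage version (a pre-binned numerical feature and a categorical feature) fitted by alternating updates on partial residuals. It is a simplified stand-in under stated assumptions: it omits FXAM's temporal stage and its joint and partial learning refinements, and the pre-binning of the numerical feature is an assumption.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Model = Tuple[float, Dict[Hashable, float], Dict[Hashable, float]]


def _mean_by_key(keys: Sequence[Hashable], residuals: Sequence[float]) -> Dict[Hashable, float]:
    """Average the residuals that share each key (a piecewise-constant fit)."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for key, value in zip(keys, residuals):
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def backfit(num_bins: Sequence[int], categories: Sequence[str],
            y: Sequence[float], iterations: int = 20) -> Model:
    """Fit y ≈ intercept + f_num(bin) + f_cat(category) by alternating stages."""
    intercept = sum(y) / len(y)
    f_num: Dict[Hashable, float] = {}
    f_cat: Dict[Hashable, float] = {}
    for _ in range(iterations):
        # Stage 1: refit the numerical shape function with f_cat held fixed.
        residuals = [yi - intercept - f_cat.get(c, 0.0) for yi, c in zip(y, categories)]
        f_num = _mean_by_key(num_bins, residuals)
        # Stage 2: refit the categorical effects with f_num held fixed.
        residuals = [yi - intercept - f_num[b] for yi, b in zip(y, num_bins)]
        f_cat = _mean_by_key(categories, residuals)
    return intercept, f_num, f_cat


def predict(model: Model, num_bin: int, category: str) -> float:
    """Sum the intercept and the two learned effects for one example."""
    intercept, f_num, f_cat = model
    return intercept + f_num.get(num_bin, 0.0) + f_cat.get(category, 0.0)
```

On balanced toy data where y is exactly an intercept plus one effect per feature, for example bins `[0, 0, 1, 1]`, categories `["a", "b", "a", "b"]`, and y `[11, 7, 13, 9]`, the alternating updates recover the effects after the first pass.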

Microsoft Research in the news


Sriram Rajamani at Microsoft Research on AI and deep tech in India 

Forbes India | June 28, 2024

Sriram K Rajamani, managing director of Microsoft Research India Lab, reflects on computer science and engineering research, including how AI and LLMs can help solve local needs. Rajamani also discusses the technical aspects of how modern AI models work, and best practices from the research lab that could apply to India’s deep tech ecosystem.

The post Research Focus: Week of July 15, 2024 appeared first on Microsoft Research.


Decoding How AI-Powered Upscaling on NVIDIA RTX Improves Video Quality


Editor’s note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for RTX PC and workstation users.

Video is everywhere — nearly 80% of internet bandwidth today is used to stream video from content providers and social networks. While screens have become bigger and support higher resolutions, nearly all video is only 1080p quality or lower.

Upscalers can help sharpen streamed video and, powered by AI on the NVIDIA RTX platform, significantly enhance image quality and detail.

What Is an Upscaler?

Because video files are much larger than images or text, they are harder to compress and transmit. Platforms like Netflix, Vimeo and YouTube work around this limitation by encoding video, the process of compressing the raw source of a video into a smaller container format.

The encoder first analyzes the video to decide what information it can remove to make it fit a target resolution and frame rate. If the target bitrate is insufficient, the video quality decreases, resulting in a loss of detail and sharpness and the presence of encoding artifacts. The smaller the file, the easier it is to share on the internet — but the worse it looks.

Typically, software on the viewer’s device will upscale the video file to fit the display’s native resolution. However, these upscalers are fairly simplistic, merely multiplying pixels to meet the desired resolution. They can help sharpen the outlines of objects and scenes, but the final video typically carries encoding artifacts and sometimes looks over-sharpened and unnatural.
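The simplest form of multiplying pixels is nearest-neighbor upscaling, where each source pixel is repeated to fill a larger block. The sketch below assumes a grayscale image stored as a list of rows and an integer scale factor; real players typically use smoother filters such as bilinear or bicubic interpolation, but these methods only interpolate existing samples and cannot add detail the encoder removed.

```python
from typing import List

Image = List[List[int]]


def upscale_nearest(image: Image, factor: int) -> Image:
    """Repeat every pixel `factor` times horizontally and vertically."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upscaled: Image = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Copy the widened row `factor` times so rows are independent lists.
        upscaled.extend(list(wide_row) for _ in range(factor))
    return upscaled
```

For example, a 1920×1080 frame scaled by 2 this way becomes 3840×2160 (4K UHD) with four identical pixels per source pixel, so any blocking artifacts are simply enlarged.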

AI Know a Better Way

The NVIDIA RTX platform uses AI to easily de-artifact and upscale videos.


The process of AI upscaling involves analyzing images and motion vectors to generate new details not present in the original video. Instead of merely multiplying pixels, it recognizes the patterns of the image and enhances them to provide greater detail and video quality.

Images must first be de-artifacted before any processing begins. Artifacts, or unwanted distortions and anomalies that appear in video and image files, occur due to overcompression or data loss during transmission and storage.

NVIDIA AI networks can de-artifact images, helping remove blocky areas sometimes seen in streamed video. Without this first step, AI upscalers might end up enhancing the artifacted image itself instead of the desired content.

Super-Sized Video

Just like putting on a pair of prescription glasses can instantly snap the world into focus, RTX Video Super Resolution, one of NVIDIA’s latest innovations in AI-enhanced video technology, gives users a clearer picture into the world of streamed video.

Comparison of bicubic upscaling (left) and RTX Video Super Resolution (right).

Available on GeForce RTX 40 and 30 Series GPUs and RTX professional GPUs, it uses AI running on dedicated Tensor Cores to remove block compression artifacts and upscale lower-resolution content up to 4K, matching the user’s native display resolution.

RTX Video Super Resolution can be used to enhance all video watched on browsers. By combining de-artifacting with AI upscaling techniques, it can make even low-bitrate Twitch streams look stunningly clear. RTX Video Super Resolution is also supported in popular video apps like VLC so users can apply the same upscaling process to their offline videos.

Creators can soon use RTX Video Super Resolution in editing apps like Blackmagic Design’s DaVinci Resolve, making it easier than ever to upscale lower-quality video files to 4K resolution, as well as convert standard-dynamic-range source files into high-dynamic range (HDR).

Say Hi to High-Dynamic Range

RTX Video now also supports AI HDR. HDR video supports a wider range of colors, lending greater detail especially to the darker and lighter areas of images. The problem is that there isn’t that much HDR content online yet.

Enter RTX Video HDR — by simply turning on the feature, the AI network will turn any standard or low-dynamic-range content into HDR, performing the correct tone mapping so the image still looks natural and retains its original colors.
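For reference, HDR video commonly carries brightness using the SMPTE ST 2084 perceptual quantizer (PQ) curve, which maps absolute luminance up to 10,000 nits to a signal between 0 and 1. The sketch below is a fixed, non-AI SDR-to-PQ conversion under stated assumptions: a 2.4 display gamma for the SDR input and SDR white placed at 203 nits (the reference white suggested in ITU-R BT.2408). It only shows what the target encoding looks like; it does not reproduce how RTX Video HDR's network expands highlights or chooses its tone mapping.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32


def pq_encode(nits: float) -> float:
    """Inverse PQ EOTF: absolute luminance in nits -> PQ signal in [0, 1]."""
    y = min(max(nits / 10000.0, 0.0), 1.0)
    ym = y ** M1
    return ((C1 + C2 * ym) / (1.0 + C3 * ym)) ** M2


def sdr_to_pq(code_value: float, reference_white_nits: float = 203.0,
              gamma: float = 2.4) -> float:
    """Naive SDR -> PQ: linearize with a power-law gamma (assumed 2.4), place SDR
    white at an assumed reference level, then PQ-encode. No highlight expansion."""
    linear = min(max(code_value, 0.0), 1.0) ** gamma
    return pq_encode(linear * reference_white_nits)
```

With these assumptions, SDR white lands at a PQ signal of about 0.58, leaving the upper part of the range for highlights, which is the headroom an inverse tone-mapping step has to fill.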

AI Across the Board

RTX Video is just the latest implementation of AI upscaling powered by NVIDIA RTX.

Members of the GeForce NOW cloud streaming service can play their favorite PC games on nearly any device. GeForce RTX servers located all over the world first render the game video content, encode it and then stream it to the player’s local device — just like streaming video from other content providers.

Members on older NVIDIA GPU-powered devices can still use AI-enhanced upscaling to improve gameplay quality. This means they can enjoy the best of both worlds — gameplay rendered on servers powered by RTX 4080-class GPUs in the cloud and AI-enhanced streaming quality. Get more information on enabling AI-enhanced upscaling on GeForce NOW.

The NVIDIA SHIELD TV takes this one step further, processing AI neural networks directly on its NVIDIA Tegra system-on-a-chip to upscale 1080p-quality or lower content from nearly any streaming platform to a display’s native resolution. That means users can improve the video quality of content streamed from Netflix, Prime Video, Max, Disney+ and more at the push of a remote button.

SHIELD TV is currently available for up to $30 off in North America and £30 or 35€ off in Europe as part of Amazon’s Prime Day event running July 16-17. For Prime members in Europe, eligible SHIELD TV purchases also include one month of the GeForce NOW Ultimate membership for free, enabling GeForce RTX 4080-class PC gameplay streamed directly to the living room.

AI has enabled unprecedented improvements in video quality, helping set a new standard in streaming experiences.

Generative AI is transforming gaming, videoconferencing and interactive experiences of all kinds. Make sense of what’s new and what’s next by subscribing to the AI Decoded newsletter.


PINE: Efficient Norm-Bound Verification for Secret-Shared Vectors

Secure aggregation of high-dimensional vectors is a fundamental primitive in federated statistics and learning. A two-server system such as PRIO allows for scalable aggregation of secret-shared vectors. Adversarial clients might try to manipulate the aggregate, so it is important to ensure that each (secret-shared) contribution is well-formed. In this work, we focus on the important and well-studied goal of ensuring that each contribution vector has bounded Euclidean norm. Existing protocols for ensuring bounded-norm contributions either incur a large communication overhead, or only allow for…

Apple Machine Learning Research
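For background on the setting, the sketch below shows the two-server additive secret sharing that PRIO-style aggregation builds on: each client splits its integer vector into two random-looking shares modulo a prime, each server sums only the shares it receives, and combining the two sums reveals only the aggregate. The modulus, the function names, and the in-the-clear norm check are assumptions for illustration; PINE's contribution is verifying the norm bound on the shares without reconstructing any individual vector, which this sketch does not do.

```python
import secrets
from typing import List, Sequence, Tuple

P = 2**61 - 1  # a Mersenne prime used as the field modulus (illustrative choice)


def share(vector: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split an integer vector into two additive shares modulo P."""
    first = [secrets.randbelow(P) for _ in vector]
    second = [(x - r) % P for x, r in zip(vector, first)]
    return first, second


def server_sum(shares: Sequence[Sequence[int]]) -> List[int]:
    """What one server computes: the coordinate-wise sum of its shares mod P."""
    return [sum(column) % P for column in zip(*shares)]


def reconstruct(sum_a: Sequence[int], sum_b: Sequence[int]) -> List[int]:
    """Combine both servers' sums and map field elements back to signed integers."""
    combined = [(a + b) % P for a, b in zip(sum_a, sum_b)]
    return [v if v <= P // 2 else v - P for v in combined]


def within_norm_bound(vector: Sequence[int], bound: int) -> bool:
    """The property PINE verifies on shares; checked here in the clear."""
    return sum(x * x for x in vector) <= bound * bound
```

Each share on its own is uniformly random, so neither server learns an individual client's vector; a malformed vector with a huge norm, however, would still be summed, which is exactly the gap norm-bound verification closes.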

Projected Language Models: A Large Model Pre-Segmented Into Smaller Ones

This paper has been accepted at the Foundation Models in the Wild workshop at ICML 2024.
Large language models are versatile tools but are not suitable for small inference budgets. Small models have more efficient inference but their lower capacity means that their performance can be good only if one limits their scope to a specialized domain. This paper explores how to get a small language model with good specialized accuracy, even when specialization data is unknown during pretraining. We propose a novel architecture, projected networks (PN). PN is a high capacity network whose parameters…

Apple Machine Learning Research