Improved ML model deployment using Amazon SageMaker Inference Recommender

Each machine learning (ML) system has a unique service level agreement (SLA) requirement with respect to latency, throughput, and cost metrics. With advancements in hardware design, a wide range of CPU- and GPU-based infrastructures are available to help you speed up inference performance. Also, you can build these ML systems with a combination of ML models, tasks, frameworks, libraries, tools, and inference engines, making it important to evaluate the ML system performance for the best possible deployment configurations. You need recommendations on finding the most cost-effective ML serving infrastructure and the right combination of software configuration to achieve the best price-performance to scale these applications.

Amazon SageMaker Inference Recommender is a capability of Amazon SageMaker that reduces the time required to get ML models in production by automating load testing and model tuning across SageMaker ML instances. In this post, we highlight some of the recent updates to Inference Recommender:

  • SageMaker Python SDK support for running Inference Recommender
  • Inference Recommender usability improvements
  • New APIs that provide flexibility in running Inference Recommender
  • Deeper integration with Amazon CloudWatch for logging and metrics

Credit card fraud detection use case

Any fraudulent activity that is not detected and mitigated immediately can cause significant financial loss. Particularly, credit card payment fraud transactions need to be identified right away to protect the individual’s and company’s financial health. In this post, we discuss a credit card fraud detection use case, and learn how to use Inference Recommender to find the optimal inference instance type and ML system configurations that can detect fraudulent credit card transactions in milliseconds.

We demonstrate how to set up Inference Recommender jobs for a credit card fraud detection use case. We train an XGBoost model for a classification task on a credit card fraud dataset. We use Inference Recommender with a custom load test to meet inference SLA requirements: satisfy a peak concurrency of 30,000 transactions per minute while serving prediction results in less than 100 milliseconds. Based on Inference Recommender’s instance type recommendations, we can find the right real-time serving ML instances that yield the right price-performance for this use case. Finally, we deploy the model to a SageMaker real-time endpoint to get prediction results.

The following table summarizes the details of our use case.

Model Framework XGBoost
Model Size 10 MB
End-to-End Latency 100 milliseconds
Invocations per Second 500 (30,000 per minute)
ML Task Binary Classification
Input Payload 10 KB

We use a synthetically created credit card fraud dataset. The dataset contains 28 numerical features, the transaction time, the transaction amount, and the Class target variable. The Class column indicates whether or not a transaction is fraudulent. The majority of the data is non-fraudulent (284,315 samples), with only 492 fraudulent samples. In the data, Class is the target classification variable (fraudulent vs. non-fraudulent) in the first column, followed by the other variables.

In the following sections, we show how to use Inference Recommender to get ML hosting instance type recommendations and find optimal model configurations to achieve better price-performance for your inference application.

Which ML instance type and configurations should you select?

With Inference Recommender, you can run two types of jobs: default and advanced.

The default Inference Recommender job runs a set of load tests to recommend the right ML instance types for any ML use case. SageMaker real-time deployment supports a wide range of ML instances to host and serve the credit card fraud detection XGBoost model. The default job can run a load test on a selection of instances that you provide in the job configuration. If you have an existing endpoint for this use case, you can run this job to find the cost-optimized performant instance type. Inference Recommender compiles and optimizes the model for the specific hardware of the inference endpoint instance type using Amazon SageMaker Neo. It’s important to note that not all compilations result in improved performance. Inference Recommender reports compilation details when the following conditions are met:

  • Successful compilation of the model using Neo. The compilation can fail due to issues such as an invalid payload or data type, in which case compilation information is not available.
  • Successful inference using the compiled model that shows performance improvement, which appears in the inference job response.

An advanced job is a custom load test job that allows you to perform extensive benchmarks based on your ML application SLA requirements, such as latency, concurrency, and traffic pattern. You can configure a custom traffic pattern to simulate credit card transactions. Additionally, you can define the end-to-end model latency to predict if a transaction is fraudulent and define the maximum concurrent transactions to the model for prediction. Inference Recommender uses this information to run a performance benchmark load test. The latency, concurrency, and cost metrics from the advanced job help you make informed decisions about the ML serving infrastructure for mission-critical applications.

Solution overview

The following diagram shows the solution architecture for training an XGBoost model on the credit card fraud dataset, running a default job for instance type recommendation, and performing load testing to decide the optimal inference configuration for the best price-performance.

The diagram shows the following steps:

  1. Train an XGBoost model to classify credit card transactions as fraudulent or legitimate. Deploy the trained model to a SageMaker real-time endpoint. Package the model artifacts and sample payload (.tar.gz format), and upload them to Amazon Simple Storage Service (Amazon S3) so Inference Recommender can use them when the job is run (see the packaging sketch after this list). Note that the training step in this post is optional.
  2. Configure and run a default Inference Recommender job on a list of supported instance types to find the right ML instance type that gives the best price-performance for this use case.
  3. Optionally, run a default Inference Recommender job on an existing endpoint.
  4. Configure and run an advanced Inference Recommender job to perform a custom load test to simulate user interactions with the credit card fraud detection application. This helps you find the right configurations to satisfy latency, concurrency, and cost for this use case.
  5. Analyze the default and advanced Inference Recommender job results, which include ML instance type recommendation latency, performance, and cost metrics.

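A minimal sketch of the packaging step follows, assuming a trained model file named xgboost-model and a sample payload file named sample-payload.csv (both file names and S3 prefixes are placeholders; use your own artifacts):

import tarfile

import sagemaker

sagemaker_session = sagemaker.Session()

# Bundle the trained XGBoost model artifact and a sample request payload.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("xgboost-model")  # serialized model file produced by training

with tarfile.open("payload.tar.gz", "w:gz") as tar:
    tar.add("sample-payload.csv")  # ~10 KB sample request body

# Upload both archives to Amazon S3 so Inference Recommender can reference them.
model_url = sagemaker_session.upload_data("model.tar.gz", key_prefix="fraud-detection/model")
sample_payload_url = sagemaker_session.upload_data("payload.tar.gz", key_prefix="fraud-detection/payload")
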
A complete example is available in our GitHub notebook.

Prerequisites

To use Inference Recommender, make sure to meet the prerequisites.

Python SDK support for Inference Recommender

We recently released Python SDK support for Inference Recommender. You can now run default and advanced jobs using a single function: right_size. Based on the parameters of the function call, Inference Recommender infers if it should run default or advanced jobs. This greatly simplifies the use of Inference Recommender using the Python SDK. To run the Inference Recommender job, complete the following steps:

  1. Create a SageMaker model by specifying the framework, version, and image scope:
    model = Model(
        model_data=model_url,
        role=role,
        image_uri=sagemaker.image_uris.retrieve(
            framework="xgboost",
            region=region,
            version="1.5-1",
            py_version="py3",
            image_scope="inference",
        ),
        sagemaker_session=sagemaker_session,
    )

  2. Optionally, register the model in the SageMaker model registry. Note that parameters such as domain and task during model package creation are also optional parameters in the recent release.
    model_package = model.register(
        content_types=["text/csv"],
        response_types=["text/csv"],
        model_package_group_name=model_package_group_name,
        image_uri=model.image_uri,
        approval_status="Approved",
        framework="XGBOOST"
    )

  3. Run the right_size function on the supported ML inference instance types using the following configuration. Because XGBoost is a memory-intensive algorithm, we provide ml.m5 type instances to get instance type recommendations. You can call the right_size function on the model registry object as well.
    model.right_size(
        sample_payload_url=sample_payload_url,
        supported_content_types=["text/csv"],
        supported_instance_types=["ml.m5.large", 
                                  "ml.m5.xlarge", 
                                  "ml.m5.2xlarge", 
                                  "ml.m5.4xlarge", 
                                  "ml.m5.12xlarge"],
        framework="XGBOOST",
        job_name="credit-card-fraud-default-job"
    )
    INFO:sagemaker:Advance Job parameters were not specified. Running Default job...

  4. Define additional parameters to the right_size function to run an advanced job and custom load test on the model:
    1. Configure the traffic pattern using the phases parameter. In the first phase, we start the load test with two initial users and spawn two new users every minute for 2 minutes. In the second phase, we start with six initial users and again spawn two new users every minute for 2 minutes. The stopping conditions for the load test are a p95 end-to-end latency of 100 milliseconds and a concurrency that supports 30,000 transactions per minute (500 transactions per second).
    2. We tune the endpoint against the environment variable OMP_NUM_THREADS with the values [3, 4, 5], with the goal of limiting latency to 100 milliseconds and achieving a maximum concurrency of 30,000 invocations per minute. The goal is to find which OMP_NUM_THREADS value provides the best performance.
from sagemaker.parameter import CategoricalParameter 
from sagemaker.inference_recommender.inference_recommender_mixin import (  
    Phase,  
    ModelLatencyThreshold 
) 
hyperparameter_ranges = [ 
    { 
        "instance_types": CategoricalParameter(["ml.m5.4xlarge"]), 
        "OMP_NUM_THREADS": CategoricalParameter(["3", "4", "6"]),
    } 
] 
phases = [ 
    Phase(duration_in_seconds=120, initial_number_of_users=2, spawn_rate=2), 
    Phase(duration_in_seconds=120, initial_number_of_users=6, spawn_rate=2) 
] 
model_latency_thresholds = [ 
    ModelLatencyThreshold(percentile="P95", value_in_milliseconds=100) 
]

model.right_size( 
    sample_payload_url=sample_payload_url, 
    supported_content_types=["text/csv"], 
    framework="XGBOOST", 
    job_duration_in_seconds=7200, 
    hyperparameter_ranges=hyperparameter_ranges, 
    phases=phases, # TrafficPattern 
    max_invocations=30000, # StoppingConditions 
    model_latency_thresholds=model_latency_thresholds,
    job_name="credit-card-fraud-advanced-job"
)
INFO:sagemaker:Advance Job parameters were specified. Running Advanced job...

Run Inference Recommender jobs using the Boto3 API

You can use the Boto3 API to launch Inference Recommender default and advanced jobs. You need to use the Boto3 API (create_inference_recommendations_job) to run Inference Recommender jobs on an existing endpoint. Inference Recommender infers the framework and version from the existing SageMaker real-time endpoint. The Python SDK doesn’t support running Inference Recommender jobs on existing endpoints.

The following code snippet shows how to create a default job:

sagemaker_client.create_inference_recommendations_job(
    JobName="credit-card-fraud-default-job",
    JobType='Default',
    RoleArn=<ROLE_ARN>,
    InputConfig={
        'ModelPackageVersionArn': <MODEL_PACKAGE_ARN>,  # optional
        'Endpoints': [{'EndpointName': <ENDPOINT_NAME>}]
    }
)

Next, we discuss the parameters needed to configure an advanced job.

Configure a traffic pattern using the TrafficPattern parameter. In the first phase, we start the load test with two initial users (InitialNumberOfUsers) and spawn two new users (SpawnRate) every minute for 2 minutes (DurationInSeconds). In the second phase, we start with six initial users and again spawn two new users every minute for 2 minutes. The stopping conditions (StoppingConditions) for the load test are a p95 end-to-end latency (ModelLatencyThresholds) of 100 milliseconds (ValueInMilliseconds) and a concurrency that supports 30,000 transactions per minute or 500 transactions per second (MaxInvocations). See the following code:

env_parameter_ranges = [{"Name": "OMP_NUM_THREADS", "Value": ["3", "4", "5"]}]

sagemaker_client.create_inference_recommendations_job(
    JobName=load_test_job_name,
    JobType='Advanced',
    RoleArn=role_arn,
    InputConfig={
        'ModelPackageVersionArn': model_package_arn,  # optional
        'JobDurationInSeconds': 7200,
        'TrafficPattern': {
            'TrafficType': 'PHASES',
            'Phases': [
                {'InitialNumberOfUsers': 2, 'SpawnRate': 2, 'DurationInSeconds': 120},
                {'InitialNumberOfUsers': 6, 'SpawnRate': 2, 'DurationInSeconds': 120}
            ]
        },
        'ResourceLimit': {'MaxNumberOfTests': 10, 'MaxParallelOfTests': 3},
        'EndpointConfigurations': [{
            'InstanceType': 'ml.m5.4xlarge',
            'EnvironmentParameterRanges': {'CategoricalParameterRanges': env_parameter_ranges}
        }]
    },
    StoppingConditions={
        'MaxInvocations': 30000,
        'ModelLatencyThresholds': [{'Percentile': 'P95', 'ValueInMilliseconds': 100}]
    }
)

Inference Recommender job results and metrics

The results of the default Inference Recommender job contain a list of endpoint configuration recommendations, including instance type, instance count, and environment variables. The results contain configurations for SAGEMAKER_MODEL_SERVER_WORKERS and OMP_NUM_THREADS associated with the latency, concurrency, and throughput metrics. OMP_NUM_THREADS is a tunable model server environment parameter. As shown in the following table, with an ml.m5.4xlarge instance with SAGEMAKER_MODEL_SERVER_WORKERS=3 and OMP_NUM_THREADS=3, we got a throughput of 32,628 invocations per minute and model latency under 10 milliseconds. ml.m5.4xlarge had a 100% improvement in latency and an approximately 115% increase in concurrency compared to the ml.m5.xlarge instance configuration. It was also 66% more cost-effective than the ml.m5.12xlarge instance configuration while achieving comparable latency and throughput.

| Instance Type | Initial Instance Count | OMP_NUM_THREADS | Cost Per Hour (USD) | Max Invocations | Model Latency (ms) | CPU Utilization (%) | Memory Utilization (%) | SageMaker Model Server Workers |
| ml.m5.xlarge | 1 | 2 | 0.23 | 15189 | 18 | 108.864 | 1.62012 | 1 |
| ml.m5.4xlarge | 1 | 3 | 0.922 | 32628 | 9 | 220.57001 | 0.69791 | 3 |
| ml.m5.large | 1 | 2 | 0.115 | 13793 | 19 | 106.34 | 3.24398 | 1 |
| ml.m5.12xlarge | 1 | 4 | 2.765 | 32016 | 4 | 215.32401 | 0.44658 | 7 |
| ml.m5.2xlarge | 1 | 2 | 0.461 | 32427 | 13 | 248.673 | 1.43109 | 3 |
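
You can also pull the same recommendation metrics programmatically. The following is a minimal sketch (not the notebook’s exact code) that uses describe_inference_recommendations_job and pandas to build a comparison table like the preceding one, assuming the default job name created earlier:

import boto3
import pandas as pd

sagemaker_client = boto3.client("sagemaker", region_name=region)

job = sagemaker_client.describe_inference_recommendations_job(
    JobName="credit-card-fraud-default-job"
)

# Flatten each recommendation into one row: endpoint config, metrics, and env parameters.
rows = []
for rec in job["InferenceRecommendations"]:
    rows.append({
        "InstanceType": rec["EndpointConfiguration"]["InstanceType"],
        "InitialInstanceCount": rec["EndpointConfiguration"]["InitialInstanceCount"],
        **rec["Metrics"],
        **{p["Key"]: p["Value"] for p in rec["ModelConfiguration"]["EnvironmentParameters"]},
    })

df = pd.DataFrame(rows).sort_values("CostPerHour")
print(df)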

We have included CloudWatch helper functions in the notebook. You can use the functions to get detailed charts of your endpoints during the load test. The charts have details on invocation metrics like invocations, model latency, overhead latency, and more, and instance metrics such as CPUUtilization and MemoryUtilization. The following example shows the CloudWatch metrics for our ml.m5.4xlarge model configuration.
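
If you prefer to query the metrics yourself, the following sketch shows the kind of CloudWatch call those helper functions wrap; the endpoint and variant names are placeholders for the endpoints that Inference Recommender creates during the load test:

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch", region_name=region)

def endpoint_metric(endpoint_name, variant_name, metric_name, stat, minutes=60):
    """Fetch one SageMaker endpoint metric over the load test window."""
    end = datetime.utcnow()
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName=metric_name,
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        StartTime=end - timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=[stat],
    )
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])

# Invocations per minute and model latency (ModelLatency is reported in microseconds).
invocations = endpoint_metric("<ENDPOINT_NAME>", "<VARIANT_NAME>", "Invocations", "Sum")
model_latency = endpoint_metric("<ENDPOINT_NAME>", "<VARIANT_NAME>", "ModelLatency", "Average")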

You can visualize Inference Recommender job results in Amazon SageMaker Studio by choosing Inference Recommender under Deployments in the navigation pane. With a deployment goal for this use case (low latency, high throughput, default cost), the default Inference Recommender job recommended an ml.m5.4xlarge instance because it provided the best latency and throughput, supporting a maximum of 34,600 invocations per minute (576 TPS). You can use these metrics to analyze and find the best configurations that satisfy the latency, concurrency, and cost requirements of your ML application.

We recently introduced ListInferenceRecommendationsJobSteps, which allows you to analyze subtasks in an Inference Recommender job. The following code snippet shows how to use the list_inference_recommendations_job_steps Boto3 API to get the list of subtasks. This can help with debugging Inference Recommender job failures at the step level. This functionality is not supported in the Python SDK yet.

sm_client = boto3.client("sagemaker", region_name=region)
list_job_steps_response = sm_client.list_inference_recommendations_job_steps
                          (JobName='<JOB_NAME>')
print(list_job_steps_response)

The following code shows the response:

{
    "Steps": [
        {
            "StepType": "BENCHMARK",
            "JobName": "SMPYTHONSDK-<JOB_NAME>",
            "Status": "COMPLETED",
            "InferenceBenchmark": {
                "Metrics": {
                    "CostPerHour": 1.8359999656677246,
                    "CostPerInference": 1.6814110495033674e-06,
                    "MaxInvocations": 18199,
                    "ModelLatency": 40,
                    "CpuUtilization": 106.06400299072266,
                    "MemoryUtilization": 0.3920480012893677
                },
                "EndpointConfiguration": {
                    "EndpointName": "sm-epc-<ENDPOINTNAME>",
                    "VariantName": "sm-epc-<VARIANTNAME>",
                    "InstanceType": "ml.c5.9xlarge",
                    "InitialInstanceCount": 1
                },
                "ModelConfiguration": {
                    "EnvironmentParameters": [
                        {
                            "Key": "SAGEMAKER_MODEL_SERVER_WORKERS",
                            "ValueType": "String",
                            "Value": "1"
                        },
                        {
                            "Key": "OMP_NUM_THREADS",
                            "ValueType": "String",
                            "Value": "28"
                        }
                    ]
                }
            }
        },
     ...... <TRUNCATED>
    "ResponseMetadata": {
        "RequestId": "<RequestId>",
        "HTTPStatusCode": 200,
        "HTTPHeaders": {
            "x-amzn-requestid": "<x-amzn-requestid>",
            "content-type": "application/x-amz-json-1.1",
            "content-length": "1443",
            "date": "Mon, 20 Feb 2023 16:53:30 GMT"
        },
        "RetryAttempts": 0
    }
}
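
Because each BENCHMARK step carries its own metrics and endpoint configuration, you can flatten the response into a table for quick comparison. The following is a small sketch under the assumption that list_job_steps_response comes from the call above and that pandas is available:

import pandas as pd

# Keep only completed benchmark steps and merge instance type with its metrics.
benchmarks = pd.DataFrame([
    {
        "InstanceType": step["InferenceBenchmark"]["EndpointConfiguration"]["InstanceType"],
        **step["InferenceBenchmark"]["Metrics"],
    }
    for step in list_job_steps_response["Steps"]
    if step["StepType"] == "BENCHMARK" and step["Status"] == "COMPLETED"
])

print(benchmarks.sort_values("ModelLatency"))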

Run an advanced Inference Recommender job

Next, we run an advanced Inference Recommender job to find optimal configurations such as SAGEMAKER_MODEL_SERVER_WORKERS and OMP_NUM_THREADS on an ml.m5.4xlarge instance type. We set the hyperparameters of the advanced job to run a load test on different combinations:

hyperparameter_ranges = [ 
    { 
        "instance_types": CategoricalParameter(["ml.m5.4xlarge"]), 
        "OMP_NUM_THREADS": CategoricalParameter(["3", "4", "6"]),
    } 
]

You can view the advanced Inference Recommender job results on the Studio console, as shown in the following screenshot.

Using the Boto3 API or CLI commands, you can access all the metrics from the advanced Inference Recommender job results. InitialInstanceCount is the number of instances that you should provision in the endpoint to meet ModelLatencyThresholds and MaxInvocations mentioned in StoppingConditions. The following table summarizes our results.

| Instance Type | Initial Instance Count | OMP_NUM_THREADS | Cost Per Hour (USD) | Max Invocations | Model Latency (ms) | CPU Utilization (%) | Memory Utilization (%) |
| ml.m5.2xlarge | 2 | 3 | 0.922 | 39688 | 6 | 86.732803 | 3.04769 |
| ml.m5.2xlarge | 2 | 4 | 0.922 | 42604 | 6 | 177.164993 | 3.05089 |
| ml.m5.2xlarge | 2 | 5 | 0.922 | 39268 | 6 | 125.402 | 3.08665 |
| ml.m5.4xlarge | 2 | 3 | 1.844 | 38174 | 4 | 102.546997 | 2.68003 |
| ml.m5.4xlarge | 2 | 4 | 1.844 | 39452 | 4 | 141.826004 | 2.68136 |
| ml.m5.4xlarge | 2 | 5 | 1.844 | 40472 | 4 | 107.825996 | 2.70936 |
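
With a configuration chosen from the table, the last step is to deploy the model to a SageMaker real-time endpoint. The following is a minimal sketch rather than the notebook’s exact code; the environment values are illustrative picks from the results above, and the payload string is a placeholder for a real CSV feature vector:

from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer

# Recreate the model with the chosen environment settings before deploying.
tuned_model = Model(
    model_data=model_url,
    role=role,
    image_uri=model.image_uri,
    env={"OMP_NUM_THREADS": "5", "SAGEMAKER_MODEL_SERVER_WORKERS": "3"},
    predictor_cls=Predictor,
    sagemaker_session=sagemaker_session,
)

predictor = tuned_model.deploy(
    initial_instance_count=2,
    instance_type="ml.m5.4xlarge",
    endpoint_name="credit-card-fraud-detection",
    serializer=CSVSerializer(),
)

# Score a single transaction; the payload must match the model's training feature order.
print(predictor.predict("<CSV_FEATURE_VECTOR>"))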

Clean up

Follow the instructions in the notebook to delete all the resources created as part of this post to avoid incurring additional charges.

Summary

Finding the right ML serving infrastructure, including instance type, model configurations, and auto scaling policies, can be tedious. This post showed how you can use the Inference Recommender Python SDK and Boto3 APIs to launch default and advanced jobs to find the optimal inference infrastructure and configurations. We also discussed the new improvements to Inference Recommender, including Python SDK support and usability improvements. Check out our GitHub repository to get started.


About the Authors

Shiva Raaj Kotini works as a Principal Product Manager in the AWS SageMaker inference product portfolio. He focuses on model deployment, performance tuning, and optimization in SageMaker for inference.

John Barboza is a Software Engineer at AWS. He has extensive experience working on distributed systems. His current focus is on improving the SageMaker inference experience. In his spare time, he enjoys cooking and biking.

Mohan Gandhi is a Senior Software Engineer at AWS. He has been with AWS for the last 10 years and has worked on various AWS services like Amazon EMR, Amazon EFA, and Amazon RDS. Currently, he is focused on improving the SageMaker inference experience. In his spare time, he enjoys hiking and marathons.

Ram Vegiraju is an ML Architect with the SageMaker service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.

Vikram Elango is a Sr. AI/ML Specialist Solutions Architect at AWS, based in Virginia, USA. He is currently focused on generative AI, LLMs, prompt engineering, large model inference optimization, and scaling ML across enterprises. Vikram helps financial and insurance industry customers with design and thought leadership to build and deploy machine learning applications at scale. In his spare time, he enjoys traveling, hiking, cooking, and camping with his family.

Read More

Don’t Wait: GeForce NOW Six-Month Priority Memberships on Sale for Limited Time

GFN Thursday rolls up this week with a hot new deal for a GeForce NOW six-month Priority membership.

Enjoy the cloud gaming service with seven new games to stream this week, including more favorites from Bandai Namco Europe and F1 2021 from Electronic Arts.

Make Gaming a Priority 

GeForce NOW Ecosystem
Psst, pass it on! Lock in a lower price now and access Priority for the next six months.

Starting today, GeForce NOW is offering a limited-time discount sure to spark up gamers looking to skip the wait and jump right into their games. Through Sunday, May 21, get 40% off a six-month Priority membership for $29.99, normally $49.99.

It’s perfect for those looking to try out GeForce NOW or lock in a lower price for another half-year with one of GeForce NOW’s premium membership tiers.

Priority members get priority access to GeForce gaming servers, meaning shorter wait times than free members, as well as gaming sessions extended up to six hours. Priority members can also stream across multiple devices with beautifully ray-traced graphics with RTX ON in supported games.

With new games added weekly and other top titles on the way, gamers won’t want to miss out on this hot deal. Sign up now or power up devices even further with an Ultimate membership, GeForce NOW’s highest performance tier.

The Dark Side of the Cloud

Dark Pictures Anthology on GeForce NOW
Get ready for a “Dark Pictures Anthology” series marathon.

Carve out time this month for The Dark Pictures Anthology series from Bandai Namco Europe, with support for ray tracing to bring a realistic tinge of spine-tingling horror to the cloud.

Each entry of the popular series features interactive horror stories with unique characters and plots. Choose what happens in each adventure while discovering branching narratives and multiple endings.

The first game in the series, Man of Medan, follows a group of young adults stranded on a ghost ship haunted by malevolent spirits. Little Hope, next in the series, features a group of college students and their professor who must escape a cursed New England town haunted by witches.

House of Ashes, the series’ third installment, features war soldiers who unknowingly awaken an ancient evil force when they stumble across an underground temple. The Devil in Me, set in a replica of a real-life serial killer’s “Murder Castle,” follows a documentary film crew as they set out to capture footage of the hotel, only to find their lives at grave risk.

With more Bandai Namco titles joining the cloud this week, it’ll be easy as pie to start a marathon for The Dark Pictures Anthology series, if you dare.

Take a Victory Lap 

F1 2021 on GeForce NOW
Light up the track with “F1 2021” in the cloud.

Jump into the seat of a Formula One driver in F1 2021 from Electronic Arts. Every story has its beginning, and members can start theirs with the Braking Point story mode, a thrilling experience with on-track competition and off-track drama.

Take on the Formula One world and go head to head against official teams and drivers in “Career Mode,” solo or with a buddy. Or, put the pedal to the metal with “Real-Season Start” mode to redo the F1 2021 season as desired, if Max Verstappen didn’t happen to be your winner of choice. There’s plenty of content with other gaming modes, including “My Team” and online multiplayer.

F1 2021 is fueled up with RTX ON for Ultimate and Priority members — the ultrarealistic gameplay will have players feeling as if they’re racing directly on the track.

Blazing-Hot New Games

Monster Hunter Rise Sunbreak on GeForce NOW

Sit back and relax with new games and a demo joining the cloud this week. Start with Monster Hunter Rise — first-time hunters can try it out for free with a demo of the game and its expansion, Sunbreak, available now to all GeForce NOW members.

Plus, the highly anticipated first-person action role-playing game Dead Island 2 launches in the cloud this week. In this sequel to the popular game Dead Island from Deep Silver, horror and dark humor mix well throughout the zombie-slaying adventure set in Los Angeles. Play solo if you dare, or with some buddies.

GFN Thursday keeps on rolling with seven hot new games available to stream this week:

  • Survival: Fountain of Youth (New release on Steam, April 19)
  • Dead Island 2 (New release on Epic Games Store, April 21)
  • F1 2021 (Steam)
  • The Dark Pictures Anthology: Man of Medan (Steam)
  • The Dark Pictures Anthology: Little Hope (Steam)
  • The Dark Pictures Anthology: House of Ashes (Steam)
  • The Dark Pictures Anthology: The Devil in Me (Steam)

That wraps up another GFN Thursday. Let us know how you’re making gaming a priority this weekend in the comments below, or on the GeForce NOW Facebook and Twitter channels.

Read More

NVIDIA Announces Partners of the Year in Europe, Middle East

NVIDIA today recognized a dozen partners for their work helping customers in Europe, the Middle East and Africa harness the power of AI across industries.

At a virtual EMEA Partner Day event, which was hosted by the NVIDIA Partner Network (NPN) and drew more than 750 registrants, Partner of the Year awards were given to companies working with NVIDIA to lead AI education and adoption across the region. The winners are transforming how their customers tap AI to improve data centers, manufacturing, sales and marketing workflows and more.

“NVIDIA partners have been at the forefront of technological advances and incredible business opportunities emerging across EMEA, using innovative NVIDIA solutions to help customers reduce costs, increase efficiency and solve their greatest challenges,” said Alfred Manhart, vice president of the EMEA channel at NVIDIA. “With the EMEA NPN Partner of the Year awards, we honor those who play a critical role in the success of our business as they apply their knowledge and expertise to deliver transformative solutions across a range of industries.”

The 2023 NPN award winners for EMEA are:

Central Europe

  • THINK ABOUT IT, Germany — Rising Star Partner of the Year. Recognized for driving exceptional revenue growth of close to 100% across the complete NVIDIA portfolio. Throughout four years of working with NVIDIA, the IT services company has become a cornerstone of the NVIDIA partner landscape in Germany.
  • Delta Computer Reinbek, Germany — Star Performer Partner of the Year. Recognized for outstanding sales achievement and customer relations, deploying NVIDIA high performance computing, machine learning, deep learning and AI solutions to enterprise, industry and higher education and research institutes throughout Germany.
  • MEGWARE Computer, Germany — Go-to-Market Excellence Partner of the Year. Recognized for its broad NVIDIA H100 marketing campaign, using its own benchmark center to increase awareness and generate leads.

Northern Europe

  • AMAX, Ireland — Rising Star Partner of the Year. Recognized for its significant commitment to helping customers meet their AI and HPC goals, and aligning with members of the NVIDIA Inception program for startups to create new opportunities within the enterprise and automotive sectors.
  • Boston Ltd., U.K. — Star Performer Partner of the Year. Recognized for exceptional execution across business areas and implementation of a full-stack approach to deliver complex solutions built on NVIDIA technologies, which have led Boston Ltd. to achieve record revenues.
  • Scan Computers International Ltd., U.K. — Go-to-Market Excellence Partner of the Year. Recognized for designing and delivering several successful marketing campaigns comprising high-quality digital content and a comprehensive user-experience strategy, resulting in a strong return on investment.

Southern Europe and Middle East

  • Computacenter, France — Rising Star Partner of the Year. Recognized for its fast growth across solution areas, addition of many new customers and close engagement with NVIDIA to drive significant revenue growth.
  • MBUZZ, United Arab Emirates — Star Performer Partner of the Year. Recognized for its dedication to increasing the adoption of NVIDIA technologies throughout the Middle East, growing revenue in areas ranging from HPC to visualization and achieving 100% annual growth.
  • Azken Muga, Spain — Go-to-Market Excellence Partner of the Year. Recognized for outstanding marketing performance, alignment with NVIDIA’s strategic goals and consistent investment in high-impact, high-quality marketing campaigns with a focus on the NVIDIA DGX platform.

Distribution Partners

  • PNY Technologies Europe — Distributor of the Year. Recognized for the second consecutive year for providing NVIDIA accelerated computing platforms and software to vertical markets — including media and entertainment, healthcare and cloud data center — as well as its commitment to delivering significant year-on-year sales growth.
  • TD Synnex — Networking Distributor of the Year. Recognized for its dedication toward, expertise in and understanding of both NVIDIA technologies and customers’ business needs, as well as its commitment to partner outreach and consistent reseller support.

Outstanding Impact

  • SoftServe — Outstanding Impact Award. Recognized for its commitment to innovating and collaborating with NVIDIA at all levels. SoftServe has dedicated significant time and resources to creating a practice based on the NVIDIA Omniverse platform for building and operating metaverse applications — building industry-specific showcases, developing dedicated Omniverse labs and training, and enabling thousands of its employees to tap NVIDIA solutions.

“It’s an honor for SoftServe to be recognized as the NPN Outstanding Impact Partner of the Year, an award that demonstrates the importance of collaborating with strong partners like NVIDIA to solve complex challenges using cutting-edge technologies,” said Volodymyr Semenyshyn, president of EMEA at SoftServe. “SoftServe is fully dedicated to advancing our vision of accelerated computing by combining NVIDIA’s trailblazing technologies with our strong industry expertise to deliver leading IT solutions and services that empower our customers.”

This year’s awards arrive as AI adoption is rapidly expanding across industries, unlocking new opportunities and accelerating discovery in healthcare, finance, business services and more. As AI models become more complex, the 2023 NPN Award winners are expert partners that can help enterprises develop and deploy AI in production using the infrastructure that best aligns with their operations.

Learn how to join the NPN, or find your local NPN partner.

Read More

Amazon Comprehend document classifier adds layout support for higher accuracy

The ability to effectively handle and process enormous amounts of documents has become essential for enterprises in the modern world. Due to the continuous influx of information that all enterprises deal with, manually classifying documents is no longer a viable option. Document classification models can automate the procedure and help organizations save time and resources. Traditional categorization techniques, such as manual processing and keyword-based searches, become less efficient and more time-consuming as the volume of documents increases. This inefficiency causes lower productivity and higher operating expenses. Additionally, it can prevent crucial information from being accessible when needed, which could lead to a poor customer experience and impact decision-making. At AWS re:Invent 2022, Amazon Comprehend, a natural language processing (NLP) service that uses machine learning (ML) to discover insights from text, launched support for native document types. This new feature gave you the ability to classify documents in native formats (PDF, TIFF, JPG, PNG, DOCX) using Amazon Comprehend.

Today, we are excited to announce that Amazon Comprehend now supports custom classification model training with documents like PDF, Word, and image formats. You can now train bespoke document classification models on native documents that support layout in addition to text, increasing the accuracy of the results.

In this post, we provide an overview of how you can get started with training an Amazon Comprehend custom document classification model.

Overview

The capacity to understand the relative placements of objects within a defined space is referred to as layout awareness. In this case, it aids the model in understanding how headers, subheadings, tables, and graphics relate to one another inside a document. The model can more effectively categorize a document based on its content when it’s aware of the structure and layout of the text.

In this post, we walk through the data preparation steps involved, demonstrate the model training process, and discuss the benefits of using the new custom document classification model in Amazon Comprehend. As a best practice, you should consider the following points before you begin training the custom document classification model.

Evaluate your document classification needs

Identify the various types of documents that you may need to classify, along with the different classes or categories that support your use case. Determine a suitable classification structure or taxonomy after evaluating the amount and types of documents that need to be categorized. Document types may vary from PDF and Word files to images. Ensure you have authorized access to a diverse set of labeled documents, either via a document management system or other storage mechanisms.

Prepare your data

Ensure that the document files you intend to use for model training aren’t encrypted or locked—for example, make sure that your PDF files aren’t encrypted and locked with a password. You must decrypt such files before you can use them for training purposes. Label a sample of your documents with the appropriate categories or labels (classes). Determine whether single-label classification (multi-class mode) or multi-label classification is appropriate for your use case. Multi-class mode associates only a single class with each document, whereas multi-label mode associates one or more classes with a document.

Consider model evaluation

Use the labeled dataset to train the model so it can learn to classify new documents accurately and evaluate how the newly trained model version performs by understanding the model metrics. To understand the metrics provided by Amazon Comprehend post-model training, refer to Custom classifier metrics. After the training process is complete, you can begin classifying documents asynchronously or in real time. We walk through how to train a custom classification model in the following sections.

Prepare the training data

Before we train our custom classification model, we need to prepare the training data. Training data is comprised of a set of labeled documents, which can be pre-identified documents from a document repository that you already have access to. For our example, we trained a custom classification model with a few different document types that are typically found in a health insurance claim adjudication process: patient discharge summary, invoices, receipts, and so on. We also need to prepare an annotations file in CSV format. Following is an example of an annotations file CSV data required for the training:

 discharge_summary,summary-1.pdf,1
 discharge_summary,summary-2.pdf,1
 invoice,invoice-1.pdf,1
 invoice,invoice-1.pdf,2
 invoice,invoice-2.pdf,1

The annotations CSV file must contain three columns. The first column contains the desired class (label) for the document, the second column is the document name (file name), and the last column is the page number of the document that you want to include in the training dataset. Because the training process supports native multi-page PDF and DOCX files, you must specify the page number if the document has multiple pages. If you want to include all pages of a multi-page document in the training dataset, you must specify each page as a separate line in the CSV annotations file. For example, in the preceding annotations file, invoice-1.pdf is a two-page document, and we want to include both pages in the classification dataset. Because files like JPG, PNG, and TIFF are image formats, the page number (third column) value must always be 1. If your dataset contains multi-frame (multi-page) TIF files, you must split them into separate TIF files in order to use them in the training process.

We prepared an annotations file called test.csv with the appropriate data to train a custom classification model. For each sample document, the CSV file contains the class that document belongs to, the location of the document in Amazon Simple Storage Service (Amazon S3), such as path/to/prefix/document.pdf, and the page number (if applicable). Because most of our documents are either single-page DOCX, PDF files, or TIF, JPG, or PNG files, the page number assigned is 1. Because our annotations CSV and sample documents are all under the same Amazon S3 prefix, we don’t need to explicitly specify the prefix in the second column. We also prepare at least 10 document samples or more for each class, and we used a mix of JPG, PNG, DOCX, PDF, and TIF files for training the model. Note that it’s usually recommended to have a diverse set of sample documents for model training to avoid overfitting of the model, which impacts its ability to recognize new documents. It’s also recommended that the number of samples per class is balanced, although it’s not required to have an exact same number of samples per class. Next, we upload the test.csv annotations file and all the documents into Amazon S3. The following image shows part of our annotations CSV file.
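
If you build the annotations file programmatically, a short sketch like the following can help; the class names, file names, bucket, and prefix are placeholders for your own labeled documents:

import csv

import boto3

# (class label, document file name, page number) for each labeled sample.
labeled_docs = [
    ("discharge_summary", "summary-1.pdf", 1),
    ("invoice", "invoice-1.pdf", 1),
    ("invoice", "invoice-1.pdf", 2),
    ("receipt", "receipt-1.png", 1),
]

with open("test.csv", "w", newline="") as f:
    csv.writer(f).writerows(labeled_docs)

# Upload the annotations file under the same S3 prefix as the documents.
s3 = boto3.client("s3")
s3.upload_file("test.csv", "<YOUR_BUCKET>", "comprehend-training/test.csv")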

Train a custom classification model

Now that we have the annotations file and all our sample documents ready, we set up a custom classification model and train it. Before you begin setting up custom classification model training, make sure that the annotations CSV and sample documents exist in an Amazon S3 location.

  1. On the Amazon Comprehend console, choose Custom classification in the navigation pane.
  2. Choose Create new model.
  3. For Model name, enter a unique name.
  4. For Version name, enter a unique version name.
  5. For Training model type, select Native documents.

This tells Amazon Comprehend that you intend to use native document types to train the model instead of serialized text.

  6. For Classifier mode, select Using single-label mode.

This mode tells the classifier that we intend to classify documents into a single class. If you need to train a model with multi-label mode, meaning a document may belong to one or more than one class, you must set up the annotations file appropriately by specifying the classes of the document separated by a special character in the annotations CSV file. In that case, you would select the Using multi-label mode option.

  7. For Annotation location on S3, enter the path of the annotations CSV file.
  8. For Training data location on S3, enter the Amazon S3 location where your documents reside.
  9. Leave all other options as default in this section.
  10. In the Output data section, specify an Amazon S3 location for your output.

This is optional, but it’s a good practice to provide an output location because Amazon Comprehend will generate the post-model training evaluation metrics in this location. This data is useful to evaluate model performance, iterate, and improve the accuracy of your model.

  11. In the IAM role section, choose an appropriate AWS Identity and Access Management (IAM) role that allows Amazon Comprehend to access the Amazon S3 location and write and read from it.
  12. Choose Create to initiate the model training.

The model may take several minutes to train, depending on the number of classes and the dataset size. You can review the training status on the Custom classification page. The job shows a Submitted status as soon as it’s submitted and changes to Training when the training process begins. After your model is trained, the Version status changes to Trained. If Amazon Comprehend finds inconsistencies in your training data, the status shows In error along with an alert that contains the appropriate error message, so that you can take corrective action and restart the training process with the corrected data.

In this post, we demonstrated the steps to train a custom classifier model using the Amazon Comprehend console. You can also use the AWS SDK in any language (for example, Boto3 for Python) or the AWS Command Line Interface (AWS CLI) to initiate a custom classification model training. With either the SDK or AWS CLI, you can use the CreateDocumentClassifier API to initiate the model training, and subsequently use the DescribeDocumentClassifier API to check the status of the model.
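
As a rough illustration of the API route, the following sketch starts training with Boto3 and then checks the model status. The ARNs, S3 URIs, and names are placeholders, and for native (layout-aware) training the InputDataConfig needs additional fields that point to the raw documents; check the CreateDocumentClassifier API reference for the exact shape:

import boto3

comprehend = boto3.client("comprehend")

# Start custom classifier training from the annotations CSV prepared earlier.
response = comprehend.create_document_classifier(
    DocumentClassifierName="claims-doc-classifier",
    VersionName="v1",
    DataAccessRoleArn="<COMPREHEND_DATA_ACCESS_ROLE_ARN>",
    LanguageCode="en",
    Mode="MULTI_CLASS",  # single-label mode
    InputDataConfig={
        "S3Uri": "s3://<YOUR_BUCKET>/comprehend-training/test.csv",
        # Native-document training requires extra input fields (document type and
        # the S3 location of the raw documents); see the API reference.
    },
    OutputDataConfig={"S3Uri": "s3://<YOUR_BUCKET>/comprehend-training/output/"},
)

# Check training status; it moves through SUBMITTED/TRAINING to TRAINED or IN_ERROR.
props = comprehend.describe_document_classifier(
    DocumentClassifierArn=response["DocumentClassifierArn"]
)["DocumentClassifierProperties"]
print(props["Status"])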

After the model is trained, you can perform either real-time analysis or asynchronous (batch) analysis jobs on new documents. To perform real-time classification on documents, you must deploy an Amazon Comprehend real-time endpoint with the trained custom classification model. Real-time endpoints are best suited for use cases that require low-latency, real-time inference results, whereas for classifying a large set of documents, an asynchronous analysis job is more appropriate. To learn how you can perform asynchronous inference on new documents using a trained classification model, refer to Introducing one-step classification and entity recognition with Amazon Comprehend for intelligent document processing.
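
For the real-time path, a minimal sketch looks like the following; the classifier ARN and file name are placeholders, and note that create_endpoint provisions asynchronously, so the endpoint must be ACTIVE before you invoke it:

import boto3

comprehend = boto3.client("comprehend")

# Create a real-time endpoint backed by the trained classifier version.
endpoint = comprehend.create_endpoint(
    EndpointName="claims-doc-classifier-rt",
    ModelArn="<DOCUMENT_CLASSIFIER_ARN>",
    DesiredInferenceUnits=1,
)

# Once the endpoint is ACTIVE, classify a native document by sending its bytes.
with open("new-invoice.pdf", "rb") as f:
    result = comprehend.classify_document(
        EndpointArn=endpoint["EndpointArn"],
        Bytes=f.read(),
    )

print(result["Classes"])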

Benefits of the layout-aware custom classification model

The new classifier model offers a number of improvements. It’s not only easier to train the new model, but you can also train a new model with just a few samples for each class. Additionally, you no longer have to extract serialized plain text out of scanned or digital documents such as images or PDFs to prepare the training dataset. The following are some additional noteworthy improvements that you can expect from the new classification model:

  • Improved accuracy – The model now takes into account the layout and structure of documents, which leads to a better understanding of the structure and content of the documents. This helps distinguish between documents with similar text but different layouts or structures, resulting in increased classification accuracy.
  • Robustness – The model now handles variations in document structure and formatting. This makes it better suited for classifying documents from different sources with varying layouts or formatting styles, which is a common challenge in real-world document classification tasks. It’s compatible with several document types natively, making it versatile and applicable to different industries and use cases.
  • Reduced manual intervention – Higher accuracy leads to less manual intervention in the classification process. This can save time and resources, and increase operational efficiency in your document processing workload.

Conclusion

The new Amazon Comprehend document classification model, which incorporates layout awareness, is a game-changer for businesses dealing with large volumes of documents. By understanding the structure and layout of documents, this model offers improved classification accuracy and efficiency. Implementing a robust and accurate document classification solution using a layout-aware model can help your business save time, reduce operational costs, and enhance decision-making processes.

As a next step, we encourage you to try the new Amazon Comprehend custom classification model via the Amazon Comprehend console. We also recommend revisiting our custom classification model improvement announcements from last year and visiting the GitHub repository for code samples.


About the authors

Anjan Biswas is a Senior AI Services Solutions Architect with a focus on AI/ML and Data Analytics. Anjan is part of the world-wide AI services team and works with customers to help them understand and develop solutions to business problems with AI and ML. Anjan has over 14 years of experience working with global supply chain, manufacturing, and retail organizations, and is actively helping customers get started and scale on AWS AI services.

Godwin Sahayaraj Vincent is an Enterprise Solutions Architect at AWS who is passionate about machine learning and providing guidance to customers to design, deploy, and manage their AWS workloads and architectures. In his spare time, he loves to play cricket with his friends and tennis with his three kids.

Wrick Talukdar is a Senior Architect with the Amazon Comprehend Service team. He works with AWS customers to help them adopt machine learning on a large scale. Outside of work, he enjoys reading and photography.

Read More

Responsible AI at Google Research: Technology, AI, Society and Culture

Google sees AI as a foundational and transformational technology, with recent advances in generative AI technologies, such as LaMDA, PaLM, Imagen, Parti, MusicLM, and similar machine learning (ML) models, some of which are now being incorporated into our products. This transformative potential requires us to be responsible not only in how we advance our technology, but also in how we envision which technologies to build, and how we assess the social impact AI and ML-enabled technologies have on the world. This endeavor necessitates fundamental and applied research with an interdisciplinary lens that engages with — and accounts for — the social, cultural, economic, and other contextual dimensions that shape the development and deployment of AI systems. We must also understand the range of possible impacts that ongoing use of such technologies may have on vulnerable communities and broader social systems.

Our team, Technology, AI, Society, and Culture (TASC), is addressing this critical need. Research on the societal impacts of AI is complex and multi-faceted; no one disciplinary or methodological perspective can alone provide the diverse insights needed to grapple with the social and cultural implications of ML technologies. TASC thus leverages the strengths of an interdisciplinary team, with backgrounds ranging from computer science to social science, digital media and urban science. We use a multi-method approach with qualitative, quantitative, and mixed methods to critically examine and shape the social and technical processes that underpin and surround AI technologies. We focus on participatory, culturally-inclusive, and intersectional equity-oriented research that brings to the foreground impacted communities. Our work advances Responsible AI (RAI) in areas such as computer vision, natural language processing, health, and general purpose ML models and applications. Below, we share examples of our approach to Responsible AI and where we are headed in 2023.

A visual diagram of the various social, technical, and equity-oriented research areas that TASC studies to progress Responsible AI in a way that respects the complex relationships between AI and society.

Theme 1: Culture, communities, & AI

One of our key areas of research is the advancement of methods to make generative AI technologies more inclusive of and valuable to people globally, through community-engaged, and culturally-inclusive approaches. Toward this aim, we see communities as experts in their context, recognizing their deep knowledge of how technologies can and should impact their own lives. Our research champions the importance of embedding cross-cultural considerations throughout the ML development pipeline. Community engagement enables us to shift how we incorporate knowledge of what’s most important throughout this pipeline, from dataset curation to evaluation. This also enables us to understand and account for the ways in which technologies fail and how specific communities might experience harm. Based on this understanding we have created responsible AI evaluation strategies that are effective in recognizing and mitigating biases along multiple dimensions.

Our work in this area is vital to ensuring that Google’s technologies are safe for, work for, and are useful to a diverse set of stakeholders around the world. For example, our research on user attitudes towards AI, responsible interaction design, and fairness evaluations with a focus on the global south demonstrated the cross-cultural differences in the impact of AI and contributed resources that enable culturally-situated evaluations. We are also building cross-disciplinary research communities to examine the relationship between AI, culture, and society, through our recent and upcoming workshops on Cultures in AI/AI in Culture, Ethical Considerations in Creative Applications of Computer Vision, and Cross-Cultural Considerations in NLP.

Our recent research has also sought out perspectives of particular communities who are known to be less represented in ML development and applications. For example, we have investigated gender bias, both in natural language and in contexts such as gender-inclusive health, drawing on our research to develop more accurate evaluations of bias so that anyone developing these technologies can identify and mitigate harms for people with queer and non-binary identities.

Theme 2: Enabling Responsible AI throughout the development lifecycle

We work to enable RAI at scale, by establishing industry-wide best practices for RAI across the development pipeline, and ensuring our technologies verifiably incorporate that best practice by default. This applied research includes responsible data production and analysis for ML development, and systematically advancing tools and practices that support practitioners in meeting key RAI goals like transparency, fairness, and accountability. Extending earlier work on Data Cards, Model Cards and the Model Card Toolkit, we released the Data Cards Playbook, providing developers with methods and tools to document appropriate uses and essential facts related to a dataset. Because ML models are often trained and evaluated on human-annotated data, we also advance human-centric research on data annotation. We have developed frameworks to document annotation processes and methods to account for rater disagreement and rater diversity. These methods enable ML practitioners to better ensure diversity in annotation of datasets used to train models, by identifying current barriers and re-envisioning data work practices.

Future directions

We are now working to further broaden participation in ML model development, through approaches that embed a diversity of cultural contexts and voices into technology design, development, and impact assessment to ensure that AI achieves societal goals. We are also redefining responsible practices that can handle the scale at which ML technologies operate in today’s world. For example, we are developing frameworks and structures that can enable community engagement within industry AI research and development, including community-centered evaluation frameworks, benchmarks, and dataset curation and sharing.

In particular, we are furthering our prior work on understanding how NLP language models may perpetuate bias against people with disabilities, extending this research to address other marginalized communities and cultures and including image, video, and other multimodal models. Such models may contain tropes and stereotypes about particular groups or may erase the experiences of specific individuals or communities. Our efforts to identify sources of bias within ML models will lead to better detection of these representational harms and will support the creation of more fair and inclusive systems.

TASC is about studying all the touchpoints between AI and people — from individuals and communities, to cultures and society. For AI to be culturally-inclusive, equitable, accessible, and reflective of the needs of impacted communities, we must take on these challenges with inter- and multidisciplinary research that centers the needs of impacted communities. Our research studies will continue to explore the interactions between society and AI, furthering the discovery of new ways to develop and evaluate AI in order for us to develop more robust and culturally-situated AI technologies.

Acknowledgements

We would like to thank everyone on the team that contributed to this blog post. In alphabetical order by last name: Cynthia Bennett, Eric Corbett, Aida Mostafazadeh Davani, Emily Denton, Sunipa Dev, Fernando Diaz, Mark Díaz, Shaun Kane, Shivani Kapania, Michael Madaio, Vinodkumar Prabhakaran, Rida Qadri, Renee Shelby, Ding Wang, and Andrew Zaldivar. Also, we would like to thank Toju Duke and Marian Croak for their valuable feedback and suggestions.

Read More

Use streaming ingestion with Amazon SageMaker Feature Store and Amazon MSK to make ML-backed decisions in near-real time

Businesses are increasingly using machine learning (ML) to make near-real-time decisions, such as placing an ad, assigning a driver, recommending a product, or even dynamically pricing products and services. ML models make predictions given a set of input data known as features, and data scientists easily spend more than 60% of their time designing and building these features. Furthermore, highly accurate predictions depend on timely access to feature values that change quickly over time, adding even more complexity to the job of building a highly available and accurate solution. For example, a model for a ride-sharing app can choose the best price for a ride from the airport, but only if it knows the number of ride requests received in the past 10 minutes and the number of passengers projected to land in the next 10 minutes. A routing model in a call center app can pick the best available agent for an incoming call, but it’s only effective if it knows the customer’s latest web session clicks.

Although the business value of near-real-time ML predictions is enormous, the architecture required to deliver them reliably, securely, and with good performance is complicated. Solutions need high-throughput updates and low-latency retrieval of the most recent feature values in milliseconds, something most data scientists aren’t prepared to deliver. As a result, some enterprises have spent millions of dollars inventing their own proprietary infrastructure for feature management. Other firms have limited their ML applications to simpler patterns like batch scoring until ML vendors provide more comprehensive off-the-shelf solutions for online feature stores.

To address these challenges, Amazon SageMaker Feature Store provides a fully managed central repository for ML features, making it easy to securely store and retrieve features without having to build and maintain your own infrastructure. Feature Store lets you define groups of features, use batch ingestion and streaming ingestion, retrieve the latest feature values with single-digit millisecond latency for highly accurate online predictions, and extract point-in-time correct datasets for training. Instead of building and maintaining these infrastructure capabilities, you get a fully managed service that scales as your data grows, enables sharing features across teams, and lets your data scientists focus on building great ML models aimed at game-changing business use cases. Teams can now deliver robust features once and reuse them many times in a variety of models that may be built by different teams.

This post walks through a complete example of how you can couple streaming feature engineering with Feature Store to make ML-backed decisions in near-real time. We show a credit card fraud detection use case that updates aggregate features from a live stream of transactions and uses low-latency feature retrievals to help detect fraudulent transactions. Try it out for yourself by visiting our GitHub repo.

Credit card fraud use case

Stolen credit card numbers can be bought in bulk on the dark web from previous leaks or hacks of organizations that store this sensitive data. Fraudsters buy these card lists and attempt to make as many transactions as possible with the stolen numbers until the card is blocked. These fraud attacks typically happen in a short time frame, and this can be easily spotted in historical transactions because the velocity of transactions during the attack differs significantly from the cardholder’s usual spending pattern.

The following table shows a sequence of transactions from one credit card where the cardholder first has a genuine spending pattern, and then experiences a fraud attack starting on November 4.

cc_num    trans_time         amount    fraud_label
…1248     Nov-01 14:50:01     10.15    0
…1248     Nov-02 12:14:31     32.45    0
…1248     Nov-02 16:23:12      3.12    0
…1248     Nov-04 02:12:10      1.01    1
…1248     Nov-04 02:13:34     22.55    1
…1248     Nov-04 02:14:05     90.55    1
…1248     Nov-04 02:15:10     60.75    1
…1248     Nov-04 13:30:55     12.75    0

For this post, we train an ML model to spot this kind of behavior by engineering features that describe an individual card’s spending pattern, such as the number of transactions or the average transaction amount from that card in a certain time window. This model protects cardholders from fraud at the point of sale by detecting and blocking suspicious transactions before the payment can complete. The model makes predictions in a low-latency, real-time context and relies on receiving up-to-the-minute feature calculations so it can respond to an ongoing fraud attack. In a real-world scenario, features related to cardholder spending patterns would only form part of the model’s feature set, and we can include information about the merchant, the cardholder, the device used to make the payment, and any other data that may be relevant to detecting fraud.

Because our use case relies on profiling an individual card’s spending patterns, it’s crucial that we can identify credit cards in a transaction stream. Most publicly available fraud detection datasets don’t provide this information, so we use the Python Faker library to generate a set of transactions covering a 5-month period. This dataset contains 5.4 million transactions spread across 10,000 unique (and fake) credit card numbers, and is intentionally imbalanced to match the reality of credit card fraud (only 0.25% of the transactions are fraudulent). We vary the number of transactions per day per card, as well as the transaction amounts. See our GitHub repo for more details.
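
To make this concrete, the following is a minimal sketch of generating synthetic transactions with Faker; the card pool size, date range, amount distribution, and output file shown here are illustrative placeholders rather than the exact settings used to build the dataset in the repo (which also injects realistic fraud bursts per card):

import csv
import random
from faker import Faker

fake = Faker()

NUM_CARDS = 100        # illustrative; the repo uses 10,000 cards
TXNS_PER_CARD = 50     # illustrative; the repo varies this per card and per day

cards = [fake.credit_card_number() for _ in range(NUM_CARDS)]

with open('transactions.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['cc_num', 'trans_time', 'amount', 'fraud_label'])
    for cc_num in cards:
        for _ in range(TXNS_PER_CARD):
            # Spread transactions over roughly the past 5 months.
            trans_time = fake.date_time_between(start_date='-150d', end_date='now')
            amount = round(random.uniform(1.0, 200.0), 2)
            # Keep the dataset heavily imbalanced (~0.25% fraud), as in the post.
            fraud_label = 1 if random.random() < 0.0025 else 0
            writer.writerow([cc_num, trans_time.isoformat(), amount, fraud_label])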

Overview of the solution

We want our fraud detection model to classify credit card transactions by noticing a burst of recent transactions that differs significantly from the cardholder’s usual spending pattern. Sounds simple enough, but how do we build it?

The following diagram shows our overall solution architecture. We feel that this same pattern will work well for a variety of streaming aggregation use cases. At a high level, the pattern involves the following five pieces:

  1. Feature store – We use Feature Store to provide a repository of features with high-throughput writes and secure low-latency reads, using feature values that are organized into multiple feature groups.
  2. Batch ingestion – Batch ingestion takes labeled historical credit card transactions and creates the aggregate features and ratios needed for training the fraud detection model. We use an Amazon SageMaker Processing job and the built-in Spark container to calculate aggregate weekly counts and transaction amount averages and ingest them into the feature store for use in online inference.
  3. Model training and deployment – This aspect of our solution is straightforward. We use Amazon SageMaker to train a model using the built-in XGBoost algorithm on aggregated features created from historical transactions. The model is deployed to a SageMaker endpoint, where it handles fraud detection requests on live transactions.
  4. Streaming ingestion – An Amazon Kinesis Data Analytics for Apache Flink application, backed by Apache Kafka topics in Amazon Managed Streaming for Apache Kafka (Amazon MSK), calculates aggregated features from a transaction stream, and an AWS Lambda function updates the online feature store. Apache Flink is a popular framework and engine for processing data streams.
  5. Streaming predictions – Lastly, we make fraud predictions on a stream of transactions, using Lambda to pull aggregate features from the online feature store. We use the latest feature data to calculate transaction ratios and then call the fraud detection endpoint.

Prerequisites

We provide an AWS CloudFormation template to create the prerequisite resources for this solution. Launch stacks are available in the following Regions: us-east-1, us-east-2, us-west-1, eu-west-1, and ap-northeast-1.

In the following sections, we explore each component of our solution in more detail.

Feature store

ML models rely on well-engineered features coming from a variety of data sources, with transformations as simple as calculations or as complicated as a multi-step pipeline that takes hours of compute time and complex coding. Feature Store enables the reuse of these features across teams and models, which improves data scientist productivity, speeds up time to market, and ensures consistency of model input.

Each feature inside Feature Store is organized into a logical grouping called a feature group. You decide which feature groups you need for your models. Each one can have dozens, hundreds, or even thousands of features. Feature groups are managed and scaled independently, but they’re all available for search and discovery across teams of data scientists responsible for many independent ML models and use cases.

ML models often require features from multiple feature groups. A key aspect of a feature group is how often its feature values need to be updated or materialized for downstream training or inference. You refresh some features hourly, nightly, or weekly, and a subset of features must be streamed to the feature store in near-real time. Streaming all feature updates would lead to unnecessary complexity, and could even lower the quality of data distributions by not giving you the chance to remove outliers.

In our use case, we create a feature group called cc-agg-batch-fg for aggregated credit card features updated in batch, and one called cc-agg-fg for streaming features.

The cc-agg-batch-fg feature group is updated nightly, and provides aggregate features looking back over a 1-week time window. Recalculating 1-week aggregations on every streaming transaction doesn’t offer meaningful additional signal, and would be a waste of resources.

Conversely, our cc-agg-fg feature group must be updated in a streaming fashion, because it offers the latest transaction counts and average transaction amounts looking back over a 10-minute time window. Without streaming aggregation, we couldn’t spot the typical fraud attack pattern of a rapid sequence of purchases.

By isolating features that are recalculated nightly, we can improve ingestion throughput for our streaming features. Separation lets us optimize the ingestion for each group independently. When designing for your use cases, keep in mind that models requiring features from a large number of feature groups may want to make multiple retrievals from the feature store in parallel to avoid adding excessive latency to a real-time prediction workflow.
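
As an illustration of that parallel retrieval, here is a minimal sketch using a thread pool with the GetRecord API; the feature group names match our use case, and the record identifier value is a made-up example:

import boto3
from concurrent.futures import ThreadPoolExecutor

featurestore_runtime = boto3.client('sagemaker-featurestore-runtime')
cc_num = '4444111122223333'  # example record identifier, not real data

def get_features(feature_group_name):
    # Returns the latest online record for this card from one feature group.
    return featurestore_runtime.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=cc_num)

# Retrieve from both feature groups concurrently to keep prediction latency low.
feature_groups = ['cc-agg-fg', 'cc-agg-batch-fg']
with ThreadPoolExecutor(max_workers=len(feature_groups)) as executor:
    responses = list(executor.map(get_features, feature_groups))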

The feature groups for our use case are shown in the following table.

cc-agg-fg             cc-agg-batch-fg
cc_num (record id)    cc_num (record id)
trans_time            trans_time
num_trans_last_10m    num_trans_last_1w
avg_amt_last_10m      avg_amt_last_1w

Each feature group must have one feature used as a record identifier (for this post, the credit card number). The record identifier acts as a primary key for the feature group, enabling fast lookups as well as joins across feature groups. An event time feature is also required, which enables the feature store to track the history of feature values over time. This becomes important when looking back at the state of features at a specific point in time.

In each feature group, we track the number of transactions per unique credit card and its average transaction amount. The only difference between our two groups is the time window used for aggregation. We use a 10-minute window for streaming aggregation, and a 1-week window for batch aggregation.

With Feature Store, you have the flexibility to create feature groups that are offline only, online only, or both online and offline. An online store provides high-throughput writes and low-latency retrievals of feature values, which is ideal for online inference. An offline store is provided using Amazon Simple Storage Service (Amazon S3), giving firms a highly scalable repository, with a full history of feature values, partitioned by feature group. The offline store is ideal for training and batch scoring use cases.

When you enable a feature group to provide both online and offline stores, SageMaker automatically synchronizes feature values to an offline store, continuously appending the latest values to give you a full history of values over time. Another benefit of feature groups that are both online and offline is that they help avoid the problem of training and inference skew. SageMaker lets you feed both training and inference with the same transformed feature values, ensuring consistency to drive more accurate predictions. The focus in our post is to demonstrate online feature streaming, so we implemented online-only feature groups.
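
To make the schema concrete, the following is a minimal sketch of creating the online-only streaming feature group with boto3, using the feature names from the table above; the role ARN is a placeholder, and the notebooks in the GitHub repo may create the feature groups differently (for example, with the SageMaker Python SDK):

import boto3

sagemaker_client = boto3.client('sagemaker')

sagemaker_client.create_feature_group(
    FeatureGroupName='cc-agg-fg',
    RecordIdentifierFeatureName='cc_num',   # primary key for lookups and joins
    EventTimeFeatureName='trans_time',      # lets the store track value history
    FeatureDefinitions=[
        {'FeatureName': 'cc_num', 'FeatureType': 'String'},
        {'FeatureName': 'trans_time', 'FeatureType': 'Fractional'},
        {'FeatureName': 'num_trans_last_10m', 'FeatureType': 'Integral'},
        {'FeatureName': 'avg_amt_last_10m', 'FeatureType': 'Fractional'},
    ],
    # Enabling only the online store (no OfflineStoreConfig) gives an online-only group.
    OnlineStoreConfig={'EnableOnlineStore': True},
    RoleArn='arn:aws:iam::123456789012:role/SageMakerFeatureStoreRole',  # placeholder
)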

Batch ingestion

To materialize our batch features, we create a feature pipeline that runs as a SageMaker Processing job on a nightly basis. The job has two responsibilities: producing the dataset for training our model, and populating the batch feature group with the most up-to-date values for aggregate 1-week features, as shown in the following diagram.

Each historical transaction used in the training set is enriched with aggregated features for the specific credit card involved in the transaction. We look back over two separate sliding time windows: 1 week back, and the preceding 10 minutes. The actual features used to train the model include the following ratios of these aggregated values:

  • amt_ratio1 = avg_amt_last_10m / avg_amt_last_1w
  • amt_ratio2 = transaction_amount / avg_amt_last_1w
  • count_ratio = num_trans_last_10m / num_trans_last_1w

For example, count_ratio is the transaction count from the prior 10 minutes divided by the transaction count from the last week.

Our ML model can learn patterns of normal activity vs. fraudulent activity from these ratios, rather than relying on raw counts and transaction amounts. Spending patterns on different cards vary greatly, so normalized ratios provide a better signal to the model than the aggregated amounts themselves.
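
As a small sketch of this transformation, the ratios can be computed from the two sets of aggregates as follows; the helper function name is ours, not from the repo:

def compute_ratios(transaction_amount,
                   avg_amt_last_10m, num_trans_last_10m,
                   avg_amt_last_1w, num_trans_last_1w):
    """Hypothetical helper: normalize recent activity against the 1-week baseline."""
    # Production code would also guard against zero-valued 1-week aggregates.
    amt_ratio1 = avg_amt_last_10m / avg_amt_last_1w
    amt_ratio2 = transaction_amount / avg_amt_last_1w
    count_ratio = num_trans_last_10m / num_trans_last_1w
    return amt_ratio1, amt_ratio2, count_ratio

# Example: a $500 purchase on a card that averages $40 per transaction and
# 12 transactions per week, but has made 4 purchases in the last 10 minutes.
print(compute_ratios(500.00, 200.00, 4, 40.00, 12))  # (5.0, 12.5, 0.333...)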

You may be wondering why our batch job is computing features with a 10-minute lookback. Isn’t that only relevant for online inference? We need the 10-minute window on historical transactions to create an accurate training dataset. This is critical for ensuring consistency with the 10-minute streaming window that will be used in near-real time to support online inference.

The resulting training dataset from the processing job can be saved directly as a CSV for model training, or it can be bulk ingested into an offline feature group that can be used for other models and by other data science teams to address a wide variety of other use cases. For example, we can create and populate a feature group called cc-transactions-fg. Our training job can then pull a specific training dataset based on the needs for our specific model, selecting specific date ranges and a subset of features of interest. This approach enables multiple teams to reuse feature groups and maintain fewer feature pipelines, leading to significant cost savings and productivity improvements over time. This example notebook demonstrates the pattern of using Feature Store as a central repository from which data scientists can extract training datasets.
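
As a rough sketch of that pattern using the SageMaker Python SDK’s Athena integration, a training dataset could be pulled from the offline store as follows; the feature group name, columns, time range, and S3 output location are illustrative:

import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
# Hypothetical offline-enabled feature group from the example above.
transactions_fg = FeatureGroup(name='cc-transactions-fg', sagemaker_session=session)

# Query the offline store (backed by Amazon S3) through Athena.
query = transactions_fg.athena_query()
query.run(
    query_string=f'''
        SELECT cc_num, amt_ratio1, amt_ratio2, count_ratio, fraud_label
        FROM "{query.table_name}"
        WHERE trans_time BETWEEN 1635724800 AND 1638316800
    ''',
    output_location=f's3://{session.default_bucket()}/cc-fraud/query-results/')
query.wait()

train_df = query.as_dataframe()  # pandas DataFrame ready for model training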

In addition to creating a training dataset, we use the PutRecord API to put the 1-week feature aggregations into the online feature store nightly. The following code demonstrates putting a record into an online feature group given specific feature values, including a record identifier and an event time:

import time

import boto3

feature_store_client = boto3.client('sagemaker-featurestore-runtime')

# Assemble the record as a list of named string values.
record = [{'FeatureName': 'cc_num',
           'ValueAsString': str(cc_num)},
          {'FeatureName': 'avg_amt_last_1w',
           'ValueAsString': str(avg_amt_last_1w)},
          {'FeatureName': 'num_trans_last_1w',
           'ValueAsString': str(num_trans_last_1w)}]

# The event time feature lets the feature store track value history over time.
event_time_feature = {'FeatureName': 'trans_time',
                      'ValueAsString': str(int(round(time.time())))}
record.append(event_time_feature)

response = feature_store_client.put_record(
    FeatureGroupName='cc-agg-batch-fg', Record=record)

ML engineers often build a separate version of feature engineering code for online features based on the original code written by data scientists for model training. This can deliver the desired performance, but is an extra development step, and introduces more chance for training and inference skew. In our use case, we show how using SQL for aggregations can enable a data scientist to provide the same code for both batch and streaming.

Streaming ingestion

Feature Store delivers single-digit millisecond retrieval of pre-calculated features, and it can also play an effective role in solutions requiring streaming ingestion. Our use case demonstrates both. Weekly lookback is handled as a pre-calculated feature group, materialized nightly as shown earlier. Now let’s dive into how we calculate features aggregated on the fly over a 10-minute window and ingest them into the feature store for later online inference.

In our use case, we ingest live credit card transactions into a source MSK topic and use a Kinesis Data Analytics for Apache Flink application to write aggregate features to a destination MSK topic. The application is written using Apache Flink SQL, which makes it simple to develop streaming applications with standard SQL; because Flink SQL remains ANSI SQL 2011 compliant, it’s easy to pick up if you have ever worked with a database or another SQL-based system. Apart from SQL, we can build Java and Scala applications in Amazon Kinesis Data Analytics using open-source libraries based on Apache Flink. A Lambda function then reads the destination MSK topic and ingests the aggregate features into a SageMaker feature group for inference.

To produce aggregate counts and average amounts looking back over a 10-minute window, we use the following Flink SQL query on the input topic and pipe the results to the destination topic:

SELECT 
 cc_num, 
 COUNT(*) OVER LAST_10_MINUTES as cc_count,
 AVG(amount) OVER LAST_10_MINUTES as avg_amount
FROM 
 cctopic
WINDOW LAST_10_MINUTES AS (
 PARTITION BY cc_num
 ORDER BY proc_ts
 RANGE INTERVAL '10' MINUTE PRECEDING
 );

cc_num    amount    datetime            num_trans_last_10m    avg_amt_last_10m
…1248      50.00    Nov-01 22:01:00     1                      50.00
…9843      99.50    Nov-01 22:02:30     1                      99.50
…7403     100.00    Nov-01 22:03:48     1                     100.00
…1248     200.00    Nov-01 22:03:59     2                     125.00
…0732      26.99    Nov-01 22:04:15     1                      26.99
…1248      50.00    Nov-01 22:04:28     3                     100.00
…1248     500.00    Nov-01 22:05:05     4                     200.00

In this example, notice that the final row has a count of four transactions in the last 10 minutes from the credit card ending with 1248, and a corresponding average transaction amount of $200.00. The SQL query is consistent with the one used to drive creation of our training dataset, helping to avoid training and inference skew.

As transactions stream into the Kinesis Data Analytics for Apache Flink aggregation app, the app sends the aggregate results to our Lambda function, as shown in the following diagram. The Lambda function takes these features and populates the cc-agg-fg feature group.
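
A skeleton of that Lambda handler might look like the following sketch; it assumes the destination MSK topic is configured as the function’s event source, that payloads are JSON with the field names produced by the Flink SQL query above, and that update_feature_store is a hypothetical helper wrapping the PutRecord call shown next:

import base64
import json

def lambda_handler(event, context):
    # An MSK event source delivers records grouped by topic-partition,
    # with each record value base64-encoded.
    for topic_partition, records in event['records'].items():
        for rec in records:
            payload = json.loads(base64.b64decode(rec['value']))
            # update_feature_store is a hypothetical helper that wraps the
            # PutRecord call shown below.
            update_feature_store(
                cc_num=payload['cc_num'],
                num_trans_last_10m=payload['cc_count'],
                avg_amt_last_10m=payload['avg_amount'])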

We send the latest feature values to the feature store from Lambda using a simple call to the PutRecord API. The following is the core piece of Python code for storing the aggregate features:

import time

import boto3

featurestore_runtime = boto3.client('sagemaker-featurestore-runtime')

# Latest 10-minute aggregates for this card, plus the event time.
record = [{'FeatureName': 'cc_num',
           'ValueAsString': str(cc_num)},
          {'FeatureName': 'avg_amt_last_10m',
           'ValueAsString': str(avg_amt_last_10m)},
          {'FeatureName': 'num_trans_last_10m',
           'ValueAsString': str(num_trans_last_10m)},
          {'FeatureName': 'evt_time',
           'ValueAsString': str(int(round(time.time())))}]
featurestore_runtime.put_record(FeatureGroupName='cc-agg-fg',
                                Record=record)

We prepare the record as a list of name-value pairs, including the current time as the event time. The Feature Store API ensures that this new record follows the schema we defined when we created the feature group. If a record already exists for this record identifier, it is overwritten in the online store.

Streaming predictions

Now that we have streaming ingestion keeping the feature store up to date with the latest feature values, let’s look at how we make fraud predictions.

We create a second Lambda function that uses the source MSK topic as a trigger. For each new transaction event, the Lambda function first retrieves the batch and streaming features from Feature Store. To detect anomalies in credit card behavior, our model looks for spikes in recent purchase amounts or purchase frequency. The Lambda function computes simple ratios between the 1-week aggregations and the 10-minute aggregations. It then invokes the SageMaker model endpoint using those ratios to make the fraud prediction, as shown in the following diagram.

We use the following code to retrieve feature values on demand from the feature store before calling the SageMaker model endpoint:

import boto3

featurestore_runtime = boto3.client(service_name='sagemaker-featurestore-runtime')

# Retrieve the latest online record for this credit card number.
response = featurestore_runtime.get_record(
    FeatureGroupName=feature_group_name,
    RecordIdentifierValueAsString=record_identifier_value)

SageMaker also supports retrieving multiple feature records with a single call, even if they are from different feature groups.
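
For instance, a minimal sketch of retrieving both of our feature groups in one call with the BatchGetRecord API might look like the following, where cc_num is the record identifier value for the card being scored:

response = featurestore_runtime.batch_get_record(
    Identifiers=[
        {'FeatureGroupName': 'cc-agg-fg',
         'RecordIdentifiersValueAsString': [cc_num]},
        {'FeatureGroupName': 'cc-agg-batch-fg',
         'RecordIdentifiersValueAsString': [cc_num]},
    ])

# Each entry in Records carries the feature group name and its feature values.
for item in response['Records']:
    print(item['FeatureGroupName'], item['Record'])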

Finally, with the model input feature vector assembled, we call the model endpoint to predict whether a specific credit card transaction is fraudulent:

import json

import boto3

sagemaker_runtime = boto3.client(service_name='runtime.sagemaker')

# The XGBoost endpoint expects a CSV row of the (stringified) ratio features.
request_body = ','.join(features)
response = sagemaker_runtime.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType='text/csv',
    Body=request_body)
probability = json.loads(response['Body'].read().decode('utf-8'))

In this example, the model came back with a probability of 98% that the specific transaction was fraudulent, and it was able to use near-real-time aggregated input features based on the most recent 10 minutes of transactions on that credit card.

Test the end-to-end solution

To demonstrate the full end-to-end workflow of our solution, we simply send credit card transactions into our MSK source topic. Our automated Kinesis Data Analytics for Apache Flink aggregation takes over from there, maintaining a near-real-time view of transaction counts and amounts in Feature Store, with a sliding 10-minute lookback window. These features are combined with the 1-week aggregate features that were already ingested to the feature store in batch, letting us make fraud predictions on each transaction.
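
As an illustration, a minimal test producer could look like the following sketch; it assumes the kafka-python client, network access to the MSK brokers (the bootstrap server shown is a placeholder), a source topic named cctopic to match the Flink SQL above, and a JSON payload format — the repo’s own test scripts may differ:

import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['b-1.mskcluster.example.amazonaws.com:9092'],  # placeholder
    value_serializer=lambda v: json.dumps(v).encode('utf-8'))

# Simulate a burst of back-to-back transactions on a single (fake) card.
for amount in [1.01, 22.55, 90.55, 60.75]:
    producer.send('cctopic', {'cc_num': '4444111122223333',
                              'amount': amount,
                              'trans_time': int(time.time())})
    time.sleep(1)

producer.flush()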

We send a single transaction from three different credit cards. We then simulate a fraud attack on a fourth credit card by sending many back-to-back transactions in seconds. The output from our Lambda function is shown in the following screenshot. As expected, the first three one-off transactions are predicted as NOT FRAUD. Of the 10 fraudulent transactions, the first is predicted as NOT FRAUD, and the rest are all correctly identified as FRAUD. Notice how the aggregate features are kept current, helping drive more accurate predictions.

Conclusion

We have shown how Feature Store can play a key role in the solution architecture for critical operational workflows that need streaming aggregation and low-latency inference. With an enterprise-ready feature store in place, you can use both batch ingestion and streaming ingestion to feed feature groups, and access feature values on demand to perform online predictions for significant business value. ML features can now be shared at scale across many teams of data scientists and thousands of ML models, improving data consistency, model accuracy, and data scientist productivity. Feature Store is available now, and you can try out this entire example. Let us know what you think.

Special thanks to everyone who contributed to the previous blog post with a similar architecture: Paul Hargis, James Leoni and Arunprasath Shankar.


About the Authors

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in feature stores, computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Raj Ramasubbu is a Senior Analytics Specialist Solutions Architect focused on big data and analytics and AI/ML with Amazon Web Services. He helps customers architect and build highly scalable, performant, and secure cloud-based solutions on AWS. Raj provided technical expertise and leadership in building data engineering, big data analytics, business intelligence, and data science solutions for over 18 years prior to joining AWS. He helped customers in various industry verticals like healthcare, medical devices, life science, retail, asset management, car insurance, residential REIT, agriculture, title insurance, supply chain, document management, and real estate.

Prabhakar Chandrasekaran is a Senior Technical Account Manager with AWS Enterprise Support. Prabhakar enjoys helping customers build cutting-edge AI/ML solutions on the cloud. He also works with enterprise customers providing proactive guidance and operational assistance, helping them improve the value of their solutions when using AWS. Prabhakar holds six AWS and six other professional certifications. With over 20 years of professional experience, Prabhakar was a data engineer and a program leader in the financial services space prior to joining AWS.

Read More

How Sportradar used the Deep Java Library to build production-scale ML platforms for increased performance and efficiency

How Sportradar used the Deep Java Library to build production-scale ML platforms for increased performance and efficiency

This is a guest post co-written with Fred Wu from Sportradar.

Sportradar is the world’s leading sports technology company, operating at the intersection of sports, media, and betting. More than 1,700 sports federations, media outlets, betting operators, and consumer platforms across 120 countries rely on Sportradar know-how and technology to boost their business.

Sportradar uses data and technology to:

  • Keep betting operators ahead of the curve with the products and services they need to manage their sportsbook
  • Give media companies the tools to engage more with fans
  • Give teams, leagues, and federations the data they need to thrive
  • Keep the industry clean by detecting and preventing fraud, doping, and match fixing

This post demonstrates how Sportradar used Amazon’s Deep Java Library (DJL) on AWS alongside Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon Simple Storage Service (Amazon S3) to build a production-ready machine learning (ML) inference solution that preserves essential tooling in Java, optimizes operational efficiency, and increases the team’s productivity by providing better performance and accessibility to logs and system metrics.

The DJL is a deep learning framework built from the ground up to support users of Java and JVM languages like Scala, Kotlin, and Clojure. Most deep learning frameworks today are built for Python, which overlooks the large number of Java developers, as well as developers with existing Java code bases who want to integrate increasingly powerful deep learning capabilities into them. With the DJL, integrating deep learning is simple.

In this post, the Sportradar team discusses the challenges they encountered and the solutions they created to build their model inference platform using the DJL.

Business requirements

We are the US squad of the Sportradar AI department. Since 2018, our team has been developing a variety of ML models to enable betting products for NFL and NCAA football. We recently developed four new models.

The fourth down decision models for the NFL and NCAA predict the probabilities of the outcome of a fourth down play. A play outcome could be a field goal attempt, an offensive play (going for it), or a punt.

The drive outcome models for the NFL and NCAA predict the probabilities of the outcome of the current drive. A drive outcome could be an end of half, field goal attempt, touchdown, turnover, turnover on downs, or punt.

Our models are the building blocks of other models that generate a list of live betting markets, including spread, total, win probability, next score type, next team to score, and more.

The business requirements for our models are as follows:

  • The model predictor should be able to load the pre-trained model file one time, then make predictions on many plays
  • We have to generate the probabilities for each play with under 50-millisecond latency
  • The model predictor (feature extraction and model inference) has to be written in Java, so that the other team can import it as a Maven dependency

Challenges with the in-place system

The main challenge we have is how to bridge the gap between model training in Python and model inference in Java. Our data scientists train the model in Python using tools like PyTorch and save the model as PyTorch scripts. Our original plan was to also host the models in Python and utilize gRPC to communicate with another service, which will use the Java gRPC client to send the request.

However, a few issues came with this solution. We saw network overhead between two different services running in separate runtime environments or pods, which resulted in higher latency. But the maintenance overhead was the main reason we abandoned this solution. We had to build both the gRPC server and the client program separately and keep the protocol buffer files consistent and up to date. Then we needed to Dockerize the application, write a deployment YAML file, deploy the gRPC server to our Kubernetes cluster, and make sure it was reliable and could scale automatically.

Another problem was whenever an error occurred on the gRPC server side, the application client only got a vague error message instead of a detailed error traceback. The client had to reach out to the gRPC server maintainer to learn exactly which part of the code caused the error.

Ideally, we instead want to load the model PyTorch scripts, extract the features from model input, and run model inference entirely in Java. Then we can build and publish it as a Maven library, hosted on our internal registry, which our service team could import into their own Java projects. When we did our research online, the Deep Java Library showed up on the top. After reading a few blog posts and DJL’s official documentation, we were sure DJL would provide the best solution to our problem.

Solution overview

The following diagram compares the previous and updated architecture.

The following diagram outlines the workflow of the DJL solution.

The steps are as follows:

  1. Training the models – Our data scientists train the models using PyTorch and save the models as torch scripts. These models are then pushed to an Amazon Simple Storage Service (Amazon S3) bucket using DVC, a version control tool for ML models.
  2. Implementing feature extraction and feeding ML features – The framework team pulls the models from Amazon S3 into a Java repository where they implement feature extraction and feed ML features into the predictor. They use the DJL PyTorch engine to initialize the model predictor.
  3. Packaging and publishing the inference code and models – The GitLab CI/CD pipeline packages and publishes the JAR file that contains the inference code and models to an internal Apache Archiva registry.
  4. Importing the inference library and making calls – The Java client imports the inference library as a Maven dependency. All inference calls are made via Java function calls within the same Kubernetes pod, with no gRPC hops between services.

We have seen a stable inferencing runtime and reliable prediction results. The DJL solution offers several advantages over gRPC-based solutions:

  • Improved response time – With no gRPC calls, the inferencing response time is improved
  • Easy rollbacks and upgrades – The Java client can easily roll back the inference library to a previous version or upgrade to a new version
  • Transparent error tracking – In the DJL solution, the client receives detailed error traceback messages in case of inferencing errors

Deep Java Library overview

The DJL is a full deep learning framework that supports the deep learning lifecycle, from building a model and training it on a dataset to deploying it in production. It has intuitive helpers and utilities for modalities like computer vision, natural language processing, audio, time series, and tabular data. DJL also features a model zoo of hundreds of pre-trained models that can be used out of the box and integrated into existing systems.

It is also a fully Apache-2 licensed open-source project and can be found on GitHub. The DJL was created at Amazon and open-sourced in 2019. Today, the DJL’s open-source community is led by Amazon and has grown to include contributors from many countries, companies, and educational institutions. The DJL continues to grow in its ability to support different hardware, models, and engines. It also includes support for newer hardware like ARM (both in servers like AWS Graviton and laptops with Apple M1) and AWS Inferentia.

The architecture of DJL is engine agnostic. It aims to be an interface describing what deep learning could look like in the Java language, but leaves room for multiple different implementations that could provide different capabilities or hardware support. Most popular frameworks today such as PyTorch and TensorFlow are built using a Python front end that connects to a high-performance C++ native backend. The DJL can use this to connect to these same native backends to take advantage of their work on hardware support and performance.

For this reason, many DJL users also use it for inference only. That is, they train a model using Python and then load it using the DJL for deployment as part of their existing Java production system. Because the DJL utilizes the same engine that powers Python, it’s able to run without any decrease in performance or loss in accuracy. This is exactly the strategy we adopted to support our new models.

The following diagram illustrates the workflow under the hood.

When the DJL loads, it finds all the engine implementations available in the class path using Java’s ServiceLoader. In this case, it detects the DJL PyTorch engine implementation, which will act as the bridge between the DJL API and the PyTorch Native.

The engine then works to load the PyTorch Native. By default, it downloads the appropriate native binary based on your OS, CPU architecture, and CUDA version, making it almost effortless to use. You can also provide the binary using one of the many available native JAR files, which are more reliable for production environments that often have limited network access for security.

Once loaded, the DJL uses the Java Native Interface to translate all the easy high-level functionalities in DJL into the equivalent low-level native calls. Every operation in the DJL API is hand-crafted to best fit the Java conventions and make it easily accessible. This also includes dealing with native memory, which is not supported by the Java Garbage Collector.

Although all these details are within the library, calling it from a user standpoint couldn’t be easier. In the following section, we walk through this process.

How Sportradar implemented DJL

Because we train our models using PyTorch, we use the DJL’s PyTorch engine for the model inference.

Loading the model is incredibly easy. All it takes is building a Criteria object describing the model to load and where to load it from. We then load the model and use it to create a new predictor session.

For our model, we also have a custom translator, which we call MyTranslator. We use the translator to encapsulate the preprocessing code that converts a convenient Java type into the input expected by the model, and the postprocessing code that converts the model output into a convenient output type. In our case, we chose to use a float[] as the input type and the built-in DJL Classifications as the output type.

It’s pretty amazing that with just a few lines of code, the DJL loads the PyTorch scripts and our custom translator, and then the predictor is ready to make the predictions.

Conclusion

Sportradar’s product built on the DJL solution went live before the 2022–23 NFL regular season started, and it has been running smoothly since then. In the future, Sportradar plans to re-platform existing models hosted on gRPC servers to the DJL solution.

The DJL continues to grow in many different ways. The most recent release, v0.21.0, has many improvements, including updated engine support, improvements on Spark, Hugging Face batch tokenizers, an NDScope for easier memory management, and enhancements to the time series API. It also has the first major release of DJL Zero, a new API aiming to allow support for both using pre-trained models and training your own custom deep learning models even with zero knowledge of deep learning.

The DJL also features a model server called DJL Serving. It makes it simple to host a model on an HTTP server from any of the 10 supported engines, including the Python engine to support Python code. DJL Serving v0.21.0 includes faster transformer support, Amazon SageMaker multi-model endpoint support, updates for Stable Diffusion, improvements for DeepSpeed, and updates to the management console. You can now use it to deploy large models with model parallel inference using DeepSpeed and SageMaker.

There is also much more coming in the DJL. The largest area under development is large language model support for models like ChatGPT or Stable Diffusion. There is also work to support streaming inference requests in DJL Serving, as well as improvements to the demos and the Spark extension. Of course, standard ongoing work continues as well, including features, fixes, engine updates, and more.

For more information on the DJL and its other features, see Deep Java Library.

Follow our GitHub repo, demo repository, Slack channel, and Twitter for more documentation and examples of the DJL!


About the authors

Fred Wu is a Senior Data Engineer at Sportradar, where he leads infrastructure, DevOps, and data engineering efforts for various NBA and NFL products. With extensive experience in the field, Fred is dedicated to building robust and efficient data pipelines and systems to support cutting-edge sports analytics.

Zach Kimberg is a Software Developer in the Amazon AI org. He works to enable the development, training, and production inference of deep learning. There, he helped found and continues to develop the DeepJavaLibrary project.

Kanwaljit Khurmi is a Principal Solutions Architect at Amazon Web Services. He works with the AWS customers to provide guidance and technical assistance helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

Read More