HAYAT HOLDING uses Amazon SageMaker to increase product quality and optimize manufacturing output, saving $300,000 annually

This is a guest post by Neslihan Erdogan, Global Industrial IT Manager at HAYAT HOLDING.

With the ongoing digitization of manufacturing processes and Industry 4.0, there is enormous potential to use machine learning (ML) for quality prediction. Process manufacturing is a production method that uses formulas or recipes to produce goods by combining ingredients or raw materials.

Predictive quality comprises the use of ML methods in production to estimate and classify product-related quality based on manufacturing process data, with the following goals [1]:

  • Quality description – The identification of relationships between process variables and product quality. For instance, how does the volume of an adhesive ingredient affect quality parameters such as strength and elasticity?
  • Quality prediction – The estimation of a quality variable on the basis of process variables, for decision support or for automation. For example, how many kg/m3 of an adhesive ingredient should be added to achieve a given strength and elasticity?
  • Quality classification – In addition to quality prediction, this involves estimation of certain product quality types.

In this post, we share how HAYAT HOLDING—a global player with 41 companies operating in different industries, including HAYAT, the world’s fourth-largest branded diaper manufacturer, and KEAS, the world’s fifth-largest wood-based panel manufacturer—collaborated with AWS to build a solution that uses Amazon SageMaker Model Training, Amazon SageMaker Automatic Model Tuning, and Amazon SageMaker Model Deployment to continuously improve operational performance, increase product quality, and optimize manufacturing output of medium-density fiberboard (MDF) wood panels.

Product quality predictions and adhesive consumption recommendations can be observed by field experts through dashboards in near-real time, resulting in a faster feedback loop. Laboratory results indicate a significant impact, equating to savings of $300,000 annually, while reducing the carbon footprint of production by preventing unnecessary chemical waste.

ML-based predictive quality in HAYAT HOLDING

HAYAT is the world’s fourth-largest branded baby diaper manufacturer and the largest paper tissue manufacturer in EMEA. KEAS (Kastamonu Entegre Ağaç Sanayi), a HAYAT HOLDING subsidiary that produces wood-based panels, ranks fourth in Europe and fifth in the world.

Medium-density fiberboard (MDF) is an engineered wood product made by breaking down wood residuals into fibers, combining them with adhesives, and forming the mixture into panels under high temperature and pressure. It has many application areas, such as furniture, cabinetry, and flooring.

Production of MDF wood panels requires extensive use of adhesives (double-digit tons consumed each year at HAYAT HOLDING).

In a typical production line, hundreds of sensors are used, and product quality is characterized by tens of parameters. Applying the correct volume of adhesive is an important cost item as well as an important quality factor for the produced panel, affecting properties such as density, screw-holding ability, tensile strength, modulus of elasticity, and bending strength. While excessive use of glue inflates production costs, insufficient glue causes quality problems; incorrect usage can cost up to tens of thousands of dollars in a single shift. The challenge is that product quality has a complex, regressive dependency on the production process.

Human operators decide on the amount of glue to be used based on domain expertise. This know-how is purely empirical and takes years of experience to build. To support the operators’ decision-making, laboratory tests are performed on selected samples to precisely measure quality during production. The lab results reveal product quality levels and provide feedback to the operators. However, lab tests are not real time and arrive with a delay of up to several hours. The human operator uses lab results to gradually adjust glue consumption until the required quality threshold is achieved.

Overview of solution

Quality prediction using ML is powerful but requires effort and skill to design, integrate with the manufacturing process, and maintain. With the support of AWS Prototyping specialists, and AWS Partner Deloitte, HAYAT HOLDING built an end-to-end pipeline as follows:

  • Ingest sensor data from production plant to AWS
  • Perform data preparation and ML model generation
  • Deploy models at the edge
  • Create operator dashboards
  • Orchestrate the workflow

The following diagram illustrates the solution architecture.

Data ingestion

HAYAT HOLDING has a state-of-the-art infrastructure for acquiring, recording, analyzing, and processing measurement data.

Two types of data sources exist for this use case. Process parameters are set for the production of a particular product and are usually not changed during production. Sensor data is taken during the manufacturing process and represents the actual condition of the machine.

Input data is streamed from the plant via OPC UA through an AWS IoT SiteWise Edge gateway running on AWS IoT Greengrass. In total, 194 sensors were imported and used to increase the accuracy of the predictions.

Model training and optimization with SageMaker automatic model tuning

Prior to model training, a set of data preparation activities is performed. For instance, an MDF panel plant produces multiple distinct products on the same production line (multiple types and sizes of wood panels). Each batch is associated with a different product, with different raw materials and different physical characteristics. Although the equipment and process time series are recorded continuously and can be seen as a single time series indexed by time, they need to be segmented by the batch they belong to; within a single shift, different panels may be produced for different durations. A sample of the produced MDF is periodically sent to the laboratory for quality tests. Other feature engineering tasks include feature reduction, scaling, unsupervised dimensionality reduction using principal component analysis (PCA), feature importance analysis, and outlier detection.
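The post doesn’t show the exact feature engineering code, but the scaling and PCA steps can be sketched with scikit-learn as follows; the array shapes and the variance threshold are illustrative assumptions, not the production values.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Illustrative input: one row per production batch, one column per sensor
# (194 sensors, as noted in the data ingestion section above).
X = np.random.rand(500, 194)

# Scale features to zero mean and unit variance before PCA.
X_scaled = StandardScaler().fit_transform(X)

# Keep the smallest number of components explaining 95% of the variance
# (the 95% threshold is an assumption for illustration).
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)  # (500, k) with k <= 194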

After the data preparation phase, a two-stage approach is used to build the ML models. Lab test samples are collected by intermittent random sampling of products from the conveyor belt and sent to a laboratory for quality tests. Because the lab results can’t be produced in real time, the feedback loop is relatively slow. The first model is trained to predict lab results for product quality parameters: density, elasticity, pulling resistance, swelling, absorbed water, surface durability, moisture, surface suction, and bending resistance. The second model is trained to recommend the amount of glue to be used in production, depending on the predicted output quality.

Setting up and managing custom ML environments can be time-consuming and cumbersome. Amazon SageMaker provides a suite of built-in algorithms, pre-trained models, and pre-built solution templates to help data scientists and ML practitioners get started on training and deploying ML models quickly.

Multiple ML models were trained using SageMaker built-in algorithms for the top N most-produced product types and for different quality parameters. The quality prediction models identify the relationships between glue usage and nine quality parameters. The recommendation models predict the minimum glue usage that satisfies the quality requirements using the following approach: starting from the highest allowed glue amount, the algorithm reduces the amount step by step as long as all requirements remain satisfied, down to the minimum allowed amount. If even the maximum allowed amount of glue doesn’t satisfy all the requirements, the algorithm returns an error.
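The step-down search just described can be sketched as follows; the model interface, feature name, step size, and quality thresholds are illustrative assumptions, not the production implementation.

QUALITY_THRESHOLDS = {"density": 700.0, "bending_strength": 30.0}  # illustrative minimums

def meets_all_requirements(predicted):
    # predicted: dict mapping quality parameter -> predicted value
    return all(predicted[name] >= minimum for name, minimum in QUALITY_THRESHOLDS.items())

def recommend_glue_amount(quality_model, features, glue_max, glue_min, step):
    """Start from the highest allowed glue amount and step down,
    keeping the lowest amount that still satisfies every requirement."""
    best = None
    glue = glue_max
    while glue >= glue_min:
        predicted = quality_model.predict({**features, "glue_kg_m3": glue})
        if not meets_all_requirements(predicted):
            break      # out of spec: the previously kept amount is the minimum
        best = glue    # still in spec: try using even less glue
        glue -= step
    if best is None:
        raise ValueError("Requirements not met even at the maximum allowed glue amount")
    return best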

SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose.

With automatic model tuning, the team focused on defining the right objective and scoping the hyperparameters and the search space. Automatic model tuning takes care of the rest, including the infrastructure, running and orchestrating training jobs in parallel, and improving hyperparameter selection. It supports a wide range of training instance types; the models were tuned on ml.c5.2xlarge instances using a hyperparameter tuning method based on Bayesian optimization, which is designed to find the best model in the shortest time.
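As a concrete illustration of this setup, here is a minimal sketch using the SageMaker Python SDK. The post doesn’t name the exact built-in algorithm, so XGBoost, the objective metric, the hyperparameter ranges, and the S3 paths are all assumptions.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.tuner import ContinuousParameter, IntegerParameter, HyperparameterTuner

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes execution inside SageMaker

# SageMaker built-in XGBoost (an illustrative choice of built-in algorithm).
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.c5.2xlarge",
    output_path="s3://my-bucket/quality-models/",  # placeholder bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="reg:squarederror", num_round=200)

# Bayesian search (the default strategy) over a scoped hyperparameter space.
tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:rmse",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    objective_type="Minimize",
    max_jobs=30,
    max_parallel_jobs=3,
)
tuner.fit({"train": "s3://my-bucket/train/", "validation": "s3://my-bucket/validation/"})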

Inference at the edge

Multiple methods are available for deploying ML models to get predictions.

SageMaker real-time inference is ideal for workloads with real-time, interactive, low-latency requirements. During the prototyping phase, HAYAT HOLDING deployed models to SageMaker hosting services and got endpoints that are fully managed by AWS. SageMaker multi-model endpoints provide a scalable and cost-effective solution for deploying large numbers of models. They use the same fleet of resources and a shared serving container to host all your models. This reduces hosting costs by improving endpoint utilization compared with using single-model endpoints. It also reduces deployment overhead because SageMaker manages loading models in memory and scaling them based on the traffic patterns to your endpoint.

SageMaker real-time inference is used with multi-model endpoints for cost optimization and for making all models available at all times during development. Although using an ML model for each product type results in higher inference accuracy, the cost of developing and testing these models increases accordingly, and it also becomes difficult to manage multiple models. SageMaker multi-model endpoints address these pain points and give the team a rapid and cost-effective solution to deploy multiple ML models.
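For illustration, invoking a multi-model endpoint looks like the following with boto3; the endpoint name, per-product-type model artifact, and payload are hypothetical.

import boto3

runtime = boto3.client("sagemaker-runtime")

# TargetModel selects which model artifact (one per product type) the shared
# endpoint loads and invokes for this request.
response = runtime.invoke_endpoint(
    EndpointName="mdf-quality-mme",        # hypothetical endpoint name
    TargetModel="product-type-17.tar.gz",  # hypothetical per-product-type model
    ContentType="text/csv",
    Body="0.42,1.8,183.5",                 # illustrative sensor-feature vector
)
print(response["Body"].read().decode("utf-8"))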

Amazon SageMaker Edge provides model management for edge devices so you can optimize, secure, monitor, and maintain ML models on fleets of edge devices. Operating ML models on edge devices is challenging because, unlike cloud instances, devices have limited compute, memory, and connectivity. After a model is deployed, you need to continuously monitor it, because model drift can cause its quality to decay over time. Monitoring models across your device fleets is difficult because you need to write custom code to collect data samples from your devices and recognize skew in predictions.

For production, the SageMaker Edge Manager agent is used to make predictions with models loaded onto an AWS IoT Greengrass device.

Conclusion

HAYAT HOLDING was evaluating an advanced analytics platform as part of their digital transformation strategy and wanted to bring AI to the organization for quality prediction in production.

With the support of AWS Prototyping specialists and AWS Partner Deloitte, HAYAT HOLDING built a unique data platform architecture and an ML pipeline to address long-term business and technical needs.

HAYAT KIMYA integrated the ML solution in one of its plants. Laboratory results indicate a significant impact, equating to savings of $300,000 annually, while reducing the carbon footprint of production by preventing unnecessary chemical waste. The solution provides a faster feedback loop to the human operators by presenting product quality predictions and adhesive consumption recommendations through dashboards in near-real time. The solution will eventually be deployed across HAYAT HOLDING’s other wood panel plants.

ML is a highly iterative process; over the course of a single project, data scientists experiment with hundreds of different models, datasets, and parameter combinations in search of maximum accuracy. SageMaker offers the most complete set of tools to harness the power of ML. It lets you organize, track, compare, and evaluate ML experiments at scale. You can boost the bottom-line impact of your ML teams and achieve significant productivity improvements using SageMaker built-in algorithms, automatic model tuning, real-time inference, and multi-model endpoints.

Accelerate time to results and optimize operations by modernizing your business approach from edge to cloud using Machine Learning on AWS. Take advantage of industry-specific innovations and solutions using AWS for Industrial.

Share your feedback and questions in the comments.


About HAYAT HOLDING

HAYAT HOLDING, whose foundations were laid in 1937, is a global player today, with 41 companies operating in different industries, including HAYAT in the fast-moving consumer goods sector, KEAS (Kastamonu Entegre Ağaç Sanayi) in the wood-based panel sector, and LIMAS in the port management sector, with a workforce of over 17,000 people. HAYAT HOLDING delivers 49 brands produced with advanced technologies in 36 production facilities in 13 countries to millions of consumers worldwide.

Operating in the fast-moving consumer goods sector, Hayat was founded in 1987. Today, rapidly advancing on the path of globalization with 21 production facilities in 8 countries around the world, Hayat is the world’s fourth-largest branded diaper manufacturer and the largest tissue producer in the Middle East, Eastern Europe, and Africa, and a major player in the fast-moving consumer goods sector. With its 16 powerful brands, including Molfix, Bebem, Molped, Joly, Bingo, Test, Has, Papia, Familia, Teno, Focus, Nelex, Goodcare, and Evony in the hygiene, home care, tissue, and personal health categories, Hayat brings HAYAT* to millions of homes in more than 100 countries.

Kastamonu Entegre Ağaç Sanayi (KEAS), the first investment of HAYAT HOLDING in its industrialization move, was founded in 1969. Continuing its uninterrupted growth towards becoming a global power in its sector, it ranks fourth in Europe and fifth in the world. KEAS ranks first in the industry with its approximately 7,000 employees and exports to more than 100 countries.

*“Hayat” means “life” in Turkish.

References

  1. Tercan H, “Machine learning and deep learning based predictive quality in manufacturing: a systematic review”, Journal of Intelligent Manufacturing, 2022.

About the authors

Neslihan Erdoğan (BSc and MSc in Electrical Engineering) has held various technical and business roles as a specialist, architect, and manager in information technologies. She works at HAYAT as the Global Industrial IT Manager and has led Industry 4.0, digital transformation, OT security, and data & AI projects.

Çağrı Yurtseven (BSc in Electrical-Electronics Engineering, Bogazici University) is an Enterprise Account Manager at Amazon Web Services. He leads Sustainability and Industrial IoT initiatives in Turkey while helping customers realize their full potential by showing the art of the possible on AWS.

Cenk Sezgin (PhD – Electrical Electronics Engineering) is a Principal Manager at AWS EMEA Prototyping Labs. He supports customers with exploration, ideation, engineering and development of state-of-the-art solutions using emerging technologies such as IoT, Analytics, AI/ML & Serverless.

Hasan-Basri AKIRMAK (BSc and MSc in Computer Engineering and Executive MBA from the Graduate School of Business) is a Principal Solutions Architect at Amazon Web Services. He is a business technologist advising enterprise-segment clients. His area of specialty is designing architectures and business cases for large-scale data processing systems and machine learning solutions. Hasan has delivered business development, systems integration, and program management engagements for clients in Europe, the Middle East, and Africa. Since 2016, he has mentored hundreds of entrepreneurs at startup incubation programs pro bono.

Mustafa Aldemir (BSc in Electrical-Electronics Engineering, MSc in Mechatronics, and PhD candidate in Computer Science) is the Robotics Prototyping Lead at Amazon Web Services. He has been designing and developing Internet of Things and machine learning solutions for some of the biggest customers across EMEA and leading their teams in implementing them. Meanwhile, he has been delivering AI courses at Amazon Machine Learning University and Oxford University.

Read More

Achieve effective business outcomes with no-code machine learning using Amazon SageMaker Canvas

On November 30, 2021, we announced the general availability of Amazon SageMaker Canvas, a visual point-and-click interface that enables business analysts to generate highly accurate machine learning (ML) predictions without having to write a single line of code. With Canvas, you can take ML mainstream throughout your organization so business analysts without data science or ML experience can use accurate ML predictions to make data-driven decisions.

ML is becoming ubiquitous in organizations across industries to gather valuable business insights using predictions from existing data quickly and accurately. The key to scaling the use of ML is making it more accessible. This means empowering business analysts to use ML on their own, without depending on data science teams. Canvas helps business analysts apply ML to common business problems without having to know the details such as algorithm types, training parameters, or ensemble logic. Today, customers are using Canvas to address a wide range of use cases across verticals including churn detection, sales conversion, and time series forecasting.

In this post, we discuss key Canvas capabilities.

Get started with Canvas

Canvas offers an interactive tour to help you navigate through the visual interface, starting with importing data from the cloud or on-premises sources. Getting started with Canvas is quick; we offer sample datasets for multiple use cases, including predicting customer churn, estimating loan default probabilities, forecasting demand, and predicting supply chain delivery times. These datasets cover all the use cases currently supported by Canvas, including binary classification, multi-class classification, regression, and time series forecasting. To learn more about navigating Canvas and using the sample datasets, see Amazon SageMaker Canvas accelerates onboarding with new interactive product tours and sample datasets.

Exploratory data analysis

After you import your data, Canvas allows you to explore and analyze it before building predictive models. You can preview your imported data and visualize the distribution of different features. You can then choose to transform your data to make it suitable for addressing your problem. For example, you may choose to drop columns, extract date and time, impute missing values, or replace outliers with standard or custom values. These activities are recorded in a model recipe, which is a series of data preparation steps. This recipe is maintained throughout the lifecycle of a particular ML model, from data preparation to generating predictions. See Amazon SageMaker Canvas expands capabilities to better prepare and analyze data for machine learning to learn more about preparing and analyzing data within Canvas.

Visualize your data

Canvas also offers the ability to define and create new features in your data through mathematical operators and logical functions. You can visualize and explore your data through box plots, bar graphs, and scatterplots by dragging and dropping features directly on charts. In addition, Canvas provides correlation matrices for numerical and categorical variables to understand the relationships between features in your data. This information can be used to refine your input data and drive more accurate models. For more details on data analysis capabilities in Canvas, see Use Amazon SageMaker Canvas for exploratory data analysis. To learn more about mathematical functions and operators in Canvas, see Amazon SageMaker Canvas supports mathematical functions and operators for richer data exploration.

After you prepare and explore your data, Canvas gives you an option to validate your datasets so you can proactively check for data quality issues. Canvas validates the data on your behalf and surfaces issues such as missing values in any row or column and too many unique labels in the target column compared to the number of rows. In addition, Canvas provides you with options to fix these issues before you build your ML model. For a deeper dive into data validation capabilities, refer to Identifying and avoiding common data issues while building no code ML models with Amazon SageMaker Canvas.

Build ML models

The first step towards building ML models in Canvas is to define the target column for the problem. For example, you could choose home price as the target column in a housing model to predict prices. Alternatively, you could use churn as the target column to determine the probability of losing customers under different conditions. After you select the target column, Canvas automatically determines the type of problem for the model to be built.

Prior to building an ML model, you can get directional insights into the model’s estimated accuracy and how each feature influenced results by running a preview analysis. Based on these insights, you can further prepare, analyze, or explore your data to get the desired accuracy for model predictions.

Canvas offers two methods to train ML models: Quick build and Standard build. Both methods deliver a fully trained ML model with complete transparency into the importance of each feature towards the model outcome. Quick build focuses on speed and experimentation, whereas Standard build focuses on the highest levels of accuracy by going through multiple iterations of data preprocessing, choosing the right algorithm, exploring the hyperparameter space, and generating multiple candidate models before selecting the best-performing model. Canvas does this behind the scenes without requiring you to write code.

New performance improvements deliver up to three times faster ML model training time, enabling rapid prototyping and faster time-to-value for business outcomes. To learn more, see Amazon SageMaker Canvas announces up to 3x faster ML model training time.

Model analysis

After you build the model, Canvas provides detailed model accuracy metrics and feature explainability.

Canvas also presents a Sankey chart depicting the flow of the data from one value into the other, including false positives and false negatives.

For users interested in analyzing more advanced metrics, Canvas provides F1 scores that combine precision and recall, an accuracy metric quantifying how many times the model made a correct prediction across the entire dataset, and the Area Under the Curve (AUC), which measures how well the model separates the categories in the dataset.

Model predictions

With Canvas, you can run real-time predictions on the trained model with interactive what-if analyses, examining the impact of different feature values on predictions.

Furthermore, you can run batch predictions on any validation dataset as a whole. These predictions can be previewed and downloaded for use with downstream applications.

Sharing and collaboration

Canvas allows you to continue the ML journey by sharing your models with your data science teams for review, feedback, and updates. You can share your models with other users using Amazon SageMaker Studio, a fully integrated development environment (IDE) for ML. Studio users can review the model and, if needed, update data transformations, retrain the model, and share back the updated version of the model with Canvas users who can then use it to generate predictions.

In addition, data scientists can share models built outside of Amazon SageMaker with Canvas users, removing the heavy lifting to build a separate tool or user interface to share models between different teams. With the bring your own model (BYOM) approach, you can now use ML models built by your data science teams in other environments and generate predictions within minutes directly in Canvas. This seamless collaboration between business and technical teams helps democratize ML across the organization by bringing transparency to ML models and accelerating ML deployments. To learn more about sharing and collaboration between business and technical teams using Canvas, see New – Bring ML Models Built Anywhere into Amazon SageMaker Canvas and Generate Predictions.

Conclusion

Get started today with Canvas and take advantage of ML to achieve your business outcomes without writing a line of code. Learn more from the interactive tutorial or MOOC course on Coursera. Happy innovating!


About the author

Shyam Srinivasan is on the AWS low-code/no-code ML product team. He cares about making the world a better place through technology and loves being part of this journey. In his spare time, Shyam likes to run long distances, travel around the world, and experience new cultures with family and friends.

Read More

How the UNDP Independent Evaluation Office is using AWS AI/ML services to enhance the use of evaluation to support progress toward the Sustainable Development Goals

The United Nations (UN) was founded in 1945 by 51 original Member States committed to maintaining international peace and security, developing friendly relations among nations, and promoting social progress, better living standards, and human rights. The UN is currently made up of 193 Member States and has evolved over the years to keep pace with a rapidly changing world. The United Nations Development Programme (UNDP) is the UN’s development agency and operates in over 170 countries and territories. It plays a critical role in helping countries achieve the Sustainable Development Goals (SDGs), which are a global call to action to end poverty, protect the planet, and ensure all people enjoy peace and prosperity.

As a learning organization, the UNDP highly values the evaluation function. Each UNDP program unit commissions evaluations to assess the performance of its projects and programs. The Independent Evaluation Office (IEO) is a functionally independent office within the UNDP that supports the oversight and accountability functions of the Executive Board and management of the UNDP, UNCDF, and UNV. The core functions of the IEO are to conduct independent programmatic and thematic evaluations that are of strategic importance to the organization—like its support for the COVID-19 pandemic recovery.

In this post, we discuss how the IEO developed UNDP’s artificial intelligence and machine learning (ML) platform—named Artificial Intelligence for Development Analytics (AIDA)—in collaboration with AWS, UNDP’s Information and Technology Management Team (UNDP ITM), and the United Nations International Computing Centre (UNICC). AIDA is a web-based platform that allows program managers and evaluators to expand their evidence base by searching existing data in a smarter, more efficient, and innovative way to produce insights and lessons learned. By searching at the granular level of paragraphs, AIDA finds pieces of evidence that would not be found using conventional searches. The creation of AIDA aligns with the UNDP Strategic Plan 2022–2025 to use digitization and innovation for greater development impact.

The challenge

The IEO is the custodian of the UNDP Evaluation Resource Center (ERC). The ERC is a repository of over 6,000 evaluation reports that cover every aspect of the organization’s work, everywhere it has worked, since 1997. The findings and recommendations of the evaluation reports inform UNDP management, donor, and program staff to better design future interventions, take course-correction measures in their current programs, and make funding and policy decisions at every level.

Before AIDA, the process to extract evaluative evidence and generate lessons and insights was manual, resource-intensive, and time-consuming. Moreover, traditional search methods didn’t work well with unstructured data, therefore the evidence base was limited. To address this challenge, the IEO decided to use AI and ML to better mine the evaluation database for lessons and knowledge.

The AIDA team was mindful of the challenging task of extracting evidence from unstructured data such as evaluation reports. Usually, evaluation reports are 80–100 pages, are in multiple languages, and contain findings, conclusions, and recommendations. Even though evaluations are guided by the UNDP Evaluation Guideline, there is no standard written format for these evaluations, and the aforementioned sections may occur at different locations in the document, or not all of them may exist. Therefore, accurately extracting evaluative evidence at the paragraph level and applying appropriate labels was a significant ML challenge.

Solution overview

The AIDA technical solution was developed by AWS Professional Services and the UNICC. The core technology platform was designed and developed by the AWS ProServe team. The UNICC was responsible for developing the AIDA web portal and human-in-the-loop interface. The AIDA platform was envisioned to provide a simple and highly accurate mechanism to search UNDP evaluation reports across various themes and export them for further analysis. AIDA’s architecture needed to address several requirements:

  • Automate the extraction and labeling of evaluation data
  • Process thousands of reports
  • Allow the IEO to add new labels without calling on the expertise of data scientists and ML experts

To deliver the requirements, the components were designed with these tenets in mind:

  • Technically and environmentally sustainable
  • Cost conscious
  • Extensible to allow for future expansion

The resulting solution can be broken down to three components, as shown in the following architecture diagram:

  • Data ingestion and extraction
  • Data classification
  • Intelligent search

The following sections describe these components in detail.

Data ingestion and extraction

Evaluation reports are prepared and submitted by UNDP program units across the globe—there is no standard report layout template or format. The data ingestion and extraction component ingests and extracts content from these unstructured documents.

Amazon Textract is used to extract data from PDF documents. This solution uses the asynchronous StartDocumentTextDetection API to build the document processing workflow that handles Amazon Textract asynchronous invocation, raw response extraction, and persistence in Amazon Simple Storage Service (Amazon S3). This solution adds an Amazon Textract postprocessing component to handle paragraph-based text extraction. The postprocessing component uses bounding box metadata from Amazon Textract for intelligent data extraction, and is capable of extracting data from complex, multi-format, multi-page PDF files with varying headers, footers, footnotes, and multi-column data. The Apache Tika open-source Python library is used for data extraction from Word documents.

The following diagram illustrates this workflow, orchestrated with AWS Step Functions.

This workflow has the following steps:

  1. TextractCompleted is the first step; it ensures documents are not processed by Amazon Textract more than once, avoiding unnecessary processing time and cost from duplicate processing.
  2. TextractAsyncCallTask submits the documents to be processed by Amazon Textract using the asynchronous StartDocumentTextDetection API. This API processes the documents and stores the JSON output files in Amazon S3 for postprocessing.
  3. TextractAsyncSNSListener is an AWS Lambda function that handles the Amazon Textract job completion event, and returns the metadata back to the workflow for further processing.
  4. TextractPostProcessorTask is an AWS Lambda function that uses the metadata and processes the JSON output files produced by Amazon Textract to extract meaningful paragraphs.
  5. TextractQAValidationTask is an AWS Lambda function that performs some simple text validations on the extracted paragraphs and collects metrics like number of complete or incomplete paragraphs. These metrics are used to measure the quality of text extractions.

Please refer to TextractAsync, an IDP CDK construct that abstracts the invocation of the Amazon Textract Async API, handling Amazon Simple Notification Service (Amazon SNS) messages and workflow processing to accelerate your development.
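For reference, a minimal sketch of this asynchronous invocation with boto3 follows; the bucket, prefix, SNS topic, and role ARNs are placeholders, and in the actual solution the paging is done by the Lambda functions described above.

import boto3

textract = boto3.client("textract")

# Start the asynchronous job: Textract writes JSON output to S3 and notifies
# an SNS topic (handled by TextractAsyncSNSListener) on completion.
job = textract.start_document_text_detection(
    DocumentLocation={"S3Object": {"Bucket": "reports-bucket", "Name": "reports/evaluation.pdf"}},
    OutputConfig={"S3Bucket": "reports-bucket", "S3Prefix": "textract-output/"},
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:123456789012:textract-complete",  # placeholder
        "RoleArn": "arn:aws:iam::123456789012:role/TextractPublishRole",        # placeholder
    },
)

# After the completion notification arrives, page through the results.
blocks, next_token = [], None
while True:
    kwargs = {"JobId": job["JobId"]}
    if next_token:
        kwargs["NextToken"] = next_token
    result = textract.get_document_text_detection(**kwargs)
    blocks.extend(result["Blocks"])  # LINE/WORD blocks with bounding box geometry
    next_token = result.get("NextToken")
    if not next_token:
        break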

Data classification

The data classification component identifies the critical parts of the evaluation reports, and further classifies them into a taxonomy of categories organized around the various themes of the Sustainable Development Goals. We have built one multi-class and two multi-label classification models with Amazon Comprehend.

Extracted paragraphs are processed using Step Functions, which integrates with Amazon Comprehend to perform classification in batch mode. Paragraphs are classified into findings, recommendations, and conclusions (FRCs) using a custom multi-class model, which helps identify the critical sections of the evaluation reports. For the identified critical sections, we identify the categories (thematic and non-thematic) using a custom multi-label classification model. Thematic and non-thematic classification is used to identify and align the evaluation reports with Sustainable Development Goals like no poverty (SDG-1), gender equality (SDG-5), clean water and sanitation (SDG-6), and affordable and clean energy (SDG-7).

The following figure depicts the Step Functions workflow to process text classification.

To reduce the cost of the classification process, we created the workflow to submit Amazon Comprehend jobs in batch mode. The workflow waits for all the Amazon Comprehend jobs to complete, then performs data refinement by aggregating the text extraction and Amazon Comprehend results, filtering out the paragraphs that aren’t identified as FRC, and aggregating the thematic and non-thematic classification categories by paragraph.
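For illustration, submitting one such batch job with boto3 looks like the following; the classifier ARN, role, job name, and S3 locations are placeholders.

import boto3

comprehend = boto3.client("comprehend")

# Classify extracted paragraphs (one document per line in S3) with the
# custom FRC multi-class model as an asynchronous batch job.
job = comprehend.start_document_classification_job(
    JobName="frc-classification-batch",  # hypothetical job name
    DocumentClassifierArn="arn:aws:comprehend:us-east-1:123456789012:document-classifier/frc",  # placeholder
    InputDataConfig={
        "S3Uri": "s3://aida-staging/paragraphs/",
        "InputFormat": "ONE_DOC_PER_LINE",
    },
    OutputDataConfig={"S3Uri": "s3://aida-staging/frc-results/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendDataAccess",  # placeholder
)

# Poll the job status (the actual workflow waits via Step Functions instead).
status = comprehend.describe_document_classification_job(JobId=job["JobId"])
print(status["DocumentClassificationJobProperties"]["JobStatus"])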

Extracted paragraphs with their classification categories are stored in Amazon RDS for PostgreSQL. This is a staging database to preserve all the extraction and classification results. We also use this database to further enrich the results to aggregate the themes of the paragraphs, and filter paragraphs that are not FRC. Enriched content is fed to Amazon Kendra.

For the first release, we had over 2 million paragraphs extracted and classified. With the help of FRC custom classification, we were able to accurately narrow the 2 million paragraphs down to over 700,000. The Amazon Comprehend custom classification model helped accurately surface the relevant content and substantially reduced the cost of the Amazon Kendra indexes.

Amazon DynamoDB is used for storing document metadata and keeping track of the document processing status across all key components. Metadata tracking is particularly useful to handle errors and retries.

Intelligent search

The intelligent search capability allows the users of the AIDA platform to intuitively search for evaluative evidence on UNDP program interventions contained within all the evaluation reports. The following diagram illustrates this architecture.

Amazon Kendra is used for intelligent searches. Enriched content from Amazon RDS for PostgreSQL is ingested into Amazon Kendra for indexing. The web portal layer uses the intelligent search capability of Amazon Kendra to intuitively search the indexed content. Labelers use the human-in-the-loop user interface to update the text classification generated by Amazon Comprehend for any extracted paragraphs. Changes to the classification are immediately reflected in the web portal, and human-updated feedback is extracted and used for Amazon Comprehend model training to continuously improve the custom classification model.

AIDA incorporates human-in-the-loop functionality, which boosts AIDA’s capacity to correct classification (FRC, thematic, non-thematic) and data extraction errors. Labels updated by the humans performing the human-in-the-loop function are added to the training dataset and used to retrain the Amazon Comprehend models to continuously improve classification accuracy.

Conclusion

In this post, we discussed how evaluators, through the IEO’s AIDA platform, are using Amazon AI and ML services like Amazon Textract, Amazon Comprehend, and Amazon Kendra to build a custom document processing system that identifies, extracts, and classifies data from unstructured documents. Using Amazon Textract for PDF text extraction improved paragraph-level evidence extraction from under 60% to over 80% accuracy. Additionally, multi-label classification improved from under 30% to 90% accuracy by retraining models in Amazon Comprehend with improved training datasets.

This platform enabled evaluators to intuitively search relevant content quickly and accurately. Transforming unstructured data to semi-structured data empowers the UNDP and other UN entities to make informed decisions based on a corpus of hundreds or thousands of data points about what works, what doesn’t work, and how to improve the impact of UNDP operations for the people it serves.

For more information about the intelligent document processing reference architecture, refer to Intelligent Document Processing. Please share your thoughts with us in the comments section.


About the Authors

Oscar A. Garcia is the Director of the Independent Evaluation Office (IEO) of the United Nations Development Program (UNDP). As Director, he provides strategic direction, thought leadership, and credible evaluations to advance UNDP work in helping countries progress towards national SDG achievement. Oscar also currently serves as the Chairperson of the United Nations Evaluation Group (UNEG). He has more than 25 years of experience in areas of strategic planning, evaluation, and results-based management for sustainable development. Prior to joining the IEO as Director in 2020, he served as Director of IFAD’s Independent Office of Evaluation (IOE), and Head of Advisory Services for Green Economy, UNEP. Oscar has authored books and articles on development evaluation, including one on information and communication technology for evaluation. He is an economist with a master’s degree in Organizational Change Management, New School University (NY), and an MBA from Bolivian Catholic University, in association with the Harvard Institute for International Development.

Sathya Balakrishnan is a Sr. Customer Delivery Architect in the Professional Services team at AWS, specializing in data and ML solutions. He works with US federal financial clients. He is passionate about building pragmatic solutions to solve customers’ business problems. In his spare time, he enjoys watching movies and hiking with his family.

Thuan Tran is a Senior Solutions Architect in the World Wide Public Sector supporting the United Nations. He is passionate about using AWS technology to help customers conceptualize the art of the possible. In his spare time, he enjoys surfing, mountain biking, axe throwing, and spending time with family and friends.

Prince Mallari is an NLP Data Scientist in the Professional Services team at AWS, specializing in applications of NLP for public sector customers. He is passionate about using ML as a tool to allow customers to be more productive. In his spare time, he enjoys playing video games and developing one with his friends.

Read More

Enabling Optimal Inference Performance on AMD EPYC™ Processors with the ZenDNN Library

Posted by Sarina Sit, AMD

AMD launched the 4th Generation of AMD EPYC™ processors in November of 2022. 4th Gen AMD EPYC processors include numerous hardware improvements over the prior generation, such as AVX-512 and VNNI instruction set extensions, that are well-suited for improving inference performance. However, hardware is only one piece of the puzzle; software is a crucial component for effectively taking advantage of the underlying hardware.

We are happy to announce the availability of the TensorFlow-ZenDNN plug-in for TensorFlow v2.12 and above, which represents an ongoing, focused effort by AMD to improve the accessibility of ZenDNN optimizations for the community via framework upstreaming. This plug-in enables neural network inferencing on AMD EPYC CPUs with the AMD ZenDNN library.

ZenDNN 

ZenDNN, which is available open-source from GitHub, is a low-level AMD deep neural network library that includes basic neural network building blocks optimized for AMD EPYC CPUs. ZenDNN is purpose-built to help deep learning application and framework developers improve inference performance on AMD EPYC CPUs across an array of workloads, including computer vision, natural language processing, and recommender systems.

TF-ZenDNN 

We have integrated ZenDNN into high-level AI frameworks for ease of use. Our prototype integration with TensorFlow, called TF-ZenDNN, is done by forking the TensorFlow repository at a specific version and directly modifying TensorFlow code. TF-ZenDNN is available as a binary package for direct integration from AMD’s ZenDNN developer resources page (diagram 1 below), with installation instructions available in our TensorFlow + ZenDNN User Guide.

AMD Zen Deep Neural Network (ZenDNN) Resources page
Diagram 1. The ZenDNN v4.0 binary package available on our ZenDNN developer resources page is referred to in this blog as our TF-ZenDNN direct integration version.

TF-ZenDNN optimizes graphs at the network level and provides tuned primitive implementations at a library level, including Convolution, MatMul, Elementwise, and Pooling (Max and Average). We have seen performance benefits across a variety of neural network models, including the breadth of convolutional neural networks depicted by the orange line below in Graph 1. The posts Optimizing Tencent’s AI Applications with the ZenDNN AI Inference Library and TF-ZenDNN impact on TinyDefectNet demonstrate the high performance of ZenDNN and its integration with TensorFlow, respectively.

AMD Zen Deep Neural Network (ZenDNN) Resources page
Graph 1. Performance uplift of the TensorFlow-ZenDNN plug-in v0.1 and TF-ZenDNN direct integration v4.0 compared to TF-vanilla (without ZenDNN). As optimizations continue to be added to the TensorFlow-ZenDNN plug-in, the extent of performance uplift is expected to compare to that of TF-ZenDNN direct integration. Please see endnotes ZD-045 through ZD-051 at the end of this blog.

TensorFlow-ZenDNN Plug-in 

TF-ZenDNN direct integration, as in the binary form described in the section above, requires significant changes in the TensorFlow code. Upstreaming such changes to the TensorFlow repository would be cumbersome and unsustainable. TensorFlow v2.5 provides a PluggableDevice mechanism that enables modular, plug-and-play integration of device-specific code. AMD adopted PluggableDevice when implementing the TensorFlow-ZenDNN plug-in for AMD EPYC CPUs. TensorFlow-ZenDNN plug-in adds custom kernel implementations and operations specific to AMD EPYC CPUs to TensorFlow through its kernel and op registration C API (diagram 2 below).

AMD Zen Deep Neural Network (ZenDNN) Resources page
Diagram 2. The TensorFlow-ZenDNN plug-in upstreamed into TFv2.12 enables the addition of custom kernels and operations developed by AMD for performance improvement on AMD EPYC processors.

The main difference between the TensorFlow-ZenDNN plug-in and TF-ZenDNN direct integration is compatibility with standard TensorFlow packages. TF-ZenDNN direct integration is a standalone package which replaces standard TensorFlow packages. TensorFlow-ZenDNN plug-in is a supplemental package to be installed alongside standard TensorFlow packages starting from TF version 2.12 onwards.

From a TensorFlow developer’s perspective, the TensorFlow-ZenDNN plug-in approach simplifies the process of leveraging ZenDNN optimizations compared to the TF-ZenDNN direct integration approach. With TF-ZenDNN direct integration, the developer needs to download the foundational TensorFlow build and navigate separately to the AMD ZenDNN developer resources page to download the specific TF-ZenDNN binary for integration. In contrast, with the TensorFlow-ZenDNN plug-in approach, everything that a user needs to take advantage of ZenDNN resides on TensorFlow pages, as described further in the next section, “Step-by-Step Guide to using ZenDNN on AMD EPYC Processors”.

The TensorFlow-ZenDNN plug-in, in its first iteration (v0.1), currently offers 16 common ZenDNN ops, including Conv2D, MatMul, BatchMatMul, FusedBatchNorm, AvgPool, and MaxPool. Other ops that are not covered will fall back to TensorFlow’s native kernels. TensorFlow-ZenDNN plug-in provides competitive performance with TF-ZenDNN direct integration package for models such as ResNet, Inception, and VGG variants, as represented in Graph 1 above, with the blue bars representing TensorFlow-ZenDNN plug-in performance and the orange line representing TF-ZenDNN direct integration performance. However, TF-ZenDNN direct integration still outperforms the plug-in for other models, such as MobileNet and EfficientNet, because the plug-in does not yet support graph optimizations that are currently featured in TF-ZenDNN direct integration packages. We expect the performance to be closer once the TensorFlow-ZenDNN plug-in reaches feature parity with TF-ZenDNN direct integration.

Step-by-Step Guide to Using ZenDNN on AMD EPYC Processors

Taking advantage of ZenDNN optimizations in TensorFlow is straightforward:  

1. Download ZenDNN Plug-in CPU wheel file from the TensorFlow Community Supported Builds webpage.

2. Pip install the ZenDNN plug-in using the following commands:

pip install tensorflow-cpu==2.12.0 

pip install tensorflow_zendnn_plugin-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

 
3. Enable ZenDNN optimizations in your inference flow by setting the following environment variables:

export TF_ENABLE_ZENDNN_OPTS=1

export TF_ENABLE_ONEDNN_OPTS=0

To disable ZenDNN optimizations in your inference execution, set the corresponding ZenDNN environment variable to 0:

export TF_ENABLE_ZENDNN_OPTS=0
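As a quick sanity check that the setup works end to end, you can run a small inference with the environment variables set from Python; this is a minimal sketch assuming the packages installed above, and the model choice is arbitrary.

import os
# Set before TensorFlow executes any kernels.
os.environ["TF_ENABLE_ZENDNN_OPTS"] = "1"
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

import tensorflow as tf

model = tf.keras.applications.ResNet50(weights=None)  # weights=None skips the download
x = tf.random.uniform((1, 224, 224, 3))
y = model(x, training=False)
print(tf.__version__, y.shape)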

TensorFlow-ZenDNN plug-in is supported with ZenDNN v3.3. Please see Chapter 5 of the TensorFlow-ZenDNN Plug-in User Guide for performance tuning guidelines. 

For optimal inference performance, AMD recommends using the TF-ZenDNN direct integration binaries available on the AMD ZenDNN developer resources page.

What’s Coming Next with ZenDNN 

TensorFlow v2.12 marks the first release of our TensorFlow-ZenDNN plug-in. AMD intends to continue improving the performance of the TensorFlow-ZenDNN plug-in on current- and future-generation AMD EPYC processors by supporting more ZenDNN ops, graph optimizations, and quantization in subsequent TensorFlow-ZenDNN plug-in releases. Such enhancements include a planned plug-in version transition from ZenDNN v3.3 to ZenDNN v4.0 to enable optimizations that take advantage of the AVX-512 and VNNI capability in 4th Gen EPYC processors.

With our aim of continuously improving the TensorFlow-ZenDNN plug-in for the community, we encourage TensorFlow developers to test this new TensorFlow-ZenDNN plug-in and share comments and concerns on our ZenDNN GitHub page. Technical support resources can also be reached via the following email address: zendnnsupport@amd.com.

We are excited to continue collaborating with TensorFlow to improve the ZenDNN experience for the wider TensorFlow developer community!

Acknowledgements

The development and upstreaming of the TensorFlow-ZenDNN plug-in is the work of many people from AMD and the TensorFlow team at Google.

From AMD: Chandra Kumar Ramasamy, Aakar Dwivedi, Savan Anadani, Arun Ramachandran, Avinash-Chandra Pandey, Ratan Prasad, Aditya Chatterjee, Alok Ranjan Srivastava, Prakash Raghavendra, Pradeep Kumar Sinha, Vincent Dee.

From Google: Penporn Koanantakool, Eugene Zhulenev, Douglas Yarrington.

Legal Endnotes

ZD-045 through ZD-051:

Testing conducted by AMD Performance Labs as of Tuesday, February 7, 2023 on test systems comprising:

AMD System: AMD Eng Sample of the AMD EPYC™ 9004 96-core processor, dual socket, with hyperthreading on, 2150 MHz CPU frequency (Max 3700 MHz), 768GB RAM, 768MB L3 Cache, NPS1 mode, Ubuntu® 20.04.5 LTS version, kernel version 5.4.0-131-generic, BIOS TQZ1000F, GCC/G++ version 11.1.0, GNU ID 2.31, Python 3.8.15. For no ZenDNN, Tensorflow 2.12. For the ZenDNN plug-in, AOCL BLIS 3.0.6, Tensorflow 2.12, ZenDNN version 3.3; for Direct Integration AOCL BLIS 4.0, Tensorflow Version 2.10, ZenDNN 4.0.

Tests run all from Unified Inference Frontend 1.1 (UIF1.1) model zoo:

  • FP32 EfficientNet
  • FP32 Inception v3
  • FP32 MobileNet v1
  • FP32 VGG16
  • FP32 RefineDet
  • FP32 BERT Base
  • FP32 ResNet50

Results may vary based on factors such as software versions and BIOS settings. ZD-045 through ZD-051.

Read More

Research Focus: Week of March 27, 2023

Microsoft Research Focus 12 edition, week of March 27, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEWS

Bing’s gendered translations tackle bias in machine translation

Machine translation (MT) models are designed to learn from large amounts of data collected from real-world sources. However, this training data may contain implicit biases which may be amplified by the model. One such example is the expression of gender, which can vary widely across different languages. In English, the word “lawyer” can refer to either a male or female individual, whereas in Spanish, “abogada” and “abogado” are used to refer to a female and male lawyer, respectively. As a result, MT models often assign arbitrary genders to animate entities in the translated output, even when the source text does not imply a specific gender.

The Microsoft Translator team has released a feature on Bing Translator which will provide feminine and masculine translations for sentences that have gender neutral words such as “doctor” or “teacher” when translating from English to Spanish, French and Italian. Additionally, to support ongoing research and track progress towards reducing gender bias in MT, the team has published a technical paper outlining their evaluation methodology and test sets. These test sets comprise a linguistically diverse corpus of gender-ambiguous source sentences, along with multiple alternative target language translations.


AWARD

Microsoft researcher honored by Women in AI Netherlands

Rianne van den Berg, a Principal Researcher at Microsoft Research in Amsterdam, has won the AI Researcher award from Women in AI Netherlands.

Rianne was recognized for her work in deep learning and physics. The award announcement noted her published work in journals such as Nature Physics and Physical Review Letters as well as at prominent AI conferences, such as NeurIPS, ICML and ICLR. The organization also cited Rianne’s dedication to diversity and inclusion.

In her role on the AI4Science team at Microsoft Research, Rianne’s research focuses on the intersection between computational chemistry and deep learning, with an emphasis on modeling chemical reactions. Her prior research has spanned topics ranging from generative modeling and variational inference to source compression, graph-structured learning, and condensed-matter physics. She received her PhD in theoretical condensed-matter physics in 2016 at the University of Amsterdam, where she also worked as a postdoctoral researcher as part of the Amsterdam Machine Learning Lab (AMLAB).


Spotlight: On-demand video

AI Explainer: Foundation models ​and the next era of AI

Explore how the transformer architecture, larger models and more data, and in-context learning have helped advance AI from perception to creation.

INTERVIEW

Recognizing women in technology

Why are women underrepresented in STEM and AI and how can we close that gap? How is technology shaping society, from gender issues to creativity and collaboration?

Microsoft Research Principal Researcher Cheng Zhang sat down to discuss these issues and more with the UK Chinese Women Connect Association, which recently recognized her as the Highly Commended awardee in the Chinese Women of the Year: Technology category.

In the interview, Cheng talks about her career in technology research and why she came to Microsoft Research Cambridge, where she works with the Machine Intelligence group. The conversation covers the impact of AI, strategies for making an impact—especially at a very large company—and the value of learning from others. Catch a video replay of this fascinating interview.

The post Research Focus: Week of March 27, 2023 appeared first on Microsoft Research.

Read More

Blender Update 3.5 Fuels 3D Content Creation, Powered by NVIDIA GeForce RTX GPUs

Blender Update 3.5 Fuels 3D Content Creation, Powered by NVIDIA GeForce RTX GPUs

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

It’s a celebration, creators!

Blender, the world’s most popular 3D creation suite — free and open source — released its major version 3.5 update. Expected to have a profound impact on 3D creative workflows, this latest release features support for Open Shading Language (OSL) shaders with the NVIDIA OptiX ray-tracing engine.

Plus, 3D artist and filmmaker Pablo Reche Beltrán, aka AuraProds, joins the 50th edition of the In the NVIDIA Studio series this week to share his Jurassic Park-inspired short film. Thank you to the artists who’ve contributed to the series, those who influence and inspire each day and those who will do so in the future.

Finally, enter the Watch ‘n Learn Giveaway hosted by creative community 80LV for a chance to win a powerful GeForce RTX 4080 GPU. Enter by watching an Omniverse for creators GTC session, filling out this form, and tagging #GTC23 and @NVIDIAOmniverse on social media by Thursday, March 30.

Better Renders in Blender 3.5

With support for OSL shaders through NVIDIA OptiX, Blender 3.5 now enables 3D artists to use OSL shaders in Cycles and render them on NVIDIA RTX GPUs. Previously, this workflow was limited to CPU rendering.

Blender update 3.5 supports Open Shading Language shaders with NVIDIA OptiX. Image courtesy of Sprite Fright, studio.blender.org.

This saves an extraordinary amount of time for artists, as scenes that use OSL can be completely rendered 36x faster than with a CPU alone.

Blender 3.5 also delivers updates to creative workflows in animation and rigging, nodes and physics, modeling and more.

Read the full release notes.

Dino-mite Renders Never Go Extinct

3D artist AuraProds fondly remembers watching the blockbuster movie Jurassic Park in theaters as a kid, wishing to one day recreate a memorable scene in which a giant T-Rex frightens the main characters, who are huddled together in a car. Unlike that scary moment, however, AuraProds’ video came together in a rather cute way.

“The concept art was made by my five- and eight-year-old cousins,” said AuraProds. “They drew the dinosaurs and inspired me to turn them into 3D.”

Based in Almería, a small city in southern Spain, AuraProds was perfectly situated to capture video footage in the nearby town of Tabernas, where Hollywood directors often shoot western movies. With the requisite footage captured, AuraProds modeled dinosaurs to populate the scene.

 

His preferred 3D app is Blender. “No doubt,” said AuraProds. “I really like it because I can bring to 3D any idea in my head with a simple workflow.”

The artist modeled and sculpted each dinosaur by hand, using Blender Cycles RTX-accelerated OptiX ray tracing in the viewport for interactive, photorealistic modeling, thanks to his GeForce RTX 4080 GPU.

Detailed sculpting was done in Blender.

Satisfied with the models, AuraProds experimented with a variety of textures before moving on to rig and animate his models. Motion-blur visual effects were applied quickly with accelerated rendering and NVIDIA NanoVBD for easier rendering of volumes.

Geo nodes can add organic style and customization to Blender scenes and animation.

RTX-accelerated OptiX ray tracing in Blender Cycles allowed AuraProds to quickly export fully rendered files to Blackmagic Design’s DaVinci Resolve software.

Visual effects were applied and rendered in Blender.

Here, AuraProds’ RTX GPU dramatically accelerated his compositing workflows. GPU-accelerated color grading, video editing and color scopes were applied with ease. And GPU-accelerated decoding with NVDEC enabled buttery-smooth playback and scrubbing of high-resolution footage.

“NVIDIA RTX GPUs are the only technology that could handle the massive amount of polygons for this project. I know from my own experience how reliable the GPUs are.” — AuraProds

The RTX 4080 GPU sped up AuraProds’ use of new AI video editing tools, delivering up to a 2x increase in AI performance over the previous generation. Performance boosts were also applied to existing RTX-accelerated AI features — including automatic tagging of clips and tracking of effects, SpeedWarp for smooth slow motion and Video Super Resolution.

AuraProds then applied several AI features to achieve his desired visual effect. He wrapped up the project by deploying the RTX 4080 GPU’s dual AV1 video encoders — cutting the export time in half.

“I got to bring that aesthetic and memory to this new digital age with today’s AI tools,” said AuraProds.

3D artist AuraProds.

For more of AuraProds’ video content, check out VELOX — a short 3D film shot entirely with a green screen — made by scanning an entire desert and creating several 3D spaceships over two months, available on his YouTube channel.

Experienced and aspiring content creators can discover exclusive step-by-step tutorials from industry-leading artists, inspiring community showcases and more on the NVIDIA Studio YouTube channel, which includes a curated Blender playlist.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Get updates directly in your inbox by subscribing to the Studio newsletter.

Read More

Ubisoft’s Yves Jacquier on How Generative AI Will Revolutionize Gaming

Ubisoft’s Yves Jacquier on How Generative AI Will Revolutionize Gaming

Tools like ChatGPT have awakened the world to the potential of generative AI. Now, much more is coming.

On the latest episode of the NVIDIA AI Podcast, Yves Jacquier, executive director of Ubisoft La Forge, shares valuable insights into the transformative potential of generative AI in the gaming industry. With over two decades of experience in technology innovation, science and R&D management across various sectors, Jacquier brings a comprehensive, visionary perspective to the field.

During his conversation with podcast host Noah Kravitz, Jacquier highlighted how generative AI, which enables computers to create unique content such as images, text and music, is already revolutionizing the gaming sector. By designing new levels, characters and items, and generating realistic graphics and soundscapes, this cutting-edge technology offers countless opportunities for more immersive and engaging experiences.

As the driving force behind Ubisoft La Forge, Jacquier plays a crucial role in shaping the company’s academic R&D strategy. Key milestones include establishing a chair in AI deep learning in 2011 and founding Ubisoft La Forge, the first lab in the gaming industry dedicated to applied academic research. This research is being translated into state-of-the-art gaming experiences.

Jacquier expressed confidence that generative AI will play a vital role in sculpting the gaming industry and providing unparalleled gaming experiences for enthusiasts around the world.

Related Articles

Sequoia Capital’s Pat Grady and Sonya Huang on Generative AI

Partners at Sequoia Capital, Pat Grady and Sonya Huang, discuss their thought-provoking essay, “Generative AI: A Creative New World.” The authors explore the potential of generative AI to unlock new realms of creativity and expression, while also addressing the challenges and ethical implications of this technology. They also provide insights into generative AI’s future.

Real or Not Real? Attorney Steven Frank Employs Deep Learning to Authenticate Art

Steven Frank, a partner at the law firm Morgan Lewis, specializes in intellectual property and commercial technology law. He is part of a husband-and-wife duo that utilized convolutional neural networks to authenticate masterpieces, including da Vinci’s Salvator Mundi, with the aid of AI.

GANTheftAuto: Harrison Kinsley on AI-Crafted Gaming Environments

While humans playing games against machines is a familiar concept, computers can now develop games for people to enjoy. Programming aficionado and social media influencer Harrison Kinsley devised GANTheftAuto, an AI-driven neural network that produces a playable segment of the iconic video game Grand Theft Auto V.

Subscribe, Review and Follow NVIDIA AI on Twitter

If you found this episode insightful, subscribe to the NVIDIA AI Podcast on your preferred podcast platform and leave a rating and review. Stay connected with @NVIDIAAI on Twitter.

Read More

Experience the power of PyTorch 2.0 on AMD Solutions

Experience the power of PyTorch 2.0 on AMD Solutions

PyTorch 2.0 represents a significant step forward for the PyTorch machine learning framework. The stable release of PyTorch 2.0 brings new features that unlock even higher performance, while remaining backward compatible with prior releases and retaining the Pythonic focus that has helped make PyTorch so enthusiastically adopted by the AI/ML community. AMD has long been a strong proponent of PyTorch, and we are delighted that the PyTorch 2.0 stable release includes support for AMD Instinct™ and Radeon™ GPUs through the ROCm™ software platform.

Along with the stable PyTorch 2.0 release comes torch.compile as a beta feature, underpinned by TorchInductor with support for AMD Instinct and Radeon GPUs through the OpenAI Triton deep learning compiler. With TorchInductor, developers can generate low-level Triton code that is portable and performs comparably to hand-written kernels, without dropping down to hardware-centric kernel programming models.
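
As a quick illustration, opting into TorchInductor requires only a one-line change. The following is a minimal sketch assuming a ROCm build of PyTorch 2.0, where AMD GPUs surface through the torch.cuda device namespace; the model itself is a placeholder:

```python
import torch
import torch.nn as nn

# Placeholder model; any nn.Module works.
model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.GELU(),
    nn.Linear(1024, 10),
).to("cuda")  # ROCm builds expose AMD GPUs through the "cuda" device type

# torch.compile defaults to the TorchInductor backend, which emits Triton kernels.
compiled_model = torch.compile(model)

x = torch.randn(64, 1024, device="cuda")
y = compiled_model(x)  # first call triggers JIT compilation; later calls reuse it
```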

Compilers like Triton can optimize the code generated by machine learning frameworks such as PyTorch for multiple AI accelerators, including AMD Instinct GPUs, by leveraging hardware-specific features of the AMD CDNA™ GPU architecture. This makes it easy for developers and users to move seamlessly from other hardware to AMD Instinct GPU accelerators and get strong out-of-the-box performance.

In addition, compilers like Triton can also enable developers to use high-level programming languages, such as Python, to write machine learning code that can be efficiently compiled and executed on specialized hardware. This can help greatly improve the productivity of machine learning developers, as they can focus on the algorithmic aspects of their models and rely on the compiler to generate efficient code.

OpenAI Triton is a just-in-time (JIT) compiler that optimizes and accelerates the execution of deep learning models on various hardware architectures such as CPUs, GPUs, and ASICs. Here is a high-level overview of the flow (a concrete kernel sketch follows the list):

  1. Model Loading: Triton loads a trained deep learning model from a storage location, typically a file in a model format such as TorchFX graphs.
  2. Graph Optimization: Triton optimizes the graph representation of the loaded model. This includes transformations such as common subexpression elimination, dead code elimination, and operator fusion, which reduce memory usage and computational overhead.
  3. Tensor Memory Allocation: Triton allocates memory for the tensors used by the model, including input and output tensors as well as intermediate tensors created during computation.
  4. Hardware-Specific Optimization: Triton applies hardware-specific optimizations to the optimized graph representation of the model. These can include using low-level hardware instructions, optimizing data movement between different types of memory, and leveraging hardware-specific data structures suited to domain-specific architectures like CDNA on AMD Instinct GPUs.
  5. Code Generation: Triton generates efficient machine code for the optimized graph representation of the model. This code can then be executed on the hardware platform for which it was optimized.
  6. Execution: Triton executes the generated code on the hardware platform, typically in a just-in-time fashion. Triton can also dynamically adjust the batch size and other parameters of the model during execution to maximize performance.
  7. Result Return: Triton returns the results of the computation to the client that requested the inference.
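
To make the code-generation step concrete, below is a minimal Triton kernel of the kind this toolchain produces and consumes. It is adapted from the canonical Triton vector-add tutorial rather than anything TorchInductor emits verbatim, and names like add_kernel are illustrative:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```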

By design, PyTorch 2.0 is backward compatible with earlier PyTorch releases. This holds true for the ROCm build of PyTorch 2.0 as well. Developers using PyTorch with AMD GPUs can migrate to PyTorch 2.0 with the confidence that their existing code will continue to work without any changes, so there is no penalty for accessing the improvements that come with this release. On the other hand, using PyTorch 2.0 and TorchInductor can result in significant performance improvement over the default eager-mode, as shown below.

The initial results using AMD Instinct MI250 GPUs already show strong performance improvement with minimal optimization on TorchInductor compared to the default eager-mode. We see an average performance increase of up to 1.54X on 44 out of the 45 models in the HuggingFace benchmark suite, with CamemBert, DistillGPT2 and T5Small among the standout models at 1.5X or more improvement over eager-mode. We look forward to continued engagement with members of the PyTorch team at Meta to enable further optimization of the ROCm software stack and additional performance improvements in future PyTorch releases.


Image 1: AMD MI250 GPU performance improvement for TorchInductor vs eager-mode using HuggingFace MI200-89.

PyTorch 2.0 follows the same set of install options as before to build and install for supporting AMD GPUs. These include an installable Python package hosted at pytorch.org, AMD’s public PyTorch docker image, and of course the option to build from source using the upstream PyTorch repository. As with PyTorch builds for other platforms, the specific command line to be run for pip-based install is provided by the configurator at https://pytorch.org/get-started/locally/.
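
After installing, a quick sanity check confirms that the ROCm build sees the GPU. This is a sketch; the exact version string depends on the wheel you installed:

```python
import torch

print(torch.__version__)          # a ROCm wheel reports a "+rocm..." suffix
print(torch.cuda.is_available())  # ROCm devices are exposed via torch.cuda
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. an AMD Instinct or Radeon GPU
```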

The GPUs supported by the ROCm software platform, which forms the basis for PyTorch support on AMD GPUs, are documented at https://docs.amd.com/bundle/Hardware_and_Software_Reference_Guide/page/Hardware_and_Software_Support.html.

Conclusion

PyTorch 2.0 represents a major step in continuing to broaden support for ML developers by increasing performance while maintaining a simple, Pythonic interface. This performance uplift is made possible in large part by the new TorchInductor infrastructure, which in turn harnesses the Triton ML programming language and just-in-time compiler. AMD’s support for these technologies allows users to realize the full promise of the new PyTorch architecture. Our GPU support in PyTorch 2.0 is just one manifestation of a larger vision around AI and machine learning. AI/ML plays an important role in multiple AMD product lines, including Instinct and Radeon GPUs, Alveo™ data center accelerators, and both Ryzen™ and EPYC processors. These hardware and software initiatives are all part of AMD’s Pervasive AI vision, and we look forward to addressing the many new challenges and opportunities of this dynamic space.

MI200-89 – PyTorch Inductor mode HuggingFace Transformers training speedup, running the standard PyTorch 2.0 test suite, over PyTorch eager-mode comparison based on AMD internal testing on a single GCD as of 3/10/2023 using a 2P AMD EPYC™ 7763 production server with 4x AMD Instinct™ MI250 (128GB HBM2e) 560W GPUs with Infinity Fabric™ technology; host ROCm™ 5.3, guest ROCm™ 5.4.4, PyTorch 2.0.0, Triton 2.0. Server manufacturers may vary configurations, yielding different results. Performance may vary based on factors including use of latest drivers and optimizations.

© 2023 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, AMD CDNA, AMD Instinct, EPYC, Radeon, ROCm, Ryzen, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective owners.

Read More

Leveraging transfer learning for large scale differentially private image classification

Leveraging transfer learning for large scale differentially private image classification

Large deep learning models are becoming the workhorse of a variety of critical machine learning (ML) tasks. However, it has been shown that without any protection, bad actors can attack a variety of models, across modalities, to reveal information about individual training examples. As such, it’s essential to protect against this sort of information leakage.

Differential privacy (DP) provides formal protection against an attacker who aims to extract information about the training data. The most popular method for DP training in deep learning is differentially private stochastic gradient descent (DP-SGD). The core recipe implements a common theme in DP: “fuzzing” an algorithm’s outputs with noise to obscure the contributions of any individual input.
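
As a rough sketch of that recipe, a single DP-SGD step clips each per-example gradient to an L2 bound and then adds Gaussian noise calibrated to that bound. Everything below, including the function and parameter names, is illustrative rather than taken from any particular library:

```python
import torch

def dp_sgd_step(per_example_grads: torch.Tensor,
                clip_norm: float = 1.0,
                noise_multiplier: float = 1.0) -> torch.Tensor:
    """One DP-SGD gradient aggregation step (illustrative sketch).

    per_example_grads: (batch, num_params) -- one flattened gradient per example.
    """
    # 1) Clip each example's gradient to L2 norm at most clip_norm,
    #    bounding any single example's influence on the update.
    norms = per_example_grads.norm(dim=1, keepdim=True)
    clipped = per_example_grads * (clip_norm / norms).clamp(max=1.0)

    # 2) Sum, then add Gaussian noise calibrated to the clipping bound.
    noise = torch.randn_like(clipped[0]) * noise_multiplier * clip_norm
    return (clipped.sum(dim=0) + noise) / per_example_grads.shape[0]
```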

In practice, DP training can be very expensive or even ineffective for very large models. Not only does the computational cost typically increase when requiring privacy guarantees, but the noise that must be added also grows with the size of the model. Given these challenges, there has recently been much interest in developing methods that enable efficient DP training, with the goal of simple and practical methods for producing high-quality large-scale private models.

The ImageNet classification benchmark is an effective test bed for this goal because 1) it is a challenging task even in the non-private setting, requiring sufficiently large models to successfully classify large numbers of varied images, and 2) it is a public, open-source dataset, which other researchers can access and use for collaboration. With this approach, researchers can simulate a practical situation where a large model is required to train on private data with DP guarantees.

To that end, today we discuss improvements we’ve made in training high-utility, large-scale private models. First, in “Large-Scale Transfer Learning for Differentially Private Image Classification”, we share strong results on the challenging task of image classification on the ImageNet-1k dataset with DP constraints. We show that with a combination of large-scale transfer learning and carefully chosen hyperparameters, it is indeed possible to significantly reduce the gap between private and non-private performance, even on challenging tasks and high-dimensional models. Then in “Differentially Private Image Classification from Features”, we further show that privately fine-tuning just the last layer of a pre-trained model with more advanced optimization algorithms improves the performance even further, leading to new state-of-the-art DP results across a variety of popular image classification benchmarks, including ImageNet-1k. To encourage further development in this direction and enable other researchers to verify our findings, we are also releasing the associated source code.

Transfer learning and differential privacy

The main idea behind transfer learning is to reuse the knowledge gained from solving one problem and then apply it to a related problem. This is especially useful when there is limited or low-quality data available for the target problem as it allows us to leverage the knowledge gained from a larger and more diverse public dataset.

In the context of DP, transfer learning has emerged as a promising technique to improve the accuracy of private models by leveraging knowledge learned from pre-training tasks. For example, if a model has already been trained on a large public dataset for a task similar to the target privacy-sensitive task, it can be fine-tuned on a smaller and more specific dataset for the target DP task. More specifically, one first pre-trains a model on a large dataset with no privacy concerns, and then privately fine-tunes the model on the sensitive dataset. In our work, we improve the effectiveness of DP transfer learning and illustrate it by simulating private training on publicly available datasets, namely ImageNet-1k, CIFAR-100, and CIFAR-10.

Better pre-training improves DP performance

To start exploring how transfer learning can be effective for differentially private image classification tasks, we carefully examined hyperparameters affecting DP performance. Surprisingly, we found that with carefully chosen hyperparameters (e.g., initializing the last layer to zero and choosing large batch sizes), privately fine-tuning just the last layer of a pre-trained model yields significant improvements over the baseline. Training just the last layer also significantly improves the cost-utility ratio of training a high-quality image classification model with DP.
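
The recipe described above can be sketched in a few lines: a frozen pre-trained feature extractor with a zero-initialized classification head on top, trained privately with large batches. The backbone below is a stand-in (a real setup would use a pre-trained ViT), and the DP noise addition itself is omitted:

```python
import torch
import torch.nn as nn

feature_dim, num_classes = 768, 1000

# Stand-in for a large pre-trained feature extractor (e.g., a ViT backbone).
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feature_dim), nn.GELU())
for p in backbone.parameters():
    p.requires_grad = False  # only the head is trained (privately)

# Zero-initialized last layer, per the hyperparameter recipe above.
head = nn.Linear(feature_dim, num_classes)
nn.init.zeros_(head.weight)
nn.init.zeros_(head.bias)

x = torch.randn(4096, 3, 32, 32)  # large batch sizes help under DP
logits = head(backbone(x))
```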

As shown below, we compare the performance on ImageNet of the best hyperparameter recommendations both with and without privacy and across a variety of model and pre-training dataset sizes. We find that scaling the model and using a larger pre-training dataset decreases the gap in accuracy coming from the addition of the privacy guarantee. Typically, privacy guarantees of a system are characterized by a positive parameter ε, with smaller ε corresponding to better privacy. In the following figure, we use the privacy guarantee of ε = 10.

Comparing our best models with and without privacy on ImageNet across model and pre-training dataset sizes. The X-axis shows the different Vision Transformer models we used for this study in ascending order of model size from left to right. We used JFT-300M to pre-train B/16, L/16 and H/14 models, JFT-4B (a larger version of JFT-3B) to pre-train H/14-4b, and JFT-3B to pre-train G/14-3b. We do this in order to study the effectiveness of jointly scaling the model and pre-training dataset (JFT-3B or 4B). The Y-axis shows the Top-1 accuracy on the ImageNet-1k test set once the model is fine-tuned (privately or non-privately) on the ImageNet-1k training set. We consistently see that scaling the model and the pre-training dataset size decreases the gap in accuracy coming from the addition of the privacy guarantee of ε = 10.

Better optimizers improve DP performance

Somewhat surprisingly, we found that privately training just the last layer of a pre-trained model provides the best utility with DP. While past studies [1, 2, 3] largely relied on first-order differentially private training algorithms like DP-SGD for training large models, in the specific case of privately learning just the last layer from features, the computational burden is often low enough to allow for more sophisticated optimization schemes, including second-order methods (e.g., Newton or quasi-Newton methods), which can be more accurate but also more computationally expensive.

In “Differentially Private Image Classification from Features”, we systematically explore the effect of loss functions and optimization algorithms. We find that while the commonly used logistic regression performs better than linear regression in the non-private setting, the situation is reversed in the private setting: least-squares linear regression is much more effective than logistic regression from both a privacy and computational standpoint for the typical range of ε values ([1, 10]), and even more so for stricter values (ε < 1).

We further explore using DP Newton’s method to solve logistic regression. We find that this is still outperformed by DP linear regression in the high privacy regime. Indeed, Newton’s method involves computing a Hessian (a matrix that captures second-order information), and making this matrix differentially private requires adding far more noise in logistic regression than in linear regression, which has a highly structured Hessian.

Building on this observation, we introduce a method that we call differentially private SGD with feature covariance (DP-FC), where we simply replace the Hessian in logistic regression with privatized feature covariance. Since feature covariance only depends on the inputs (and neither on model parameters nor class labels), we are able to share it across classes and training iterations, thus greatly reducing the amount of noise that needs to be added to protect it. This allows us to combine the benefits of using logistic regression with the efficient privacy protection of linear regression, leading to improved privacy-utility trade-off.
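
The following sketch conveys the core idea of privatizing a feature covariance matrix: clip each feature vector’s norm to bound its contribution, then add symmetric Gaussian noise calibrated to that bound. This is a simplification for illustration — the names and noise calibration below are ours, not the exact DP-FC mechanism from the paper:

```python
import torch

def private_feature_covariance(features: torch.Tensor,
                               clip_norm: float = 1.0,
                               noise_std: float = 1.0) -> torch.Tensor:
    """Privatized second-moment (feature covariance) matrix -- rough sketch.

    features: (n, d) matrix of per-example features.
    """
    # Clip each feature vector to L2 norm at most clip_norm, so one example
    # changes the covariance by at most clip_norm**2 in Frobenius norm.
    norms = features.norm(dim=1, keepdim=True)
    feats = features * (clip_norm / norms).clamp(max=1.0)

    cov = feats.T @ feats  # (d, d) second-moment matrix

    # Add Gaussian noise calibrated to that sensitivity, symmetrized so the
    # privatized matrix remains a valid (symmetric) covariance estimate.
    noise = torch.randn_like(cov) * noise_std * clip_norm**2
    return cov + (noise + noise.T) / 2
```

Because the feature covariance depends only on the inputs, this single privatized matrix can be reused across all classes and training iterations, which is what drives the noise savings described above.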

With DP-FC, we surpass previous state-of-the-art results considerably on three private image classification benchmarks, namely ImageNet-1k, CIFAR-10 and CIFAR-100, just by performing DP fine-tuning on features extracted from a powerful pre-trained model.

Comparison of top-1 accuracies (Y-axis) with private fine-tuning using DP-FC method on all three datasets across a range of ε (X-axis). We observe that better pre-training helps even more for lower values of ε (stricter privacy guarantee).

Conclusion

We demonstrate that large-scale pre-training on a public dataset is an effective strategy for obtaining good results when fine-tuned privately. Moreover, scaling both model size and pre-training dataset improves performance of the private model and narrows the quality gap compared to the non-private model. We further provide strategies to effectively use transfer learning for DP. Note that this work has several limitations worth considering — most importantly our approach relies on the availability of a large and trustworthy public dataset, which can be challenging to source and vet. We hope that our work is useful for training large models with meaningful privacy guarantees!

Acknowledgements

In addition to the authors of this blogpost, this research was conducted by Abhradeep Thakurta, Alex Kurakin and Ashok Cutkosky. We are also grateful to the developers of Jax, Flax, and Scenic libraries. Specifically, we would like to thank Mostafa Dehghani for helping us with Scenic and high-performance vision baselines and Lucas Beyer for help with deduping the JFT data. We are also grateful to Li Zhang, Emil Praun, Andreas Terzis, Shuang Song, Pierre Tholoniat, Roxana Geambasu, and Steve Chien for stimulating discussions on differential privacy throughout the project. Additionally, we thank anonymous reviewers, Gautam Kamath and Varun Kanade for helpful feedback throughout the publication process. Finally, we would like to thank John Anderson and Corinna Cortes from Google Research, Borja Balle, Soham De, Sam Smith, Leonard Berrada, and Jamie Hayes from DeepMind for generous feedback.

Read More