Boost your forecast accuracy with time series clustering

Time series are sequences of data points that occur in successive order over some period of time. We often analyze these data points to make better business decisions or gain competitive advantages. An example is Shimamura Music, who used Amazon Forecast to improve shortage rates and increase business efficiency. Another great example is Arneg, who used Forecast to predict maintenance needs.

AWS provides various services catered to time series data that are low code/no code, which both machine learning (ML) and non-ML practitioners can use for building ML solutions. These include libraries and services like AutoGluon, Amazon SageMaker Canvas, Amazon SageMaker Data Wrangler, Amazon SageMaker Autopilot, and Amazon Forecast.

In this post, we seek to separate a time series dataset into individual clusters that exhibit a higher degree of similarity between their data points and reduce noise. The purpose is to improve accuracy by either training a global model that contains the cluster configuration or training local models specific to each cluster.

We explore how to extract characteristics, also called features, from time series data using the TSFresh library—a Python package for computing a large number of time series characteristics—and perform clustering using the K-Means algorithm implemented in the scikit-learn library.

We use the Time Series Clustering using TSFresh + KMeans notebook, which is available on our GitHub repo. We recommend running this notebook on Amazon SageMaker Studio, a web-based, integrated development environment (IDE) for ML.

Solution overview

Clustering is an unsupervised ML technique that groups items together based on a distance metric. The Euclidean distance is most commonly used for non-sequential datasets. However, because a time series inherently has a sequence (timestamp), the Euclidean distance doesn’t work well when used directly on time series because it’s invariant to time shifts, ignoring the time dimension of data. For a more detailed explanation, refer to Time Series Classification and Clustering with Python. A better distance metric that works directly on time series is Dynamic Time Warping (DTW). For an example of clustering based on this metric, refer to Cluster time series data for use with Amazon Forecast.

In this post, we generate features from the time series dataset using the TSFresh Python library for data extraction. TSFresh is a library that calculates a large number of time series characteristics, which include the standard deviation, quantile, and Fourier entropy, among others. This allows us to remove the time dimensionality of the dataset and apply common techniques that work for data with flattened formats. In addition to TSFresh, we also use StandardScaler, which standardizes features by removing the mean and scaling to unit variance, and Principal component analysis (PCA) to perform dimensionality reduction. Scaling reduces the distance between data points, which in turn promotes stability in the model training process, and dimensionality reduction allows the model to learn from fewer features while retaining the major trends and patterns, thereby enabling more efficient training.

Data loading

For this example, we use the UCI Online Retail II Data Set and perform basic data cleansing and preparation steps as detailed in the Data Cleaning and Preparation notebook.

Feature extraction with TSFresh

Let’s start by using TSFresh to extract features from our time series dataset:

from tsfresh import extract_features
extracted_features = extract_features(
    df_final, 
    column_id="StockCode", 
    column_sort="timestamp")

Note that our data has been converted from a time series to a table of feature values indexed by StockCode.

feature table

Next, we drop all features with n/a values using the dropna method:

extracted_features_cleaned = extracted_features.dropna(axis=1)

Then we scale the features using StandardScaler. Because the extracted feature values can be both negative and positive, we use StandardScaler rather than MinMaxScaler:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
extracted_features_cleaned_std = scaler.fit_transform(extracted_features_cleaned)

We use PCA to do dimensionality reduction:

from sklearn.decomposition import PCA
pca = PCA()
pca.fit(extracted_features_cleaned_std)

And we determine the optimal number of components for PCA:

import numpy as np
import matplotlib.pyplot as plt

plt.figure(figsize=(20,10))
plt.grid()
plt.plot(np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')

The explained variance ratio is the percentage of variance attributed to each of the selected components. Typically, you determine the number of components to include in your model by cumulatively adding the explained variance ratio of each component until you reach 0.8–0.9 to avoid overfitting. The optimal value usually occurs at the elbow.

As shown in the following chart, the elbow value is approximately 100. Therefore, we use 100 as the number of components for PCA.

PCA
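The clustering step that follows uses a matrix of PCA scores (scores_pca), which isn't shown in the snippets above. A minimal sketch of how it can be produced, reusing the variables defined earlier and the 100 components chosen from the elbow, is:

# Refit PCA with the chosen number of components and project the scaled features
pca = PCA(n_components=100)
scores_pca = pca.fit_transform(extracted_features_cleaned_std)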

Clustering with K-Means

Now let’s use K-Means with the Euclidean distance metric for clustering. In the following code snippet, we determine the optimal number of clusters. Adding more clusters decreases the inertia value, but it also decreases the information contained in each cluster. Additionally, more clusters means more local models to maintain. Therefore, we want to have a small cluster size with a relatively low inertia value. The elbow heuristic works well for finding the optimal number of clusters.

from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

wcss = []
for i in range(1,10):
    km = KMeans(n_clusters=i)
    km.fit(scores_pca)
    wcss.append(km.inertia_)
plt.figure(figsize=(20,10))
plt.grid()
plt.plot(range(1,10),wcss,marker='o',linestyle='--')
plt.xlabel('number of clusters')
plt.ylabel('WCSS')

The following chart visualizes our findings.

Elbow

Based on this chart, we have decided to use two clusters for K-Means. We made this decision because the within-cluster sum of squares (WCSS) decreases at the highest rate between one and two clusters. It’s important to balance ease of maintenance with model performance and complexity, because although WCSS continues to decrease with more clusters, additional clusters increase the risk of overfitting. Furthermore, slight variations in the dataset can unexpectedly reduce accuracy.
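With the cluster count fixed at two, a short sketch of fitting the final model and attaching the cluster labels back to each StockCode could look like the following (variable names follow the earlier snippets):

import pandas as pd

# Fit the final K-Means model with two clusters on the PCA scores
km = KMeans(n_clusters=2)
cluster_labels = km.fit_predict(scores_pca)

# Attach the cluster label to each StockCode so it can be used as a feature
# for a global model or to split the data into per-cluster local models
clusters = pd.DataFrame(
    {"StockCode": extracted_features_cleaned.index, "cluster": cluster_labels}
)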

It’s important to note that both clustering methods, K-Means with Euclidean distance (discussed in this post) and K-Means with DTW, have their strengths and weaknesses. The best approach depends on the nature of your data and the forecasting methods you’re using. Therefore, we highly recommend experimenting with both approaches and comparing their performance to gain a more holistic understanding of your data.

Conclusion

In this post, we discussed the powerful techniques of feature extraction and clustering for time series data. Specifically, we showed how to use TSFresh, a popular Python library for feature extraction, to preprocess your time series data and obtain meaningful features.

When the clustering step is complete, you can train multiple Forecast models for each cluster, or use the cluster configuration as a feature. Refer to the Amazon Forecast Developer Guide for information about data ingestion, predictor training, and generating forecasts. If you have item metadata and related time series data, you can also include these as input datasets for training in Forecast. For more information, refer to Start your successful journey with time series forecasting with Amazon Forecast.


About the Authors

Aleksandr Patrushev is an AI/ML Specialist Solutions Architect at AWS, based in Luxembourg. He is passionate about the cloud and machine learning, and the way they could change the world. Outside work, he enjoys hiking, sports, and spending time with his family.

Chong En Lim is a Solutions Architect at AWS. He is always exploring ways to help customers innovate and improve their workflows. In his free time, he loves watching anime and listening to music.

Egor Miasnikov is a Solutions Architect at AWS based in Germany. He is passionate about the digital transformation of our lives, businesses, and the world itself, as well as the role of artificial intelligence in this transformation. Outside of work, he enjoys reading adventure books, hiking, and spending time with his family.

Celebrating Google Summer of Code Responsible AI Projects

Posted by Bhaktipriya Radharapu, Software Engineer, Google Research

One of the key goals of Responsible AI is to develop software ethically and in a way that is responsive to the needs of society and takes into account the diverse viewpoints of users. Open source software helps address this by providing a way for a wide range of stakeholders to contribute.

To continue making Responsible AI development more inclusive and transparent, and in line with our AI Principles, Google’s Responsible AI team partnered with Google Summer of Code (GSoC) to provide students and professionals with the opportunity to contribute to open source projects that promote Responsible AI resources and practices. GSoC is a global, online program focused on bringing new contributors into open source software development. GSoC contributors work with an open source organization on a 12+ week programming project under the guidance of mentors. By bringing in new contributors and ideas, we saw that GSoC helped to foster a more innovative and creative environment for Responsible AI development.

This was also the first time several of Google’s Responsible AI tools, such as The Learning Interpretability Tool (LIT), TensorFlow Model Remediation and Data Cards Playbook, pulled in contributions from third-party developers across the globe, bringing in diverse and new developers to join us in our journey for building Responsible AI for all.

We’re happy to share the work completed by GSoC participants, what they learned about working with state-of-the-art fairness and interpretability techniques, what we learned as mentors, and how rewarding Summer of Code was for each of us and for the Responsible AI community.

We had the opportunity to mentor four developers – Aryan Chaurasia, Taylor Lee, Anjishnu Mukherjee, and Chris Schmitz. Aryan successfully implemented XAI tutorials for LIT under the mentorship of Ryan Mullins, a software engineer at Google. These showcase how LIT can be used to evaluate the performance of (multi-lingual) question-answering models, and understand behavioral patterns in text-to-image generation models.

Anjishnu also implemented tutorials for LIT under the mentorship of Ryan Mullins. Anjishnu’s work influenced in-review research assessing professionals’ interpretability practices in production settings.

Chris, under the technical guidance of Jenny Hamer, a software engineer at Google, created two tutorials for TensorFlow Model Remediation’s experimental technique, Fair Data Reweighting. The tutorials help developers apply a fairness-enforcing data reweighting algorithm, a pre-processing bias remediation technique that is model architecture agnostic.

Finally, Taylor, under the guidance of Mahima Pushkarna, a senior UX designer at Google Research, and Andrew Zaldivar, a Responsible AI Developer Advocate at Google, designed the information architecture and user experience for activities from the Data Cards Playbook. This project translated a manual calculator that helps groups assess the reader-centricity of their Data Card templates into virtual experiences to foster rich discussion.

The participants learned a lot about working with state-of-the-art fairness and interpretability techniques. They also learned about the challenges of developing Responsible AI systems and the importance of considering the social implications of their work. What is also unique about GSoC is that it wasn’t just code and development: mentees were exposed to code-adjacent work such as design and technical writing, skills that are essential for the success of software projects and critical for cutting-edge Responsible AI projects, giving them a 360º view into the lifecycle of Responsible AI projects.

The program was open to participants from all over the world, and saw participation from 14 countries. We set up several community channels for participants and professionals to discuss Responsible AI topics and Google’s Responsible AI tools and offerings, which organically grew to 300+ members. The community engaged in various hands-on starter projects for GSoC in the areas of fairness, interpretability, and transparency, and was guided by a team of 8 Google Research mentors and organizers.

We were able to underscore the importance of community and collaboration in open source software development, especially in a field like Responsible AI, which thrives on transparent, inclusive development. Overall, the Google Summer of Code program has been a valuable tool for democratizing the responsible development of AI technologies. By providing a platform for mentorship, and innovation, GSoC has helped us improve the quality of open source software and to guide developers with tools and techniques to build AI in a safe and responsible way.

We’d like to say a heartfelt thank you to all the participants, mentors, and organizers who made Summer of Code a success. We’re excited to see how our developer community continues to work on the future of Responsible AI, together.

We encourage you to check out Google’s Responsible AI toolkit and share what you have built with us by tagging #TFResponsibleAI on your social media posts, or share your work for the community spotlight program.

If you’re interested in participating in the Summer of Code with TensorFlow in 2023, you can find more information about our organization and suggested projects here.

Acknowledgements:

Mentors and Organizers:

Andrew Zaldivar, Mahima Pushkarna, Ryan Mullins, Jenny Hamer, Pranjal Awasthi, Tesh Goyal, Parker Barnes, Bhaktipriya Radharapu

Sponsors and champions:

Special thanks to Shivani Poddar, Amy Wang, Piyush Kumar, Donald Gonzalez, Nikhil Thorat, Daniel Smilkov, James Wexler, Stephanie Taylor, Thea Lamkin, Philip Nelson, Christina Greer, Kathy Meier-Hellstern and Marian Croak for enabling this work.

NVIDIA Honors Partners Helping Industries Harness AI to Transform Business

NVIDIA today recognized a dozen partners in the Americas for their work enabling customers to build and deploy AI applications across a broad range of industries.

NVIDIA Partner Network (NPN) Americas Partner of the Year awards were given to companies in 13 categories covering AI, consulting, distribution, education, healthcare, integration, networking, the public sector, rising star, service delivery, software and the Canadian market. A new award category created this year recognizes growing AI adoption in retail, as leaders begin to introduce new AI-powered services addressing customer service, loss prevention and restocking analytics.

“NVIDIA’s commitment to driving innovation in AI has created new opportunities for partners to help customers leverage cutting-edge technology to reduce costs, grow opportunities and solve business challenges,” said Rob Enderle, president and principal analyst at the Enderle Group. “The winners of the 2023 NPN awards reflect a diverse group of AI business experts that have showcased deep knowledge in delivering transformative solutions to customers across a range of industries.”

The 2023 NPN award winners for the Americas are:

  • Arrow Electronics: Distribution Partner of the Year. Recognized for providing end-to-end NVIDIA AI technologies across a variety of industries, such as manufacturing, retail, healthcare and robotics, to help organizations drive accelerated computing and robotics strategies via on-prem, hybrid cloud and intelligent edge solutions, and through Arrow’s Autonomous Machines Center of Excellence.
  • Cambridge Computer: Higher Education Partner of the Year. Recognized for the third consecutive year for its continued focus on providing NVIDIA AI solutions to the education, life sciences and research computing sectors.
  • CDW: Software Partner of the Year. Recognized for deploying NVIDIA AI and visualization solutions to customers from a broad range of industries and adopting deep industry expertise for end-to-end customer support.
  • CDW Canada: Canadian Partner of the Year. Recognized for providing IT solutions that enable the nation’s leading vendors to offer customized solutions with NVIDIA technology, meeting the needs of each client.
  • Deloitte: Consulting Partner of the Year. Recognized for the third consecutive year for creating new AI markets for clients by expanding AI investments in solutions developed with NVIDIA across enterprise AI, as well as expanding into new offerings with generative AI and NVIDIA DGX Cloud.
  • FedData Technology Solutions: Rising Star Partner of the Year. Recognized for NVIDIA DGX-based design wins with key federal customers and emerging work with the NVIDIA Omniverse platform for building and operating metaverse applications.
  • Insight: Retail Partner of the Year. Recognized for its deep understanding of the industry, ecosystem partnerships and the ability to orchestrate best-in-class solutions to bring real-time speed and predictability to retailers, enabling intelligent stores, intelligent quick-service restaurants, intelligent supply chain and omni-channel management.
  • Lambda: Solution Integration Partner of the Year. Recognized for the third consecutive year for its commitment to providing end-to-end NVIDIA solutions, both on premises and in the cloud, across industries including higher education and research, the federal and public sectors, and healthcare and life sciences.
  • Mark III: Healthcare Partner of the Year. Recognized for its unique team and deep understanding of the NVIDIA portfolio, which provides academic medical centers, research institutions, healthcare systems and life sciences organizations with NVIDIA infrastructure, software and cloud technologies to build out AI, HPC and simulation Centers of Excellence.
  • Microway: Public Sector Partner of the Year. Recognized for its technical depth and engineering focus on servicing the public sector using technologies across the NVIDIA portfolio, including high performance computing and other specializations.
  • Quantiphi: Service Delivery Partner of the Year. Recognized for the second consecutive year for its commitment to driving adoption of NVIDIA products in areas like generative AI services with customized large language models, digital avatars, edge computing, medical imaging and data science, as well as its expertise in helping customers build and deploy AI solutions at scale.
  • World Wide Technology: AI Solution Provider of the Year. Recognized for its leadership in driving adoption of the NVIDIA portfolio of AI and accelerated computing solutions, as well as its continued investments in AI infrastructure for large language models, computer vision, Omniverse-based digital twins, and customer testing and labs in the WWT Advanced Technology Center.
  • World Wide Technology: Networking Partner of the Year. Recognized for its expertise driving NVIDIA high-performance networking solutions to support accelerated computing environments across multiple industries and AI solutions.

This year’s awards arrive as AI adoption is rapidly expanding across industries, unlocking new opportunities and accelerating discovery in healthcare, finance, business services and more. As AI models become more complex, the 2023 NPN Award winners are expert partners that can help enterprises develop and deploy AI in production using the infrastructure that best aligns with their operations.

Learn how to join the NPN, or find your local NPN partner.

Video Editor Patrick Stirling Invents Custom Effect for DaVinci Resolve Software

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

AI-powered technology in creative apps, once considered nice to have, is quickly becoming essential for aspiring and experienced content creators.

Video editor Patrick Stirling used the Magic Mask feature in Blackmagic Design’s DaVinci Resolve software to build a custom effect that creates textured animations of people, this week In the NVIDIA Studio.

“I wanted to use ‘Magic Mask’ to replace subjects with textured, simplified, cut-out versions of themselves,” said the artist. “This style is reminiscent of construction-paper creations that viewers might have played with in childhood, keeping the energy of a scene while also pulling attention away from any specific features of the subject.”

Stirling’s effect creates textured, animated characters.

Stirling’s original attempts to implement this effect were cut short due to the limitations of his six-year-old system. So Stirling built his first custom PC — equipped with a GeForce RTX 4080 GPU — to tackle the challenge. The difference was night and day, he said.

Stirling’s effect on full display in DaVinci Resolve.

“I was able to find and maintain a creative flow so much more easily when I didn’t feel like I was constantly running into a wall and waiting for my system to catch up,” said Stirling.

“While the raw power of RTX GPUs is incredible, the work NVIDIA does to improve working in DaVinci Resolve, specifically, is really impressive. It’s extremely reassuring to know that I have the power to build complex effects.” — Patrick Stirling

The AI-powered Magic Mask feature, which allows quick selection of objects and people in a scene, was accelerated by his RTX 4080 GPU, delivering up to a 2x increase in AI performance over the previous generation. “The GPU also provides the power the DaVinci Neural Engine needs for some of these really cool effects,” said Stirling.

Stirling opened a short clip within the RTX GPU-accelerated Fusion page in DaVinci Resolve, a node-based workflow with hundreds of 2D and 3D tools. Nodes are popular as they make video editing a completely procedural process — allowing for non-linear, non-destructive workflows.

He viewed edits in real time using two windows opened side by side, with original footage on the left and node modifications on the right.

Original footage and node-based modifications, side by side.

Stirling then drew blue lines to apply Magic Mask to each surface on the subject that he wanted to layer. As its name suggests, Magic Mask works like magic, but it’s not perfect. When the effect masked more than the intended jacket layer, Stirling drew a secondary red line to designate what not to capture in that area.

The suit-jacket layer is masked as intended.

He applied similar techniques to the dress shirt, hands, beard, hair and facial skin. The artist then added generic colored backgrounds with Background nodes on each layer to complete his 2D character.

Textures provide contrast to the scene.

Stirling used Merge nodes to combine background and foreground images. He deployed the Fast Noise node to create two types of textures for the 2D man and the real-life footage, providing more contrast for the visual.

Organizing nodes is important to this creative workflow.

Stirling then added a color corrector to tweak saturation, his RTX GPU accelerating the process. He completed his video editing by combining the Magic Mask effect and all remaining nodes — Background, Merge and Fast Noise.

“DaVinci Resolve and the GeForce RTX 4080 feel like a perfect fit,” said Stirling.

When it’s time to wrap up the project, Stirling can deploy the RTX 4080 GPU’s dual AV1 video encoders — which would cut export times in half.

Stirling encourages aspiring content creators to “stay curious” and “not ignore the value of connecting with other creative people.”

“Regularly being around people doing the same kind of work as you will constantly expose new methods and approaches for your own creative projects,” he said.

Video editor Patrick Stirling.

Check out Stirling’s YouTube channel for DaVinci Resolve tutorials.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter.

Generate a counterfactual analysis of corn response to nitrogen with Amazon SageMaker JumpStart solutions

In his book The Book of Why, Judea Pearl advocates for teaching cause and effect principles to machines in order to enhance their intelligence. The accomplishments of deep learning are essentially just a type of curve fitting, whereas causality could be used to uncover interactions between the systems of the world under various constraints without testing hypotheses directly. This could provide answers that lead us to artificial general intelligence (AGI).

This solution proposes a causal inference framework using Bayesian networks to represent causal dependencies and draw causal conclusions based on observed satellite imagery and experimental trial data in the form of simulated weather and soil conditions. The case study is the causal relationship between nitrogen-based fertilizer application and corn yields.

The satellite imagery is processed using purpose-built Amazon SageMaker geospatial capabilities and enriched with custom-built Amazon SageMaker Processing operations. The causal inference engine is deployed with Amazon SageMaker Asynchronous Inference.

In this post, we demonstrate how to create this counterfactual analysis using Amazon SageMaker JumpStart solutions.

Solution overview

The following diagram shows the architecture for the end-to-end workflow.

Prerequisites

You need an AWS account to use this solution.

To run this JumpStart 1P Solution and have the infrastructure deployed to your AWS account, you need to create an active Amazon SageMaker Studio instance (refer to Onboard to Amazon SageMaker Domain). When your Studio instance is ready, follow the instructions in SageMaker JumpStart to launch the Crop Yield Counterfactuals solution.

Note that this solution is currently available in the US West (Oregon) Region only.

Causal inference

Causality is all about understanding change, but how to formalize this in statistics and machine learning (ML) is not a trivial exercise.

In this crop yield study, the nitrogen added as fertilizer and the yield outcomes might be confounded. Similarly, the nitrogen added as a fertilizer and the nitrogen leaching outcomes could be confounded as well, in the sense that a common cause can explain their association. However, association is not causation. If we know which observed factors confound the association, we account for them, but what if there are other hidden variables responsible for confounding? Reducing the amount of fertilizer won’t necessarily reduce residual nitrogen; similarly, it might not drastically diminish the yield, whereas the soil and climatic conditions could be the observed factors that confound the association. How to handle confounding is the central problem of causal inference. A technique introduced by R. A. Fisher, the randomized controlled trial, aims to break possible confounding.

However, in the absence of randomized controlled trials, there is a need for causal inference purely from observational data. There are ways to connect causal questions to data in observational studies by writing down a causal graphical model of how we postulate things happen. This involves claiming that the paths in the graph capture the corresponding dependencies, while satisfying the graphical criterion for conditional ignorability (the extent to which we can treat causation as association based on the causal assumptions). After we have postulated the structure, we can use the implied invariances to learn from observational data and plug in causal questions, inferring causal claims without randomized controlled trials.

This solution uses both data from simulated randomized control trials (RCTs) as well as observational data from satellite imagery. A series of simulations conducted over thousands of fields and multiple years in Illinois (United States) are used to study the corn response to increasing nitrogen rates for a broad combination of weather and soil variation seen in the region. It addresses the limitation of using trial data limited in the number of soils and years it can explore by using crop simulations of various farming scenarios and geographies. The database was calibrated and validated using data from more than 400 trials in the region. Initial nitrogen concentration in the soil was set randomly among a reasonable range.

Additionally, the database is enhanced with observations from satellite imagery, whereas zonal statistics are derived from spectral indices in order to represent spatio-temporal changes in vegetation, seen across geographies and phenological phases.

Causal inference with Bayesian networks

Structural causal models (SCMs) use graphical models to represent causal dependencies by incorporating both data-driven and human inputs. A particular type of structural causal model called a Bayesian network is proposed to model the crop phenology dynamics using probabilistic expressions, representing variables as nodes and relationships between variables as edges. Nodes are indicators of crop growth, soil and weather conditions, and the edges between them represent spatio-temporal causal relationships. The parent nodes are field-related parameters (including the day of sowing and area planted), and the child nodes are yield, nitrogen uptake, and nitrogen leaching metrics.

For more information, refer to the database characterization and the guide for identifying the corn growth stages.

A few steps are required to build a Bayesian networks model (with CausalNex) before we can use it for counterfactual and interventional analysis. The structure of the causal model is initially learned from data, whereas subject matter expertise (trusted literature or empirical beliefs) is used to postulate additional dependencies and independencies between random variables and intervention variables, as well as asserting the structure is causal.

Using NO TEARS, a continuous optimization algorithm for structure learning, the graph structure describing conditional dependencies between variables is learned from data, with a set of constraints imposed on edges, parent nodes, and child nodes that are not allowed in the causal model. This preserves the temporal dependencies between variables. See the following code:

"""
tabu_edges: Imposing edges that are not allowed in the causal model
tabu_parents: Imposing parent nodes that are not allowed in the causal model
tabu_child: Imposing child nodes that are not allowed in the causal model
"""
from causalnex.structure.notears import from_pandas

g_learned = from_pandas(
    X,
    tabu_edges=tabu_edges,
    tabu_parent_nodes=tabu_parents,
    tabu_child_nodes=tabu_child,
    max_iter=100,
)

The next step encodes domain knowledge in models and captures phenology dynamics, while avoiding spurious relationships. Multicollinearity analysis, variance inflation factor analysis, and global feature importance using SHAP analysis are conducted to extract insights and constraints on water stress variables (expansion, phenology, and photosynthesis around flowering), weather and soil variables, spectral indices, and the nitrogen-based indicators:

"""
edges: Modifying the structure by imposing constraints on edges
"""
from causalnex.structure import StructureModel

g = StructureModel()
g.add_edges_from(
                edges,
                origin="expert"
                )

Bayesian networks in CausalNex support only discrete distributions. Any continuous features, or features with a large number of categories, are discretized prior to fitting the Bayesian network:

from causalnex.discretiser.discretiser_strategy import (
    DecisionTreeSupervisedDiscretiserMethod,
    MDLPSupervisedDiscretiserMethod
)

discretiser = DecisionTreeSupervisedDiscretiserMethod(
    mode="single", 
    tree_params={"max_depth": 2, "random_state": 2022},
)
discretiser.fit(
    feat_names=features, 
    dataframe=df, 
    target_continuous=True,
    target=target,
)

After the structure is reviewed, the conditional probability distribution of each variable given its parents can be learned from data, in a step called likelihood estimation:

from causalnex.network import BayesianNetwork

bn = BayesianNetwork(g)
bn = bn.fit_node_states(discretised_data)
bn = bn.fit_cpds(
    train,
    method="BayesianEstimator",
    bayes_prior="K2",
)

Finally, the structure and likelihoods are used to perform observational inference on the fly, following a deterministic Junction Tree algorithm (JTA), and making interventions using do-calculus. SageMaker Asynchronous Inference allows queuing incoming requests and processes them asynchronously. This option is ideal for both observational and counterfactual inference scenarios, where the process can’t be parallelized, thereby taking significant time to update the probabilities throughout the network, although multiple queries can be run in parallel. See the following code:

"""
Query the marginal likelihood of states in the graph given some observations. 
These observations can be made anywhere in the network, 
and their impact will be propagated through to the node of interest.
"""
import multiprocessing

from causalnex.inference import InferenceEngine

ie = InferenceEngine(bn)

pseudo_observation = [{"day_sow":0}, {"day_sow":1}, {"day_sow":2}]
marginals_multi = ie.query(
    pseudo_observation,
    parallel=True,
    num_cores=multiprocessing.cpu_count(),
)
# distribution before intervention
marginals_before = ie.query()["Y_corn"]

# updating a node distribution
ie.do_intervention("N_fert", 0)

# effect of do on marginals
marginals_after = ie.query()["Y_corn"]

# Resetting the node distribution
ie.reset_do("N_fert")

For further details, refer to the inference script.
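Once the inference engine is deployed behind a SageMaker Asynchronous Inference endpoint, queries and interventions can be queued by pointing the endpoint at a payload stored in Amazon S3. The following is a minimal sketch using boto3; the endpoint name, bucket path, and payload format are illustrative assumptions rather than the solution’s actual configuration:

import boto3

smr_client = boto3.client("sagemaker-runtime")

# The request payload (observations/interventions) is first uploaded to S3,
# because asynchronous endpoints read their input from an S3 location.
response = smr_client.invoke_endpoint_async(
    EndpointName="crop-yield-counterfactuals-endpoint",  # hypothetical name
    InputLocation="s3://my-bucket/queries/intervention-request.json",  # hypothetical path
    ContentType="application/json",
)

# The result is written asynchronously to this S3 location once processed
print(response["OutputLocation"])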

The causal model notebook is a step-by-step guide on running the preceding steps.

Geospatial data processing

Earth Observation Jobs (EOJs) are chained together to acquire and transform satellite imagery, whereas purpose-built operations and pre-trained models are used for cloud removal, mosaicking, band math operations, and resampling. In this section, we discuss in more detail the geospatial processing steps.

Area of interest

In the following figure, green polygons are the selected counties, the orange grid is the database map (a grid of 10 x 10 km cells where trials are conducted in the region), and the grid of grayscale squares is the 100 km x 100 km Sentinel-2 UTM tiling grid.

Spatial files are used to map the simulated database to the corresponding satellite imagery, overlaying polygons of the 10 km x 10 km cells that divide the state of Illinois (where trials are conducted in the region), county polygons, and 100 km x 100 km Sentinel-2 UTM tiles. To optimize the geospatial data processing pipeline, a few nearby Sentinel-2 tiles are first selected. Next, the aggregated geometries of tiles and cells are overlaid in order to obtain the region of interest (RoI). The counties and the cell IDs that are fully observed within the RoI are selected to form the polygon geometry passed onto the EOJs.
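A minimal sketch of this overlay logic with GeoPandas is shown below; the file names are hypothetical placeholders, and the actual notebook may organize this step differently:

import geopandas as gpd

# Hypothetical spatial files: simulation grid cells, county boundaries, Sentinel-2 tiles
cells = gpd.read_file("cells_10km.geojson")
counties = gpd.read_file("illinois_counties.geojson")
tiles = gpd.read_file("sentinel2_utm_tiles.geojson")

# Region of interest: where the aggregated tile and cell geometries overlap
roi = gpd.overlay(tiles.dissolve(), cells.dissolve(), how="intersection")

# Keep only counties and cells fully contained in the RoI; their union becomes
# the polygon geometry passed to the Earth Observation Jobs
selected_counties = counties[counties.within(roi.geometry.iloc[0])]
selected_cells = cells[cells.within(roi.geometry.iloc[0])]
request_polygon = selected_counties.dissolve().geometry.iloc[0]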

Time range

For this exercise, the corn phenology cycle is divided into three stages: the vegetative stages V5 to R1 (emergence, leaf collars, and tasseling), the reproductive stages R1 to R4 (silking, blister, milk, and dough), and the reproductive stages R5 (dented) and R6 (physiological maturity). Consecutive satellite visits are acquired for each phenology stage within a time range of 2 weeks and a predefined area of interest (selected counties), enabling spatial and temporal analysis of satellite imagery. The following figure illustrates these stages.

Cloud removal

Cloud removal for Sentinel-2 data uses an ML-based semantic segmentation model to identify clouds in the image, where cloudy pixels are replaced with the value -9999 (the nodata value):

request_polygon_coordinates = [[(-90.571754, 39.839326), (-90.893651, 39.84092), (-90.916609, 39.845075), (-90.916071, 39.757168), (-91.147678, 39.75707), (-91.265848, 39.757258), (-91.365125, 39.758723), (-91.367962, 39.759124), (-91.365396, 39.777266), (-91.432919, 39.840554), (-91.446385, 39.870394), (-91.455887, 39.945538), (-91.460287, 39.980333), (-91.494865, 40.037421), (-91.510322, 40.127994), (-91.512974, 40.181062), (-91.510332, 40.201142), (-91.258828, 40.197299), (-90.911969, 40.193088), (-90.909756, 40.284394), (-90.450227, 40.276335), (-90.451502, 40.188892), (-90.199556, 40.183945), (-90.118966, 40.235263), (-90.033026, 40.377806), (-89.92468, 40.435921), (-89.717104, 40.435655), (-89.714927, 40.319218), (-89.602979, 40.320129), (-89.601604, 40.122432), (-89.578289, 39.976127), (-89.698259, 39.975309), (-89.701864, 39.916787), (-89.994506, 39.901925), (-89.994405, 39.87286), (-90.583534, 39.87675), (-90.582435, 39.854574), (-90.571754, 39.839326)]]
start_time = '2018-08-15T00:00:00Z'
end_time = '2018-09-15T00:00:00Z'

eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": 'arn:aws:sagemaker-geospatial:us-west-2:378778860802:raster-data-collection/public/nmqj48dcu3g7ayw8',
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {"Coordinates": request_polygon_coordinates}
            }
        },
        "TimeRangeFilter": {"StartTime": start_time, "EndTime": end_time},
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": 
            {"LowerBound": 0, "UpperBound": 10}}}],
            "LogicalOperator": "AND",
        },
    }
}

eoj_config = {
    "JobConfig": {
        "CloudRemovalConfig": {
            "AlgorithmName": "INTERPOLATION",
            "InterpolationValue": "-9999",
            "TargetBands": ["red", "green", "blue", "nir", "swir16"],
        },
    }
}

eojParams = {
    "Name": "cloudremoval",
    "InputConfig": eoj_input_config,
    **eoj_config,
    "ExecutionRoleArn": role_arn,
}

eoj_response = sg_client.start_earth_observation_job(**eojParams)

After the EOJ is created, the ARN is returned and used to perform the subsequent geomosaic operation.

To get the status of a job, you can run sg_client.get_earth_observation_job(Arn=eoj_response['Arn']).
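Because each step consumes the ARN of the previous job, it’s convenient to wait for a job to finish before chaining the next one. A simple polling loop might look like the following; the status values are assumptions and should be checked against the sagemaker-geospatial documentation:

import time

def wait_for_eoj(sg_client, eoj_arn, poll_seconds=30):
    # Poll the Earth Observation Job until it leaves the in-progress state
    while True:
        status = sg_client.get_earth_observation_job(Arn=eoj_arn)["Status"]
        if status != "IN_PROGRESS":
            return status
        time.sleep(poll_seconds)

eoj_arn = eoj_response["Arn"]
print(wait_for_eoj(sg_client, eoj_arn))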

Geomosaic

The geomosaic EOJ is used to merge images from multiple satellite visits into a large mosaic, by overwriting nodata or transparent pixels (including the cloudy pixels) with pixels from other timestamps:

eoj_config = {"JobConfig": {"GeoMosaicConfig": {"AlgorithmName": "NEAR"}}}

eojParams = {
    "Name": "geomosaic",
    "InputConfig": {"PreviousEarthObservationJobArn": eoj_arn},
    **eoj_config,
    "ExecutionRoleArn": role_arn,
}

eoj_response = sg_client.start_earth_observation_job(**eojParams)

After the EOJ is created, the ARN is returned and used to perform the subsequent resampling operation.

Resampling

Resampling is used to downscale the resolution of the geospatial image in order to match the resolution of the crop masks (rescaling from 10 m to 30 m resolution):

eoj_config = {
    "JobConfig": {
        "ResamplingConfig": {
            "OutputResolution": {"UserDefined": {"Value": 30, "Unit": "METERS"}},
            "AlgorithmName": "NEAR",
        },
    }
}

eojParams = {
    "Name": "resample",
    "InputConfig": {"PreviousEarthObservationJobArn": eoj_arn},
    **eoj_config,
    "ExecutionRoleArn": role_arn,
}

eoj_response = sg_client.start_earth_observation_job(**eojParams)

After the EOJ is created, the ARN is returned and used to perform the subsequent band math operation.

Band math

Band math operations are used for transforming the observations from multiple spectral bands to a single band. It includes the following spectral indices:

  • EVI2 – Two-Band Enhanced Vegetation Index
  • GDVI – Generalized Difference Vegetation Index
  • NDMI – Normalized Difference Moisture Index
  • NDVI – Normalized Difference Vegetation Index
  • NDWI – Normalized Difference Water Index

See the following code:

spectral_indices = [['EVI2', ' 2.5 * ( nir - red ) / ( nir + 2.4 * red + 1.0 ) '],
 ['GDVI', ' ( ( nir ** 2.0 ) - ( red ** 2.0 ) ) / ( ( nir ** 2.0 ) + ( red ** 2.0 ) ) '],
 ['NDMI', ' ( nir - swir16 ) / ( nir + swir16 ) '],
 ['NDVI', ' ( nir - red ) / ( nir + red ) '],
 ['NDWI', ' ( green - nir ) / ( green + nir ) ']]

eoj_config = {
    "JobConfig": {
        "BandMathConfig": {"CustomIndices": {"Operations": []}},
    }
}

for indices in spectral_indices:
    eoj_config["JobConfig"]["BandMathConfig"]["CustomIndices"]["Operations"].append(
        {"Name": indices[0], "Equation": indices[1][1:-1]}
    )

eojParams = {
    "Name": "bandmath",
    "InputConfig": {"PreviousEarthObservationJobArn": eoj_arn},
    **eoj_config,
    "ExecutionRoleArn": role_arn,
}

eoj_response = sg_client.start_earth_observation_job(**eojParams)

Zonal statistics

The spectral indices are further enriched using Amazon SageMaker Processing, where GDAL-based custom logic is used to do the following:

  • Merge the spectral indices into a single multi-channel mosaic
  • Reproject the mosaic to the crop mask‘s projection
  • Apply the crop mask and reproject the mosaic to the cell polygons’ CRS
  • Calculate zonal statistics for selected polygons (10 km x 10 km cells)

With parallelized data distribution, manifest files (for each crop phenological stage) are distributed across several instances using the ShardedByS3Key S3 data distribution type. For further details, refer to the feature extraction script.
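A sketch of how such a Processing job could be configured with the SageMaker Python SDK is shown below; the container image, script name, instance settings, and S3 paths are illustrative assumptions rather than the solution’s actual values:

from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

processor = ScriptProcessor(
    image_uri="<gdal-enabled-container-image>",  # hypothetical image with GDAL installed
    command=["python3"],
    role=role_arn,
    instance_count=4,
    instance_type="ml.m5.xlarge",
)

processor.run(
    code="feature_extraction.py",  # the feature extraction script referenced above
    inputs=[
        ProcessingInput(
            source="s3://my-bucket/manifests/",  # hypothetical manifest location
            destination="/opt/ml/processing/input",
            # Shard the per-stage manifest files across instances for parallel processing
            s3_data_distribution_type="ShardedByS3Key",
        )
    ],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/output",
            destination="s3://my-bucket/zonal-statistics/",  # hypothetical output location
        )
    ],
)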

The geospatial processing notebook is a step-by-step guide on running the preceding steps.

The following figure shows RGB channels of consecutive satellite visits representing the vegetative and reproductive stages of the corn phenology cycle, with (right) and without (left) crop masks (CW 20, 26 and 33, 2018 Central Illinois).

In the following figure, spectral indices (NDVI, EVI2, NDMI) of consecutive satellite visits represent the vegetative and reproductive stages of the corn phenology cycle (CW 20, 26 and 33, 2018 Central Illinois).

Clean up

If you no longer want to use this solution, you can delete the resources it created. After the solution is deployed in Studio, choose Delete all resources to automatically delete all standard resources that were created when launching the solution, including the S3 bucket.

Conclusion

This solution provides a blueprint for use cases where causal inference with Bayesian networks is the preferred methodology for answering causal questions from a combination of data and human inputs. The workflow includes an efficient implementation of the inference engine, which queues incoming queries and interventions and processes them asynchronously. The modular aspect enables the reuse of various components, including geospatial processing with purpose-built operations and pre-trained models, enrichment of satellite imagery with custom-built GDAL operations, and multimodal feature engineering (spectral indices and tabular data).

In addition, you can use this solution as a template for building gridded crop models where nitrogen fertilizer management and environmental policy analysis are conducted.

For more information, refer to Solution Templates and follow the guide to launch the Crop Yield Counterfactuals solution in the US West (Oregon) Region. The code is available in the GitHub repo.

Citations

German Mandrini, Sotirios V. Archontoulis, Cameron M. Pittelkow, Taro Mieno, Nicolas F. Martin,
Simulated dataset of corn response to nitrogen over thousands of fields and multiple years in Illinois,
Data in Brief, Volume 40, 2022, 107753, ISSN 2352-3409


About the Authors

Paul Barna is a Senior Data Scientist with the Machine Learning Prototyping Labs at AWS.

Zero-shot prompting for the Flan-T5 foundation model in Amazon SageMaker JumpStart

The size and complexity of large language models (LLMs) have exploded in the last few years. LLMs have demonstrated remarkable capabilities in learning the semantics of natural language and producing human-like responses. Many recent LLMs are fine-tuned with a powerful technique called instruction tuning, which helps the model perform new tasks or generate responses to novel prompts without prompt-specific fine-tuning. An instruction-tuned model uses its understanding of related tasks or concepts to generate predictions to novel prompts. Because this technique doesn’t involve updating model weights, it avoids the time-consuming and computationally expensive process required to fine-tune a model for a new, previously unseen task.

In this post, we show how you can access and deploy an instruction-tuned Flan-T5 model from Amazon SageMaker JumpStart. We also demonstrate how you can engineer prompts for Flan-T5 models to perform various natural language processing (NLP) tasks. Furthermore, these tasks can be performed with zero-shot learning, where a well-engineered prompt can guide the model towards desired results. For example, consider providing a multiple-choice question and asking the model to return the appropriate answer from the available choices. We cover prompts for the following NLP tasks:

  • Text summarization
  • Common sense reasoning
  • Question answering
  • Sentiment classification
  • Translation
  • Pronoun resolution
  • Text generation based on article
  • Imaginary article based on title

Code for all the steps in this demo is available in the following notebook.

JumpStart is the machine learning (ML) hub of Amazon SageMaker that offers one-click access to over 350 built-in algorithms; pre-trained models from TensorFlow, PyTorch, Hugging Face, and MXNet; and pre-built solution templates. JumpStart also provides pre-trained foundation models like Stability AI’s Stable Diffusion text-to-image model, BLOOM, Cohere’s Generate, Amazon’s AlexaTM, and more.

Instruction tuning

Instruction tuning is a technique that involves fine-tuning a language model on a collection of NLP tasks using instructions. In this technique, the model is trained to perform tasks by following textual instructions instead of specific datasets for each task. The model is fine-tuned with a set of input and output examples for each task, allowing the model to generalize to new tasks that it hasn’t been explicitly trained on as long as prompts are provided for the tasks. Instruction tuning helps improve the accuracy and effectiveness of models and is helpful in situations where large datasets aren’t available for specific tasks.

A myriad of instruction tuning research has been performed since 2020, producing a collection of various tasks, templates, and methods. One of the most prominent instruction tuning methods, Finetuning language models (Flan), aggregates these publicly available collections into a Flan Collection to produce fine-tuned models on a wide variety of instructions. In this way, the multi-task Flan models are competitive with the same models independently fine-tuned on each specific task and can generalize beyond the specific instructions seen during training to following instructions in general.

Zero-shot learning

Zero-shot learning in NLP allows a pre-trained LLM to generate responses to tasks that it hasn’t been specifically trained for. In this technique, the model is provided with an input text and a prompt that describes the expected output from the model in natural language. The pre-trained model can use its knowledge to generate coherent and relevant responses even for prompts it hasn’t specifically been trained on. Zero-shot learning can reduce the time and data required while improving the efficiency and accuracy of NLP tasks. Zero-shot learning is used in a variety of NLP tasks, such as question answering, summarization, and text generation.

Few-shot learning involves training a model to perform new tasks by providing only a few examples. This is useful where limited labeled data is available for training. Although this post primarily focuses on zero-shot learning, the referenced models are also capable of generating responses to few-shot learning prompts.
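For illustration, a few-shot prompt simply prepends a handful of worked examples to the new query. The sketch below builds such a prompt as a plain string; the examples are made up for illustration:

# A few labeled examples followed by the new input the model should complete
few_shot_prompt = (
    "Translate English to Spanish.\n\n"
    "English: The house is blue.\nSpanish: La casa es azul.\n\n"
    "English: I like coffee.\nSpanish: Me gusta el café.\n\n"
    "English: Cat loves chicken pizza.\nSpanish:"
)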

Flan-T5 model

The popular encoder-decoder model T5 (Text-to-Text Transfer Transformer) was subsequently fine-tuned via the Flan method to produce the Flan-T5 family of models. Flan-T5 is an instruction-tuned model and is therefore capable of performing various zero-shot NLP tasks, as well as few-shot in-context learning tasks. With appropriate prompting, it can perform zero-shot NLP tasks such as text summarization, common sense reasoning, natural language inference, question answering, sentence and sentiment classification, translation, and pronoun resolution. The examples provided in this post are generated with the Flan-T5 family.

JumpStart provides convenient deployment of this model family through Amazon SageMaker Studio and the SageMaker SDK. This includes Flan-T5 Small, Flan-T5 Base, Flan-T5 Large, Flan-T5 XL, and Flan-T5 XXL. Furthermore, JumpStart provides three versions of Flan-T5 XXL at different levels of quantization:

  • Flan-T5 XXL – The full model, loaded in single-precision floating-point format (FP32).
  • Flan-T5 XXL FP16 – A half-precision floating-point format (FP16) version of the full model. This implementation consumes less GPU memory and performs faster inference than the FP32 version.
  • Flan-T5 XXL BNB INT8 – An 8-bit quantized version of the full model, loaded onto the GPU context using the accelerate and bitsandbytes libraries. This implementation provides accessibility to this LLM on instances with less compute, such as a single-GPU ml.g5.xlarge instance.

Prompt engineering for zero-shot NLP tasks on Flan-T5 models

Prompt engineering deals with creating high-quality prompts to guide the model towards the desired responses. Prompts need to be designed based on the specific task and dataset being used. The goal here is to provide the model with necessary information to generate high-quality responses while minimizing noise. This could involve keywords, additional contexts, questions, and more. For example, see the following code:

Input with Prompt: Translate this English sentence to Spanish: Cat loves chicken pizza
Model Output: Gato ama la pizza de pollo

A well-designed prompt can make the model more creative and generalized so that it can easily adapt to new tasks. Prompts can also help incorporate domain knowledge on specific tasks and improve interpretability. Prompt engineering can greatly improve the performance of zero-shot and few-shot learning models. Creating high-quality prompts requires careful consideration of the task at hand, as well as a deep understanding of the model’s strengths and limitations.

In the provided example notebook, each task demonstrates at least seven prompt templates and a comprehensive set of parameters to control the model output, such as maximum sequence length, number of return sequences, and number of beams. In addition, the prompt templates used are from the Flan T5 GitHub repository, which consists of many templates used within the Flan Collection. This collection of templates is helpful to explore when you perform your own prompt engineering.

In the following examples, the Flan-T5 XXL model is used to generate responses for various zero-shot NLP tasks. Each example lists the task, the prompt provided to the model (a prompt template combined with the example input), and the model’s response.

Take the summarization task for an example: to create a model prompt, you can concatenate the template Briefly summarize this paragraph: with the text example you want to summarize. All of these tasks used the same payload parameters: max_length=150 to provide an upper limit on the number of response tokens, no_repeat_ngram_size=5 to discourage n-gram repetition, and do_sample=False to disable sampling for repeatability. We discuss available payload parameter options when querying the endpoint in more detail later.

Task: Summarization
Prompt: Briefly summarize this paragraph: Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition.
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input.
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend’s Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages.
Model output: Understand the capabilities of Amazon Comprehend

Task: Common sense reasoning or natural language reasoning
Prompt: The world cup has kicked off in Los Angeles, United States.\n\nBased on the paragraph above can we conclude that ”The world cup takes place in United States.”?\n\n[”yes”, ”no”]
Model output: yes

Task: Question answering
Prompt: Answer based on context:\n\nThe newest and most innovative Kindle yet lets you take notes on millions of books and documents, write lists and journals, and more.

For readers who have always wished they could write in their eBooks, Amazon’s new Kindle lets them do just that. The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more.

Here’s everything you need to know about the Kindle Scribe, including frequently asked questions.

The Kindle Scribe makes it easy to read and write like you would on paper

The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment. Further personalize your experience with the adjustable warm light, font sizes, line spacing, and more.

It comes with your choice of the Basic Pen or the Premium Pen, which you use to write on the screen like you would on paper. They also attach magnetically to your Kindle and never need to be charged. The Premium Pen includes a dedicated eraser and a customizable shortcut button.

The Kindle Scribe has the most storage options of all Kindle devices: choose from 8 GB, 16 GB, or 32 GB to suit your level of reading and writing.\n\nwhat are the key features of new Kindle?

Model output: 10.2-inch, glare-free screen

Task: Sentence or sentiment classification
Prompt: Review:\nThis moive is so great and once again dazzles and delights us\nIs this movie review sentence negative or positive?\nOPTIONS:\n-positive\n-negative
Model output: positive

Task: Translation
Prompt: My name is Arthur\n\nTranslate to German
Model output: Mein Name ist Arthur

Task: Pronoun resolution
Prompt: Allen made his friend, Xin, laugh when he told a funny joke.\n\nWho is he referring to?\n\n(A)Allen\n(B)Xin
Model output: Allen
Title generation based on an article

The newest and most innovative Kindle yet lets you take notes on millions of books and documents, write lists and journals, and more.

For readers who have always wished they could write in their eBooks, Amazon’s new Kindle lets them do just that. The Kindle Scribe is the first Kindle for reading and writing and allows users to supplement their books and documents with notes, lists, and more.

Here’s everything you need to know about the Kindle Scribe, including frequently asked questions.

The Kindle Scribe makes it easy to read and write like you would on paper

The Kindle Scribe features a 10.2-inch, glare-free screen (the largest of all Kindle devices), crisp 300 ppi resolution, and 35 LED front lights that automatically adjust to your environment. Further personalize your experience with the adjustable warm light, font sizes, line spacing, and more.

It comes with your choice of the Basic Pen or the Premium Pen, which you use to write on the screen like you would on paper. They also attach magnetically to your Kindle and never need to be charged. The Premium Pen includes a dedicated eraser and a customizable shortcut button.

The Kindle Scribe has the most storage options of all Kindle devices: choose from 8 GB, 16 GB, or 32 GB to suit your level of reading and writing.

nnGive me a good title for the article above.

Everything You Need to Know About the New Kindle Scribe
Imaginary article generation based on a title Title: ”University has new facility coming up“\nGiven the above title of an imaginary article, imagine the article.n The University of the Philippines (UP) is set to construct a new building for its College of Medicine and Health Sciences (CMHS) in the campus of the University of Santo Tomas (UST) in Bacolod City. The CMHS building, which will be located at the UST campus, will be the first building constructed by the university in the UST campus. The CMMS building will be the first of its kind in the country, according to UP Vice President for Administration and Finance Dr. Jose L. Alcala. The CMMH building will be the second building constructed by the UP in the UST. The first building, which was completed in 2008, is the UP
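
The prompts in these examples are single text strings in which each line break shown above is a literal newline character in the input. As a minimal illustrative sketch (the variable names are our own, not part of the original examples), the sentiment classification prompt can be assembled in Python as follows:

# Build the zero-shot prompt: instruction, context, and answer options are
# concatenated into one text input, separated by newline characters.
review = "This movie is so great and once again dazzles and delights us"
prompt = (
    "Review:\n"
    f"{review}\n"
    "Is this movie review sentence negative or positive?\n"
    "OPTIONS:\n"
    "-positive\n"
    "-negative"
)
print(prompt)

The same pattern applies to the other tasks: the instruction, any supporting context, and the expected answer format are simply concatenated into a single text input for the model.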

Access Flan-T5 instruction-tuned models in SageMaker

JumpStart provides three avenues to get started using these instruction-tuned Flan models: JumpStart foundation models, Studio, and the SageMaker SDK. The following sections illustrate what each of these avenues looks like and describe how to access them.

JumpStart foundation models

Developers can use the visual interface of the JumpStart foundation models, accessed via the SageMaker console, to test instruction-tuned Flan models without writing a single line of code. This playground provides an input prompt textbox along with controls for various parameters used during inference. This feature is currently in a gated preview; if you don't have access, you will see a Request Access button instead of the models. As shown in the following screenshots, you can access the foundation models from the navigation pane of the SageMaker console. Choose View model on the Flan-T5 XL model card to access the model's user interface.

You can use this flexible user interface to try a demo of the model.

SageMaker Studio

You can also access these models through the JumpStart landing page in Studio. This page lists available end-to-end ML solutions, pre-trained models, and example notebooks.

You can choose a Flan-T5 model card to deploy a model endpoint through the user interface.

After your endpoint is successfully launched, you can launch an example Jupyter notebook that demonstrates how to query that endpoint.

SageMaker Python SDK

Finally, you can programmatically deploy an endpoint through the SageMaker SDK. You will need to specify the model ID of your desired model in the SageMaker model hub and the instance type used for deployment. The model URI, which contains the inference script, and the URI of the Docker container are obtained through the SageMaker SDK. These URIs are provided by JumpStart and can be used to initialize a SageMaker model object for deployment. See the following code:

from sagemaker import image_uris, model_uris
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.session import Session


aws_role = Session().get_caller_identity_arn()
model_id, model_version = "huggingface-text2text-flan-t5-xxl", "*"
endpoint_name = f"jumpstart-example-{model_id}"
instance_type = "ml.g5.12xlarge"

# Retrieve the inference docker container URI.
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=instance_type,
)

# Retrieve the model URI.
model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="inference"
)

# Create a SageMaker Model object.
model = Model(
    image_uri=deploy_image_uri,
    model_data=model_uri,
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# Deploy the Model. Provide a predictor_cls to use the SageMaker API for inference.
model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

Now that the endpoint is deployed, you can query the endpoint to produce generated text. Consider a summarization task as an example, where you want to produce a summary of the following text:

text = """Amazon Comprehend uses natural language processing (NLP) to extract insights about the content of documents. It develops insights by recognizing the entities, key phrases, language, sentiments, and other common elements in a document. Use Amazon Comprehend to create new products based on understanding the structure of documents. For example, using Amazon Comprehend you can search social networking feeds for mentions of products or scan an entire document repository for key phrases.
You can access Amazon Comprehend document analysis capabilities using the Amazon Comprehend console or using the Amazon Comprehend APIs. You can run real-time analysis for small workloads or you can start asynchronous analysis jobs for large document sets. You can use the pre-trained models that Amazon Comprehend provides, or you can train your own custom models for classification and entity recognition.
All of the Amazon Comprehend features accept UTF-8 text documents as the input. In addition, custom classification and custom entity recognition accept image files, PDF files, and Word files as input.
Amazon Comprehend can examine and analyze documents in a variety of languages, depending on the specific feature. For more information, see Languages supported in Amazon Comprehend. Amazon Comprehend's Dominant language capability can examine documents and determine the dominant language for a far wider selection of languages."""

You supply this text within a JSON payload when invoking the endpoint. The JSON payload can also include any desired inference parameters that control the output length, the sampling strategy, and restrictions on the output token sequence. Although the transformers library defines the full list of available payload parameters, the most important ones are described as follows:

  • max_length – The model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
  • num_return_sequences – The number of output sequences returned. If specified, it must be a positive integer.
  • num_beams – The number of beams used in the greedy search. If specified, it must be an integer greater than or equal to num_return_sequences.
  • no_repeat_ngram_size – The model ensures that a sequence of words of no_repeat_ngram_size is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
  • temperature – Controls the randomness in the output. A higher temperature produces an output sequence with more low-probability words, and a lower temperature produces an output sequence with more high-probability words. If temperature equals 0, it results in greedy decoding. If specified, it must be a positive float.
  • early_stopping – If True, text generation is finished when all beam hypotheses reach the end-of-sentence token. If specified, it must be Boolean.
  • do_sample – If True, sample the next word as per the likelihood. If specified, it must be Boolean.
  • top_k – In each step of text generation, sample from only the top_k most likely words. If specified, it must be a positive integer.
  • top_p – In each step of text generation, sample from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.
  • seed – Fix the randomized state for reproducibility. If specified, it must be an integer.

We can specify any subset of these parameters while invoking an endpoint. Next, we show an example of how to invoke an endpoint with these arguments:

import boto3
import json

def query_endpoint_and_parse_response(payload_dict, endpoint_name):
    encoded_json = json.dumps(payload_dict).encode("utf-8")
    client = boto3.client("runtime.sagemaker")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name, ContentType="application/json", Body=encoded_json
    )
    model_predictions = json.loads(response["Body"].read())
    return model_predictions["generated_texts"]


prompt_template = "Write a short summary for this text: {text}"

parameters = {
    "max_length": 200,
    "num_return_sequences": 1,
    "top_k": 50,
    "top_p": .95,
    "do_sample": True,
    "early_stopping": False,
    "num_beams": 1,
    "no_repeat_ngram_size": 3,
    "temperature": 1
}

payload = {"text_inputs": prompt_template.replace("{text}", text), **parameters}
generated_texts = query_endpoint_and_parse_response(payload, endpoint_name)
print(f"For prompt: '{prompts}'")
print(f"Result: {generated_texts}")

This code block generates an output sequence sample that resembles the following text:

# For prompt: 'Write a short summary for this text: {text}'
# Result: ['Amazon Comprehend is a service that uses natural language processing to extract insights about the content of documents. Using Amazon Comprehend, you can find new products and services by understanding the structure of documents, and then use the information to create new offerings.']
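
Because the helper function and the deployed endpoint are task-agnostic, you can reuse them for any of the other tasks shown earlier by changing the prompt and, if desired, the decoding parameters. The following sketch is our own illustration (not part of the original notebook) and sends the zero-shot translation prompt using beam search instead of sampling:

# Reuse the same endpoint for a zero-shot translation prompt with beam search decoding.
translation_payload = {
    "text_inputs": "My name is Arthur\n\nTranslate to German",
    "max_length": 50,
    "num_beams": 4,               # beam search; must be >= num_return_sequences
    "num_return_sequences": 1,
    "early_stopping": True,
    "do_sample": False,
}
translated_texts = query_endpoint_and_parse_response(translation_payload, endpoint_name)
print(translated_texts)  # expect a response resembling ['Mein Name ist Arthur']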

Clean up

To avoid ongoing charges, delete the SageMaker inference endpoints. You can delete the endpoints via the SageMaker console or from the Studio notebook using the following commands:

model_predictor.delete_model()
model_predictor.delete_endpoint()

Conclusion

In this post, we gave an overview of the benefits of zero-shot learning and described how prompt engineering can improve the performance of instruction-tuned models. We also showed how to easily deploy an instruction-tuned Flan T5 model from JumpStart and provided examples to demonstrate how you can perform different NLP tasks using the deployed Flan T5 model endpoint in SageMaker.

We encourage you to deploy a Flan T5 model from JumpStart and create your own prompts for NLP use cases.

To learn more about JumpStart, check out the following:


About the authors

Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.

Vivek Gangasani is a Senior Machine Learning Solutions Architect at Amazon Web Services. He works with Machine Learning Startups to build and deploy AI/ML applications on AWS. He is currently focused on delivering solutions for MLOps, ML Inference and low-code ML. He has worked on projects in different domains, including Natural Language Processing and Computer Vision.

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker built-in algorithms team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Read More

Koala: A Dialogue Model for Academic Research

In this post, we introduce Koala, a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web. We describe the dataset curation and training process of our model, and also present the results of a user study that compares our model to ChatGPT and Stanford’s Alpaca. Our results show that Koala can effectively respond to a variety of user queries, generating responses that are often preferred over Alpaca, and at least tied with ChatGPT in over half of the cases.

We hope that these results contribute further to the discourse around the relative performance of large closed-source models to smaller public models. In particular, they suggest that models that are small enough to be run locally can capture much of the performance of their larger cousins if trained on carefully sourced data. This might imply, for example, that the community should put more effort into curating high-quality datasets, as this might do more to enable safer, more factual, and more capable models than simply increasing the size of existing systems. We emphasize that Koala is a research prototype, and while we hope that its release will provide a valuable community resource, it still has major shortcomings in terms of content, safety, and reliability, and should not be used outside of research.


PyTorch & OpenXLA: The Path Forward

As we celebrate the release of OpenXLA, PyTorch 2.0, and PyTorch/XLA 2.0, it’s worth taking a step back and sharing where we see it all going in the short to medium term. With PyTorch adoption leading in the AI space and XLA supporting best-in-class compiler features, PyTorch/XLA is well positioned to provide a cutting edge development stack for both model training and inference. To achieve this, we see investments in three main areas:

  • Training Large Models – Large language models (LLM) and diffusion models have quickly risen in popularity and many cutting edge applications today are built on them. Further to this, training these models requires scale and more specifically the ability to train across thousands of accelerators. To achieve this we are investing in features such as AMP for mixed precision training, PjRt for increased runtime performance, SPMD / FSDP for efficient model sharding, Dynamic Shapes to enable new research approaches, and faster data loading through Ray and tf.data. Some of these features are already available in experimental or beta stages, and others are coming up this year with many heavily leveraging the underlying OpenXLA compiler stack.
  • Model Inference – With large models continuing to grow in size and computational cost, deployment becomes the next challenge as these models continue to find their way into applications. With the introduction of Dynamo in the PyTorch 2.0 release, PyTorch/XLA delivers performance-competitive inference. We are, however, incorporating additional inference-oriented features, including model serving support, Dynamo for sharded large models, and quantization via Torch.Export and StableHLO.
  • Ecosystem integration – We are expanding integration with Hugging Face and PyTorch Lightning so users can take advantage of upcoming PyTorch/XLA cutting edge features (e.g. FSDP support in Hugging Face) and the downstream OpenXLA features (e.g. Quantization) through familiar APIs.

Additionally, PyTorch/XLA is set to migrate to the open source OpenXLA as its default downstream compiler, allowing the PyTorch community to gain access to a leading, framework-agnostic compiler stack that enjoys industry-wide contribution and innovation. To achieve this, we will begin supporting StableHLO. As a result, OpenXLA will replace the existing TF:XLA dependency, overall streamlining the dependencies and creating leverage from the broader compiler ecosystem. PyTorch/XLA will also sunset the XRT runtime after migration. You can see the resulting high-level stack below with the TensorFlow dependency stricken out:

Figure: the upcoming PyTorch/XLA features and integrations are illustrated here

We cannot be more excited about what’s ahead for PyTorch/XLA and invite the community to join us. PyTorch/XLA is developed fully in open source, so please file issues, submit pull requests, and send RFCs to GitHub so that we can openly collaborate. You can also try out PyTorch/XLA for yourself on various XLA devices, including TPUs and GPUs.
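
If you want to experiment right away, a minimal way to exercise an XLA device from PyTorch looks roughly like the following sketch (assuming the torch and torch_xla packages are installed, for example on a Cloud TPU VM):

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                # acquire the default XLA device (TPU or GPU)
x = torch.randn(2, 3, device=device)    # operations on XLA tensors are traced lazily
y = torch.nn.Linear(3, 4).to(device)(x)
xm.mark_step()                          # compile and execute the traced graph via XLA
print(y.cpu())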

Cheers,
The PyTorch/XLA Team at Google

Read More