Xbox PC Game Pass Comes to GeForce NOW, Along With 25 New Games

As part of NVIDIA and Microsoft’s collaboration to bring more choice to gamers, new Microsoft Store integration has been added to GeForce NOW that lets gamers stream select titles from the Xbox PC Game Pass catalog on GeForce NOW, starting today.

With the Microsoft Store integration, members will see a brand-new Xbox button on supported PC games and can seamlessly launch these titles across their devices, provided they either purchased the standalone games through the Microsoft Store or have an active Xbox Game Pass Ultimate or PC Game Pass subscription.

Hot off our recent Gamescom announcement, four blockbuster titles are coming to GeForce NOW this fall: Alan Wake 2, Cyberpunk 2077: Phantom Liberty expansion, Party Animals and PAYDAY 3.

Plus, head to the cloud and stream the 25 new titles joining the cloud this week, including DOOM 2016 from Bethesda.

Members have also been playing the GeForce NOW Ultimate KovaaK’s challenge, raising the bar with 240 frames per second streaming using an Ultimate membership. Check out the leaderboard to see how Ultimate members are stacking up against other GeForce NOW members — top scorers have a chance to win some ultimate prizes through Thursday, Sept. 21, including a six-month Xbox PC Game Pass.

Select PC Game Pass Titles Now Available

Xbox PC Game Pass on GeForce NOW
Hello, Xbox.

Give a warm welcome to the Microsoft Store on GeForce NOW. It joins digital platforms Steam, Epic Games Store, Ubisoft Connect and others in the cloud. Experience it today with hit Xbox PC games from Xbox Game Studios, Bethesda and other top publishers recently added to GeForce NOW, like Fatshark, Paradox and TaleWorlds Entertainment.

With a GeForce NOW Ultimate membership, stream popular shooters Gears 5 and Deathloop with the highest graphical fidelity. Embark on a mini-adventure on the big screen by shrinking to the size of an ant in Grounded. Or enjoy the historical narrative Pentiment while on the go with a mobile device.

Take a deep dive into history with titles from the Age of Empires series on a Chromebook and a comfy throne of your own, or experience an alternative version of history in the newly added Wolfenstein II: The New Colossus and Wolfenstein: Youngblood titles with the power of 4K streaming on NVIDIA SHIELD.

Warhammer 40k: Darktide Xbox PC Game Pass on GeForce NOW
Fight the dark tide with the power of the cloud.

Lead armies in TaleWorlds Entertainment’s action role-playing game Mount & Blade II: Bannerlord, take on hordes of enemies in Fatshark’s action shooter Warhammer 40,000: Darktide or explore infinite worlds in Hello Games’ No Man’s Sky.

Keep an eye out for more games from Xbox’s PC Game Pass library to be added to GeForce NOW. Check out this article for more details on how Xbox PC Game Pass will work on GeForce NOW.

And this week only, on top of being able to win a six-month Ultimate membership and a $100 Steam gift card for making it into the top three on the weekly leaderboard of the Ultimate KovaaK’s challenge, those who make it into the top 10 will get a six-month Xbox PC Game Pass. Keep an eye on GeForce NOW’s Twitter and Facebook accounts for more details.

Straight Out of Gamescom

Top publishers Epic Games Publishing, CD Projekt Red and Deep Silver are all bringing their blockbuster titles to GeForce NOW at launch in the fall.

Alan Wake 2 coming to GeForce NOW
Wake up, it’s the second game in the “Alan Wake” franchise.

Uncover the newest mystery in the upcoming survival horror game Alan Wake 2, sequel to the award-winning game Alan Wake, from Remedy Entertainment and Epic Games Publishing. Survive as the best-selling horror writer Alan Wake — who’s trapped in a dark dimension and trying to write his way out — or as FBI agent Saga Anderson in a life-or-death race to solve a small-town murder that quickly spirals into a nightmare.

Play through two distinct stories set in two beautiful yet terrifying worlds and see events unfold from different perspectives. The characters must take on powerful supernatural enemies and use more than just a gun to survive: light is the ultimate weapon in the fight against darkness. Members can stream the game from the cloud when it launches on Tuesday, Oct. 27.

Cyberpunk 2077 expansion coming to GeForce NOW
Welcome to the neon cloud.

Return as cyber-enhanced mercenary V in the upcoming spy-thriller expansion for the hit open-world action adventure Cyberpunk 2077 from CD Projekt Red. Phantom Liberty features the all-new district of Dogtown, infinitely replayable open-world activities, an exclusive skill tree and much more — including new weapons, cyberware, vehicles and gigs for players to discover. Embark on a high-stakes mission of espionage and intrigue to save the NUSA President when the expansion launches in the cloud on Tuesday, Sept. 26.

PAYDAY 3 coming to GeForce NOW
It pays to be a GeForce NOW member.

Join the Payday Gang in the upcoming third installment of the PAYDAY franchise from Starbreeze Studios, Overkill Software and Deep Silver. In PAYDAY 3, play as notorious criminals who must face off against new enemies and challenges in an action-packed, high-octane experience. Invite your friends to the four-player online co-op mode to pull off the ultimate heist when the title launches on GeForce NOW on Thursday, Sept. 21.

These games are all headed to the cloud this fall. Upgrade to an Ultimate membership today to skip the waiting lines ahead of free members and get access to powerful NVIDIA technology, including RTX ON and DLSS 3.5 for AI-powered graphics and peak-performance gaming.

Welcome to the Cloud

DOOM 2016 on GeForce NOW
You won’t be able to resist the power of the cloud.

The next Bethesda game to heat up the cloud is DOOM 2016. Fight through hordes of demonic forces on Mars after waking up on a Union Aerospace Corporation energy-mining facility. Play as the Doom Slayer, an unnamed space marine from the DOOM franchise, and use a variety of weapons, gadgets and melee attacks in this fast-paced, first-person shooter. Plus, several online multiplayer modes are available, so members can grab some buddies to stream with.

Catch the full list of games joining the cloud this week:

  • WrestleQuest (New release on Steam, Aug. 21)
  • Jumplight Odyssey (New release on Steam, Aug. 21)
  • Blasphemous 2 (New release on Steam, Aug. 24)
  • RIDE 5 (New release on Steam, Aug. 24)
  • Age of Empires: Definitive Edition (Xbox)
  • Age of Empires III: Definitive Edition (Xbox)
  • Age of Empires IV: Anniversary Edition (Xbox)
  • Crusader Kings III (Xbox)
  • Dead Cells (Xbox)
  • Deathloop (Xbox)
  • Doom 2016 (Steam)
  • Gears 5 (Xbox)
  • Grounded (Xbox)
  • Mount & Blade II: Bannerlord (Xbox)
  • No Man’s Sky (Xbox)
  • Pentiment (Xbox)
  • Quake (Xbox)
  • Shadowrun: Dragonfall – Director’s Cut (Xbox)
  • Stellaris (Xbox)
  • The Texas Chain Saw Massacre (Xbox)
  • Trackmania (Steam)
  • Valheim (Xbox)
  • Warhammer 40,000: Darktide (Xbox)
  • Wolfenstein: Youngblood (Xbox)
  • Wolfenstein II: The New Colossus (Xbox)

This week’s Game On giveaway with SteelSeries includes Destiny 2 and three-day Priority membership codes. Check the giveaway page for details on how to enter.

What games are you looking forward to? Let us know on Twitter or in the comments below.

Read More

Large Scale Training of Hugging Face Transformers on TPUs With PyTorch/XLA FSDP

AI is transforming many industries through advanced capabilities such as understanding and generating language, answering questions, and delivering accurate recommendations. These capabilities are fueled by ever-increasing size and complexity of AI models, which require vast amounts of computing power to train.

To meet the growing demands of AI training at scale, last year we introduced Fully Sharded Data Parallel (FSDP) in PyTorch/XLA. FSDP is a model parallelism architecture that unlocks the ability to easily and efficiently scale AI models into hundreds of billions of parameters. With PyTorch/XLA FSDP, during distributed training, each device can store a specific model shard, and all-gather the full model weights when it is time to perform the forward pass. Nested FSDP further optimizes performance by only using a given layer’s full parameters during its forward pass.
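To make the nested wrapping concrete, here is a minimal sketch using the XlaFullyShardedDataParallel class from PyTorch/XLA; the toy two-layer model and its sizes are illustrative only and not part of the original example.

Python

# A minimal sketch of nested PyTorch/XLA FSDP wrapping. Each inner layer gets
# its own FSDP unit, so only that layer's full parameters are all-gathered
# during its forward pass while the rest of the model stays sharded.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
from torch_xla.distributed.fsdp import XlaFullyShardedDataParallel as FSDP

device = xm.xla_device()

class TwoLayerNet(nn.Module):
    def __init__(self, hidden=1024):
        super().__init__()
        self.layer1 = FSDP(nn.Linear(hidden, hidden).to(device))  # inner FSDP unit
        self.layer2 = FSDP(nn.Linear(hidden, hidden).to(device))  # inner FSDP unit

    def forward(self, x):
        return self.layer2(torch.relu(self.layer1(x)))

# The outer wrap shards any parameters not already covered by an inner unit.
model = FSDP(TwoLayerNet().to(device))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)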

We are excited to announce that PyTorch/XLA FSDP has landed in Hugging Face Transformers. Now, Hugging Face users can train PyTorch models with up to 20 times more parameters using the same amount of computing power as before.

We built PyTorch/XLA FSDP support directly into the Hugging Face Trainer class, so that any model using Trainer can leverage FSDP. And with the addition of automatic wrapping to PyTorch/XLA FSDP, nested FSDP wrapping is both flexible and simple to apply. These new features make it easy to train a wide range of Hugging Face models at large scales. In this guide, we demonstrate training GPT-2 models with up to 128B parameters on Google Cloud TPUs. PyTorch/XLA FSDP training on TPUs is highly efficient, achieving up to 45.1% model FLOPS utilization (MFU) for GPT-2:

Figure 1: Model FLOPS utilization for Hugging Face GPT-2 on Google Cloud TPU v4

Configuring PyTorch/XLA FSDP in the Hugging Face Trainer

First, follow your preferred method to create your TPU(s) and install PyTorch and PyTorch/XLA. You need versions >= 2.0 for PyTorch and PyTorch/XLA.

Unset

pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch-2.0-cp38-cp38-linux_x86_64.whl --user

pip3 install https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/torch_xla-2.0-cp38-cp38-linux_x86_64.whl

Next, clone and install the Hugging Face Transformers repo. Install all necessary dependencies (e.g., datasets, evaluate, scikit-learn, accelerate).

Unset

cd $HOME

git clone https://github.com/huggingface/transformers.git
cd transformers

git checkout v4.31-release

pip3 install -e .

pip3 install datasets evaluate scikit-learn

pip3 install accelerate==0.21.0

In $HOME/transformers, create any model-specific configuration files you might need. Here is an example of a configuration file for a GPT-2 model with 2B parameters, which we later refer to as gpt2_config.json:

Unset

{
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2LMHeadModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_embd": 3072,
  "n_head": 24,
  "n_layer": 18,
  "n_positions": 1024,
  "resid_pdrop": 0.1,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 50
    }
  },
  "vocab_size": 50257
}

With PyTorch/XLA FSDP, it is possible to train model sizes much bigger than this on large accelerator slices. We have trained GPT-2 models as large as 128B parameters with these techniques; for expert tips on how to replicate this scale, see the appendix.

In $HOME/transformers, create your FSDP configuration file, a JSON file containing all of the configurable aspects of your XLA FSDP wrapping stored as a dictionary. Following the official Hugging Face Transformers XLA FSDP documentation, the following arguments are available to set:

  • xla (bool, *optional*, defaults to False): This is a boolean which determines whether or not you use XLA FSDP. Make sure to set this to true.
  • xla_fsdp_settings (dict, *optional*): This is a dictionary which stores all of the XLA FSDP wrapping parameters you want to set; note that you do not have to specify settings for parameters where you are using the default value. For a complete list of settings, see here.

For compute_dtype and buffer_dtype, enter these as strings which contain the corresponding torch data type, e.g. bfloat16.

  • fsdp_min_num_params (int, *optional*, defaults to 0): An integer which sets the minimum number of parameters for size-based auto wrapping. Every module with at least as many parameters as fsdp_min_num_params will be XLA FSDP wrapped.
  • fsdp_transformer_layer_cls_to_wrap (List[str], *optional*): A list of (case-sensitive) transformer layer class names to wrap. Note that this is mutually exclusive with fsdp_min_num_params. Example: ["GPT2Block", "GPT2MLP"].
  • xla_fsdp_grad_ckpt (bool, *optional*, defaults to False): This is a boolean which determines whether to use gradient checkpointing over each nested XLA FSDP wrapped layer. This setting can only be used when the xla flag is set to true, and an auto wrapping policy is specified through fsdp_min_num_params or fsdp_transformer_layer_cls_to_wrap.

Note: For transformer-based models, use fsdp_transformer_layer_cls_to_wrap instead of fsdp_min_num_params when performing automatic nested FSDP wrapping. Layers which share weights should not belong to separate FSDP wrapped units, and the input and output embedding layers in transformer-based models share weights.

For this GPT-2 example, here is what the corresponding fsdp_config.json file looks like:

Unset

{
  "fsdp_transformer_layer_cls_to_wrap": [
    "GPT2Block"
  ],
  "xla": true,
  "xla_fsdp_settings": {
    "compute_dtype": "bfloat16",
    "shard_param_on_dim_0": true,
    "pin_layout_in_collective_ops": true
  },
  "xla_fsdp_grad_ckpt": true
}

Now, it’s time to train your model! First, ensure that you have your PyTorch/XLA runtime set up appropriately by setting:

Unset

export PJRT_DEVICE=TPU

When running training, the key flags to pass are:

a) --fsdp "full_shard"
b) --fsdp_config fsdp_config.json

where you should replace fsdp_config.json with whatever you named your FSDP configuration file. Here is a sample command to train our example 2B GPT-2 model, where training is started by xla_spawn.py, a launcher script for distributed TPU training.

Unset

python3 -u examples/pytorch/xla_spawn.py --num_cores 4 \
  examples/pytorch/language-modeling/run_clm.py \
  --num_train_epochs 1 \
  --dataset_name wikitext \
  --dataset_config_name wikitext-2-raw-v1 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --do_train \
  --do_eval \
  --output_dir /tmp/test-clm \
  --overwrite_output_dir \
  --config_name gpt2_config.json \
  --cache_dir /tmp \
  --tokenizer_name gpt2 \
  --block_size 1024 \
  --optim adafactor \
  --adafactor true \
  --save_strategy no \
  --logging_strategy no \
  --fsdp "full_shard" \
  --fsdp_config fsdp_config.json

Measuring Model FLOPS Utilization (MFU) for GPT-2

Model FLOPS are the floating point operations required to perform a single forward and backward pass. Model FLOPS are hardware- and implementation-independent, and only depend on the underlying model. In each step, the number of FLOPS is computed via the following formulas:

Unset

tokens_per_batch = global_batch_size * seq_len

FLOPS_per_step = 6 * tokens_per_batch * num_params

where seq_len is the sequence length and num_params is the number of parameters in the model. We note that this estimation assumes that d_model >> seq_len. If this assumption is violated, the self-attention FLOPS become significant enough that this expression underestimates the true model FLOPS.

Based on the step time and the hardware details (numbers of chips and the peak FLOPS per chip), we can compute Model FLOPS Utilization (MFU), which measures how effectively our implementation is using the underlying hardware. Achieving 100% MFU means that the hardware is being used perfectly by that model. We calculate MFU using the following formula:

Unset

model_FLOPS_utilization = FLOPS_per_step / step_time(s) / chip_count / FLOPS_per_chip

When training a GPT-2 model with 2B parameters with the XLA FSDP configuration file above on a Cloud TPU v4-8, we measure a step time of 4.191s. Using the above formula, we calculate 35.7% MFU on a v4-8. For further details on calculating MFU, refer to the PaLM paper.
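To make the arithmetic concrete, the short sketch below plugs the numbers above into these formulas. The per-chip peak of 275 TFLOPS in bfloat16 and the 4 chips in a v4-8 slice are our assumptions rather than values stated in the post, and the parameter count is approximate.

Python

# Reproduce the ~35.7% MFU figure for the 2B GPT-2 model on a v4-8 slice.
# Assumptions (not from the post): a v4-8 slice contains 4 TPU v4 chips, each
# with a bfloat16 peak of 275 TFLOPS; the parameter count is approximate.
global_batch_size = 128                  # 131,072 tokens per batch / 1,024 seq len
seq_len = 1024
num_params = 2.1e9                       # ~2B parameters

tokens_per_batch = global_batch_size * seq_len       # 131,072
FLOPS_per_step = 6 * tokens_per_batch * num_params   # ~1.65e15, i.e. 1.65 PFLOPS

step_time_s = 4.191
chip_count = 4
peak_FLOPS_per_chip = 275e12

mfu = FLOPS_per_step / step_time_s / chip_count / peak_FLOPS_per_chip
print(f"MFU: {mfu:.1%}")                 # ~35.8%, matching Table 1 up to rounding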

The table below presents MFU for GPT-2 models with sizes between 2B and 128B, with a sequence length of 1024.

TPU NumCores         v4-8     v4-64    v4-128   v4-128   v4-256     v4-512
# of Tokens / Batch  131,072  524,288  524,288  524,288  1,048,576  1,048,576
# of Parameters      2B       16B      20B      32B      64B        128B
Step Time (ms)       4,191    14,592   7,824    12,970   25,653     30,460
PFLOPS / Step        1.65     50       62       101      404        809
MFU                  35.7%    38.8%    45.1%    44.4%    44.7%      37.7%

Table 1: GPT-2 model FLOPS utilization calculation details

Among these configurations, MFU peaks at 45.1% for the 20B parameter model on v4-128. This result compares favorably to, for example, 41.5% MFU for a 22B Megatron-like model.

There are two actionable insights from these experiments:

First, simply increasing the number of chips without increasing the batch size generally means lower FLOPS utilization, because more time is spent on sharing the model shards. FSDP uses all-reduce communication collectives which are not asynchronous, which means that chip-to-chip communication cannot be overlapped with computation. As the number of chips increases, the number of model shards that must be communicated increases, and so we should expect the portion of the step time spent on communication to increase with the number of chips.

Second, increasing the batch size generally means better FLOPS utilization. As the number of chips increases, the memory footprint of the model decreases, which often frees up high bandwidth memory (HBM) to scale up the global batch size. With a larger global batch size, the number of tokens processed in each step increases, and thus, so does the FLOPS per step. As long as the step time does not increase proportionally, we expect a larger global batch size to improve MFU.

Therefore, to maximize the MFU, we recommend training with the largest global batch size possible that can fit in the HBM of the TPU slice, using FSDP to reduce memory required for the model parameters.

Training Very Large Models (tested to 128B parameters)

When using PyTorch/XLA, tensors must be initialized on the CPU before being moved to the XLA device. This means one may encounter host-side out-of-memory errors if the model is sufficiently large, even though the model can fit in the device HBM after sharding. To avoid this, we must defer each submodule’s initialization until it is FSDP wrapped, which ensures that submodules are sharded as soon as their values are populated, avoiding host-side limitations.
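To show what deferred initialization looks like in isolation, here is a minimal sketch using the torchdistX package installed below; the layer is a stand-in and not part of the Hugging Face changes described next.

Python

# Minimal sketch of deferred initialization with torchdistX. The layer's tensors
# are created without real storage on the host, so no host memory is consumed
# until materialize_module is called -- which, in the FSDP setting, happens only
# after the submodule has been wrapped and sharded.
import torch
from torchdistx import deferred_init

layer = deferred_init.deferred_init(torch.nn.Linear, 4096, 4096)
# ... FSDP wrapping of `layer` would happen here, then:
deferred_init.materialize_module(layer)
print(layer.weight.shape)  # torch.Size([4096, 4096]), now backed by real storage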

Below, we explain how to modify a local copy of the Hugging Face transformers repository to train a GPT-2 model with up to 128B parameters using this technique.

First, using the commands below, install torchdistX, which is a library containing experimental PyTorch Distributed features. This is the engine behind deferred initialization, and allows you to create tensors that don’t require immediate storage and can be materialized later. You also need to install a specific PyTorch/XLA 2.0 version that takes advantage of this package; note that you must uninstall PyTorch and PyTorch/XLA first, if you installed them earlier.

Unset

pip3 install torch==2.0 --index-url https://download.pytorch.org/whl/test/cpu --user

pip3 install torch_xla[torchdistx] -f https://storage.googleapis.com/tpu-pytorch/wheels/tpuvm/experimental/torch_xla-2.0-cp38-cp38-linux_x86_64.whl

Next, apply the following changes to your local copy of Hugging Face Transformers:

In src/transformers/trainer.py, add the following function in _wrap_model on the line immediately prior to PyTorch/XLA FSDP wrapping:

Python

from torchdistx import deferred_init

def _init_with_torchdistX(module):
    def check_fn(k):
        return not isinstance(k, FSDP)
    deferred_init.materialize_module(module, check_fn=check_fn)

The function materialize_module will initialize the model tensors if check_fn returns True. In this case, check_fn checks whether the module has been FSDP wrapped.

Within _wrap_model, modify your FSDP wrapping to accept the additional argument param_init_fn=_init_with_torchdistX:

Python

self.model = model = FSDP(
    model,
    auto_wrap_policy=auto_wrap_policy,
    auto_wrapper_callable=auto_wrapper_callable,
    param_init_fn=_init_with_torchdistX,
    **fsdp_kwargs,
)

In examples/pytorch/language-modeling/run_clm.py, add the following import statement at the beginning of the file:

Python

from torchdistx import deferred_init

Edit the model initialization so that the model is wrapped with deferred_init.deferred_init by replacing the line

Python

model = AutoModelForCausalLM.from_config(config)

with

Python

model = deferred_init.deferred_init(AutoModelForCausalLM.from_config, config)

Note that this assumes you are supplying your own model configuration file. Otherwise, you should modify your model initialization statement accordingly.

You should also comment out these two lines which immediately follow the line above:

Python

n_params = sum({p.data_ptr(): p.numel() for p in model.parameters()}.values())
logger.info(f"Training new model from scratch - Total size={n_params/2**20:.2f}M params")

They will cause an error if left unmodified, since the model tensors do not actually have storage when these lines are executed.

With these changes, you can now run GPT-2 models with as many as 128B parameters, provided the accelerator size is suitably large.

Next Steps & Acknowledgements

To learn more, the docs can be found here. We’d love to hear from you if you run into any issues with FSDP in PyTorch/XLA, or just want to tell us about how you are using it.

We are ecstatic about what’s ahead for PyTorch/XLA and invite the community to join us. PyTorch/XLA is developed fully in open source. So, please file issues, submit pull requests, and send RFCs to GitHub so that we can openly collaborate.

We’d like to thank Ronghang Hu and Ross Girshick at Meta AI and Lysandre Debut, Sourab Mangrulkar, Sylvain Gugger and Arthur Zucker for all the support and collaboration. We’d also like to thank Jiewen Tan, Liyang Lu, Will Cromar, Vaibhav Singh, and Chandra Devarakonda for their assistance in preparing this post.

Cheers!

The PyTorch/XLA Team at Google

Read More

Persistent Systems shapes the future of software engineering with Amazon CodeWhisperer

Amazon CodeWhisperer, the AWS AI coding companion, is a step change in developer productivity tools. Based on generative AI technology, Amazon CodeWhisperer offers contextualized code snippets or recommendations based on natural language prompts to build software quickly, responsibly, and securely. It enables productivity gains and increases accuracy for accelerated digital transformations. Amazon CodeWhisperer ensures enterprises have greater control over AI-generated code, especially the code written by developers who may have a limited understanding of code attribution, quality, and security requirements.

Persistent Systems, a global digital engineering provider, has run several pilots and formal studies with Amazon CodeWhisperer that point to shifts in software engineering, generative AI-led modernization, responsible innovation, and more. This post highlights four themes emerging from Persistent’s Amazon CodeWhisperer experiments that could change software engineering as we know it.

Beyond productivity gains: Reimagining coding with Amazon CodeWhisperer

In this section, we discuss some of the ways that Amazon CodeWhisperer is reimagining coding.

Improving responsible delivery

Ownership, explainability, and transparency of AI-generated code are the most contentious points for the commercial adoption of coding companions such as Amazon CodeWhisperer. Amazon gives developers complete ownership of the code they write using Amazon CodeWhisperer. The Amazon CodeWhisperer team has carefully curated the training data and omitted restrictive licenses, ensuring developers don’t inadvertently use restrictively licensed code when they use Amazon CodeWhisperer. In addition, because recommender pipelines can be strongly influenced by open-source code, if Amazon CodeWhisperer detects a lineage, it flags the license references (for example, MIT or Apache) of the originating open-source project. This enables the developer to attribute code snippets to the source owners, instituting coding best practices. Although Amazon collects data such as code snippets, recommendations, and comments from files open in the integrated development environment, for Amazon CodeWhisperer Professional users, these are not stored or used to train the model. Also, Amazon CodeWhisperer Individual users can opt out of sharing content with AWS, limiting the chances of this being reproduced as recommendations to other users.

Persistent’s approach to generative AI mirrors Richard P. Feynman’s thinking, who said, “I would rather have questions that can’t be answered than answers that can’t be questioned.” Persistent prioritizes responsibility, accountability, and transparency to build client trust. One example of the potential of Amazon CodeWhisperer lies in its ability to reference code, helping clients circumvent legal liabilities that could derail other rewards. For more information about Persistent’s approach to generative AI, refer to Generative AI Services and Solutions.

Moving code security upstream and upfront

Seasoned developers will tell you that security cannot be tested in; it must be built from the ground up. Although some approaches, such as DevSecOps, make it easier for developers, code security experts, and operations teams to embed security testing while the code is written, Amazon CodeWhisperer takes this one step further. It runs security scans on the code directly in the integrated development environment (IDE), allowing a single developer resource to test the code for quality and security. This highly automated, shift-left scenario for security testing enables enterprises to arrest defects upstream and remedy them at a fraction of the cost and time. Especially now, when coding is moving closer to business users with the advent of generative AI, the automated, in-line security scans in Amazon CodeWhisperer will provide less rework, faster time to production, and resilient code.

Persistent helps leading global organizations fortify their business applications with code embedded with security guardrails. It believes security testing has to shift closer to the developer (professional or citizen) and be encoded into applications as they are written. Amazon CodeWhisperer, with its transformative power to fast-track not just coding but secure coding, fits well into the narrative.

Enabling developer skills to undergo a reboot

Most developers must undergo at least 4 months of training before being assigned to projects. In our pilot, Amazon CodeWhisperer condensed the training period to 1 month, with a reduced cognitive load for understanding the context or coding language. We see this bearing on how companies hire developers, evaluating not coding knowledge, which has been largely abstracted, but prompt engineering expertise and the ability to be creative with tools such as Amazon CodeWhisperer.

The parameters for evaluating professional developers will change, and quickly, depending on their ability to tune the input to get the desired answer. This also opens the field for citizen developers or business technologists, bringing coding closer to the business.

Driving implementation closer to strategy

With so many moving parts, businesses and their technology partners will return to the whiteboard together. The engagement model will evolve to factor in these new variables (such as faster coding timelines, secure code, more citizen developers, or domain-oriented developers) unleashed by Amazon CodeWhisperer. Coding will now move closer to the business, automatically incorporating security guardrails and mandatory regulations into software applications as they are written, all at scale. And with verticalized workloads, success will depend on the development team’s domain expertise and the ability to translate code into innovation. This means the implementation of the company’s vision through this code will become even more watertight because it adheres to strategic pillars of security, quality, and speed.

From long shots to offshoots – what the future holds

We extrapolated these themes to map a future where Amazon CodeWhisperer can help realize “delivery moon shots” that, up until now, were aspirational. The future looks something like this:

  • Zero-wastage – Amazon CodeWhisperer, especially with its proactive security scans and reference tracker tool, will ensure the code is of shippable quality, enabling every allied function—from business to developers—to add value and minimize wastage in terms of effort, time to value, or rework. This will bring a singular focus on the core job for each stakeholder, further enforcing a value-first mindset.
  • Zero ramp-up – The ability to support multiple coding languages, factor in developer notes and comments into code suggestions, and offer lines of code on the fly makes Amazon CodeWhisperer the perfect antidote to the cold start problem for developers. As mentioned, developers don’t need a gestation period before being onboarded on a project. This dramatically cuts down the time to value, allowing implementation partners to deploy resources across projects for better monetization dynamically.
  • Zero-shot translation – Amazon CodeWhisperer supports multiple programming languages, such as Python, Java, JavaScript, TypeScript, SQL, and more. It will be able to translate code from one programming language to another, or what is called zero-shot translation ability, where it uses reference code in language A to write code in language B more accurately. This unleashes significant changes in how legacy modernization projects are planned and implemented. With the zero-shot translation ability of Amazon CodeWhisperer, Persistent is confident legacy modernization will become faster and no longer be a moon shot.
  • Zero lifting – Amazon CodeWhisperer is optimized to generate accurate code for other AWS offerings, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB. The accurate code generation makes the lift easy. Because AWS and other major cloud service providers are now pushing forward a multi-cloud narrative, Persistent expects Amazon CodeWhisperer to improve accuracy while recommending code for other solutions offered by AWS peers. This makes the road smoother for multi-cloud or multi-platform settings, eliminating the heavy lifting required while shifting workloads from one service vendor to another—supercharging digital transformation 2.0.

Conclusion

Amazon CodeWhisperer goes beyond improving developer productivity: it democratizes coding and brings it closer to business users while ensuring best practices such as code attribution and enhanced security are never out of the purview.

Persistent is excited about Amazon CodeWhisperer and its potential impact on businesses and partners. It is working to create an Amazon CodeWhisperer-ready developer workforce and alerting its customers about its benefits to drive adoption. Persistent’s strong partnership with AWS makes it the best-fit technology partner to help businesses capitalize on the intrinsic value of Amazon CodeWhisperer.

To learn more about Persistent’s generative AI philosophy that reimagines the way software is engineered today and how Amazon CodeWhisperer aligns with it, refer to Generative AI Services and Solutions.


About the authors

Dr. Pandurang Kamat is Chief Technology Officer, responsible for advanced technology research focused on unlocking business value through innovation at scale. He is a seasoned technology leader who helps customers improve user experience, optimize business processes, and create new digital products. His vision for Persistent is to be an innovation powerhouse that anchors a global and diverse innovation ecosystem, comprising academia and start-ups. He holds a bachelor’s degree in Computer Engineering from Goa University and a Ph.D. in Computer Science from Rutgers University. He is a well-published author with several international research publications, an ACM-India Eminent Speaker, serves on the board of studies at universities, and mentors technology start-ups.

Ankur Desai is a Principal Product Manager within the AWS AI Services team.

Kiran Randhi works for Amazon Web Services as a Principal Partner Solutions Architect in Seattle, Washington. He works closely with AWS Global Strategic SI partners to develop and implement effective cloud strategies that allow them to fully leverage the benefits of cloud technology. Kiran helps CIOs, CTOs, and architects turn their cloud visions into reality by providing architectural guidance and expertise throughout the implementation of strategic cloud solutions. He focuses on AWS security, Migration & Modernization, Data & Analytics, and other technologies to build solutions for different industries in the cloud.

Read More

Announcing Amazon S3 access point support for Amazon SageMaker Data Wrangler

We’re excited to announce Amazon SageMaker Data Wrangler support for Amazon S3 Access Points. With its visual point-and-click interface, SageMaker Data Wrangler simplifies the process of data preparation and feature engineering, including data selection, cleansing, exploration, and visualization, while S3 Access Points simplify data access by providing unique hostnames with specific access policies.

Starting today, SageMaker Data Wrangler is making it easier for users to prepare data from shared datasets stored in Amazon Simple Storage Service (Amazon S3) while enabling organizations to securely control data access in their organization. With S3 Access Points, data administrators can now create application- and team-specific access points to facilitate data sharing, rather than managing complex bucket policies with many different permission rules.

In this post, we walk you through importing data from, and exporting data to, an S3 access point in SageMaker Data Wrangler.

Solution overview

Imagine you, as an administrator, have to manage data for multiple data science teams running their own data preparation workflows in SageMaker Data Wrangler. Administrators often face three challenges:

  • Data science teams need to access their datasets without compromising the security of others
  • Data science teams need access to some datasets with sensitive data, which further complicates managing permissions
  • Security policy only permits data access through specific endpoints to prevent unauthorized access and to reduce the exposure of data

With traditional bucket policies, you would struggle setting up granular access because bucket policies apply the same permissions to all objects within the bucket. Traditional bucket policies also can’t support securing access at the endpoint level.

S3 Access Points solve these problems by providing fine-grained access control, making it easier to manage permissions for different teams without impacting other parts of the bucket. Instead of modifying a single bucket policy, you can create multiple access points with individual policies tailored to specific use cases, reducing the risk of misconfiguration or unintended access to sensitive data. Lastly, you can enforce endpoint policies on access points to define rules that control which VPCs or IP addresses can access the data through a specific access point.

We demonstrate how to use S3 Access Points with SageMaker Data Wrangler with the following steps:

  1. Upload data to an S3 bucket.
  2. Create an S3 access point.
  3. Configure your AWS Identity and Access Management (IAM) role with the necessary policies.
  4. Create a SageMaker Data Wrangler flow.
  5. Export data from SageMaker Data Wrangler to the access point.

For this post, we use the Bank Marketing dataset for our sample data. However, you can use any other dataset you prefer.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Upload data to an S3 bucket

Upload your data to an S3 bucket. For instructions, refer to Uploading objects. For this post, we use the Bank Marketing dataset.

Create an S3 access point

To create an S3 access point, complete the following steps. For more information, refer to Creating access points.

  1. On the Amazon S3 console, choose Access Points in the navigation pane.
  2. Choose Create access point.
  3. For Access point name, enter a name for your access point.
  4. For Bucket, select Choose a bucket in this account.
  5. For Bucket name, enter the name of the bucket you created.
  6. Leave the remaining settings as default and choose Create access point.

On the access point details page, note the Amazon Resource Name (ARN) and access point alias. You use these later when you interact with the access point in SageMaker Data Wrangler.
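If you prefer to script this step rather than use the console, the following minimal boto3 sketch creates an access point and retrieves its ARN and alias; the Region, bucket, and access point names are placeholders for your own values.

    import boto3

    # Placeholders: replace the Region, access point name, and bucket name.
    account_id = boto3.client("sts").get_caller_identity()["Account"]
    s3control = boto3.client("s3control", region_name="us-east-1")

    response = s3control.create_access_point(
        AccountId=account_id,
        Name="s3-dw-accesspoint",
        Bucket="my-data-wrangler-bucket",
    )
    print("Access point ARN:", response["AccessPointArn"])

    # The alias can be read back from the access point details.
    details = s3control.get_access_point(AccountId=account_id, Name="s3-dw-accesspoint")
    print("Access point alias:", details["Alias"])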

Configure your IAM role

If you have a SageMaker Studio domain up and ready, complete the following steps to edit the execution role:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose your domain.
  3. On the Domain settings tab, choose Edit.

By default, the IAM role that you use to access Data Wrangler is SageMakerExecutionRole. We need to add the following two policies to use S3 access points:

  • Policy 1 – This IAM policy grants SageMaker Data Wrangler access to perform PutObject, GetObject, and DeleteObject:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3AccessPointAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:DeleteObject"
                ],
                "Resource": "arn:aws:s3:us-east-1:<<accountID>>:accesspoint/<<s3-dw-accesspoint>>"
            }
        ]
    }

  • Policy 2 – This IAM policy grants SageMaker Data Wrangler access to get the S3 access point:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "GetAccessPoint",
                "Effect": "Allow",
                "Action": "s3:GetAccessPoint",
                "Resource": "arn:aws:s3:us-east-1:<<accountID>>:accesspoint/<<s3-dw-accesspoint>>"
            }
        ]
    }

  1. Create these two policies and attach them to the role.
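As a minimal, scripted alternative, the sketch below creates the first policy and attaches it to the execution role with boto3; the role name, Region, and access point name are placeholders.

    import json
    import boto3

    iam = boto3.client("iam")
    account_id = boto3.client("sts").get_caller_identity()["Account"]

    # Policy 1 from above; the Region and access point name are placeholders.
    policy_doc = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "S3AccessPointAccess",
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:us-east-1:{account_id}:accesspoint/s3-dw-accesspoint",
        }],
    }

    policy = iam.create_policy(
        PolicyName="DataWranglerS3AccessPointAccess",
        PolicyDocument=json.dumps(policy_doc),
    )
    # Attach to the Studio execution role (placeholder name).
    iam.attach_role_policy(
        RoleName="SageMakerExecutionRole",
        PolicyArn=policy["Policy"]["Arn"],
    )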

Using S3 Access Points in SageMaker Data Wrangler

To create a new SageMaker Data Wrangler flow, complete the following steps:

  1. Launch SageMaker Studio.
  2. On the File menu, choose New and Data Wrangler Flow.
  3. Choose Amazon S3 as the data source.
  4. For S3 source, enter the S3 access point using the ARN or alias that you noted down earlier.

For this post, we use the ARN to import data using the S3 access point. However, the ARN only works for S3 access points and SageMaker Studio domains within the same Region.

Alternatively, you can use the alias, as shown in the following screenshot. Unlike ARNs, aliases can be referenced across Regions.

Export data from SageMaker Data Wrangler to S3 access points

After we complete the necessary transformations, we can export the results to the S3 access point. In our case, we simply dropped a column. When you complete whatever transformations you need for your use case, complete the following steps:

  1. In the data flow, choose the plus sign.
  2. Choose Add destination and Amazon S3.
  3. Enter the dataset name and the S3 location, referencing the ARN.

Now you have used S3 access points to import and export data securely and efficiently without having to manage complex bucket policies and navigate multiple folder structures.

Clean up

If you created a new SageMaker domain to follow along, be sure to stop any running apps and delete your domain to stop incurring charges. Also, delete any S3 access points and delete any S3 buckets.

Conclusion

In this post, we introduced the availability of S3 Access Points for SageMaker Data Wrangler and showed you how you can use this feature to simplify data control within SageMaker Studio. We accessed the dataset from, and saved the resulting transformations to, an S3 access point alias across AWS accounts. We hope that you take advantage of this feature to remove any bottlenecks with data access for your SageMaker Studio users, and encourage you to give it a try!


About the authors

Peter Chung is a Solutions Architect serving enterprise customers at AWS. He loves to help customers use technology to solve business problems on various topics like cutting costs and leveraging artificial intelligence. He wrote a book on AWS FinOps, and enjoys reading and building solutions.

Neelam Koshiya is an Enterprise Solution Architect at AWS. Her current focus is to help enterprise customers with their cloud adoption journey for strategic business outcomes. In her spare time, she enjoys reading and being outdoors.

Read More

Language to rewards for robotic skill synthesis

Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. The recent advancements in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents.

While these methods impart new modes of compositional generalization, they focus on using language to link together new behaviors from an existing library of control primitives that are either manually engineered or learned a priori. Despite having internal knowledge about robot motions, LLMs struggle to directly output low-level robot commands due to the limited availability of relevant training data. As a result, the expressiveness of these methods is bottlenecked by the breadth of the available primitives, the design of which often requires extensive expert knowledge or massive data collection.

In “Language to Rewards for Robotic Skill Synthesis”, we propose an approach to enable users to teach robots novel actions through natural language input. To do so, we leverage reward functions as an interface that bridges the gap between language and low-level robot actions. We posit that reward functions provide an ideal interface for such tasks given their richness in semantics, modularity, and interpretability. They also provide a direct connection to low-level policies through black-box optimization or reinforcement learning (RL). We developed a language-to-reward system that leverages LLMs to translate natural language user instructions into reward-specifying code and then applies MuJoCo MPC to find optimal low-level robot actions that maximize the generated reward function. We demonstrate our language-to-reward system on a variety of robotic control tasks in simulation using a quadruped robot and a dexterous manipulator robot. We further validate our method on a physical robot manipulator.

The language-to-reward system consists of two core components: (1) a Reward Translator, and (2) a Motion Controller. The Reward Translator maps natural language instruction from users to reward functions represented as python code. The Motion Controller optimizes the given reward function using receding horizon optimization to find the optimal low-level robot actions, such as the amount of torque that should be applied to each robot motor.

LLMs cannot directly generate low-level robot actions due to a lack of relevant data in their pre-training datasets. We propose to use reward functions to bridge the gap between language and low-level robot actions, and enable novel complex robot motions from natural language instructions.

Reward Translator: Translating user instructions to reward functions

The Reward Translator module was built with the goal of mapping natural language user instructions to reward functions. Reward tuning is highly domain-specific and requires expert knowledge, so it was not surprising to us when we found that LLMs trained on generic language datasets are unable to directly generate a reward function for a specific hardware. To address this, we apply the in-context learning ability of LLMs. Furthermore, we split the Reward Translator into two sub-modules: Motion Descriptor and Reward Coder.

Motion Descriptor

First, we design a Motion Descriptor that interprets input from a user and expands it into a natural language description of the desired robot motion following a predefined template. This Motion Descriptor turns potentially ambiguous or vague user instructions into more specific and descriptive robot motions, making the reward coding task more stable. Moreover, users interact with the system through the motion description field, so this also provides a more interpretable interface for users compared to directly showing the reward function.

To create the Motion Descriptor, we use an LLM to translate the user input into a detailed description of the desired robot motion. We design prompts that guide the LLMs to output the motion description with the right amount of details and format. By translating a vague user instruction into a more detailed description, we are able to more reliably generate the reward function with our system. This idea can also be potentially applied more generally beyond robotics tasks, and is relevant to Inner-Monologue and chain-of-thought prompting.

Reward Coder

In the second stage, we use the same LLM from Motion Descriptor for Reward Coder, which translates generated motion description into the reward function. Reward functions are represented using python code to benefit from the LLMs’ knowledge of reward, coding, and code structure.

Ideally, we would like to use an LLM to directly generate a reward function R(s, t) that maps the robot state s and time t into a scalar reward value. However, generating the correct reward function from scratch is still a challenging problem for LLMs, and correcting the errors requires the user to understand the generated code to provide the right feedback. As such, we pre-define a set of reward terms that are commonly used for the robot of interest and allow LLMs to compose different reward terms to formulate the final reward function. To achieve this, we design a prompt that specifies the reward terms and guides the LLM to generate the correct reward function for the task.
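As a purely illustrative sketch of what a Reward Coder might emit, the snippet below composes a few hypothetical, pre-defined reward terms for the instruction “make the robot dog stand up on its hind legs”; none of these helper names come from the actual system described here.

    # Hypothetical reward terms standing in for the pre-defined terms described
    # above; they are not the actual API used in the paper.
    def height_term(state, target, weight):
        return -weight * (state["torso_height"] - target) ** 2

    def pitch_term(state, target, weight):
        return -weight * (state["torso_pitch"] - target) ** 2

    def foot_contact_term(state, foot, want_contact, weight):
        return weight if state["contacts"][foot] == want_contact else 0.0

    # The kind of composition a Reward Coder might generate for the instruction.
    def compute_reward(state, t):
        r = 0.0
        r += height_term(state, target=0.9, weight=1.0)      # keep the torso high
        r += pitch_term(state, target=1.57, weight=0.5)      # torso pitched ~90 degrees
        for foot in ("front_left", "front_right"):
            r += foot_contact_term(state, foot, want_contact=False, weight=0.3)
        for foot in ("rear_left", "rear_right"):
            r += foot_contact_term(state, foot, want_contact=True, weight=0.3)
        return r

    # Tiny usage example with a made-up robot state.
    state = {"torso_height": 0.85, "torso_pitch": 1.4,
             "contacts": {"front_left": False, "front_right": False,
                          "rear_left": True, "rear_right": True}}
    print(compute_reward(state, t=0.0))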

The internal structure of the Reward Translator, which is tasked to map user inputs to reward functions.

Motion Controller: Translating reward functions to robot actions

The Motion Controller takes the reward function generated by the Reward Translator and synthesizes a controller that maps robot observation to low-level robot actions. To do this, we formulate the controller synthesis problem as a Markov decision process (MDP), which can be solved using different strategies, including RL, offline trajectory optimization, or model predictive control (MPC). Specifically, we use an open-source implementation based on the MuJoCo MPC (MJPC).

MJPC has demonstrated the interactive creation of diverse behaviors, such as legged locomotion, grasping, and finger-gaiting, while supporting multiple planning algorithms, such as iterative linear–quadratic–Gaussian (iLQG) and predictive sampling. More importantly, the frequent re-planning in MJPC empowers its robustness to uncertainties in the system and enables an interactive motion synthesis and correction system when combined with LLMs.

Examples

Robot dog

In the first example, we apply the language-to-reward system to a simulated quadruped robot and teach it to perform various skills. For each skill, the user will provide a concise instruction to the system, which will then synthesize the robot motion by using reward functions as an intermediate interface.

Dexterous manipulator

We then apply the language-to-reward system to a dexterous manipulator robot to perform a variety of manipulation tasks. The dexterous manipulator has 27 degrees of freedom, which is very challenging to control. Many of these tasks require manipulation skills beyond grasping, making it difficult for pre-designed primitives to work. We also include an example where the user can interactively instruct the robot to place an apple inside a drawer.

Validation on real robots

We also validate the language-to-reward method using a real-world manipulation robot to perform tasks such as picking up objects and opening a drawer. To perform the optimization in Motion Controller, we use AprilTag, a fiducial marker system, and F-VLM, an open-vocabulary object detection tool, to identify the position of the table and objects being manipulated.

Conclusion

In this work, we describe a new paradigm for interfacing an LLM with a robot through reward functions, powered by a low-level model predictive control tool, MuJoCo MPC. Using reward functions as the interface enables LLMs to work in a semantic-rich space that plays to the strengths of LLMs, while ensuring the expressiveness of the resulting controller. To further improve the performance of the system, we propose to use a structured motion description template to better extract internal knowledge about robot motions from LLMs. We demonstrate our proposed system on two simulated robot platforms and one real robot for both locomotion and manipulation tasks.

Acknowledgements

We would like to thank our co-authors Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, and Yuval Tassa for their help and support in various aspects of the project. We would also like to acknowledge Ken Caluwaerts, Kristian Hartikainen, Steven Bohez, Carolina Parada, Marc Toussaint, and the greater teams at Google DeepMind for their feedback and contributions.

Read More

Machine learning with decentralized training data using federated learning on Amazon SageMaker

Machine learning (ML) is revolutionizing solutions across industries and driving new forms of insights and intelligence from data. Many ML algorithms train over large datasets, generalizing the patterns they find in the data and inferring results from those patterns as new unseen records are processed. Usually, if the dataset or model is too large to be trained on a single instance, distributed training allows multiple instances within a cluster to be used, distributing either data or model partitions across those instances during the training process. Native support for distributed training is offered through the Amazon SageMaker SDK, along with example notebooks in popular frameworks.

However, sometimes due to security and privacy regulations within or across organizations, the data is decentralized across multiple accounts or in different Regions and it can’t be centralized into one account or across Regions. In this case, federated learning (FL) should be considered to get a generalized model on the whole data.

In this post, we discuss how to implement federated learning on Amazon SageMaker to run ML with decentralized training data.

What is federated learning?

Federated learning is an ML approach that allows for multiple separate training sessions running in parallel to run across large boundaries, for example geographically, and aggregate the results to build a generalized model (global model) in the process. More specifically, each training session uses its own dataset and gets its own local model. Local models in different training sessions will be aggregated (for example, model weight aggregation) into a global model during the training process. This approach stands in contrast to centralized ML techniques where datasets are merged for one training session.

Federated learning vs. distributed training on the cloud

When these two approaches are running on the cloud, distributed training happens in one Region on one account, and training data starts with a centralized training session or job. During distributed training process, the dataset gets split into smaller subsets and, depending on the strategy (data parallelism or model parallelism), subsets are sent to different training nodes or go through nodes in a training cluster, which means individual data doesn’t necessarily stay in one node of the cluster.

In contrast, with federated learning, training usually occurs in multiple separate accounts or across Regions. Each account or Region has its own training instances. The training data is decentralized across accounts or Regions from the beginning to the end, and individual data is only read by its respective training session or job between different accounts or Regions during the federated learning process.

Flower federated learning framework

Several open-source frameworks are available for federated learning, such as FATE, Flower, PySyft, OpenFL, FedML, NVFlare, and TensorFlow Federated. When choosing an FL framework, we usually consider its support for model category, ML framework, and device or operating system. We also need to consider the FL framework’s extensibility and package size so as to run it on the cloud efficiently. In this post, we choose an easily extensible, customizable, and lightweight framework, Flower, to do the FL implementation using SageMaker.

Flower is a comprehensive FL framework that distinguishes itself from existing frameworks by offering new facilities to run large-scale FL experiments, and enables richly heterogeneous FL device scenarios. FL solves challenges related to data privacy and scalability in scenarios where sharing data is not possible.

Design principles and implementation of Flower FL

Flower FL is language-agnostic and ML framework-agnostic by design, is fully extensible, and can incorporate emerging algorithms, training strategies, and communication protocols. Flower is open-sourced under Apache 2.0 License.

The conceptual architecture of the FL implementation is described in the paper Flower: A friendly Federated Learning Framework and is highlighted in the following figure.

In this architecture, edge clients live on real edge devices and communicate with the server over RPC. Virtual clients, on the other hand, consume close to zero resources when inactive and only load model and data into memory when the client is being selected for training or evaluation.

The Flower server builds the strategy and configurations to be sent to the Flower clients. It serializes these configuration dictionaries (or config dict for short) to their ProtoBuf representation, transports them to the client using gRPC, and then deserializes them back to Python dictionaries.

Flower FL strategies

Flower allows customization of the learning process through the strategy abstraction. The strategy defines the entire federation process specifying parameter initialization (whether it’s server or client initialized), the minimum number of clients available required to initialize a run, the weight of the client’s contributions, and training and evaluation details.

Flower has an extensive implementation of FL averaging algorithms and a robust communication stack. For a list of averaging algorithms implemented and associated research papers, refer to the following table, from Flower: A friendly Federated Learning Framework.
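For reference, a minimal server-side sketch with Flower looks roughly like the following; the address, round count, and client thresholds are placeholders, and the API shown follows Flower 1.x, so it may differ in other versions.

    import flwr as fl

    # FedAvg strategy: how client updates are aggregated each round.
    strategy = fl.server.strategy.FedAvg(
        fraction_fit=1.0,          # sample all available clients for training
        min_fit_clients=2,         # wait for at least 2 clients per round
        min_available_clients=2,   # minimum clients required to start
    )

    # Start the Flower server; clients connect over gRPC (here via VPC peering).
    fl.server.start_server(
        server_address="0.0.0.0:8080",
        config=fl.server.ServerConfig(num_rounds=3),
        strategy=strategy,
    )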

Federated learning with SageMaker: Solution architecture

A federated learning architecture using SageMaker with the Flower framework is implemented on top of bi-directional gRPC (foundation) streams. gRPC defines the types of messages exchanged and uses compilers to then generate efficient implementation for Python, but it can also generate the implementation for other languages, such as Java or C++.

The Flower clients receive instructions (messages) as raw byte arrays via the network. Then the clients deserialize and run the instruction (training on local data). The results (model parameters and weights) are then serialized and communicated back to the server.

The server/client architecture for Flower FL is set up in SageMaker using notebook instances in different accounts within the same Region: one notebook instance acts as the Flower server and the other as the Flower client. The training and evaluation strategies, as well as the global parameters, are defined on the server; the configuration is then serialized and sent to the client over VPC peering.

The client notebook instance starts a SageMaker training job that runs a custom script to instantiate the Flower client, which deserializes and reads the server configuration, runs local training, and sends the parameter response back to the server.

The last step occurs on the server, where evaluation of the newly aggregated parameters is triggered after the number of rounds and clients stipulated in the server strategy is reached. The evaluation takes place on a testing dataset that exists only on the server, and new, improved accuracy metrics are produced.

The following diagram illustrates the architecture of the FL setup on SageMaker with the Flower package.


Implement federated learning using SageMaker

SageMaker is a fully managed ML service. With SageMaker, data scientists and developers can quickly build and train ML models, and then deploy them into a production-ready hosted environment.

In this post, we demonstrate how to use this managed ML platform to provide a notebook environment and perform federated learning across AWS accounts using SageMaker training jobs. The raw training data never leaves the account that owns the data, and only the derived weights are sent across the peered connection.

We highlight the following core components in this post:

  • Networking – SageMaker allows for quick setup of default networking configuration while also allowing you to fully customize the networking depending on your organization’s requirements. We use a VPC peering configuration within the Region in this example.
  • Cross-account access settings – In order to allow a user in the server account to start a model training job in the client account, we delegate access across accounts using AWS Identity and Access Management (IAM) roles. This way, a user in the server account doesn’t have to sign out of the account and sign in to the client account to perform actions on SageMaker. This setting is only for the purpose of starting SageMaker training jobs, and it doesn’t grant any cross-account data access or sharing.
  • Implementing federated learning client code in the client account and server code in the server account – We implement federated learning client code in the client account by using the Flower package and SageMaker managed training. Meanwhile, we implement server code in the server account by using the Flower package.

Set up VPC peering

A VPC peering connection is a networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses. Instances in either VPC can communicate with each other as if they are within the same network.

To set up a VPC peering connection, first create a request to peer with another VPC. You can request a VPC peering connection with another VPC in the same account, or in our use case, connect with a VPC in a different AWS account. To activate the request, the owner of the VPC must accept the request. For more details about VPC peering, refer to Create a VPC peering connection.
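
If you prefer to script this step, the following is a minimal sketch using the Boto3 EC2 client. The VPC IDs, route table ID, CIDR block, and Region are placeholders to replace with your own values, and the requester and accepter calls must run with credentials for their respective accounts:

import boto3

# In the server (requester) account: request peering with the client account's VPC
ec2_server = boto3.client('ec2')
peering = ec2_server.create_vpc_peering_connection(
    VpcId = '<server-account-vpc-id>',
    PeerVpcId = '<client-account-vpc-id>',
    PeerOwnerId = '<client-account-number>',
    PeerRegion = '<region>',
)
peering_id = peering['VpcPeeringConnection']['VpcPeeringConnectionId']

# In the client (accepter) account: accept the peering request
ec2_client_acct = boto3.client('ec2')
ec2_client_acct.accept_vpc_peering_connection(VpcPeeringConnectionId = peering_id)

# In each account: route traffic destined for the peer VPC's CIDR block over the peering connection
ec2_server.create_route(
    RouteTableId = '<server-account-private-route-table-id>',
    DestinationCidrBlock = '<client-account-vpc-cidr>',
    VpcPeeringConnectionId = peering_id,
)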

Launch SageMaker notebook instances in VPCs

A SageMaker notebook instance provides a Jupyter notebook app through a fully managed ML Amazon Elastic Compute Cloud (Amazon EC2) instance. SageMaker Jupyter notebooks are used to perform advanced data exploration, create training jobs, deploy models to SageMaker hosting, and test or validate your models.

The notebook instance has a variety of networking configurations available to it. In this setup, the notebook instance runs within a private subnet of the VPC and doesn’t have direct internet access.
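
As a reference, the following sketch creates such a notebook instance with the Boto3 SageMaker client. The instance name is hypothetical, and the role, subnet, and security group placeholders should point to resources in the client account:

import boto3

sagemaker_client = boto3.client('sagemaker')
sagemaker_client.create_notebook_instance(
    NotebookInstanceName = 'fl-client-notebook',  # hypothetical name
    InstanceType = 'ml.t3.medium',
    RoleArn = 'arn:aws:iam::<client-account-number>:role/service-role/AmazonSageMaker-ExecutionRole-<xxxxxxxxxxxxxxx>',
    SubnetId = '<client-account-notebook-instance-subnet>',
    SecurityGroupIds = ['<client-account-notebook-instance-security-group>'],
    DirectInternetAccess = 'Disabled',  # keep the instance in the private subnet without direct internet access
)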

Configure cross-account access settings

Cross-account access settings include two steps to delegate access from the server account to the client account by using IAM roles:

  1. Create an IAM role in the client account.
  2. Grant access to the role in the server account.

For detailed steps to set up a similar scenario, refer to Delegate access across AWS accounts using IAM roles.

In the client account, we create an IAM role called FL-kickoff-client-job with the policy FL-sagemaker-actions attached to the role. The FL-sagemaker-actions policy has JSON content as follows:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateTrainingJob",
                "sagemaker:DescribeTrainingJob",
                "sagemaker:StopTrainingJob",
                "sagemaker:UpdateTrainingJob"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs",
                "ec2:DescribeNetworkInterfaces"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::<client-account-number>:role/service-role/AmazonSageMaker-ExecutionRole-<xxxxxxxxxxxxxxx>"
        }
    ]
}

We then modify the trust policy in the trust relationships of the FL-kickoff-client-job role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<server-account-number>:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {}
        }
    ]
}

In the server account, permissions are added to an existing user (for example, developer) to allow switching to the FL-kickoff-client-job role in the client account. To do this, we create an inline policy called FL-allow-kickoff-client-job and attach it to the user. The following is the policy JSON content:

{
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Action": "sts:AssumeRole",
        "Resource": "arn:aws:iam::<client-account-number>:role/FL-kickoff-client-job"
    }
}
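
The role and policies can also be created programmatically. The following is a minimal sketch using the Boto3 IAM client; it assumes the three JSON documents above are loaded into the Python dictionaries trust_policy, fl_sagemaker_actions_policy, and fl_allow_kickoff_policy (hypothetical variable names), and that the first two calls run in the client account while the last runs in the server account:

import json
import boto3

# In the client account: create the role with the trust policy, then attach the actions policy inline
iam_client_acct = boto3.client('iam')
iam_client_acct.create_role(
    RoleName = 'FL-kickoff-client-job',
    AssumeRolePolicyDocument = json.dumps(trust_policy),
)
iam_client_acct.put_role_policy(
    RoleName = 'FL-kickoff-client-job',
    PolicyName = 'FL-sagemaker-actions',
    PolicyDocument = json.dumps(fl_sagemaker_actions_policy),
)

# In the server account: attach the inline policy to the developer user
iam_server_acct = boto3.client('iam')
iam_server_acct.put_user_policy(
    UserName = 'developer',
    PolicyName = 'FL-allow-kickoff-client-job',
    PolicyDocument = json.dumps(fl_allow_kickoff_policy),
)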

Sample dataset and data preparation

In this post, we use a curated dataset for fraud detection in Medicare providers’ data released by the Centers for Medicare & Medicaid Services (CMS). Data is split into a training dataset and a testing dataset. Because the majority of the data is non-fraud, we apply the Synthetic Minority Oversampling Technique (SMOTE) to balance the training dataset, and further split the training dataset into training and validation parts. Both the training and validation data are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket for model training in the client account, and the testing dataset is used in the server account for testing purposes only. Details of the data preparation code are in the data preparation notebook accompanying this post.
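
The following is a condensed sketch of those steps. The input file name and the is_fraud label column are assumptions for illustration, while the bucket placeholder and data_prep/ prefix match the training job input used later in this post:

import boto3
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split

# Hypothetical file and label column names
df = pd.read_csv('cms_medicare_providers.csv')
X, y = df.drop(columns=['is_fraud']), df['is_fraud']

# Hold out a testing dataset for use in the server account
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Balance the remaining data with SMOTE, then split it into training and validation parts
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_rest, y_rest)
X_train, X_val, y_train, y_val = train_test_split(X_bal, y_bal, test_size=0.2, random_state=42)

# Upload training and validation data to Amazon S3 in the client account
pd.concat([X_train, y_train], axis=1).to_csv('train.csv', index=False)
pd.concat([X_val, y_val], axis=1).to_csv('validation.csv', index=False)
s3 = boto3.client('s3')
s3.upload_file('train.csv', '<client-account-s3-data-bucket>', 'data_prep/train.csv')
s3.upload_file('validation.csv', '<client-account-s3-data-bucket>', 'data_prep/validation.csv')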

With the SageMaker pre-built Docker images for the scikit-learn framework and SageMaker managed training process, we train a logistic regression model on this dataset using federated learning.

Implement a federated learning client in the client account

In the client account’s SageMaker notebook instance, we prepare a client.py script and a utils.py script. The client.py file contains code for the client, and the utils.py file contains code for some of the utility functions that will be needed for our training. We use the scikit-learn package to build the logistic regression model.

In client.py, we define a Flower client. The client is derived from the class fl.client.NumPyClient. It needs to define the following three methods:

  • get_parameters – It returns the current local model parameters. The utility function get_model_parameters will do this.
  • fit – It defines the steps to train the model on the training data in the client’s account. It also receives global model parameters and other configuration information from the server. We update the local model’s parameters using the received global parameters and continue training on the dataset in the client account. This method also sends the local model’s parameters after training, the size of the training set, and a dictionary communicating arbitrary values back to the server.
  • evaluate – It evaluates the provided parameters using the validation data in the client account. It returns the loss together with other details such as the size of the validation set and accuracy back to the server.

The following is a code snippet for the Flower client definition:

"""Client interface"""
class FlowerClient(fl.client.NumPyClient):
    def get_parameters(self, config):  
        return utils.get_model_parameters(model)

    def fit(self, parameters, config): 
        utils.set_model_params(model, parameters)
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            model.fit(X_train, y_train)
        return utils.get_model_parameters(model), len(X_train), {}

    def evaluate(self, parameters, config):
        utils.set_model_params(model, parameters)
        loss = log_loss(y_test, model.predict_proba(X_test))
        accuracy = model.score(X_test, y_test)
        return loss, len(X_test),  {"accuracy": accuracy}

We then use SageMaker script mode to prepare the rest of the client.py file. This includes defining parameters that will be passed to SageMaker training, loading training and validation data, initializing and training the model on the client, setting up the Flower client to communicate with the server, and finally saving the trained model.
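
The following is a minimal sketch of that entry point; it continues client.py below the FlowerClient class, with the imports repeated here for completeness. The load_data signature and the warm_start setting are assumptions, the hyperparameter names match those passed to the training job later in this post, and SM_MODEL_DIR and SM_CHANNEL_TRAIN are standard SageMaker script mode environment variables:

import argparse
import os

import flwr as fl
from sklearn.linear_model import LogisticRegression

import utils

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Hyperparameters passed by the SageMaker training job
    parser.add_argument("--penalty", type=str, default="l2")
    parser.add_argument("--max-iter", type=int, default=10)
    parser.add_argument("--server-address", type=str, required=True)
    # Standard SageMaker script mode locations
    parser.add_argument("--model-dir", type=str, default=os.environ.get("SM_MODEL_DIR"))
    parser.add_argument("--train", type=str, default=os.environ.get("SM_CHANNEL_TRAIN"))
    args = parser.parse_args()

    # Load the training and validation data staged by the training job's input channel
    (X_train, y_train), (X_test, y_test) = utils.load_data(args.train)

    # Initialize the local model; warm_start keeps the learned weights between federation rounds
    model = LogisticRegression(penalty=args.penalty, max_iter=args.max_iter, warm_start=True)
    utils.set_initial_params(model)

    # Connect to the Flower server in the server account over the peered VPC connection
    fl.client.start_numpy_client(server_address=args.server_address, client=FlowerClient())

    # Save the trained model so SageMaker uploads it as a model artifact
    utils.save_model(args.model_dir, model)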

utils.py includes a few utility functions that are called in client.py (a minimal sketch follows this list):

  • get_model_parameters – It returns the scikit-learn LogisticRegression model parameters.
  • set_model_params – It sets the model’s parameters.
  • set_initial_params – It initializes the parameters of the model as zeros. This is required because the server asks for initial model parameters from the client at launch. However, in the scikit-learn framework, LogisticRegression model parameters are not initialized until model.fit() is called.
  • load_data – It loads the training and testing data.
  • save_model – It saves the model as a .joblib file.
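
The following sketch of utils.py matches those descriptions. The parameter layout (coefficients plus intercept), the binary-classification shapes, the feature count, and the file and column names are assumptions for illustration:

import os

import joblib
import numpy as np
import pandas as pd

def get_model_parameters(model):
    # Return the LogisticRegression weights as a list of NumPy arrays
    if model.fit_intercept:
        return [model.coef_, model.intercept_]
    return [model.coef_]

def set_model_params(model, params):
    # Overwrite the local model's weights with the values received from the server
    model.coef_ = params[0]
    if model.fit_intercept:
        model.intercept_ = params[1]
    return model

def set_initial_params(model):
    # scikit-learn only creates coef_ and intercept_ after fit() is called, so seed them with zeros
    n_features = 30  # placeholder; set to the number of features in the prepared dataset
    model.classes_ = np.array([0, 1])  # non-fraud vs. fraud
    model.coef_ = np.zeros((1, n_features))  # binary classification uses a single row of coefficients
    if model.fit_intercept:
        model.intercept_ = np.zeros((1,))

def load_data(data_dir):
    # Load the CSV files staged in the training channel (file and column names assumed)
    train = pd.read_csv(os.path.join(data_dir, "train.csv"))
    val = pd.read_csv(os.path.join(data_dir, "validation.csv"))
    X_train, y_train = train.drop(columns=["is_fraud"]), train["is_fraud"]
    X_val, y_val = val.drop(columns=["is_fraud"]), val["is_fraud"]
    return (X_train, y_train), (X_val, y_val)

def save_model(model_dir, model):
    # Save the model so SageMaker packages it into model.tar.gz
    joblib.dump(model, os.path.join(model_dir, "model.joblib"))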

Because Flower is not installed in the SageMaker pre-built scikit-learn Docker container, we list flwr==1.3.0 in a requirements.txt file.

We put all three files (client.py, utils.py, and requirements.txt) under a folder and compress it into a tarball. The .tar.gz file (named source.tar.gz in this post) is then uploaded to an S3 bucket in the client account.
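
The following is a short sketch of the packaging step, with the bucket placeholder and key matching the sagemaker_submit_directory used later in this post:

import tarfile
import boto3

# Package client.py, utils.py, and requirements.txt into source.tar.gz
with tarfile.open('source.tar.gz', 'w:gz') as tar:
    for name in ['client.py', 'utils.py', 'requirements.txt']:
        tar.add(name)

# Upload the archive to the S3 code bucket in the client account
boto3.client('s3').upload_file(
    'source.tar.gz', '<client-account-s3-code-bucket>', 'client_code/source.tar.gz'
)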

Implement a federated learning server in the server account

In the server account, we prepare code on a Jupyter notebook. This includes two parts: the server first assumes a role to start a training job in the client account, then the server federates the model using Flower.

Assume a role to run the training job in the client account

We use the Boto3 Python SDK to set up an AWS Security Token Service (AWS STS) client to assume the FL-kickoff-client-job role and set up a SageMaker client so as to run a training job in the client account by using the SageMaker managed training process:

sts_client = boto3.client('sts')
assumed_role_object = sts_client.assume_role(
    RoleArn = "arn:aws:iam::<client-account-number>:role/FL-kickoff-client-job",
    RoleSessionName = "AssumeRoleSession1"
)

credentials = assumed_role_object['Credentials']

sagemaker_client = boto3.client(
    'sagemaker',
    aws_access_key_id = credentials['AccessKeyId'],
    aws_secret_access_key = credentials['SecretAccessKey'],
    aws_session_token = credentials['SessionToken'],
)

Using the assumed role, we create a SageMaker training job in the client account. The training job uses the SageMaker built-in scikit-learn framework. Note that all S3 buckets and the SageMaker IAM role in the following code snippet are related to the client account:

sagemaker_client.create_training_job(
    TrainingJobName = training_job_name,
    HyperParameters = {
        "penalty": "l2",
        "max-iter": "10",
        "server-address":"<server-ip-address>:8080",
        "sagemaker_program": "client.py",
        "sagemaker_submit_directory": "s3://<client-account-s3-code-bucket>/client_code/source.tar.gz",
    },
    AlgorithmSpecification = {
        "TrainingImage": training_image,
        "TrainingInputMode": "File",
    },
    RoleArn = "arn:aws:iam::<client-account-number>:role/service-role/AmazonSageMaker-ExecutionRole-<xxxxxxxxxxxxxxx>",
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://<client-account-s3-data-bucket>/data_prep/",
                    "S3DataDistributionType": "FullyReplicated",
                }
            },
        },
    ],
    OutputDataConfig = {
        "S3OutputPath": "s3://<client-account-s3-bucket-for-model-artifact>/client_artifact/"
    },
    ResourceConfig = {
        "InstanceType": "ml.m5.xlarge", 
        "InstanceCount": 1, 
        "VolumeSizeInGB": 10,
    },
    VpcConfig={
        'SecurityGroupIds': [
            "<client-account-notebook-instance-security-group>",
        ],
        'Subnets': [
            "<client-account-notebook-instance-sunbet>",
        ]
    },
    StoppingCondition = {
        "MaxRuntimeInSeconds": 86400
    },
)

Aggregate local models into a global model using Flower

We prepare code to federate the model on the server. This includes defining the strategy for federation and its initialization parameters. We use utility functions in the utils.py script described earlier to initialize and set model parameters. Flower allows you to define your own callback functions to customize an existing strategy. We use the FedAvg strategy with custom callbacks for evaluation and fit configuration. See the following code:

    """Initialize the model and federation strategy, then start the server"""
    model = LogisticRegression()
    utils.set_initial_params(model)
    
    strategy = fl.server.strategy.FedAvg(
        min_available_clients = 1,  # Minimum number of clients that need to be connected to the server before a training round can start
        min_fit_clients = 1,  # Minimum number of clients to be sampled for the next round
        min_evaluate_clients = 1,
        evaluate_fn = get_evaluate_fn(model, X_test, y_test),
        on_fit_config_fn = fit_round,
    )
    
    fl.server.start_server(
        server_address = args.server_address, 
        strategy = strategy, 
        config = fl.server.ServerConfig(num_rounds=3)  # run for 3 rounds
    )
    
    utils.save_model(args.model_dir, model)

The following two functions are mentioned in the preceding code snippet:

  • fit_round – It sends the current round number to the client. We pass this callback as the on_fit_config_fn parameter of the strategy simply to demonstrate its use.
  • get_evaluate_fn – It’s used for model evaluation on the server (see the sketch after this list).
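
The following is a minimal sketch of both callbacks, assuming the same utils.py helpers described earlier are available on the server; the server_round config key is an illustrative choice:

from sklearn.metrics import log_loss

import utils

def fit_round(server_round):
    # Send the current round number to the client through the config dict
    return {"server_round": server_round}

def get_evaluate_fn(model, X_test, y_test):
    # Build a server-side evaluation function that scores the aggregated global model
    # on the testing dataset that exists only in the server account
    def evaluate(server_round, parameters, config):
        utils.set_model_params(model, parameters)
        loss = log_loss(y_test, model.predict_proba(X_test))
        accuracy = model.score(X_test, y_test)
        return loss, {"accuracy": accuracy}
    return evaluate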

For demo purposes, we use the testing dataset that we set aside in data preparation to evaluate the model federated from the client’s account and communicate the result back to the client. However, it’s worth noting that in almost all real use cases, the data used in the server account is not split from the dataset used in the client account.

After the federated learning process is finished, a model.tar.gz file is saved by SageMaker as a model artifact in an S3 bucket in the client account. Meanwhile, a model.joblib file is saved on the SageMaker notebook instance in the server account. Lastly, we use the testing dataset to test the final model (model.joblib) on the server; the testing output reports the accuracy of the federated model on the held-out testing data.


Clean up

After you are done, clean up the resources in both the server account and client account to avoid additional charges:

  1. Stop the SageMaker notebook instances.
  2. Delete VPC peering connections and corresponding VPCs.
  3. Empty and delete the S3 bucket you created for data storage.

Conclusion

In this post, we walked through how to implement federated learning on SageMaker by using the Flower package. We showed how to configure VPC peering, set up cross-account access, and implement the FL client and server. This post is useful for those who need to train ML models on SageMaker using decentralized data across accounts with restricted data sharing. Because the FL in this post is implemented using SageMaker, many more SageMaker features can be brought into the process.

Implementing federated learning on SageMaker can take advantage of all the advanced features that SageMaker provides through the ML lifecycle. There are other ways to achieve or apply federated learning on the AWS Cloud, such as using EC2 instances or on the edge. For details about these alternative approaches, refer to Federated Learning on AWS with FedML and Applying Federated Learning for ML at the Edge.


About the authors

Sherry Ding is a senior AI/ML specialist solutions architect at Amazon Web Services (AWS). She has extensive experience in machine learning with a PhD degree in computer science. She mainly works with public sector customers on various AI/ML-related business challenges, helping them accelerate their machine learning journey on the AWS Cloud. When not helping customers, she enjoys outdoor activities.

Lorea Arrizabalaga is a Solutions Architect aligned to the UK Public Sector, where she helps customers design ML solutions with Amazon SageMaker. She is also part of the Technical Field Community dedicated to hardware acceleration and helps with testing and benchmarking AWS Inferentia and AWS Trainium workloads.

Ben Snively is an AWS Public Sector Senior Principal Specialist Solutions Architect. He works with government, non-profit, and education customers on big data, analytical, and AI/ML projects, helping them build solutions using AWS.

Read More

Coming This Fall: NVIDIA DLSS 3.5 for Chaos Vantage, D5 Render, Omniverse and Popular Game Titles

Coming This Fall: NVIDIA DLSS 3.5 for Chaos Vantage, D5 Render, Omniverse and Popular Game Titles

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

Gamescom, the biggest gaming event of the year, kicks off tomorrow in Cologne, Germany, but gamers and content creators can find some of the latest innovations, tools and AI-powered tech this week In the NVIDIA Studio.

On the eve of the show’s official opening, NVIDIA announced NVIDIA DLSS 3.5 featuring Ray Reconstruction — a new neural rendering AI model that creates more beautiful and realistic ray-traced visuals than traditional rendering methods — for real-time 3D creative apps and games.

NVIDIA RTX Remix, a free modding platform built on NVIDIA Omniverse and available now, gives people the tools to create and share #RTXON mods for classic games. We also announced Half-Life 2 RTX: An RTX Remix Project, a community remaster project of Valve’s Half-Life 2, one of the highest-rated games of all time.

This week’s In the NVIDIA Studio installment also features digital artist Diyor Makhmudov’s 3D work, inspired by the extraordinary gaming franchise The Witcher.

Reallusion software released an update to the iClone Omniverse Connector, including real-time synchronization of projects and enhanced import functionality for OpenUSD, enabling quicker, more efficient workflows. Learn more in the latest edition of the Into the Omniverse series.

Finally, calling all video editors to sign up for the premiere DaVinci Resolve event — ResolveCon — in Portland, Oregon, from Aug. 25-27. In-person attendees can win giveaways, including new GeForce RTX GPUs, while virtual attendees can view tutorials livestreamed by In the NVIDIA Studio artist Casey Faris.

Next-Level Graphical Fidelity With DLSS 3.5 

NVIDIA DLSS 3.5 adds Ray Reconstruction, which improves ray-traced image quality for all GeForce RTX GPUs by replacing hand-tuned denoisers with an NVIDIA supercomputer-trained AI network that generates higher-quality pixels in between sampled rays.

Seeing is believing — watch the Tech Talk with NVIDIA Vice President of Applied Deep Learning Research Bryan Catanzaro to learn how DLSS 3.5 works.

Creative apps with ray-traced renderers face a wide variety of content that is difficult for traditional denoisers to handle, as they require hand-tuning for every scene. As a result, content previews return suboptimal image quality. With DLSS 3.5, the AI neural network recognizes a wide variety of scenes, producing high-quality images during preview and before committing hours to a final render.

D5 Render and Chaos Vantage, two popular professional-grade 3D apps for architects and designers, feature real-time preview modes with ray tracing. With DLSS 3.5, the AI neural network replaces the denoisers, inferring and producing higher-quality previews while building and iterating.

DLSS 3.5 will improve image quality in D5 Render.

Popular creative apps Chaos Vantage, D5 Render and NVIDIA Omniverse, as well as popular gaming titles Alan Wake 2, Cyberpunk 2077, Cyberpunk 2077: Phantom Liberty and Portal with RTX, are all adding support for NVIDIA DLSS 3.5 this fall.

Developers will be able to seamlessly integrate DLSS 3.5 with the new Streamline SDK coming soon. Learn more about DLSS 3.5.

Half-Life 2 RTX: An RTX Remix Project

Half-Life 2 RTX: An RTX Remix Project is being developed by four of Half-Life 2’s top mod teams, now known as Orbifold Studios.

Half-Life 2 RTX: An RTX Remix Project.

Using the latest version of RTX Remix, Orbifold Studios is rebuilding materials with physically based rendering properties, adding extra geometric detail with Valve’s Hammer editor and using the full range of NVIDIA technologies, including NVIDIA DLSS, NVIDIA RTX IO and NVIDIA Reflex, to breathe new life into the critically acclaimed title.

As with Portal with RTX, a high-fidelity reimagining of Valve’s timeless classic, and Portal: Prelude RTX, built by community modders, nearly every asset in Half-Life 2 RTX: An RTX Remix Project is being reconstructed in high fidelity and with full ray tracing (otherwise known as path tracing), enabling advanced rendering techniques. Compared to the original, some assets feature 20x the geometric detail.

Half-Life 2 RTX: An RTX Remix Project is early in development and is a community effort looking to galvanize talented modders and artists everywhere. To join the project, apply via the Orbifold Studios website.

Eat. Sleep. Game. Create. Repeat.

Already building 3D scenes in immaculate detail, 19-year-old digital creator and 3D lighting artist Diyorbek Makhmudov has the savvy skills of an industry veteran and a bright future ahead.

‘Tavern’ inspired by The Witcher 3.

Makhmudov has always had a deep-rooted passion for gaming, gaining inspiration from the 3D worlds of his favorite games. Most notably, The Witcher franchise has fueled him to create 3D worlds that showcase his own signature look and feel.

‘Calm’ inspired by The Witcher 3.

Unlike many content creators featured In the NVIDIA Studio, Makhmudov doesn’t like to bring his own life experience into the creative process, enjoying the escapism offered by world-building.

“I like to immerse myself in another universe,” said Makhmudov. “I don’t like to express my feelings, thoughts or emotions in my creations.”

‘Tavern’ inspired by The Witcher 3.

Makhmudov follows standard 3D creative workflow practices: gathering reference material, prepping materials, shaping environments and tinkering with materials, textures and colors. But it’s in 3D creation where he really shines.

 

Makhmudov uses his preferred 3D app — Cinema 4D — to achieve smooth interactivity while working with complex 3D models thanks to the NVIDIA GPU-accelerated viewport. It’s powered by a GeForce RTX 3090 graphics card, which offers considerable increases in efficiency while fueling creativity.

 

In the video above, Makhmudov is able to move within the scene, tinkering while the scene renders in real time.

 

Cinema 4D also supports several popular GPU-accelerated renderers such as Chaos V-Ray, OTOY OctaneRender and Maxon’s Redshift. This flexibility allows Makhmudov to use whichever best suits his needs.

Too many great rendering choices.

“Redshift is fast, has a good light-linking system and I have almost full control over everything,” said Makhmudov. He prefers OctaneRender for exporting ultra-realistic renders quickly. The built-in Cinema 4D render is also a speedy option. The only thing he can’t do is work on a CPU alone because, to quote Makhmudov, “It’s very slow.”

Perfectly lit.

 

“When you do personal work, it pushes you more to achieve a good result,” said Makhmudov. “As a bonus, that big portfolio will be a major advantage in your job search.”

3D artist Diyor Makhmudov.

Check out Makhmudov on ArtStation.

Follow NVIDIA Studio on Instagram, Twitter and Facebook. Access tutorials on the Studio YouTube channel and get updates directly in your inbox by subscribing to the Studio newsletter. 

Get started with NVIDIA Omniverse by downloading the standard license free, or learn how Omniverse Enterprise can connect your team. Developers can get started with Omniverse resources. Stay up to date on the platform by subscribing to the newsletter, and follow NVIDIA Omniverse on Instagram, Medium and Twitter.

For more, join the Omniverse community and check out the Omniverse forums, Discord server, Twitch and YouTube channels.

Read More