Upscale images with Stable Diffusion in Amazon SageMaker JumpStart

In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart. Today, we announce a new feature that lets you upscale images (resize images without losing quality) with Stable Diffusion models in JumpStart. An image that is low resolution, blurry, and pixelated can be converted into a high-resolution image that appears smoother, clearer, and more detailed. This process, called upscaling, can be applied to both real images and images generated by text-to-image Stable Diffusion models. This can be used to enhance image quality in various industries such as ecommerce and real estate, as well as for artists and photographers. Additionally, upscaling can improve the visual quality of low-resolution images when displayed on high-resolution screens.

Stable Diffusion uses an AI algorithm to upscale images, eliminating the need for manual work that may require manually filling gaps in an image. It has been trained on millions of images and can accurately predict high-resolution images, resulting in a significant increase in detail compared to traditional image upscalers. Additionally, unlike non-deep-learning techniques such as nearest neighbor, Stable Diffusion takes into account the context of the image, using a textual prompt to guide the upscaling process.

In this post, we provide an overview of how to deploy and run inference with the Stable Diffusion upscaler model in two ways: via JumpStart’s user interface (UI) in Amazon SageMaker Studio, and programmatically through JumpStart APIs available in the SageMaker Python SDK.

Solution overview

The following images show examples of upscaling performed by the model. On the left is the original low-resolution image enlarged to match the size of the image generated by the model. On the right is the image generated by the model.

The first generated image is the result of low resolution cat image and the prompt “a white cat.”

The second generated image is the result of low resolution butterfly image and the prompt “a butterfly on a green leaf.”

Running large models like Stable Diffusion requires custom inference scripts. You have to run end-to-end tests to make sure that the script, the model, and the desired instance work together efficiently. JumpStart simplifies this process by providing ready-to-use scripts that have been robustly tested. You can access these scripts with one click through the Studio UI or with very few lines of code through the JumpStart APIs.

The following sections provide an overview of how to deploy the model and run inference using either the Studio UI or the JumpStart APIs.

Note that by using this model, you agree to the CreativeML Open RAIL++-M License.

Access JumpStart through the Studio UI

In this section, we demonstrate how to train and deploy JumpStart models through the Studio UI. The following video shows how to find the pre-trained Stable Diffusion upscaler model on JumpStart and deploy it. The model page contains valuable information about the model and how to use it. For inference, we use the ml.p3.2xlarge instance type because it provides the GPU acceleration needed for low-inference latency at a low price point. After you configure the SageMaker hosting instance, choose Deploy. It will take 5–10 minutes until the endpoint is up and running and ready to respond to inference requests.

Video: stable diffusion upscaling.mov

To accelerate the time to inference, JumpStart provides a sample notebook that shows how to run inference on the newly created endpoint. To access the notebook in Studio, choose Open Notebook in the Use Endpoint from Studio section of the model endpoint page.

Use JumpStart programmatically with the SageMaker SDK

You can use the JumpStart UI to deploy a pre-trained model interactively in just a few clicks. However, you can also use JumpStart models programmatically by using APIs that are integrated into the SageMaker Python SDK.

In this section, we choose an appropriate pre-trained model in JumpStart, deploy this model to a SageMaker endpoint, and run inference on the deployed endpoint, all using the SageMaker Python SDK. The following examples contain code snippets. For the full code with all of the steps in this demo, see the Introduction to JumpStart – Enhance image quality guided by prompt example notebook.

Deploy the pre-trained model

SageMaker utilizes Docker containers for various build and runtime tasks. JumpStart utilizes the SageMaker Deep Learning Containers (DLCs) that are framework-specific. We first fetch any additional packages, as well as scripts to handle training and inference for the selected task. Then the pre-trained model artifacts are separately fetched with model_uris, which provides flexibility to the platform. This allows multiple pre-trained models to be used with a single inference script. The following code illustrates this process:

model_id, model_version = "model-upscaling-stabilityai-stable-diffusion-x4-upscaler-fp16", "*"
# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,  # automatically inferred from model_id
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)
# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="inference")

base_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="inference")

Next, we provide those resources into a SageMaker model instance and deploy an endpoint:

# Create the SageMaker model instance
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",  # entry point file in source_dir and present in deploy_source_uri
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name,
)

# deploy the Model - note that we need to pass the Predictor class when we deploy the model through the Model class,
# in order to run inference through the SageMaker API
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    predictor_cls=Predictor,
    endpoint_name=endpoint_name,
)

After our model is deployed, we can get predictions from it in real time!

Input format

The endpoint accepts a low-resolution image as raw RGB values or a base64 encoded image. The inference handler decodes the image based on content_type:

For content_type = “application/json”, the input payload must be a JSON dictionary with the raw RGB values, a textual prompt, and other optional parameters
For content_type = “application/json;jpeg”, the input payload must be a JSON dictionary with the base64 encoded image, a textual prompt, and other optional parameters

Output format

The following code examples give you a glimpse of what the outputs look like. Similarly to the input format, the endpoint can respond with the raw RGB values of the image or a base64 encoded image. This can be specified by setting accept to one of the two values:

For accept = “application/json”, the endpoint returns the a JSON dictionary with RGB values for the image
For accept = “application/json;jpeg”, the endpoint returns a JSON dictionary with the JPEG image as bytes encoded with base64.b64 encoding

Note that sending or receiving the payload with the raw RGB values may hit default limits for the input payload and the response size. Therefore, we recommend using the base64 encoded image by setting content_type = “application/json;jpeg” and accept = “application/json;jpeg”.

The following code is an example inference request:

content_type = “application/json;jpeg” 

# We recommend rescaling the image of low_resolution_image such that both height and width are powers of 2.
# This can be achieved by original_image = Image.open('low_res_image.jpg'); rescaled_image = original_image.rescale((128,128)); rescaled_image.save('rescaled_image.jpg')
with open(low_res_img_file_name,'rb') as f: low_res_image_bytes = f.read()

encoded_image = base64.b64encode(bytearray(low_res_image_bytes)).decode()

payload = { "prompt": "a cat", "image": encoded_image,  "num_inference_steps":50, "guidance_scale":7.5}

accept = "application/json;jpeg"

def query(model_predictor, payload, content_type, accept):
    """Query the model predictor."""
    query_response = model_predictor.predict(
        payload,
        {
            "ContentType": content_type,
            "Accept": accept,
        },
    )
    return query_response

The endpoint response is a JSON object containing the generated images and the prompt:

def parse_response(query_response):
"""Parse response and return the generated images and prompt."""

    response_dict = json.loads(query_response)
    return response_dict["generated_images"], response_dict["prompt"]
    
query_response = query(model_predictor, json.dumps(payload).encode('utf-8'), content_type, accept)
generated_images, prompt = parse_response(query_response)

Supported parameters

Stable Diffusion upscaling models support many parameters for image generation:

image – A low resolution image.
prompt – A prompt to guide the image generation. It can be a string or a list of strings.
num_inference_steps (optional) – The number of denoising steps during image generation. More steps lead to higher quality image. If specified, it must a positive integer. Note that more inference steps will lead to a longer response time.
guidance_scale (optional) – A higher guidance scale results in an image more closely related to the prompt, at the expense of image quality. If specified, it must be a float. guidance_scale<=1 is ignored.
negative_prompt (optional) – This guides the image generation against this prompt. If specified, it must be a string or a list of strings and used with guidance_scale. If guidance_scale is disabled, this is also disabled. Moreover, if the prompt is a list of strings, then the negative_prompt must also be a list of strings.
seed (optional) – This fixes the randomized state for reproducibility. If specified, it must be an integer. Whenever you use the same prompt with the same seed, the resulting image will always be the same.
noise_level (optional) – This adds noise to latent vectors before upscaling. If specified, it must be an integer.

You can recursively upscale an image by invoking the endpoint repeatedly to get higher and higher quality images.

Image size and instance types

Images generated by the model can be up to four times the size of the original low-resolution image. Furthermore, the model’s memory requirement (GPU memory) grows with the size of the generated image. Therefore, if you’re upscaling an already high-resolution image or are recursively upscaling images, select an instance type with a large GPU memory. For instance, ml.g5.2xlarge has more GPU memory than the ml.p3.2xlarge instance type we used earlier. For more information on different instance types, refer to Amazon EC2 Instance Types.

Upscaling images piece by piece

To decrease memory requirements when upscaling large images, you can break the image into smaller sections, known as tiles, and upscale each tile individually. After the tiles have been upscaled, they can be blended together to create the final image. This method requires adapting the prompt for each tile so the model can understand the content of the tile and avoid creating strange images. The style part of the prompt should remain consistent for all tiles to make blending easier. When using higher denoising settings, it’s important to be more specific in the prompt because the model has more freedom to adapt the image. This can be challenging when the tile contains only background or isn’t directly related to the main content of the picture.

Limitations and bias

Even though Stable Diffusion has impressive performance in upscaling, it suffers from several limitations and biases. These include but are not limited to:

The model may not generate accurate faces or limbs because the training data doesn’t include sufficient images with these features
The model was trained on the LAION-5B dataset, which has adult content and may not be fit for product use without further considerations
The model may not work well with non-English languages because the model was trained on English language text
The model can’t generate good text within images

For more information on limitations and bias, refer to the Stable Diffusion upscaler model card.

Clean up

After you’re done running the notebook, make sure to delete all resources created in the process to ensure that the billing is stopped. The code to clean up the endpoint is available in the associated notebook.

Conclusion

In this post, we showed how to deploy a pre-trained Stable Diffusion upscaler model using JumpStart. We showed code snippets in this post—the full code with all of the steps in this demo is available in the Introduction to JumpStart – Enhance image quality guided by prompt example notebook. Try out the solution on your own and send us your comments.

To learn more about the model and how it works, see the following resources:

To learn more about JumpStart, check out the following blog posts:

About the Authors

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.

Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning with a special focus on Natural Language Processing (NLP), Large Language Models (LLMs), and Generative AI. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps our customers being successful in their AI/ML journey on AWS and has worked with organizations in many industries, including Insurance, Financial Services, Media and Entertainment, Healthcare, Utilities, and Manufacturing. In his spare time Heiko travels as much as possible.

Vedere AI