How an important change in web standards impacts your image annotation jobs

Earlier in 2020, widely used browsers like Chrome and Firefox changed their default behavior for rotating images based on image metadata, referred to as EXIF data. Previously, images always displayed in browsers exactly how they’re stored on disk, which is typically unrotated. After the change, images now rotate according to a piece of image metadata called orientation value. This has important implications for the entire machine learning (ML) community. For example, if the EXIF orientation isn’t considered, applications that you use to annotate images may display images in unexpected orientations and result in confusing or incorrect labels.

For example, before the change, by default images would display in the orientation stored on the device, as shown in the following image. After the change, by default, images display according to the orientation value in EXIF data, as shown in the second image.

Here, the image was stored in portrait mode, with EXIF data attached to indicate it should be displayed with a landscape orientation.

To ensure images are predictably oriented, ML annotation services need to be able to view image EXIF data. The recent change to global web standards requires you to grant explicit permission to image annotation services to view your image EXIF data.

To guarantee data consistency between workers and across datasets, the annotation tools used by Amazon SageMaker Ground Truth, Amazon Augmented AI (Amazon A2I), and Amazon Mechanical Turk need to understand and control orientations of input images that are shown to workers. Therefore, from January 12, 2021, onward, AWS requires that you add a cross-origin resource sharing (CORS) header configuration to Amazon Simple Storage Service (Amazon S3) buckets that contain labeling job or human review task input data. This policy allows these AWS services to view EXIF data and verify that images are predictably oriented in labeling and human review tasks.

This post provides details on the image metadata change, how it can impact labeling jobs and human review tasks, and how you can update your S3 buckets with these new, required permissions.

What is EXIF data?

EXIF data is metadata that tells us things about the image. EXIF data typically includes the height and width of an image but can also include things like the date a photo was taken, what kind of camera was used, and even GPS coordinates where the image was captured. For the image annotation web application community, the orientation property of EXIF is about to become very important.

When you take a photo, whether it’s landscape or portrait, the data is written to storage in the landscape orientation. Instead of storing a portrait photo in the portrait orientation, the camera writes a piece of metadata to the image to explain to applications how that image should be rotated when it’s shown to humans. To learn more, see Exif.

A big change to browsers: Why EXIF data is important

Until recently, popular web browsers such as Chrome and Firefox didn’t use EXIF orientation values, meaning that images that users annotated were never rotated. This means the annotation data matched how the image was stored and the orientation value didn’t matter.

Earlier in 2020, Chrome and Firefox changed their default behavior to begin using EXIF data by default. To make sure image annotating tasks weren’t impacted, AWS mitigated this change by preventing rotation so that users continued to annotate images in their unrotated form. However, AWS can no longer automatically prevent the rotation of images because the web standards group W3C has decided that the ability to control image rotation violates the web’s Same Origin Policy.

It is estimated that, starting with Chrome 88 on January 19th, 2021, annotation services like the ones offered by AWS will require additional permissions to control the orientation of your images when displayed to human workers.

When using AWS services, you can grant these permissions by adding a CORS header policy to the S3 buckets that contain your input images.

Upcoming change to AWS image annotation job security requirements

It is recommended you add a CORS configuration to all S3 buckets that contain input data used for active and future labeling jobs as soon as possible. Starting January 12th, 2021, to ensure human workers annotate your input images in a predictable orientation when you submit requests to create one of the following, you must add a CORS header policy to the S3 buckets that contain your input images:

If you have pre-existing active resources like Ground Truth streaming labeling jobs, you must add a CORS header policy to the S3 bucket used to create those resources. For Ground Truth, this is the input data S3 bucket identified when you created the streaming labeling job.

Additionally, if you reuse resources, such as cloning a Ground Truth labeling job, make sure the input data S3 bucket you use has a CORs header policy attached.

In the context of input image data, AWS services use CORS headers to view EXIF orientation data to control image rotation.

If you don’t add a CORS header policy to an S3 bucket that contains input data by January 12th, 2021, Ground Truth, Amazon A2I, and Mechanical Turk tasks created using this S3 bucket will fail.

Adding a CORS header policy to an S3 bucket

If you’re creating an Amazon A2I human loop or Mechanical Turk job, or you’re using the CreateLabelingJob API to create a Ground Truth labeling job, you can add a CORS policy to an S3 bucket that contains input data on the Amazon S3 console.

If you create your job through the Ground Truth console, under Enable enhanced image access, a check box is select to enable CORS configuration on the S3 bucket that contains your input manifest file as shown in the following image. Keep this check box selected. If all of your input data is not located in the same S3 bucket as your input manifest file, you must manually add a CORS configuration to all S3 buckets that contain input data using the following instructions.

For instructions on setting the required CORS headers on the S3 bucket that hosts your images, see How do I add cross-domain resource sharing with CORS? Use the following CORS configuration code for the buckets that host your images.

The following is the code in JSON format:

[{
   "AllowedHeaders": [],
   "AllowedMethods": ["GET"],
   "AllowedOrigins": ["*"],
   "ExposeHeaders": []
}]

The following is the code in XML format:

<CORSConfiguration>
 <CORSRule>
   <AllowedOrigin>*</AllowedOrigin>
   <AllowedMethod>GET</AllowedMethod>
 </CORSRule>
</CORSConfiguration>

The following GIF demonstrates the instructions found in the Amazon S3 documentation to add a CORS header policy using the Amazon S3 console.

Conclusion

In this post, we explained how a recent decision made by the web standards group W3C will impact the ML community. AWS image annotation service providers will now require you to grant permission to view orientation values of your input images, which are stored in image EXIF data.

Make sure you enable CORS headers on the S3 buckets that contain your input images before creating Ground Truth labeling jobs, Amazon A2I human review jobs, and Mechanical Turk tasks on or after January 12th, 2021.

 


About the Authors

Talia Chopra is a Technical Writer in AWS specializing in machine learning and artificial intelligence. She works with multiple teams in AWS to create technical documentation and tutorials for customers using Amazon SageMaker, MxNet, and AutoGluon.

 

 

Phil Cunliffe is an engineer turned Software Development Manager for Amazon Human in the Loop services. He is a JavaScript fanboy with an obsession for creating great user experiences.

Read More