AWS Cost Efficiency

Best Practices for Reducing AWS Graviton Instance Costs

AWS Graviton Series Part 2

Optimizing AWS Graviton costs is important for businesses seeking to refine their cloud infrastructure expenses. Graviton2-based instances deliver up to 20% better price-performance compared to x86-based instances for various applications.[1]

In this post, we will explore some of the best practices that will help you reduce your AWS Graviton costs. By using Graviton instances efficiently, businesses can potentially save millions of dollars annually on their cloud computing expenses.

Given below are some best practices for performance optimization[2] and reducing your instance costs.[3]

1. Use optimized compiler flags

Using optimized compiler flags means telling the compiler to generate instructions that suit the specific chip the program runs on. For AWS Graviton2, a few flags make a noticeable difference. One of them is -moutline-atomics, which speeds up atomic operations such as acquiring and releasing locks.

Code built with -moutline-atomics runs faster on Graviton2 and still runs on older Arm chips, so a single binary can serve both, which is great for saving costs. If you only target Graviton2, flags like -march=armv8.2-a+crypto and -mtune=neoverse-n1 let the compiler generate and schedule code specifically for that processor.

2. Upgrade operating systems

Switching to a recent 64-bit Arm release of your operating system, like Amazon Linux 2 or Ubuntu 20.04, can save costs by making better use of the hardware. These releases ship libraries built with optimizations that work well on modern processors like AWS Graviton2. Because the same tasks need less computing power, you spend less on infrastructure. Staying up to date also brings the latest security fixes and new features, which helps you avoid costly problems and downtime.

3. Tune low-level code

Tuning low-level code involves finding and fine-tuning any source code that contains CPU instructions specific to a particular type of processor. While it's not common in regular application programming, some code or libraries use hand-written inline assembly or intrinsics to run very fast on a certain type of CPU. But here's the catch: code that's tuned for one architecture, like x86, might not run as well on Graviton2, which uses a different architecture called Arm.

So, in these situations, the program might end up using a slower, one-size-fits-all version instead of taking full advantage of what Graviton2 can do. That's why it's super important to find and tune this kind of source code to make sure it works its best on AWS Graviton2 instances. By making these performance-critical parts of the code work well with Graviton2, programs can really make the most out of the ARM-based architecture, making sure they run as fast and efficiently as possible and saving costs.

Here's a simplified example of identifying the architecture at compile time and printing a message accordingly:

#include <stdio.h>

int main(void) {
#if defined(__x86_64__) || defined(__i386__)
    printf("Using x86 architecture.\n");
#elif defined(__aarch64__) || defined(__arm__)
    /* __aarch64__ is 64-bit Arm (Graviton); __arm__ is 32-bit Arm. */
    printf("Using ARM architecture.\n");
#else
    printf("Using an architecture other than x86 or ARM.\n");
#endif

    return 0;
}

4. Test performance on multiple instance sizes

Testing how well your applications perform on different instance sizes means comparing how they run on several sizes within the same instance family. By testing both small and large sizes, you can find performance problems that show up only on certain sizes. For example, your app might work fine on smaller instances but struggle on bigger ones because they handle resources differently. Finding these problems through careful testing lets you give your team or customers useful advice about which size to pick, so they choose a size that matches what they need and save costs while still getting the performance they want.

5. Use sanitizers

A sanitizer is a tool used in software development to detect various types of bugs in code. Using sanitizers like the address and undefined behavior sanitizers helps save money by finding memory-related problems and undefined behavior before they cause big issues. Turning them on in your development and test builds helps catch things like memory leaks and buffer overflows early on.

By fixing these problems before your code goes live, you avoid expensive downtime and performance problems. Plus, fixing memory bugs early means your program uses resources better, so you don't have to pay for larger instances than you need. Overall, using sanitizers makes your program more reliable and efficient, which saves money on fixing and maintaining it, whether you're using Graviton or x86 systems.

To build with these sanitizers using GCC, add them to your standard compiler and linker flags:

CFLAGS += -fsanitize=address -fsanitize=undefined
LDFLAGS += -fsanitize=address -fsanitize=undefined

The -fsanitize=address flag enables the AddressSanitizer, which helps detect memory errors like buffer overflows and use-after-free bugs. Similarly, the -fsanitize=undefined flag enables the UndefinedBehaviorSanitizer, which helps identify undefined behavior in the code.

By adding these flags to both CFLAGS and LDFLAGS, you ensure that the sanitizers are applied during both the compilation and linking stages, helping you catch memory-related bugs and undefined behavior early in the development process.

6. Lock/synchronization-intensive workloads

Using Graviton2's Arm Large System Extensions (LSE) for tasks involving heavy locking and synchronization can significantly improve performance and reduce costs. By compiling your code with the -march=armv8.2-a flag, you can activate LSE-based locking and synchronization. This is particularly beneficial for tasks with numerous locks and multiple processor cores. However, binaries built this way will not run on older Arm v8.0 systems, such as AWS Graviton-based EC2 A1 instances. If you need that compatibility, newer versions of GCC, like GCC 10, provide an alternative option called -moutline-atomics.

This option maintains compatibility with older systems while still allowing your program to dynamically determine the most efficient way to handle atomic operations at runtime. By adopting this strategy, your program can achieve optimal performance while remaining compatible with a wide range of Arm architectures. Ultimately, this results in improved efficiency and cost savings, as you leverage the full potential of Graviton2 instances without sacrificing compatibility with older systems.

7. Profile the code

Profiling code means figuring out which parts of your program use the most CPU time. Tools like perf help with this by showing where the CPU is spending its time, so developers can find and fix the slow parts. A program that uses resources more efficiently needs less capacity, so you avoid over-provisioning and overspending on cloud servers. By making sure your program runs its best, profiling helps cut running costs and makes it faster overall.

The commands below are used to profile the performance of a program (in this case, ffmpeg) using the perf tool.

# Record the performance profile
sudo perf record -g -F99 -o ./ffmpeg

# Generate a performance report
perf report

# Generate a flame graph for visual analysis
perf script -i | FlameGraph/ | FlameGraph/ > flamegraph.svg

For example, take the video processing program ffmpeg. By comparing its profiles on two types of servers (C5 and M6g), we found that on M6g a specific function called ff_hscale_8_to_15_neon was using a lot of CPU time.

The top functions by share of CPU time on each instance were:

C5.4XL                              M6g.4XL
19.89%  dv_encode_video_segment     19.57%  ff_hscale_8_to_15_neon
11.21%  decode_significance_x86     18.02%  get_cabac
 8.68%  get_cabac                   15.08%  dv_encode_video_segment
 8.43%  ff_h264_decode_mb_cabac      5.85%  ff_jpeg_fdct_islow_8
 8.05%  ff_hscale8to15_X4_ssse3      5.01%  ff_yuv2planeX_8_neon

We used profiling tools to look deeper into why this function was taking so long. After analyzing the results, we focused on optimizing that particular function. This made ffmpeg run faster on the M6g server, showing how profiling and fixing slow parts can make programs work better and save costs.


Reducing AWS Graviton costs is vital for refining cloud expenses. Best practices like optimized compiler flags, OS upgrades, low-level code tuning, performance testing across instance sizes, sanitizers, and Graviton2's LSE can enhance efficiency and minimize costs. Profiling code with tools like perf helps identify and fix the slow parts. Together, these strategies make better use of resources, maximize the benefits of Graviton, and keep expenses down.


1. AWS Graviton - ARM Processor

2. Reviewing your cost structure - AWS Graviton2 for Independent Software Vendors

3. aws-graviton-getting-started/ at main
