Amazon Aurora is becoming an increasingly popular managed database solution. It’s easy to see why: Aurora is fast, reliable, and built on a fully distributed and self-healing storage system. On top of that, it provides enterprise-level capabilities for an affordable price.
As many of our customers are deciding to switch to Amazon’s database, we’ve compiled the ultimate list of things you should know about Amazon Aurora.
The list captures the essentials to understand before you start using the service. What is Amazon Aurora? How does it work? How can you migrate data from your existing database? You’ll find answers to all the basic questions below.
Amazon Aurora is a relational database engine as a service.
It is MySQL compatible, which means that with Aurora you’ll be able to use most of your existing MySQL code, applications, drivers, and tools with little or no change. Additionally, push-button migration tools are available that let you convert existing Amazon RDS for MySQL databases to Amazon Aurora.
Aurora is managed by Amazon RDS.
Amazon handles tasks such as provisioning, patching, backup, recovery, failure detection, and repair. You simply pay a single monthly usage fee without any upfront cost and long-term commitment.
According to Amazon, Aurora delivers over 500,000 SELECTs/sec and 100,000 updates/sec.
This is five times the throughput of MySQL running the same benchmark on the same hardware. Aurora’s edge over MySQL is largest under highly concurrent workloads, so to get the biggest performance boost, build applications that drive many concurrent queries.
The minimum storage for Aurora is 10GB.
It grows automatically in 10GB increments up to 64TB, so there is no need to provision storage in advance.
The compute resources allocated to the database can scale up to 32 vCPUs and 244 GB memory.
To change the amount of memory or vCPUs, you modify the DB instance class. Changes are applied during your specified maintenance window, or you can force them to apply immediately. Keep in mind that any other pending changes in the queue will be applied as well, even if they require a reboot.
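As a sketch, scaling compute comes down to an RDS ModifyDBInstance call with a new instance class. The helper below only builds the request parameters (the identifiers and the instance class are hypothetical examples; db.r3.8xlarge is the class that matched 32 vCPUs / 244 GB at the time of writing), so it can be inspected without touching AWS:

```python
def build_modify_params(instance_id, instance_class, apply_immediately=False):
    """Build parameters for an RDS ModifyDBInstance call that changes
    the DB instance class (and therefore vCPUs and memory)."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,
        # False: apply during the next maintenance window;
        # True: apply now, together with any other pending changes.
        "ApplyImmediately": apply_immediately,
    }

params = build_modify_params("my-aurora-instance", "db.r3.8xlarge",
                             apply_immediately=True)
# With boto3 this would be passed as:
#   boto3.client("rds").modify_db_instance(**params)
```

Keeping the parameter construction separate from the API call makes the immediate-vs-maintenance-window trade-off explicit in one place.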
Aurora is a highly available database.
When you create an Aurora DB, you in fact create an Aurora cluster, which may consist of one or more instances.
Two types of instances can make up an Aurora DB cluster:
- Primary instance – supports read-write workloads and performs data modification to the cluster volume. Each cluster must have one primary instance.
- Aurora replica – supports only read operations. A single cluster may have up to 15 replicas in separate Availability Zones to distribute read workloads and increase database availability.
Underneath is the cluster volume, a virtual database storage volume that spans multiple Availability Zones, with each Availability Zone holding a copy of the cluster data. The cluster volume appears as a single logical volume to the primary instance and the replicas. As a result, all replicas return the same data with minimal lag, typically less than 100 ms after the primary instance has written an update.
The cluster volume is automatically divided into 10GB segments spread across many disks, and each segment is replicated six ways across three Availability Zones (two copies per zone). This makes Aurora extremely durable: the storage system can lose two copies of the data without affecting write availability, and up to three copies without affecting read availability. Moreover, Aurora is self-healing: it continuously scans data blocks for errors and repairs them automatically.
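Those durability numbers follow from Aurora’s quorum scheme: with six copies of each segment, writes require a quorum of four and reads a quorum of three (figures Amazon has published for Aurora’s storage design). The arithmetic can be sketched directly:

```python
COPIES = 6          # each 10GB segment is replicated six ways
WRITE_QUORUM = 4    # a write must reach 4 of the 6 copies
READ_QUORUM = 3     # a read must reach 3 of the 6 copies

def writes_available(lost_copies):
    """Writes survive as long as a write quorum of copies remains."""
    return COPIES - lost_copies >= WRITE_QUORUM

def reads_available(lost_copies):
    """Reads survive as long as a read quorum of copies remains."""
    return COPIES - lost_copies >= READ_QUORUM

# Losing two copies (e.g. an entire Availability Zone) keeps writes available:
assert writes_available(2) and not writes_available(3)
# Losing three copies (an AZ plus one more disk) still keeps reads available:
assert reads_available(3) and not reads_available(4)
```

The asymmetric quorums are the design choice that lets Aurora keep serving reads through an AZ failure plus one additional disk failure.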
The size of the Aurora cluster volume determines the table size limit.
Since the cluster volume can grow to 64TB, a single table can also be up to 64TB.
When connecting to Aurora, you can use one of three endpoint types: the cluster endpoint, the reader endpoint, and instance endpoints:
- Cluster endpoint – always connects to the current primary instance of the DB cluster and can perform both read and write operations.
- Reader endpoint – load balances connections across the replicas in a cluster and provides read-only access. If the cluster has no replicas, the reader endpoint connects to the primary instance.
- Instance endpoint – the primary instance and each Aurora Replica in a DB cluster have their own instance endpoint, a unique endpoint that connects to that specific instance.
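To make the split concrete, here is a minimal routing sketch (the hostnames are hypothetical examples of the form Aurora uses): read-only traffic goes to the reader endpoint so Aurora can balance it across replicas, while everything else goes to the cluster endpoint and thus the primary instance:

```python
# Hypothetical endpoint hostnames, as an Aurora cluster description would return.
CLUSTER_ENDPOINT = "mydb.cluster-c123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mydb.cluster-ro-c123.us-east-1.rds.amazonaws.com"

def endpoint_for(workload):
    """Pick the right endpoint for a workload: reads can be load balanced
    across replicas, writes must reach the primary instance."""
    if workload == "read":
        return READER_ENDPOINT
    return CLUSTER_ENDPOINT

assert endpoint_for("read") == READER_ENDPOINT
assert endpoint_for("write") == CLUSTER_ENDPOINT
```

Using the cluster endpoint rather than an instance endpoint for writes means your application keeps working after a failover, since the cluster endpoint always follows the current primary.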
Aurora Replicas are independent read-only instances in an Aurora DB cluster.
They serve read-only workloads and help improve performance and availability. If the primary instance fails, an Aurora Replica takes over the primary role. You can also fail over to a Replica manually if needed.
You can create an Aurora DB cluster as a Read Replica in a different region than the source DB cluster.
This improves availability and lets you scale read operations to the region closest to your users. To enable cross-region replication, you need to enable binary logging on the source cluster.
In the cross-region scenario you need to be aware of two facts:
- There is more replication lag between the source DB cluster and the cross-region replica.
- Data transferred between regions incurs Amazon RDS data transfer charges.
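Enabling binary logging on the source comes down to setting the binlog_format parameter in the cluster’s DB cluster parameter group. A hedged sketch of the ModifyDBClusterParameterGroup request (the group name is a hypothetical example):

```python
def build_binlog_params(parameter_group):
    """Parameters for an RDS ModifyDBClusterParameterGroup call that
    enable binary logging, a prerequisite for cross-region replication."""
    return {
        "DBClusterParameterGroupName": parameter_group,
        "Parameters": [{
            "ParameterName": "binlog_format",
            "ParameterValue": "MIXED",       # ROW or STATEMENT also work
            # binlog_format is a static parameter, so it takes effect
            # only after the primary instance is rebooted.
            "ApplyMethod": "pending-reboot",
        }],
    }

params = build_binlog_params("my-source-cluster-params")
# With boto3: boto3.client("rds").modify_db_cluster_parameter_group(**params)
```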
Aurora DB can integrate with other AWS services.
Before integrating, you must create and configure AWS Identity and Access Management (IAM) roles that authorize your DB cluster to access the target AWS services. You also have to allow outbound connections from the cluster to those services.
After that you will be able to:
- Asynchronously invoke an AWS Lambda function using the mysql.lambda_async procedure.
- Load data from text or XML files stored in Amazon S3 buckets.
- Save data from DB cluster into text files stored in Amazon S3 buckets.
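Once the IAM roles are in place, all three integrations are driven by SQL statements issued from any MySQL client. Here they are sketched as Python strings (the Lambda ARN, bucket, and table names are hypothetical placeholders):

```python
# Asynchronously invoke a Lambda function from the database.
invoke_lambda = (
    "CALL mysql.lambda_async("
    "'arn:aws:lambda:us-east-1:123456789012:function:audit-writes', "
    "'{\"event\": \"row_changed\"}')"
)

# Load a text file from an S3 bucket into a table.
load_from_s3 = (
    "LOAD DATA FROM S3 's3://my-bucket/data.txt' "
    "INTO TABLE my_table FIELDS TERMINATED BY ','"
)

# Export a table's rows to text files in an S3 bucket.
save_to_s3 = (
    "SELECT * FROM my_table "
    "INTO OUTFILE S3 's3://my-bucket/export/my_table'"
)
```

LOAD DATA FROM S3 and INTO OUTFILE S3 are Aurora-specific extensions of the standard MySQL LOAD DATA / SELECT ... INTO OUTFILE syntax.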
You can use database cloning.
Use this feature when you need to experiment on your database, perform workload-intensive operations, or create a copy of a production DB cluster in a non-production environment. Cloning uses a copy-on-write protocol: the clone initially shares the source’s data, and the system copies data only as it changes, on either the source or the cloned database.
You can create multiple clones from the same DB cluster, but not across AWS regions.
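In the RDS API, cloning is exposed as a point-in-time restore with a copy-on-write restore type. A sketch of the parameters, with hypothetical cluster identifiers:

```python
def build_clone_params(source_cluster, clone_id):
    """Parameters for an RDS RestoreDBClusterToPointInTime call that
    creates a copy-on-write clone rather than a full physical copy."""
    return {
        "SourceDBClusterIdentifier": source_cluster,
        "DBClusterIdentifier": clone_id,
        "RestoreType": "copy-on-write",   # the default, "full-copy", copies everything
        "UseLatestRestorableTime": True,  # clone the current state of the source
    }

params = build_clone_params("prod-cluster", "prod-cluster-clone")
# With boto3: boto3.client("rds").restore_db_cluster_to_point_in_time(**params)
```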
When you use T2 instances with Aurora, you need to:
- Monitor your CPU Credit Balance to make sure it accumulates at least as fast as it is consumed. Depleting the balance causes an immediate drop in available CPU and an increase in read/write latency for the instance.
- Keep the number of inserts per transaction below 1 million for DB clusters that have binary logging enabled.
- Monitor the replica lag between the primary instance and the Aurora Replicas in the cluster. If a replica runs out of CPU credits before the primary instance does, it will lag behind the primary and may restart frequently.
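The credit-balance point is easiest to reason about with a simple model: a T2 instance earns credits at a fixed rate and spends them in proportion to CPU usage, so a spend rate above the earn rate eventually drains the balance to zero, at which point the instance is throttled to its baseline. A sketch (the rates are illustrative, and the model ignores the per-instance credit cap):

```python
def credit_balance_over_time(earn_per_hour, spend_per_hour, start_balance, hours):
    """Project a T2 CPU credit balance hour by hour under a constant
    workload (simplified: no credit cap, no workload variation)."""
    balance = start_balance
    history = []
    for _ in range(hours):
        balance = max(0.0, balance + earn_per_hour - spend_per_hour)
        history.append(balance)
    return history

# Spending faster than earning: the balance drains, and once it hits
# zero the instance is throttled to baseline CPU.
draining = credit_balance_over_time(earn_per_hour=12.0, spend_per_hour=20.0,
                                    start_balance=24.0, hours=4)
# → [16.0, 8.0, 0.0, 0.0]
```

The takeaway is the monitoring rule from the bullet above: alert when the balance trends downward, not just when it reaches zero, because by then latency has already degraded.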
There are several options to migrate data from your existing database to Amazon Aurora.
You can migrate data:
- From an Amazon RDS for MySQL DB instance – create an Aurora Read Replica of the MySQL instance, direct your application to the Read Replica, stop replication, and finally promote the Replica to a standalone cluster. You can also migrate directly from an RDS for MySQL DB snapshot into an Aurora cluster.
- From an external MySQL database using InnoDB or MyISAM tablespaces – create a dump of your data and import it into an existing Aurora DB cluster. Alternatively, copy the backup files to an Amazon S3 bucket and restore an Aurora DB cluster from them. As a third option, save data from the database to text files, copy them to an S3 bucket, and load them into an existing Aurora DB cluster.
- From a database that is not MySQL compatible – use AWS Database Migration Service.
Automated backups are enabled by default.
When you delete a database, all of its automated backups are deleted as well, so you will no longer be able to restore it to a specific point in time.
If you have a moment, watch our recent webinar on practical migration from an Oracle database to Amazon Aurora.