ElastiCache, OpenSearch, Neptune & Redshift. AWS Solutions Architect Associate Complete Course
Chapter 14: Other AWS Databases
In addition to Amazon RDS, the relational database service we saw in the last chapter, AWS offers other databases, such as in-memory caches (Amazon ElastiCache), search engines (Amazon OpenSearch), graph databases (Amazon Neptune), and data warehouses (Amazon Redshift). Let’s see these technologies in detail!
- ElastiCache
- OpenSearch
- Redshift
- Redshift Spectrum
- DocumentDB
- Amazon Neptune
- Other AWS Databases
- Typical Exam Questions
Remember that all the chapters of the course can be found at the following link.
ElastiCache
Fully managed in-memory data store, compatible with the Redis and Memcached engines, with sub-millisecond latency. As it’s a fully managed service, AWS takes care of provisioning, patching, failure detection and recovery, and (for Redis) backups; you mainly need to worry about your data and how your application uses it. It’s used when you need extreme performance and very low latency, and it is popular for real-time use cases, especially caching and session stores.
As we said, it supports two in-memory engines, Redis and Memcached. Both support in-transit encryption (SSL/TLS). Traditionally, neither supported IAM authentication (IAM policies only control who can manage the cluster through the AWS API), although ElastiCache for Redis 7.0 and later now also supports IAM authentication. Let’s see the differences in the following diagram:
- Redis → Fast in-memory data store that provides sub-millisecond latency to power internet-scale real-time applications. It offers Multi-AZ with automatic failover, read replicas, and data durability through backup and restore (snapshots). It supports authenticating users with the Redis AUTH command.
- Memcached → An in-memory key-value store that can be used as a cache or a data store. It is popular for web, mobile apps, gaming, ad tech, and e-commerce. It supports sharding (the data is partitioned across multiple nodes), but data does NOT persist, there is no replication, and there are no backup options. In exchange, it is a simple, multi-threaded, high-performance cache.
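Because Memcached does not replicate data, the sharding is what spreads the cache across nodes. A common client-side technique for choosing the node is consistent hashing, which keeps most keys in place when a node is added or removed. The following is a minimal sketch under that assumption; `HashRing` and the node names are hypothetical and are not the ElastiCache client API.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping cache keys to Memcached nodes."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas  # virtual points per node, smooths the distribution
        self._ring = []           # sorted hash positions
        self._owners = {}         # hash position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # Stable 64-bit position from MD5 (Python's hash() is randomized per process)
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._owners[pos] = node

    def remove_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._ring.remove(pos)
            del self._owners[pos]

    def node_for(self, key):
        # Walk clockwise to the first virtual point at or after the key's hash
        idx = bisect.bisect_left(self._ring, self._hash(key))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._owners[self._ring[idx]]


ring = HashRing(["cache-node-a", "cache-node-b", "cache-node-c"])
keys = [f"session:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("cache-node-c")
after = {k: ring.node_for(k) for k in keys}
# Only keys that lived on cache-node-c are remapped; the others stay put
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Note that the data held by the removed node is simply gone (there is no replica), which is why Memcached fits data you can always recompute or reload from the database.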
Some use cases:
- Cache database → You can put a cache database between your application and your regular database. The application first asks the cache for the data, which is much faster; if the data is not there, the application reads it from the database and writes it to the cache so that future queries are faster (this pattern is known as lazy loading or cache-aside). If the cache returns the data, it’s called a “cache hit”, whereas if you have to go to the database, it’s called a “cache miss”.
- User Session Store → Let’s imagine we have an application that runs on several EC2 instances. If each instance keeps the session data in its own memory and the user is routed to another instance, the user loses the session and has to log in again. To solve that, we can store the user sessions in ElastiCache, so every instance reads the same session data and the user stays logged in no matter which instance serves the request. This is shown in the following diagram:
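The cache database use case above (lazy loading, or cache-aside) can be sketched as follows. To keep it self-contained, an in-process `TTLCache` class stands in for ElastiCache and a dict stands in for the regular database; both are hypothetical. With a real Redis client such as redis-py, the cache calls would roughly become `get(key)` and `set(key, value, ex=ttl)`, with the value serialized first.

```python
import time


class TTLCache:
    """In-process stand-in for ElastiCache: key-value store with per-key expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)


# Hypothetical "regular database" (e.g. RDS), modeled as a dict
DATABASE = {"user:42": {"name": "Alice", "plan": "pro"}}
cache = TTLCache()
stats = {"hits": 0, "misses": 0}


def get_user(key, ttl_seconds=300):
    """Cache-aside read path: try the cache, fall back to the database."""
    value = cache.get(key)
    if value is not None:
        stats["hits"] += 1       # cache hit: served from memory
        return value
    stats["misses"] += 1         # cache miss: go to the database...
    value = DATABASE.get(key)
    if value is not None:
        cache.set(key, value, ttl_seconds)  # ...and populate the cache for next time
    return value


get_user("user:42")  # miss: reads DATABASE and fills the cache
get_user("user:42")  # hit: served from the cache
print(stats)         # {'hits': 1, 'misses': 1}
```

The TTL bounds how long a stale value can be served after the database changes; shorter TTLs trade more cache misses for fresher data.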
OpenSearch
OpenSearch (Amazon OpenSearch Service, formerly Amazon Elasticsearch Service) is an open-source, fast, and scalable search and analytics engine offered by AWS as a managed service. It’s used to search, visualize, and analyze up to petabytes of text and unstructured data in milliseconds. You can search on any field of any type in any collection, including partial matches and regular expressions. OpenSearch is usually not your primary database: you keep your operational data in another database (for example, MongoDB or DynamoDB) and index a copy of it in OpenSearch to get advanced indexing and much faster search.
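OpenSearch is fast because, through Apache Lucene, it builds inverted indexes: maps from each term to the documents that contain it, so a search looks up terms instead of scanning every record. The toy `InvertedIndex` below is a hypothetical illustration of that idea only; real OpenSearch adds text analyzers, relevance scoring (BM25 by default), and sharding across nodes.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy full-text index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.docs = {}

    def add(self, doc_id, doc):
        self.docs[doc_id] = doc
        for field_value in doc.values():  # index every field of the document
            for term in tokenize(str(field_value)):
                self.postings[term].add(doc_id)

    def search(self, query):
        # AND semantics: documents that contain every query term
        terms = tokenize(query)
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = InvertedIndex()
index.add(1, {"title": "Red running shoes", "brand": "Acme"})
index.add(2, {"title": "Blue running jacket", "brand": "Acme"})
index.add(3, {"title": "Red wool scarf", "brand": "Knitco"})
print(index.search("red running"))  # [1]
print(index.search("acme"))         # [1, 2]
```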
Redshift
Redshift is Amazon’s columnar cloud data warehouse: data is stored column by column, so analytical queries that read a few columns over many rows scan much less data. It’s used to analyze large tables of data to gain new insights. Using standard SQL, you can query and combine exabytes of structured and semi-structured data across your data warehouse, operational databases, and data lake. Data analytics, business intelligence, and data warehousing are the main use cases. The main difference from the other databases is that Redshift is OLAP (Online Analytical Processing) instead of OLTP (Online Transaction Processing): you use it to analyze large volumes of data, not to run the many small transactions (inserts, updates, lookups) that carry out day-to-day business processes.
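Why a columnar layout helps OLAP can be shown with a small, hypothetical `sales` table. The sketch counts how many values each layout has to read for `SELECT SUM(amount)`; counting values is only a crude proxy for I/O, and real Redshift additionally compresses each column and skips blocks using zone maps, which this code does not model.

```python
# Hypothetical "sales" table: 5 rows x 4 columns
rows = [
    {"order_id": 1, "customer": "ana", "region": "EU", "amount": 120},
    {"order_id": 2, "customer": "ben", "region": "US", "amount": 80},
    {"order_id": 3, "customer": "carla", "region": "EU", "amount": 200},
    {"order_id": 4, "customer": "dan", "region": "APAC", "amount": 50},
    {"order_id": 5, "customer": "eve", "region": "US", "amount": 150},
]

# Column-oriented copy of the same table: one list per column
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_amount_row_store(rows):
    """SUM(amount) on a row layout: every field of every row is read."""
    total, values_read = 0, 0
    for row in rows:
        values_read += len(row)
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """SUM(amount) on a column layout: only the amount column is read."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


print(sum_amount_row_store(rows))        # (600, 20)
print(sum_amount_column_store(columns))  # (600, 5)
```

The gap grows with the number of columns: a fact table with 100 columns would read 100 times fewer values for a single-column aggregate.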
Redshift can also improve performance for repeat queries by caching the result and returning the cached result when a query is re-run, as long as the underlying data has not changed. A query that runs for 10 minutes (for example) can return its result in milliseconds when we execute the same query again. Dashboards, visualization tools, and business intelligence (BI) tools that run repeated queries benefit significantly from this functionality.
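The result-caching idea can be sketched as a cache keyed on the query text plus a version number for each table the query reads, so that any write to those tables makes the old result unreachable. This is a simplified, hypothetical model (`ResultCache`, `fake_execute`, and the version scheme are assumptions), not how Redshift implements the feature internally.

```python
class ResultCache:
    """Reuse a stored result only when the SQL text and the versions of
    the tables it reads are unchanged."""

    def __init__(self):
        self._results = {}        # (sql, table versions) -> result
        self.table_versions = {}  # table name -> version, bumped on every write

    def _key(self, sql, tables):
        versions = tuple((t, self.table_versions.get(t, 0)) for t in sorted(tables))
        return (sql, versions)

    def run(self, sql, tables, execute):
        key = self._key(sql, tables)
        if key in self._results:
            return self._results[key], True  # cache hit: no query execution
        result = execute(sql)                # cache miss: run the (slow) query
        self._results[key] = result
        return result, False

    def record_write(self, table):
        # Any write to a table makes older cached results for it unreachable
        self.table_versions[table] = self.table_versions.get(table, 0) + 1


calls = []


def fake_execute(sql):
    # Hypothetical stand-in for running the query on the cluster
    calls.append(sql)
    return [("EU", 320)]


result_cache = ResultCache()
q = "SELECT region, SUM(amount) FROM sales GROUP BY region"
print(result_cache.run(q, ["sales"], fake_execute))  # ([('EU', 320)], False)
print(result_cache.run(q, ["sales"], fake_execute))  # ([('EU', 320)], True)
result_cache.record_write("sales")                   # e.g. a new INSERT into sales
print(result_cache.run(q, ["sales"], fake_execute))  # ([('EU', 320)], False)
```

The fake query always returns the same rows; in practice the third run would reflect the newly written data.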
Redshift is similar to Snowflake, although in my experience it tends to be more expensive. If you want more information about Snowflake, I’m also developing a course about it!
Redshift Spectrum
Redshift Spectrum is a feature of Redshift that lets you run SQL queries directly on data stored in Amazon S3 buckets, and even join it with tables stored in your Redshift cluster. You may think it is the same functionality as Amazon Athena; they are similar, but there are some differences. The main one is that Athena is serverless, whereas Redshift Spectrum requires a running Amazon Redshift cluster (whose nodes run on EC2 instances). Use Redshift Spectrum for frequent, complex queries, especially if you already have a Redshift cluster; for sporadic queries, use Athena.
DocumentDB (NEW)
Fully managed, MongoDB-compatible document database (NoSQL) from AWS. You can store, query, index, and aggregate the JSON-like data that your applications generate.
It replicates your data across 3 AZs (six copies of the data, in a storage layer similar to Aurora’s), so it’s highly available.
Amazon Neptune
Fast, fully managed graph database (similar to Neo4j). It powers graph use cases such as social networks, identity graphs, knowledge graphs, and fraud detection. Beyond that, we don’t need to know anything else for the AWS Solutions Architect Associate exam, because this service is covered in more depth in the AWS Certified Database Specialty (DBS-C01) exam.
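Graph databases shine on multi-hop questions such as “friends of my friends who are not yet my friends”. In Neptune you would write this in a graph query language (Gremlin, openCypher, or SPARQL); the plain-Python breadth-first search below, over a hypothetical `FRIENDS` graph, only illustrates the traversal such a query performs. In a relational database, each extra hop would typically need another self-join.

```python
from collections import deque

# Hypothetical social graph: person -> set of friends (undirected edges)
FRIENDS = {
    "ana": {"ben", "carla"},
    "ben": {"ana", "dan"},
    "carla": {"ana", "dan", "eve"},
    "dan": {"ben", "carla", "fay"},
    "eve": {"carla"},
    "fay": {"dan"},
}


def people_within(graph, start, max_hops):
    """Breadth-first traversal: everyone reachable in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    return {p: d for p, d in seen.items() if p != start}


def friend_suggestions(graph, person):
    # Friends of friends (exactly 2 hops away) who are not already friends
    return sorted(p for p, d in people_within(graph, person, 2).items() if d == 2)


print(friend_suggestions(FRIENDS, "ana"))  # ['dan', 'eve']
```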
Other AWS Databases (NEW)
- Amazon Quantum Ledger Database (QLDB): Fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log. Ledger databases are used, for example, to record financial transactions.
- Amazon Timestream: Serverless time series database.
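The “cryptographically verifiable” part of QLDB rests on hash chaining: each journal entry’s SHA-256 hash also covers the previous entry’s hash, so altering any past entry breaks every hash after it. The `Ledger` class below is a minimal, hypothetical hash-chain sketch of that idea; QLDB itself also uses Merkle trees and exposes digests through its API, none of which this code models.

```python
import hashlib
import json


def entry_hash(prev_hash, record):
    # The hash covers the previous hash, chaining each entry to all earlier ones
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class Ledger:
    """Append-only log where every entry commits to all previous entries."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (record, hash)

    def append(self, record):
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        self.entries.append((record, entry_hash(prev, record)))

    def digest(self):
        # The latest hash summarizes the whole history
        return self.entries[-1][1] if self.entries else self.GENESIS

    def verify(self):
        prev = self.GENESIS
        for record, stored in self.entries:
            if entry_hash(prev, record) != stored:
                return False
            prev = stored
        return True


ledger = Ledger()
ledger.append({"from": "alice", "to": "bob", "amount": 100})
ledger.append({"from": "bob", "to": "carol", "amount": 40})
print(ledger.verify())  # True

# Tampering with an old record (keeping its stored hash) breaks verification
ledger.entries[0] = ({"from": "alice", "to": "bob", "amount": 1000}, ledger.entries[0][1])
print(ledger.verify())  # False
```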
TYPICAL EXAM QUESTIONS
An application needs to retain information about each user session, and the team has decided to add a layer to the application architecture to store it. Which of the options below could be used? (Select TWO)
- Sticky sessions on an Elastic Load Balancer (ELB)
- A block storage service such as Elastic Block Store (EBS)
- Amazon Redshift to store data
- A relational data store such as Amazon RDS
- A key/value store such as ElastiCache Redis
Solution: 1, 5. Sticky sessions allow the load balancer (for example, an ALB) to bind a user’s session to a specific target (EC2 instance or container), so subsequent requests from the same user are routed to the same target. This works, although a more robust solution is to store the session data in a cache such as ElastiCache Redis: any instance can then serve any user, and sessions are not lost if an instance fails or is removed. You can see these two techniques in the following image:
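The shared-cache option from this solution can be sketched as stateless instances that all read sessions from one store. Here a dict-backed `SessionStore` stands in for ElastiCache Redis, and `WebServer` plus the instance names are hypothetical; with Redis you would typically also set an expiry on each session key.

```python
import uuid


class SessionStore:
    """Stand-in for a shared ElastiCache Redis session store (a dict here)."""

    def __init__(self):
        self._sessions = {}

    def create(self, user):
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = {"user": user}
        return session_id

    def get(self, session_id):
        return self._sessions.get(session_id)


class WebServer:
    """Hypothetical EC2 instance that keeps no session data in its own memory."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        session = self.store.get(session_id)
        if session is None:
            return f"{self.name}: please log in"
        return f"{self.name}: welcome back, {session['user']}"


shared = SessionStore()
servers = [WebServer("instance-1", shared), WebServer("instance-2", shared)]
sid = shared.create("alice")  # the user logs in once
for server in servers:        # the load balancer may route to any instance
    print(server.handle(sid))
```

Because no instance owns the session, the load balancer no longer needs stickiness, and losing an instance does not log users out.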
A database currently uses an in-memory cache. We must deliver a solution that supports high availability and replication for the caching layer. Which service should we use?
- Amazon ElastiCache Redis
- Amazon RDS Multi-AZ
- Amazon ElastiCache Memcached
- Amazon Redshift
Solution: 1. Amazon ElastiCache is a web service that makes it easy to deploy and operate an in-memory cache in the cloud. ElastiCache provides two caching engines: Redis and Memcached. However, only ElastiCache Redis provides high availability and replication.
A data lake solution in Amazon S3 must analyze massive datasets from time to time (infrequent SQL queries only). Which AWS service should be used to meet these requirements if we want to minimize infrastructure costs?
- Amazon Aurora
- Amazon Athena
- Amazon Redshift
- Amazon Redshift Spectrum
Solution: 2. Amazon Athena is an interactive query service that easily analyzes data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run. This would be especially cost-effective for infrequent SQL queries, as it allows for querying directly against the data in your S3 data lake without needing a dedicated data warehouse or database infrastructure.
This is the main difference from Amazon Redshift Spectrum, which could also be used for this purpose but requires a running Amazon Redshift cluster (whose nodes run on EC2 instances), and maintaining that cluster is costly if you only run infrequent queries.
We are working on a social media application where users can be friends, like each other’s posts, and send messages to each other. Which database do you recommend for running complex queries over these relationships?
- Amazon RDS
- Amazon Redshift
- Amazon Neptune
- Amazon OpenSearch
Solution: 3. Graph databases excel at managing interconnected data and providing high performance on queries that navigate the data graph. Amazon Neptune is the AWS graph database service specifically designed for handling these datasets.
Whenever an AWS exam question about databases involves social media relationships, the answer is almost always Amazon Neptune.
More Questions?
- Do you want more than 500 AWS practice questions?
- Access to a real exam simulator to thoroughly prepare for the exam.
- You can download all of the AWS questions as a PDF.
All of this and more at FullCertified!
Thanks for Reading!