OLTP = Online Transaction Processing
OLAP = Online Analytical Processing
What does this really mean?
Every business application whether it be a point of sale (POS) system, a blog site, a logistics application, or an e-commerce web site is backed by a database. It keeps track of every transaction, order, or comment on a blog post. These databases prioritizes reliable storage and fast retrieval of a small subset of records. When a database is used in such a manner it as an OLTP database system.
When a database is primarily used for analytics, dashboards, and historical reporting they are referred to as OLAP systems. The primary difference between the two is not the underlying database software or even the data model. Instead, it’s how they are used.
Many organizations getting started with analytics will likely use their live production databases for transactions (OLTP) and reporting (OLAP). This is of course dangerous and not recommended. Queries for analytics tend to be process intensive and can crash an otherwise well functioning database. Your DBA (database administrator) should create a read replica (a copy of the live database taken at specified intervals) of your production OLTP database. In this case, is there any difference between the OLTP and OLAP database? No. Therefore, OLAP simply means a database that used primarily for analytics.
While a read replica is a good start, it will eventually fail to scale for common reporting and analytical needs. The problem is that the data model design for an OLTP systems is optimized for fast lookups of a subset of records and reliable storage. To enable fast lookups we design highly “normalized” data models. Don’t let fancy terms like normalized and third normal form (3NF) confuse you. It’s far simpler than the pros in our community make it out to be.
An OLTP data model organizes every entity in an application into its own table. For example, an e-commerce site has entities such as customer, shipping address, product, and order. All of these entities will be modeled as records in a single table. A single customer becomes a single record (row) within the customers table. Each row is typically assigned a unique ID also known as a primary key. When this customer makes an order a record is inserted into the order table with a reference to the customer’s unique ID (primary key from the customers table). The order record may also have primary key ID’s of the product, shipping address, credit card etc associated with the transaction.
This type of data organization keeps the database slim and fast. Imagine if instead we just had one large table. Every time someone buys a giant bag Costco lays chips all the details of said bag will have to be included in every order record. Things like current price, wholesale price, supplier, flavor, color, weight, brand etc etc. Tack on to that all the details about a customer, credit card, shipping and we have a wildly bloated slow database.
All of this is great for an e-commerce site but it sucks when you want to track historical sales of your top 10 products. Usually in a OLTP query you’re query one table with maybe one or two joins on average. But, for analytics an OLTP data model will require a half a dozen joins or more. Analysis tends to span many dimensions / entities. i.e. Top 10 Xbox “first person shooter” games sliced by customer zip in the last 6 months. These massive joins over large timeframes will no perform well. Imagine waiting an hour for a report. Thats nuts in todays insta world. Also, what if a customer moves and their zip code changes? Do you attribute the previous sale to the original zip or the new zip and how do you track that?
An OLAP data model (or a data warehouse data model) attempts to carefully denormalize a data model balancing performance and easy of querying. Denormalization is the process of pushing attributes from related tables directly onto the target table. Remember the zip code issue above where we needed to slice by zip and track it historically? In a denormalized model of the orders table we might add the zip code information into every record. We might even go further and add product name, product category and credit card brand. Unlike an OLTP data model, an OLAP optimized data model is more art than science. It depends largely on the analytics you want to run, the database software, and the performance expectations of your analysts. There is a lot more to data warehouse data models than described here. Stay tuned for future posts. But first, let’s wrap up by clarifying some confusion around database software.
Database software designed primarily for OLTP use cases can be used for OLAP use cases as well. However, the reverse is not true. Recall from above OLAP may imply denormalization but the label is mainly due to how the database is used. Now, on the other hand, an OLAP database design can range from full blow Relational Databases like Redshift to single table data stores in popular Business Intelligence tools (i.e. Tableau, MicroStrategy). OLAP database tend to sacrifice reliability for performance and optimize toward the selection of many rows but fewer columns.
When you see BI tools touting their OLAP capabilities, it simply means they can aggregate data quickly over a single table store.