Recently I wrote a post about the new features in 12.2. In that post I didn’t list sharding (probably one of the biggest features in this version), as I wanted to dedicate an entire post to it. In this post I will cover this option, but please note: since 12.2 has so far been released only for the cloud, no one can install it and play with it yet, so all the information here is based on presentations I saw at OOW and a chat I had with one of the developers at the demo grounds. There might be some inaccuracies or mistakes in the post.
What is Sharding?
For those of you who are not familiar with sharding, the idea is to split information between different servers in order to allow better scalability. It is important to understand that we are not talking about a cluster (such as RAC), where each node can access the same storage and the entire database. In sharding, every node (called a shard) holds only a portion of the data, so in order to get the data we are looking for we need to access the correct shard (or shards, as several shards can hold the same data for availability and better scalability), and only these shards will be able to provide the relevant data.
The data is split between the shards according to a key, using a range of values or a hash function. To explain this, let’s think about a software product that holds data for the different states in the US. We can split the data into different shards according to a hash function on the state: if I have 2 shards, each will contain roughly 25 states; if I have 10 shards, each will contain about 5; and so on (a hash gives an approximately even split, not an exact one).
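To make the bucketing idea concrete, here is a small sketch. The states table and state_name column are made up for the example, and ORA_HASH is used only to show how a hash function assigns each value to a bucket; it is not the sharding mechanism itself:

```sql
-- Made-up table and column, only to illustrate hash bucketing.
-- ORA_HASH(expr, max_bucket) maps a value to a bucket between 0 and max_bucket,
-- so with max_bucket = 1 every state lands on "shard" 0 or 1.
SELECT state_name,
       ORA_HASH(state_name, 1) AS shard_id
FROM   states
ORDER  BY shard_id, state_name;
```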
There are several benefits of the sharding architecture:
- It allows very good scalability: if I add a shard for new data, the existing shards won’t be affected by the new users, since those users will find their data only on the new shard, not on the existing ones.
- I can perform maintenance on specific shards, allowing only a “local” downtime.
- There is practically no limit on the number of shards, as there is no “real” communication between them (unlike RAC, for example, which needs to synchronize data between nodes when it changes).
- Because there is no massive data transfer between the shards, they can be located anywhere without limitations.
There is, however, a big challenge as well: the sharding architecture and the sharding key must be chosen very carefully. The sharding key affects the entire physical structure of the data, so it cannot be changed easily.
Oracle Sharding Option
Oracle 12.2 brings sharding as a database option. In this architecture Oracle uses standalone databases, each of which serves as a shard.
To make this manageable, Oracle introduced the GSM (Global Service Manager) server. The GSM holds the sharding catalog: the schema structure, the tables (some can be sharded, and some can be duplicated “global” tables whose data is available on all shards), the shard key, and information about the shards themselves.
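If I understood the demo correctly, the DDL for this looks roughly like the following (the table, column and tablespace-set names are mine):

```sql
-- A sharded table: rows are spread across the shards by a consistent hash
-- on the shard key (cust_id), stored in the tablespace set ts_set_1.
CREATE SHARDED TABLE customers
( cust_id  NUMBER NOT NULL,
  name     VARCHAR2(50),
  state    VARCHAR2(2),
  CONSTRAINT customers_pk PRIMARY KEY (cust_id)
)
PARTITION BY CONSISTENT HASH (cust_id)
PARTITIONS AUTO
TABLESPACE SET ts_set_1;

-- A "global" (duplicated) table: its full contents are available on every shard.
CREATE DUPLICATED TABLE products
( product_id NUMBER PRIMARY KEY,
  name       VARCHAR2(50)
);
```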
Creating Shards
Since the power of sharding is in the number of shards, the shards themselves are created through the GSM. A single command will run dbca on the shard hosts, create the databases, configure the schema, and so on.
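From what was shown at the demo grounds, this is driven from the GDSCTL command line of the GSM; something along these lines (the shardgroup, host and credential names are made up):

```
GDSCTL> create shard -shardgroup primary_shardgroup -destination shard_host1 -credential os_cred
GDSCTL> deploy
```

The deploy command is what actually reaches out to the remote hosts, runs dbca there and builds the shard databases.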
Managing Shards and Data
The management of the shards is also done on the GSM side: we can add and remove shards, rebalance the data between shards, and more.
On the technical side, Oracle uses a set of tablespaces, managed automatically, to hold the data in each shard. This means that the data of each shard does not reside in a single tablespace but in a set of tablespaces, which allows moving some tablespaces from shard to shard. Moving a tablespace from one shard to another is done (behind the scenes) using Data Pump transportable tablespaces, and this is how the data is rebalanced when we add or remove shards.
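The tablespace set itself is created from the catalog using a template, something like this (the name is mine again):

```sql
-- A tablespace set: a group of tablespaces created and managed as one unit,
-- over which the chunks of the sharded tables are spread.
CREATE TABLESPACE SET ts_set_1
USING TEMPLATE
( DATAFILE SIZE 100M
  EXTENT MANAGEMENT LOCAL
  SEGMENT SPACE MANAGEMENT AUTO
);
```

Rebalancing is then, if I got it right, a matter of moving chunks (and their tablespaces) between shards, with a GDSCTL command along the lines of "move chunk -chunk 7 -source sh1 -target sh2".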
Client Connections
When a client connects to the architecture, it can include the specific shard key it wants to access as additional information in the connect string. The listener will redirect the client to the correct shard, and the client will then cache this routing information for future connections.
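Based on what was presented, the connect descriptor carries the sharding key inside the CONNECT_DATA section; roughly like this (host, port, service name and key value are made up):

```
(DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=gsm_host)(PORT=1522))
  (CONNECT_DATA=
    (SERVICE_NAME=oltp_rw_srvc.cust_sdb.oradbcloud)
    (SHARDING_KEY=40598230)
  )
)
```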
When a client needs access to more than one shard, it will connect to the GSM itself and run the query there. The GSM will use database links to access the relevant shards, run the query, process the responses, and return the result set to the client.
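In other words, a cross-shard query is just a regular query run against the catalog/GSM service instead of a single shard. For example, using the hypothetical customers table from above:

```sql
-- Run while connected to the coordinator service, not to a single shard:
-- no sharding key is supplied, so the query fans out to all shards.
SELECT state, COUNT(*) AS customers_per_state
FROM   customers
GROUP  BY state;
```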
Availability
When creating the shards, we can also create standby shards. This uses the Active Data Guard feature to keep a standby database for each shard in the architecture, so we have the option to query the standby shard as well as maintain high availability in case a primary shard fails.
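In GDSCTL terms, if I remember the demo correctly, this is declared by deploying a second shardgroup as a standby; roughly like this (all names are made up):

```
GDSCTL> add shardgroup -shardgroup standby_shardgroup -deploy_as active_standby -region region1
GDSCTL> create shard -shardgroup standby_shardgroup -destination standby_host1 -credential os_cred
GDSCTL> deploy
```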
Summary
As I said above, this is only a review of the feature based on presentations and the demo grounds at OOW16. It definitely looks like an interesting feature, and I’ll be happy to look into it some more in the future.
In the meantime, if you find any error in this post, or have any question, don’t hesitate to contact me.