In this post I will show you how to use table partitioning to keep queries on a large table fast and efficient. First lets start with a definition of what partitioning is:
The type of partitioning I will be describing here is called horizontal partitioning. Horizontal partitioning is when you separate rows into different tables based on some criteria, like date range or zip code.
As usual different databases use different methods to achieve partioning. I will be detailing how to accomplish this feat using the Postgresql ORDBMS. For further research refer to the PostgreSQL manual section 5.9
OK lets jump to it. The first question one might ask is : “WHY? Why would I want to split up my data and give myself all the grief of having to add extra logic to determine what table to search and when?” The answer to that is speed. Partitioning only makes sense when you are dealing with large tables or tables that have a lot of writes to them, but when reads occur it is usually for a specific criteria. Now that you have your reasons lets eliminate the negatives. PostgreSQL is an Object Relational Database Management System. This means tables can inherit the properties (columns) from other tables, and Pg then allows you to query both the parent and the child tables from one syntax.
The basic procedure for implementing partioning is as follows:
1) create a parent table with all the columns you need. CREATE TABLE person(name varchar, age integer);
2) create a couple of child tables CRETAE TABLE adults(check age >= 18) INHERITS (person);
CREATE TABLE children(check age 3) Now we have three completely separate tables, but they are related in a special way. If you insert a row into the adults or children tables directly you can retrieve those rows by selecting from person. Watch.
INSERT INTO adults(name, age) VALUES(‘Ketema’, 29);
INSERT INTO children(name, age) VALUES(‘Jasmine’, 7);
SELECT * FROM person;
name | age
———+—–
Ketema | 29
Jasmine | 7
Even though I never inserted into person I get back both the rows from the child tables. Very cool, and very handy.
You may have been wondering what those CHECK things were when we defined the child tables. Check constraints are the way in which we define what data goes into what child partition so that it makes sense to us when we want to perform informational queries. Our example is pretty straight forward. We know all persons with an age greater than or equal to 18 will go into the adults table and everyone younger than 18 goes into the children table. What goes into person then? Nothing, ever. Thats the cool thing. The person table can be considered logical. We know both adults and children are persons, so if we want them all we just select * from the persons table. But the partitioning gives us the luxury of being able to search specifically for adults or specifically for children if we so choose.
Just like we can select from the person parent table we can also insert to it and have our data redirected to the appropiate child table based on the constraints we have defined, in our case age. This is accomplished by using either a rule or a trigger. Using a rule is definitely the simplest way.
CREATE RULE person_adult_insert AS
ON INSERT TO person WHERE
( age >= 18 )
DO INSTEAD
INSERT INTO adults VALUES (NEW.name, NEW.age);
CREATE RULE person_child_insert AS
ON INSERT TO person WHERE
( age DO INSTEAD
INSERT INTO children VALUES (NEW.name, NEW.age);
Now we can insert into person all day and our data will be automatically partioned:
INSERT INTO person VALUES (‘John’, 17);
INSERT INTO person VALUES(‘Mary’, 32);
select * from person;
name | age
———+—–
Ketema | 29
Mary | 32
Jasmine | 7
John | 17
(4 rows)
select * from children;
name | age
———+—–
Jasmine | 7
John | 17
(2 rows)
select * from adults;
name | age
——–+—–
Ketema | 29
Mary | 32
(2 rows)
As you can see our data is split up. The last point I want to illustrate is query optimization. If we take a look at the query plan for select * from person:
explain analyze select * from person;
QUERY PLAN
———————————————————————————————————
Result (cost=0.00..64.80 rows=3480 width=36) (actual time=0.005..0.009 rows=4 loops=1)
-> Append (cost=0.00..64.80 rows=3480 width=36) (actual time=0.004..0.007 rows=4 loops=1)
-> Seq Scan on person (cost=0.00..21.60 rows=1160 width=36) (actual time=0.001..0.001 rows=0 loops=1)
-> Seq Scan on adults person (cost=0.00..21.60 rows=1160 width=36) (actual time=0.003..0.004 rows=2 loops=1)
-> Seq Scan on children person (cost=0.00..21.60 rows=1160 width=36) (actual time=0.001..0.002 rows=2 loops=1)
Total runtime: 0.100 ms
As you can see three sequential scans are performed on all three of our related tables. Thats fine if you wanted all the rows. What happens if you do this:
explain analyze select * from person where age = 32;
QUERY PLAN
———————————————————————————————————
Result (cost=0.00..49.00 rows=12 width=36) (actual time=0.008..0.009 rows=1 loops=1)
-> Append (cost=0.00..49.00 rows=12 width=36) (actual time=0.007..0.008 rows=1 loops=1)
-> Seq Scan on person (cost=0.00..24.50 rows=6 width=36) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (age = 32)
-> Seq Scan on adults person (cost=0.00..24.50 rows=6 width=36) (actual time=0.005..0.006 rows=1 loops=1)
Filter: (age = 32)
Total runtime: 0.084 ms
We are looking for one row only and one that we KNOW is an adult. Notice the children table was not even considered in the plan. This is due to those constraints we defined earlier and a feature called CONSTRAINT EXCLUSION. This is a very simplistic example, but you can easily see how with a large data set partitioning can give great benefits for query speed. If indexes are created on commonly searched columns even greater enhancements can be made.
I hope you enjoyed Partitioning with PostgreSQL!