Over the years, I have learned that one thing that I really love to do is performance tuning… especially on a database. I started learning database design and development before I even stepped foot into indexes and I really wish that I had really learned how a database worked before I just jumped into SQL. I have done a lot of research and testing across several different databases and have learned a lot of ways to speed up queries. Database design is, of course, essential. However, if I had to choose the number 1 way to effectively speed up your database and properly handle your data, I would definitely choose to employ great index design (and maintenance).

Indexes are often overlooked, but are crucial when it comes to database performance. Anyone who has ran a query that takes more than a second should look into indexes. I have had pages go from being over 3,500ms to less that 50ms all due to proper index implementation.

Consider, you were looking for a check that you wrote recently. You need to know the check number, the date that it was written, the recipient, memo line and the total amount. We can assume that the unique identifier for this situation would be your check number. Without proper indexes, you are telling your database to start from the first check that you have ever written your entire life. No one would look for a check in this manner. Also, you would typically start looking at more recent checks and not checks from 2-3 years ago. The order in which you retrieve data is also important when you are creating your indexes. So, since we know we will pull check information based on the most recent check numbers first, it only makes sense to have one index on the checkNumber column, remembering to select descending order (by default, most databases will create ascending which is rarely how most data-driven systems need to access data).

Now, I want to consider another situation where we may want a new index on our check book information. Let’s assume we received our bank statement for last month and it is not balancing correctly. We want to look through our checks for that month to see what the problem is. With our last index (descending CheckNumbers), we we would actually start looking at our last check first. So, if we had written 100 checks since the last bank statement, we would be looking through 100 records that we knew we didn’t need anyway. This is why it would be good to put an index on our date field. Now, assuming that we would be more interested in more current data than data that could be decades old, I would create the date index descending as well. There is actually nothing wrong with having multiple indexes on the same column (one ascending and one descending). It all depends on how you plan on using your data. I know that most databases will typically choose the appropriate index (please comment if you know otherwise), but you can also force your sql statement to use the index you know will be accurate. Keep in mind that with growing data, you also have growing indexes… meaning an index that is great today may not be so great a year from now.

You should always consider what data you will filter on and that is what you will most likely need indexes on. I understand that you should always be aware of your environment and application; there is truly no need to have a dozen indexes on a table that is extremely small or not used frequently. I think the best rule of thumb is to simply be on the lookout for issues and understand when implementing the index could resolve your issue.

Another good way to look at indexes is to think of a book. The database is a book full of pages. The indexes separate a column into pages. For example, say you have the letters of the alphabet to look through. That would be 26 records. Say you create an index descending and an index ascending on the alphabet. For this example, let’s assume that we will store 10 records on each index page. So, it would have 3 pages (26/10 = 2 full pages + 1 page with 6 records). It would look something like this:


Index Page 1 Index Page 2 Index Page 3
A
B
C
D
E
F
G
H
I
J

K
L
M
N
O
P
Q
R
S
T

U
V
W
X
Y
Z

So, in our example, let’s say that we want to look for the letter Q. Without an index, our database would look through every record to find what it needed, even after it located Q. With the proper data type and the index created above, the database would read the pages similar to a person flipping through a dictionary. If you knew you were looking for “quick” in a dictionary, you wouldn’t start on the page that has A – J (like Index Page 1), you would skip to the page that had K – T (like Index Page 2) and look for the Q’s. You would skip any other pages after you find the word “quick”, because you know that “quick” will never be found ascending (or alphabetically in this example) again.

Another thing to keep in mind is that records are stored from first to last. So, consider you are filtering queries based on the last name of a customer. Those last names could be in any kind of order in your table. Let’s assume we have a Customers table similar to this:

Last Name First Name
Jones Jim
Cook Debra
Smith James
Jones Timothy
West Joe

Notice, they are stored from top to bottom as Jones, Cook, Smith, Jones, West. If your database has to look through those records from beginning to end it will have to go through every single record in the database. So, if you were looking for Mr. Cook, the database would look at all 5 records above. Starting at Jones, then Cook, Smith, Jones and finally West, even though it already found Cook in the second record. Would you look through an entire address book looking for a contact that you already found on the first page of your address book?

Now, consider we have all last names in an index organized ascending. Now, they will look more like this:

Last Name First Name
Cook Debra
Jones Jim
Jones Timothy
Smith James
West Joe

Now, when the database scans those pages you can see that it will only have to look at the first record. As soon as it found Cook and realized that Jones was the next in the list it would know that it had found the only Cook even though it hadn’t even looked through the rest of the records.

There are other factors in some database index design, such as padding (which would basically just leave room on the page), that I did not cover in this post. Hopefully, I was able to give a few examples that may be helpful for you. The simplest guidance I could probably offer is to think of index pages the same way you would think of flipping through pages of a book. When you know how a book is organized you know exactly where to find what you are looking for. A database is the same way. No matter how much data it holds, as long as it knows where to look it can be unbelievably “quick”. :-)

Always consider using databases for YOUR CLIENTS/CUSTOMERS and not for YOURSELF. It doesn’t matter what you think about them, if it can help your customers application or your own application’s users, then I would suggest using them. I once programmed an application that had years worth of data. I had implemented indexes on the development database server and on the testing database server, however, when the application launched it was SUPER SLOW! The first thing I considered was it had to be an index issue. I wondered what I did wrong and when I went to look at the production/live database, I realized that the indexes didn’t exist. I questioned the “DBA” about the indexes and they said “Well, they are searching for two years worth of data… they should know it is going to take long!!” Of course, they would expect it to take long, but my philosophy is that your job is not to meet their expectations… it is to exceed them!

There are other factors in some database index design, such as padding (which would basically just leave room on the page), that I did not cover in this post. In the future, I will try to go into index maintenance. Please feel free to write or comment with any questions, corrections, better methods, etc.

Goodnight!

I while back I wrote a messy bit of JavaScript for handling credit cards and routing numbers on a web portal that allowed customers to pay their recurring bill online. It was efficient, but I cannot for the life of me find it. I wrote it at work and I suppose I didn’t think to save it.

Anywho… I decided to write a cfc. It is not finished by any means, but wanted to put it out here to see if anyone had any feedback. If you are new to CFC’s and don’t quite understand, I’ll be posting some more details in the future, but for now this would be for users comfortable with cfc’s.

Feedback welcomed and encouraged.

<cfcomponent>
<cffunction name="checkCreditCard" access="remote" returntype="string">
  <cfargument name="cardNumber" type="numeric" required="yes">

  <cfif left(arguments.cardNumber,1) eq 4>
    <cfset cardType = 'VISA'>
    <cfset totalDigits = 16>
  <cfelseif listFind('51,52,53,54,55',left(arguments.cardNumber,2))>
    <cfset cardType = 'MASTERCARD'>
    <cfset totalDigits = 16>
  <cfelseif listFind('34,37',left(arguments.cardNumber,2))>
    <cfset cardType = 'AMERICANEXPRESS'>
    <cfset totalDigits = 15>
  <cfelseif left(arguments.cardNumber,4) eq 6011>
    <cfset cardType = 'DISCOVER'>
    <cfset totalDigits = 16>
  </cfif>
<!---
MASTERCARD:51-55,  16
VISA:  4|13,  16
AMEX:  34|37,  15
Discover:  6011, 16
--->
  <cfset validCardTypes = "Visa, MasterCard, AmericanExpress, Discover">

  <cfif listFind('3,4,5,6',left(arguments.cardNumber,1)) eq 0>
    <cfset returnError = "This card type is not supported.  Please use one of the following:  #validCardTypes#">
  <cfelse>

    <cfif len(arguments.cardNumber) neq totalDigits>
      <cfset  errorMsg = 'The #cardType# that you entered in not valid.  Visa card numbers are #totalDigits# long and you entered a #len(arguments.cardNumber)# digit number.'>
    <cfelse>
      <cfset myCount = 0>
      <cfloop from="1" to="#totalDigits#" index="i">
        <cfif i mod 2 neq 0>
          <cfset  myCount = myCount + i>
        <cfelseif i mod 2 eq 0 and i neq 16>
          <cfset  tempNum = 2 * i>
          <cfif len(tempNum) eq 2>
            <cfset  tempNum = left(tempNum,1) + left(tempNum,2)>
          </cfif>
          <cfset  myCount = myCount + tempNum>
        </cfif>
      </cfloop>

    <cfif myCount mod 10 neq 0>
      <cfset  errorMsg = 'The #cardType# that you have entered is incorrect.  Please check the number and try again.'>
    </cfif>

  </cfif>  <!--- /if len(arguments.cardNumber) neq totalDigits --->
  </cfif>  <!--- /if listFind('3,4,5,6',left(arguments.cardType,1)) eq 0 --->

<cfreturn errorMsg>
</cffunction>
</cfcomponent>

“Criss Crossing Variables” was a poor way to ask someone the best way to do this, but just talking about it helped me solve my problem. I like talking to people (and by people, lately I mean Ian) about the problem that I am having because it makes me explain by problem. So, anywho… the problem is that I wanted to set some ColdFusion local variables that I could use later in the page when I was passing arguments into a JavaScript function.

I know that I could use DHTML to pass hidden values into my html and I don’t even want to get into why I needed this to be done this way… but you will see the basic use. So, let’s just say that you need to set the current page title. You can get this using the JavaScript “document” object. You can use a lot of CGI variables to get server specific information, like cgi.http_host (the domain), query_string (list of your url variables), but you cannot get or use information from the DOM.

<cfset var myPageTitle = “<script language=’javascript’ type=’text/javascript’>document.write(document.title);</script>”>

Isn’t that cool? Maybe it was obvious to some, but I found it pretty dang useful today and thought some of you may as well.

It’s vs. Its

Emails vs. Email’s

Jones’ vs. Joneses vs. Joneses’

I have seen them all. Sure, I have made grammatical errors before, but it is ridiculous how rotten the grammar is of the majority of internet users. Come on, people… learn the simple rules in apostrophe usage. I’ll not even get involved in spelling issues in this post, but it’s coming… not ITS not ITS’, but IT’S coming!!

Easy rule #1: An Apostrophe To Join Words

Can’t = Can not
Don’t = Do not
Isn’t = Is not
It’s = It is

Easy rule #2: To Denote possession
Consider one person, place or thing owning something.

Christie’s book
The dog’s toy
Boston’s visitors
The company’s employees
An employee’s check is missing (meaning one employee has a missing check)

Easy rule #3: Plural possession
Same rules as #2, except for usage with more than one owner. For example, two or more people, places or things.

The dogs’ toys were donated (For a group of dogs)
The employees’ checks (A group of employees)
The employees’ checks are missing (A group of employees are missing their checks)

Finally, NEVER use an apostrophe to denote plurals, but you can use to denote plural possessions as mentioned above.

Listen, I mess up commas and apostrophes often… this I know. I misspell things on occasion as well. I am a little embarrassed when I notice that I posted a blog, comment, message, etc. with improper grammar and punctuation. Oddly enough, I even get embarrassed for other people as well. I didn’t post this to be rude or critical of users. I just thought it may be a good resource for users trying to find out the proper grammatical use of apostrophes.

Please feel free to comment to let me know what I screwed up!! :-)

♥ Christie.

I was trying to think of some way to pull data from a database to determine certain style attribute/value pairs that I would use on any given page. Typically, I have a default.css and maybe more xxx.css that I will deploy, but I thought of actually having my stylesheet in a cfm file and then letting cf change the document type in the stylesheet.cfm page. This actually works great (using a cfm stylesheet), so I thought I would share. I will discuss two methods, but each method only varies by one line of code… nothing tricky at all.

I know that I could “get around” this by using coldfusion local variables in hidden html elements and then onLoad a javascript function that would read this variable and use dhtml to update styles… but it’s a work around and well, quite frankly, this is cooler and takes less time. Don’t get me wrong… I love javascript, but I’ve been resorting to tricking browser into doing what I want by mucking around in the DOM. I was just looking for another way to do it on the server-side and not the client-side. Now that I’m working on more external web apps, I need to consider that I cannot always control the client-side.

When you are working on an isolated application, especially in a corporate controlled environments it is easy to overlook all of the variables of web browsing. Different browsers, different Operating Systems, version control and compatibility issues smack you in the face when you look at a database and know that there are people in Italy, Canada, all over who are using the app and not just the people on the second floor or across the building. It’s not that quality isn’t important… you just have different priorities, because you know that your client side is controlled. I also like debugging though, so I don’t mind issues for me to solve… but for users, visiting a web site for the first time is like test driving a car; if things go run, they may choose to walk away and never (ever) give it another chance (and bitch about it on other websites or in their blog!! Ha!)

Ok, onto the example. There will be two files that you will need to adjust to have coldfusion server side stylesheet control: the file that you reference your .css document (just change the .css file to the .cfm file) and of course create or convert your stylesheet.cfm file (or whatever you choose to call it).

So, first, of course, you will need to make sure that you point to your new stylesheet.cfm file instead of your whatever.css file in your the <link> tag (because I KNOW that none of you still put your CSS code in the same document; you all follow xhtml standards and current basic web application engineering standards, RIGHT?!) So, like this:

<link rel=”stylesheet” type=”text/css” href=”stylesheet.css” />

Next, onto your actual stylesheet.cfm… you’ll need to add either a cfcontent or a cfheader tag to the top of that file to make sure that cf server knows to push the correct document type to the browser. To use cfcontent:

<cfcontent type=”text/css” />

I know, I know… that was too difficult wasn’t it? Kidding… onto the next method, cfheader, where you will need to add:

<<cfheader name=”content-type” value=”text/css” />>

Also, of course, below that you can use any coldfusion conditional logic to set and use any combination of attribute value pairs or named classes that you wish.

Until next time… remember, just because they said it’s kosher doesn’t mean you should have it for dinner… is there such thing as kosher meth… kosher crack? Does the little Orthodox Union brand their logo onto little, tiny crack rocks and those little mini zip lock baggies they use for dope?

“Criss Crossing Variables” was a poor way to ask someone the best way to do this, but just talking about it helped me solve my problem. I like talking to people (and by people, lately I mean Ian) about the problem that I am having because it makes me explain by problem. So, anywho… the problem is that I wanted to set some ColdFusion local variables that I could use later in the page when I was passing arguments into a JavaScript function.

I know that I could use DHTML to pass hidden values into my html and I don’t even want to get into why I needed this to be done this way… but you will see the basic use. So, let’s just say that you need to set the current page title. You can get this using the JavaScript “document” object. You can use a lot of CGI variables to get server specific information, like cgi.http_host (the domain), query_string (list of your url variables), but you cannot get or use information from the DOM.

<cfset var myPageTitle = “<script language=’javascript’ type=’text/javascript’>document.write(document.title);</script>”>

Isn’t that cool? Maybe it was obvious to some, but I found it pretty dang useful yesterday and thought some of you may as well.

This year has made way for several changes in my life at home. Some good, some eh, but for the most part things are rock n’ kick arse. We have had a few new additions to the family; by family I mean our home network and home offices. You may as well consider them members of your family when on some days your hands are touching them more than your spouse, right? (Or maybe that means we call them Mistresses?)

A few months ago we had to reorganize our office and music room into two offices (one just happens to have 3 guitars hanging on the walls). During some of these changes, I became so frustrated looking for software discs (some burns, I admit… backups of my personal software *of course*). Ahem… anywho, in the midst of my exhausting efforts and my temper tantrums (like slamming my new (two H keys and no J key) keyboard around), Ian has to lean back and say snidely “See, now that is what you need. I needed a new version of my operating system and I just clicked a button. I haven’t had any problems with my software yet.” He uses Ubuntu and I knew that he was right (the new, cute, tattooed hot chica spread across to his newly installed second monitor gave it away earlier. He had setup his new desk in the adjoining office *and* had time to find a new wallpaper.)

So, I decided about 5 minutes later (I had to clean up the water that I spilled when slammed my keyboard down) that I was ready to make the move to downloaded, freeware, open source, self-updating software.

I know that some new users are so brainwashed by Microsoft that they feel like they HAVE to use Microsoft products and that means paying for them. Many of these same users think if it’s free, it must be illegal and that is simply not the case. I remember trying to convince an unnamed family member to switch from IE to Firefox a few years ago after she had a horrible two day battle with viruses, spyware, adware… you name it. “Well, I don’t feel comfortable using that stuff. I don’t want to get in any trouble.”

I felt like I just ask her to grow some marijuana in her back yard. So, for all of your out there who feel the need to kit the habit and end your addiction: here are some completely legal and actually pretty good alternatives to some of the software you may use (and I promise the PoPo can’t break down your door with a SWAT team on hand to take you in just because you were checking your email on Thunderbird).

I will break this up into sections based on recommendations for tools based on need.

DOCUMENTS, SPREADSHEETS, PRESENTATION
(Alternatives for Microsoft Office Word, Excel, PowerPoint, Access, etc)

Most computers will come with 60-90 day free trials and then you are guilted into purchasing. Instead, check out the online option Google Documents or the plethora of “weblications” offered by Zoho. You can create your documents, spreadsheets and presentations through these online services… this also means that you can access your files from anywhere with internet access. Both of these options are great alternatives for some small businesses and schools as well.

Google Apps is also super easy to setup and use to give access to all of your users for unlimited email, calendar usage, static webpages, private/shared documents. Another option is OpenOffice. They offer tools for document/word processing, spreadsheets, presentations, databases and graphics which are all easy to install and easy to use.

I know that there will be some learning curve present, but Google can become your best friend with Open source software. If you have a problem or don’t know how to do something, just Google it… or you can look in each options Help, FAQ or Forums for solutions.

CODE EDITTING / DEVELOPING

Everyone who needs these tools are probably already aware of these tools, but there is absolutely no need to sink money into programming IDE’s when Eclipse is out there for free. You can even design and use plugins to make it even more compatible for whatever project you are working on. An example is CFEclipse (cfeclipse.org), which allows Eclipse to integrate auto-completion, help menus, etc. for Coldfusion programmers. As much as I love Dreamweaver, the new CS3 is $399 new! That is ridiculous when Eclipse is free, open source and considered to be one of the better programming IDE’s out there. Aside from it’s cost (or lack there of), it is fast, reliable, and easy to use.

GRAPHICS / PHOTO EDITING AND MANIPULATION

Okay, I’m all too aware that everyone raves about Photoshop, but many basic users do not need most of the features that Photoshop has to offer. A typical user and many advanced users can try the free and open source, feature friendly Paint.NET. Unless you are a graphic designer or regularly doing massive amounts of graphics work or photo manipulation, you will find that you don’t need the uber-expensive graphics tools. Just because they are “the best” doesn’t mean they are “the best for you”. Try free or inexpensive options before sinking hundreds into Photoshop.

Of course Picasa by Google is another great tool that is free to download, easy to use and great for integrating with your webpage or creating a free online web album. For Vista users, the Picture Viewer actually has some great additions for photo editing. You can now crop photos, adjust brightness, alter color levels and remove red eye right from your Windows Picture Viewer.

There are so many options available for many users… all you need to do is look for them. Don’t buy the first thing that you hear of people say they use it or its the best (remember, just because you overheard someone tell you that Rachel’s Steakhouse & Entertainment has the “best” desert in town, doesn’t mean you should go there with your mother-in-law without checking it out for yourself first.)

I hope that you consider trying some freeware and open source solutions… when you see that they are worth it, you can alway honor the developers and contributors with an online donation (which is typical with most open source projects)… even a $50 donation will be better spent and more appreciated that sending $300-500 to Microsoft or Adobe.

Enjoy and please post comments about other freeware and open source software projects that you have found that are equally as good (or better) use than their competitors.

Remember the first step is admitting that you have a problem.

Until next time…

Christie.

Let’s say you’ve decided to finally update your website that has been sitting untouched gathering “spider” “webs”… get it… spider? webs? Oh, forget it… you people have no sense of humor anymore… you wouldn’t laugh even if we tcl/tk’ed you!? Get it, get it… TCL!! OK, OK… I’m finished for real this time. :-)

So let’s say that you really do want to finally work on your website, but you decide to be the sloppy coder that we all can be at times and you just want to code straight from the source code in your live/production environment. And, of course, you are perfect and will make absolutely no mistakes between saving your changes line by line, so your visitors will be completely unaware that any changes are even being made. Ok… now THAT was funny. We all make mistakes and the last thing you want to do (or at least the last thing “I” want you to do) is gift more damn errors to the “InterWeb”, so for the sake of a good example, let’s create a “Under Construction” page the we will have your visitors redirected to while you are working on your masterpiece.

Create the file hangOnASec.html with something simple… just a little image or message letting your visitors know that you are currently working on your website and to check back soon. You could even ask them to leave their email address, so that once you’re finished, you can send out a mass email letting everything know that you are indeed finished. See File 1.0 for your first addition for this tutorial.

File 1.0 – hangOnASec.html

“Thank you for visiting EzMoFo.info. Being the kick ass person that I am have decided to update my website for you. Anywhoozle, I’ll be finished soon, but while I’m working on the updates I just thought I would forward you to this message so you wouldn’t be plagued with nasty error messages that we all HATE! This message will go away once I’m finished with the updates. Check back later to see that new site!”

Next, you’ll just need to go to your typical default page to add a simple line of code that will redirect your visitors to the message that you have left for them at “hangOnASec.html”. You don’t need to comment out any code or delete anything… just place these few lines at the main top of the page and you can leave everything else alone if you want.

First let’s look at how to do this in PHP… it is SUPER easy and you’ll find many different scenarios when this will come in handy for you. I’m glad I’m able to share and hopefully can help something who’s trying to learn them some PHP!  It’s as easy as Paris Hilton was in “A Night In Paris”, all you gotta’ do is add the following little blurb of code:

<?php
header(“Location: http://www.ezMofo.info/hangOnASec.html”);
?>

So, so easy, right? It simply relocates the browser to the new page, which in this example is “http://www.ezMofo.info/hangOnASec.thml”.

Want to do the exact same thing in ColdFusion? Again… easier than Paris! See below:

<cflocation url=”hangOnASec.html” addtoken=”no”>

Let us know if you have any problems or comments… or if you have any other ideas or suggestions for more efficient ways to do the same thing. Have other languages that you want to share tricks with us? We would, of course, love to hear from you and see your code examples. Leave us a comment!

Here is a trick that I always forget despite how handy it is. Often I want to have a count of certain records I am dealing with as columns in a query that returns more than each individual count. Lets put that in English with an example:

Say I have a table people that contains (name, age). I want to get all of the rows out of this table, along with a count of people who fall into certain age groups, lets say >= 18 and
select people.*,
(select count * from people where age >= 18) as adults,
(select count * from people where age from people;

If you look at the query plan for this type of query you’ll see that it goes through the table three separate times! Not good.

You can get the same results using a case statement like so:

select people.*,
count(case when age >= 18 then age end) as adults,
count(case when age from people
group by “name”, age;

This query only goes through the table one time and give you row specific information, meaning it tells you what category, name and age once! (or one record at a time).

—–Post from outside user—–

RECOMMENDATION FROM CHRISTIE:

I recommend never using select * or count(*)… I know that this was just an example, but you should always consider your indexes, records, and tables before doing a select *. Even doing counts, you can always count by the primary key which will always have an index (or at least ALWAYS SHOULD!!! Ha!)

Many people still use “SELECT *”, but I would recommend only selecting the column names that you need “returned”, not that you are working with. For example, many people may include columns in their SELECT statement that they would include in their WHERE clause… this is simply not the case. Your SELECT columns are only necessary if you intend to use them in output or in further functions locally.

Again, I know this was just an example and I even use SELECT * in examples from time to time, but I just wanted to throw in the tip for a real world example, as I have seen time and time again experienced programmers still using SELECT *.

In this post I will show you how to use table partitioning to keep queries on a large table fast and efficient. First lets start with a definition of what partitioning is:

Database Partitioning

The type of partitioning I will be describing here is called horizontal partitioning. Horizontal partitioning is when you separate rows into different tables based on some criteria, like date range or zip code.

As usual different databases use different methods to achieve partioning. I will be detailing how to accomplish this feat using the Postgresql ORDBMS. For further research refer to the PostgreSQL manual section 5.9

OK lets jump to it. The first question one might ask is : “WHY? Why would I want to split up my data and give myself all the grief of having to add extra logic to determine what table to search and when?” The answer to that is speed. Partitioning only makes sense when you are dealing with large tables or tables that have a lot of writes to them, but when reads occur it is usually for a specific criteria. Now that you have your reasons lets eliminate the negatives. PostgreSQL is an Object Relational Database Management System. This means tables can inherit the properties (columns) from other tables, and Pg then allows you to query both the parent and the child tables from one syntax.

The basic procedure for implementing partioning is as follows:
1) create a parent table with all the columns you need. CREATE TABLE person(name varchar, age integer);
2) create a couple of child tables CRETAE TABLE adults(check age >= 18) INHERITS (person);
CREATE TABLE children(check age 3) Now we have three completely separate tables, but they are related in a special way. If you insert a row into the adults or children tables directly you can retrieve those rows by selecting from person. Watch.

INSERT INTO adults(name, age) VALUES(‘Ketema’, 29);
INSERT INTO children(name, age) VALUES(‘Jasmine’, 7);
SELECT * FROM person;

name | age
———+—–
Ketema | 29
Jasmine | 7

Even though I never inserted into person I get back both the rows from the child tables. Very cool, and very handy.

You may have been wondering what those CHECK things were when we defined the child tables. Check constraints are the way in which we define what data goes into what child partition so that it makes sense to us when we want to perform informational queries. Our example is pretty straight forward. We know all persons with an age greater than or equal to 18 will go into the adults table and everyone younger than 18 goes into the children table. What goes into person then? Nothing, ever. Thats the cool thing. The person table can be considered logical. We know both adults and children are persons, so if we want them all we just select * from the persons table. But the partitioning gives us the luxury of being able to search specifically for adults or specifically for children if we so choose.

Just like we can select from the person parent table we can also insert to it and have our data redirected to the appropiate child table based on the constraints we have defined, in our case age. This is accomplished by using either a rule or a trigger. Using a rule is definitely the simplest way.

CREATE RULE person_adult_insert AS
ON INSERT TO person WHERE
( age >= 18 )
DO INSTEAD
INSERT INTO adults VALUES (NEW.name, NEW.age);

CREATE RULE person_child_insert AS
ON INSERT TO person WHERE
( age DO INSTEAD
INSERT INTO children VALUES (NEW.name, NEW.age);

Now we can insert into person all day and our data will be automatically partioned:

INSERT INTO person VALUES (‘John’, 17);
INSERT INTO person VALUES(‘Mary’, 32);

select * from person;
name | age
———+—–
Ketema | 29
Mary | 32
Jasmine | 7
John | 17
(4 rows)

select * from children;
name | age
———+—–
Jasmine | 7
John | 17
(2 rows)

select * from adults;
name | age
——–+—–
Ketema | 29
Mary | 32
(2 rows)

As you can see our data is split up. The last point I want to illustrate is query optimization. If we take a look at the query plan for select * from person:

explain analyze select * from person;
QUERY PLAN
———————————————————————————————————
Result (cost=0.00..64.80 rows=3480 width=36) (actual time=0.005..0.009 rows=4 loops=1)
-> Append (cost=0.00..64.80 rows=3480 width=36) (actual time=0.004..0.007 rows=4 loops=1)
-> Seq Scan on person (cost=0.00..21.60 rows=1160 width=36) (actual time=0.001..0.001 rows=0 loops=1)
-> Seq Scan on adults person (cost=0.00..21.60 rows=1160 width=36) (actual time=0.003..0.004 rows=2 loops=1)
-> Seq Scan on children person (cost=0.00..21.60 rows=1160 width=36) (actual time=0.001..0.002 rows=2 loops=1)
Total runtime: 0.100 ms

As you can see three sequential scans are performed on all three of our related tables. Thats fine if you wanted all the rows. What happens if you do this:

explain analyze select * from person where age = 32;
QUERY PLAN
———————————————————————————————————
Result (cost=0.00..49.00 rows=12 width=36) (actual time=0.008..0.009 rows=1 loops=1)
-> Append (cost=0.00..49.00 rows=12 width=36) (actual time=0.007..0.008 rows=1 loops=1)
-> Seq Scan on person (cost=0.00..24.50 rows=6 width=36) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (age = 32)
-> Seq Scan on adults person (cost=0.00..24.50 rows=6 width=36) (actual time=0.005..0.006 rows=1 loops=1)
Filter: (age = 32)
Total runtime: 0.084 ms

We are looking for one row only and one that we KNOW is an adult. Notice the children table was not even considered in the plan. This is due to those constraints we defined earlier and a feature called CONSTRAINT EXCLUSION. This is a very simplistic example, but you can easily see how with a large data set partitioning can give great benefits for query speed. If indexes are created on commonly searched columns even greater enhancements can be made.

I hope you enjoyed Partitioning with PostgreSQL!

Next Page »