<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title><![CDATA[Category: Postgres | Craig Kerstiens]]></title>
  <link href="http://www.craigkerstiens.com/categories/postgres/atom.xml" rel="self"/>
  <link href="http://www.craigkerstiens.com/"/>
  <updated>2013-05-07T10:11:01-07:00</updated>
  <id>http://www.craigkerstiens.com/</id>
  <author>
    <name><![CDATA[Craig Kerstiens]]></name>
    
  </author>
  <generator uri="http://octopress.org/">Octopress</generator>

  
  <entry>
    <title type="html"><![CDATA[My SQL Bad Habits]]></title>
    <link href="http://www.craigkerstiens.com/2013/05/06/my-sql-bad-habits/"/>
    <updated>2013-05-06T00:00:00-07:00</updated>
    <id>http://www.craigkerstiens.com/2013/05/06/my-sql-bad-habits</id>
    <content type="html"><![CDATA[<p>I'm reasonably proficient at SQL – <em>a coworker, when I was pseudocoding some logic for him, pointed out that he'd taken my pseudocode for executable SQL</em>. I'm fully capable of writing clear and readable SQL – which most SQL is not. Despite that, I still have several bad habits when it comes to SQL. Without further ado, here's some of my dirty laundry, so hopefully others can avoid making the same mistakes.</p>

<!--more-->


<h3>Order/Group by Column Numbers</h3>

<p>When quickly iterating on a query, it's a lot less typing to put the column number as the thing you want to order by. Here's a quick lightweight example:</p>

<pre><code>SELECT
  email,
  created_at
FROM 
  users
ORDER BY 2 DESC
LIMIT 5;
</code></pre>

<p>This gives me my last 5 users that have signed up for my site. Of course as soon as I have this I may want to add some data to it, like their first name so I can send them a welcome email. I quickly alter the query to:</p>

<pre><code>SELECT
  email,
  first_name,
  created_at
FROM 
  users
ORDER BY 2 DESC
LIMIT 5;
</code></pre>

<p>And now I have 5 users ordered by their first name instead of the last 5 to sign up. Sure, it's obvious when you're ordering by 1 column, but when you have <code>GROUP BY 1, 2, 3, 4, 5, 6</code>, which is actually open in one of my tabs right now, it's a bit more confusing...</p>

<p><em>Though if you really want to have some fun, share a query with someone that looks something like this:</em></p>

<pre><code>SELECT
  email as "3",
  first_name "2",
  created_at "1"
FROM 
  users
ORDER BY "1", "3" DESC
LIMIT 5;
</code></pre>

<h3>Implicit Joins</h3>

<p>I seldom use the syntax <code>INNER JOIN</code>. Instead I simply put the tables in my <code>FROM</code> clause and make sure I have a matching <code>WHERE</code> condition. The problem with making sure I have a where condition is that sometimes I don't, especially when dealing with 3 or more tables.</p>

<pre><code>SELECT 
  email,
  items.name,
  items.price
FROM 
  users,
  orders,
  items
WHERE users.id = orders.user_id
  AND orders.id = items.order_id
</code></pre>

<p>Is less clear (especially when dealing with 5-6 tables) than the alternative:</p>

<pre><code>SELECT 
  email,
  items.name,
  items.price
FROM users
INNER JOIN orders on users.id = orders.user_id
INNER JOIN items on orders.id = items.order_id
</code></pre>

<h3>Lack of comments</h3>

<p>I comment my SQL far less than I comment my code, yet it can be done just as easily. For example I have this in one of my queries:</p>

<pre><code>SELECT convert_from(CAST(E'\\x' || array_to_string(ARRAY(
   SELECT 
     CASE 
       WHEN length(r.m[1]) = 1 
     THEN encode(convert_to(r.m[1], 'SQL_ASCII'), 'hex') 
     ELSE substring(r.m[1] from 2 for 2) 
     END
  FROM regexp_matches(url_here, '%[0-9a-f][0-9a-f]|.', 'gi') AS r(m)
), '') AS bytea), 'UTF8');
</code></pre>

<p>While this has its own issues, there's no documentation of what it actually does. In contrast:</p>

<pre><code>--- DECODES url ---
SELECT convert_from(CAST(E'\\x' || array_to_string(ARRAY(
   SELECT 
     CASE 
       WHEN length(r.m[1]) = 1 
     THEN encode(convert_to(r.m[1], 'SQL_ASCII'), 'hex') 
     ELSE substring(r.m[1] from 2 for 2) 
     END
  FROM regexp_matches(url_here, '%[0-9a-f][0-9a-f]|.', 'gi') AS r(m)
), '') AS bytea), 'UTF8');
</code></pre>

<p>Comments also work well inline at the end of a line.</p>
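
<p>As a minimal sketch of that, here is the first query from this post again with trailing <code>--</code> comments (the wording of the comments is only illustrative):</p>

<pre><code>SELECT
  email,
  created_at              -- signup timestamp
FROM
  users
ORDER BY created_at DESC  -- newest signups first, named rather than numbered
LIMIT 5;                  -- only the five most recent
</code></pre>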

<h3>Large Manually Generated Lists</h3>

<p>A lot of times when working with a specific data set I'll manually or automatically generate a list that I want to filter on. A common example is filtering out staging/dev environments. I'll often manually search and prune the list, then save that result for the queries I'm going to build going forward. This is a bit of effort but still feels reasonable. The downside is that it results in something like:</p>

<pre><code>SELECT 
  foo
FROM 
  bar
WHERE 
  bar.id NOT IN (34723, 42735, 32321, 47205, 20375, 30261, 26194, 109371, 9313, 6351, 20184, 50273, 34735, 39854, 23954, 25323, 23405, 30528, 50182, 29340, 47659, ... and the list goes on)
</code></pre>

<p>SQL is perfectly reasonable for holding some level of logic. Data changes, and hard-coded keys are going to bite you at some point, so spend the extra effort and re-use something that's clear.</p>
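
<p>One way to do that, sketched here under the assumption that the IDs to skip live in a table of their own (<code>excluded_bars</code> and its columns are hypothetical names used only for illustration):</p>

<pre><code>-- Hypothetical lookup table holding the IDs to leave out, and why
CREATE TABLE excluded_bars (
  bar_id integer PRIMARY KEY,
  reason text NOT NULL        -- e.g. 'staging', 'dev'
);

SELECT
  foo
FROM
  bar
WHERE NOT EXISTS (
  -- skip any row of bar that has an entry in the exclusion table
  SELECT 1
  FROM excluded_bars
  WHERE excluded_bars.bar_id = bar.id
);
</code></pre>

<p>The list then changes with an <code>INSERT</code> or <code>DELETE</code> rather than an edit to every saved query.</p>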

<h3>What else</h3>

<p>I'm sure there's plenty more; I suspect within a few minutes of sitting down with someone they could point out some other bad habits. While I know at least some of mine, I usually also know the trade-off I'm making. What are yours? I'd love to hear them and document them for others so hopefully they can avoid developing the same bad habits. Let me know: <a href="mailto:craig.kerstiens@gmail.com">craig.kerstiens@gmail.com</a></p>

<!-- ### In, Subqueries and Lots of Data

Its really easy to build up a huge list of users then filter something else based on that list of users for if they're not in it. Its also really shitty on performance in most cases. A good example might be if I have 100,000 users on my site but want to find which ones have never made a purchase. Part of this results in knowing your data, but if only 10k have never made a purchase this can give you pretty bad results by doing: The quick and dirty way to do this might be:

    SELECT 
      count(*)
    FROM 
      users
    WHERE 
      user_id NOT IN 
      (
        SELECT user_id
        FROM orders
      )

-->

]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Using array_agg in Postgres – powerful and flexible]]></title>
    <link href="http://www.craigkerstiens.com/2013/04/17/array-agg/"/>
    <updated>2013-04-17T00:00:00-07:00</updated>
    <id>http://www.craigkerstiens.com/2013/04/17/array-agg</id>
    <content type="html"><![CDATA[<p>In almost any application it's common to want to aggregate some set of values together, commonly in a comma separated form. Most developers do this by running a query to get much of the raw data, looping over the data and pushing it into a set, appending each new value to the appropriate key. Hopefully, it's not a surprise that there's a much better way to do this with PostgreSQL.</p>

<p>Postgres has a flexible and robust <a href="/2012/08/20/arrays-in-postgres/">array datatype</a> that comes with a variety of functions. Even without taking advantage of the array datatype in <a href="/2012/11/06/django-and-arrays/">your application</a>, you can still take advantage of some of the functions to get the functionality you need. Let's take a look at an example schema and use case.</p>

<!--more-->


<h3>An example</h3>

<p>Given a project management application, you may have <code>users</code> who have <code>projects</code> that have <code>tasks</code>. An example piece of functionality might be to send an email with a list of all projects that have tasks that are past their due dates of completion. Your schema might look something like this:</p>

<pre><code> # \d users
             Table "public.users"
    Column   |            Type             | Modifiers
 ------------+-----------------------------+-----------
  id         | integer                     | not null
  email      | character varying(255)      |
  ...

# \d projects
             Table "public.projects"
    Column   |            Type             | Modifiers
 ------------+-----------------------------+-----------
  id         | integer                     | not null
  user_id    | integer                     | not null
  name       | character varying(255)      | not null
  ...

# \d tasks
             Table "public.tasks"
    Column     |            Type             | Modifiers
 --------------+-----------------------------+-----------
  id           | integer                     | not null
  project_id   | integer                     | not null
  completed_at | timestamp without time zone | 
  due_at       | timestamp without time zone | 
  ...
</code></pre>

<p>To get a list of all projects that have tasks that haven't been completed, you would start with something like:</p>

<pre><code>SELECT 
  projects.name
FROM
  projects,
  tasks
WHERE projects.id = tasks.project_id
  AND tasks.completed_at IS NULL   -- not finished yet
  AND tasks.due_at &lt; now()      -- due date already passed
</code></pre>

<p>This would give you a list of projects, which you could then easily join with users:</p>

<pre><code>SELECT 
  users.email,
  projects.name
FROM
  projects,
  tasks,
  users
WHERE projects.id = tasks.project_id
  AND tasks.completed_at IS NULL   -- not finished yet
  AND tasks.due_at &lt; now()      -- due date already passed
  AND users.id = projects.user_id
</code></pre>

<p>At this point you've got everything you need to pull this up into Ruby, Python, or another language of your choice and then build the full set. However, if this is hundreds or even thousands of results, you'll spend more time than necessary grouping this data into a sensible email. With 3 small changes you can have it already formatted and ready to send off in an email. The first is using a handy function called <code>array_agg</code>, which aggregates items into an array that you can then format how you wish. The second is just ensuring you're grouping correctly. Finally, you'll want to turn the array into a string with <code>array_to_string</code> so the data comes out in a clean format.</p>

<p>Looking at it all put together:</p>

<pre><code>SELECT 
  users.email,
  array_to_string(array_agg(projects.name), ', ') as projects
FROM
  projects,
  tasks,
  users
WHERE projects.id = tasks.project_id
  AND tasks.completed_at IS NULL   -- not finished yet
  AND tasks.due_at &lt; now()      -- due date already passed
  AND users.id = projects.user_id
GROUP BY 
  users.email
</code></pre>

<p>This would give you a nice clean result of projects that have overdue tasks that you could then send to the user in an email:</p>

<pre><code>           email            |     projects       
 ---------------------------+-------------------
 craig.kerstiens@gmail.com  | blog, timetracker      
 craig@heroku.com           | foo, bar, baz    
</code></pre>
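
<p>One caveat: a project with several overdue tasks is listed once per task. A minimal sketch of one way around that, assuming "overdue" means not yet completed and past <code>due_at</code>, is to put <code>DISTINCT</code> inside the aggregate and spell the joins out explicitly:</p>

<pre><code>SELECT
  users.email,
  -- DISTINCT collapses a project that matches on several tasks
  array_to_string(array_agg(DISTINCT projects.name), ', ') AS projects
FROM users
INNER JOIN projects ON projects.user_id = users.id
INNER JOIN tasks ON tasks.project_id = projects.id
WHERE tasks.completed_at IS NULL   -- assumption: NULL means not finished
  AND tasks.due_at &lt; now()      -- due date already passed
GROUP BY
  users.email;
</code></pre>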


]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[Getting more out of psql (The PostgreSQL CLI)]]></title>
    <link href="http://www.craigkerstiens.com/2013/02/21/more-out-of-psql/"/>
    <updated>2013-02-21T00:00:00-08:00</updated>
    <id>http://www.craigkerstiens.com/2013/02/21/more-out-of-psql</id>
    <content type="html"><![CDATA[<p><em>After my last post, a variety of readers reached out about the many different tweaks they'd made to their workflows with psql. One of them, <a href="https://github.com/chanmix51/">Grégoire Hubert</a>, had a wonderfully extensive list of items. Grégoire is a freelance web developer who has worked with PostgreSQL for some time now, in addition to being the author of Pomm. Without further ado, here's what he has to say on how he uses psql:</em></p>

<h2>Get the most out of psql</h2>

<p>Psql, the PostgreSQL CLI client, is a powerful tool. Sadly, a lot of developers are not aware of its features and instead look for a GUI to provide what they need. Let's fly over what psql can do for you.</p>

<!--more-->


<h2>Make yourself at home</h2>

<p>One of the most common misconceptions people have about CLIs is «They are a poor user interface». C'mon, the CLI is <strong>the most efficient user interface ever</strong>. There is nothing to distract you from what you are doing, and you are far faster when you don't have to switch to your mouse all the time. Let's see how we can configure psql to our liking.</p>

<p>First, make sure you've chosen a nice and fancy <a href="http://hivelogic.com/articles/top-10-programming-fonts">terminal font</a> like monofur or inconsolata. Do not underestimate the power of the font.</p>

<p><img src="http://public.coolkeums.org/github/power_font.png" alt="monofur font in action" /></p>

<p>The nice line style shown above can be set with <code>\pset linestyle unicode</code> and <code>\pset border 2</code>. This is just one example of the many settings you can play with to get your preferred style of working out of psql.</p>

<p>For example, I find the character ¤ the clearest way to express nullity (by default psql prints nothing at all for a null). Let's just <code>\pset null ¤</code> and here it is:</p>

<pre><code>SELECT * FROM very_interesting_stat;
┌──────┬──────┬──────┬──────┬──────┐
│  a   │  b   │  c   │  d   │  e   │
├──────┼──────┼──────┼──────┼──────┤
│ 9.06 │    ¤ │    ¤ │    ¤ │    ¤ │
│ 7.30 │ 3.55 │ 7.57 │ 3.31 │    ¤ │
│ 7.20 │ 5.08 │    ¤ │ 6.58 │ 5.90 │
...
</code></pre>

<p>Another hugely valuable setting is colors in the prompt. Colors in the prompt are important because they make it easier to spot where output starts and ends between two interactions at the console. The <a href="http://www.postgresql.org/docs/9.2/static/app-psql.html#APP-PSQL-PROMPTING">PROMPT1</a> variable will even let you set an indicator that tells you whether or not you are inside a transaction. Give this a try for a sweet surprise...</p>

<pre><code>\set PROMPT1 '%[%033[33;1m%]%x%[%033[0m%]%[%033[1m%]%/%[%033[0m%]%R%# '
</code></pre>

<p>I also like to disable the pager by default with <code>\pset pager off</code> and display the time every issued query takes with <code>\timing</code>. If you are used to psql, you may notice that some content in the picture above is wrapped. This is the <code>\pset format wrapped</code> option.</p>

<p>Of course, typing all that on every connection would be a pain, so just put these commands in a <code>~/.psqlrc</code> file; it will be sourced every time psql is launched.</p>
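
<p>Pulling together the settings mentioned so far, a <code>~/.psqlrc</code> might look something like this (a sketch only; keep the lines you like):</p>

<pre><code>-- ~/.psqlrc: read by psql at startup
\pset linestyle unicode
\pset border 2
\pset null ¤
\pset pager off
\pset format wrapped
\timing
\set PROMPT1 '%[%033[33;1m%]%x%[%033[0m%]%[%033[1m%]%/%[%033[0m%]%R%# '
</code></pre>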

<p>If you are familiar with <code>bash</code> or other recent unix shells, you may have declared aliases in your configuration file. You can do the same with psql. For example, if you want a query for slow queries, such as the one from this <a href="http://craigkerstiens.com/2013/01/10/more-on-postgres-performance/">earlier post</a>, without having to remember it every time, you can set it up as follows. Keep the whole <code>\set</code> on a single line, since a psql meta-command ends at the end of the line:</p>

<pre><code>\set show_slow_queries 'SELECT (total_time / 1000 / 60) as total_minutes, (total_time/calls) as average_time, query FROM pg_stat_statements ORDER BY 1 DESC LIMIT 100;'
</code></pre>

<p>Now, just entering <code>:show_slow_queries</code> in your psql client will launch this query and give you the results:</p>

<pre><code>    total_time    |     avg_time     |                                                                                                                                                              query
------------------+------------------+------------------------------------------------------------
 295.761165833319 | 10.1374053278061 | SELECT id FROM users WHERE email LIKE ?
 219.138564283326 | 80.24530822355305 | SELECT * FROM address WHERE user_id = ? AND current = True
</code></pre>

<h2>Psql at your fingertips</h2>

<p>Now that you have a fancy prompt, here is the real question: what can psql do for me? <code>\?</code> has all of the answers. psql has built-in commands to describe almost all database objects, from tables to operators, indexes, triggers and so on, with clever auto-completion. Completion works not only on tables and columns, but also on aliases (sweet), <strong>SQL commands</strong> (w00t) and database objects.</p>

<p>Now we can enter some SQL commands. As usual, you need to check in the documentation how the heck to write this damn <code>ALTER TABLE</code>. Relax, psql offers inline documentation. Just enter <code>\h alter table</code> (auto complete w00t) and you'll be ok.</p>

<h3>Interacting with your editor</h3>

<p>psql provides two very handy commands: <code>\e</code> and <code>\i</code>. <code>\i</code> sources a SQL file in the client's current session. <code>\e</code> edits the last command using the editor defined in the <code>EDITOR</code> shell environment variable (aka vim). This gives you real editor features when it comes to writing long queries. What psql does is save the buffer in a temporary file and fire up the editor with that file. Once the editor exits, psql sources the file. Of course, you can use your editor to save queries in other places where they would be under version control, but <code>\e</code> has a serious limitation: it opens only the last query, even if you sent several queries on the same line. (Note that <code>\r</code> clears psql's query buffer.)</p>

<p>Note: <code>\ef my_function</code> opens stored function source code (With auto completion, I know, it's awesome).</p>

<p>Vim users can benefit here from Vim's server mode. If you launch a vim with a server name (let's say "PSQL") somewhere, and set the EDITOR variable like so: <code>export EDITOR="vim --servername PSQL --remote-tab-wait"</code>, then psql will open a new tab in the running vim with the last query and run it as soon as you close this tab. Tmux or GNU screen users can split their screen to have Vim and psql running in the same terminal window.</p>

<p><img src="http://public.coolkeums.org/github/vim_tmux.png" alt="Vim, psql and tmux" /></p>

<h3>Call a friend</h3>

<p>Vim power users know it is possible to pipe a buffer (or selection) directly into a program, which can be ... psql (using the <code>:w !psql</code> syntax). Even from the shell, you might want to take advantage of the fantastic <code>\copy</code> feature that loads a formatted file into the database (I use it to load Apache logs). But always having to specify connection parameters is a hassle. Let's use the shell environment instead. Psql is sensitive to the following variables:</p>

<ul>
<li>PGDATABASE</li>
<li>PGHOST</li>
<li>PGPORT</li>
<li>PGUSER</li>
<li>PGCLUSTER (debian wrapper).</li>
</ul>


<p>Set them once and for all in your shell environment and just call <code>psql</code> to connect to the database. In case you want to skip the password prompt, you can store your password in a file named <code>.pgpass</code> in your home directory, with mode 600 (do not do that on shared or exposed computers). Although this is nice for development database servers, I do NOT recommend it for production servers, since it should not be easy to mess with them.</p>

<p>The resources for additional information are ... the man page and the <a href="http://www.postgresql.org/docs/9.2/static/index.html">Postgres Docs</a>. The whole of the <a href="http://www.postgresql.org/docs/9.2/static/index.html">PostgreSQL documentation</a> is an example of what software reference documentation should be. Enjoy!</p>


]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[How I work with Postgres – psql, My PostgreSQL Admin]]></title>
    <link href="http://www.craigkerstiens.com/2013/02/13/How-I-Work-With-Postgres/"/>
    <updated>2013-02-13T00:00:00-08:00</updated>
    <id>http://www.craigkerstiens.com/2013/02/13/How-I-Work-With-Postgres</id>
    <content type="html"><![CDATA[<p>On at least a weekly basis and not uncommonly multiple times in a single week I get this question:</p>

<p><blockquote><p>I've been hunting for a nice PG interface that works within other things. PGAdmin kinda works, except the SQL editor is a piece of shit</p><footer><strong>@neilmiddleton</strong><cite><a href='https://twitter.com/neilmiddleton'></a></cite></footer></blockquote></p>

<p>Sometimes it leans more towards "what is the Sequel Pro equivalent for Postgres?" My default answer is that I just use psql, though I then have to go on to explain how I use it. If you're interested you can read more below, or just get the highlights here:</p>

<ul>
<li>Set your default <code>EDITOR</code> then use \e</li>
<li>On postgres 9.2 and up <code>\x auto</code> is your friend</li>
<li>Set history to unlimited</li>
<li><code>\d</code> all the things</li>
</ul>


<p>Before going into detail on why psql works perfectly fine as an interface, I want to rant for a minute about the problems with current editors and where I expect them to go in the future. First, this is not a knock on the work that's been done on previous ones. For their time, PgAdmin, phpPgAdmin, and others were valuable tools, but we're coming to a point where there's a broader set of database users than ever before, and empowering them is becoming ever more important.</p>

<p>Empowering developers, DBAs, product people, marketers and others to be comfortable with their database will lead to more people taking advantage of what's in their data. <a href="http://craigkerstiens.com/2013/01/10/more-on-postgres-performance/">pg_stat_statements</a> was a great start to this, laying a foundation for capturing valuable information. Even with all of the powerful stats that PostgreSQL captures, so many people are still terrified when they see something like:</p>

<pre><code>                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=4.25..8.62 rows=100 width=107) (actual time=0.126..0.230 rows=100 loops=1)
   Hash Cond: (purchases.user_id = users.id)
   -&gt;  Seq Scan on purchases  (cost=0.00..3.00 rows=100 width=84) (actual time=0.012..0.035 rows=100 loops=1)
   -&gt;  Hash  (cost=3.00..3.00 rows=100 width=27) (actual time=0.097..0.097 rows=100 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 6kB
         -&gt;  Seq Scan on users  (cost=0.00..3.00 rows=100 width=27) (actual time=0.007..0.042 rows=100 loops=1)
 Total runtime: 0.799 ms
(7 rows)
</code></pre>

<p>By surfacing this information in a digestible form, for example with tools built on top of <code>pg_stat_statements</code> such as <a href="http://datascope.heroku.com">datascope</a> by <a href="http://www.twitter.com/leinweber">@leinweber</a>, and by getting this to be part of the default admin, we will truly begin empowering a new set of users.</p>

<p>But enough of a detour; those tools aren't available today. If you're interested in helping build them to make the community better, please reach out. For now I live in a world where I'm quite content with simple ole <code>psql</code>. Here's how:</p>

<!--more-->


<h3>Editor</h3>

<p>Once you've exported your preferred editor to the environment variable <code>EDITOR</code>, running <code>\e</code> lets you view and edit your last run query in your editor of choice. This works for vim, emacs, or even Sublime Text. A GUI editor has to be told to block until the file is closed, which is what Sublime Text's <code>-w</code> flag does:</p>

<pre><code>export EDITOR="subl -w"
psql
\e 
</code></pre>

<p>Gives me:</p>

<p><img src="http://f.cl.ly/items/2I0f3M0B1T3k0d290v3k/Screenshot_2_12_13_9_58_AM.png" alt="sublime text" /></p>

<p><em>Note: make sure you have your editor set before you connect with psql. Once you do, saving and closing the file will execute the query.</em></p>

<h3>\x auto</h3>

<p>psql has long had an expanded display mode that prints each column on its own line. You can toggle it on and off easily by just running the <code>\x</code> command. Running a basic query with the default formatting you get the output:</p>

<pre><code>SELECT * 
FROM users 
LIMIT 1;
 id | first_name | last_name |           email            |    data    |     created_at      |     updated_at      |     last_login
----+------------+-----------+----------------------------+------------+---------------------+---------------------+---------------------
  1 | Rosemary   | Wassink   | Rosemary.Wassink@yahoo.com | "sex"=&gt;"F" | 2010-07-01 18:16:00 | 2011-05-14 11:47:00 | 2011-06-07 23:04:00
</code></pre>

<p>After toggling the output and re-running the same query we can see how it's now formatted:</p>

<pre><code>\x
Expanded display is on.
craig=# SELECT * from users limit 1;
-[ RECORD 1 ]--------------------------
id         | 1
first_name | Rosemary
last_name  | Wassink
email      | Rosemary.Wassink@yahoo.com
data       | "sex"=&gt;"F"
created_at | 2010-07-01 18:16:00
updated_at | 2011-05-14 11:47:00
last_login | 2011-06-07 23:04:00
</code></pre>

<p>Using <code>\x auto</code> lets psql pick for you: it uses the regular layout when the result fits the width of your screen and switches to the expanded one when it doesn't.</p>

<h3>psql history</h3>

<p>Hopefully this needs no justification... having an unlimited history of all your queries is incredibly handy. Ensuring you set the following environment variables will ensure you never lose that query you ran several months ago again:</p>

<pre><code>export HISTFILESIZE=
export HISTSIZE=
</code></pre>
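
<p>psql also keeps its own history settings, which can go in <code>~/.psqlrc</code>. A sketch, where the size is an arbitrary large number picked for illustration:</p>

<pre><code>-- number of commands psql keeps in its history (arbitrary large value)
\set HISTSIZE 100000
-- one history file per database
\set HISTFILE ~/.psql_history- :DBNAME
</code></pre>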

<h3>\d</h3>

<p>And while it's last on the list, one of the first things I do when connecting to any database is check out what's in it. I don't do this by running a bunch of queries, but rather by checking out the schema and then poking at definitions of specific tables. <code>\d</code> and variations on it are incredibly handy for this. Here are a few highlights:</p>

<p>Listing all relations with simply <code>\d</code>:</p>

<pre><code>\d
                 List of relations
 Schema |       Name       |     Type      | Owner
--------+------------------+---------------+-------
 public | products         | table         | craig
 public | products_id_seq  | sequence      | craig
 public | purchases        | table         | craig
 public | purchases_id_seq | sequence      | craig
 public | redis_db0        | foreign table | craig
 public | users            | table         | craig
 public | users_id_seq     | sequence      | craig
(7 rows)
</code></pre>

<p>List only tables with <code>\dt</code>:</p>

<pre><code>\dt
         List of relations
 Schema |   Name    | Type  | Owner
--------+-----------+-------+-------
 public | products  | table | craig
 public | purchases | table | craig
 public | users     | table | craig
(3 rows)
</code></pre>

<p>Describe a specific relation with <code>\d RELATIONNAMEHERE</code>:</p>

<pre><code>\d users
                                     Table "public.users"
   Column   |            Type             |                     Modifiers
------------+-----------------------------+----------------------------------------------------
 id         | integer                     | not null default nextval('users_id_seq'::regclass)
 first_name | character varying(50)       |
 last_name  | character varying(50)       |
 email      | character varying(255)      |
 data       | hstore                      |
 created_at | timestamp without time zone |
 updated_at | timestamp without time zone |
 last_login | timestamp without time zone |
</code></pre>

<p>One more pro-tip: if you're running a transaction with many tables and forget which are involved in it, you can run <code>\d *transaction*</code> and it'll display the tables currently affected.</p>

<p><em>Have a tool you prefer, have something you use daily in psql that I missed, or interested in helping create a new admin experience? Please reach out and let's talk: craig.kerstiens at gmail.com</em></p>


]]></content>
  </entry>
  
  <entry>
    <title type="html"><![CDATA[More on Postgres Performance]]></title>
    <link href="http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance/"/>
    <updated>2013-01-10T00:00:00-08:00</updated>
    <id>http://www.craigkerstiens.com/2013/01/10/more-on-postgres-performance</id>
    <content type="html"><![CDATA[<p>If you missed my previous post on <a href="/2012/10/01/understanding-postgres-performance/">Understanding Postgres Performance</a>, it's a great starting point. In this post I'm going to dig into some real-life examples of optimizing queries and indexes.</p>

<h3>It all starts with stats</h3>

<p>I wrote about some of the <a href="https://postgres.heroku.com/blog/past/2012/12/6/postgres_92_now_available/">great new features in Postgres 9.2</a> in the recent announcement of support for Postgres 9.2 on <a href="https://www.heroku.com">Heroku</a>. One of those awesome features is <a href="http://www.postgresql.org/docs/9.2/static/pgstatstatements.html">pg_stat_statements</a>. It's not commonly known how much information Postgres keeps about your database (beyond the data of course), but in reality it keeps a great deal, ranging from basic stuff like table size to cardinality of joins to distribution of indexes. With pg_stat_statements it also keeps a normalized record of the queries that are run.</p>

<!-- more -->


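<p>First you'll want to turn on pg_stat_statements (on a server you run yourself, the module also has to be listed in <code>shared_preload_libraries</code>):</p>

<pre><code>CREATE EXTENSION pg_stat_statements;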
</code></pre>

<p>What this means is that it would record both:</p>

<pre><code>SELECT id 
FROM users
WHERE email LIKE 'craig@heroku.com';
</code></pre>

<p>and</p>

<pre><code>SELECT id 
FROM users
WHERE email LIKE 'craig.kerstiens@gmail.com';
</code></pre>

<p>in a normalized form which looks like this:</p>

<pre><code>SELECT id 
FROM users
WHERE email LIKE ?;
</code></pre>

<h3>Understanding them from afar</h3>

<p>While Postgres collects a great deal of this information, turning it into something useful is sometimes more of a mystery than it should be. This simple query shows a few key pieces of information that allow you to begin optimizing:</p>

<pre><code>SELECT 
  (total_time / 1000 / 60) as total_minutes, 
  (total_time/calls) as average_time, 
  query 
FROM pg_stat_statements 
ORDER BY 1 DESC 
LIMIT 100;
</code></pre>

<p>The above query shows three key things:</p>

<ol>
<li>The total time a query has occupied against your system in minutes</li>
<li>The average time it takes to run in milliseconds</li>
<li>The query itself</li>
</ol>


<p>Giving an output something like:</p>

<pre><code>    total_time    |     avg_time     |                                                                                                                                                              query
------------------+------------------+------------------------------------------------------------
 295.761165833319 | 10.1374053278061 | SELECT id FROM users WHERE email LIKE ?
 219.138564283326 | 80.24530822355305 | SELECT * FROM address WHERE user_id = ? AND current = True
(2 rows)
</code></pre>
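
<p>As a variation, and only a sketch, the same view can be sorted by average time instead, skipping statements that have barely run (the threshold of 100 calls is an arbitrary assumption):</p>

<pre><code>SELECT
  calls,
  (total_time / calls) AS average_time,  -- milliseconds per call
  query
FROM pg_stat_statements
WHERE calls &gt; 100          -- arbitrary cutoff to ignore rare queries
ORDER BY average_time DESC
LIMIT 20;
</code></pre>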

<h3>What to optimize</h3>

<p>A general rule of thumb is that most of your very common queries that return 1 or a small set of records should return in ~ 1 ms. In some cases there may be queries that regularly run in 4-5 ms, but in most cases ~ 1 ms or less is an ideal.</p>

<p>To pick where to begin, I usually try to strike a balance between total time and long average time. In this case I'd probably start with the second query: on the first one I could likely shave off an order of magnitude, while on the second I'm hopeful of shaving off two orders of magnitude, reducing the time spent on that query from a cumulative 220 minutes down to about 2 minutes.</p>

<h3>Optimizing</h3>

<p>From here you probably want to first look at my other post on understanding the explain plan. I want to highlight some of this with a more specific case based on the second query above. On an example data set, the table behind that query does have an index on user_id, and yet query times are still high. To start to get an idea of why, I would run:</p>

<pre><code>EXPLAIN ANALYZE
SELECT * 
FROM address 
WHERE user_id = 245 
  AND current = True
</code></pre>

<p>This would yield results:</p>

<pre><code>                                                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4690.88..4690.88 rows=1 width=0) (actual time=519.288..519.289 rows=1 loops=1)
   -&gt;  Nested Loop  (cost=0.00..4690.66 rows=433 width=0) (actual time=15.302..519.076 rows=213 loops=1)
         -&gt;  Index Scan using idx_address_userid on address  (cost=0.00..232.52 rows=23 width=4) (actual time=10.143..62.822 rows=1 loops=8)
               Index Cond: (user_id = 245)
               Filter: current
               Rows Removed by Filter: 14
 Total runtime: 219.428 ms
(1 rows)
</code></pre>

<p>Hopefully you're not too overwhelmed by this, having read the other post on <a href="/2012/10/01/understanding-postgres-performance/">query plans</a>. We can see that it is using an index as expected. The difference is that it's having to fetch 15 different rows from the index and then discard the bulk of them. The number of rows discarded is shown by the line:</p>

<pre><code>Rows Removed by Filter: 14
</code></pre>

<p><em>This is just one more of the many improvements in Postgres 9.2 alongside pg_stat_statements.</em></p>

<p>To further optimize this we would create a conditional OR composite index. A conditional index would cover only the rows where current = true, whereas the composite would index both values. A conditional index is commonly more valuable when the column has a small set of possible values, while the composite is better when the values vary widely. Creating the conditional index:</p>

<pre><code>CREATE INDEX CONCURRENTLY idx_address_userid_current ON address(user_id) WHERE current = True;
</code></pre>
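
<p>For comparison, a composite index over both columns would look like the following sketch (the index name is made up for illustration):</p>

<pre><code>-- Composite alternative: indexes every (user_id, current) pair
CREATE INDEX CONCURRENTLY idx_address_userid_and_current ON address(user_id, current);
</code></pre>

<p>The rest of this post sticks with the conditional index.</p>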

<p>We can then see the query plan is now even further improved as we'd hope:</p>

<pre><code>EXPLAIN ANALYZE
SELECT * 
FROM address 
WHERE user_id = 245 
  AND current = True

                                                                                   QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4690.88..4690.88 rows=1 width=0) (actual time=519.288..519.289 rows=1 loops=1)
     -&gt;  Index Scan using idx_address_userid_current on address  (cost=0.00..232.52 rows=23 width=4) (actual time=10.143..62.822 rows=1 loops=8)
           Index Cond: ((user_id = 245) AND (current = True))
 Total runtime: .728 ms
(1 rows)
</code></pre>

<p><em>For further reading, give Greg Smith's <a href="http://www.amazon.com/gp/product/184951030X/ref=as_li_qf_sp_asin_tl?ie=UTF8&amp;tag=mypred-20&amp;linkCode=as2&amp;camp=1789&amp;creative=9325&amp;creativeASIN=184951030X">Postgres High Performance</a> a read</em></p>


]]></content>
  </entry>
  
</feed>
