This year Postgres open and PGConf SV have combined to great a bigger and better conference right in downtown San Francisco. I’m obviously biased as I’m one of the co-chairs, and I know every conference organizer says picking the talks was hard, but I’m especially excited for the line-up this year. The hard part for me is going to be which talks do I miss out on because I’m sitting in the other session that’s ongoing. You can see the full list of talk and tutorial sessions, but I thought it’d be fun to do a rundown of some of my favorites. Read on
A massive amount of reporting queries, whether really intensive data analysis, or just basic insights into your business involving looking at data over a certain time period. Postgres has really rich support for dealing with time out of the box, something that’s often very underweighted when dealing with a database. Sure, if you have a time-series database it’s implied, but even then how flexible and friendly is it from a query perspective? With Postgres there’s a lot of key items available to you, let’s dig in at the things that make your life easier when querying. Read on
Five years ago I wrote a post that got some good attention on why you should use Postgres. Almost a year later I added a bunch of things I missed. Many of those items bear repeating, and I’ll recap a few of those in the latter half of this post. But in the last 4-5 years there’s been a lot of improvements and more reasons added to the list of why you should use Postgres. Here’s the rundown of the things that make Postgres a great database you should consider using. Read on
JSONB is an awesome datatype in Postgres. I find myself using it on a weekly basis these days. Often in using some API (such as clearbit) I’ll get a JSON response back, instead of parsing that out into a table structure it’s really easy to throw it into a JSONB then query for various parts of it.
If you’re not familiar with JSONB, it’s a binary representation of JSON in your database. You can read a bit more about it vs. JSON here.
In working with JSONB here’s a few quick tips to get up and running with it even faster: Read on
It seems each week when I’m reviewing data with someone a feature comes up that they had no idea existed within Postgres. In an effort to continue documenting many of the features and functionality that are useful, here’s a list of just a few that you may find handy the next time your working with your data.
Psql, and \e
This one I’ve covered before, but it’s worth restating. Psql is a great editor that already comes with Postgres. If you’re comfortable on the CLI you should consider giving it a try. You can even setup you’re own
.psqlrc for it so that it’s well customized to your liking. In particular turning
\timing on is especially useful. But even with all sorts of customization if you’re not aware that you can use your preferred editor by using
\e then you’re missing out. This will allow you to open up the last run query, edit it, save–and then it’ll run for you. Vim, Emacs, even Sublime text works just take your pick by setting your
For the second time now I’ve had to implement a system that syncs from my system of record into Salesforce.com, the first at Heroku and now at Citus Data. The case here is pretty simple, I have a software-as-a-service, B2B product. It’s a homegrown application in these cases in Ruby, but could be Python, .Net, any language of your choosing. The problem is I don’t want to have to be rebuilding my own CRM, reporting, etc. on top of all of my internal database. And as soon as you’re at some reasonable size (sales guy of 1 or more) you need to be able to provide insights on what’s in that system of record database to others.
While my tooling isn’t a full fledged product by any means, here’s a bit of how I’ve developed this process a few times over and some of the annoying bits of code to help get you started. In this post I’ll walk through some of the basic datatypes, then we’ll follow-up with the overall architecture and tweaks you need to make to Salesforce, and finally we’ll provide some example code to help you create this setup yourself. Read on
Open DNS is a DNS resolver that caches records beyond their TTL if the upstream DNS server cannot be found. In cases like today’s major outage it can be handy to swap your DNS settings out for this, or it may be worth using as a standard default. Resolution may be a bit slow as it will try to see if the upstream server cannot be found, but it at least can get you back to a working state.
If you know what you’re doing then all you need to do is configure your DNS settings to:
If you need a little more guidance you can go into your System Preferences on Mac, select Network, then Advanced and finally the DNS tab. You should set it up to look as follows:
SQL can be a powerful language for reporting. Whether you’re just exploring some data, or generating reports that show month over month revenue growth it’s the lingua franca for data analysis. But, your data isn’t always in a SQL database, even then if you’re using Postgres you can still likely use SQL to analyze, query, even joing with that data. Foreign data wrappers have been around for years in Postgres, but are continuing to mature and be a great option for joining disparate systems.
Overview of foreign data wrappers
If you’re unfamiliar, foreign data wrappers, or FDW, allow you to connect from within Postgres to a remote system. Then you can query them from directly within Postgres. While there is an official Postgres FDW that ships with Postgres itself, that allows you to connect from one Postgres DB to another, there’s also a broad community of others.
At the core of it Postgres provides certain APIs under the covers which each FDW extension can implement. This can include the ability to map SQL to whatever makes sense for a given system, push down various operators like where clauses, and as of Postgres 9.3 can even write data. Read on
When you start out as a developer there’s an overwhelming amount of things to grasp. First there’s the language itself, then all the quirks of the specific framework you’re using,and after that (or maybe before) we’ll throw front-end development into the mix, and somewhere along the line you have to decide to store your data somewhere.
Early on, with so many things to quickly master, the database tends to be an after-though in application design (perhaps because it doesn’t make an impact to end user experience). As a result there’s a number of bad practices that tend to get picked up when working with databases, here’s a rundown of just a few.Read on
Notice: Much of this post still applies, but now applies more directly to Citus. Since this post originally published, pg_shard is now deprecated. Citus now has an open source version which offer a superset of the features of pg_shard, as well as a cloud offering. Finally you can find some further guidance for sharding on the Citus blog and docs
Back in 2012 I wrote an overview of database sharding. Since then I’ve had a few questions about it, which have really increased in frequency over the last two months. As a result I thought I’d do a deeper dive with some actual hands on for sharding. Though for this hands on, because I do value my time I’m going to take advantage of
pg_shard rather than creating mechanisms from scratch.
For those unfamiliar pg_shard is an open source extension from Citus data who has a commerical product that you can think of is pg_shard++ (and probably much more). Pg_shard adds a little extra to let data automatically distribute to other Postgres tables (logical shards) and Postgres databases/instances (physical shards) thus letting you outgrow a single Postgres node pretty simply.
Alright, enough talk about it, let’s get things up and running.Read on