Database-style Queries in Google Sheets using QUERY

If you’ve ever done any work with relational databases, especially if you used a variant of the SQL language, then you’ll be used to selecting and manipulating values programmatically:

SELECT name, email FROM employees WHERE employee_id ='123'

We can do something similar within a Google Sheet by using the QUERY function. Here’s the general syntax:

QUERY(data, query, [headers])

Within this syntax, data is the range of cells on which we want to perform the query, and query is the query we want to perform. The optional headers parameter is the number of header rows to be ignored at the top of data. If this term is omitted (or set to -1) then the value is guessed by Sheets based on the contents of the parameter data.

Let’s look at a sheet we used before, in this post:

Click image to enlarge

Click image to enlarge

Let’s suppose that somewhere in the sheet – say, column D – I want to select and list only the students who achieved a B or a C grade.

Here, then, parameter data corresponds to the range A2:C (we’ll write it that way so it can cope with any number of rows of data). Our query parameter is:

"select A where C='B' or C='C'"

In other words, select column A (the student’s name) where column C (the student’s grade) contains a value of B or a value of C.

Here we can ignore the third, optional parameter, so our cell D1 holds the expression:

=query(A1:C, "select A where C='B' or C='C'")

Here’s the sheet with the selection made:

Click image to enlarge

Click image to enlarge

Note that, had my data been in a different sheet, I could have expressed the data parameter using the format:

Sheetname!A2:C

Note that not all of the SQL language syntax is supported in this language (which is officially called the Google Visualization API Query Language). It is a subset of SQL with a few added feature of its own. Here’s a list of the expressions you can use:

select – Select which columns to return, and in what order, if omitted, all the table’s columns are returned, in their default order
where – Return only rows that match a condition. if omitted, all rows are returned.
group by – Aggregates values across rows
pivot – Transforms distinct values in columns into new column
order by – sorts rows by values in columns
limit – Limits the number of returned rows
offset – skips a given number of first rows
label – sets column labels
format – formats the values in certain columns using given formatting patterns
options – sets additional options

In fact, you can also perform such queries on remote spreadsheets, accessed and displayed as web pages, by adding extra parameters to the URL, which I’ll look at in a future post.