There was a question on the Udemy.com PostgreSQL Bootcamp : Go From Beginner to Advanced, 60+hours course lesson on \crosstabview
which asked:
What is the exact purpose of rank here? Everything seems randomised for me.
My answer, copied here because I’ve been neglecting adding content, is as follows:
\crosstabview sorting
It’s hard to grasp, but here’s my understanding after reading the PostgreSQL Wiki (starting at https://wiki.postgresql.org/wiki/Crosstabview#Independant_sort_of_the_horizontal_header.2C_with_additional_column):
When the query’s ORDER BY is already dedicated to sort on the leftmost column (vertical header), an additional sort column can be added for the horizontal header
\crosstabview
…
Here the vertical header is sorted by way of the ORDER BY clause, and the horizontal header is sorted independantly, so we get the desired result of having both headers sorted.
We can see the different sortation here, where the rows are sorted by location
via ORDER BY
:
SELECT
location,
year,
SUM(raindays) AS sum_days
FROM crosstabs.rainfalls
GROUP BY
location,
year
HAVING
SUM(raindays) > 100
ORDER BY
location
\crosstabview
location | 2012 | 2017 | 2014 | 2013 | 2016 | 2015
-----------+------+------+------+------+------+------
Dubai | 112 | 102 | 111 | 116 | 110 |
France | 114 | 110 | 113 | 111 | 120 |
Germany | 153 | 113 | 136 | 120 | 138 | 128
London | 116 | 110 | 137 | 138 | 117 | 101
Qatar | | | 111 | 116 | 102 |
Singapore | 124 | 110 | 142 | 132 | 106 | 111
Sydney | | 115 | 110 | 120 | 122 |
(7 rows)
Note the column headers are unsorted (2012
, 2017
, 2014
,…)
To sort those as well, we give a sort column parameter to \crosstabview
:
SELECT
location,
year,
SUM(raindays) AS sum_days
FROM crosstabs.rainfalls
GROUP BY
location,
year
HAVING
SUM(raindays) > 100
ORDER BY
location
\crosstabview location year sum_days year
location | 2012 | 2013 | 2014 | 2015 | 2016 | 2017
-----------+------+------+------+------+------+------
Dubai | 112 | 116 | 111 | | 110 | 102
France | 114 | 111 | 113 | | 120 | 110
Germany | 153 | 120 | 136 | 128 | 138 | 113
London | 116 | 138 | 137 | 101 | 117 | 110
Qatar | | 116 | 111 | | 102 |
Singapore | 124 | 132 | 142 | 111 | 106 | 110
Sydney | | 120 | 110 | | 122 | 115
(7 rows)
Nice – now sorted both vertically and horizontally.
But… what if we want it sorted by amount of rainy days?
We RANK()
the SUM()
:
SELECT
location,
year,
SUM(raindays) AS sum_days,
RANK() OVER (ORDER BY SUM(raindays)) AS ranking
FROM
crosstabs.rainfalls
GROUP BY
location,
year
HAVING
SUM(raindays) > 100
ORDER BY
location
\crosstabview location year sum_days ranking
location | 2017 | 2016 | 2014 | 2012 | 2013 | 2015
-----------+------+------+------+------+------+------
Dubai | 102 | 110 | 111 | 112 | 116 |
France | 110 | 120 | 113 | 114 | 111 |
Germany | 113 | 138 | 136 | 153 | 120 | 128
London | 110 | 117 | 137 | 116 | 138 | 101
Qatar | | 102 | 111 | | 116 |
Singapore | 110 | 106 | 142 | 124 | 132 | 111
Sydney | 115 | 122 | 110 | | 120 |
(7 rows)
Our ORDER BY
still sorts the rows by location
(first column), but the remaining columns are sorted by the data values for the first row.
i.e. Dubai
: 102
, 110
, 111
, 112
, 116
Hopefully this helps someone, it helped me to understand better.
It must be said that the course title “From Beginner to Advanced 60+ Hours” is fairly accurate – it’s quite advanced.
But if you type up the code in the ~629 videos, or even any significant number of them, it’s gonna take a lot more than 60 hours.
And, if you make scripts with the SQL, run those scripts, refine the output, read the documentation to feel the topic is sufficiently understood?
Oof, HUNDREDS of hours.
But yeah, it’s getting a good degree of expertise by doing so.