#6 in Data warehousing books

Reddit mentions of Kimball's Data Warehouse Toolkit Classics, 3 Volume Set

Sentiment score: 2
Reddit mentions: 3

We found 3 Reddit mentions of Kimball's Data Warehouse Toolkit Classics, 3 Volume Set. Here are the top ones.

Kimball's Data Warehouse Toolkit Classics, 3 Volume Set
Buying options
View on Amazon.com
Features:
  • Wiley
Specs:
  Height: 9.200769 Inches
  Length: 7.40156 Inches
  Number of items: 1
  Weight: 6.31844842892 Pounds
  Width: 3.999992 Inches


Found 3 comments on Kimball's Data Warehouse Toolkit Classics, 3 Volume Set:

u/PrettyMuchAVegetable · 5 points · r/datascience

Alright, Example time. I hope this isn't TMI but I'm open to answering more questions.

I decided to use Oracle 11g R2 since I am familiar with it but a lot of my code would be portable to another vendor's implementation of SQL.

You can use www.sqlfiddle.com and choose Oracle 11g R2 from the list of Database Vendors.

First we have to put your data into the database. Use the left "build schema" box with the following code to create 2 database tables and 1 primary key constraint:

CREATE TABLE STG_TABLENAME (
    ID1 VARCHAR2(2),
    ID2 VARCHAR2(2),
    ID3 VARCHAR2(2),
    "2018" INT,
    "2019" INT,
    "2020" INT
);

--Create a core table, for retaining historical data
CREATE TABLE CORE_TABLENAME (
    ID1 VARCHAR2(2),
    ID2 VARCHAR2(2),
    ID3 VARCHAR2(2),
    MEASURE_YEAR INT,
    MEASURE INT,
    CONSTRAINT CDT_PK PRIMARY KEY (ID1, ID2, ID3, MEASURE_YEAR)
);



--Give our stg table your sample data
INSERT INTO STG_TABLENAME VALUES ('X', 'AA', 'T1', 20, 30, 48);
INSERT INTO STG_TABLENAME VALUES ('X', 'AA', 'T2', 33, 6666, 66);
INSERT INTO STG_TABLENAME VALUES ('X', 'B', 'T1', 2, 555, 5);
INSERT INTO STG_TABLENAME VALUES ('Y', 'C', 'T2', 5, 6, 6);

--PIVOT THE STG TABLE AND INSERT THE RESULTS INTO THE CORE TABLE
INSERT INTO CORE_TABLENAME (ID1, ID2, ID3, MEASURE_YEAR, MEASURE)
SELECT ID1, ID2, ID3, MEASURE_YEAR, MEASURE
FROM STG_TABLENAME
UNPIVOT (
    MEASURE FOR MEASURE_YEAR IN (
        "2018" AS 2018,
        "2019" AS 2019,
        "2020" AS 2020
    )
);

The STG_TABLENAME table is built to accept your raw data without any questions or constraints about duplicate keys. The idea is that you keep this table empty most of the time, except when there is new data to be pushed to CORE_TABLENAME. You can use your application/load script/ETL to check constraints later, but that is beyond the scope of what I am explaining right now. This table design is not very good in the long term: it grows wider (columns) instead of longer (rows) and becomes confusing.

The INSERT commands load the sample data one row at a time. This is not how we would load a large dataset into a DB, but it suits small datasets (or programmatic entries of small sets, etc.).

I used a primary key on CORE_TABLENAME for multiple reasons: first, it explicitly defines the table key for us; second, it automatically creates an index on the key fields; and third, it ensures these fields are NOT NULL and never duplicated (you don't want duplicate keys in your table). Finally, this table is designed with keeping historical records in mind, so it may get quite large (long, due to rows).
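To see that duplicate-key protection in action without an Oracle instance, here is a small sketch using SQLite through Python's sqlite3 module (SQLite is only a stand-in for Oracle here: VARCHAR2 becomes TEXT, but the named primary-key constraint behaves the same way for this purpose):

```python
import sqlite3

# SQLite stand-in for the Oracle CORE table; types are simplified.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE CORE_TABLENAME (
        ID1 TEXT, ID2 TEXT, ID3 TEXT,
        MEASURE_YEAR INT, MEASURE INT,
        CONSTRAINT CDT_PK PRIMARY KEY (ID1, ID2, ID3, MEASURE_YEAR)
    )
""")
con.execute("INSERT INTO CORE_TABLENAME VALUES ('X','AA','T1',2018,20)")

# A second row with the same (ID1, ID2, ID3, MEASURE_YEAR) key is rejected.
duplicate_rejected = False
try:
    con.execute("INSERT INTO CORE_TABLENAME VALUES ('X','AA','T1',2018,99)")
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```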

The final INSERT pivots your data into a more useful long-term format, as u/seanv507 suggested. The code might look a bit complicated, but pivoting/unpivoting is normal when data comes in from the real world, where long-term DB storage is not always top of mind, so you would be wise to learn it if data ingestion is part of your role. Really, this pivoting could be done in any language: SQL, Python, or even PowerShell. An ETL tool like Informatica, or open-source Talend or Pentaho, can handle it as well. (Sorry, I might be way outside the scope of your question now.) Anyway, let's look at querying both of the tables we've made. In the right-hand SQL query box you can run all or any of these individual queries:
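For illustration, the same wide-to-long reshape can be sketched in plain Python (this is hypothetical stand-in code, not part of the original answer; the row layout follows the sample data above):

```python
# Wide rows, shaped like the STG_TABLENAME sample data.
stg_rows = [
    ("X", "AA", "T1", 20, 30, 48),
    ("X", "AA", "T2", 33, 6666, 66),
    ("X", "B",  "T1", 2, 555, 5),
    ("Y", "C",  "T2", 5, 6, 6),
]
years = (2018, 2019, 2020)

# Unpivot: one long row per (key, year) pair, as in the CORE table.
core_rows = [
    (id1, id2, id3, year, measure)
    for (id1, id2, id3, *measures) in stg_rows
    for year, measure in zip(years, measures)
]
print(core_rows[0])  # ('X', 'AA', 'T1', 2018, 20)
```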

--See everything in both tables
SELECT * FROM STG_TABLENAME;
SELECT * FROM CORE_TABLENAME;

--Make use of the select list to see only columns you want
SELECT ID1,ID2,ID3,"2018" FROM STG_TABLENAME;
SELECT ID1,ID2,ID3,MEASURE_YEAR,MEASURE FROM CORE_TABLENAME;

--Make use of the WHERE Clause to select only records you want to see
SELECT ID1,ID2,ID3,"2018" FROM STG_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1';

SELECT ID1,ID2,ID3,MEASURE_YEAR,MEASURE FROM CORE_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
AND MEASURE_YEAR=2018;

--Summing up the year 2018 under both table structures
SELECT ID1,ID2,ID3,SUM("2018") as SUM_2018 FROM STG_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
GROUP BY ID1,ID2,ID3;

SELECT ID1,ID2,ID3,MEASURE_YEAR,SUM(MEASURE) FROM CORE_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
AND MEASURE_YEAR=2018
GROUP BY ID1,ID2,ID3,MEASURE_YEAR;

--Summing and grouping multiple years under both table designs
SELECT ID1,ID2,ID3,SUM("2018") as SUM_2018, SUM("2019") as SUM_2019 FROM STG_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
GROUP BY ID1,ID2,ID3;

SELECT ID1,ID2,ID3,MEASURE_YEAR,SUM(MEASURE) FROM CORE_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
AND MEASURE_YEAR IN (2018,2019) -- Note the use of the IN operator
GROUP BY ID1,ID2,ID3,MEASURE_YEAR;

--SUMMING Multiple years but grouping only by ID1,ID2 and ID3
SELECT ID1,ID2,ID3,SUM("2018" + "2019") as SUM_18_19 FROM STG_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
GROUP BY ID1,ID2,ID3;

SELECT ID1,ID2,ID3,SUM(MEASURE) FROM CORE_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
AND MEASURE_YEAR IN (2018,2019) -- Note the use of the IN operator
GROUP BY ID1,ID2,ID3;
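If you want to try the long-format queries outside SQLFiddle, a rough SQLite equivalent (via Python's sqlite3; SQLite is only a stand-in, so VARCHAR2 becomes TEXT and the data is loaded already unpivoted) looks like this:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE CORE_TABLENAME (
        ID1 TEXT, ID2 TEXT, ID3 TEXT,
        MEASURE_YEAR INT, MEASURE INT,
        PRIMARY KEY (ID1, ID2, ID3, MEASURE_YEAR)
    )
""")
# The (X, AA, T1) sample rows from above, already in long format.
con.executemany(
    "INSERT INTO CORE_TABLENAME VALUES (?, ?, ?, ?, ?)",
    [("X", "AA", "T1", y, m) for y, m in [(2018, 20), (2019, 30), (2020, 48)]],
)

# Sum 2018+2019 for one key, like the last query above with IN.
total = con.execute("""
    SELECT SUM(MEASURE) FROM CORE_TABLENAME
    WHERE ID1 = 'X' AND ID2 = 'AA' AND ID3 = 'T1'
      AND MEASURE_YEAR IN (2018, 2019)
""").fetchone()[0]
print(total)  # 50
```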

I chose a variety of different queries to highlight different aspects of SQL querying for you. The -- comments out the rest of the line, so I made short notes on what each query is doing.

To learn more about SQL I always recommend people take the Stanford DB5 open course and the DB 8 course on relational modelling.

https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/about

To learn more about data warehousing I recommend reading some Kimball:
https://www.amazon.ca/Kimballs-Data-Warehouse-Toolkit-Classics/dp/1118875184/ref=sr_1_3?s=books&ie=UTF8&qid=1537275891&sr=1-3

Yes, there are other design paradigms; this is the one I find works best in the real-world situations I tend to run into in my practice.

Feel free to ask me any questions that pop up; I'm happy to talk SQL or DW. Also, don't get discouraged if you find any of this overwhelming or complex. It's actually very approachable, and with a bit of effort, a course or some reading combined with practice, you'll be humming along in SQL in no time. Keep in mind a couple of these statements/queries (I'm looking at you, UNPIVOT) took me 2-3 tries to get the syntax right, and I've been doing SQL for 9 years and have an Oracle cert in SQL.

u/evolving6000 · 1 point · r/PowerBI

Amazon has some great books on this. You can also search YouTube for dimensional modeling. Follow William McKnight for great info on this too.