#7,630 in Computers & technology books

Reddit mentions of The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling

Sentiment score: 0
Reddit mentions: 4

We found 4 Reddit mentions of The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling. Here are the top ones.

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
Buying options
View on Amazon.com
or
Specs:
Release dateJuly 2013

idea-bulb Interested in what Redditors like? Check out our Shuffle feature

Shuffle: random products popular on Reddit

Found 4 comments on The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling:

u/SQLSavant · 3 pointsr/datascience

If you're working in an enterprise environment, then most likely your data will live - at the source, in a transaction-based database (OLTP). For this, I'd recommend Database Design for Mere Mortals - it's a well written book that is more heavily based on the practical application of how your data is architected, designed and stored and less on the theoretical side of things - but it's written in a way I feel most any learned person can understand. For theoretical review, there's always the seminal work of E.F. Codd's A Relational Model For Large Shared Data Banks and also some of his follow up work The Relational Model for Database Management

From the analytical database side of things (Data Warehouses/BI Solutions) and, where hopefully you'll actually be pulling and manipulating your data from there is The Definitive Guide to Dimensional Modeling - this is a more verbose read - and not practical, but more thought experiment provoking and includes the business reasons why dimensional modeling should be used so that Data Science/Data Analytics professionals can get at their data - nevertheless - for most large companies this is the "foundation" by which your data sits on if you're a Data Scientist. I, unfortunately, do not have a good recommendation for the practical application of OLAP databases as I've never found one that generally tickled my fancy.

Just skimming through these and periodically reading through them should at least give you an idea about how your data is stored, which more importantly gives you an idea around how it can be pulled and manipulated by the systems within your company.

As an example, I had a hard time explaining once to a research assistant why I couldn't 100% match two free-text string fields with names in them to one another in a large data set. I tried explaining to him that while there is fuzzy string matching algorithms I can apply to a given data set (Like Jaro-Winkler or Levenshtein), that it wasn't always 100% and was an approximation - I guess he wanted me to further the field of Computer Science by making fuzzy string matching 100% and therefore doing what many CS and Stats gurus haven't been able to do -shrugs-.

u/arbiter_of_tastes · 2 pointsr/datascience

Definitely. Don't forget atomic modeling, either. As I said, I'm not an architect, but this book I see commonly referenced around data warehouse modeling:
https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional-ebook/dp/B00DRZX6XS/ref=sr_1_1?ie=UTF8&qid=1539091831&sr=8-1&keywords=kimball+modeling

It's probably also worth pointing out that data warehouse design can be simplified by obtaining a pre-built data model (either purchased from someone like IBM for specific sectors, or obtained open-source, such as OMOP for healthcare), or data warehouse design and modeling can be outsourced to consultants that do it for you. Those may not be great options for you, but if you have a mission critical need for your business, it might be nice to have access to an experienced architect.