Reddit mentions: The best data warehousing books

We found 70 Reddit comments discussing the best data warehousing books. We ran sentiment analysis on each of these comments to determine how redditors feel about different products. We found 24 products and ranked them based on the amount of positive reactions they received. Here are the top 20.

11. DAX Patterns 2015

DAX Patterns 2015
Sentiment score: 1
Number of mentions: 1
▼ Read Reddit mentions

16. SQL: Learn SQL the easy way

SQL: Learn SQL the easy way
Sentiment score: 1
Number of mentions: 1
▼ Read Reddit mentions

20. Google BigQuery Analytics

Google BigQuery Analytics
Sentiment score: 1
Number of mentions: 1
▼ Read Reddit mentions

idea-bulb Interested in what Redditors like? Check out our Shuffle feature

Shuffle: random products popular on Reddit

Top Reddit comments about Data Warehousing:

u/PrettyMuchAVegetable · 5 pointsr/datascience

Alright, Example time. I hope this isn't TMI but I'm open to answering more questions.

I decided to use Oracle 11g R2 since I am familiar with it but a lot of my code would be portable to another vendor's implementation of SQL.

You can use www.sqlfiddle.com and choose Oracle 11g R2 from the list of Database Vendors.

First we have to put your data into the database, use the left "build schema" box with the following code to create 2 database tables and 1 primary key object:

CREATE TABLE STG_TABLENAME (
ID1 VARCHAR2(2) ,
ID2 VARCHAR2(2) ,
ID3 VARCHAR2(2) ,
"2018" INT,
"2019" INT,
"2020" INT
)
;
--Create a core table, for retaining historical data
CREATE TABLE CORE_TABLENAME (
ID1 VARCHAR2(2) ,
ID2 VARCHAR2(2) ,
ID3 VARCHAR2(2) ,
MEASURE_YEAR INT,
MEASURE INT,
CONSTRAINT CDT_PK PRIMARY KEY (ID1,ID2,ID3,MEASURE_YEAR)
) ;



--Give our stg table your sample data
INSERT INTO STG_TABLENAME VALUES ('X', 'AA', 'T1', 20, 30, 48);
INSERT INTO STG_TABLENAME VALUES ('X', 'AA', 'T2', 33, 6666, 66);
INSERT INTO STG_TABLENAME VALUES ('X', 'B', 'T1', 2, 555, 5);
INSERT INTO STG_TABLENAME VALUES ('Y', 'C', 'T2', 5, 6, 6);

--PIVOT THE STG TABLE AND INSERT THE RESULTS INTO THE CORE TABLE
INSERT INTO CORE_TABLENAME (ID1,ID2,ID3,MEASURE_YEAR,MEASURE)
SELECT ID1,ID2,ID3,"2018" as "MEASURE_YEAR","MEASURE" FROM STG_TABLENAME
UNPIVOT ( "MEASURE" FOR "2018" IN (
"2018" as '2018',
"2019" as '2019',
"2020" as '2020'
)
)
;

The STG_TABLENAME is built to accept your raw data without any questions or constraints as to duplicate keys. The idea is that you will keep this table empty most of the time, except when there is new data to be pushed to the CORE_TABLENAME table. You can use your application/load script/ETL to check for constraints later, but that is beyond the scope of what I am explaining right now. This table design is not very good in the long term, it will get wider (cols) instead of long (rows) and is confusing
The INSERT commands load the sample data one row at a time into the table. This is not how we would load a large set of data into a DB but it suits for small datasets (or programmatic entries of small sets etc).
I used a Primary KEY on the CORE_DATATABLE for multiple reasons, first it explicitly defines the table key for us, second it automatically creates an index on the key fields, and third it ensures these fields are NOT NULL and never duplicated, you don't want duplicate keys in your table, and finally this table is designed with keeping historical records in mind so it may get quite large (long due to rows).

The final Insert pivots your data into a more useful long term format like u/seanv507 suggested. The code might look a bit complicated but pivoting/unpivoting is normal when data is coming in from the real world where long term db storage is not always top of mind so you would be wise to learn it if the data ingestion is part of your role. Really this pivoting could be done in any language, SQL, Python, or even powershell. An ETL Tool like Informatica or open source Talend or Pentaho can handle this as well. (Sorry I might be way outside the scope of your question now), anyway lets look at querying both of these tables we've made. In the Right hand side SQL Query box you can put all or any of these individual queries:

--See everything in both tables
SELECT FROM STG_TABLENAME;
SELECT
FROM CORE_TABLENAME;

--Make use of the select list to see only columns you want
SELECT ID1,ID2,ID3,2018 FROM STG_TABLENAME;
SELECT ID1,ID2,ID3,MEASURE_YEAR,MEASURE FROM CORE_TABLENAME;

--Make use of the WHERE Clause to select only records you want to see
SELECT ID1,ID2,ID3,"2018" FROM STG_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1';

SELECT ID1,ID2,ID3,MEASURE_YEAR,MEASURE FROM CORE_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
AND MEASURE_YEAR=2018;

--Summing up the year 2018 under both table structures
SELECT ID1,ID2,ID3,SUM("2018") as SUM_2018 FROM STG_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
GROUP BY ID1,ID2,ID3;

SELECT ID1,ID2,ID3,MEASURE_YEAR,SUM(MEASURE) FROM CORE_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
AND MEASURE_YEAR=2018
GROUP BY ID1,ID2,ID3,MEASURE_YEAR;

--Summing and grouping multiple years under both table designs
SELECT ID1,ID2,ID3,SUM("2018") as SUM_2018, SUM("2019") as SUM_2019 FROM STG_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
GROUP BY ID1,ID2,ID3;

SELECT ID1,ID2,ID3,MEASURE_YEAR,SUM(MEASURE) FROM CORE_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
AND MEASURE_YEAR IN (2018,2019) -- Note the use of the IN operator
GROUP BY ID1,ID2,ID3,MEASURE_YEAR;

--SUMMING Multiple years but grouping only by ID1,ID2 and ID3
SELECT ID1,ID2,ID3,SUM("2018" + "2019") as SUM_18_19 FROM STG_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
GROUP BY ID1,ID2,ID3;

SELECT ID1,ID2,ID3,SUM(MEASURE) FROM CORE_TABLENAME
WHERE ID1='X'
AND ID2='AA'
AND ID3='T1'
AND MEASURE_YEAR IN (2018,2019) -- Note the use of the IN operator
GROUP BY ID1,ID2,ID3;

I chose a variety of different queries to highlight different aspects of SQL querying for you. The -- comments out the line that follows it, so I made short notes as to what the queries are doing.

To learn more about SQL I always recommend people take the Stanford DB5 open course and the DB 8 course on relational modelling.

https://lagunita.stanford.edu/courses/DB/2014/SelfPaced/about

To learn more about data warehousing I reccomend reading some Kimball:
https://www.amazon.ca/Kimballs-Data-Warehouse-Toolkit-Classics/dp/1118875184/ref=sr_1_3?s=books&ie=UTF8&qid=1537275891&sr=1-3

Yes there are other design paradigms, this is the one I find works best in real world situations that I tend to run into in my practice.

Feel free to ask me any questions that pop up, I'm happy to talk SQL or DW. Also don't get discouraged if you find any of this overwhelming or complex. It's actually very approachable and with a bit of effort, a course or some reading combined with practice you'll be humming along in SQL in no time. Keep in mind a couple of these statements/queries (I'm looking at you UNPIVOT)took me 2-3 times to get the syntax right, and I've been doing SQL for 9 years and have an Oracle cert in SQL.

u/mitcharoni · 3 pointsr/SQLServer
  • MDX Solutions by George Spofford and others - I consider this one a bible of sorts for MDX. Lots of explanations and in the MDX reference section LOTS of practical examples. A little dated as it's for 2005 but still very very useful. For $10 used on Amazon, don't pass it up.
  • Practical MDX Queries by Art Tennick - A much smaller but very dense "learning by example" reference for MDX. I love this book. Used for $17 on Amazon.
  • MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook by Tomislav Piasevoli - A really, really good book on MDX. Really great examples.
  • Expert Cube Development with SSAS Multidimensional Models by Chris Webb and Marco Russo - I don't have this book but considering the authors I have to think it's an awesome book from a great publisher (Packt).
  • There's also a great reference site called MDXpert which covers most if not all MDX functions and syntax although not in terribly great detail.

    The MDX language hasn't really changed all that much, so don't worry too much about getting an older reference for 2005/2008.

    You're not going to find an "all-in-one" reference for OLAP and MDX. You'll need more than one reference just because there's so much to cover. And my advice is don't assume that any SQL knowledge you have will translate to OLAP/MDX. It's a totally different animal. It took me years to master and I had 10+ years of experience with SQL Server and Oracle when I started learning OLAP/MDX....which didn't mean squat.
u/randumnumber · 4 pointsr/oracle

ohh "set things up" is a very very wide term. OBIEE can do a ton of stuff. First do you have a data warehouse? What is the source of your data? I can give you the basics. OBIEE uses a metadata repository its called and RPD this is the source of all queries. You pull metadata from your source and then build out the RPD through a physical -> Business -> Presentation layer. The Business layer can do quite a bit of work for you in terms of combining dimensions and joins but you want as much of a star schema as possible from the source. Read Kimballs book listed below to understand star schema and warehousing concepts.

Inside of the OBI admin tool there is also some user management, user management isa whole nother aspect. Are you using some ldap authentiacaiton or will you be managing users though obiee? There are USERS, GROUPS, & ROLES. This is another aspect to deal with.

There is also the EM web portal, Enterprise Manager from here you do other management of users and roles and the actual services. This is another thing, where is this hosted? Do you already have OBIEE 11g set up on a server? If so you will need access to that box to do services management. Also may need to modify config files here.

Then there is the actual reporting service, OBIEE uses dimensions and a fact to create charts, pivot tables etc. Here you will log into the web front end this would be accessed by going to http://servername:port/analytics From here you log in as your development user by default its weblogic i beileve. And here is where you would create dashboards etc.

This is just one aspect of the tool set, there is also BIP (bi publisher) used to develop reports from various sources by creating a template and filling the template out by using XML.

Oracle offers classes, which if your managment is throwing you into OBIEE they should be giving you at least 1 class. The report building stuff is easy enough to pick up, but if you are responsible for the management of the server, you need a class.. there is just so much to know about it.

I have worked in the RPD and reports/dashboard building side of things for 2 years. and im still learning stuff (usually the limitations of OBIEE). We have a whole nother TEAM(TEAM) of people who manage the databases and server side.

Resources:

Get a subscription to METALINK from oracle to issue service requests and look up bug fixes etc.

https://login.oracle.com/mysso/signon.jsp

Blogs:

http://www.rittmanmead.com/
http://gerardnico.com/

There are also youtube videos to explain simple stuff for setting up and RPD etc. You can also download an entire sample setup of OBIEE 11g from oracle.. its a huge download 50gb or something like that, but it has database, RPD, sample reports. all in a virtual machine. You can spend a week setting it up just to have examples to work from.

There is plenty of resources, but to give 1 generalized resource is difficult, you need to search for specific things you need to do. "Installing obiee11g on linux" "importing meta data into RPD"

If you need books on Data Warehousing and explanations of STAR schema and data denormalization I suggest reading up on kimball method:

http://www.amazon.com/The-Data-Warehouse-Toolkit-Dimensional/dp/1118530802/ref=sr_1_1?ie=UTF8&qid=1377213718&sr=8-1&keywords=kimball

and

Inmon

http://www.amazon.com/Building-Data-Warehouse-W-Inmon/dp/0764599445/ref=sr_1_2?ie=UTF8&qid=1377213827&sr=8-2&keywords=inmon

They have different philosophies for data warehousing i personally subscribe to the Kimball method because it supports rapid development better.


I'd like you to know but not discourage you, this is a large undertaking for 1 person. We manage 2 RPD's and 2 sets of dashboards for a custom reporting application we also do the ETL and warehousing. The whole warehouse was set up by a team, then we moved in ETL is handled by another team of people and we have a team doing reporting, then there is management and functional. So building out an OBIEE implementation from the ground up doing warehousing is a huge undertaking. There is another team of people doing server management and upgrades, and migrations.

This is at least a 3 man job, with each person being specialized. Push for RPD traning, Server managment Traning, and dashboard design Training. Warehousing methods and ETL work is another story.

u/heroicjunk · 1 pointr/mentors

Believe it or not, attitude is half (if not more) of the battle. Keep doing what you are doing to continuously expand and build upon your skills.

With that said, I cannot stress highly enough taking a look at the Kimball Group Reader. It is chock full of real-world scenarios and is structured in a way that lends itself great to newcomers and seasoned pros alike. I think this will help to supplement what you are learning with the AdventureWorks system at the moment.

Once you familiarize yourself with SQL Server, I would recommend researching Oracle (Database, OLAP, OBIEE) and other technologies to better understand how they are alike, but also how they differ.

A cool way to learn more about data analysis is to try and apply it to something that interests you and expand from there. For example, if you like sports, you can mine football, baseball, etc... statistics for analysis. Find something that interests you and it may help reinforce your learning, especially on the parts that you may not find as satisfying to learn.

My background: I have been in Data Warehousing for about 8 years now and in my current role I flex between an Oracle DBA and DW Architect.

Best of luck to you and continue to strive learning new things!

u/thibaut_barrere · 6 pointsr/rails

It's hard to provide a full answer just based on available information, but roughly you have many different ways to achieve what you have in mind.

Some families of ways to handle this:

  • ETL (Extract Transform Load) -> extract from CSV, during the process transform / clean / remap any field, if you don't do incremental reload you could also dedupe in the transform step, then load a "clean dataset" into either Postgres, ElasticSearch, etc
  • ELT (Extract Load Transform) -> extract from CSV, dump right into ES or PG (mostly unchanged), then modify there (or query a "temporary dataset" to do a sort of ETL, clean / filter etc, and pour the data into a more "final" destination in the same datastore

    What's the most adequate way to do this depends on various factors:

  • Do you want to deduplicate inside a single CSV (which can be achieved in memory before loading), or cross-CSVs (in which case you need a business key, with unique constraint, and do "upserts", or at least verify if you must drop the rows by checking id presence before)
  • Do you have many different CSV formats or are them quite different? (if they are quite different, it's often more easy to go ETL, to have a very flexible & well tested way to verify the mappings & conversions etc)
  • Are the outputs mostly largely similar with a bit of different fields, or mostly completely different?

    Finally, here are some tools which can help:

  • My own gem https://www.kiba-etl.org (which I use both for ETL & ELT scenarios)
  • Ruby Sequel https://github.com/jeremyevans/sequel (which is actually used by my upcoming Kiba Pro offer for database related tasks)
  • pgloader https://github.com/dimitri/pgloader
  • embulk http://www.embulk.org/docs/

    If you are into this for the long term, it can be worth reading a book that I often mention, which is the ETL book by Ralph Kimball. While quite old, it provides interesting patterns.

    Happy to detail this more if you provide more input!
u/hagemajr · 1 pointr/AskReddit

Awesome! I kind of fell into the job. I was initially hired as a web developer, and didn't even know what BI was, and then got recruited by one of the BI managers and fell in love. To me, it is one of the few places in IT where what you create will directly impact the choices a business will make.

Most of what I do is ETL work (taking data from multiple systems, and loading them into a single warehouse), with a few cubes (multidimensional data analaysis) and SSRS report models (logical data model built on top of a relational data store used for ad hoc report creation). I also do a bit of report design, and lots of InfoPath 2010 + SharePoint 2010 custom development.

We use the entire Microsoft BI stack here, so SQL Server Integration (SSIS), Analysis (SSAS), and Reporting Services (SSRS). Microsoft is definitely up and coming in the BI world, but you might want to try to familiarize yourself with Oracle BI, Business Objects, or Cognos. Unfortunately, most of these tools are very expensive and not easy to get up and running. I would suggest you familiarize yourself with the concepts, and then you will be able to use any tool to apply them.

For data warehousing, check out the Kimball books:

Here and here and here

For reporting, get good with data visualizations, anything by Few or Tufte, like:

Here and here

For integration, check these out:

Here and here

Also, if you're interested in Microsoft BI (SSIS, SSAS, SSRS) check out this site. It has some awesome videos around SSAS that are easy to follow along with.

Also, check out the MSDN BI Blog: http://blogs.msdn.com/b/bi/

Currently at work, but if you have more questions, feel free to shoot me a message!

u/NAMOS · 10 pointsr/onions

Basically any SRE advice for a normal service but replace/compliment HAproxy / nginx / ingress controller / ELB with the Tor daemon / OnionBalance.

I run Ablative Hosting and we have a few people who value uptime over anonymity etc and so we follow the usual processes for keeping stuff online.

Have multiples of everything (especially stuff that doesn't keep state), ensure you have monitoring of everything from connections, memory pressure, open files, free RAM etc etc.

Just think of the Tor daemon onion service as just a TCP reverse proxy, with load-balancing capability and then follow any other advice when it comes to building reliable infrastructure;

u/CarbonChauvinist · 3 pointsr/PowerBI

Agree completely the Purple Book is the Bible! And yes, wait for the 2nd edition at this point as it's very close. They also just released version 2 of their Mastering DAX video class on SQLBI as well which I'm looking forward to going back over.

DAX patterns is amazing as well, but maybe easier to access via web and I'm sure that will be seeing a new edition soon.

I cant' stress enough how useful the Analyzing Data/Orange Book was for me personally. It was more of a high-level overview of best practices for constructing data models which was crucial for me coming from unrelated backgrounds.

u/flipstables · 1 pointr/datascience

I'm very much a data science newbie. But here's my attempt to answer your question.

Excel is both very powerful and very limited in data analysis.

Pros: a lot of quick and dirty analysis can be done using Excel. Data manipulation is easy and fast. Particularly with Excel 2013, it's very easy to mash data together from different data sources. PowerPivot is a nice way to "import" data from SQL sources. Popular data mining addin = XLMiner.

Cons: Limited in size of data sets, and it can get extremely slow and unstable with large amounts of data. Analysis is very biased towards financial applications. Programming is Excel is very clunky (I'd rather let python do the heavy lifting if necessary).

Excel really needs a database engine and/or business intelligence backend to perform a lot of the heavy analytics. Nevertheless, it's a good tool.

For resources, I was recently introduced to this book, and it seems promising:

http://www.amazon.com/Data-Analysis-Using-SQL-Excel/dp/0470099518

u/r00t_4orce · 3 pointsr/commandline

Upvotes to the mentions of Powershell - it is really powerful and excellent to use.

However - good'ol DOS and batch files fuel some of my top functions at work. Here are some links that might help:

u/[deleted] · 2 pointsr/BusinessIntelligence

I know this post is kinda stale, but do yourself a favor and pick up the Kimball Group Reader book. Fantastic for understanding the concepts behind designing and building dimensions, facts, and how they represent various business processes.

From what you say, you sound like a "senior" data analyst: you write queries that give correct results in a performant manner. You could probably look for a "junior" DWH engineer role which should pay about the same or maybe 10% less per year. However, as you grow in experience, the top end for relational DWH engineers is around US$110,000 per year while I would expect a data analyst to top out around US$70,000 per year.

Aside from learning DWH concepts, I'd say learn Tableau or Power BI as a data analyst should know not just how to get to the data, but how to present it in a meaningful manner. Learn about the "concepts" of ETL and the patterns they implement. Again, the book I referenced above will give some good examples of what should be done within each letter of E-T-L. These are the most logical and attainable next steps in your career. From there, once you understand more about ETL, the next logical step is to start thinking about how all the various systems integrate with one-another.

u/babypng · 1 pointr/mentors

Reading this post, two books immediately come to mind:

Microsoft BI Exam Training Kit - This is almost exactly what you described with "I would like someone to give me next steps as suggesting some real world scenarios with (for example) the adventureworks database". It shows you how to build ETL's in SSIS using the adventureworks sample databases, build cubes in SSAS using the stuff you made in SSIS, and how to build reports in SSRS. I think this particular test has been deprecated so and I don't know if you could use this knowledge to turn around an get a MS certificate anymore, but the book itself has some great examples.

The Microsoft Data Warehouse Toolkit - This book is essentially the blue print of how to implement a Kimball-style DW/BI solution using the SQL Server tools you are already learning. This book is the closest I've read to what life is like for B.I. folks (granted it assumes starting a new project from the ground up, not maintaining an existing solution which comes with its own set of headaches learning opportunities). Downside here is it obviously doesn't expose you to the other primary BI ethos of Bill Inmon. However in my opinion there is plenty of info for free on the web about the ever present Kimball vs Inmon methods.

u/LittleOlaf · 32 pointsr/humblebundles

Maybe this table can help some of you to gauge how worth the bundle is.

| | | Amazon | | | Goodreads | |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------|---------|--------------|-----------|--------------|
| Tier | Title | Kindle Price ($) | Average | # of Ratings | Average | # of Ratings |
| 1 | Painting with Numbers: Presenting Financials and Other Numbers So People Will Understand You | 25.99 | 3.9 | 20 | 4.05 | 40 |
| 1 | Presenting Data: How to Communicate Your Message Effectively | 26.99 | 2.9 | 4 | 4.25 | 8 |
| 1 | Stories that Move Mountains: Storytelling and Visual Design for Persuasive Presentations | - | 4.0 | 13 | 3.84 | 56 |
| 1 | Storytelling with Data: A Data Visualization Guide for Business Professionals (Excerpt) | 25.99 | 4.6 | 281 | 4.37 | 1175 |
| 2 | 101 Design Methods: A Structured Approach for Driving Innovation in Your Organization | 22.99 | 4.2 | 70 | 3.98 | 390 |
| 2 | Cool Infographics: Effective Communication with Data Visualization and Design | 25.99 | 4.3 | 39 | 3.90 | 173 |
| 2 | The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions | 31.71 | 3.8 | 43 | 3.03 | 35 |
| 2 | Visualize This: The FlowingData Guide to Design, Visualization, and Statistics | 25.99 | 3.9 | 83 | 3.88 | 988 |
| 3 | Data Points: Visualization That Means Something | 25.99 | 3.9 | 34 | 3.87 | 362 |
| 3 | Infographics: The Power of Visual Storytelling | 19.99 | 4.0 | 38 | 3.79 | 221 |
| 3 | Graph Analysis and Visualization: Discovering Business Opportunity in Linked Data | 40.99 | 4.2 | 3 | 3.59 | 14 |
| 3 | Tableau Your Data!: Fast and Easy Visual Analysis with Tableau Software, 2nd Edition | 39.99 | 4.0 | 66 | 4.14 | 111 |
| 3 | Visualizing Financial Data | 36.99 | 4.7 | 4 | 3.83 | 6 |

u/onetwosex · 1 pointr/datascience

OP, so that you know, you mention uncle Bob's "Clean Code", but your link redirects to the book "Clean Coder". They're both great, but different.

I've ordered the book Practical Statistics for Data Scientists: 50 Essential Concepts. Looks great to brush up the basics of statistics and machine learning. Since I haven't actually read it yet, take my input with a grain of salt.

u/VaporDotWAV · 2 pointsr/SQLServer

This is the real answer.

While everyone else is talking about which reporting tool to implement, you still need to learn about the architectural structure of the environment (extraction, staging, cleaning, conforming, loading) and then how to structure the final fact and dimension tables.

The great thing is that once you get your "back room" in order, the SQL structures should be able to plug into any sort of presentation layer you want to have, whether it's SSAS, SSRS, Tableau, or PowerBI.

I'd recommend The Kimball Group Reader, Relentlessly Practical... to give you a pretty good starting base on just about anything related to data warehousing.

I would also recommend asking for a data analyst. As a developer/engineer, you should be building structures and processes that will allow for a specialist to get the data the business needs. Instead of business clients bugging you incessantly forcing you to context switch, they will instead call upon the analyst to tweak the reports and conduct the deep-dives on questions they have.

EDIT: the cookbook parent might be talking about is Data Warehouse Design Solutions. They talk about common designs based on several common industries and different reporting levels as your CxO suite will view things at a higher level than your individual specialized department heads.

u/Yehosua · 22 pointsr/programming

Whenever I get an email about a new programming Humble Bundle, I hop over to Reddit to see if anyone else thinks it's worth buying. In this case, Reddit has failed me, because no one has shared their opinions. All is not lost, however, for I can share mine!

These are probably the most commonly recommended DevOps books:

  • The Phoenix Project - written in the form of a novel to teach DevOps principles. (I've read it. It's awesome.)
  • The DevOps Handbook - a non-fiction book from the authors of The Phoenix Project. (I've started it.)
  • Site Reliability Engineering - "SRE" is more or less Google's term for DevOps. This book is more or less how Google does DevOps. (I've not read it. It's available online for free.)
  • Accelerate: The Science of Lean Software and DevOps - Martin Fowler calls this the best programming book of this year. (I've not read it.)
  • The Site Reliability Workbook - a sequel of sorts to Site Reliability Engineering. Probably less popular than the others I just listed. (I've not read it.)

    The Site Reliability Workbook is the only one of these that's included in this bundle. So the first question I ask myself regarding this bundle is, "Do I want to spend the time and money on this bundle's books, or should I spend that on one of the highly recommended books instead?" (Personally, I'm going with the latter.)

    Otherwise, most of the books here are technology-specific, so the second question is, "Do I want to learn any of these specific technologies now, and are e-books a good way of doing it?" (Personally, my answer is no.)

    Depending on how you answer the first two questions, the last question is, "Are the non-technology-specific books worth getting?" To answer that, here are Amazon links to the non-technology-specific books, for reviews and sales rankings:

  • The Site Reliability Workbook
  • Designing Distributed Systems
  • Database Reliability Engineering
  • Seeking SRE
  • Cloud Native Infrastructure
  • Practical Monitoring
  • Effective DevOps
u/elus · 4 pointsr/SQLServer

Grab a copy of dimensional modeling book from the Kimball group. This one deals with MS technology specifically.

To get a better understanding of data visualization theory and techniques I suggest reading all the material from Stephen Few and Edward Tufte. Few's book on dashboard is fantastic.

Network and meet people in the industry through events thrown by The Data Warehouse Institute. Download free datasets online and use those to gain insight into subjects you're interested in as well as to practice the skills you learn as you play around with SQL Server.

I can add more later but I have to go for the day. Feel free to post more specific questions in this thread and I'll try to answer.

u/pooerh · 5 pointsr/SQLServer

SQL Server Developer, BI Backend Developer, Data Architect, stuff like that.

The Kimball Group Reader (Amazon link, without a referral or whatever) is a must read in the business of DWH and BI. Coincidentally, I haven't read it (impossible to get where I live), so don't ask me for an opinion if it's any good!

> I try to use the same principles of code cleanliness, separation of concerns, etc from .Net dev.

The first thing I imagine when I hear something like that is the person happily converting code into user defined scalar functions (I hope you don't use them). Not all rules from procedural development apply to SQL, remember that!

u/nakun · 2 pointsr/datascience

Hi,

I am starting out studying Data Science myself (not employed in the field). Here is what has been useful to me:

Data Quest has some free lessons and they are good. (They also have a weekly newsletter that has learning resources/articles).

Practical Statistics for Data Scientists has been very helpful in getting me up to speed on statistics (note that the code here uses R, not Python).

u/seadave77 · 1 pointr/BusinessIntelligence

http://www.amazon.com/The-Kimball-Group-Reader-Relentlessly/dp/0470563109

This a great book. Bite size chunks explaining why to do a warehouse and how. Pretty much any Kimball book it seems good.

u/bbowler86 · 2 pointsr/datascience

Yeah to be honest the only thing that you get with the Enterprise version is some visualization stuff which is meh at best, an Enterprise Scheduler which doesn't even have job dependencies, and support. We had a Consultant come in from the Normandy Group before we started using it and do an evaluation between PDI and Informatica based on our needs and his conclusion was that 95% of everything we needed to do we could do with PDI and we didn't have to pay for it. It hasn't let me down except for some export to Excel stuff but you really shouldn't be doing reporting with an ETL tool anyway. There is of course performance tradeoffs between using any ETL tool and straight SQL/scripting but the amount of time you save and being able to reproduce with a tool like Pentaho make it worth it.

If you are serious about it I would suggest this book. And I mean read it. Bad code makes bad code regardless if you script this with Python or Pentaho. It is a bit of a learning curve but worth it in my opinion.

u/yahelc · 4 pointsr/dataengineering

The most important reading from a database design perspective, IMO, is one of Kimball’s books:

https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802

It’s less technically focused, and more focused on how to build good datasets. It’s an older text so it’s references to specific technologies are a bit out of date, but when it comes to describing how to design particular schemas (or at least speak the language of people who design schemas), it’s pretty much canon.

u/evolving6000 · 1 pointr/PowerBI

amazon has some great books on this. You can also search YouTube for dimensional modeling. Follow William McKnight for great info on this too.

u/FabianKiel · 1 pointr/SQLServer

I like my own book. ;) You find a free excerpt here: http://bi-solutions.gaussling.com/sql_basics_excerpt/

If you want to go directly to amazon: https://www.amazon.com/dp/B0733D7MCR/ref=cm_sw_r_cp_dp_T2_fXmtzb2XB2WRY

u/johngabbradley · 4 pointsr/BusinessIntelligence

Dimensional modeling is under valued in today's climate. Any complex models on a large scale will be more effective when modeled optimally.

https://www.amazon.com/Data-WarehouseÂ-ETL-Toolkit-Techniques-Extracting/dp/0764567578

u/gfody · 84 pointsr/programming

First don't think of this as "DBA" stuff - you're a developer, you need to know database technology, period. Read this rant by Dennis Forbes in response to Digg's CTO's complaints about databases it's very reminiscent of TFA.

Read Data and Reality by the late William Kent (here's a free copy) and get a fundamental understanding of "information" vs. "data". Then read Information Modeling and Relational Databases to pickup a couple practical approaches to modeling (ER & OR). Now read The Datawarehouse Toolkit to learn dimensional modeling and when to use it. Practice designing effective models, build some production databases from scratch, inherit some, revisit your old designs, learn from your mistakes, write lots and lots and lots of SQL (if you want to get tricky with SQL I suggest to pickup Celko's SQL for smarties - it's like the Hacker's Delight for SQL).

Many strange models you may encounter in the wild are actually optimizations. Some are premature, some outright stupid, and some brilliant, if you want to be able to tell one from the other then you're going to dive deep into internals. Do this generically with Modern Information Retrieval and Managing Gigabytes then for specific RDBMSs Pro SQL Server Internals, PostgreSQL Internals, Oracle CORE, etc.

Reflect on how awesome practically every modern RDBMS is as a great technological achievement for mankind and how wonderful it is to be standing on the shoulders of giants. Roll your eyes a little bit whenever you overhear another twenty-something millenial fresh CS graduate who skipped every RDBMS elective bleat about NoSQL, Mongo, whatever. Try not to fly into murderous rage when another loud-mouthed know-nothing writes at length about how bad RDBMS technology is when they couldn't be bothered to learn the most basic requisite skills and knowledge to use one competently.

u/vededju · 1 pointr/Accounting

I taught myself how to use Essbase in 1 day using this book.

Learning essbase isn't difficult, you can pick it up pretty easily.

u/elliotbot · 3 pointsr/cscareerquestions

I second Kimball's The Data Warehouse Toolkit. Definitely be familiar with DS&A as well as SQL and big data concepts including window functions, pivots, aggregations, map-reduce, spark, etc.

I list some other resources and my study guide in my post here.

u/fhoffa · 1 pointr/bigquery

The best book around (a little outdated, but it comes straight from the engineers who made BigQuery):

u/welshfargo · 2 pointsr/Database

Informatica is widely used for ETL tools, but more important is understanding the challenges. How to design staging tables, what update strategy to use, how to design restartable jobs, and so on. This book refers to data warehousing, but the techniques are applicable to most scenarios.

u/phl12 · 1 pointr/datascience

Practical Statistics for Data Science is on sale right now on Amazon. Only $13!

u/story-of-your-life · 1 pointr/statistics

The book Practical Statistics for Data Scientists is worth a look. https://www.amazon.com/Practical-Statistics-Data-Scientists-Essential/dp/1491952962

u/camelrow · 19 pointsr/BusinessIntelligence

The Data Warehouse Toolkit by Kimball was recommended to me as "The Source" for DW. I just started reading it, so no experience yet.

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition https://www.amazon.com/dp/1118530802/ref=cm_sw_r_cp_apa_i_LZ-7CbHQTXGRM

u/sathley90 · 2 pointsr/databases

Also The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data https://www.amazon.com/dp/0764567578/ref=cm_sw_r_cp_apa_lnUSAbDC5NK4X

u/thephilski · 2 pointsr/SQL

>data warehouse toolkit

Can you confirm that this is the book you are referring to? Amazon Link

u/Autoexec_bat · 4 pointsr/BusinessIntelligence

Assuming you've already read the Data Warehouse Toolkit? If not, do. http://www.amazon.com/dp/1118530802

u/muraii · 3 pointsr/datascience

Look up the DMBOK and Ralph Kimball’s The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling https://www.amazon.com/dp/1118530802/ref=cm_sw_r_cp_api_r4XMBbY0729K9 .