Best products from r/dataengineering
We found 10 comments on r/dataengineering discussing the most recommended products. We ran sentiment analysis on each of these comments to determine how redditors feel about different products. We found 9 products and ranked them based on the amount of positive reactions they received. Here are the top 20.
1. Agile Data Science 2.0: Building Full-Stack Data Analytics Applications with Spark
2. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling
3. The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition
- John Wiley & Sons
Features:
4. Algorithms in a Nutshell (In a Nutshell (O'Reilly))
5. Agile Data Warehouse Design: Collaborative Dimensional Modeling, from Whiteboard to Star Schema
6. SQL Queries for Mere Mortals: A Hands-On Guide to Data Manipulation in SQL (4th Edition)
https://www.amazon.com/Data-Warehouse-Toolkit-Complete-Dimensional/dp/0471200247
This is a highly recommended book in the data warehouse industry. Hope you enjoy it and good luck.
I'm reading this book now:
https://www.amazon.com/Agile-Data-Science-2-0-Applications/dp/1491960116
And OK, it's already 2 years old, but it is amazing: it depicts the complete agile data science process using Kafka, Spark (Core, Streaming, SQL, MLlib), Airflow, Elasticsearch, MongoDB, scikit-learn, and D3.js, and shows how to improve and deploy your pipeline.
The most important reading from a database design perspective, IMO, is one of Kimball’s books:
https://www.amazon.com/Data-Warehouse-Toolkit-Definitive-Dimensional/dp/1118530802
It’s less technically focused, and more focused on how to build good datasets. It’s an older text, so its references to specific technologies are a bit out of date, but when it comes to describing how to design particular schemas (or at least speak the language of people who design schemas), it’s pretty much canon.
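For anyone who hasn't seen the kind of schema the book talks about, here is a minimal sketch of a star schema (one fact table joined to dimension tables), assuming an in-memory SQLite database. The table names, columns, and rows are made up for illustration and are not taken from the book.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema and load a few rows."""
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key  INTEGER PRIMARY KEY,
            full_date TEXT,
            month     TEXT
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT,
            category    TEXT
        );
        -- The fact table holds measurements plus keys into each dimension.
        CREATE TABLE fact_sales (
            date_key    INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240101, "2024-01-01", "2024-01"), (20240201, "2024-02-01", "2024-02")],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(20240101, 1, 2, 20.0), (20240101, 2, 1, 15.0), (20240201, 1, 3, 30.0)],
    )


def sales_by_category(conn):
    """Total sales amount per product category, sorted by category name."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star_schema(conn)
    print(sales_by_category(conn))
```

The point of the layout is that analytical questions become a join from the fact table to whichever dimensions you want to slice by, followed by a GROUP BY.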
I attended UCI for CS, and am working on my master's right now. I'm a data engineer / data platform engineer at a startup, and have been doing it for ~2 years or so. I find that traditional CS knowledge is a tool belt that you don't necessarily *need* to get through industry.
There are a lot of really good algorithm books out there. O'Reilly has Algorithms in a Nutshell, which covers big-O notation and then walks through some basic data structures and algorithms (linked lists, trees, sorting). DS and algos are really the *core* CS things that one would need. Some community colleges offer these courses, which might be better depending on your circumstances.
The upper division classes are useful, I think. I took a few classes on distributed systems and computer architecture, which have been invaluable. I took a class on databases (useful I suppose, but meh), and some classes on machine learning, artificial intelligence, and operating systems. Those have become more useful now that I'm doing data platform work.
All that being said, I think the only disadvantages you have are the terminology ("This will give O(n log n) lookup while retaining referential integrity") and the boxes to tick. The terminology you can learn. The boxes to tick might be tougher, though. I think some companies will be really picky about that stuff. You did say that you have an undergraduate education, though, so I don't think that will matter.
Definitely. We actually used that book for my Business Intelligence master's course in my MIS program. I met a BI manager hiring for a data engineering role and she recommended the following text as well. The content is pretty similar, since both focus on the Kimball method, but this one also goes over BEAM*, which is a requirements-gathering framework for designing data warehouses.
https://www.amazon.com/Agile-Data-Warehouse-Design-Collaborative/dp/0956817203/ref=sr_1_1?s=books&ie=UTF8&qid=1511661160&sr=1-1&keywords=agile+data+warehouse+design
Re-iterating what the previous posters said: the fundamentals are the same regardless of system. Learning how to get data out of a SQL system is all about learning how to write SQL.
To effectively learn how to write SQL for data engineering, I highly recommend grabbing a book like one of these*:
and grabbing a sample database for the system of your choice:
and then practice some of your chosen book on the sample db.
Notes and words of warning:
(*I'm not affiliated w/ any of those books)
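As a rough idea of what "practice on a sample db" can look like, here is a small sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and their rows are invented stand-ins, not one of the sample databases the comment refers to; swap in whatever sample database you picked.

```python
import sqlite3


def load_sample_db():
    """Build a tiny, hypothetical sample database in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            total       REAL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edgar');
        INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 60.0), (3, 2, 25.0);
    """)
    return conn


def customers_over(conn, threshold):
    """Customers whose combined order total exceeds the threshold."""
    return conn.execute("""
        SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.total) AS spent
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        HAVING SUM(o.total) > ?
        ORDER BY spent DESC
    """, (threshold,)).fetchall()


def customers_without_orders(conn):
    """Customers with no orders at all, found with a LEFT JOIN."""
    return conn.execute("""
        SELECT c.name
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.id
        WHERE o.id IS NULL
    """).fetchall()


if __name__ == "__main__":
    conn = load_sample_db()
    print(customers_over(conn, 30))
    print(customers_without_orders(conn))
```

Exercises like these (joins, GROUP BY with HAVING, finding missing rows) carry over to any SQL system, which is the point the comment makes about fundamentals.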
No, this one: Learning Spark: Lightning-Fast Big Data Analysis https://www.amazon.com/dp/1449358624/ref=cm_sw_r_cp_apa_i_dav3DbS0DXT51