(Part 2) Best products from r/datascience
We found 42 comments on r/datascience discussing the most recommended products. We ran sentiment analysis on each of these comments to determine how redditors feel about different products. We found 205 products and ranked them based on the number of positive reactions they received. Here are the products ranked 21-40. You can also go back to the previous section.
21. Database Internals: A Deep Dive into How Distributed Data Systems Work
24. Automate the Boring Stuff with Python: Practical Programming for Total Beginners
- No Starch Press
25. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data
- O'Reilly Media
26. Deep Learning with Python
27. Bayesian Methods for Hackers: Probabilistic Programming and Bayesian Inference (Addison-Wesley Data & Analytics)
Addison-Wesley Professional
28. Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are
- Dey Street Books
29. Cracking the Coding Interview: 189 Programming Questions and Solutions
- Careercup
30. Data Mining: Practical Machine Learning Tools and Techniques (The Morgan Kaufmann Series in Data Management Systems)
32. Search and Optimization by Metaheuristics: Techniques and Algorithms Inspired by Nature
33. Pandas Cookbook: Recipes for Scientific Computing, Time Series Analysis and Data Visualization using Python
34. Mind Over Mood: Change How You Feel by Changing the Way You Think
- Evidence-based material
35. The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of Presenting Data, Facts, and Figures
- W. W. Norton & Company
36. Storytelling with Data: A Data Visualization Guide for Business Professionals
- Wiley
37. Lenovo ThinkPad P51 Mobile Workstation Laptop - Windows 10 Pro - Intel Xeon E3-1505M, 64GB RAM, 1TB SSD, 15.6" FHD IPS 1920x1080 Display, NVIDIA Quadro M2200M 4GB GPU
- Intel Xeon E3-1505M v6 (8M Cache, up to 4.0 GHz) - 1TB Solid State Drive - 64GB PC4-19200 DDR4 SDRAM, 2400MHz SODIMM
- 15.6" Full HD IPS (1920x1080) Anti-Glare, Non-Touch Display - NVIDIA Quadro M2200M Discrete Graphics with 4GB VRAM - Stereo Speakers
- Built in HD 720p Webcam with Dual noise-cancelling Microphones - Intel Dual Band Wireless-AC (2x2) 8265, Bluetooth Version 4.1 - Fingerprint Reader
- 4x USB 3.1 ports (one Always On), HDMI 1.4b, mini-DisplayPort 1.2a, 10/100/1000 Gigabit Ethernet (RJ-45), USB Type-C / Thunderbolt 3, Dock Connector, 4-in-1 card Reader (SD/SDHC/SDXC), Headphone/Mic Combo
- Windows 10 Pro 64-Bit - ThinkPad Precision Spill-Resistant, Backlit Keyboard with Full NumberPad / 6-cell Li-Polymer (90Whr) External Battery / 170-Watt AC-Adapter
38. New 2018 Lenovo ThinkPad P52 Workstation Laptop - Windows 10 Pro - Intel Hexa-Core i7-8850H, 64GB RAM, 4TB SSD, 15.6" FHD IPS 1920x1080 Display, NVIDIA Quadro P1000 4GB
- Intel Core i7-8850H (9M Cache, up to 4.3 GHz) - 4TB Solid State Drive - 64GB PC4 DDR4 SDRAM, 2400MHz SODIMM
- 15.6" Full HD IPS (1920x1080) Anti-Glare, Non-Touch Display - NVIDIA Quadro P1000 Discrete Graphics with 4GB VRAM - Stereo Speakers
- Built in HD 720p Webcam with Dual noise-cancelling Microphones - Intel Dual Band Wireless-AC (2x2) 9560, Bluetooth Version 5.0 - Fingerprint Reader
- 3x USB 3.1 Gen 1 (one Always On), 2x USB Type-C / Thunderbolt 3, Mini DisplayPort 1.4, HDMI 2.0, Ethernet (RJ-45), headphone / microphone combo jack, 4-in-1 reader (MMC, SD, SDHC, SDXC), side docking connector, security keyhole
- Windows 10 Pro 64-Bit - ThinkPad Precision Spill-Resistant, Backlit Keyboard with Full NumberPad / 6-cell Li-Polymer (90Whr) External Battery / 170-Watt AC-Adapter
39. The Visual Display of Quantitative Information
Hello, I am an undergrad student. I am taking a Data Science course this semester. It's the first time the course has ever been run, so it's a bit disorganized, but I am very excited about this field and have learned a lot on my own. I have read 3 Data Science books that are all fantastic and suited to very different types of classes. I'd like to share my experience and book recommendations with you.
Target - 200 level Business/Marketing or Science departments without a programming/math focus.
Textbook - Data Science for Business https://www.amazon.com/gp/product/1449361323/ref=ya_st_dp_summary
My Comments - This book provides a good overview of Data Science concepts with a focus on business related analysis. There is very little math or programming instruction which makes this ideal for students who would benefit from an understanding of Data Science but do not have math/cs experience.
Pre-Reqs - None.
Target - 200 level Math/Cs or Physics/Engineering departments.
Textbook -Data Mining: Practical Machine Learning Tools and Techniques https://www.amazon.com/gp/aw/d/0123748569/ref=pd_aw_sim_14_3?ie=UTF8&dpID=6122EOEQhOL&dpSrc=sims&preST=_AC_UL100_SR100%2C100_&refRID=YPZ70F6SKHCE7BBFTN3H
My comments: This book is more in depth than my first recommendation. It focuses on math and computer science approaches with machine learning applications. There are many opportunities for projects from this book. The biggest strength is the instruction on the open-source workbench Weka. As an instructor you can easily demonstrate data cleaning, analysis, visualization, machine learning, decision trees, and linear regression. The GUI makes it easy for students to jump right into playing with data in a meaningful way. They won't struggle with knowledge gaps in coding and statistics. Weka isn't used in industry as far as I can tell, and it also fails on large data sets. However, for an Intro to Data Science without many pre-reqs, this would be my choice.
Pre-Req - Basic Statistics, Computer Science 1 or Computer Applications.
Target - 300/400 level Math/Cs majors
Textbook - Data Science from Scratch: First Principles with Python
http://www.amazon.com/Data-Science-Scratch-Principles-Python/dp/149190142X
My comments: I am infatuated with this book. It delights me. I love math, and am quickly becoming enamored by computer science as well. This is the book I wish we used for my class. It quickly moves through some math and Python review into a thorough but captivating treatment of all things data science. If your goal is to prepare students for careers in Data Science this book is my top pick.
Pre-Reqs - Computer Science 1 and 2 (hopefully using Python as the language), Linear Algebra, Statistics (basic will do, advanced preferred), and Calculus.
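In the spirit of the book's from-scratch philosophy, here is a minimal sketch (my own illustration, not taken from the book) of fitting a simple linear regression with nothing but plain Python:

```python
# Least-squares fit of y = a + b*x, written from scratch (no libraries).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x  # intercept passes through the mean point
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # data generated by y = 1 + 2x
```

This is the kind of exercise the book walks through before handing you off to real libraries.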
Additional suggestions:
Look into using Tableau for visualization. It's free for students, easy to get started with, and a popular tool. I like to use it for casual analysis and pictures for my presentations.
Kaggle is a wonderful resource and you may even be able to have your class participate in projects on this website.
Quantified Self is another great resource. http://quantifiedself.com
One of my assignments that's a semester long project was to collect data I've created and analyze it. I'm using Sleep as Android to track my sleep patterns all semester and will be giving a presentation on the analysis. The Quantified Self website has active forums and a plethora of good ideas on personal data analytics. It's been a really fun and fantastic learning experience so far.
As far as flow goes: introduce visualization from the start, before wrangling and analysis. Show or share videos of exciting Data Science presentations. Once your students have their curiosity sparked and have played around in Tableau or Weka, then start in on the practicalities of really working with the data. To be honest, your example data sets are going to be pretty clean, small, and easy to work with. Wrangling won't really be necessary unless you are teaching advanced Data Science/Big Data techniques. You should focus more on Data Mining. The books I recommended are very easy to cover in a semester; I would suggest that you model your course outline on the book. Good luck!
I used R for about 4 years before I moved to Python to use it for deep learning. I have been using Python for about 2 years now.
>Are R and Python considered redundant, or are there some situations where one will be preferred over the other? If I become proficient at using Python for data wrangling, analysis, and visualization, will I have any reason to continue using R?
It depends. I haven't really found anything that I can do in Python that I could not already do in R. I still use R because I like it better as a functional programming language and because it has a wide variety of more specific statistical packages (many for biology) that are just not available for Python yet. There are some specific cases where I just find it more intuitive and simpler to implement a solution in R. And generally, I just prefer ggplot2 over any of the various Python plotting packages. Also, R has high-level APIs for things like TensorFlow, so it's not like you can't do deep learning in R.
The biggest advantage for Python is its speed and ability to work within a larger programming framework. A lot of companies tend to use Python because the models they build are integrated into a larger system that needs the capabilities of a fully-fledged programming language. Python is generally faster and has better management of big data sets in memory. R is actually moving more in the direction to fix these issues but there are still limitations.
>Where should I start? I'm looking for a resource that isn't aimed at complete beginners, since I've been using R for a few years, and took a C class before that. At the same time I wouldn't claim to be an experienced programmer. I'm interested in learning Python both for data analysis and for general programming.
I learned Python syntax using Learn Python 3 the Hard Way. I learned about Pandas, data wrangling, etc. using Pandas for Everyone and Pandas Cookbook. If I had to suggest just one book, it would be Pandas for Everyone. You can learn Python syntax from YouTube, MOOCs, or online tutorials. The Pandas Cookbook is just extra practice. To be honest though, the general conventions used by Pandas for data analysis and manipulation are very similar to R in many ways, especially if you've used anything in Hadley Wickham's Tidyverse. Finally, I made a Pandas cheatsheet while I was learning and included equivalent R functions in some places. I would be happy to share this Google Sheets file with you if you are interested.
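To illustrate the similarity (my own toy example; the column names and data are made up), a dplyr-style filter/mutate/group_by/summarise pipeline maps almost one-to-one onto pandas method chaining:

```python
import pandas as pd

# Hypothetical data; the point is the chained verbs, which mirror dplyr.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b"],
    "mass":    [2.0, 4.0, 6.0, 8.0],
})

summary = (
    df.query("mass > 2")                          # filter(mass > 2)
      .assign(half_mass=lambda d: d["mass"] / 2)  # mutate(half_mass = mass / 2)
      .groupby("species", as_index=False)         # group_by(species)
      .agg(mean_mass=("mass", "mean"))            # summarise(mean_mass = mean(mass))
)
```

If you already think in tidyverse verbs, this style of pandas transfers with very little friction.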
>What IDE(s) should I use, and what are some must learn packages? I'm hoping to find something similar to RStudio.
I started off using PyCharm. I've heard good things about Spyder. But now, I actually still use RStudio! It is fully integrated with Python thanks to the Reticulate package. You can pass data structures between the languages and use both in RMarkdown. You can also use virtual environments which are popular with Python. Once you install the package:
library(reticulate)
use_virtualenv("path_to_my_virtual_env") # Activate your Python virtual environment
repl_python() # Open a Python prompt inside the RStudio console
You can now run Python scripts directly in the RStudio console.
It's really easy to use and even comes with auto-complete and everything else.
Hope that helped.
Hey, DE here with lots of experience, and I was self taught. I can be pretty specific about the subfield and what is necessary to know and not know. In an inversion of the normal path I did a mid-career M.Sc in CS, so it was kind of amusing to see what was and was not relevant in traditional CS. Prestigious CS programs prepare you for an academic career in CS theory, but the down and dirty of moving and processing data uses only a specific subset. You can also get a lot done without the theory for a while.
If I had to transition now, I'd look into a bootcamp program like Insight Data Engineering, or at least look at their syllabus. They put you in front of employers and force you to finish a demo project. In terms of CS fundamentals, https://teachyourselfcs.com/ offers a list of resources you can use over the years to fill in the blanks.
Data Engineering is more fundamentally operational in nature than most software engineering. You care a lot about things happening reliably across multiple systems, and when using many systems the fragility increases a lot. A typical pipeline can cross a hundred actual computers and 3 or 4 different frameworks. (Also, I'm doing the inverse transition to yours... trying to understand multivariate time series right now.)
I have trained junior coders to become data engineers and I focus a lot on Operating System fundamentals: network, memory, processes. Debugging systems is a different skill set than debugging code; it's often much more I/O centric. It's very useful to be quick on the command line too, as you are often shelling in to diagnose what's happening on this computer or that: checking 'top', 'netstat', grepping through logs. Distributed systems are a pain. Data Eng in production is like 1/4 Linux sysadmin.
It's good to be a language polyglot. (python, bash commands, SQL, Java)
Those massive Java stack traces are less intimidating when you know that Java's design encourages lots of deep class hierarchies, and every library you import introduces a few layers to the stack trace. But usually the meat-and-potatoes method you need to look at is at the top of a given thread. Scala is only useful because of Spark, and the level of Scala you need to know for Spark is small compared to the full extent of the language. Mostly you are programmatically configuring a computation graph.
Kleppman's book is a great way to skip to relevant things in large system design.
https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321
It's very worth understanding how relational databases work, because all the big distributed systems are basically subsets of relational database functionality, compromised for the sake of the distributed-ness. The fundamental concepts of how the data is partitioned, written to disk, caching, indexing, query optimization, and transaction handling all apply. Whether the input is SQL or Spark, you are usually generating the same few fundamental operations (google Relational Algebra) and asking the system to execute them the best way it knows how. We face the same data issues now as we did in the 70s, just at a larger scale.
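To make that concrete, here is a toy illustration (my own, not from any of the books mentioned) of the fundamental operations a query reduces to, written over plain Python lists of dicts:

```python
# Whether you write SQL or Spark, the engine reduces your query to a few
# relational-algebra operations. A miniature version:
def select(rows, pred):                 # sigma: keep rows matching a predicate
    return [r for r in rows if pred(r)]

def project(rows, cols):                # pi: keep only some columns
    return [{c: r[c] for c in cols} for r in rows]

def join(left, right, key):             # natural join on one key (a hash join)
    index = {}
    for r in right:                     # build phase: index the right side
        index.setdefault(r[key], []).append(r)
    return [{**l, **r}                  # probe phase: look up each left row
            for l in left for r in index.get(l[key], [])]

users  = [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]
orders = [{"id": 1, "total": 30}, {"id": 1, "total": 5}]

# SELECT name, total FROM users JOIN orders USING (id) WHERE total > 10
result = project(select(join(users, orders, "id"),
                        lambda r: r["total"] > 10),
                 ["name", "total"])
```

A real engine adds partitioning, disk layout, and a query optimizer on top, but the algebra underneath is this small.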
Keeping up with the framework or storage product fashion show is a lot easier when you have these fundamentals. I used Ramakrishnan, Database Management Systems. But anything that puts you in the position of asking how database systems work from the inside is extremely relevant even for "big data" distributed systems.
https://www.amazon.com/Database-Management-Systems-Raghu-Ramakrishnan/dp/0072465638
I also saw this recently and by the ToC it covers lots of stuff.
https://www.amazon.com/Database-Internals-Deep-Distributed-Systems-ebook/dp/B07XW76VHZ/ref=sr_1_1?keywords=database+internals&qid=1568739274&s=gateway&sr=8-1
But to keep in mind... the designers of these big data systems all had a thorough grounding in the issues of single node relational databases systems. It's very clarifying to see things through that lens.
Edit: Supposedly this guy is OG in data science. http://www.datasciencecentral.com/profiles/blogs/hitchhiker-s-guide-to-data-science-machine-learning-r-python
My friend has a bio background and is doing well as a data science consultant. I wouldn't shy away because of a lack of math.
I'm still an amateur, so take this with a grain of salt.
I'd also like to share my strategy for learning data science so far.
I have a math background, which is useful but not required. Knowing linear algebra, differential equations, and some analysis is useful for developing a deeper intuition into how the machine is learning, but not necessary. IMO data science is a lifelong journey, as it can be applied to many fields. It may be useful to learn more math later on as it gets deeper, but surface-level knowledge should suffice.
For linear algebra, I've found the first lecture to be the most useful. http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/video-lectures/ It basically describes how we can translate lines into vectors and find solutions. It may be useful to continue learning, but in the beginning I believe a surface understanding should suffice. If you're looking to build new data analytic tools, understanding the math in depth is a must. But if your goal is to apply the tools already in existence, you can get by with a brief understanding.
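As a toy version of the idea in that first lecture (my own example), two lines become one matrix equation, and a 2x2 system can be solved directly with Cramer's rule:

```python
# The system
#   x + 2y = 5
#   3x -  y = 1
# is [[a, b], [c, d]] @ [x, y] = [e, f]. Cramer's rule solves 2x2 directly.
def solve2x2(a, b, c, d, e, f):
    det = a * d - b * c            # determinant of the coefficient matrix
    if det == 0:
        raise ValueError("lines are parallel or identical")
    x = (e * d - b * f) / det
    y = (a * f - e * c) / det
    return x, y

x, y = solve2x2(1, 2, 3, -1, 5, 1)  # x = 1, y = 2
```

Past 2x2 you'd reach for elimination (or numpy), but the "lines as vectors" picture is the same.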
For example, I have a weak statistics background; for the things I don't know, I look them up on Wikipedia, various sites, etc. The goal is not necessarily to learn the material as you would for an exam, but to develop a broader understanding of what the material is and how it relates to machine learning. When I read this material I probably retain only 5-15% of the information, but I read enough to let me move on. Never get stuck on one piece of information for too long. I've found that if I get stuck, I can move on and the brain just kind of figures out how it fits into the puzzle.
With your background Andrew Ng's course on coursera https://www.coursera.org/learn/machine-learning should be suitable.
I watch these videos only once, at 2x speed. My goal is not to retain the information but to index it. Much of what is useful will be learned by practice; watching the videos at 2x is like skimming a text. It allows you to index, so that you know where to look if you need greater depth in the future. For example, you don't have to memorize the cost function, but it's important to know why the cost function is constructed the way it is, and what its use is.
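For instance, the squared-error cost from a course like Ng's fits in a few lines, and a single gradient-descent step visibly lowers it (a minimal sketch of my own, not taken from the course materials):

```python
# Cost J(theta) for a line theta0 + theta1*x: smooth, and it penalizes
# big misses much more than small ones, which is why it's shaped this way.
def cost(theta0, theta1, xs, ys):
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def gradient_step(theta0, theta1, xs, ys, lr=0.1):
    m = len(xs)
    errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    g0 = sum(errs) / m                              # dJ/dtheta0
    g1 = sum(e * x for e, x in zip(errs, xs)) / m   # dJ/dtheta1
    return theta0 - lr * g0, theta1 - lr * g1

xs, ys = [1, 2, 3], [2, 4, 6]        # data generated by y = 2x
t0, t1 = 0.0, 0.0
before = cost(t0, t1, xs, ys)
t0, t1 = gradient_step(t0, t1, xs, ys)
after = cost(t0, t1, xs, ys)         # lower than `before` after one step
```

Knowing this shape is the "index entry"; the details come back quickly when you need them.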
I then supplement by reading this: http://neuralnetworksanddeeplearning.com/
and doing these problems http://www.cs.cmu.edu/~tom/10601_fall2012/hws.shtml
This is the most useful resource I've found tbh:
http://www.kdnuggets.com/
I have a weak programming background, so for learning python I've found this text useful for practice and learning the language: https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994?ie=UTF8&Version=1&entries*=0
This text is very basic, useful in general if you don't have a compsci/compeng background, but doesn't have direct applications for data science. For a more data focus wrt python: https://www.coursera.org/specializations/python . You do not have to pay for any of these courses. Just search for the specific course and enroll, for example, https://www.coursera.org/learn/python-data
That's pretty much where I'm at.
I believe the most important thing is to train our brains to think as the machine would. It's important to utilize our intuition and natural parallel abilities of the brain, as ultimately these are the techniques we are attempting to replicate.
Whoa, there. Healthcare data scientist here, mainly working in areas like clinical epidemiology and with a background in health services research and pharmacoepidemiology.
First, kudos for having questions and reaching out for help. This is my opinion, but health care is different from other sectors. The work you do has the potential to affect people in visceral, fundamentally life-changing ways...such as recommending a patient should or should not get treatment. Or a patient should or should not be placed on end-of-life-care...that a life-threatening complication is or is not related to a pharmaceutical on the market. Point just being - I think this sector carries responsibility that many other sectors don't.
Second, are you at a pharmaceutical/related organization? If so, there should be qualified biostatisticians/epidemiologists/psychometricians/health economists/something similar to sit down with you and help you figure this out.
Third, you said you study 'data science and knowledge engineering', but I'm not sure what your curriculum consists of - do you study causal inference? If you don't, it's the most important topic you need to be familiar with (not competent, mind you). Here are several references that could get you familiar with identifying and dealing with bias and confounding, and designing experiments to assess causal relationships instead of just association. In healthcare you have to know when a question warrants a causal analysis vs a predictive or associative one. If a causal analysis is needed, an epidemiologist or biostatistician might likely do that work, but it certainly helps to know what a DAG is and how to read one.
https://www.amazon.com/Epidemiology-Introduction-Kenneth-J-Rothman/dp/0199754551
https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/
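As a toy illustration of why this matters (my own simulation, not from the references above): if sicker patients are both more likely to receive a drug and more likely to die, a naive comparison makes the drug look harmful even when it has no effect at all.

```python
import random

# Z = severe illness (confounder), X = received drug, Y = died.
# X has NO effect on Y here; Z drives both.
random.seed(0)
treated_deaths, treated = 0, 0
control_deaths, control = 0, 0
for _ in range(100_000):
    z = random.random() < 0.5                   # half the patients are severe
    x = random.random() < (0.8 if z else 0.2)   # severe patients get drug more
    y = random.random() < (0.6 if z else 0.1)   # severity drives death, not drug
    if x:
        treated += 1
        treated_deaths += y
    else:
        control += 1
        control_deaths += y

naive_treated_rate = treated_deaths / treated   # close to 0.5: looks "harmful"
naive_control_rate = control_deaths / control   # close to 0.2
```

Stratifying by Z (or drawing the DAG Z -> X, Z -> Y first) makes the spurious association disappear, which is exactly the kind of reasoning those references teach.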
Fourth, I'm hesitant to suggest anything about your dataset, because I still only have a rough idea of the details. Also, it sounds like you've got a psychometric dataset, and I've never studied psychometrics. I will say, though, that the question (hypothesis) being asked should really drive the analytic approach. Is the goal to look at a homogeneous population and find something about it that causes them to require or be adherent to treatment? Do those results then need to get applied to a diverse, heterogeneous population? That's a very high bar to achieve for experimental purposes. Is it enough to look at some data and say that certain characteristics are associated with or predictive of certain outcomes? That's a much lower bar from an experimental standpoint, and probably an analytic standpoint too. If there is a selection bias, I think that's only relevant if there's a desire to extrapolate the study results to a different population. As you point out, if the desire is to generalize results to a larger population, it's likely a significant problem that would require an intentional experimental design to address. If the company you're working with doesn't recognize this or can't have a qualified person explain why it's not a study design problem, you're working with bad people who likely don't know what they're doing. I've collaborated with several software/'health analytic' companies and startups that are like this, and it's why I'm distrustful of all health analytic software until proven otherwise.
Hope this helps!
Fellow NLP'er here! Some of my favorites so far:
I did now. Any way of getting a sticky/wiki/FAQ of useful materials and common questions for noobs like me? People could vote on and review books, MOOCs, and Kaggle competitions, and say what was best for them. Give us newbies something to get started on so we don't have to flood the sticky. That gives more of a community resource rather than one person's suggestion.
For instance
Applied Predictive Modeling
or the less theory version
Intro to Statistical Learning were two books that helped me with understanding statistical models and had applications and exercises in R
R for Data Science was decent enough and had updated packages for making tidy data.
I found the Data Science Coursera Specialization decently useful, but didn't go deep enough. It did give me enough of a taste to know this is the direction I want my career to go in. So I'm hesitant to do more MOOCs.
I also don't have experience in Data Science hiring, but have it for consulting/actuarial. I'd be happy to help critique resumes during my free time for all the graduating students.
The pymc3 documentation is a good place to start if you enjoy reading through mini-tutorials: pymc3 docs
Also these books are pretty good, the first is a nice soft introduction to programming with pymc & bayesian methods, and the second is quite nice too, albeit targeted at R/STAN.
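For a feel of what pymc3 automates under the hood, here is a library-free sketch (my own, not from the docs or either book) of Bayesian updating for a coin's bias using a simple grid approximation:

```python
from math import comb

# Posterior over a coin's bias p after seeing 7 heads in 10 flips,
# with a flat prior, evaluated on a grid of candidate values.
grid = [i / 100 for i in range(101)]              # candidate values of p
prior = [1.0] * len(grid)                         # flat prior
likelihood = [comb(10, 7) * p**7 * (1 - p)**3 for p in grid]
unnorm = [pr * li for pr, li in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]           # normalized to sum to 1

best = grid[posterior.index(max(posterior))]      # posterior mode, here 0.7
```

pymc3 replaces the grid with MCMC sampling so the same idea scales to many parameters, but the prior-times-likelihood logic is identical.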
Jose Portilla on Udemy has some good python based courses (and also frequents this subreddit). There's regularly sales or some sort of coupon code available to get any of the courses for $10-$15, so it's very reasonable.
For books:
https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1491957662/ref=asap_bc?ie=UTF8 ... it's not out yet, but due any day. You can also get preview access on sites like Safari Online (which would also have all the books below).
https://www.amazon.com/Data-Science-Scratch-Principles-Python/dp/149190142X/ref=sr_1_1
For general python:
https://www.amazon.com/Python-Crash-Course-Hands-Project-Based/dp/1593276036/ref=sr_1_1
https://www.amazon.com/Automate-Boring-Stuff-Python-Programming/dp/1593275994/ref=sr_1_1
No Starch Press, O'Reilly, Apress, and Manning generally have pretty good quality publications. I'd usually skip anything from Packt, unless it's specifically received good reviews.
Hi /r/datascience. I'm an aspiring data scientist and I'm trying to put together a data science course that's self-taught and can be done on one's time. Any pointers would be appreciated.
Section A: Foundations in Mathematics
Section B: Foundations in Computer Science
Section C: Basic Data Science
Section D: Advanced Data Science
These are the courses/subjects I've gathered would be most important or useful for someone trying to learn data science. Below are the resources that can be used to learn these subjects.
Section A Resources
Khanacademy - General Calculus, Linear Algebra
PatrickJMT - General Calculus, Linear Algebra
Professor Leonard - Calculus I, Calculus II, Calculus III, Statistics
MIT OpenCourseWare - Single Variable Calculus (I/II), Multivariable Calculus (III), Linear Algebra, Statistics, Probability Theory/Bayesian Statistics
Harvard - Probability Theory/Bayesian Statistics
Section B Resources
Datacamp
Dataquest
Codecademy
Code School
LearnPython.Org
Kaggle
Udemy
Udacity
Rmotr
Section C Resources
University of Michigan - Introduction to Data Science in Python
Harvard CS109 - Introduction to Data Science
R for Data Science
Section D Resources
Andrew Ng's Machine Learning
Jose Portilla's Python for Data Science and Machine Learning
Andrew Ng's Deep Learning Series
Am I missing any important courses, free or otherwise? Any important books? Any concepts I'm completely forgetting about?
I've been told this is missing real education in science itself. How can I incorporate that?
Exactly what it sounds like. They're going to be testing your ability to design a clean and efficient solution to a problem.
You don't need to come up with the "correct" solution. They're going to be more interested in how you think through the problem, your communication skills, etc.
I highly recommend [this](https://www.amazon.com/Cracking-Coding-Interview-6th-Programming/dp/0984782850/ref=sr_1_1?ie=UTF8&qid=1465591474&sr=8-1&keywords=Cracking+the+coding+interview) book.
One of my favorite not-super-technical books that can give some insights into the thought process and actionability of analytics and machine learning is "Everybody Lies". https://www.amazon.com/Everybody-Lies-Internet-About-Really/dp/0062390856
It touches on a concept I really like to rely on data for which is revealed vs. stated intent. People tell you they want what they wish they wanted. Data tells you what they actually want and how they actually behave. There are some good intuitive regression models in there as well.
I'd focus on:
Big Data stuff with pyspark or dask would probably be a boon too. MongoDB is also pretty easy to pick-up.
Awesome list! I'm a software engineer looking to make the jump over to data science, so I'm just getting my feet wet in this world. Many of these books were already on my radar, and I love your summaries to these!
One question: how much is R favored over Python in practical settings? This is just based off of my own observation, but it seems to me that R is the preferred language for "pure" data scientists, while Python is a more sought-after language from hiring managers due to its general adaptability to a variety of software and data engineering tasks. I noticed that François Chollet also has a book called Deep Learning with Python, which looks to have a near-identical description to the Deep Learning with R book, and they were released around the same time. I think it's the same material just translated for Python, and I was more interested in going this route. Thoughts?
If you want to be valuable to companies post-graduation you should learn more about programming (design patterns, how to write tests, how to go from a paper to code). I recommend this book as a good starting place. Once you're comfortable with how the different methods work, pick up this book.
The "bible" is "The Grammar of Graphics" by Leland Wilkinson. (link to amazon). The "gg" of ggplot2 stands for grammar of graphics.
Then we go into other books, resources that help with actually showing visualizations:
Then we can look at the "Table of Elements of Data Visualization":
Then, we can look at some blogs to help you see what works and doesn't work:
Finally, some blog posts about other people in data visualization that you can learn from:
From my perspective (optimization and operations research), I'd say two topics you mention are not like the others: simulated annealing and genetic algorithms. I've implemented these to solve computationally intractable combinatorial optimization problems and used this textbook for a heuristic optimization course:
https://www.amazon.com/Search-Optimization-Metaheuristics-Techniques-Algorithms-ebook/dp/B01JEJNT3M
It's a fine text that goes into a lot of detail on tons of metaheuristics you'll encounter in the wild, including simulated annealing and genetic algorithms.
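To make the simulated annealing idea concrete, here's a minimal stdlib-only sketch on a toy travelling-salesman instance. This is my own illustration, not code from the textbook linked above; the cities, cooling schedule, and iteration counts are all arbitrary choices for demonstration.

```python
import math
import random

random.seed(0)

# Toy instance: random 2-D city coordinates (made-up data).
cities = [(random.random(), random.random()) for _ in range(12)]

def tour_length(tour):
    """Total length of a closed tour visiting each city once."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def anneal(n_iters=20000, t_start=1.0, t_end=1e-3):
    tour = list(range(len(cities)))
    random.shuffle(tour)
    best = list(tour)
    for k in range(n_iters):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / n_iters)
        # Neighbour move: reverse a random segment (a 2-opt move).
        i, j = sorted(random.sample(range(len(tour)), 2))
        cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        delta = tour_length(cand) - tour_length(tour)
        # Always accept improvements; accept worse moves with
        # Boltzmann probability exp(-delta / t), which shrinks as t cools.
        if delta < 0 or random.random() < math.exp(-delta / t):
            tour = cand
            if tour_length(tour) < tour_length(best):
                best = list(tour)
    return best

best = anneal()
print(round(tour_length(best), 3))
```

The key feature, and what separates it from plain hill climbing, is that early on (high temperature) the search accepts uphill moves freely, which lets it escape local optima before the acceptance probability tightens.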
I found this text on Amazon that covers SVM, NN, reinforcement learning, and decision trees, along with a chapter on using genetic algorithms for machine learning. I was also able to find it on libgen, and it looks like a nice introduction to the subjects.
https://www.amazon.com/Introduction-Machine-Learning-Miroslav-Kubat/dp/3319639129/ref=sr_1_5?ie=UTF8&qid=1537941157&sr=8-5&keywords=genetic+algorithms+neural+network+machine+learning
Otherwise there are plenty of papers on the subjects that introduce these topics.
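For a feel of what a genetic algorithm looks like in code, here's a minimal stdlib-only sketch on the classic OneMax toy problem (maximize the number of 1-bits). This is my own illustration rather than anything from the book above; population size, mutation rate, and generation count are arbitrary demo values.

```python
import random

random.seed(1)

GENOME_LEN = 30
POP_SIZE = 40

def fitness(genome):
    """OneMax: count of 1-bits; the optimum is the all-ones string."""
    return sum(genome)

def tournament(pop, k=3):
    """Selection: pick the fittest of k randomly chosen individuals."""
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    """Single-point crossover of two parent genomes."""
    point = random.randrange(1, GENOME_LEN)
    return a[:point] + b[point:]

def mutate(genome, rate=1 / GENOME_LEN):
    """Flip each bit independently with a small probability."""
    return [bit ^ 1 if random.random() < rate else bit for bit in genome]

pop = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
       for _ in range(POP_SIZE)]
best = max(pop, key=fitness)

for _ in range(60):
    # Build the next generation from selected, recombined, mutated parents.
    pop = [mutate(crossover(tournament(pop), tournament(pop)))
           for _ in range(POP_SIZE)]
    gen_best = max(pop, key=fitness)
    if fitness(gen_best) > fitness(best):
        best = gen_best

print(fitness(best))
```

The same selection/crossover/mutation loop carries over to the ML uses those books discuss (e.g. evolving hyperparameters or feature subsets); only the genome encoding and the fitness function change.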
I can’t recommend 3 books on good visualizations in business (and everywhere else) highly enough.
Report format for abstract/methods/etc vs PowerPoint for salespeople varies dramatically from company to company, so I don’t have any good recommendations there. But in the “a picture is worth a thousand words” world, visualizations really matter.
I don't know your situation at all, but it sounds to me like a classic case of excessive anxiety. It also seems like you're already on your way by acknowledging that some of your thoughts are the product of a high sensitivity to anxiety and are not based in reality. I've done CBT for years, and it's helped me realize the key isn't to try to reason through your feelings. Most likely, many of the assumptions you make are false -- a product of excessive worrying and high anxiety sensitivity. There are great books out there to help you build some meta-thinking techniques to challenge false beliefs.
https://www.amazon.com/Mind-Over-Mood-Change-Changing/dp/0898621283
Just from what you've said, I bet you're overestimating the risk of being fired. It sounds like you've done a good job so far, and you've been given more responsibility because other people think you can handle it. Even if that faith is misplaced (which seems unlikely), it's hard to imagine how a screw-up in this context would make it hard for you to find another job.
Many thanks for the Kaggle tip and the book!
Besides the book you mentioned (which I didn't know), I'd like to recommend Data Mining by the authors of Weka, a very good book too. (http://www.amazon.ca/Data-Mining-Practical-Learning-Techniques/dp/0123748569/ref=sr_1_1?s=books&ie=UTF8&qid=1425389007&sr=1-1&keywords=data+mining)
Also, it does not really matter that much; ergonomics and even mid-range performance (with a bit more RAM) and possibly a reasonable GPU should do it. In general I've been very pleased with Lenovos over quite some time, working as a data analyst.
If I could grab the money, I would get something like this:
https://www.amazon.com/Lenovo-ThinkPad-P51-Mobile-Workstation/dp/B075TGMNXQ/ref=sr_1_3?ie=UTF8&qid=1541287308&sr=8-3&keywords=Lenovo+ThinkPad+P51+64+gb
https://www.amazon.com/2018-Lenovo-ThinkPad-Workstation-Laptop/dp/B07KBBHM8N/ref=sr_1_4?keywords=lenovo+thinkpad+p52+64gb&qid=1555855604&s=gateway&sr=8-4
Why not the performance of a lightweight server for on the go?! ^^
But then again, you can get that in the cloud too
Great guide for building crisp, clear data visuals: https://www.amazon.com/Street-Journal-Guide-Information-Graphics/dp/0393347281
Story telling, try this structure: https://www.richardhare.com/2007/09/03/the-minto-pyramid-principle-scqa/
I hope you get better with respect to the mental health. You didn't ask for it but I would recommend this book https://www.amazon.com/Guide-Good-Life-Ancient-Stoic-ebook/dp/B0040JHNQG that helped me in the past.
I don't think leaving looks bad. And I can understand that it can sometimes be hard to do something you don't want to do. But you said you're in your fifth year so I assume you've done a considerable amount of relevant research work and should be close to getting your PhD. It might be a good idea to take a break (if that's possible) and just come back after a year or two to finish it. Because a PhD is still a huge deal in my opinion.
Life is very long and a year or two spent not working towards your life goal (whatever that may be) isn't a huge deal. Maybe take that adjunct job. Teaching shouldn't be hard for you and it will pay the bills. And just spend the rest of the time chilling and reading.
Not mathematical, but Storytelling with Data: A Data Visualization Guide for Business Professionals (https://www.amazon.com/dp/1119002257/ref=cm_sw_r_cp_apa_i_WhB.AbRPZ14ET)
is a good start to communicating results and really easy to understand. Almost mind-blowing how much I was missing previously.
This might be what you are looking for, The Visual Display of Quantitative Information By Edward Tufte. The book is a little older, but the principles still stand, and it is considered a pretty seminal work for data visualization.
Essential for Data Visualization: The Visual Display of Quantitative Information by Edward R. Tufte.