(Part 2) Best products from r/OMSCS

We found 2 comments on r/OMSCS discussing the most recommended products. We ran sentiment analysis on each of these comments to determine how redditors feel about different products. We found 22 products and ranked them based on the amount of positive reactions they received. Here are the products ranked 21-40.

Top comments mentioning products on r/OMSCS:

u/QuisUt-Deus · 1 point · r/OMSCS

This has been my first semester of the program, so I can't speak in general, but from the 2 courses I have done:

  1. CCA - the Udacity videos could be considered a primer / introduction to the respective topics. In addition to the videos, we studied the corresponding parts of 2 classic textbooks (https://www.amazon.com/Introduction-Algorithms-3rd-MIT-Press/dp/0262033844/ and https://www.amazon.com/Introduction-Theory-Computation-Sipser). Have a look at the textbooks, especially the problems after each chapter, to get a glimpse of the difficulty of the problems solved in class. The real meat of the class was the problem sets (1 PS each week), each with several quite difficult problems to solve. The grade was based on 5 exams (one every 2 weeks), each with 3 problems of comparable difficulty to solve in 90 mins (more or less), which should prove the student's mastery of the subject.
  2. CN - the Udacity lectures constitute the skeleton of the class, which is fleshed out by more than a dozen scientific papers related to the studied topics and 8 projects (half of them programming, half of them reproducing some research, doing an experiment and writing a short paper with observations). The grade is based on 3 proctored exams covering the Udacity lectures and mandatory reading material (the papers) and on the 8 projects.
    So far I can conclude that difficulty/rigor and time required are substantially higher than just watching the Udacity videos and clicking through somewhat banal in-lecture quizzes.
    You can get some idea by looking at www.omscentral.com - there are class reviews and time requirements estimates (based on the student's experiences).
    I spent on average at least 5-7 hrs/week on CCA (the weeks before the exams were more intense, the others more relaxed) and ca. 2-3 hrs/week on CN. However, please note that the time commitment varies according to previous experience and math and CS (I don't mean SW engineering) background.
    When comparing plain Udacity with the real OMSCS program - access to profs, TAs and mutual discussions with classmates make a HUGE difference in learning value.
u/Bambo222 · 5 points · r/OMSCS

I can offer my two cents. I’m a Googler who uses machine learning to detect abuse, where my work is somewhere between analyst and software engineer. I’m also 50% done through the OMSCS program. Here’s what I’ve observed:

Yes, Reinforcement Learning, Computer Vision, and Machine Learning are 100% relevant for a career in data science. But data science is vague; it means different things depending on the company and role. There are three types of data science tasks, and each specific job may be weighted more heavily in one of these three directions: (1) data analytics, reporting, and business intelligence focused, (2) statistical theory and model prototyping focused, and (3) software engineering focused, launching models into production but with less emphasis on statistical theory.

I've had to do a bit of all three types of work. The two most important aspects are (1) defining your problem as a data science/machine learning problem, and (2) launching the thing in a distributed production environment.

If you already have features and labeled data, you should be able to get a sense of what model you want to use within 24 hours on your laptop based on a sample of the data (this can be much, much harder when you can't actually sample the data before you build the prod job because the data is already distributed and hard to wrangle). Getting the data, ensuring it represents your problem, and ensuring you have processes in place to monitor, re-train, evaluate, and manage false positives (FPs) and false negatives (FNs) will take the vast majority of your time. Read this paper too: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
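
To make the FP/FN bookkeeping concrete, here is a minimal sketch in plain Python. The function names, the label convention (1 = abusive, 0 = benign), and the sample data are all assumptions made up for illustration, not anything from a real pipeline.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (assumed: 1 = abusive, 0 = benign)."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["tp"] += 1
        elif pred == 1 and truth == 0:
            counts["fp"] += 1  # benign item wrongly flagged
        elif pred == 0 and truth == 1:
            counts["fn"] += 1  # abusive item missed
        else:
            counts["tn"] += 1
    return counts

def precision_recall(counts):
    """Return (precision, recall); 0.0 when a denominator is empty."""
    flagged = counts["tp"] + counts["fp"]
    actual = counts["tp"] + counts["fn"]
    precision = counts["tp"] / flagged if flagged else 0.0
    recall = counts["tp"] / actual if actual else 0.0
    return precision, recall

# Hypothetical labels and model decisions for a small batch.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(counts)
print(counts["fp"], counts["fn"], precision, recall)
```

Tracking these numbers per day or per batch, rather than once at launch, is what turns them into monitoring.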

Academic classes will not teach you how to do this in a work environment. Instead, expect them to give you a toolbox of ideas to use, and it’s up to you to match the tool with the problem. Remember that the algorithm will just spit out numbers. You'll need to really understand what's going on, and what assumptions you are making before you use each model (e.g. in real life few random variables are nicely Gaussian).
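
One cheap way to sanity-check the Gaussian assumption before leaning on it is to look at sample skewness and excess kurtosis, both of which are close to 0 for Gaussian data. This is a rough sketch on synthetic data; the function name and the sample sizes are illustrative assumptions, and it is a quick screen rather than a formal normality test.

```python
import random
import statistics

def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("values have zero variance")
    skew = sum(((v - mean) / std) ** 3 for v in values) / n
    kurt = sum(((v - mean) / std) ** 4 for v in values) / n - 3.0
    return skew, kurt

rng = random.Random(0)  # fixed seed so the run is repeatable
gaussian = [rng.gauss(0.0, 1.0) for _ in range(20000)]
skewed = [rng.expovariate(1.0) for _ in range(20000)]  # e.g. waiting times

g_skew, g_kurt = skewness_and_excess_kurtosis(gaussian)
s_skew, s_kurt = skewness_and_excess_kurtosis(skewed)
print("gaussian:", g_skew, g_kurt)
print("skewed:  ", s_skew, s_kurt)
```

Values far from 0 on either measure are a hint to transform the feature or pick a model that does not assume normality.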

I do use a good amount of deep learning at work. But try not to - if a logistic regression or gradient boosted tree works, then use it. Else, you will need to fiddle with hyperparameters, try multiple different neural architectures (e.g. with time series prediction, do you start with a CNN with attention? CNN for preprocessing then DNN? LSTM-Autoencoder? Or LSTM-AE + Deep Regressor, or classical VAR or SARIMAX models...what about missing values?), and rapidly evaluate performance before moving forward. You can also pick up a deep learning book or watch Stanford lectures on the side, but first have the fundamentals down. There are many, many ways you can re-frame and tackle the same problem. The biggest risk is going down a rabbit hole before you can validate that your approach will work, and wasting a lot of time and resources. ML/Data Science project outcomes are very binary: it will work well or it won’t be prod ready and you have zero impact.
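
As an illustration of the "simple baseline first" idea, here is a small logistic regression fit by batch gradient descent in plain Python. It is a sketch on synthetic two-cluster data; the learning rate, epoch count, and data are arbitrary assumptions, and in practice you would more likely reach for an existing library implementation.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_regression(rows, labels, lr=0.1, epochs=200):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y  # gradient of log loss w.r.t. the score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else 0

def make_data(rng, n):
    """Synthetic data: class 0 centered at (-1.5, -1.5), class 1 at (1.5, 1.5)."""
    rows, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.5 if label == 1 else -1.5
        rows.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        labels.append(label)
    return rows, labels

rng = random.Random(42)
train_rows, train_labels = make_data(rng, 400)
test_rows, test_labels = make_data(rng, 200)

weights, bias = fit_logistic_regression(train_rows, train_labels)
correct = sum(predict(weights, bias, x) == y for x, y in zip(test_rows, test_labels))
accuracy = correct / len(test_rows)
print("held-out accuracy:", accuracy)
```

The held-out accuracy of a baseline like this is the bar a deeper model has to clear to justify its extra tuning cost.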

I do think the triple threat of academic knowledge for success in this area would be graduate level statistics, computer science, and economics. I am weakest in theoretical statistics and really need to brush up on Bayesian stats (https://www.amazon.com/Statistical-Rethinking-Bayesian-Examples-Chapman/dp/1482253445). But 9/10 times a gradient boosted tree with good features (it's all about representation) will work, and getting it in prod plus getting buy-in from a variety of teams will be your bottleneck. In abuse and fraud, the distributions shift all the time because the nature of the problem is adversarial, so every day is interesting.
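
Since shifting distributions come up above, here is one simple way to flag a shift in a single feature: the two-sample Kolmogorov-Smirnov statistic, i.e. the largest gap between the empirical CDFs of a reference window and a current window. This is a sketch on synthetic data; the 0.1 alert threshold is a made-up assumption, and it is only one of many possible drift checks.

```python
import random
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    ref_sorted = sorted(reference)
    cur_sorted = sorted(current)
    max_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for x in ref_sorted + cur_sorted:
        ref_cdf = bisect_right(ref_sorted, x) / len(ref_sorted)
        cur_cdf = bisect_right(cur_sorted, x) / len(cur_sorted)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap

rng = random.Random(7)  # fixed seed so the run is repeatable
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # e.g. last month's feature values
stable = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]    # mean moved by one standard deviation

ALERT_THRESHOLD = 0.1  # hypothetical cutoff; tune per feature in practice
ks_stable = ks_statistic(reference, stable)
ks_shifted = ks_statistic(reference, shifted)
print("stable :", ks_stable, ks_stable > ALERT_THRESHOLD)
print("shifted:", ks_shifted, ks_shifted > ALERT_THRESHOLD)
```

A check like this can feed the re-training trigger: when a feature's statistic crosses the threshold, the model is probably scoring data it was not trained on.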