#1,504 in Computers & technology books

Reddit mentions of The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)

Sentiment score: 2
Reddit mentions: 2

We found 2 Reddit mentions of The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics). Here are the top ones.

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)
Buying options
View on Amazon.com
Specs:
Release date: August 2009

Found 2 comments on The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics):

u/csiz · 4 points · r/BitcoinMarkets

Relevant xkcd. It's also aged pretty well; now that 5 years have passed, we can indeed tell what animal is in the picture.

Since you asked, let me run down what you'd need to do:

  1. Know a programming language so you can make a computer do this for you (good job so far).
  2. Set up a scraping program to request all the pages on reddit and search through the soup of text to extract authors, text, and the time when they posted (a few days' work and you're set; see the first sketch below this list).
  3. Check when they make a price prediction and store it, so you can set up a bot that keeps track of these and tells everyone else (the second part is also a few days' work).

    But oh boy, that first part of no. 3 is hard. Let me break that down more:

  4. Take a machine learning course or spend a couple of months looking stuff up so you have some idea what you need to do.
  5. Install other Python libraries for numerical computation and machine learning.
  6. Get hold of some text-processing tools to speed things up.
  7. Gather 1,000-10,000 text samples from random redditors posting here and manually label them as "prediction" or "not-prediction", then train your model so you can identify newly posted predictions automatically (see the second sketch below this list).
    ... That'll take a while; you can optionally hire a bunch of people on the internet to do it for you, but you'll be paying them pennies and should probably expect proportional quality.
    ... Make sure to also include as many "not-prediction" examples that mention past prices as "prediction" examples that mention future prices, otherwise your model will just learn to look for any random numbers in a post.

  8. From the posts that are predictions, write another script to identify the actual predicted price. A prediction post might mention quite a few numbers, including numbers for other cryptocurrencies or past prices, so you should spend quite a bit of time making sure you're getting the right number out of a post. This bit is actually harder than writing the scraper in the first place; you could sell this data to hedge funds for a couple of million to fund the whole project.
  9. Follow up on your creation to make sure it has the right accuracy and precision. What, you thought those two words meant the same thing? Better read a stats book while you're at it.
  10. ...
  11. Profit?
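
For steps 2 and 3, here is a minimal sketch of the scraping part, assuming the `requests` library and Reddit's public JSON listing endpoint; the subreddit, user-agent string, and the `fetch_recent_comments` helper are illustrative, not part of the original comment:

```python
import requests

def fetch_recent_comments(subreddit="BitcoinMarkets", limit=100):
    """Pull recent comments from a subreddit's public JSON listing."""
    url = f"https://www.reddit.com/r/{subreddit}/comments.json"
    # Reddit tends to reject the default requests User-Agent, so send a descriptive one.
    headers = {"User-Agent": "prediction-tracker-sketch/0.1"}
    resp = requests.get(url, params={"limit": limit}, headers=headers, timeout=10)
    resp.raise_for_status()
    children = resp.json()["data"]["children"]
    # Keep just the fields the later steps need: author, body text, and post time.
    return [
        {
            "author": c["data"]["author"],
            "text": c["data"]["body"],
            "created_utc": c["data"]["created_utc"],
        }
        for c in children
    ]

if __name__ == "__main__":
    for comment in fetch_recent_comments(limit=5):
        print(comment["author"], comment["created_utc"], comment["text"][:60])
```

A real scraper would also need rate limiting, pagination via the listing's `after` cursor, and some storage, but the extracted fields are the same author/text/time triple the comment describes.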
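
And for steps 7 and 8, a rough sketch of the label-and-train idea plus a crude price extractor, assuming scikit-learn is installed; the handful of hand-labelled samples below is a stand-in for the 1,000-10,000 real ones the comment calls for:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labelled samples: 1 = "prediction", 0 = "not-prediction".
# Note the deliberate past-price example in the 0 class, per the warning in step 7.
texts = [
    "I think BTC hits 12k by the end of next month",       # prediction
    "Calling it now: we retest 9500 before any new high",  # prediction
    "Bought the dip at 6800 back in February",             # not-prediction (past price)
    "Volume on the 4h chart looks pretty thin today",      # not-prediction
]
labels = [1, 1, 0, 0]

# TF-IDF over word unigrams/bigrams feeding a logistic regression classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# Classify a freshly scraped comment.
print(model.predict(["ETH to 500 by July"]))

# Step 8, crudely: pull dollar-ish numbers out of a post flagged as a prediction.
price_pattern = re.compile(r"\$?\d[\d,]*(?:\.\d+)?k?", re.IGNORECASE)
print(price_pattern.findall("I think BTC hits $12k, maybe 13,500 by March"))
```

With real data you would also hold out a test set to measure the accuracy and precision that step 9 jokes about.
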
u/throwaway0891245 · 1 point · r/javahelp

I have some recommendations on books to get up to speed.

Read this book:

Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

This author does a really good job going through a lot of different algorithms. If you can wait, then go with this book instead, which is by the same author but covers TensorFlow 2.0, which is pretty recent and also integrates Keras. It's coming out in October.

Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems

You can get good datasets on Kaggle. If you want to get an actual good foundation on machine learning then this book is often recommended:

The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics)

As for staying up to date, it's hard to say, because "machine learning" doesn't refer to a single thing; there are a lot of different types of machine learning, and each one is developing fast. For example, I used to be pretty into recurrent neural networks for sequence data. I haven't kept up with it lately, but I remember that about two years ago the hotness was all about LSTM neural networks; then a simplified gate pattern was shown to be just as good with less training, and that became big (the name is escaping me right now...). Then the last time I took a look, people were starting to use convolutional neural networks for sequence data and getting great results, on par with or better than recurrent neural networks.

The ecosystem is changing fast too. TensorFlow uses (used?) static graph generation, meaning you define the network before you train it and you can't really change it. But recently there has been more development on dynamic neural networks, where the network can grow and be pruned during training, and people were saying this is a reason to go with PyTorch instead of TensorFlow. I haven't kept up, but I heard from a friend that things are changing even more: there is this new format called ONNX that aims to standardize information about neural networks, and as I mentioned earlier in this post, TensorFlow 2.0 is coming out (or is out already?).
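
For context on the static-versus-dynamic point above: TensorFlow 2.x defaults to eager ("define-by-run") execution and keeps the graph style behind `tf.function`. Here is a rough sketch of the contrast, with made-up tensors and a hypothetical `forward` helper:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])
w = tf.Variable([[0.5], [-0.5]])

# Eager ("define-by-run") style: the matmul runs immediately, and ordinary
# Python control flow can depend on the result.
y = tf.matmul(x, w)
print(y.numpy())

# Graph ("define-then-run") style: tf.function traces the Python function
# into a static graph the first time it is called, then reuses that graph.
@tf.function
def forward(inputs):
    return tf.matmul(inputs, w)

print(forward(x).numpy())
```

PyTorch has worked define-by-run from the start, which is the flexibility argument being referenced here.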

I'm not doing too much machine learning at the moment, but the way I tried to get new information was to periodically look for articles on the problem type I was trying to solve, which at the time was predicting sequences from sparse, multidimensional sequence data with non-matching step intervals.

If you read the TensorFlow book I linked above, you'll get a great overview and a feel for what types of problems are out there and what sorts of ML solutions exist now. You'll think of a problem you want to solve, and then it's off to the search engines to see what ideas are already out there.