# Best products from r/statistics

We found 115 comments on r/statistics discussing the most recommended products. We ran sentiment analysis on each of these comments to determine how redditors feel about different products. We found 465 products and ranked them by the number of positive reactions they received. Here are the top 20.

### 2. Probability Theory: The Logic of Science

- Used Book in Good Condition


### 3. SQL in 10 Minutes, Sams Teach Yourself (4th Edition)

- Sams Publishing


### 5. Statistical Rethinking: A Bayesian Course with Examples in R and Stan (Chapman & Hall/CRC Texts in Statistical Science)



### 6. Bayesian Data Analysis (Chapman & Hall/CRC Texts in Statistical Science)

- CRC Press

### 7. Applied Linear Statistical Models



### 8. Data Analysis Using Regression and Multilevel/Hierarchical Models

- Cambridge University Press


### 10. Applied Regression Analysis and Generalized Linear Models



### 11. Mostly Harmless Econometrics: An Empiricist's Companion

- Princeton University Press


### 13. Mathematical Statistics with Applications

- Used Book in Good Condition


### 15. Pattern Recognition and Machine Learning (Information Science and Statistics)

- Springer


### 16. Discovering Statistics Using SPSS, 3rd Edition (Introducing Statistical Methods)

- SPSS
- Well written
- Easy read, very helpful
- Statistics


### 18. Probability and Statistical Inference (9th Edition)

Written by three veteran statisticians, this applied introduction to probability and statistics emphasizes the existence of variation in almost every process, and how the study of probability and statistics helps us understand this variation.

My understanding is that this is more of a traditional statistics problem than a data science problem (though the two overlap substantially). It sounds like you will be dealing with (at most) a few hundred "squads" and maybe 10-20 pertinent variables (e.g., workload, time taken to complete the workload, size of team, age of team members, educational qualifications of team members). There is no strict border segregating data science from traditional statistics, but generally speaking traditional statistics aims to analyse 100s-1000s of data points with 10s of variables, whereas data science or machine learning procedures aim to assess millions of data points with 1000s of variables available. Again, this is not a strict definition, and you can almost always apply a data science procedure to a traditional statistics problem (and vice versa).

This being said, this sub is an okay place to seek resources. I would highly recommend checking out Stack Exchange and the machine learning sub. You may want to purchase a textbook to facilitate your learning. My favourites include Applied Linear Statistical Models by Kutner and friends, and Python Machine Learning by Sebastian Raschka. The former is a traditional stats textbook and the latter is a data science/machine learning textbook. You may be able to find a good portion of these books on Google Books for free. This might help you decide which one to buy and what direction to go in. Additionally, I have made some videos on particular data analysis procedures with the aim of facilitating application-oriented understanding rather than complete mathematical understanding; you may find some of these videos useful.

I would say that your proposed project is potentially a good one, but we’d need more information to gauge its feasibility. As a start, see what variables you have available to you and explore your data a bit (maybe look at all their bivariate relationships). Data analysis itself requires a lot of “looking at” and “interpreting what you see”. Just doing this basic task will give you a much better idea of what you are dealing with, what variables are possibly related to your outcome of interest, and how feasible it is to gain insight into the problem of interest. Explore and get to know your data, and if you are stuck after that then definitely come back and ask more questions. In the end, you may not be able to accurately “predict” anything, but you can definitely calculate probabilities of successfully completing a sprint based on workload + conditions.

Best of luck, sounds like a fun project :)

You're pretty good when it comes to linear vs. generalized linear models--and the comparison is the same regardless of whether you use mixed models or not. I don't agree at all with your "Part 3".

My favorite reference on the subject is Gelman & Hill. That book prefers the terminology of "pooling", and considers models that have "no pooling", "complete pooling", or "partial pooling".

One of the introductory datasets is on Radon levels in houses in Minnesota. The response is the (log) Radon level, the main explanatory variable is the floor of the house where the measurement was made: 0 for basement, 1 for first floor; there's also a grouping variable for the county.

Radon comes out of the ground, so, in general, we expect (and see in the data) basement measurements to have higher Radon levels than ground floor measurements, and based on varied soil conditions, different overall levels in different counties.

We could fit 2 fixed effect linear models. Using R formula pseudocode, they are:

`radon ~ floor`

`radon ~ floor + county` (county as a fixed effect)

The first is the "complete pooling" model. Everything is grouped together into one big pool. You estimate two coefficients. The intercept is the mean value for all the basement measurements, and your "slope", the `floor` coefficient, is the difference between the ground floor mean and the basement mean. This model completely ignores the differences between the counties.

The second is the "no pooling" estimate, where each county is in its own little pool by itself. If there are `k` counties, you estimate `k + 1` coefficients: one intercept--the mean value in your reference county, one "slope", and `k - 1` county adjustments, which are the differences between the mean basement measurements in each county and the reference county.

Neither of these models is great. The complete pooling model ignores any information conveyed by the `county` variable, which is wasteful. A big problem with the second model is that there's a lot of variation in how sure we are about individual counties. Some counties have a lot of measurements, and we feel pretty good about their levels, but some of the counties only have 2 or 3 data points (or even just 1). What we're doing in the "no pooling" model is taking the average of however many measurements there are in each county, even if there are only 2, and declaring that to be the radon level for that county. Maybe Lincoln County has only two measurements, and they both happen to be pretty high, say 1.5 to 2 standard deviations above the grand mean. Do you really think that this is good evidence that Lincoln County has exceptionally high Radon levels? Your model does; its fitted line goes straight between the two Lincoln County points, 1.75 standard deviations above the grand mean. But maybe you're thinking "that could just be a fluke. Flipping a coin twice and seeing two heads doesn't mean the coin isn't fair, and having only two measurements from Lincoln County that are both on the high side doesn't mean Radon levels there are twice the state average."

Enter "partial pooling", aka mixed effects. We fit the model `radon ~ floor + (1 | county)`. This means we'll keep the overall fixed effect for the floor difference, but we'll allow the intercept to vary with county as a random effect. We assume that the intercepts are normally distributed, with each county being a draw from that normal distribution. If a county is above the statewide mean and it has lots of data points, we're pretty confident that the county's Radon level is actually high, but if it's high and has only two data points, they won't have the weight to pull up the estimate. In this way, the random effects model is a lot like a Bayesian model, where our prior is the statewide distribution, and our data is each county.

The only parameters that are actually estimated are the `floor` coefficient, and then the mean and SD of the county-level intercepts. Thus, unlike the complete pooling model, the partial pooling model takes the county info into account, but it is far more parsimonious than the no pooling model. If we really care about the effects of each county, this may not be the best model for us to use. But if we care about general county-level variation, and we just want to control pretty well for county effects, then this is a great model!

Of course, random effects can be extended to more than just intercepts. We could fit models where the `floor` coefficient varies by county, etc.

Hope this helps! I strongly recommend checking out Gelman and Hill.
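The three pooling strategies above can be sketched with simulated data. This is only an illustration, not the Gelman & Hill analysis: the radon-like numbers below are made up, and Python's statsmodels `MixedLM` stands in for lme4's `(1 | county)` syntax.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate radon-like data: county intercepts drawn from a common
# normal distribution, plus a fixed (negative) "floor" effect.
n_counties = 20
county_effects = rng.normal(0.0, 0.5, n_counties)
rows = []
for c in range(n_counties):
    n_houses = int(rng.integers(2, 30))  # some counties have very few points
    for _ in range(n_houses):
        floor = int(rng.integers(0, 2))  # 0 = basement, 1 = first floor
        radon = 1.5 + county_effects[c] - 0.7 * floor + rng.normal(0, 0.8)
        rows.append({"radon": radon, "floor": floor, "county": c})
df = pd.DataFrame(rows)

# Complete pooling: ignore counties entirely.
complete = smf.ols("radon ~ floor", data=df).fit()

# No pooling: a separate fixed intercept per county.
no_pool = smf.ols("radon ~ floor + C(county)", data=df).fit()

# Partial pooling: random intercept per county, the analogue
# of the R formula radon ~ floor + (1 | county).
partial = smf.mixedlm("radon ~ floor", data=df, groups=df["county"]).fit()

print("complete pooling floor coef:", complete.params["floor"])
print("partial pooling floor coef:", partial.params["floor"])
```

All three models should recover a negative floor effect here; the interesting differences show up in the per-county intercepts, where the mixed model shrinks small counties toward the grand mean.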

> she didn't know how useful it would be

probably more employable than geography

> Do you guys have any recommendations on where to develop my knowledge and skills?

There's a bunch of free and inexpensive stuff around .. but there's also a lot of *bad* free/inexpensive stuff around; you have to be a bit discerning (which is hard when you're trying to learn it).

It might sound a bit old-school but I'd suggest going to a university library and finding some decent stats texts; you probably want to avoid the stuff that says "11th edition".

Find *several* that you like and work with those for a while.

Some you might look for:

Statistics, Freedman, Pisani & Purves (any edition)

Introduction to the Practice of Statistics, Moore & McCabe (5th edition or earlier)

For a bit of theory (you'll need a bit of mathematics for this but not a ton of it):

Introduction To The Theory Of Statistics, Mood Graybill And Boes

These are all old books. You should be able to get them second hand for cheap, or read them in a library. They'll be a good grounding, but you'll need to be able to ask questions as well.

Places like this one and stats.stackexchange.com can be handy resources. I've seen determined people teach themselves a lot of statistics with only a bit of guidance so it can certainly be done.

> Do I need programming, if so, what would be the best programming language to learn?

It would be best to learn some, yes, because modern statistics relies on it heavily. You don't necessarily *have* to do it immediately, but getting an early start (and using it to help with learning stats) will be better than leaving it too long.

Two main things are widely used: R and Python. Both are free. The second is more of a mainstream programming language; the first is a statistics package as well as a more specialized language.

Learn one or both; my own suggestion would be to try R but other people may have different advice.

If you want to be a programmer rather than a statistician who uses code to solve statistical problems, python would be the better choice.

As you wish to get into applied statistics (i.e. actually analyzing data), you'll need software. I'd strongly recommend learning and using R because it's completely free and incredibly powerful.

Here are some resources for learning statistics using R:

Then, these websites provide very valuable resources for doing statistics with R:

Hope that helps.

You absolutely will need R and/or SAS to do any work beyond basic statistics. You'll have to know how to do data munging and how to reshape your data to get it in the right form. It's 2018. You have to know how to clean your own data. Additionally, you'll be asked to repeat complicated analyses, or answer questions like "How did you calculate this number in this analysis from six months ago?" A point-and-click interface doesn't give you a record that makes it easy to do these things. In a programming language, you can rerun complex procedures at the press of a button. Programming can be a little scary at first, but once you get the hang of it, you'll wonder how you lived without it.

Fear not though! There are a ton of fantastic resources to learn how to code. If you've never programmed before, my recommendation would be to go through the Codecademy intro Python tutorial. Even if you never want to use Python after this, you'll learn about variables, conditions, loops, data types, functions, and language-specific features. These are ideas that exist in every programming language (well SAS has macros, not functions, but you get my point.)

I also recommend using R for all of your statistics homework, even if the professor doesn't require it. That's how I learned R. It'll put you in a position where you have to learn how the language works and where the functions you want to use are. Once you have the basics of R down, check out R for Data Science. It's a very modern book on R that encourages you to use user-friendly packages to do data analysis. As for SAS, it's a terrible language that's losing market share to R and is popular because it's popular. It can help to know the basics, but it's a language I leave off my resume and LinkedIn because I never want to touch it again.

I'd also recommend learning SQL at some point. Most datasets will be in databases you'll have to query for the data you want. My favorite book for this is SQL in 10 Minutes, which is a book of 10-minute lessons, where each one is on a SQL concept. Don't worry about the specific SQL dialect since they're virtually all the same. Once you're comfortable with basic queries and joins you're in good shape.

Books:

"Doing Bayesian Data Analysis" by Kruschke. The instruction is really clear and there are code examples, and a lot of the mainstays of NHST are given a Bayesian analogue, so that should have some relevance to you.

"Bayesian Data Analysis" by Gelman. This one is more rigorous (notice the obvious lack of puppies on the cover) but also very good.

Free stuff:

"Think Bayes" by our own resident Bayesian apostle, Allen Downey. This book introduces Bayesian stats from a computational perspective, meaning it lays out problems and solves them by writing Python code. Very easy to follow, free, and just a great resource.

Lecture: "Bayesian Statistics Made (As) Simple (As Possible)" again by Prof. Downey. He's a great teacher.

If you are looking for something very calculus-based, this is the book I am familiar with that is most grounded in that. Though, you will need some serious probability knowledge, as well.

If you are looking for something somewhat less theoretical but still mathematical, I have to suggest my favorite. Statistics by William L. Hays is great. Look at the top couple of reviews on Amazon; they characterize it well. (And yes, the price is heavy for both books.... I think that is the cost of admission for such things. However, considering the comparable cost of much more vapid texts, it might be worth springing for it.)

John Fox's book is great. It's mostly linear regression models for continuous variables, but the GLM section is *very* helpful. If I remember correctly, the second edition is way more helpful with GLM than the first.

For categorical variables, Scott Long's book is wonderfully helpful.

Unfortunately both are expensive. Hopefully your library has them.

Any more specificity in what types of variables you might be working with or what your data is like? Knowing what type of link function you're looking for may get you better results from some of the uber statisticians here.

I would highly recommend the following book: Mostly harmless econometrics

http://www.amazon.com/Mostly-Harmless-Econometrics-Empiricists-Companion/dp/0691120358/ref=wl_it_dp_o?ie=UTF8&coliid=IVHBGLQH4VJ3I&colid=21QCK3AR703JR

It is a very problem-driven book and will help build up your knowledge base so you know what models are appropriate for a given situation or dataset.

You will then need to start practicing in a statistical program to gain the practical skills of applying those models to real data. Excel works, but I don't know a good book to recommend to guide you through using excel on real problems.

I recommend Stata to new data analysts and have them pick up "Microeconometrics Using Stata"; once they've worked through these two books they get excited, start grabbing data all over, and begin running models. It's exciting to watch new data modellers apply the tools they're learning. R is free and open source but is more difficult to learn; if you're willing to ask, there are tons of people willing to help you through R.

Not long! For this purpose I highly, *highly* recommend Richard McElreath's *Statistical Rethinking* (this one here). It's SO good. The math is exceptionally straightforward for someone familiar with regression, and it's huge on developing intuition. Bonus: he sets you up with all the tools you need to do your own analyses, and there are tons of examples that he works from a lot of different angles. He even does hierarchical regression.

It's an easy math book to read cover to cover by yourself, to be honest. He really holds your hand the whole way through.

Jesus, he should pay me to rep his book.

Did any of your calc classes include multivariate/vector calculus? E.g. things dealing with double and triple integrals.

If not, take another calc class or two; calculus is very important for statistics. It shouldn't be too hard to pick up the rest of the necessary calc since you've already got a good calc background.

If so, start taking probability and statistics courses in your school's math department if you can. The mathematical way (read: the right way) of understanding probability and statistics is based on probability distributions (like the normal distribution), defined by their probability functions. As such, you can use calculus to obtain a myriad of information from them! For instance, among many other things, within the first one or two courses you'd likely be able to answer at least the Spearman's coefficient question, the Bernoulli process question, and the MLE question.

If you don't have room in your schedule to do the stats course, you could get a textbook and try learning on your own. There are tons of excellent resources. Hogg, Tanis, and Zimmerman is pretty good for an introduction, though I'm sure there's better out there.

I don't understand the question. Isn't this easy? Just follow the KISS principle (keep it simple, stupid). They've presumably seen histograms somewhere. Just present the kernel density estimate, presumably with optimal bandwidth chosen by cross-validation or something (which is way too complicated to explain), as a better competitor to the histogram. Explain the kernel density estimate as a better estimate of the "theoretical histogram" (I get this terminology from Freedman, Pisani, and Purves, an excellent "statistics for poets" book), which is what the histogram would be if you had an infinite amount of data. No one believes the theoretical histogram actually has jumps like a histogram (estimate), so why not use a smooth estimate like the kernel density estimate? That's almost ELI5.
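The comparison above is easy to demo. A minimal sketch using SciPy's `gaussian_kde` (which, note, uses a rule-of-thumb bandwidth by default rather than the cross-validated bandwidth the comment mentions), on made-up data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
data = rng.normal(loc=0.0, scale=1.0, size=500)  # sample from the "theoretical histogram"

# gaussian_kde picks a bandwidth automatically (Scott's rule by default);
# pass bw_method= to control the amount of smoothing.
kde = gaussian_kde(data)

grid = np.linspace(-4, 4, 200)
density = kde(grid)  # smooth estimate, no histogram-style jumps
```

Plotting `density` against `grid` next to a histogram of `data` makes the "smooth estimate of the theoretical histogram" point visually.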

I'm in a similar situation (requiring to be proficient in statistics), and here's what I'm doing.

a. Stats 133 - Computing with Data: A course on using R, SQL, and other technologies useful in statistics.

b. Stats 102 - Intro to Statistics: I found multiple versions of this course, but I'm going to pick this one because it uses this interesting book, which emphasizes case studies.

c. Stats 135 - Concepts of Statistics: a more advanced treatment of the same concepts from 102.

d. If you want to brush up on probability, you should look at Stats 101 and Stats 134.

e. After this level, they have a series of electives, such as Stochastic Processes (Stats 150), Linear Modelling Lab (151A and 151B), Sampling Surveys Lab (152), Time Series Lab (153), Game Theory (155), and seminars.

The classes don't have videos or audios, but they have syllabuses, lecture notes and assignments. So far I've found them to be more than sufficient.

https://www.amazon.co.uk/Discovering-Statistics-Introducing-Statistical-Methods/dp/1847879071

This book by Andy Field is by far my favorite. His writing style is really laid back and funny, which helps me concentrate, as statistics can be pretty dry/boring. And he is good at explaining statistical theories in an easy way. If you don't want to use SPSS while learning, he has a statistics book in which he doesn't use a statistics program as part of teaching (I haven't read that one though). He also has books on how to use Stata, R, etc.

I really like this book:

http://www.amazon.co.uk/Discovering-Statistics-using-IBM-SPSS/dp/1446249182/ref=as_li_tf_sw?&linkCode=wsw&tag=statihell-21

Fun to read, easy to understand, entertaining. What stats book is entertaining???

> There are some philosophical reasons and some practical reasons that being a "pure" Bayesian isn't really a thing as much as it used to be. But to get there, you first have to understand what a "pure" Bayesian is: you develop reasonable prior information based on your current state of knowledge about a parameter / research question. You codify that in terms of probability, and then you proceed with your analysis based on the data. When you look at the posterior distributions (or posterior predictive distribution), it should then correctly correspond to the rational "new" state of information about a problem because you've coded your prior information and the data, right?

Sounds good. I'm with you here.

> However, suppose you define a "prior" whereby a parameter must be greater than zero, but it turns out that your state of knowledge is wrong?

Isn't that prior then just an error like any other, like assuming that 2 + 2 = 5 and making calculations based on that?

> What if you cannot codify your state of knowledge as a prior?

Do you mean a state of knowledge that is impossible to encode as a prior, or one that we just don't know how to encode?

> What if your state of knowledge is correctly codified but makes up an "improper" prior distribution so that your posterior isn't defined?

Good question. Is it settled how one should construct the strictly correct priors? Do we know that the correct procedure ever leads to improper distributions? Personally, I'm not sure I know how to create priors for any problem other than the one the prior is spread evenly over a finite set of indistinguishable hypotheses.

The thing about trying different priors, to see if it makes much of a difference, seems like a legitimate approximation technique that needn't shake any philosophical underpinnings. As far as I can see, it's akin to plugging in different values of an unknown parameter in a formula, to see if one needs to figure out the unknown parameter, or if the formula produces approximately the same result anyway.

> read this book. I promise it will only try to brainwash you a LITTLE.

I read it and I loved it so much for its uncompromising attitude. Jaynes made me a militant radical. ;-)

I have an uncomfortable feeling that Gelman sometimes strays from the straight and narrow. Nevertheless, I looked forward to reading the page about Prior Choice Recommendations that he links to in one of the posts you mention. In it, though, I find the puzzling "Some principles we don't like: invariance, Jeffreys, entropy". Do you know why they write that?

I would try *Mathematical Statistics and Data Analysis* by Rice. The standard intro text for Mathematical Statistics (this is where you get the proofs) is Wackerly, Mendenhall, and Schaeffer, but I find this book to be a bit too dry and theoretical (and I'm in math). Calculus is less important than a thorough understanding of how random variables work. Rice has a couple of pretty good chapters on this, but it will require some mathematical maturity to read this book. Good luck!

Depending on how strong your math/stats background is, you might consider Statistical Inference by Casella and Berger. It's what we use for our first year PhD Mathematical Statistics course.

That might be a little too difficult if you're not very comfortable with probability theory and basic statistics. If you look at the first few chapters on Amazon and it seems like too much, I recommend Mathematical Statistics and Data Analysis by Rice, which I guess I would consider a "prequel" to the Casella text. I worked through this in an advanced statistics undergrad course (along with Mostly Harmless Econometrics and Goldberger's Course in Econometrics).

Let's see, if you're interested in Stochastic Models (Random Walks, Markov Chains, Poisson Processes etc), I recommend Introduction to Stochastic Modeling by Taylor and Karlin. Also something I worked through as an undergrad.

The absolute best book I've found for someone with a frequentist background and undergraduate-level math skills is Doing Bayesian Data Analysis by John Kruschke. It's a fantastic book that goes into mathematical depth only when it needs to while also building your intuition.

The second edition is new and I'd recommend it over the first because of its improved code. It uses JAGS and Stan instead of BUGS, which is Windows-only now.

I've used a book by Gelman for self study. Great author, very good at using meaningful graphics -- which may be an effective way to convey ideas to students.

Thanks. The program is Data Science and prereqs are Calc, Lin Alg and basic stats.

I started my review using https://www.amazon.com/Applied-Linear-Statistical-Models-Michael/dp/007310874X but the book assumes you have basic stats. I took these courses 5+ years ago so I only vaguely remember the material.

Good example with hetero/homoskedasticity. I want to make sure I understand things like random variables and different types of distributions.

This imo is a good book for basic probability and mathematical statistics. Super easy read with a lot of examples. [You also mentioned PDFs for books, and someone told you Library Genesis. I can promise this one is on there :)]

Somewhat facetiously, I'd say the probability that an individual who has voted in X/12 of the last elections will vote in the next election is (X+1)/14. That would be my guess if I had no other information.
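The (X+1)/14 figure is Laplace's rule of succession applied to 12 past elections: estimate the probability as (successes + 1)/(trials + 2). A one-function sketch:

```python
def vote_probability(x_voted: int, n_elections: int = 12) -> float:
    """Laplace's rule of succession: (x + 1) / (n + 2).

    Equivalent to the mean of a Beta(x + 1, n - x + 1) posterior
    under a uniform prior on the turnout probability.
    """
    return (x_voted + 1) / (n_elections + 2)

print(vote_probability(12))  # voted in all 12 -> 13/14
print(vote_probability(0))   # voted in none  -> 1/14
```

This is the "no other information" baseline the comment describes; any real predictors should only improve on it.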

As the proverb goes: it's difficult to make predictions, especially about the future. We don't have any votes from the next election to try to discern what relationship those votes have to any of the data at hand. Of course that isn't going to stop people who need to make decisions. I'm not well-versed in predictive modeling (being more acquainted with the "make inference about the population from the sample" sort of statistics) but I wonder what would happen if you did logistic regression with the most recent election results as the response and all the other information you have as predictors. See how well you can predict the recent past using the further past, and suppose those patterns will carry forward into the future. Perhaps someone else can propose a more sophisticated solution.
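The "predict the recent past from the further past" idea can be sketched with scikit-learn. Everything below is invented for illustration: the voter file, the column layout, and the coefficients are all assumptions, not real election data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Hypothetical voter file: participation (0/1) in the 11 elections
# before the most recent one, plus voter age as an extra predictor.
n = 1000
history = rng.integers(0, 2, size=(n, 11))
age = rng.integers(18, 90, size=(n, 1))
X = np.hstack([history, age])

# Simulate the most recent election as the response: habitual
# voters are likelier to vote (made-up coefficients).
logit = -4.0 + 0.6 * history.sum(axis=1) + 0.01 * age.ravel()
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Fit on the past, then score every voter for the next election.
model = LogisticRegression(max_iter=1000).fit(X, y)
probs = model.predict_proba(X)[:, 1]  # per-voter turnout probabilities
```

In practice you'd fit on the most recent *observed* election and apply the fitted model to the current roll, trusting (as the comment says) that past patterns carry forward.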

I'm not sure how this data was collected, but keep in mind that a list of people who have voted before is not a complete list of people who might vote now, since there are some first-time voters in every election.

If you want to get serious about data modeling in social science, you might check out this book by statistician/political scientist Andrew Gelman.

http://www.amazon.com/Applied-Regression-Analysis-Generalized-Linear/dp/0761930426

http://www.amazon.com/Regression-Categorical-Dependent-Quantitative-Techniques/dp/0803973748

http://www-stat.stanford.edu/~tibs/ElemStatLearn/

http://www-bcf.usc.edu/~gareth/ISL/

http://www.amazon.com/Extending-Linear-Model-Generalized-Nonparametric/dp/158488424X/ref=sr_1_2?s=books&ie=UTF8&qid=1380716057&sr=1-2

http://www.amazon.com/Generalized-Edition-Monographs-Statistics-Probability/dp/0412317605

Hope it helps man. Good luck. I'm learning this stuff for my project too.

The short version is that in a Bayesian model your likelihood is how you're choosing to model the data: P(x|θ) encodes how you think your data was generated. If you think your data comes from a binomial, e.g. you have something representing a series of success/failure trials like coin flips, you'd model your data with a binomial likelihood. There's no right or wrong way to choose the likelihood; it's entirely based on how you, the statistician, think the data should be modeled.

The prior, P(θ), is just a way to specify what you think θ might be beforehand, e.g. if you have no clue in the binomial example what your rate of success might be, you put a uniform prior over the unit interval. Then, assuming you understand Bayes' theorem, we find that we can estimate the parameter θ given the data by calculating P(θ|x) = P(x|θ)P(θ)/P(x). That is the entire Bayesian model in a nutshell.

The problem, and where MCMC comes in, is that given real data, the way to calculate P(x) is usually intractable, as it amounts to integrating or summing over P(x|θ)P(θ), which isn't easy when you have multiple data points (since P(x|θ) becomes ∏ᵢ P(xᵢ|θ)). You use MCMC (and other approximate inference methods) to get around calculating P(x) exactly.

I'm not sure where you've learned Bayesian stats from before, but for gaining intuition (which it seems is what you need) I've heard good things about Statistical Rethinking (https://www.amazon.com/Statistical-Rethinking-Bayesian-Examples-Chapman/dp/1482253445); the author's website includes more resources, including his lectures. Doing Bayesian Data Analysis (https://www.amazon.com/Doing-Bayesian-Data-Analysis-Second/dp/0124058884/ref=pd_lpo_sbs_14_t_1?_encoding=UTF8&psc=1&refRID=58357AYY9N1EZRG0WAMY) also seems to be another beginner-friendly book.
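For small problems like the binomial example above, you can sidestep MCMC entirely and compute P(θ|x) on a grid: multiply likelihood by prior and normalize numerically. This sketch assumes made-up data (14 successes in 20 trials):

```python
import numpy as np

# Binomial example: k successes in n coin-flip-style trials,
# with a uniform prior on the success rate theta.
n, k = 20, 14

theta = np.linspace(0, 1, 1001)               # grid over the unit interval
prior = np.ones_like(theta)                   # uniform P(theta)
likelihood = theta**k * (1 - theta)**(n - k)  # P(x | theta), up to a constant

# Bayes' theorem: posterior ∝ likelihood * prior; the numerical
# normalization plays the role of dividing by P(x).
unnorm = likelihood * prior
posterior = unnorm / np.trapz(unnorm, theta)

posterior_mean = np.trapz(theta * posterior, theta)
# The conjugate answer is Beta(k+1, n-k+1), whose mean is (k+1)/(n+2),
# so the grid estimate should land very close to 15/22.
```

MCMC becomes necessary precisely when θ has too many dimensions for a grid like this to be feasible.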

Hi, I see you have already read some good books. I would also recommend Bishop's pattern recognition book for ML: https://www.amazon.ca/Pattern-Recognition-Machine-Learning-Christopher/dp/0387310738

However, I'm not sure whether they teach this in grad stats. Good luck!


It is hard to provide a "comprehensive" view, because there's so much disparate material in so many different fields that draw upon probability theory.

Feller is an approachable classic that covers all of the main results in traditional probability theory. It certainly feels a little dated, but it is full of the deep central limit insights that are rarely explained in full in other texts. Feller is rigorous, but keeps applications at the center of the discussion, and doesn't dwell too much on the measure-theoretical / axiomatic side of things. If you are more interested in the modern mathematical theory of probability, try Probability with Martingales.

On the other hand, if you don't care at all about abstract mathematical insights and just want to be able to use probability theory directly for every-day applications, then I would skip both of the above and look into Bayesian probabilistic modelling. Try Gelman et al.

Of course, there's also machine learning. It draws on a lot of probability theory, but often teaches it in a very different way to a traditional probability class. For a start, there is much more emphasis on multivariate models, so linear algebra is much more central. (Bishop is a good text).

https://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/dp/0062731025

If you're just looking for a concept overview, the Cartoon Guide to Statistics is great. It's easy to read and filled with great visuals and examples.

If you want to learn how to do intro statistics/practice, look no further than Khan Academy.

The book that you want the person to look up is Applied Linear Statistical Models. It is a great reference book and gets into the nitty gritty calculations for figuring out the appropriate degrees of freedom in some pretty ugly experimental designs.

Casella and Berger is one of the go-to references. It is at the advanced undergraduate/first year graduate student level. It's more classical statistics than data science, though.

Good statistical texts for data science are Introduction to Statistical Learning and the more advanced Elements of Statistical Learning. Both of these have free pdfs available.

Casella & Berger is the go-to reference (as Smartless has already pointed out), but you may also enjoy Jaynes. I'm not sure I'd say it's *quick*, but if gaps are your concern, it's pretty drum-tight.

I'm doing the same. Here are a couple of resources that you may find helpful.

That should be enough to get you going. Good luck!

This is the best recent book I have read https://www.amazon.com/Foundations-Linear-Generalized-Probability-Statistics/dp/1118730038

This is coming from someone who loves the Casella and Berger text.

I would say: Go for it as long as you are interested in the job :)

For study references for remembering R and Statistics, I think all you would need would be:

For R and data cleaning: http://r4ds.had.co.nz/. For basic statistics with R, probably Dalgaard's applied statistics with R, and something like OpenIntro Statistics or Freedman for a review of stats.

If you really need it dumbed down, I would recommend Asimow and Maxwell. This text has a solutions manual. Note that this is specifically tailored toward actuarial exams - i.e., people that have to learn the material quickly but not necessarily for grad school. (And yes, the website is legit. I've done some contract work for them in the past and have ordered books through them.)

If you don't mind something more mathematical, I would recommend Wackerly et al.

If you want a math book with that perspective, I'd recommend E.T. Jaynes' "Probability Theory: The Logic of Science"; he goes into quite a lot of discussion about that topic.

If you want a popular science book on the subject, try "The Theory That Would Not Die".

Bayesian statistics has, in my opinion, been the force that has attempted to reverse this particular historical trend. However, that viewpoint is unlikely to be shared by all in this area. So take my viewpoint with a grain of salt.

Gelman's book is pretty awesome. http://www.amazon.com/Analysis-Regression-Multilevel-Hierarchical-Models/dp/052168689X/ref=pd_sim_b_1

A lot of the recommendations in this thread are good, I'd like to add "Bayesian Data Analysis 3rd edition" by Gelman et al. Useful if you encounter Bayesian models, especially hierarchical/multilevel models.

In my graduate econometrics course we used Mostly Harmless Econometrics. It's focused on the question of causal inference, and specifically how to do empirically rigorous studies when variables aren't exogenous. It covers a bunch of best practices in designing experiments. It's not focused on networks, which is a rapidly emerging field of study in the social sciences. However, it does a very good job of explaining possible sources of error in statistical inference and research design.

How would you rather split beginner vs. intermediate/advanced?

My feeling was that Ben Lambert's book would be a good intro and that Bayesian Data Analysis would be a good next step?

Andy Field is both very informative and entertaining. I highly recommend this book.

I don't understand how my example of spurious correlation among randomly generated numbers doesn't already meet that burden. That's a data generating process that is not causal by design but produces your preferred observed signal.

Your additions of "repeated", "different times", and "different places" only reduce the likelihood of finding a set with your preferred signal (or, similarly, require checking more pairs). There's literally a cottage industry around finding these funny noncausal relationships: http://tylervigen.com/page?page=1

If you're imagining something more elaborate about what it means to move "reliably" together, Mostly Harmless Econometrics walks through how every single thing you might be thinking of is really just trying to get back to Rubin-style randomized treatment assignment:

https://www.amazon.com/Mostly-Harmless-Econometrics-Empiricists-Companion/dp/0691120358
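The spurious-correlation point a couple of comments up is easy to reproduce: simulate a pile of independent random walks (zero causal structure by construction), and the best-looking pair will still correlate strongly. A sketch with arbitrary sizes and a fixed seed:

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 independent random walks of length 100: no causal links by design.
walks = np.cumsum(rng.standard_normal((50, 100)), axis=1)

# Correlation between every distinct pair of series.
corr = np.corrcoef(walks)
pairwise = np.abs(corr[np.triu_indices(50, k=1)])

# With 1225 pairs to choose from, the strongest spurious "relationship"
# looks impressively large despite being pure noise.
print(round(float(pairwise.max()), 2))
```

Adding conditions like "repeated" or "different places" just shrinks the pool of such pairs; it doesn't change the mechanism.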

You can check the following books:

https://www.amazon.com/dp/B07QCHTR54

https://www.amazon.com/dp/B07GRBXX7D

Surprisingly effective intro to probability

might be too informal for your purposes though...

Honestly, ignore the "for engineering" part of "Statistics for Engineering." They're largely the same content.

How much calculus have you taken? Does the class use calculus?

First, the Cartoon Guide to Statistics is surprisingly helpful for some people.

For a more traditional textbook, you might try Devore's main intro book.

Almost every student finds statistics confusing and it's either difficult to teach, or just difficult to learn. It's also a fractal discipline, since you can keep going deeper and deeper, but it's generally just going over the same few concepts with additional depth. If you end up in a class that's not well suited to your mathematical background it's especially frustrating.

Good luck.

I'm assuming this is some sort of experimental psychology?

Probably everything in this book:

http://www.amazon.com/Discovering-Statistics-using-IBM-SPSS/dp/1446249182/

or this website:http://www.statisticshell.com/html/apf.html

Same guy, great book.

Applied Linear Statistical Models by Kutner is a far better reference for statistical modeling compared to ISLR/ESLR or any kind of "machine learning" text, but it sounds as though you did a stat masters since you're asking about stat modeling instead of the new buzzwords. The latter options are certainly more narrow.

https://www.amazon.com/Applied-Linear-Statistical-Models-Michael/dp/007310874X

Considered a cornerstone, of sorts.

PDF WARN: Introduction to Math Stat by Hogg

Not to be confused with Probability and Math Stat by Tanis and Hogg, which is a "first semester" course.

Good blend of theory and "talky-ness", good exercises that test your understanding, most should be do-able from just applying the basics.

I've always loved Andrew Gelman and Jennifer Hill's book: http://www.amazon.com/Analysis-Regression-Multilevel-Hierarchical-Models/dp/052168689X/ref=sr_1_1?s=books&ie=UTF8&qid=1410039864&sr=1-1&keywords=data+analysis+using+regression+and+multilevel+hierarchical+models

Specifically written for Social Research, and presents example code for R and WinBugs

Seconding /u/khanable_ -- most of statistical theory is built on matrix algebra, especially regression. Entry-level textbooks usually use simulations to explain concepts because it's really the only way to get around assuming your audience knows linear algebra.
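To make the matrix-algebra point concrete: ordinary least squares is one line of linear algebra once you write down the design matrix. A sketch with simulated data (the true coefficients 2 and 3 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate y = 2 + 3*x + noise, the kind of data intro texts simulate.
n = 500
x = rng.uniform(0, 10, n)
y = 2 + 3 * x + rng.standard_normal(n)

# Design matrix with an intercept column; the OLS estimate solves the
# normal equations (X'X) beta = X'y.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # close to the true [2, 3]
```

The simulation view and the linear-algebra view are the same computation; entry-level books just hide the second one.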

My Ph.D. program uses Casella and Berger as the main text for all intro classes. It's incredibly thorough, beginning with probability and providing rigorous proofs throughout, but you would need to be comfortable with linear algebra and at least the basic principles of real analysis. That said, this is THE book that I refer to whenever I have a question about statistical theory-- it's always on my desk.

A good bet would be "Mostly Harmless Econometrics":

http://www.amazon.com/Mostly-Harmless-Econometrics-Empiricists-Companion/dp/0691120358

Not overly theoretical and very focused on practical applications.

This book is excellent:

https://www.amazon.com/Statistical-Rethinking-Bayesian-Examples-Chapman/dp/1482253445

You could read the IBM manual OR you could buy this much more user-friendly book:

http://www.amazon.com/Discovering-Statistics-using-IBM-SPSS/dp/1446249182

It requires study so you might not have any sudden moments of clarity, but this is pretty much the Bible of regression.

http://www.amazon.com/Applied-Linear-Statistical-Models-Michael/dp/007310874X

Highly recommended.

If you want an extremely practical book to complement BDA3, try Statistical Rethinking.

It's got some of the clearest writing I've seen in a stats book, and there are some good R and STAN code examples.

This one? Damn, it's £40-ish. Any highlights or is it just a case of this book is the highlight?

It's on my wishlist anyway. Thanks.

Jaynes: Probability Theory. Perhaps 'rigorous' is not the first word I'd choose to describe it, but it certainly gives you a thorough understanding of what Bayesian methods actually mean.

In addition to linear regression, do you need a reference for future use/other topics? Casella/Berger is a good one.

For linear regression, I really enjoyed A Modern Approach to Regression with R.

For research methods in behavioral and social sciences, you probably can't get better than Shadish, Cook, and Campbell's Experimental and Quasi-Experimental Designs for Generalized Causal Inference. As far as stats go, the Andy Field books are good, and he has one for R, one for SAS, and one for SPSS. I prefer the John Fox book on applied regression analysis and the corresponding R book. Here are some links:

http://www.amazon.com/Experimental-Quasi-Experimental-Designs-Generalized-Inference/dp/0395615569/ref=sr_1_1?ie=UTF8&qid=1417220301&sr=8-1&keywords=shadish+cook+and+campbell

http://www.amazon.com/Applied-Regression-Analysis-Generalized-Linear/dp/0761930426/ref=sr_1_6?ie=UTF8&qid=1417220395&sr=8-6&keywords=John+Fox

http://www.amazon.com/An-R-Companion-Applied-Regression/dp/141297514X/ref=pd_bxgy_b_img_y

This is very similar to the analysis featured on the cover of Bayesian Data Analysis (third edition).

Here's a bigger picture of their decomposition into day-of-week effects, seasonal effects, long-term trends, holidays, etc.

A bit more here, and lots more in the book.
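The birthday analysis in the book uses Gaussian processes, but the additive idea (trend + day-of-week effects + residual) can be sketched in a few lines on made-up daily data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two years of fake daily counts: linear trend + day-of-week bump + noise.
days = np.arange(730)
dow_true = np.array([0.0, -2, -1, 0, 1, 5, 8])  # weekend effect
y = 100 + 0.05 * days + dow_true[days % 7] + rng.standard_normal(730)

# Estimate the trend with a centered 7-day moving average; each full
# window contains every weekday once, so the weekly pattern cancels out.
trend_hat = np.convolve(y, np.ones(7) / 7, mode="valid")

# Average the detrended series by day of week, trimming the edges the
# centered moving average can't cover.
detrended = y[3:-3] - trend_hat
dow_days = days[3:-3] % 7
dow_hat = np.array([detrended[dow_days == d].mean() for d in range(7)])

# Approximately recovers the centered day-of-week pattern.
print(np.round(dow_hat - dow_hat.mean(), 1))
```

The GP version in the book does the same decomposition but lets each component be a smooth function with its own covariance structure.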

I agree with all of the above. Also, here's the Linear Models tome we used: http://www.amazon.com/Applied-Linear-Statistical-Models-Michael/dp/007310874X

I recommend Andy Field's book

They're not free, but Doing Bayesian Data Analysis and Statistical Rethinking are worth their weight in gold.

If you want to go deep, try a math stats book. Although there would probably be disagreement in this subreddit on which is the best, they are all pretty much the same. Maybe try an early edition of the Wackerly book (I think that one is the most widely used). A lot of people would suggest Casella and Berger, but I would suspect those people have never taught a course and forget that that book requires a bit of mathematical maturity. Go with an undergrad book.

For R, I would suggest either going through a tutorial (such as the swirl package) or, what I assume is how most people learned it, buying an applied stats book and just doing the problems in R. You have to get over the hump of learning it, but you learn programming by doing it. After you are done with math stats, a good next step in the applied direction is regression and ANOVA/design of experiments. For regression, there are a ton of books, but again, the first few chapters of most are the same. I would try to find a cheap book with a modern typeset. For ANOVA... probably go with Montgomery. I don't know the others too well though.

'The Cartoon Guide To Statistics' by Larry Gonick is a classic place to start.

An intermediate resource between the Downey book and the Gelman book is Doing Bayesian Data Analysis. It's a bit more grounded in mathematics and theory than the Downey, but a little less mathy than the Gelman.

A very user-friendly treatment that hits every criterion you mention is John Kruschke's Doing Bayesian Data Analysis, Second Edition.

Try the two:

https://www.amazon.com/Introduction-Mathematical-Statistics-Robert-Hogg/dp/0321795431

https://www.amazon.com/Statistical-Inference-George-Casella/dp/0534243126

Introduction to Mathematical Statistics by Hogg and Craig, and Statistical Inference by George Casella.

See Casella and Berger chapter 2, theorem 2.1.5

Bayesian Data Analysis and Hoff are both well-respected. The first is a much bigger book with lots of applications, the latter is more of an introduction to the theory and methods.

Statistical Rethinking: https://www.youtube.com/playlist?list=PLDcUM9US4XdM9_N6XUUFrhghGJ4K25bFc

Also has the book: https://www.amazon.com/Statistical-Rethinking-Bayesian-Examples-Chapman/dp/1482253445

I’m finishing up my stats degree this summer. For math, I took 5 courses: single-variable calculus, multivariable calculus, and linear algebra.

My stat courses are divided into three blocks.

First block, intro to probability, mathematical stats, and linear models.

Second block, computational stats with R, computation & optimization with R, and Monte Carlo Methods.

Third block, intro to regression analysis, design and analysis of experiments, and regression and data mining.

And two electives of my choice: survey sampling & statistical models in finance.

Here’s a book for intro to probability. There’s also lectures available on YouTube: search MIT intro to probability.

For a first course in calculus search on YouTube: UCLA Math 31A. You should also search for Berkeley’s calculus lectures; the professor is so good. Here’s the calc book I used.

For linear algebra, search MIT linear algebra. Here’s the book.

The probability book I listed covers two courses in probability. You’ll also want to check out this book.

If you want to go deeper into stats, for example, measure theory, you’re going to have to take real analysis & a more advanced course on linear algebra.

Start here:

If you have access to a decent academic library, I'd recommend reading or skimming the chapters on hypothesis testing in Casella & Berger or Hogg, McKean & Craig. I believe HMC also discusses risk and loss functions in the context of Bayesian statistics, although I'm not sure if C&B does. Unfortunately I don't have any good recommendations for you specific to Bayesian hypothesis testing; my Bayesian experience only lightly touched on hypothesis testing.

Very, *very* generally, the idea is that you want to determine how likely each hypothesis is given your data. By taking a ratio of these probabilities, you can determine how likely each hypothesis is relative to the other one, given the data. If you set up your ratio (here called a Bayes factor) as P(H0|X)/P(H1|X), then a BF close to or larger than 1 indicates you can't reject the null hypothesis, since it is more likely than the alternative. As the BF approaches 0, your confidence in a choice to reject the null hypothesis increases, since this indicates that the denominator is large and therefore the alternative hypothesis is more likely given your data.

The question then becomes how to establish the cutoff value: exactly how small does the BF need to be to definitively reject the null hypothesis? This is where the risk functions come in. They help you decide on a decision function, which is really just a point estimator. You use this result to determine the cutoff value for rejecting the null hypothesis and compare it to your calculated BF, which here serves as the test statistic.
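As a tiny numeric version of the ratio (hypothetical point hypotheses and made-up counts; real Bayes factors usually involve integrating over composite hypotheses):

```python
from math import comb

# Made-up data: 62 successes in 100 trials.
k, n = 62, 100

def binom_lik(theta):
    # P(x | theta) for a binomial observation.
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# With equal prior weight on two point hypotheses H0: theta = 0.5 and
# H1: theta = 0.7, the ratio P(H0|X)/P(H1|X) reduces to the likelihood
# ratio, because the marginal P(X) cancels.
bf = binom_lik(0.5) / binom_lik(0.7)

# Here bf is roughly 0.23, i.e. well below 1: the data favor H1. How far
# below 1 you demand before rejecting H0 is the cutoff question above.
print(bf < 1)  # → True
```

Note this is why the intractable P(x) never appears in the BF: it divides out of the ratio.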

Recommendation - don't learn statistics through "statistics for biology/ecology".

Go straight to statistics texts, the applied ones aren't that hard and they usually have fewer of the lost-in-translation errors (e.g. the abuse of p-values in all of biology).

Try Gelman and Hill -

http://www.amazon.com/Analysis-Regression-Multilevel-Hierarchical-Models/dp/052168689X/ref=sr_1_1?ie=UTF8&qid=1427768688&sr=8-1&keywords=gelman+hill

Faraway - Practical Regression and Anova using R (free)

http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf

Categorical Data Analysis

http://www.amazon.com/Categorical-Data-Analysis-Alan-Agresti/dp/0470463635/ref=sr_1_1?ie=UTF8&qid=1427768746&sr=8-1&keywords=categorical+data+analysis