(Part 2) Best products from r/AskStatistics

We found 20 comments on r/AskStatistics discussing the most recommended products. We ran sentiment analysis on each of these comments to determine how redditors feel about different products. We found 71 products and ranked them based on the number of positive reactions they received. Here are the products ranked 21-40. You can also go back to the previous section.

Top comments mentioning products on r/AskStatistics:

u/Jimmy_Goose · 1 pointr/AskStatistics

There are a bunch of engineering stats books out there. The one we teach out of at my uni is the one by Devore. I think it does a good job of teaching what it does. I know Ross has an engineering stats book out there, and so does Montgomery, and they are both people who have written good books in the past. The one by Ross seems to have some good topics in it from reading the table of contents.


Also, you probably want to pick up a regression book. I like the one by Kutner et al., but it is ungodly pricey. This one has a free pdf. I don't like a lot about it, but the first few chapters of every regression book are pretty much the same.

If you want to go deep into statistical theory, there is Casella and Berger as well.


For programs, I know MATLAB has a stats package that should be sufficient for the time being. If you want to go further in stats, you might want to consider R because it will have vastly more stats functions.

u/noahpoah · 9 pointsr/AskStatistics

The Theory of the Design of Experiments by Cox and Reid is a useful book. You should definitely click that link and buy it from Amazon rather than clicking on this link of a pdf of the whole book which came up as the second result in my google search just now, but which we all certainly agree should not be clicked on or shared.

u/BurkeyAcademy · 2 pointsr/AskStatistics

I know this is a boring suggestion, but nothing beats the old, venerable Schaum's Outlines for their combination of problems, solutions, and inexpensiveness. If you are just starting, perhaps start with this one, and once you get some exposure to the basics you'll have a better idea of what you might want to pursue next- perhaps the next step would be analysis using a computer instead of by hand.

Lots of us have free YouTube videos on the basics that you can reference if/when you need them as you go. Try me or Khan Academy; there are many others. Let me know if this idea doesn't fit with what you had in mind, and I can try to point you in a different direction.

u/EthanMacdonald · 3 pointsr/AskStatistics

Uhh... I think your reference is either incorrect or refers to a republication of the same edition. According to Amazon, the 8th edition was released in 1989, long after both authors had passed away. Also according to Amazon, the 9th edition is on pre-order for an Oct. 2016 release date: https://www.amazon.com/Snedecor-Cochrans-Statistical-Methods-Kenneth/dp/0813808642

So that people don't waste time re-doing what I've done:

  • Google scholar has no info on anything from 2004
  • Sci-Hub and Library Genesis are also no help
  • Amazon only sells the 1989 8th edition and is accepting pre-orders for the 2016 9th edition

Good luck.

u/COOLSerdash · 2 pointsr/AskStatistics

Here are some thoughts:

  • How strong an association is cannot be inferred from the magnitude of the p-values. Low p-values just provide some evidence against the hypothesis that these regression coefficients are zero. How strong their influence is depends on the actual coefficients, and subject-matter expertise is needed to properly interpret them.
  • These regression models have underlying assumptions, which is why you should inspect the model fit, especially the residuals (residuals vs. fitted, QQ plots of the residuals, etc.).
  • The R-squared is a bad measure of the strength of a relationship or the predictive power of your model. Relevant post is here.
  • How did you arrive at these 3 parameters in the final model? Model selection is a very delicate topic in statistics and many books have been written solely on it (e.g. Frank Harrell's). Just picking the parameters on the basis of p-values - for example - is generally thought to be a bad idea. This book is free and has a nice chapter (chapter 6, page 203) on model selection in linear models.
  • Excel is good for many things. Statistics is not among them.
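To make the residual check above concrete, here is a minimal sketch in Python: an ordinary least squares fit on made-up data, with the residuals computed for inspection (in practice you would plot them against the fitted values).

```python
# Minimal sketch of the suggested residual check: fit a simple linear
# regression by least squares and compute the residuals.
# The data here are made up purely for illustration.
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

xbar, ybar = mean(x), mean(y)

# Closed-form least squares estimates for slope and intercept
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# Residuals should scatter around zero with no visible pattern;
# in practice, plot them against the fitted values (residuals vs. fitted)
# and make a QQ plot to check the normality assumption.
```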

u/scientific_derp · 4 pointsr/AskStatistics

One common method is to use a simple slopes plot. Basically, you pick two (or three) values of one of the predictors (we'll call it X1) and plot the slopes for the other predictor (X2) at those points. Which points of X1 you select depends a lot on the nature of the variable. If there are meaningful points you can select, choose those. For example, if X1 is score on some clinical instrument, you might select points above and below a diagnostic cutoff. If there are no meaningful points to select, you might use +/- 1 standard deviation.
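The simple-slopes idea above can be sketched in a few lines of Python. In an interaction model y = b0 + b1·X1 + b2·X2 + b3·X1·X2, the slope of X2 at a chosen value of X1 is b2 + b3·X1; the coefficients and the X1 mean/SD below are made up for illustration.

```python
# Minimal sketch of simple slopes from a hypothetical interaction model.
# All coefficients are invented; in practice they come from your fit.
b0, b1, b2, b3 = 1.0, 0.5, 2.0, -0.8

x1_mean, x1_sd = 0.0, 1.5  # hypothetical descriptives for X1

def simple_slope_x2(x1_value):
    # Slope of the outcome in X2, holding X1 fixed at x1_value
    return b2 + b3 * x1_value

slope_low = simple_slope_x2(x1_mean - x1_sd)   # slope of X2 at -1 SD of X1
slope_high = simple_slope_x2(x1_mean + x1_sd)  # slope of X2 at +1 SD of X1
```

Plotting the regression line of X2 at each of these X1 values gives the simple slopes plot described above.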

Jeremy Dawson has some worksheets for making these plots at his website: http://www.jeremydawson.co.uk/slopes.htm

A great book covering simple slopes analyses is Aiken & West (1991).

u/Niemand262 · 1 pointr/AskStatistics

I'm a graduate student who teaches an undergraduate statistics course, and I'm going to be brutally honest with you.


Because you have expressed a lack of understanding about what standard deviation is, I don't anticipate that you will be able to understand the advice that you receive here. I teach standard deviations during week 1, and I teach ANOVA in the final 2 weeks. So, you are at least a full undergraduate course away from understanding the statistics you will need for this.

Honestly, you're probably in over your head on this and a few days spent on reddit aren't going to give you what you're looking for. Even if you're given the answers here, you'll need the statistical knowledge to understand what the answers actually mean about your data.


You have run an experiment, but the data analysis you want to do requires expertise. It's a LOT more nuanced and complex than you probably realized from the outset.


Some quick issues that I see here at a glance...

Mashing together different variables can make a real mess of the data, so the scores you have might not even be useful if you were to attempt to run an ANOVA (the test you would need to use) on them.

With what you have shown us in the post, we are unable to tell whether group B's scores are higher because of the message they received or whether they just happen to be higher due to random chance. Without the complete "unmashed" dataset we won't be able to say which of the "mashed" measurements are driving the effect.
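For readers curious what the ANOVA mentioned above actually computes, here is a minimal sketch of the one-way F-statistic on two made-up groups (not the poster's data): it compares between-group variation to within-group variation.

```python
# Minimal sketch of a one-way ANOVA F-statistic for two groups.
# The scores are invented purely for illustration.
from statistics import mean

group_a = [3.0, 4.0, 5.0, 4.0]
group_b = [6.0, 7.0, 6.0, 7.0]

grand = mean(group_a + group_b)
k = 2                                  # number of groups
n = len(group_a) + len(group_b)        # total observations

# Between-group sum of squares: how far group means sit from the grand mean
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in (group_a, group_b))

# Within-group sum of squares: spread of scores around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in (group_a, group_b) for x in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
```

A large F relative to the F-distribution's critical value is what lets you argue the group difference is not just random chance.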


I have worked with honors students that I wouldn't trust with the analysis you need. Because you are doing this for work, you really should consider contacting a professional. You can probably hire a graduate student to do the analysis for a few hundred dollars as a side job.


If you really want to learn how to do it for yourself, I would encourage you to check out Andy Field's text book. He also has a YouTube Channel with lectures, but they aren't enough to teach you everything you need to understand. Chapter 11 is ANOVA, but you'll need to work your way up to it.

u/gianisa · 3 pointsr/AskStatistics

Causality (causal research) is its own area. Statistical analysis alone can't really establish causality. That's why it's so common for studies to be done as follow-ups to the results from a statistical analysis. Especially if you're doing a retrospective study, there's really no way to determine causality.

Here are a couple of things to read that may be helpful: 1, 2, 3.

Judea Pearl's book on causality is one of the best known in the area, so you may want to take a look at that.

u/ThomasSpeidel · 5 pointsr/AskStatistics

I think it may be worth calculating the rate after you fit the model (instead of modeling the rate directly). Your dependent variable could be a count of infected trees per unit area. Your independent variable would be the distance from the treeline. You probably need to control for other factors as well, presumably the age of the trees. The model could be a suitable count model such as Poisson, negative binomial, etc. An ordinal model may be suitable as well. Depending on the approach you may need to adjust for dependence: trees that are closer together are more likely to be similar than trees that are farther apart. This can be done using robust approaches or clustering by plot. Hilbe is an excellent and applied resource: Modeling Count Data https://www.amazon.ca/dp/B00KL8CEW8/ref=cm_sw_r_cp_apa_n5cgAbJC586BH
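To illustrate what a fitted Poisson count model like the one suggested above would imply, here is a minimal sketch: expected counts follow exp(b0 + b1·distance) via the log link. The coefficients are invented, not estimates from any real data.

```python
# Minimal sketch of the implied rate from a Poisson count model.
# b0 and b1 are hypothetical fitted coefficients, made up for illustration.
import math

b0, b1 = 2.0, -0.1

def expected_count(distance):
    # Poisson regression uses a log link: log(mu) = b0 + b1 * distance,
    # so the expected count is exp(b0 + b1 * distance).
    return math.exp(b0 + b1 * distance)

rate_at_0 = expected_count(0)    # expected infected trees right at the treeline
rate_at_10 = expected_count(10)  # expected infected trees 10 units away
```

With a negative b1, the expected infection count decays exponentially with distance from the treeline.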

u/Undecided_fellow · 2 pointsr/AskStatistics

I'm a big fan of The Drunkard's Walk. Also, the author Leonard Mlodinow (PhD in physics from Berkeley) has a number of other really good books on different scientific fields.

u/[deleted] · 2 pointsr/AskStatistics

The handbook of structural equation modeling edited by Hoyle is a really great resource, especially if you have the mathematics background. Also, Bayesian statistics for the social sciences by David Kaplan.

u/DrGar · 3 pointsr/AskStatistics

This is the appropriate way to go about it, but you should realize the drawbacks. You estimated the error rate for the models that were built using 4000 samples, which means you have no guarantees of performance when they are trained on 5000 samples.

So for example, if you are choosing K for a KNN classifier, then chances are you might end up choosing K too small, since 4000 samples might mean that K=4 is best, but once you get to 5000 samples it is possible that K=5 will now be best.

In general, model selection is a very difficult problem. A lot of people like leave-one-out (LOO) cross-validation, since a model built with 4999 samples will behave very similarly to one built with 5000 samples. But obviously LOO will be more computationally expensive than your proposed cross-validation scheme. AIC and BIC are also tools that are commonly applied to model selection, and these are computed using ALL of your data at once.
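The LOO idea above can be sketched end-to-end for a toy 1-D KNN classifier: hold out each point in turn, predict it from the rest, and pick the K with the lowest held-out error. The dataset is made up for illustration.

```python
# Minimal sketch of leave-one-out cross-validation to choose K for a
# 1-D KNN classifier, on an invented toy dataset of (value, label) pairs.
def knn_predict(train, x, k):
    # Majority vote among the k training points nearest to x
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    labels = [lab for _, lab in nearest]
    return max(set(labels), key=labels.count)

data = [(1.0, 0), (1.5, 0), (2.0, 0), (5.0, 1), (5.5, 1), (6.0, 1)]

def loo_error(data, k):
    # Leave each point out, predict it from the remaining n-1 points
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if knn_predict(train, x, k) != y:
            errors += 1
    return errors / len(data)

best_k = min([1, 3, 5], key=lambda k: loo_error(data, k))
```

Note how K=5 fails badly here: with one point held out, the 5 remaining neighbors are dominated by the other class, which is exactly the "K chosen too small/large for the training size" effect described above.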

If you want a book that goes into a lot of the details of error rate estimation and model selection, I suggest this one. edit: This book also discusses some more advanced theoretical topics such as VC dimension, which leads to other tools like structural risk minimization that compete with cross-validation. Here is a free online primer on SRM.

u/ddefranza · 1 pointr/AskStatistics

"Numbers Rule Your World" by Kaiser Fung offers a great explanation of probability and statistics with lots of real world examples and applications. Numbers Rule Your World: The Hidden Influence of Probabilities and Statistics on Everything You Do https://www.amazon.com/dp/0071626530/

u/davidjricardo · 1 pointr/AskStatistics

You'll need some statistics background first (Maybe try a MOOC or two?). But The Analysis of Household Surveys: A Microeconometric Approach to Development Policy is an excellent resource for program analysis.

u/Sarcuss · 1 pointr/AskStatistics

Also interested in possible solutions, as I'm also starting to carry out meta-analyses on my own. If you use R, one of the books that was suggested to me was Applied Meta-Analysis with R

u/jacobcvt12 · 1 pointr/AskStatistics

Incorporating expert opinion into a Bayesian model is usually done through prior distributions rather than an additional feature. (As an aside, doing so is considered subjective Bayesian inference, as opposed to objective Bayesian inference.)

As a quick overview, Bayesians usually make inference on the posterior distribution - a combination of the prior distribution (in your case, expert opinion) and the likelihood. As a really basic example, consider a setting where you have data on MI outcomes (no covariates at this point) - a series of 1's and 0's. A frequentist would likely take the mean of the data. As a Bayesian, you would consider this a binomial likelihood and likely combine it with a beta prior. The default (non-informative) prior would be to use a beta(1, 1) distribution. However, if in a prior dataset you had observed four patients, three with an MI and one without, you could use a prior of beta(1+3, 1+1). See here for more details on the beta-binomial.
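The beta-binomial update described above takes only a couple of lines, since the posterior is just the prior parameters plus the observed successes and failures. The new data below are hypothetical.

```python
# Minimal sketch of the beta-binomial update from the comment above:
# Beta(1, 1) prior plus 3 MIs out of 4 prior patients gives Beta(4, 2).
prior_a, prior_b = 1 + 3, 1 + 1   # Beta(4, 2), as in the example

# Hypothetical new data: 10 patients, 6 with an MI and 4 without
successes, failures = 6, 4

# Conjugacy: posterior is Beta(prior_a + successes, prior_b + failures)
post_a = prior_a + successes
post_b = prior_b + failures

posterior_mean = post_a / (post_a + post_b)   # 10 / 16 = 0.625
```

The posterior mean sits between the prior mean (4/6 ≈ 0.67) and the sample mean (6/10 = 0.6), which is the sense in which the prior "pulls" the estimate toward the expert opinion.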

In the above example, it's easy to incorporate prior information because we used a conjugate prior. While probably not exactly what you are doing for your dissertation, here's an overview of a conjugate prior with a linear regression from wikipedia. There are many more resources online for this that you can find by searching for something along the lines of "bayesian linear regression subjective conjugate prior". For a more detailed (introductory) overview of bayesian statistics, check out this book.

To be honest, as much as I'm a Bayesian, I think that creating an automatic model that incorporates expert opinion will be really difficult. Usually, subjective priors are chosen carefully, and they're not always as interpretable as the beta-binomial posterior presented above. I think this goal is possible, but it would require a lot of thought about how the prior is automatically constructed from a dataset of surgeons' predictions. If you have any followup questions/would like more resources, let me know!

Edit: I guess I never really addressed the issue of predictive models. However, the difficult part will be constructing the prior automatically. If you can do that, predicting outcomes will be a simple change to make, especially in the case of a linear model.