#128 in Computers & technology books
Reddit mentions of Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
Sentiment score: 16
Reddit mentions: 27
We found 27 Reddit mentions of Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. Here are the top ones.
Buying options
View on Amazon.com or O'Reilly Media
Specs:
Height: 9.19 inches
Length: 7 inches
Number of items: 1
Weight: 1.7 pounds
Width: 0.9 inches
This book, "Python for Data Analysis," is coming out in October on Amazon, but PDFs might be available directly from O'Reilly if you pre-order. It's by Wes McKinney, who was apparently involved with pandas and has a blog about doing quant analysis with Python:
http://blog.wesmckinney.com/
You might find what you're looking for in some of his stuff.
Pandas is a well-known library for data analysis. Very good tutorial.
Good book on Pandas
Good Udemy Course for Python
"Python for Data Analysis" is pretty good. It's written by Wes McKinney, the creator of Pandas, so its focus is using Pandas for data analysis, but it does include sections on basic and advanced NumPy features: http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
Alternatively, the prolific Ivan Idris has written four books covering different aspects of NumPy, all published by Packt Publishing. I haven't read any of them, but the Amazon reviews seem OK:
I would suggest getting some basic computing skills first. This book gives you a great grasp on data analysis in Python with statistical applications explored in the later part of the book. Read the whole thing through.
http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
I recommend Python for Data Analysis (Holy shit! That's the title of your post!). It's written by the author of Pandas and I have found it incredibly straightforward and helpful.
There is a LOT you can learn. It can be very bewildering. Here are some links that should help you get started. There are a lot of other posts in this sub with good tips so you should browse a bit.
https://www.reddit.com/r/datascience/comments/7ou6qq/career_data_science_learning_path/
https://www.dataquest.io/blog/why-sql-is-the-most-important-language-to-learn/
https://www.becomingadatascientist.com/2016/08/13/podcast-episodes-0-3/
https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
Sooner or later you'll want to start tackling some projects. That's basically where I am now in the process. I'm at the point where I know enough about Python, Statistics, and SQL to integrate some skills and hopefully do something interesting.
Best advice I can give you is
Also, I've heard good things about this book, but haven't gone through it myself.
I was in the same boat, with a history undergraduate major and limited math (although I picked up an MS in CS), working full time. My first semester I registered for 6040 and 8803 and had to drop 8803 because of the workload. Second semester I registered for 6501 and 6242 and had to drop 6242 because of the workload. You *might* be able to handle two courses, but GT has a lenient drop policy so the only downside is that you lose your money.
Standard advice: do your best to work through the following two books Before you start:
http://faculty.marshall.usc.edu/gareth-james/ISL/
https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
If you are new to programming, Python for Data Analysis, by Wes McKinney, the author of the pandas library: https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793/ref=sr_1_1?s=books&ie=UTF8&qid=1483984751&sr=1-1&keywords=python+for+data+analysis
This question or a variant comes up nearly weekly.
I always try to respond, if one doesn't exist already, with a plug for the module 'Pandas'.
Pandas is a data analysis module for Python with built-in support for reading Excel files. Pandas is perfect for database-style work where you are reading CSV files, Excel files, etc., and creating table-like data sets.
If you have used the 'R' language the pandas DataFrame may look familiar.
Specifically look at the method read_excel: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.excel.read_excel.html
main website: http://pandas.pydata.org/
book that I use frequently for a reference and examples: http://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
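The database-style workflow described above is easy to sketch. A minimal example, assuming pandas is installed (the commented `read_excel` call and the column names are hypothetical; a real workbook would also need an Excel engine such as openpyxl):

```python
import pandas as pd

# Reading a (hypothetical) workbook would be one line:
# df = pd.read_excel("sales.xlsx", sheet_name=0)

# The same table-like operations work on any DataFrame, e.g. one built in code:
df = pd.DataFrame({"region": ["east", "west", "east"],
                   "units": [10, 20, 5]})

# Database-style work: filter rows and aggregate, much as you would with SQL.
east_only = df[df["region"] == "east"]
totals = df.groupby("region")["units"].sum()
print(totals["east"])  # 15
```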
I wanna say that Wes mentions R in his book but I'm not sure. I know numpy and pandas are pretty dang fast though, it's the statistics that Python isn't so great with.
There's a great book about this! It goes over python basics and then goes in depth on Pandas, which is a python library used for data analysis.
I think if you've never used Python before it couldn't hurt to also find some general intro-to-python online tutorial to supplement it.
>Like the concept of piping info between applications is just starting to make sense (even though I have no clue how it works).
Coming from a programming background it might be easier for you to think of each of the little unix core programs as a function. They all have options and generally do one thing really well. "grep" searches for things. "sed" does regex matching/replacement. "cut"... well, it cuts out parts of files. The easiest way to figure out what something does is probably through the man page (run "man grep" at the terminal). That being said, some programs have -really- goddamn big man pages and are much harder to navigate. Bash, for instance, has an enormous man page.
The concept of piping makes more sense in the context of functions. In python you might write something like this:
"hello".upper()
Which would give you:
"HELLO"
In bash you could write that as:
echo "hello" | tr '[a-z]' '[A-Z]'
That first command just prints out the string, but instead of printing it out at your terminal the pipe will send all of its output to the "tr" command. ("man tr" will help you understand what it's doing there). Because tr does not have its output being redirected, it just gets printed back to the terminal.
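The pipe-as-function idea can be made concrete in Python: each stage consumes the previous stage's output, just like `tr` consumes `echo`'s output above (a minimal sketch; the stage names are made up for illustration):

```python
# Each "stage" is a small function that does one thing, like a unix tool.
def echo(text):
    return text

def tr_upper(text):  # analogous to: tr '[a-z]' '[A-Z]'
    return text.upper()

# Piping is just feeding one stage's output into the next stage's input.
result = tr_upper(echo("hello"))
print(result)  # HELLO
```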
>Question 1, should I stick with zsh or learn the basics of bash first?
I don't think you would have much of a problem learning either just so long as you understand that there will be minor differences between different shell languages. Those differences tend to be syntax rather than functionality, and when it is a difference in functionality it tends to be much less commonly used features. If you have to choose one I would recommend bash for scripting solely because it is somewhat more portable. "sh" is even more portable than bash, though it can be more painful to use since it doesn't have some of the nice features in modern shells. Remember that you don't have to use the same language for your shell and for your scripts. You just have to define a different shebang on the first line of the script.
>2. what are some things I can use scripting for (what do you use it for)?
I don't find myself scripting much at home. At work though I spend a TON of time writing various scripts. What I -do- use bash for a ton is one-liners. Once you get used to the syntax you can write some very useful code in just a couple lines. One example that I use frequently is "Run this command every 10 seconds forever" which can be written as
while sleep 10; do
{command}
done
The "watch" program does more-or-less the same thing, but I find it unwieldy once the commands inside get more complex.
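A rough Python equivalent of that "run every N seconds" one-liner might look like this (a sketch only: it is bounded by an `iterations` cap so it terminates, whereas the bash loop runs forever, and the interval here is shortened from 10 seconds):

```python
import time

def run_every(interval, command, iterations):
    """Run `command` every `interval` seconds, `iterations` times."""
    results = []
    for _ in range(iterations):
        results.append(command())
        time.sleep(interval)
    return results

# e.g. three quick runs of a trivial command
out = run_every(0.01, lambda: "ok", 3)
print(out)  # ['ok', 'ok', 'ok']
```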
An example of a somewhat longer, and arguably poorly written script for backups using tarsnap is here.
>Any explanation for common commands would be awesome.
As I mentioned earlier "man" is your friend. The other option is "command --help". You can generally google for some examples, which can be really useful for some of the less easily grok'd programs (awk, for example).
>And I do know a bit of python and have heard of iPython. Could that be a replacement for bash or zsh or is that something completely different and I'm in over my head (very likely). Much thanks.
ipython is not going to be a good replacement for your standard shell. It's cool, and I use it frequently when coding in python, but it simply lacks the powerful integration with the system that bash/zsh have. What it is extremely useful for, though, is exploratory programming. What really opened my eyes on the subject was the book Python for Data Analysis.
Edit: Syntax
Also, for any shell junkies: please don't complain about the unnecessary "echo" up there. I know you could use a here string, but I think it would defeat the purpose of an easily digested example.
Wait a second...!
http://www.amazon.com/Python-Data-Analysis-Wes-McKinney/dp/1449319793
I'm onto you...
I also do a fair amount of NLP and anomaly detection in my work and use python for both. The reason I suggested starting with numpy is because, as I suggested, it is the basis on which everything else is built.
I learned python before R, then used R for my scientific computing needs, then learned the scientific computing stack in python after building out my data science chops in R. I've found the numpy array datatype much less intuitive to work with than R vectors/matrices. I think it's really important to understand how numpy.ndarrays work (in particular, memory views, fancy indexing, and broadcasting) if you're going to use them with any regularity.
It doesn't take a ton of time to learn the basics, and to this day the most pernicious bugs I wrestle with in my scientific (python) code relate to mistakes in how I use numpy.ndarrays.
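The ndarray behaviors mentioned here (memory views vs. copies, fancy indexing, broadcasting) are exactly where those pernicious bugs come from. A minimal sketch, assuming NumPy is installed:

```python
import numpy as np

a = np.arange(6)

# Basic slicing returns a VIEW: writing through it mutates the original.
view = a[:3]
view[0] = 100
print(a[0])        # 100 -- the original changed

# Fancy indexing returns a COPY: the original is untouched.
copy = a[[0, 1, 2]]
copy[0] = -1
print(a[0])        # still 100

# Broadcasting: a (3,1) column and a (3,) row combine into a (3,3) grid.
col = np.array([[0], [10], [20]])
row = np.array([1, 2, 3])
grid = col + row
print(grid.shape)  # (3, 3)
```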
Maybe you don't think it's that important to learn scipy. I think it's useful to at least know what's available in that library, but whatever. But I definitely believe people should start with numpy before jumping into the rest of the stack. Even the book Python for Data Analysis (which is really about using pandas) starts with numpy.
Also, I strongly suspect you use "out of the box" numpy more often than you're giving it credit.
Wish I had seen this post sooner, not sure if you'll still see this but I was pretty much in the same situation as you this past year. Statistics student trying to get into data analytics (insurance/finance). Most of these tips have already been mentioned but they are definitely valuable if you are trying to get an internship and don't have any other experience.
All this being said, this should be taken with a grain of salt. I'm not a recruiter or a full-time employee at a Fortune 500 company, but these are some of the steps I took to get some internship offers this summer.
I was ISYE so I'm not sure how much you are allowed to cross over being CS but I would absolutely recommend taking a regression course. ISYE also has some data analysis electives, but to me learning and mastering regression is a must.
BBUUTT my biggest recommendation is to start playing with data yourself. I am a "Data Scientist" and graduated from the MS Analytics program at Tech and still to this day I learn the most just from playing around with data sets and trying new techniques or learning new coding tools. Don't wait to take classes to jump in, just go.
Here are some great books to get started doing "data science" in R and Python.
R: Introduction to Statistical Learning (free!!)
Python: Python for Data Analysis
http://mathesaurus.sourceforge.net/r-numpy.html
https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
https://www.amazon.com/Python-R-Users-Ajay-Ohri/dp/1119126762
These are just the first few hits not a personal endorsement.
This might work for you.
This book is by Wes McKinney, the author of Pandas. It's a great resource. https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
You see, Python is a very simple language that doesn't require you to annotate everything line by line. You might be better off brushing up on your general Python knowledge before jumping into projects. This will save you time having to read or look for comments to understand the code. Also, consider looking at the requirements.txt file for the imports of a particular repo. It'll tell you what packages are being used and you can then Google their documentation.
I'd definitely recommend you to read a book about python first. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython https://www.amazon.co.uk/dp/1449319793/ref=cm_sw_r_cp_apa_i_U38gDb4RE5933
Sorry for the misunderstanding --
>Factset, Bloomberg, Dimensional, AQR
Are not so much resources for dealing with data as employers of data wranglers. I mean Factset and Bloomberg are data providers, but...again, I was suggesting you look for employment with them, not have them teach you.
As for learning:
pandas. Wes McKinney wrote it while he was working for AQR, one of the biggest quant hedge fund managers, and open-sourced it when he left. Some of the examples in the book have to do with finance because of this. This might be useful: https://www.amazon.com/Python-Data-Analysis-Wrangling-IPython/dp/1449319793
The Oracle training is outdated and irrelevant. The Percona training is up to date and very good. But both are aimed at DBAs, sysadmins, and application developers.
For your needs, you need to learn SQL, and learn to get useful information out of alien data sets.
Start with the basics:
You can download some example databases:
I suggest you buy a book called 'The Art of SQL':
SQL is essentially math and programming. Having a good book for reference is highly recommended.
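Python's standard library actually ships a small SQL engine, which is a handy way to practice the basics before connecting to a real server (a minimal sketch; the table and rows here are made up for illustration):

```python
import sqlite3

# An in-memory database: nothing to install or configure.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)])

# "Getting useful information out" is a SELECT with aggregation.
rows = con.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)  # [('alice', 37.5), ('bob', 12.5)]
```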
MySQL workbench is a pretty handy tool. If you are going to connect to multiple databases (Oracle, MySQL, MS SQL, DB2 etc) then maybe look at a Navicat or RazorSQL license.
For when you are feeling more confident with SQL, either look at learning VBA in Excel or Python and the PyData Libraries: