Best products from r/bigdata
We found 11 comments on r/bigdata discussing the most recommended products. We ran sentiment analysis on each of these comments to determine how redditors feel about different products. We found 10 products and ranked them based on the number of positive reactions they received. Here are the top 20.
3. Linked: How Everything Is Connected to Everything Else and What It Means for Business, Science, and Everyday Life
6. Data Just Right: Introduction to Large Scale Data & Analytics (Addison-Wesley Data and Analytics)
7. Big Data: A Revolution That Will Transform How We Live, Work, and Think
- Eamon Dolan Houghton Mifflin Harcourt
8. Analytics 2 Insight: Roadmap for Data-driven Business Operations Drive Performance, Lower Costs, Reduce Risks and Increase Value
9. Functional Programming in Scala
Glad it's useful! I'll copy some of a reply I gave to somebody who PM'd me about advice for a data science career, because it's pertinent to you:
You need to understand where you want to go -- more science-y or more business-y. See, science-y types of analytics are heavy on the stats, applying really advanced methods to glean some counterintuitive and/or non-obvious insight. Business-y type stuff is digging through the data to understand what it's telling you and to build a bit of a story to figure out what the business is doing, and then measuring success after something changes. Both have their value. Essentially: the science side tells you about the data, but the business side tells you how to make decisions based on the data. You'll fall somewhere on that spectrum, so just play to your strengths.
Once you've determined this, you need to learn a few things:
As for some resources, here are some courses I think would be good from MIT CourseWare (full disclosure: I haven't sat through these specific courses, but these are the topics that are important):
You may also want to read up on machine learning. I like the O'Reilly book on it, but there are tons of books out there about it now.
Hope that helps!
First of all, I don't know how large your company is or how much data exactly you are dealing with, but I'll give some general advice based on what you've said.
It sounds like your company operates the way many do: with individual data marts that were created organically for different needs as they arose. What you need is a centralized data warehouse that brings these data marts together into some type of star-schema setup. Pick up this book. It's basically an introduction to dimensional modeling, but more importantly, it lays out how to navigate the politics of a large organization such as yours to get a data warehouse created. You will most likely need someone at C-level to make the push, but the benefits are well worth it. It's not a small project: you'll need people to admit that a major overhaul is needed and to be willing to invest in it.
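For anyone who hasn't seen a star schema before, here is a minimal sketch of the idea using Python's built-in sqlite3 module. The table names, columns, and sample rows are all made up for illustration; a real warehouse would have many more dimensions and a proper ETL process feeding it.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key INTEGER REFERENCES dim_date(date_key),
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO dim_date VALUES (20150101, 2015, 1), (20150201, 2015, 2);
INSERT INTO fact_sales VALUES
    (1, 20150101, 100.0), (1, 20150201, 50.0), (2, 20150101, 75.0);
""")

# Reports join the fact table to whichever dimensions they need.
totals = conn.execute("""
SELECT c.name, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_customer AS c ON c.customer_key = f.customer_key
GROUP BY c.name
ORDER BY c.name
""").fetchall()
```

The point is that every report goes through the same shared dimensions, instead of each data mart keeping its own copy of "customer" or "date".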
>but getting these people to allow me to connect to the data is the most difficult thing (why is that?!).
Probably because whoever is managing the data realizes that there is no way an end user can make useful queries against the database, given its disjointed and poorly maintained state.
>In fact, I think they just hired some external enterprise data company, which I feel like is the wrong approach. Information is the most important thing. I feel like this is one thing a company should be managing in-house and not outsourcing. Is that completely wrong?
I would tend to agree. It's important to have people in the organization who understand the data structure fully. It's not unusual to bring in consultants to provide expertise on specific things (like the ETL process), but if I were a large financial company I would want our in-house team to have a good handle on the data coming in and being stored.
>Fields are constantly being overwritten so that history isn't maintained. People aren't notified of the overwritten name changes so existing reports aren't capturing all information properly.
At the very least convince them to create expiration fields so you can expire old records without losing information.
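As a rough sketch of what expiration fields buy you, here is one way the idea could look in plain Python. The record layout (`valid_from`, `expired_on`) and the helper names are assumptions for illustration, not anything from the poster's actual system; in a database these would be columns on the table.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: int
    name: str
    valid_from: date
    expired_on: Optional[date] = None  # None means this is the current version


def update_name(history: List[CustomerRecord], customer_id: int,
                new_name: str, today: date) -> List[CustomerRecord]:
    """Expire the current row instead of overwriting it, then add a new row."""
    updated = []
    for row in history:
        if row.customer_id == customer_id and row.expired_on is None:
            row = replace(row, expired_on=today)
        updated.append(row)
    updated.append(CustomerRecord(customer_id, new_name, valid_from=today))
    return updated


def name_as_of(history: List[CustomerRecord], customer_id: int,
               when: date) -> Optional[str]:
    """Return the name that was in effect on the given date, if any."""
    for row in history:
        if row.customer_id != customer_id:
            continue
        if row.valid_from <= when and (row.expired_on is None or when < row.expired_on):
            return row.name
    return None


history = [CustomerRecord(1, "Acme Ltd", date(2014, 1, 1))]
history = update_name(history, 1, "Acme Holdings", date(2015, 6, 1))
```

Old reports can then ask for the name as of a given date, and nothing is lost when a field changes.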
I'm partial to Cloudera or Hortonworks. Both have training courses.
I personally like good ol' books. I've taken the Coursera intro and Hive/Pig training courses and while they were invaluable, nothing quite replaces sitting down and working your way through books like Hadoop: The Definitive Guide or MongoDB: The Definitive Guide. I highly recommend Safari Books Online if you enjoy online reading. Perhaps some of your professional development money could go to paying for an account for that. For those who don't have the money for that, don't underestimate the usefulness of your public library. I currently have 3 books out from my local library on graph/network science (Linked is awesome and a great start for anyone interested in Networks/Graphs).
One thing I'll mention is that Hadoop has really become more of an ecosystem than a product: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, HBase, Storm, etc. Just saying "Hadoop" is like just saying jQuery. Half the battle with jQuery is knowing how to use the best plugins. It's the same with Hadoop.
This book has no math or programming. It concentrates on existing real-world applications of Big Data technologies and how they are helping to make life better. I recommend it a lot: http://www.amazon.com/Big-Data-Revolution-Transform-Think/dp/0544227751/ref=sr_1_1?ie=UTF8&qid=1422216718&sr=8-1&keywords=big+data
I wrote a book on this very subject. You can read the first chapter here for free.
You may be interested in this e-book http://www.amazon.com/dp/B00JVYVO3I/ref=cm_sw_r_tw_dp_HB0Qtb0BCGWMN
I wouldn't be scared of functional programming or Scala. If you've been writing PySpark jobs then you're probably using Python in a functional way itself. From the testing I've done with Spark and Scala it's almost impossible to not write functional Spark jobs as that's how Spark is designed. I would equally say a lot of Scala devs probably aren't using Scala in a purely functional way anyway.
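To show what "using Python in a functional way" can mean, here is a small word count written as a chain of transformations in plain Python. It does not use Spark at all; it is just a stand-in, with made-up input, for the flatMap / map / reduce shape that Spark jobs tend to have.

```python
from functools import reduce

# Hypothetical input standing in for lines read from a file.
lines = ["hadoop spark", "spark scala", "hive"]

# Split every line into words (roughly what a flatMap step does).
words = [w for line in lines for w in line.split()]

# Turn each word into a (word, 1) pair (a map step).
pairs = map(lambda w: (w, 1), words)


def add_count(counts, pair):
    """Fold one (word, n) pair into the running totals without mutating them."""
    word, n = pair
    return {**counts, word: counts.get(word, 0) + n}


# Combine the pairs into per-word totals (a reduce step).
word_counts = reduce(add_count, pairs, {})
```

If code like this already feels natural, the same pipeline-of-transformations style carries over to Scala fairly directly.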
I say just get stuck in using Spark/Scala based on your PySpark knowledge and see how far you get. If you get stuck and feel you need a cleaner understanding of functional programming / Scala, try this book:
http://www.amazon.co.uk/Functional-Programming-Scala-Paul-Chiusano/dp/1617290653/ref=sr_1_3?ie=UTF8&qid=1451992580&sr=8-3&keywords=scala
All good books, but I'd recommend getting the In Action series of books
http://www.amazon.co.uk/Hadoop-Action-Chuck-Lam/dp/1935182196
Just a little sad at the lack of a book on HBase in your collection :(!