Data analysis has replaced data acquisition as the bottleneck to evidence-based decision making; we are drowning in data. Extracting knowledge from large, heterogeneous, and noisy datasets requires not only powerful computing resources but also the programming abstractions to use them effectively. The abstractions that emerged in the last decade blend ideas from parallel databases, distributed systems, and programming languages to create a new class of scalable data analytics platforms that form the foundation for data science at realistic scales.
- 5 stars: 57.23%
- 4 stars: 25.39%
- 3 stars: 9.07%
- 2 stars: 4.73%
- 1 star: 3.55%
This is a wonderful course for large-scale data science. I believe I will have learned a lot by completing it.
A great way to start and to become familiar with the nature, requirements, and analytics of today's data.
Covers a lot of ground quickly, but you still get a good understanding of the underlying theory or technologies.
Good! I like the final (optional) project on running analyses on a large dataset through EC2. The lectures aren't as polished and compact as they could be, but it is certainly a very valuable course.