Learner Reviews & Feedback for Introduction to Data Science in Python by University of Michigan

4.5
stars
26,898 ratings

About the Course

This course will introduce the learner to the basics of the Python programming environment, including fundamental Python programming techniques such as lambdas, reading and manipulating CSV files, and the NumPy library. The course will introduce data manipulation and cleaning techniques using the popular Python pandas data science library and introduce the abstraction of the Series and DataFrame as the central data structures for data analysis, along with tutorials on how to use functions such as groupby, merge, and pivot tables effectively. By the end of this course, students will be able to take tabular data, clean it, manipulate it, and run basic inferential statistical analyses. This course should be taken before any of the other Applied Data Science with Python courses: Applied Plotting, Charting & Data Representation in Python, Applied Machine Learning in Python, Applied Text Mining in Python, Applied Social Network Analysis in Python....

Top reviews

YH

Sep 28, 2021

This is a practical course. It covers concepts and assignments such as pandas, DataFrame, merge, and time. Assignments 3 and 4 are difficult, but I think they are very useful and improved my ability.

PK

May 9, 2020

The course helped me understand the concepts of NumPy and pandas. The assignments were very helpful for applying these concepts, which provided an in-depth understanding of NumPy as well as pandas.

5526 - 5550 of 5,915 Reviews for Introduction to Data Science in Python

By Govardhani S

•

Aug 6, 2020

good

By Aayesha N

•

Jul 30, 2020

Nice

By Aansh S

•

Jul 10, 2020

good

By Bicky G

•

Jun 13, 2020

nice

By GOWTHAM M

•

May 22, 2020

good

By xiao h

•

Oct 22, 2019

Too hard.

By DELA C J K (

•

Oct 12, 2019

HARD

By Mohammad J

•

Aug 5, 2017

good

By Pranav P

•

Jun 17, 2021

ok

By Yash V B

•

May 20, 2020

ok

By Irfan S B

•

Oct 4, 2017

A

By Richard H

•

Jul 29, 2019

Truly horrible delivery of the material - even worse than Coursera's old Intro to Machine Learning course from Univ of Washington. This course will discourage nearly anyone from pursuing Data Science.

And it's not even an intro to data science. It's a course on Pandas for dataset manipulation. (In fairness, cleaning up ingest data is like 95% of the work in data science, but the course doesn't even tease the student with some exciting machine learning examples of where this is all headed.)

It's not delivered like you'd expect an intro course. It does an awful job of progressing the student through the Pandas toolset, building concepts incrementally. The whole topic of object types, methods, returned objects, and chaining gets barely a mention, but it's essential to the assignments. Examples are rapid-fire and sparse - very few techniques needed in the assignments can be found in the examples. The Week 2 quiz tests on techniques not introduced until Week 3, and the Week 3 and 4 assignments cite "individual study" which is academic-speak for "We didn't teach you about this - go Google it".

Then, there are errata that the student needs to pick out of the discussion forums to pass the assignments because some key questions are vague. The errata are 1-2 years old and they can't be bothered to correct errors.

The auto-grader could be the highlight of the course, but it provides limited feedback on wrong answers and no guidance toward the right answer; just "wrong". You're not allowed to post code or discuss answers in the forum - you have to go to StackOverflow to do that. (It'd be awesome if several of the exercises provided the student with the answer and challenged them to match it, but instead it's very sink-or-swim.)

Even when your answer is right, the auto-grader throws errors and warnings for, say, returning a numpy.float64 (which you should) when the grader is expecting a Python float type. Or it's expecting a float64 for a counter value (!!) when you provide an int64 (which is correct). These behaviors should have been fixed long ago.
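The type mismatch described above can be sketched as follows. This is a minimal illustration, not the course grader's actual logic: the helper name `to_builtin` and the sample values are hypothetical. It shows how a NumPy scalar differs from a Python builtin and one way to convert it before returning an answer.

```python
import numpy as np
import pandas as pd


def to_builtin(value):
    """Convert a NumPy scalar to the matching Python builtin.

    Hypothetical helper: values that are not NumPy scalars pass through unchanged.
    """
    if isinstance(value, np.generic):
        # .item() returns the closest Python builtin (float, int, bool, ...)
        return value.item()
    return value


# Made-up sample data for illustration only
scores = pd.Series([1.5, 2.5, 3.5])

mean_score = scores.mean()   # pandas typically hands back a NumPy scalar here
count = scores.count()       # likewise typically a NumPy integer

mean_as_float = to_builtin(mean_score)
count_as_int = to_builtin(count)

print(type(mean_as_float).__name__, type(count_as_int).__name__)
```

A strict check such as `type(answer) is float` would reject a `numpy.float64` even though the value is numerically identical, which is one plausible reason a grader could flag a correct answer.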

It claimed to be a 15-hour course; I did it intensively and invested more than 30 hours before pulling the plug on the final project. That was claimed to be a 4-hour project, but experience with the rest of the course says it'd be more like another 12 hours - and that's for a guy who's not new to coding.

Bottom-line: I paid for educational material and I don't feel like this course delivers. What it does deliver is Pandas exercises and an "OK" auto-grader; truthfully, most of what I learned was via Google searches while trying to do the assignments - effective, but very slow and very frustrating. The real disappointment is seeing that the issues I encountered have been well-known for 2 years in the discussion forums; the course could be a lot better by now if they cared to nurture it.

Finally, a frustrating aside that's on Coursera, not the instructors... Coursera's online Jupyter notebook platform is really unstable and constantly drops connections even when you're actively editing and executing cells. (Including from 2 Fortune 100 companies - it's not the connection.) Once dropped, the notebook can't be re-connected, and has to be re-launched from the syllabus at the risk of losing your most recent edits. (But beware, if you run Jupyter offline for stability, this course also has defective input filenames that will cause grading to fail - read the discussion forums first.)

By Francisco A

•

Jan 14, 2023

During this course, I learned a lot about Python and Pandas. You will also learn a lot about these tools. Trust me, a lot. Still, I will only give two stars. This is why:

My background: I am doing Python courses so that I can expand my knowledge on technical tools. I have spent my last 8 years on data analytics/statistical analysis on other platforms, mainly Stata. Most of the techniques presented to me in this course are, therefore, familiar to me in other languages.

To start with, the course should suggest/direct you to a better tool for solving the assignments than Jupyter notebooks. Using Anaconda/Spyder is relevant here.

Pedagogically speaking, lectures are terribly designed. They mostly rely on Jupyter notebooks, which are sloppy and jump out of sync with the presenter. Some of them are too long, skipping the main point or logic of the tool being presented.

Assignments are really good. You will learn a lot from them and you will need to go to the documentation and StackOverflow to get answers. This is actually very important, as real-life data management work does need this ability: your productivity will increase with how proficient you are at looking for different solutions. But still, the assignments' autograder has too many mistakes and fails to give you reliable/effective feedback. Plus, some questions present factual mistakes regarding the answers expected (in Assignment 4, the autograder suggests you are looking for teams when it should read Metropolitan areas). To avoid getting stuck on these issues, please go immediately to the Discussion Forum of each assignment.

To the Director of this course: PLEASE increase the number of visible hints in each assignment, as it helps you solve the questions and will decrease the autograder issues (e.g. the first five elements of a list of 15 that you are expecting for each question).

The suggested time to solve each assignment is utterly wrong for Assignments 3 and 4: it took me 2 weeks for each, not three hours (I did this course after working hours, though).

Finally, a final note: the course was revised in December 2022. As I began the course before this date, I started on the old version. To my surprise, after several deadline resets (which are particularly welcome in this course), I was moved to the new version of the course. This would not have been a problem, except that I realised all my past grades were blank (even though the platform confirmed I had passed the assignments and quizzes for weeks 1 to 3). I had to redo the full course as I was already in Week 4. Some of the code was saved on my computer, some was not. It took me an additional week to get everything back. This should not happen... and a better solution should have been provided other than redoing the quizzes and assignments.

Overall: excellent course for you to enter the Python, Pandas world, but be ready for a bumpy road ahead.

By Jeroen D

•

Apr 23, 2018

More or less my copy from an earlier review,

I was really excited about this course, and was really let down. This course is really, really poorly done. I would not waste time and money on this course when there are much better options out there. I feel like I've gotten little in return for my time and money.

First, as several other students have noted, the timeframe for assignments is really unrealistic, taking much longer than projected (at least for me, and several other students). This is not acceptable when Coursera bills by the month. Coursera needs to provide a better assessment of the time commitments for the class. I took another data science course prior to this one (my employer wants a certificate) but still the assignments were tough, and I found it really disappointing that I spent a lot of time resolving inconsistencies in the assignments. I believe American students are at an advantage here because of the US-oriented datasets.

Second, the teaching is horrific. The professor is not engaging at all, but simply mechanically reads lines which often sound straight out of a user manual. The point of online videos is not to turn books into audio files- it’s to have a human talk/reason through problems with you. The teacher of the course should discuss the material, not recite a manual. In addition, the small amount of material is presented far too quickly. Also, great emphasis is put on the discussion groups (where, it turns out, only the moderators, who are volunteers, respond). In the absence of a proper syllabus, students are directed to Stack Overflow, a sign of the course's weakness.

Third, the title of this course is a misnomer: an introduction to data science would provide an overview of the tools, techniques and scope of the field. An extremely detailed introduction to Pandas, which is essentially what most of this course is, is useful if well executed (which it is not here), but it is not an introduction to data science.

A more minor complaint is the absolutely horrendous choice of the background. Showing different permutations of lifeless office drones is not exactly inspiring material for aspiring data scientists, even if this is the reality of office life- it’s distracting at best, and at worst, deeply disparaging. Why not have just a plain colored background? Or anything else?

The only positive thing, apart from some misleading assignments, is the rest of the assignments. In general I had fun solving them, and although I've had my share of Jupyter Notebook and grader issues, I was able to complete the course. I will not reconsider any online course from Michigan University again.

By Neel N

•

Sep 3, 2020

It pains me greatly to give just 2 stars to a course from UofM, since it is my alma mater, but I will be honest. I would like to echo the sentiment of the majority of my fellow learners that the course needs to be structured better. The instructor needs to take more time to explain some of the concepts in greater detail. It seems like the instructor and his assistants are always trying to rush things and cover too much material in too little time. I had to pause and replay lecture videos to completely grasp what was being conveyed. I also adjusted my playback speed to 0.75x to keep up with the instructions. I will admit that I had to rely heavily on the pseudocode posted on the forums to answer assignment questions, and even though I answered them correctly, I did not completely grasp the reasoning behind a lot of them, which I think defeats the purpose of a programming-based course.

Suggestions for improvement: Upgrade the autograder, because it is frustrating to keep rectifying the answers to make them acceptable for the autograder. Completely overhaul the assignments so that they are more in-line with what is being taught in the lectures. Students should not have to figure out everything from the online forums. If not for the pseudo-codes, algorithms and explanations from mentors, this course would have been an impossible one to finish. Assignments and exams need to be designed such that learners don't have to treat forums and stack overflow as a primary vehicle for getting successful with the course, but more like a helper tool.

By Ryan N

•

Nov 19, 2017

The course content is very good. The videos are very good. Unfortunately this course is severely hurt by a very high ratio of non-learning work to learning work. This is due to some issues that could be easily addressed. The questions are poorly worded or ambiguous about critical details. Some of these details are hidden in the forums, but that's a waste of time. Some of the assignments do not directly bear on the course content and involve much "self-learning". Unfortunately this means I do not know if my self-taught methods are optimal - there is no feedback or checking. So you can do very poor coding but still pass in scoring and never get any feedback to improve your coding skills. All along, some very simple hints about what libraries and methods to use for each question would prevent lots of blind searching on the web. There are some helpful instructors and helpers haunting the forums, but they are not always around, and they are not always implementing permanent fixes to the problems that are frustrating students. One shouldn't have to hunt around forums to find out about broken pieces of the application or other errors in the course. Finally, the grading system is unstable and the Jupyter Notebook system is also not very stable, leading to many submissions and resubmissions just to make sure it got through for grading. For these reasons it took much more time than three weeks for me personally. I would not have signed up had I known.

By Jonathan T

•

May 3, 2021

While my Python chops definitely improved as a result of the course, the homework was extremely frustrating. The requirements of the questions are not communicated in a consistently clear way.

What was more irritating, though, was that the auto-grader is extremely picky. There is very little room to solve the problem in your own way, and more of my time was spent trying to contort my code to fit what the auto-grader wanted than spent actually solving the problem and applying the course concepts.

I also was disappointed with how much we were expected to manually clean data. One of the questions even explicitly says that the answer will require students to "hand-code" the answer. This strikes me as an extremely poor habit to instill in students--combing through data manually to strong-arm the data into the formatting conventions won't cut it when tackling a dataset that is millions of lines long. For a computer science course, this is not a scientific, or even a programmatic, approach to solving problems.

I give the course 2 stars because I felt that only 40% of what I learned was data science and/or Python. The other 60% of what I learned was how to smash my code until it conformed to the auto-grader, how to bother the TAs in the forum when it wasn't obvious how to do so, and how to write translation dictionaries with the "wrong" format as the key and the "right" format as the value and then apply it to DataFrames.
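The "translation dictionary" approach mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the column name `country`, the sample rows, and the `name_fixes` mapping are all made up and are not taken from the course's datasets. It pairs a rule-based cleanup step with a dictionary for the cases no rule covers.

```python
import pandas as pd

# Made-up sample data: names in a "wrong" format, one with a stray footnote digit
raw = pd.DataFrame({
    "country": ["Republic of Korea", "United States of America20", "France"],
    "value": [1, 2, 3],
})

# Hypothetical translation dictionary: "wrong" format as key, "right" format as value
name_fixes = {
    "Republic of Korea": "South Korea",
    "United States of America": "United States",
}

cleaned = raw.copy()

# Rule-based step: strip trailing digits with a regular expression
cleaned["country"] = cleaned["country"].str.replace(r"\d+$", "", regex=True)

# Dictionary step: exact-match replacement; unmatched values are left unchanged
cleaned["country"] = cleaned["country"].replace(name_fixes)

print(cleaned)
```

Handling patterns with a rule first keeps the dictionary small, so only the truly irregular names need to be listed by hand.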

By Alan S

•

Oct 13, 2017

The topics covered in the course are certainly important for anyone interested in using Python for Data Science, but sadly, there is only really basic information about each topic taught in the videos. In the labs, there is heavy focus on "self learning" (basically, the instructor encourages you to use StackOverflow and other resources to figure out things that were intentionally not taught in the course).

While it's interesting that the problems have a very real-world nature to them (including searching the web to help you find answers to things), if I wanted to learn the tools taught in this course that way, I wouldn't have enrolled in this, and just got a good book and practiced myself independently.

Also puzzling is that there are weird "discussion" segments that have no relevance to any of the topics taught. One moment you are learning about pandas dataframes, and the next, it's asking you for a 90 word opinion on data science ethics. This is somewhat ironic given the otherwise very practical/applied nature of the course. Not relevant at all to a working professional.

One other note: as I write this, for the past few days, no students were able to get their assignments graded, since the auto-grader was broken, and there are multiple reports of this in the forums. It's puzzling the staff let this problem persist for days with no ETA or acknowledgement of the issue.

By Kevin Y

•

Feb 8, 2017

Overall, this course is tough for me. In my opinion, the word "introduction" in the course title is not suitable. "Introduction" generally means something basic and fundamental. However, the level of the programming assignments in this course ranges from medium to very hard.

Furthermore, the assignment instructions are not clear enough, i.e. what steps need to be followed to complete each question. So it takes time for me to find the way or solution in the discussion forum. I know that in each week's forum there is one discussion specifically for the assignment (i.e. Assignment X Tips), which is good. But I think the forum is not a good place for this, since we have to search meticulously for the information, and what we want to find is spread out, not in an easy-to-follow order.

Moreover, I do not think the lectures and the assignments are well synchronized, as there are many things in the assignments that are outside the lectures. I mean, there are "too many" new things in the assignments. Although it is a great way for students to search and enhance their skills by themselves, in general, the assignments should at least reflect the lectures.

Lastly, in my humble opinion, this course is not appropriate for someone who is new to programming and has not become familiar with or mastered any programming language, though the title says "introduction".

By George A

•

Jul 16, 2019

This is a pretty awful course, as of the time of writing this review in July of 2019. Let me preface this by saying that the material you learn is very helpful. Pandas is a great library to learn for loading, cleaning, and manipulating large amounts of data. But the real problem with this course isn't the material, it's the lectures and the autograder. The lectures are very short. They don't cover the concepts well enough, and some material is blatantly skipped and you have to learn it yourself through google. Then comes the video quizzes which test you on functions and concepts that haven't even been introduced yet. It's like the quizzes were put at timestamps randomly. Then comes the worst part of the course: the Autograder. I'm not sure how old this course is but the autograder is running on an outdated version of both python and pandas. What does this mean for you? Well if you want to code on your computer instead of the course's broken online coding notebook, you will run into severe code-breaking bugs between versions. It really ruins the course. I learn the material but then spend hours trying to please the broken autograder. Most of the time in this course isn't spent learning, it's spent fixing code that the autograder rejects even though it runs perfectly on your machine locally. Have fun!

By Isaac D

•

Jun 7, 2018

When one's motivation for taking a course switches from learning as much as possible to wanting to finish the course in order to leave a review warning others not to take the course until the numerous structural issues with the course are resolved, then something has gone very wrong. The course materials are okay for an intermediate course. Just 'okay'. Not good. Not great. Certainly a substantial step down from the wonderful 'Python for Everybody' courses which, by the way, are inadequate preparation for this course despite Dr. Brooks' claim. That said, the main issue with this course lies in its incredibly vague and poorly thought out assignments. If you are actually decent with Python you will, in all likelihood, spend more time fighting with the Jupyter notebooks and auto-grader than you will actually completing the assignments. If you're newer to programming expect to spend at least five times as much time on each assignment as the estimated completion time suggests. Also, good luck if you actually need help, as this course has the most aggressive enforcement of Coursera's honor code that I have seen on this site which means that you are SOL if you need help on a problem. In short, I would recommend that no one take this course until the numerous issues with it are provably fixed.

By bob l

•

Apr 18, 2022

The lectures are good.  The quizzes are silly, because they test mostly esoteric knowledge that I would look up when I need it rather than the basic understanding needed to do real data science work, data cleaning in particular.  The projects are also unnecessarily complicated to the point that someone from the course has to post examples for almost all questions.  If we are going to receive that kind of general help, why not just make the explanations better?  In the real world, I would simply ask the project stakeholders questions when something is ambiguous.  Requirements gathering is not an objective of this course though and simply looking in the forums for explanations does nothing to advance that goal even if it was within the scope of this class.  The goal of this course should be to give the student an understanding of the tools involved, not do a lot of gymnastics to understand vague instructions and then apply oddly specific pandas functionality.  I say all of this as someone who has been doing data cleaning with python/pandas for several years and was looking to just formalize things I have learned as needed.  I will probably not bother to complete the course, because the final assignment and quiz have no real learning value and a Coursera completion certificate is of little value.

By Markus

•

Apr 2, 2017

Thanks for the course. A few things that can be improved:

1 - The video material was very short. I expected the same amount of teaching as homework, but it's more like 30 minutes versus 7 hours every week. As a result it's mostly googling and copy-pasting code, and I'm very sceptical that solving issues this way will enter my long-term memory. Probably next time I will need to look it up again. I'd prefer if the videos were the source of knowledge instead of Stack Overflow and the forum.

2 - As an experienced programmer who is new to Python, I found it difficult to load the data. I was not aware that the files are available on the server and can just be loaded by read_csv(filename.csv). Instead I tried to load the files from my local drive, which worked, but not for the grader. Then I tried to submit it offline, which also failed. I wasted half a day on this. I suggest briefly mentioning how the online and offline assignments work, in particular how to load data.

3 - The feedback from the grader is usually not telling much. It was often unclear to me what was the expected outcome. I think a screenshot of the expected answer for the more advanced questions (first few rows+columns) would help a lot and save us a lot of suffering.
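The data-loading issue in point 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `load_course_csv`, the file name `example_data.csv`, and its contents are hypothetical and not part of the course materials. The idea is to refer to a file by name relative to a data directory instead of hard-coding a local drive path, so the same notebook cell can run locally and in a hosted environment that keeps its files next to the notebook.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_course_csv(filename, data_dir="."):
    """Load a CSV by file name from data_dir (hypothetical helper).

    Defaults to the current working directory, which is where a hosted
    notebook would usually find files stored alongside it.
    """
    path = Path(data_dir) / filename
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; check the data directory")
    return pd.read_csv(path)


# Demonstration with a made-up file written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example_data.csv"
    sample.write_text("name,score\nalpha,1\nbeta,2\n")
    df = load_course_csv("example_data.csv", data_dir=tmp)

print(df.head())
```

When working offline, only the `data_dir` argument would need to change; the file name passed to the loader stays the same as in the hosted notebook.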

By Ivan K

•

Aug 25, 2020

Four year old instructional content. Four year old versions of pandas and other libraries. $50/month for 4 year old content??

The course relied on very basic functions and libraries in pandas and numpy. I doubt that any of the specific skills and content taught here would transition very well to a professional or academic context.

Why is it so hard to find a real practical Python data science course?

I'm also pretty sure there were errors in some of the in-video quizzes. And there is a link to Chris Anderson's Wired article entitled "The End of Theory..." that has been broken for at least the past 3 years, judging by the discussions I checked for the article.

This course in general has sat here for 3-4 years seemingly unmaintained (see the broken quizzes and links above) and unchanged (see the 4 year old videos and libraries above), just earning money for Coursera and UMich with little to no evidence of improvements or basic maintenance. I think it's shameful that I am being charged $50/month for access to these materials and the grading system, which quite frankly has stymied and stunted the growth and improvement of the course material overall.

By YUE C

•

Jan 7, 2017

Content is actually very good; I can feel the content creator's and staff's emphasis on real-world problems. Projects and assignments are quite useful, and I can expect to use the skills I acquired in my day-to-day work. However, the most agonizing part of this course is that it requires a tremendous amount of time for self-study. I spent tons of time on Google searches, reading docs, and going over other people's posts on various forums in order to find the right way (sometimes the optimal way) to finish the assignments. I understand and accept that self-study is very important for mastering things nowadays, but I really think the gap is too big between what was taught in class and what you need to complete this course, especially for people who have zero data science programming experience like me.

I'd rather spend time watching 2 more hours of teaching videos per week that cover more aspects/topics/tips/tricks than go over lots of docs and posts. I won't deny I learned a lot through this course, but I believe my learning curve could be flattened significantly if more class materials were available. If the majority of the course is spent on googling, what's the point of taking it?