Course Wrapup

In Fall 2013, I was assigned to teach cs4414: Operating Systems, a course that is required for our BS Computer Engineering and BS Computer Science degrees (and taken by about half of our BA Computer Science students as an elective). I had never taught operating systems before, or for that matter even taken an operating systems course at any level¹, so I didn't have much idea what should be in such a course.

This was my first chance to teach a real course since my experience teaching and helping develop MOOCs at Udacity. I'd been asked many times by reporters how my experience teaching open online courses would change my in-classroom teaching, and never had a very good answer at the time. I still didn't have much of an idea when starting this course, but I did have a perspective that led me to question many aspects of traditional courses that are typically taken for granted.

cs4414 Fall 2013 Graduates

Styles of "Operating Systems" Courses

From my survey of "Operating Systems" courses at other universities, I found two main types:

Courses where students build a simple operating system.
Courses where students learn about low-level programming and build robust and scalable systems, but don't build an operating system.

The first type of course has a clear and estimable goal: provide students with a deep understanding of low-level aspects of computing systems by building their own OS kernel. This is typically done using some kind of processor emulator such as Nachos and with some code provided. Exemplars of this type of course include MIT's 6.828: Operating System Engineering, Yale's CS422/522: Operating Systems, Princeton's COS381: Operating Systems, and Harvard's cs161: Operating Systems course taught by Margo Seltzer (2013) and Matt Welsh (2007) (famously featured in The Social Network). Since building an operating system is a huge undertaking, such courses typically involve teams that work together for the full semester and require survival guides.

A variation is to not attempt to build a full simple OS, but to instead make modifications to an existing OS. For example, University of Washington's CSE 451: Introduction to Operating Systems uses a sequence of projects where students modify a Linux kernel. Such courses are great for providing deep understanding of operating systems, and this type of actual OS hacking is probably needed to gain a strong understanding of how operating systems really work at a low level.

The second type of course is often still called "Operating Systems", but is better named "Systems Programming" or something similar. Exemplars of this type of course include CMU's 15-213: Introduction to Computing Systems (and the excellent associated textbook), MIT's 6.033: Computer Systems Engineering, and Penn State's CMPSC311: Introduction to Systems Programming. The goal of these courses is to teach students what they need to know to build robust and scalable computing systems, as well as the most interesting intellectual ideas from operating systems. Such a goal is less well-defined than the goal of the first type of course, but it is more relevant to the vast majority of students. Very few people will ever actually write or work on an OS kernel today, but all programmers need to understand low-level aspects of computing systems if they want to build efficient, scalable, and robust systems.

I believe there is a great case for offering a core Operating Systems course of the first type and encouraging interested students to take such a course as an elective. When an "Operating Systems" course is required for all students (which is the case for two of the three computing majors our department offers), though, it seems better to me to focus on the more broadly-applicable aspects of building scalable and robust computing systems. Everyone benefits from learning these, and the benefit-suffering ratio should be much better than in classes where students focus on building an operating system.

Choosing a Language

Perhaps the most controversial decision I made was to use the Rust programming language as the main language for course assignments. The jury is still out on whether or not this was a good decision in the long term. I've written up more about this in a separate post.

Avoiding the Trough of Mediocrity

The problem with most university classes is that they are too big to provide students with the individualized attention and flexibility that maximizes learning, but too small to provide the economies of scale needed to develop really high-quality teaching materials and to engender substantial student contributions.

The amount of teacher attention per student drops as class size increases, but the resources that can be put into developing quality teaching materials increase with the number of students, and, most importantly, the value of student contributions increases dramatically (both because there are more students, and because each student has more incentive to put effort into major contributions). Unfortunately, most university classes fall in the "trough of mediocrity", with 20-150 students.

Despite the problematic class size (for Fall 2013, cs4414's enrollment was 65 students), I tried to avoid the trough of mediocrity by (1) assuming the effort invested in developing the course would be amortized over more than just the students in this semester's course, and (2) finding ways to provide a flexible and individualized experience even in a fairly large class.

Amortizing Development Effort

I've always spent a ridiculous amount of time preparing lectures, at least compared to what most faculty do and what is typically recommended.² I feel like it takes me at least 10 hours to prepare a decent lecture, and often 2-3 times that to prepare a good one (the preparation time is a bit more for topics that I don't initially know much about, which is the case for much of this course, but I often find it harder to produce a good lecture about something I already know well than about something I am figuring out myself as part of developing the lecture). This means that in a semester where I am preparing a new class, developing the lectures takes up the majority of my time.

Although I might want to do this anyway since I enjoy it (especially when I'm learning about new and interesting things) and take personal pride in it, it would be very hard to justify this investment for just this semester's class. That's not to say it isn't worth trying to make lectures good even for a small class, but in a "trough of mediocrity" size class, instead of spending ~30 hours a week developing the two lectures, I could meet with each student individually for 20-30 minutes each week, which I hope would be more useful for most students than sitting through 2.5 hours of lectures (even if I hope they are good ones).
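The trade-off above is simple arithmetic, and a small sketch makes it concrete. It only recomputes the figures already given (65 students, 20-30 minutes each, roughly 30 hours of weekly lecture preparation); the function name `meeting_hours` is made up for illustration.

```rust
// Back-of-the-envelope comparison for one week of a 65-student course.
// The figures come from the text; `meeting_hours` is an illustrative name.

/// Hours needed to meet every student individually for `minutes_each` minutes.
fn meeting_hours(students: u32, minutes_each: u32) -> f64 {
    f64::from(students * minutes_each) / 60.0
}

fn main() {
    let students = 65;
    let lecture_prep_hours = 30.0; // "~30 hours a week developing the two lectures"
    for minutes in [20, 30] {
        let hours = meeting_hours(students, minutes);
        println!("{students} students x {minutes} min = {hours:.1} h/week vs. ~{lecture_prep_hours:.0} h of lecture prep");
    }
}
```

At 20 minutes per student the meetings come to about 21.7 hours a week, and at 30 minutes to 32.5 hours, so individual meetings cost roughly the same time as preparing the two lectures.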

The way to justify putting a lot of effort into developing lectures for a class is to view them as a more generally valuable resource that will be useful to me in future semesters, and will hopefully also be useful to many people who are not in my class. Although our department is notoriously fickle in scheduling classes and unwilling to plan more than a semester ahead, before starting this year's course I did get an agreement that I would get to teach it at least one more time. Reusing a prepared lecture still requires effort, both to improve and update the content and to refresh my memory about what to talk about, but much less effort than it takes to build a new lecture from scratch.

The better way to amortize the effort of producing course materials is to make them open so people outside the class can also access them.³ This is more likely to be useful if the material covered differs from what many other courses have covered before, and if lectures and notes can be released in a way that makes them useful on their own. I wasn't able to do this for all the classes this semester (and I plan to make things more useful externally next semester), but I nevertheless did manage to produce materials with some external impact. It's hard to measure this definitively, but according to the SlideShare stats, several of the lecture slide decks have been viewed by over a thousand people, including the lectures on Rust, processes, trust, synchronization, and my personal favorite, Inventing the Future.

Providing Individualized Experiences

The biggest challenge for larger classes is to still provide a sufficiently individualized experience for the class to be valuable for most students. Even in a curriculum with set prerequisites, students enter classes with very different backgrounds and aptitudes, and even more different interests and goals.

Providing Alternatives. Since this course is required for two of our majors, a large fraction of students are taking the course merely because it is required to satisfy a graduation requirement (about 20 of the 65 students gave this as the main reason for taking the course on the initial course survey). I don't really think this course should be required, so I wanted to give students who felt their personal goals were not well aligned with mine an opportunity to do something different. I explained this in Class 5, and even made it a question on the midterm, but no one asked to take advantage of it. I'm not sure if this is because everyone thought their goals aligned well with what I was doing in the class, or because students realized that coming up with a good alternative would be more work than doing the course as prescribed, but I at least felt that providing the option gave me more justifiable freedom in designing the class.

Open-Ended Assignments. One way to individualize a class is to provide more flexible assignments. The challenges in doing this for a larger class are maintaining consistent grading (I'll talk more about grading below) and providing enough guidance for individual (or small team) projects despite the medium-sized class. This semester we did this by leaving it mostly up to students to decide what extensions to do at the end of the two main problem sets: Problem Set 2 is about implementing a shell, and included a couple of creative exercises and a final problem to implement extensions chosen by the students; Problem Set 3 is about implementing a multi-process web server, and left it to students to make most of the major design decisions (given a starting framework) and to extend or improve the performance of their server however they could. I think these could have benefited from a bit more structure, in particular in how performance is measured, so next semester I plan to release most (if not all) of the benchmarking tests as part of the assignment.
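One way to give performance measurement more structure is to hand everyone the same repeatable harness. Below is a minimal sketch of one, assuming each benchmark is a closure that performs a single request against the server under test; the names `measure` and `median` and the CPU-bound stand-in workload are hypothetical, not the course's actual benchmarking tests. It reports a median and a 99th-percentile latency rather than a mean, so one slow outlier does not dominate the result.

```rust
use std::time::{Duration, Instant};

/// Runs `request` `iterations` times and returns the per-call latencies, sorted.
/// `request` stands in for one client request; a real benchmark would open a
/// connection to the server and read the full response inside it.
fn measure<F: FnMut()>(iterations: usize, mut request: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        request();
        samples.push(start.elapsed());
    }
    samples.sort();
    samples
}

/// Median of an already-sorted sample (the lower median for even counts).
fn median(sorted: &[Duration]) -> Option<Duration> {
    if sorted.is_empty() {
        None
    } else {
        Some(sorted[(sorted.len() - 1) / 2])
    }
}

fn main() {
    // Hypothetical workload: a fixed amount of CPU work in place of a real HTTP request.
    let samples = measure(1000, || {
        let n = std::hint::black_box(10_000u64);
        let total: u64 = (0..n).sum();
        std::hint::black_box(total);
    });
    println!("median latency: {:?}", median(&samples).unwrap());
    println!("p99 latency: {:?}", samples[samples.len() * 99 / 100]);
}
```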

The final project was completely open: students could do whatever they wanted in teams of any size, so long as they could convince me it would be worthwhile and satisfy at least two of the project's goals (fun, relevant, technically interesting, and useful). Only a few teams had a hard time finding fulfilling projects.

I met with each project team twice during the 5-week project (scheduling meetings using ohours.org). This took a lot of time, but I think it was worth it: most groups ended up doing projects that I thought were worthwhile, and some did things I thought were spectacular (including one team that built an ARM kernel in Rust, which I hope to use for a new assignment in the next class)!

Course Forum

I tried using Piazza for course discussion. This was my first experience using Piazza, and there were some things I really liked about it, but other things that are quite unsatisfactory. A major problem is that it is closed by nature. There is no way to share discussions with people unless they register for the Piazza course site. It is also not very effective at encouraging useful class discussions, and has no easy way to link discussions in Piazza to course content.

The only topics that ended up getting semi-reasonable discussion were questions from problem sets where students were required to post something in the forum. The forum did work fairly well for students getting answers to technical questions, and we benefited greatly from participation in the course forums by some Rust experts who were not students in the class. Piazza does have some easy-to-use polling features, which I used a few times during the semester and which I think worked okay but not great. To be fair, I don't know of anything that works better, and I don't know of any way to have a very effective course discussion forum without having several thousand active students. I'm not sure how much of this is a technology problem, but there are some important technology-support features needed for good discussions that no tool I know of supports well.

(Minimizing) Grading

The usual reason given for not scaling courses is the difficulty of grading a large number of students. From what I see, most CS courses put far too much effort into grading, and most of that effort could be much better invested in teaching.

There are four main reasons to grade assignments and exams in a course:

To provide (hopefully) useful feedback to students.
To assess how well the course is going and how well students (as a group) are understanding key ideas.
To provide motivation to individual students, who might otherwise not be inclined to spend time on the course.
To measure student performance and have data for assigning final grades.

Of these reasons, #1 is very important and, in many cases, essential to actual learning. But assigning detailed scores for each question doesn't provide any useful feedback. Providing written feedback on code or answers can be useful, but doing it well can require a huge amount of effort, and it is often demoralizing to discover later that most students are not actually reading these comments. More commonly, students do read them but don't necessarily understand them, and it is pretty rare for a student to come to office hours to ask about comments on an old assignment. So, I decided it was more useful to spend the time I would normally spend on written feedback on in-person post-assignment meetings instead. (These were called "demos", but that's probably a misnomer, since they were mostly about asking students to explain how they did something and why, and following up with questions about whether other options would be better and how they would do things differently if there were other requirements.) The time was fairly tight for these (15-20 minutes for each team), but I think it was enough to be more valuable than spending far more time on written feedback.

In addition to the in-person demos, students submitted a web form with some questions about the assignment, as well as self-assessments indicating whether they were able to solve each main problem. Having students self-grade seemed to work well: since there were follow-up demos, even without relying fully on the honor system I think there were sufficient incentives for students to answer honestly (and I'm not aware of any instances where students claimed to have solved a problem they didn't at least appear to solve). The submitted forms provided a useful basis for structuring the demo; I could quickly read through a team's submission at the beginning of its demo and follow up on any interesting answers.
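The self-assessment form is essentially a small data structure: one rating per main problem, which the grader turns into questions for the demo. The hypothetical Rust sketch below shows that step; the types `SelfAssessment` and `Submission`, the three rating levels, and the prompt wording are all assumptions for illustration, not the actual form used in the course.

```rust
/// A team's own rating of one main problem, as reported on the (hypothetical) form.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SelfAssessment {
    Solved,
    Partial,
    NotSolved,
}

/// One team's form submission: (problem name, self-assessment) pairs.
struct Submission {
    team: String,
    problems: Vec<(String, SelfAssessment)>,
}

/// Turns a submission into demo talking points, one per problem. Claimed
/// solutions get a "walk me through it" prompt, which is what gives teams an
/// incentive to self-assess honestly.
fn demo_agenda(sub: &Submission) -> Vec<String> {
    sub.problems
        .iter()
        .map(|(name, rating)| match rating {
            SelfAssessment::Solved => format!("{}: walk me through how you solved {}", sub.team, name),
            SelfAssessment::Partial => format!("{}: what would it take to finish {}?", sub.team, name),
            SelfAssessment::NotSolved => format!("{}: what got in the way on {}?", sub.team, name),
        })
        .collect()
}

fn main() {
    let sub = Submission {
        team: "team-7".to_string(),
        problems: vec![
            ("Problem 1".to_string(), SelfAssessment::Solved),
            ("Problem 2".to_string(), SelfAssessment::Partial),
        ],
    };
    for item in demo_agenda(&sub) {
        println!("{item}");
    }
}
```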

This way of doing grading scaled well to a 65-person class, and can probably work for a somewhat larger class, but I'm expecting to need to do things a bit differently next semester (with around 115 students).

In addition to the problem sets, we had a midterm exam, which was mostly short-answer questions drawn from the provided course notes. This mainly served purposes #2 and #3, in the hope that it would encourage students to review and synthesize the course material that was not used directly in the problem sets, as well as give me a sense of how well students were understanding things. To minimize grading time, I used the Efficient Grading Algorithm described in Class 13:

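use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

// The grader supplies the judgments: `good_answer` (is this answer acceptable?),
// `great_answer` (score the answer a student pointed to in question 9), and
// `grade_all` (the slow path: grade every answer).
fn grade_midterm(answers: &[&str], good_answer: impl Fn(&str) -> bool,
                 great_answer: impl Fn(&str) -> f64, grade_all: impl Fn(&[&str]) -> f64) -> f64 {
    // Question 9 asked students if there was a particular answer I should read.
    if let Some(q9) = answers.get(9).copied().filter(|a| !a.is_empty()) {
        return great_answer(q9); // and possibly look at other answers
    }
    let numq = answers.len();
    if numq == 0 { return 0.0; }
    // std has no RNG; a fresh RandomState is randomly keyed, enough to pick a question.
    let urand = RandomState::new().build_hasher().finish() as usize % numq;
    if good_answer(answers[urand]) { 1.0 }
    else if good_answer(answers[(urand + 1) % numq]) && good_answer(answers[(urand + 2) % numq]) { 1.0 }
    else { grade_all(answers) } // grade all answers
}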

This made it feasible for me to grade all 65 exams in a reasonable amount of time, while still providing grades that were either fair or generous to students. I don't know of anyone else who uses such a grading scheme, but I strongly encourage it. I think most faculty feel some kind of moral obligation to at least superficially grade everything students do on an exam, but it's hard for me to see why this is necessary.

I don't think courses should be designed to emphasize #4 (grading for the sole purpose of sorting students), especially courses mostly taken by third-year and higher students. Precise grading matters a lot in first- and second-year classes, since these are the prerequisites for later courses, and are courses where students benefit from clear signals about whether they are in the right major. By the time students are in 4000-level classes, though, they have mostly decided on their major, and the difference between getting an A- or a B+, or between a B- and a C+, should matter very little. It only really matters for helping second-rate companies, which are not able to hire the best students, decide more efficiently which students to interview, and I don't think it's really up to us to do that. It is much better to spend time on teaching, and on providing ways for students to do things that will help them get jobs at first-rate companies or succeed in graduate school.

Outcomes

I am, of course, not the least biased judge of how the course went, but I was reasonably happy with how things worked out for the first run of a brand-new course. For a more balanced view, you can read the students' course evaluations.

There were definitely places where things were not as well prepared as they should have been, especially Problem Set 2 (which initially included a problem that was effectively impossible) and Problem Set 3 (which I think mostly worked okay, but was delayed because of problems getting it ready, and didn't provide some starting code it should have). My attempt to make up for the lack of low-level kernel-hacking assignments by looking in depth at the Linux source code in class didn't work for most students, and I think we should have done more concrete work on lower-level implementation issues.

I enjoyed the demo-based grading a lot, and from what I've heard, most students found it worthwhile (although for some, a bit nerve-racking, even though demos were designed to be low-pressure). Since next semester's class will be nearly double the 65 students in the first one, I won't be able to do all the demos and project meetings myself, and will need to find a way to distribute this effort.

Thanks

I'd like to thank my spectacular assistants, Weilin Xu and Purnam Jantrania, who did much of the work in creating the assignments and provided a great deal of help to students throughout the course. I'd also like to sincerely thank all the students who were brave enough to stick with an experimental and unusual course, and especially the many who made great contributions to the class. Finally, I'd like to thank the contributors from outside the class, especially Corey Richardson and Huon Wilson, Rust experts who answered many student questions in our course forum and on the #rust IRC channel.

Please feel free to comment below, with your identity or anonymously, or to email me directly to follow up on any of the comments here. I'm especially happy to get comments from any students in the course who might disagree with my take on things.


When I was an undergraduate at MIT, students had a choice between taking 6.033: Computer System Engineering and 6.035: Computer Language Engineering (Compilers). The 6.033 course (at the time) was mostly reading papers and writing a long paper, whereas the 6.035 course was a large project to build a compiler. I was very interested in building a compiler, and didn't like reading and writing much, so it was an easy choice to take 6.035. ↩
Faculty are typically told that they should spend one hour of preparation time for each hour of class time, and this seems fairly consistent with what faculty surveys report (see Table VII-12 of the 2012 Faculty Survey, which reports UVa faculty spending 15.6% of their teaching time in classroom hours, 16.6% on course preparation, and 6.6% on grading, and faculty with research emphasis spending 14.9% of their total time on teaching, and working 57.9 hours per week. This implies that a typical research-emphasis faculty member is spending 1.4 hours per week on class preparation). ↩
Sadly, the standard systems people are encouraged to use at our university are closed by default, so very few of the course materials others produce here are accessible to anyone who isn't enrolled in the class. I think public universities should require faculty to make their course materials open, except in cases where they can provide a strong justification for not doing so, but at least at this public university, it is more of a battle to be permitted to release materials openly. ↩