Problem Set 3 - Zhtta Server
Purpose
The goals of this assignment are to learn about synchronization, scheduling, memory management, and caching by implementing a much more useful web server than the zhttpto server from PS1. We also hope that at least one team will produce a server suitable for running rust-class.org for the rest of the semester (and future semesters until another group of students does better!), as well as providing us with a way to make the content in the Piazza forum more open and useful.
Collaboration Policy
For this problem set, you are required to work in a team of two or three people (except in cases where you were notified based on your PS2 teamwork that you should work alone for PS3, or where you made your own successful argument that it is better to work alone).
Your team may not be the same as your team for PS2, so you should either (1) find a new partner to work with for PS3, or (2) if you want to work with your PS2 partner again, find one other person to join your team. Feel free to use the Piazza forum to form your team.
Your team should work together in a way that is efficient and collaborative, and ensures that all of you understand everything in the code you submit. As part of the grading for this assignment, you will do a short demo with one of the course staff, at which all team members will be expected to be able to answer questions about how your code works.
Please note that only one of you needs to create the private repository for this problem set; the other members should work in the same repository as collaborators.
In addition to working directly with your teammates, you should feel free to discuss the problems, provide coding help, and ask for help with any students in the class (or anyone else in the world, for that matter), so long as you don't do it in a way that is detrimental to your own or anyone else's learning. You can do this in person, using the Piazza forum, using the #cs4414 and #rust IRC channels, or any other communication medium you find most effective.
Getting Started
Before continuing with this assignment, one member of your team should:
- Set up the private repository named 'cs4414-ps3'.
- Add your teammate(s) and 'cs4414uva' as the collaborators.
- Clone the empty private repository to your working environment. Instead of mygithubname below, use your github username.
git clone https://github.com/mygithubname/cs4414-ps3.git
- Get the starting code for ps3.
git remote add course https://github.com/cs4414/ps3.git
git pull course master
git push --tags origin master
After finishing these steps, everyone in the team should have access to your own cs4414-ps3
repository that contains starting code for ps3.
Background
Web servers are among the most important and performance-critical programs in the world today. Amazon estimated that each 100ms increase in latency reduced sales by 1% (this means if you can reduce latency by 100ms for Amazon that is worth $500M/year). Back when she still worked for Google, Marissa Mayer talked about how important speed is for user experience and reported that increasing the response latency by 400ms reduced searches by 0.7%. Failing to design a web service to scale well can also have serious political consequences.
Modern web servers also provide many features beyond just serving static files. Features supported by Apache, the world's most popular web server, include content caching, server-side includes, and mechanisms for enforcing security policies.
For this problem set, your goal is to produce a high-performance web server that also supports some interesting features.
Moving to Rust 0.8
Rust 0.8 was released on September 26. Rust 0.8 is less rusty and more rustic than Rust 0.7, with many improvements to both the core compiler and standard library. You should follow these directions to upgrade your version of Rust to Version 0.8 for this assignment.
The 0.8 release is not backwards compatible with Rust 0.7, so most code that worked in Rust 0.7 doesn't work in the new version. This includes the zhttpto code we provided for PS1 (as well as the reference solution). We have provided an updated version of zhttpto that works in Rust 0.8: PS1 Reference Solution. You can see the changes from the 0.7 version by viewing the commit diff page. As you can see from the diff page, several modules changed their paths, and the network-related APIs were completely changed.
Putting the Z in zhtta: 10^42 times better than zhttpto!
In PS1, you implemented a simple web server named zhttpto. Your zhttpto server has functionality similar to Sir Tim Berners-Lee's first web server, but is far from adequate for today. With the help of Rust, zhttpto did support good concurrency compared to early web servers, but there still exist several obvious drawbacks. First, zhttpto used an unsafe visitor counter in the code. What's worse, it exposes all of the files on your file system to web users. Moreover, it doesn't support any flexible scheduling, but just processes requests in the order in which they arrive.
Your Zhtta server may not be 10^42 times better than zhttpto, but it should be a huge improvement (and better than Apache in some ways)! We have provided starting code in zhtta.rs to help you take the first step.
Safe Visitor Counter
For Problem Set 1, you added a visit counter, but needed to use unsafe
to do it (you should understand why the visit counter in PS1 was unsafe,
but we'll leave that for a midterm question).
For this problem, your goal is to provide a visit counter with the same behavior, but without needing any unsafe code.
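One way to think about what "safe" means here is sketched below. This is only an illustration written in current stable Rust, not Rust 0.8, whose standard library differs, so the types shown (Arc, AtomicUsize) and the function name count_visits are assumptions for the example rather than what zhtta.rs uses. It shows several handler threads bumping one shared counter with no unsafe block.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Simulates `n_requests` concurrent requests, each handled on its own
/// thread, and returns the final value of the shared visit counter.
/// (Hypothetical helper for illustration; not part of the starting code.)
fn count_visits(n_requests: usize) -> usize {
    // The counter lives behind an Arc so every handler thread can hold a
    // reference to it; AtomicUsize makes each increment indivisible.
    let visits = Arc::new(AtomicUsize::new(0));

    let handlers: Vec<_> = (0..n_requests)
        .map(|_| {
            let visits = Arc::clone(&visits);
            thread::spawn(move || {
                // fetch_add returns the previous value, so adding one
                // gives the number of this particular visit.
                visits.fetch_add(1, Ordering::SeqCst) + 1
            })
        })
        .collect();

    // Wait for every handler before reading the total.
    for h in handlers {
        h.join().expect("handler thread panicked");
    }
    visits.load(Ordering::SeqCst)
}
```

Calling count_visits(64) always yields 64, however the threads interleave, which is the property the unsafe PS1 counter could not promise.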
Smarter Scheduling
The provided zhtta code uses a FILO scheduler. For the next two problems, you should modify this scheduler to provide more flexibility in how requests are processed.
You may assume that clients in Charlottesville can be distinguished by having an IP address that starts with 128.143. or 137.54. (if your own IP address starts differently, you should add that also). More ambitious groups will use an IP geolocation service like http://freegeoip.net to provide better accuracy, but this is not required.
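The prefix test described above might look something like the following sketch (current stable Rust, not 0.8). The names LOCAL_PREFIXES and is_charlottesville are made up for the example, and it assumes the client address is already available as dotted-decimal text.

```rust
/// Address prefixes that the problem set treats as Charlottesville clients.
/// Add your own prefix here if your address starts differently.
const LOCAL_PREFIXES: [&str; 2] = ["128.143.", "137.54."];

/// Returns true if the dotted-decimal address starts with a local prefix.
/// The trailing dot in each prefix keeps "128.14.3.1" from matching.
fn is_charlottesville(ip: &str) -> bool {
    LOCAL_PREFIXES.iter().any(|prefix| ip.starts_with(*prefix))
}
```

How the scheduler uses this answer is up to you; the function only classifies a client.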
Reducing Median Latency
Shortest-Remaining-Processing-Time-First (SRPT) is a well-known preemptive scheduling algorithm in Web servers. By giving priority to short requests or those requests with short remaining time, a web server can minimize the average and median response time.
Implementing high-level shortest-processing-time first is satisfactory for this problem, but more ambitious students will also read Bianca Schroeder and Mor Harchol-Balter's paper, Web servers under overload: How scheduling can help (ACM Transactions on Internet Technology, Feb 2006) to learn more about scheduling web requests and attempt to implement some of the strategies described in the paper also. (Some of the things they do would require making changes at the level of the network library code.)
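As a rough sketch of the simpler, non-preemptive variant (shortest-processing-time first, not full SRPT), a request queue can be ordered by the size of the requested file, on the assumption that file size is a reasonable proxy for processing time. The code is current stable Rust rather than 0.8, and Request and SptQueue are hypothetical names, not types from zhtta.rs.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// A pending request; `size_bytes` stands in for its expected processing time.
/// The derived ordering compares fields top to bottom.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Request {
    size_bytes: u64, // compared first, so it decides the priority
    arrival: u64,    // tie-breaker: earlier arrivals go first among equal sizes
    path: String,
}

/// Shortest-processing-time-first queue of requests.
struct SptQueue {
    // BinaryHeap is a max-heap; wrapping in Reverse makes the smallest pop first.
    heap: BinaryHeap<Reverse<Request>>,
    next_arrival: u64,
}

impl SptQueue {
    fn new() -> Self {
        SptQueue { heap: BinaryHeap::new(), next_arrival: 0 }
    }

    /// Enqueues a request for `path`, whose file is `size_bytes` long.
    fn push(&mut self, path: &str, size_bytes: u64) {
        let req = Request {
            size_bytes,
            arrival: self.next_arrival,
            path: path.to_string(),
        };
        self.next_arrival += 1;
        self.heap.push(Reverse(req));
    }

    /// Removes and returns the smallest pending request, if any.
    fn pop(&mut self) -> Option<Request> {
        self.heap.pop().map(|Reverse(req)| req)
    }
}
```

A long-running request that is already being served is never interrupted here; handling that case is what separates this from true SRPT.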
Server-Side Gashing
Many web servers (including Apache) offer the ability to run shell commands embedded in the web page. For example, using Apache Server-side Includes, you can put the following string in an HTML document to display the current date and time:
<!--#exec cmd="date" -->
This is done by passing the commands embedded in the page to a shell to execute, and then replacing the SSI tag with the result.
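The replace-the-tag step can be sketched as below, in current stable Rust rather than 0.8. Everything named here (expand_exec_tags, run_in_shell, the two constants) is invented for the example; it assumes tags appear exactly in the form shown above and hands commands to sh rather than to your gash shell.

```rust
use std::process::Command;

const TAG_OPEN: &str = "<!--#exec cmd=\"";
const TAG_CLOSE: &str = "\" -->";

/// Replaces each `<!--#exec cmd="..." -->` tag in `page` with whatever
/// `run` returns for the embedded command string.
fn expand_exec_tags<F: Fn(&str) -> String>(page: &str, run: F) -> String {
    let mut out = String::with_capacity(page.len());
    let mut rest = page;
    while let Some(start) = rest.find(TAG_OPEN) {
        let after_open = &rest[start + TAG_OPEN.len()..];
        match after_open.find(TAG_CLOSE) {
            Some(end) => {
                out.push_str(&rest[..start]); // text before the tag
                out.push_str(&run(&after_open[..end])); // command output
                rest = &after_open[end + TAG_CLOSE.len()..];
            }
            // Unterminated tag: stop scanning and keep the remainder as-is.
            None => break,
        }
    }
    out.push_str(rest);
    out
}

/// Runs `cmd` through `sh -c` and returns its standard output
/// (an empty string if the shell cannot be started).
fn run_in_shell(cmd: &str) -> String {
    Command::new("sh")
        .arg("-c")
        .arg(cmd)
        .output()
        .map(|o| String::from_utf8_lossy(&o.stdout).into_owned())
        .unwrap_or_default()
}
```

A page would be expanded with expand_exec_tags(page, run_in_shell). Passing the runner as a parameter keeps the parsing separate from command execution, so the risky part can be swapped out or restricted later.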
Benchmarking Web Servers
An important measure of performance for a web server is how many concurrent connections it can handle. The C10K problem has been addressed by several modern web servers, including nginx and Microsoft IIS.
Note: this question originally mentioned Apache Benchmark, but we do not recommend using that since it is not a stand-alone tool and not well-suited to measuring your Zhtta server. Instead, we recommend using Httperf, which is a simple open-source benchmarking tool that should be sufficient for measuring your Zhtta server. (For more on benchmarking, see Class 15 and Benchmarking.)
We'll provide some more details on this in class later, as well as information on how we will be benchmarking your servers.
Caching
Reading from files is expensive. We can significantly improve web server performance by caching responses for requests, but we need to be careful about memory size tradeoffs (bigger caches mean more memory that is outside the processor's L2 and L3 caches, and so slower responses) as well as correctness (we need to be careful about caching responses whose values may change).
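To make the two concerns concrete, here is a minimal sketch of a cache with a fixed byte budget (the memory tradeoff) that only returns an entry when a caller-supplied stamp still matches (the correctness concern). It is current stable Rust, not 0.8; ResponseCache, Entry, and the idea of using a file's modification time as the stamp are assumptions for the example, and the oldest-first eviction is just the simplest policy to show.

```rust
use std::collections::{HashMap, VecDeque};

struct Entry {
    body: Vec<u8>,
    stamp: u64, // e.g. the file's modification time when it was cached
}

/// In-memory response cache bounded by the total size of cached bodies.
struct ResponseCache {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Entry>,
    order: VecDeque<String>, // insertion order, oldest at the front
}

impl ResponseCache {
    fn new(max_bytes: usize) -> Self {
        ResponseCache {
            max_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Returns the cached body only if it was stored with the same stamp,
    /// so a file that changed since it was cached counts as a miss.
    fn get(&self, path: &str, stamp: u64) -> Option<&[u8]> {
        self.entries
            .get(path)
            .filter(|e| e.stamp == stamp)
            .map(|e| e.body.as_slice())
    }

    /// Stores a body, evicting the oldest entries until it fits.
    /// Bodies larger than the whole budget are not cached at all.
    fn put(&mut self, path: &str, stamp: u64, body: Vec<u8>) {
        self.remove(path); // drop any previous copy of this path
        if body.len() > self.max_bytes {
            return;
        }
        while self.used_bytes + body.len() > self.max_bytes {
            match self.order.front().cloned() {
                Some(oldest) => self.remove(&oldest),
                None => break,
            }
        }
        self.used_bytes += body.len();
        self.order.push_back(path.to_string());
        self.entries.insert(path.to_string(), Entry { body, stamp });
    }

    /// Removes an entry and returns its bytes to the budget.
    fn remove(&mut self, path: &str) {
        if let Some(old) = self.entries.remove(path) {
            self.used_bytes -= old.body.len();
            self.order.retain(|p| p.as_str() != path);
        }
    }
}
```

Choosing max_bytes, and whether least-recently-used eviction would serve your workload better than oldest-first, are exactly the tradeoffs worth measuring with your benchmarks.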
Extensions
For the last problem, your goal is to extend the web server in some interesting way. This could be a significant performance improvement, or adding some new functionality.
A few suggestions are below, but you are encouraged to come up with your own ideas.
Open Piazza. Although Piazza has many nice features, its closedness makes it much less useful than it should be. For example, there is no way to send a link to a forum discussion to someone not enrolled through Piazza (e.g., I would like to send Peter Norvig a link to the Norvig Numbers discussion, but there is no sensible way to do this). In addition, this means the class discussion content is not indexed by search engines, so all of the valuable content produced by students in this class is basically invisible and wasted. So far, I have been unsuccessful in convincing the Piazza CEO to support open classes, so the next best solution is to provide an open proxy that is a server which redirects requests to Piazza with credentials added (I have a demo account that can be used for this). (If you are interested in doing this, please contact me for more information and to avoid duplicate effort with another team.)
App Server. Many web frameworks exist that make it easy to build web services that, instead of just serving static files, will run application code in response to requests to provide dynamic behavior. A simple example is web.py (this was originally developed by Aaron Swartz to run reddit). A simple web app framework built in Rust would have many advantages including safety, high performance, and easy concurrency over existing web app frameworks.
Security. The current zhtta server is very insecure. It can be exploited to serve any file on the host machine, and features like server-side gashing are very risky. Modify your zhtta server to provide stronger security.
Background Gashing. A more ambitious way to incorporate shell commands would run them in the background and send the page back to the client without their results, but with a hook to incorporate those results in the page later. Then, when the shell command finishes, the results would be sent to the client and incorporated into the client-side DOM. This would make it easy to build sites that provide clients with quick partial responses (to expensive requests), which would encourage them to wait for the rest of the response rather than leaving for a competitor's site. This requires some familiarity with client-side web programming, but it seems like a feature that a zhtta server should support.
Submission and Demos
Once you decide to submit your project for grading after committing some code and documents, you should add a tag on your code repository with a version number, and submit your assignment by providing the corresponding URL using the submission form for PS3.
In addition to submitting using the form, you will also schedule a demo at which you will present your zhtta server to one of the course staff and answer questions about how you did it. All team members are expected to be able to answer questions about your server implementation.
We are also planning a performance benchmark competition to find the best server to use for hosting rust-class.org. To be eligible for the benchmarking competition, your submission must pass the basic functionality tests (in your demo), and must build and run by cloning your submitted github repository and then executing:
> make
> zhtta