Monday, Mar. 10, 2014

Obama's Trauma Team

Last Oct. 17--more than two weeks after the launch of HealthCare.gov--White House chief of staff Denis McDonough came back from Baltimore rattled by what he had learned at the headquarters of the Centers for Medicare and Medicaid Services (CMS), the agency in charge of the website.

McDonough and the President had convened almost daily meetings since the Oct. 1 launch of the website with those in charge--including Health and Human Services Secretary Kathleen Sebelius, CMS administrator Marilyn Tavenner and White House health-reform policy director Jeanne Lambrew. But they couldn't seem to get what McDonough calls "actionable intel" about how and why the website was failing in front of a national audience of stunned supporters, delirious Republican opponents and ravenous reporters.

"Those meetings drove the President crazy," says one White House senior adviser who was there. "Nobody could even tell us if the system was up as we were sitting there, except by taking out laptops and trying to go on it. For Denis, going to Baltimore was like leaving Washington and visiting a war zone."

But not even a trip to the war zone produced good intel. According to notes from a meeting in one of CMS's three war rooms (yes, things were so uncoordinated that there were three), those assembled discussed the fact that "we heard that the capacity"--the number of possible simultaneous users--"was 100,000 people, and there are 150,000 people on it." Yet five days later, White House chief technology officer Todd Park would tell USA Today that the capacity was 50,000 and that the website had collapsed because 250,000 people tried to use it at the same time. Park, a highly successful--but, for this job, disablingly mild-mannered--health care tech entrepreneur, had been kept out of the planning of the website. In fact, the site's actual capacity at the time was "maybe a few thousand users," according to a member of the team that later fixed it.

What McDonough was able to pry out of the beleaguered crew at CMS on his Baltimore visit was that even on Oct. 17--by which time the site's failure was the subject of daily headlines and traffic had collapsed--only 3 in 10 people were able to get on at all. And of the lucky third that did, most were likely to be tossed off because there were so many other bugs.

Unknown to a nation following the fiasco, McDonough's assignment from the President had boiled down to something more dire than how to fix the site. As the chief of staff remembers his mission, it was "Can it be patched and improved to work, or does it need to be scrapped to start over? He wanted to know if this thing is salvageable."

Yes, on Oct. 17, the President was thinking of scrapping the whole thing and starting over.

When McDonough got back to the White House, he met with Jeff Zients, a highly regarded businessman who had won high marks as a deputy director of the Office of Management and Budget. Among other projects, Zients--who in looks and résumé is the epitome of the buttoned-up manager--had overseen the Cash for Clunkers program in 2009. He was now slated to take over in January as the director of the President's National Economic Council. Obama and McDonough had quietly brought Zients in the week before when it had become obvious that the early White House and CMS explanation for the website's problems--astonishingly high volume--was anything but the whole story.

Zients, who is not an engineer, was teamed with Park, the White House chief technology officer. "On Oct. 17, I went from White House CTO to full-time HealthCare.gov fixer," Park says. The two were charged, says Zients, with "finding fresh eyes who could decide whether the thing was salvageable."

As one of the engineers they recruited put it, "Maybe we had to tell the world we'll be back to you in six or nine months with a new site."

As McDonough and Zients were digesting what the chief of staff had learned in Baltimore, White House press secretary Jay Carney was going through what one senior Obama aide calls "probably the most painful press briefing we've ever seen ... It was like one of those scenes out of The West Wing where everyone's yelling at him."

Thursday, Oct. 17, was the day the government shutdown ended. Until then, the failed launch of the website on Oct. 1 had been overshadowed in the news--and in the questions Carney had to field every day--by the shutdown and the related threat of a debt-ceiling deadlock. Now the unfolding Obamacare disaster was center stage.

Carney tried to fend off the inquisition, but he had little to work with. Pressed repeatedly on when the site would be fixed, the best he could say was that "they are making improvements every day."

"They" were, in fact, not making improvements, except by chance, much as you or I might reboot or otherwise play with a laptop to see if some shot in the dark somehow fixes a snafu.

Yet barely six weeks later, HealthCare.gov not only had not been scrapped, it was working well and on its way to working even better.

This is the story of a team of coders and troubleshooters, unknown except in elite technology circles, who dropped what they were doing in various enterprises across the country and came together in mid-October to save the website. In about a tenth of the time that a crew of usual-suspect Washington contractors had spent over $300 million building a site that didn't work, this ad hoc team rescued it and, arguably, Obama's chance at a health-reform legacy.

It is also a story of an Obama Administration obsessed with health care reform policy but above the nitty-gritty of implementing it. No one in the White House meetings leading up to the launch had any idea whether the technology worked. Early on, Lambrew, highly regarded as a health care policy expert and advocate for medical care for the poor, kept Park off the invitation list for the planning meetings, according to two people who worked on the White House staff prior to the launch. (The White House declined to make Lambrew available for an interview.) The only explanation Park offers for his exclusion is that "The CTO helps set government technology policy but does not get involved in specific programs. The agencies do that." The other attendees were also policy people, pollsters or communications specialists focused largely on the marketing and political challenges of enrolling Americans.

McDonough, as chief of staff, was supposed to be tending to everything associated with the rollout, including the technology. But he and Lambrew simply accepted the assurances from the CMS staff that everything was a go. Two friends and former colleagues of McDonough's say they spoke to him 36 hours prior to the launch, and in both conversations he assured them that everything was working. "When we turn it on tomorrow morning," he told one friend, "we're gonna knock your socks off."

When I asked him in February whether he should have worried more about the website, McDonough admitted, "Would I do things differently if I had a chance to? Absolutely."

1. Return of the Campaign Geeks

Early on the morning of Friday, Oct. 18, Gabriel Burt, whose résumé actually includes work as a rocket scientist, woke up in a room at the DoubleTree in Columbia, Md., about 35 miles outside Washington. Burt, 30 at the time, had flown there from Chicago the night before, toting an overnight bag for what he thought might be a two- or three-day trip. By the following weekend his wife would be flying in to resupply him. He didn't get home until Dec. 6.

Burt is the chief technology officer at a Chicago company called Civis Analytics. Park, the White House CTO, had connected with him via the White House political office. How did Obama's political people know about Burt's firm? Because Civis is the home of the Obama-campaign whiz kids who re-engineered politics in 2012. Burt and a team of coders and data analysts had developed tools that could sift data so finely that finding and tracking persuadable voters to make sure they turned out to vote was brought to a whole new level.

Soon after the campaign, the group formed a company to sell its services to nonprofits, governments and private companies. Its sole investor is Google executive chairman Eric Schmidt, who had helped organize their work as an informal Obama campaign adviser. The Civis website describes its creation this way: "Our company was born in a large backroom of the Obama 2012 re-election headquarters. We called it the analytics cave ... From millions of data points, we constructed the most accurate voter targeting models ever used in a national campaign. We predicted the election outcome in every battleground state within one point. And our work guided decisionmaking and resource optimization across the campaign ... This company is our next step," the website continues. "We are taking our team outside The Cave to solve the world's biggest problems using Big Data."

In fact, Obamacare had indirectly become a Civis client. Following the passage of the Affordable Care Act, a nonprofit called Enroll America was formed with the goal of boosting enrollment in the coming insurance exchanges through grassroots organizing and targeted advertising. Enroll America is funded--in "the tens of millions," says its president, Anne Filipic, a former Obama campaign worker--not only by some political groups sympathetic to health care reform, like Families USA, but also by businesses that will benefit from people enrolling, chief among them insurance companies and pharmaceutical manufacturers. The organization became one of Civis' first and biggest clients.

Before the website crashed on Oct. 1, this kind of marketing-oriented data crunching was seen as central to the drama of whether Obamacare would succeed. The political intrigue and punditry around the launch were mostly about whether people would come to the website exchanges, not what would happen to them once they got there.

Through the summer of 2013, David Simas, who then had the title of White House deputy senior adviser for communications, gave rounds of interviews detailing how big data, much of it provided to Enroll America by Civis, was being used to target specific precincts, say, in Miami or Houston, to identify the uninsured, make contact with them--"We want multiple touches," Simas told me--and lure them into enrolling. When I interviewed Simas in September, he assured me that "everything has been tested and is working perfectly ... Our challenge is getting the right people to show up."

McDonough, in telling associates that the Obamacare launch was consuming an hour or two of his every day, similarly focused on the communications and outreach planning rather than the technology.

The press, too, concentrated on the purported marketing and enrollment hurdles. One favorite theme was that the White House had brought back its 2012 Obama-campaign whiz kids for an encore data-crunching, polling and messaging blitz, which is why Simas, a campaign pollster, data analyst and message maven, had assumed center stage.

It turns out that when it came to Civis' skills, McDonough, Simas and the others were working the wrong side of the house. Civis is great at analytics, but behind that world-class data crunching is a world-class technology team run by Gabriel Burt. Indeed, the key mistake made by President Obama and his team--who never publicized the arrival of Burt and other campaign coders in October the way they touted the role of the data-analytics marketing team last summer--was that they had turned only to the campaign's marketing whiz kids instead of the technologists who enabled them.

2. A Team Formed On the Fly

Among the tech geniuses Burt got to know during the 2012 campaign is Mikey Dickerson--whose title at Google is site-reliability engineer. Dickerson had taken a leave from Google in 2012 to help scale the Obama-campaign website and create its Election Day turnout-reporting software. As it happened, Dickerson, then 34, was in town visiting Burt and others at Civis on Oct. 11 when Park called from the White House. "I consider Mikey a mentor," says Burt. "We were picking his brain about our company when we got a call about the health care site ... We all wanted to do something."

Burt and Dickerson decided to go to Washington to help Park figure out what to do. They also began making a list of others who they thought could form a rescue squad. By the afternoon of Oct. 18, Burt was on the ground at the headquarters in Maryland of a company called QSSI, one of the contractors that had been hired by CMS to build and run the website. Of the many companies that had worked on HealthCare.gov, QSSI was thought to have performed the least badly.

That afternoon, Dickerson, who was in California preparing to fly east the following Monday to join Burt, jumped on what he later described as a "really bizarre conference call." It was with Park, who at that moment was riding in a White House van around D.C., Maryland and Virginia with the beginnings of his hastily assembled team trying to assess the damage.

In the van was Paul Smith, whom Burt had recruited. Smith had been deputy director of the Democratic National Committee's tech operation. He immediately put fundraising for a startup he was planning on hold to join the group. Another passenger was Ryan Panchadsaram, 28, who had come to the White House as part of a program called Presidential Innovation Fellows, which was launched by Park to bring high-tech achievers into government to work on specific projects that they design. (The program is already responsible for a series of innovations in making government data and health care records more available electronically.) "I decided we should all go introduce ourselves to the people we were going to help," says Park, explaining the van ride.

The team started by driving from the White House to see Tavenner, the CMS administrator, at her Washington office. They then drove off to Baltimore to meet other senior CMS officials. It was during that drive that Park decided to loop in Dickerson and some others to a conference call. "We were passing around an iPhone with a speaker so we could all talk," says Park. "I wanted us to get to know each other."

"I had no idea who this guy leading the call was, and you couldn't hear a lot of it," recalls Dickerson, who was wearing a T-shirt sporting an image of a nuclear reactor over the word Science! when I met him three weeks ago in the Roosevelt Room across from the Oval Office. "Finally I jumped in and asked, 'Who am I talking to? Who is leading this call?' And the guy says, 'I'm Todd Park.' So I Googled him and saw he's the chief technology officer of the country and had founded two health care technology companies. Oh, I figured. Not bad. So I made plans to fly out for a few days."

Park's van continued on from Baltimore, stopping at the two main contractors working on the website. It turned out the engineers at both QSSI and even CGI, the contractor that attracted much of the blame for the site's failure, did not seem nearly as defensive or hostile as Park and the others had feared. "These guys want to fix things. They're engineers, and they were embarrassed," says one of the members of Park's gathering band. "Their bosses might have been turf conscious, but by then the guys in the suits really didn't want to have anything to do with the site, so they were glad to let us take over."

When the meetings ended at a CMS outpost in Herndon, Va., at about 7:00 p.m., the rescue squad already on the scene realized they had more work to do. One of the things that shocked Burt and Park's team most--"among many jaw-dropping aspects of what we found," as one put it--was that the people running HealthCare.gov had no "dashboard," no quick way for engineers to measure what was going on at the website, such as how many people were using it, what the response times were for various click-throughs and where traffic was getting tied up. So late into the night of Oct. 18, Burt and the others spent about five hours coding and putting up a dashboard.
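The article says only that the dashboard showed how many people were on the site, response times for click-throughs and where traffic was backing up. As a rough illustration of that kind of summary, here is a minimal sketch that rolls a window of request logs up into active users plus per-page request counts, latency percentiles and error rates. The `Request` record, its field names, the five-minute window and the nearest-rank percentile are all assumptions made for illustration; this is not the team's actual code.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical log record; fields are illustrative, not HealthCare.gov's schema.
    timestamp: float   # seconds since some epoch
    session_id: str    # one visitor's session
    endpoint: str      # page or API path that was clicked
    latency_ms: float  # time to respond
    ok: bool           # False for time-outs, wrong pages, server errors


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def dashboard(requests, now, window_s=300):
    """Summarize the last `window_s` seconds: active users and per-endpoint health."""
    recent = [r for r in requests if now - window_s <= r.timestamp <= now]
    by_endpoint = defaultdict(list)
    for r in recent:
        by_endpoint[r.endpoint].append(r)

    rows = {}
    for endpoint, rs in by_endpoint.items():
        latencies = [r.latency_ms for r in rs]
        rows[endpoint] = {
            "requests": len(rs),
            "p50_ms": percentile(latencies, 50),
            "p95_ms": percentile(latencies, 95),
            "error_rate": sum(not r.ok for r in rs) / len(rs),
        }
    active_users = len({r.session_id for r in recent})
    return active_users, rows
```

Sorting endpoints by `p95_ms` or `error_rate` is one simple way such a table could point engineers at where traffic is getting tied up.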

What they saw, says Park, was a site with wild gyrations. "It looked awfully spiky," recalls Panchadsaram. "The question was whether we could ride that bull. Could we fix it?"

The team went home at about 2:30 a.m. on Saturday, Oct. 19.

3. "It's Just a Website. We're Not Going to the Moon."

The decision had still not been made whether to save or scrap HealthCare.gov. Zients wanted even more eyes from Silicon Valley on the problem. At about 6 in the morning on Saturday, Oct. 19, he emailed John Doerr, a senior partner at Kleiner Perkins Caufield & Byers, the Menlo Park, Calif.--based venture-capital powerhouse, whose investments include Amazon, Google, Sun, Intuit and Twitter. Could Doerr call him when he awoke to talk about the health care website? Zients asked.

When Doerr quickly called back, Zients said, "We're pulling together this surge of people to do this assessment to see if the site's fixable or not. We've got to do it incredibly quickly. Do you know anyone?" Doerr recommended a relatively new Kleiner partner named Mike Abbott.

"Mike saved Twitter's technology when it was failing," Doerr told me later, referring to the days when the Twitter Fail Whale error-message icon was ubiquitous. "His being there gave me the confidence to make the largest investment we had ever made--over $100 million ... He had also worked at Microsoft and led the team at Palm that rebuilt their system ... Yet he's really low-key and well liked."

Abbott spoke to Zients the next day, Sunday, Oct. 20, and flew to Washington on Oct. 21. That day, Obama offered what the New York Times called "an impassioned defense of the Affordable Care Act" in a Rose Garden statement, "acknowledging the technical failures of the HealthCare.gov website but providing little new information about the problems with the online portal or the efforts by government contractors to fix it."

Nor did the President volunteer that he had recruited a team whose first job was to decide whether to kill the website and start over.

"The first red flag you look for," says Abbott, "is whether there is a willingness by the people there to have outside help. If not, then I'd say it's simpler to write it new than to understand the code base as it is if the people who wrote it are not cooperating. But they were eager to cooperate."

"The second thing, of course, was, What were the tech problems? Were they beyond repair? Nothing I saw was beyond repair. Yes, it was messed up. Software wasn't built to talk to other software, stuff like that. A lot of that," Abbott continues, "was because they had made the most basic mistake you can ever make. The government is not used to shipping products to consumers. You never open a service like this to everyone at once. You open it in small concentric circles and expand"--such as one state first, then a few more--"so you can watch it, fix it and scale it."
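Abbott's "small concentric circles" describe what engineers call a staged rollout: open the service to one group, watch it, and widen the circle only while the numbers stay healthy. A minimal sketch, assuming hypothetical rollout waves of states and made-up thresholds for error rate and 95th-percentile response time; neither the wave contents nor the limits come from the article.

```python
# Hypothetical rollout waves; which states go in which wave is invented for illustration.
ROLLOUT_WAVES = [
    {"WY"},               # wave 0: a single state
    {"AK", "ND"},         # wave 1: a few more
    {"TX", "FL", "OH"},   # wave 2: a larger circle
]


def enabled_states(current_wave):
    """Every state admitted once the rollout has reached `current_wave` (inclusive)."""
    states = set()
    for wave in ROLLOUT_WAVES[: current_wave + 1]:
        states |= wave
    return states


def admit(user_state, current_wave):
    """Let a visitor in only if their state is inside the current circle."""
    return user_state in enabled_states(current_wave)


def next_wave(current_wave, error_rate, p95_ms, max_error_rate=0.01, max_p95_ms=1000):
    """Widen the circle only when the current one looks healthy; otherwise hold and fix."""
    healthy = error_rate <= max_error_rate and p95_ms <= max_p95_ms
    if healthy and current_wave + 1 < len(ROLLOUT_WAVES):
        return current_wave + 1
    return current_wave
```

The point of the gate in `next_wave` is Abbott's "watch it, fix it and scale it": a bad error rate in a small circle stops expansion before the whole country sees the problem.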

What Abbott could not find, however, was leadership. He says that to this day he cannot figure out who was supposed to have been in charge of the HealthCare.gov launch. Instead he saw multiple contractors bickering with one another and no one taking ownership of anything. Someone would have to be put in charge, he told Zients. Beyond that, Abbott recalls, "there was a total lack of urgency" despite the fact that the website was becoming a national joke and crippling the Obama presidency.

But by then, Dickerson--the Google reliability guru and Burt's mentor--had arrived. "I knew Mikey by reputation," Abbott recalls. "He was a natural fit to lead this team."

Looking over the dashboard that Park, Burt and the others had rigged up the prior Friday night, Abbott and the group discovered what they thought was the lowest-hanging fruit--a quick fix to an obvious mistake that could improve things immediately. HealthCare.gov had been constructed so that every time a user had to get information from the website's vast database, the website had to make what's called a query into that database. Well-constructed, high-volume sites, especially e-commerce sites, will instead store or assemble the most frequently accessed information in a layer above the entire database, called a cache. That way, the query to it can be faster and not tie up connections to the overall database. Not doing that created a huge, unnecessary bottleneck, the equivalent of slowing down traffic on an on-ramp to an otherwise empty highway.
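The caching fix described above is commonly implemented as a "cache-aside" (read-through) layer: check a fast in-memory store first, and only on a miss run the expensive database query and remember its result for a while. A minimal sketch, assuming a hypothetical `query_plans` function standing in for the slow database call and an arbitrary five-minute time-to-live; the article does not say what caching software the team used.

```python
import time


class CacheAside:
    """Minimal read-through cache with a time-to-live, in front of a slow lookup."""

    def __init__(self, load_from_db, ttl_s=60.0, clock=time.monotonic):
        self._load = load_from_db  # function: key -> value (the expensive query)
        self._ttl = ttl_s          # how long a cached answer stays fresh
        self._clock = clock        # injectable so expiry can be tested
        self._store = {}           # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0]        # hit: no database connection tied up
        value = self._load(key)    # miss or expired: query once, then remember
        self._store[key] = (value, now + self._ttl)
        return value


db_calls = 0


def query_plans(state):
    """Hypothetical stand-in for an expensive database query."""
    global db_calls
    db_calls += 1
    return [f"{state}-plan-{i}" for i in range(3)]


cache = CacheAside(query_plans, ttl_s=300)
for _ in range(1000):
    cache.get("TX")
print(db_calls)  # 1
```

A thousand identical lookups cost one database query instead of a thousand, which is the on-ramp bottleneck the team removed. The time-to-live is the design trade-off: longer values spare the database more but serve staler data.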

The team began almost immediately to cache the data. The result was encouraging: the site's overall response time--the time it took a page to load--dropped on the evening of Oct. 22 from eight seconds to two. That was still terrible, of course, but it represented such an improvement that it cheered the engineers. They could see that HealthCare.gov could be saved instead of scrapped.

Also weighing in by this time on the phone and through chat lines was another Silicon Valley legend recruited by Zients who also happened to be named Abbott. Marty Abbott had been the CTO of eBay and now ran a consulting business that offered high-tech crisis management and evaluation. Venture funds pay him "tens of thousands of dollars a day," says Zients, to kick the tires, hard, of potential companies seeking their money, and the companies themselves hire him when their websites or other technology crash.

"It was pretty obvious from the first look that the system hadn't been designed to work right," says Marty Abbott. "It was not really managed at all and wasn't architected to scale. For example, any single thing that slowed down would slow everything down."

Marty Abbott volunteered his time, which was limited to participation in multiple conference calls in the first few weeks of the salvage effort. Mike Abbott was also a volunteer; he stayed in the D.C. area until Oct. 25, then participated through December on conference calls, sometimes doing two or three a day.

As for Dickerson, Burt and the others, who arrived for what they thought would be a few days only to stay eight to 10 weeks, they had offered to work as volunteers but were told that government regulations did not allow that for anyone working for a sustained period. So they were put on the payroll of contractor QSSI as hourly workers, making what Dickerson says was "a fraction" of his Google pay.

The day after their first breakthrough with the caching, Dickerson and the rest of the team gave Zients and Park their verdict: they could fix the site by the end of November, six weeks away, so that "the vast majority" of visitors could go on and enroll. "I was, like, never worried," Dickerson adds. "It's just a website. We're not going to the moon."

A few hours later, on the afternoon of Oct. 23, Zients and McDonough told the President the news. According to Zients, the President "pressure-tested the decision," putting them through a series of questions related to why they thought they could make that deadline. Then he signed off on it. There was one further irony: the general contractor Zients and Park had chosen to coordinate things, they told the President, was QSSI, which had handled some of the more successful functions of the ailing website. Andy Slavitt, a top executive from another unit of QSSI's parent company--UnitedHealth Group, the giant insurer--would be called in to run the QSSI team. Which meant that the largest player in an industry that had vehemently opposed Obamacare in 2010 was now about to take a lead role in saving it. And profiting from it.

4. Stand-Ups And Hiccups

It was in a 4,000-sq.-ft. room rented by QSSI in a nondescript office park in Columbia, Md.--lined with giant Samsung TV monitors showing the various dashboard readings and graphs--that Barack Obama's health care website was saved. What saved it were Mikey Dickerson's stand-ups.

Stand-ups, which Mike Abbott says became a standard part of his playbook at Twitter, are Silicon Valley--style meetings where everyone usually stands rather than sits and works through a problem or a set of problems, fast. Then everyone disperses, acts and reports back at the end of the day at a second stand-up. Dickerson held the first one on Oct. 24. He would convene them every day, including weekends, in October and November, at 10:00 in the morning and 6:30 in the evening. Each typically ran about 45 minutes ("causing some of us to sit down," Dickerson concedes). An open phone line would connect people working on the website at other locations; in fact, the open line would remain live 24 hours a day so that everyone could immediately talk to the others if an issue suddenly came up.

Dickerson quickly established the rules, which he posted on a wall just outside the control center.

Rule 1: "The war room and the meetings are for solving problems. There are plenty of other venues where people devote their creative energies to shifting blame."

Rule 2: "The ones who should be doing the talking are the people who know the most about an issue, not the ones with the highest rank. If anyone finds themselves sitting passively while managers and executives talk over them with less accurate information, we have gone off the rails, and I would like to know about it." (Explained Dickerson later: "If you can get the managers out of the way, the engineers will want to solve things.")

Rule 3: "We need to stay focused on the most urgent issues, like things that will hurt us in the next 24--48 hours."

The stand-up culture--identify problem, solve problem, try again--was typical of the rescue squad's ethic. They worked stretches of three or four days during which they might have had five or 10 hours of sleep cumulatively, often changing clothes only when they made a shopping trip to the nearby mall. They and the dozens of willing, even eager, engineers they led--who worked for the contractors who had failed so badly to lead them in the run-up to Oct. 1--pounded away on the bugs that Dickerson had demanded they identify every morning, focus on and clear up in time for the evening stand-up. They began to sweep across increasingly big swaths of their punch list.

Well, actually, they hummed along happily for less than three days, until the whole site crashed at 1:20 a.m. on Sunday morning, Oct. 27, two days after Zients had announced that all would be well by Nov. 30. A switch had failed during maintenance work at a data center. The outage lasted 37 hours, during which Dickerson and his team could do little because they had no website to look at.

Then, two days later at 4:00 p.m. on Oct. 29, it went down again because of a malfunction in a data-storage unit. This outage lasted 40 hours, including the afternoon of Oct. 30, when HHS Secretary Sebelius testified about the website's troubles before a loaded-for-bear House of Representatives subcommittee, whose majority Republican members flashed images on their tablets and iPhones of the website being down as they questioned her. "In her testimony Ms. Sebelius came across as a hapless official," the New York Times reported. "Those outages were totally demoralizing," says Burt. "We thought we were on our way. We had gotten some momentum but lost it."

"We just kept saying, 'Let's pick ourselves up and fight,'" Park recalls. "And when the site came back, we pushed ahead nonstop ... We went from doing three or four releases"--upgrades or changes to the website--"in October to 25 in November."

"The team," says Zients, "ran two-minute drills to perfection. We had the best players on the field. Some plays didn't work. We talked about some of those. But there was never any finger pointing. People just hustled right back to the line, and we ran the next play."

Dickerson was so adamant about the need to forgo finger pointing and move on to the next play that during one stand-up in mid-November he demanded a round of applause for an engineer who called out from the back of the room that a brief outage had probably been the result of a mistake he had made.

Zients isn't a techie himself. He's a business executive, one of those people for whom control--achieved by lists, schedules, deadlines and incessant focus on his targeted data points--seems to be everything. He began an interview with me by reading from a script crowning the team's 10-week rescue mission as the White House's "Apollo 13 moment," as if he needed to hype this dramatic success story. And he bristled because a question threatened not to make "the best use of the time" he had allotted. So for him, this Apollo 13 moment must have been frustrating--because in situations like this the guy in the suit is never in control.

True, Zients had assembled a terrific team that had gelled perfectly. But his engineers could move only so fast. Though he had carte blanche to add resources, putting 10 people on a fix that would take one coder 10 days doesn't turn it into a one-day project. Coding doesn't work that way. "Jeff was a great leader, but there were limits," says Dickerson. "He would ask us every day if we were going to make the deadline ... He'd say how he had to report on how we were doing to the President. And I'd say till I was blue in the face, 'We're doing as much as we can as fast as we can, and we're going to do that no matter what the deadline is.'"

One crisis as the November deadline approached gave the team confidence that it could work through anything. Paul Smith, the campaign alumnus Burt had persuaded to join the team just as he was trying to raise money for a startup, had been working on a problem that had stumped everyone so far: the unique identifier that the website had to issue to anyone who was trying to enroll was taking too long to generate. By the afternoon of Nov. 6, the ID generator became so overloaded that the site was effectively down. "This kind of database problem is in basically everything I've ever worked on before," Smith says. "So I worked with the dev team to come up with a patch."

The patch worked in some ways, but the team learned a few days later that the identifications it was generating didn't have the right number of digits to match insurance companies' needs. So it had to be removed, and on Nov. 20 the old ID generator effectively shut the website down again. Smith and the team quickly designed a new patch, this time with the right number of digits, and executed what's called a "hot fix," meaning they put it onto the site almost instantaneously without testing. It worked.

As Dickerson marched his troops through the punch list in November, he added to the team, mostly with recruits he had worked with at Google. Jini Kim, a 32-year-old who had left Google to start her own health care data-analytics service, arrived on Nov. 21 and became the team's "Queen of Errors." Her job was to work with a group at a separate office near Dulles Airport in Virginia devoted to dealing with longer-term issues the site would face following the Nov. 30 deadline. The most important of these was scale: Would the site be able to handle the traffic a revived and working HealthCare.gov would, everyone hoped, generate?

One of the key issues involved in preparing for that surge was the error rate--the rate at which any click on the site generated a result that it was not supposed to, such as a time-out or the popping up of the wrong page. In October the error rate had been an astoundingly high 6%, meaning that even the lucky few who got onto the site usually had something go wrong, because at 6%, just 15 or 16 clicks on the site would likely produce a problem.
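The "15 or 16 clicks" figure can be checked under one simplifying assumption: that every click fails independently with the same probability. Then the average number of clicks until the first error is 1 divided by the error rate, about 16.7 at 6%, and the chance of at least one error in n clicks is 1 - (1 - rate)^n, which passes 50% by the 12th click. A short sketch computing both (the independence assumption is ours, not the article's):

```python
def p_at_least_one_error(error_rate, clicks):
    """Chance that `clicks` independent clicks produce one or more errors."""
    return 1 - (1 - error_rate) ** clicks


def expected_clicks_to_first_error(error_rate):
    """Mean of a geometric distribution: average clicks until the first error."""
    return 1 / error_rate


def clicks_until_more_likely_than_not(error_rate):
    """Smallest number of clicks with a better-than-even chance of an error."""
    n = 1
    while p_at_least_one_error(error_rate, n) <= 0.5:
        n += 1
    return n
```

At the 0.5% error rate reported in December, the same arithmetic gives an average of 200 clicks between errors, and a better-than-even chance of an error only after 139 clicks.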

With Thanksgiving falling on Nov. 28, what for most of the country was a long holiday weekend became five days of two-minute drills for the team, all aimed at keeping the President's promise of a website working for the "vast majority" of visitors by Sunday, Dec. 1. Dozens of items remained on the punch list. For example, people still couldn't go back a page on the website in certain situations, and the process for comparing competing insurance plans was still too slow. So the releases were pumped out even faster. At the same time, the engineers executed a major upgrade in the hardware powering the system, giving it more capacity and reliability. "You normally don't do hardware and software changes at the same time," says Zients. "Because if something breaks you don't know what the cause is. But we were in a position where we had to take chances."

The rest of the world remained skeptical. On Nov. 13, CMS issued its first report on monthly enrollments, covering the disastrous October rollout. Just 26,794 people had enrolled through the federal exchange over the entire month--90% fewer than what the Administration had been counting on. The night before, the Washington Post website ran a lead story headlined "Troubled HealthCare.gov unlikely to work fully by end of November." Citing "an official with knowledge of the project," the Post reported that "government workers and technical contractors racing to repair the Web site have concluded ... that the only way for large numbers of Americans to enroll in the health-care plans soon is by using other means so that the online system isn't overburdened."

After a slew of fixes on Nov. 27, the day before Thanksgiving, and more on Thanksgiving morning, the team went to Park's house for turkey. Later that night, they returned to the office to execute still more releases while they shared pies brought in by Zients. On Sunday, Dec. 1, Zients issued a public report card showing the website's turnaround. A series of hardware upgrades had dramatically increased capacity; the system was now able to handle at least 50,000 simultaneous users and probably more. There had been more than 400 bug fixes. Uptimes had gone from an abysmal 43% at the beginning of November to 95%. And Kim and her team had knocked the error rate from 6% down to 0.5%. (By the end of January it would be below 0.5% and still dropping.) The press generally accepted the new numbers but questioned whether the site would be able to handle all the traffic expected ahead of the Dec. 23 deadline for people who wanted coverage effective on Jan. 1.

That was what Zients, Park and the rescue crew were worried about too. And yet through December, the numbers kept improving, helped by Kim's falling error rate and a group of new Dickerson recruits who either parachuted in for stays of a few weeks or, in some cases, vowed to stay until the close of enrollment at the end of March.

The team gathered at the command center early on Monday, Dec. 23, to see if what they had rebuilt could handle the traffic crush.

"I'll never forget that day for the rest of my life," says Park. "We'd been experiencing extraordinary traffic in December, but this was a whole new level of extraordinary ... By 9 o'clock traffic was the same as the peak traffic we'd seen in the middle of a busy December day. Then from 9 to 11, the traffic astoundingly doubled. If you looked at the graphs, it looked like a rocket ship."

Traffic rose to 65,000 simultaneous users, then to 83,000, the day's high point. The result: 129,000 enrollments on Dec. 23, about five times as many in a single day as the site had handled in all of October. Because the sign-up deadline had been extended until Christmas Eve, Park and the team slept a few hours at the DoubleTree and came back at dawn. Traffic again reached levels that had never been seen before the previous day, and it produced 93,000 more enrollments.

Late in the afternoon of Christmas Eve, the band began to break up. Smith left early to spend the holiday with his wife and young daughter, whom he had not seen in weeks. Although he lived only about 20 miles away in Baltimore, the commute had become an impossible luxury during the frantic run-up to the deadline.

Before Smith left that night, he gave an impassioned speech about what a privilege it had been to work on the project and to work with this crew, and, says Park, "we all had a hug."

Later that night, Park talked by videophone to Dickerson's parents in Connecticut, thanking them for lending their son to the team.

Just after midnight, Park went home and Dickerson went back to the DoubleTree. Dickerson didn't return to Google until Jan. 5, spending the days after Christmas helping organize a crew of pit bosses who would cycle in and out of the operations center. When I was there recently, the center looked calm, and its video dashboards all displayed a remarkably stable system. (One screen showed that the current average response time--once a ridiculous eight seconds per page--was down to 0.343 seconds.)

According to CMS's mid-February report, which covers the period through Jan. 31, the site had processed 1.9 million enrollments.

5. Where Technology Stops And Policy Begins

Challenges remain. A back-end link providing payments and automated account records to insurance companies has yet to be built and might not be completed before summer. But that is mostly a headache for the insurance companies, which have to bill and process payments through spreadsheets; it is not likely to affect consumers' experience or their access to insurance.

Had the Obama team brought in its old campaign hands in the first place to run the launch, there would have been howls about cronyism. But one lesson of the fall and rise of HealthCare.gov is that the practice of awarding high-tech, high-stakes contracts to companies whose primary skill seems to be winning those contracts rather than delivering on them has to change. "It was only when they were desperate that they turned to us," says Dickerson. "I have no history in government contracting and no future in it ... I don't wear a suit and tie ... They have no use for someone who looks and dresses like me. Maybe this will be a lesson for them. Maybe that will change."

In the way the team dropped everything to help and then stayed as long as it took, there's also a lesson about what John Doerr calls "the myth that everyone in Silicon Valley is a selfish narcissist." In one way or another, every member of the team told me the same thing--that this was the toughest but most rewarding project of their lives.

"The two months I spent on this were harder and more intense than the 17 months I spent on the campaign," says Burt, who, like Dickerson, initially thought he was going to be working for free. "But I loved every minute of it ... I believe in getting people health care. I am so proud of this."

"Jeff was good at pumping us up, and so was Todd," says one of the team members. "We even got to meet McDonough, the chief of staff, and that was good. But we really didn't need to be pumped up much. This is what we do. And this job had special meaning." That may be why none of the group--even those like Dickerson who had worked for President Obama during one or both of the campaigns and had met him multiple times at campaign headquarters--expressed any surprise or regret that they never got to meet the President. "I'm sure he's got a lot of other things to do," says Kim, chuckling. Nonetheless, a quick visit from Obama (who spent Thanksgiving 2013 at the White House) to the troops who worked around the clock to save his signature domestic-policy initiative would have seemed fitting.

McDonough says that in meetings with the President before the launch, Obama would always end each session "by saying, 'I want to remind the team that this only works if the technology works.'" The problem, of course, was that no one in the meetings had any idea whether the technology worked, nor did the President and his chief of staff have the inclination to dig in and find out. The President may have had the right instinct when he repeatedly reminded his team about the technology. But in the end he was as aloof from the people and facts he needed to avoid this catastrophe as he was from the people who ended up fixing it.

Now that it is fixed, the real test of his legacy achievement--what should have been the test all along--will begin. The website works. Will Obamacare work?

Brill, who a year ago wrote TIME's special report "Bitter Pill: Why Medical Bills Are Killing Us," is writing a book about the business and politics of health care, to be published this year by Random House.