Text of the book: Geek Sublime: The Beauty of Code, the Code of Beauty
Author: Vikram Chandra
Flannery O’Connor writes:
The meaning of a story has to be embodied in it, has to be made concrete in it. A story is a way to say something that can’t be said any other way, and it takes every word in the story to say what the meaning is. You tell a story because a statement would be inadequate. When anybody asks what a story is about, the only proper thing is to tell him to read the story. The meaning of fiction is not abstract meaning but experienced meaning. 58
The only way to explain to you what I experienced when I first read Hemingway is to tell you to read those stories. And even then, you will read different stories. We may read the same texts, but the dhvani that manifests within you will be unique. Your beauty will be your own. If you reread a story that you read ten years ago, its dhvani within you will be new. Poetry’s beauty is infinite.
6 THE BEAUTY OF CODE
This is what ugly code looks like:

[Figure 6.1: a dependency diagram of a software system, its gray dots (components) snarled in a dense web of black lines (dependencies)]
This is a dependency diagram – a graphic representation of interdependence or coupling (the black lines) between software components (the gray dots) within a program. A high degree of interdependence means that changing one component inside the program could lead to cascading changes in all the other connected components, and in turn to changes in their dependencies, and so on. Programs with this kind of structure are brittle, and hard to understand and fix. This dependency diagram was submitted anonymously to TheDailyWTF.com, where working programmers share “Curious Perversions in Information Technology” they find as they work. The exhibits at TheDailyWTF are often embodiments of stupidity, of miasmic dumbness perpetrated by the squadrons of sub-Mort programmers putting together the software that runs businesses across the globe. But just as often, high-flying “enterprise architects” and consultants put together systems that produce dependency diagrams that look like this renowned TheDailyWTF exhibit. A user commented, “I found something just like that blocking the drain once.”
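To make “coupling” concrete, consider a hypothetical two-class sketch (invented for illustration, not taken from any real system). ReportPrinter reaches directly into Doctor’s internals, so any change to how Doctor stores an address forces a matching change in ReportPrinter; multiply such seams by thousands of components and you get the hairball above.

class Doctor {
    // Public fields expose the internal representation; every caller
    // now depends on these exact names and types.
    public String street;
    public String city;
}

class ReportPrinter {
    // Tightly coupled: this method breaks the moment Doctor changes
    // how it stores addresses (say, to a list of Address objects).
    public String mailingLabel(Doctor d) {
        return d.street + ", " + d.city;
    }
}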
If the knot of tangled hair in figure 6.1 provokes disgust, what kind of code garners admiration? In the anthology Beautiful Code, the contribution from the creator of the popular programming language Ruby, Yukihiro “Matz” Matsumoto, is an essay titled “Treating Code as an Essay.” Matz writes:
Judging the attributes of computer code is not simply a matter of aesthetics. Instead, computer programs are judged according to how well they execute their intended tasks. In other words, “beautiful code” is not an abstract virtue that exists independent of its programmers’ efforts. Rather, beautiful code is really meant to help the programmer be happy and productive. This is the metric I use to evaluate the beauty of a program. 1
He goes on to list the virtues of good code: brevity; reusability (“never write the same thing twice”); familiarity (Ruby is an “extremely conservative programming language” that does not use “innovative control structures” but “sticks to traditional control structures programmers are familiar with”); simplicity; flexibility (which Matz defines as “freedom from enforcement of tools,” so programmers aren’t forced to work in a certain way by the tools or languages they use); and, finally, balance: “No element by itself will ensure a beautiful program. When balanced together and kept in mind from the very beginning, each element will work harmoniously with the others to create beautiful code.” 2
So, beautiful code is lucid, it is easy to read and understand; its organization, its shape, its architecture reveals intent as much as its declarative syntax does. Each small part is coherent, singular in its purpose, and although all these small sections fit together like the pieces of a complex mosaic, they come apart easily when one element needs to be changed or replaced. All this leads to the happiness of the programmer, who must understand it, change it, extend it. This longing for architectural coherence leads to comparisons of code with music, which is often described as the most mathematical of the arts. There is, in fact, an anecdotal but fairly generalized belief among American programmers that there is a high correlation between coding and music-making, that many coders are musicians. A similar claim is made about mathematicians and music. These connections seem culturally encoded to me, specific to America – I’ve never heard of Indian programmers or mathematicians having a special affinity for music, apart from some being passionate listeners. Still, the code-and-music analogy is illuminating in that both practices prize harmonious pattern-making and abhor cacophony, a loss of clarity and structure. The snarl in the dependency diagram (figure 6.1) may strike the civilian as a pretty picture, with its swirl of lines and punctuating sparks of gray; to the programmer, it is an abomination because it speaks of incoherence, incomprehensibility, unpredictability, sticky seams of connection that prevent swift diagnosis and make excision and replacement all but impossible.
With his emphasis on programmer happiness, Matz makes explicit his allegiance to Donald Knuth’s literate programming. He writes:
Programs share some attributes with essays. For essays, the most important question readers ask is, “What is it about?” For programs, the main question is, “What does it do?” In fact, the purpose should be sufficiently clear that neither question ever needs to be uttered … Both essays and lines of code are meant – before all else – to be read and understood by human beings. 3
The trouble of course is that as software programs grow bigger and more complex, the code they comprise tends to become unreadable and incomprehensible to human beings. Programmers like to point out that if each line of code, or even each logical statement (which may spread to more than one physical line), is understood to be a component, software systems are the most complicated things that humans have ever built: the Lucent 5ESS switch, used in telephone exchanges, derives its functionality from a hundred million lines of code; the 2008 Fedora 9 distribution of Linux comprises over two hundred million lines of code. 4 No temple, no cathedral has ever contained as many moving parts. So if you’ve ever written code, you understand in your bones the truth of Donald Knuth’s assertion, “Software is hard. It’s harder than anything else I’ve ever had to do.” 5 If you’ve ever written code, the fact that so much software works so much of the time can seem profoundly miraculous.
Software is complicated because it tries to model the irreducible complexity of the world. Even a simple software requirement for a small company that, say, provides secretarial services for the medical insurance industry—“We need an application that makes it easier for our scribes to write up reports from doctors’ examinations of insurance claimants”—will always reveal a swirling hodgepodge of exceptions and special cases. Some of the doctors will have two addresses on file, some will have three, and this one was on Broadway until January 22 and in the East Village afterwards. A report will always begin with a summary of the patient’s claimed condition, unless it’s being written for Company X, which wants a narration of the doctor’s exam up front. There are four main types of boilerplate text for the doctor’s conclusions, but there needs to be a “freeform” option, and room for other templates, but creation of new templates needs to be restricted to certain users. And so on, and on, and on.
The program you create in response to these requirements must reduce repetitive labor, automate the work that must be done each time, yet remain flexible enough to allow variation. The business practices that can be formalized into sets of procedures, into Grace Hopper’s dinner recipes – first do “a,” then “b,” then “c”—are easy to convert into code. In fact, the software practice that I learned in the eighties was called “procedural programming”—you wrote a program as a series of procedures that you called in sequence. Your job as a programmer was to “chunk” the system you were trying to model into clean, self-contained actions, and then construct more complex parts out of these simple elements. So, if you want to write a new report, what you do is: retrieve the doctor’s address, retrieve the patient’s information, create a new file. Or, in (pseudo) code: RetrieveDoctorsAddress(), RetrievePatientInfo(), CreateNewFile(). And these procedures would be called from BeginNewReport(). Which might be called from ShowApplicationMainMenu().
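The shape of such a program, sketched in the same Java-like pseudocode as the snippets in this chapter (the procedure names come from the text; the bodies are elided), is a simple chain of calls:

void ShowApplicationMainMenu() {
    // ... display the choices, read the user's selection ...
    BeginNewReport();
}

void BeginNewReport() {
    // Each step is one self-contained procedure, called in sequence.
    RetrieveDoctorsAddress();
    RetrievePatientInfo();
    CreateNewFile();
}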
To engage in this kind of assemblage of functionality is bracingly logical and orderly; you feel like you are making a perfect little machine, a clean, comprehensible automaton that you have set in motion. But soon, as you adapt your procedural engine for the exceptions, for all the variations that exist in the real world, you find yourself snarled in squirming thickets of if-then-else constructs, each of which contains yet other if-elseif and switch-case monsters, and you find that you have to break out of your beautiful Report-Main-Body loop and backtrack to other reports to retrieve history, and then, inevitably your procedures become more complex and start doing two things instead of one, RetrievePatientInfo() is now doing the retrieving but is also checking for valid addresses, you know that functionality should be somewhere else but you don’t have the time to bother, the users ask for a new feature and you patch it in, and of course you mean to come back later and clean everything up, but then, before you know it, you are trapped inside an unwholesome, uncontrollable atrocity, a Big Ball of Mud: “[a] haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle.” 6
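What that decay looks like on the page might be something like this hypothetical sketch (only RetrievePatientInfo() is named in the text; every other name is invented):

void RetrievePatientInfo() {
    // Originally: fetch the patient record, nothing else.
    Patient p = LoadPatientRecord();
    // Validation crept in here because there was no time to put it
    // where it belonged.
    if (p.address == null) {
        p.address = LookUpAddressFromOldReports(p);
    } else if (!IsValidAddress(p.address)) {
        FixAddressFormatting(p.address);
    }
    // A special case patched in for one client, never cleaned up.
    if (p.company.equals("Company X")) {
        MoveExamNarrationToFront(p);
    }
    // ... and on, and on, and on ...
}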
Often, it is not the lack of programming skill that leads to the emergence of a Big Ball of Mud, but something akin to the time-honored Indian practice of jugaad. Jugaad is Hindi for a creative workaround, a working improvisation that is built in the absence of resources and under pressure of time (from the Sanskrit yukti, trick, combination, concatenation). There can be something heroic about jugaad, as in the strange-looking trucks one sees bumping down country roads in rural India, which on closer examination turn out to be carts with diesel irrigation pumps strapped on; or the amphibious bicycles buoyed by improvised air floats and powered by blades taken from ceiling fans. Jugaad makes do, it gets work done, it maneuvers around uncooperative bureaucracies, it hacks. In recent years, jugaad has been recognized as down-to-earth creativity, as a prized national resource, and has acquired the dignifying sobriquet of “frugal engineering.”
In software, repeated applications of excessively frugal engineering by a series of programmers leads to a scheme that has no discernible structure, within which components use each other’s functionality promiscuously, so that the logic of the program becomes hard or impossible to follow. In a Big Ball of Mud – and yes, it is a technical term of art – effects flow across boundaries, so that introducing a small change to one piece of code results in unpredictable behavior in distant parts of the system. Software needs maintenance: bugs need to be fixed, new features are demanded by users. If you are the programmer asked to go into the depths of a Big Ball of Mud, the prospect is terrifying. I mean that quite literally; as you poke and prod into the innards of a badly written program on which users depend, you are often beset by paralyzing dread. How can you fix something you can’t understand? What if your fix introduces new bugs that reveal themselves in some future disaster which corrupts and loses data? The impulse then is to rewrite the whole program from the bottom up, in accordance with hard-won principles of good program design. But – often there is no budget for a complete rewrite, there is no time, there isn’t enough manpower. So maybe you patch a bit here, work in a clumsy kludge there – jugaad! Hundreds of programmers may have worked on such a program over the years, each contributing a little to the mess. So you add your handful of mud to the Big Ball, or maybe you just back away carefully and leave the damn thing alone. There are some areas of code in running programs that may as well be marked Here Be Dragons, and there are some programs that have run for decades – at universities, corporations, banks – that cannot be efficiently maintained or enhanced because nobody completely understands how they work.
For instance: the US Pentagon’s Defense Finance and Accounting Service (DFAS) is known to make “widespread” errors in paying soldiers’ salaries, and is slow to correct these mistakes when challenged. The software that the Pentagon uses for payroll and accounting comprises about seven million lines of COBOL code, mostly written in the sixties. The system hasn’t been updated in more than a dozen years, and significant portions of the code have been “corrupted”—long-running systems can suffer from “software entropy” or “code rot,” a slow deterioration in functionality because of the changing hardware or software environments they run within. A retired Pentagon employee reported that the system is “nearly impossible” to update because its documentation disappeared long ago: “It’s hard to make a change to a program if you don’t know what’s in there.” 7
When a situation like this becomes desperate enough, the powers-that-be may employ skilled “code archaeologists” to spelunk into the depths. Or pony up for a complete rewrite. Mostly, managers prefer to plug up the holes and leave the Big Balls of Mud to roll on. COBOL, a language first introduced in 1959 by Grace Hopper (“Grandma COBOL”), still processes 90 percent of the planet’s financial transactions, and 75 percent of all business data. 8 You can make a comfortable living maintaining code in languages like COBOL, the computing equivalents of Mesopotamian cuneiform dialects. These ancient applications – too expensive to replace, sometimes too tangled to fix or improve – run on, serving up the data that appears on the chromed-up surface of your browser, which gives you the illusion that your bank and your local utility companies live on the technological cutting edge. But as always, the past lives on under the shiny surface of the present, and often, it is too densely tangled to comprehend.
The International Obfuscated C Code Contest annually awards recognition to the writer of “the most Obscure/Obfuscated C program”—that is, to the person who can produce the most incomprehensible working code in the language C. 9 The stated pedagogical aim of the contest is “to show the importance of programming style, in an ironic way.” 10 But it has always seemed to me that confronting unfathomable code is the programming equivalent of staring at the abject, of slowing down to peer into the carnage of a car wreck. This is the reason that programmers expend time and effort in designing esoteric, purposely difficult computer languages like the infamous “brainfuck”—that really is its official name, with the lowercase b—which was created as an exercise in writing the smallest possible compiler (240 bytes) that could run on the Amiga operating system. “Hello, world!” in brainfuck is:
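++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.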
brainfuck is a “Turing tarpit,” which is to say it is a very small language in which you can write any program that you could write in C or Java; but attempting to do so would, well, fuck your brain, and therefore the delectable frisson of terror the code above induces in discerning code cognoscenti. brainfuck is venerable and famous, but my favorite esoteric language is Malbolge, which was designed solely to be the most outrageously difficult language to program in. It is named, appropriately, after the eighth circle of hell in Dante’s Inferno, Malebolge (“Evil ditches,” reserved for frauds). In the language Malbolge, the result of any instruction depends on where it is located in memory, which effectively means that what specific code does changes with every run. Code and data are very hard to reuse, and the constructs to control program flow are almost nonexistent. 11 Malbolge inverts the sacred commandments of literate programming, and is so impenetrable that it took two years after the language was first released for the first working program to appear, and that program was not written by a human, but generated by a computerized search program that examined the space of all possible Malbolge programs and winnowed out one possibility. “Hello, world!” in Malbolge is:
And yet, this snippet doesn’t convey the true, titillating evil of Malbolge, which changes and quakes like quicksand. To contemplate Malbolge is to stare into the abyss in which machines speak their own tongues, indifferent to the human gaze; the programmer thereafter knows the pathos of her situation, and recognizes the costs of sacrilege. The coder’s quest is for functionality—“all computer programs are designed to accomplish some kind of task”—and the extension and maintenance of that functionality demands clarity and legibility. Illegibility, incomprehensibility – that way madness lies.
The desire for lucidity as well as power, and therefore the maximization of productivity and happiness among programmers, has – according to some accounts – created more than eight thousand computer languages over the last half-century. 12 Each of these artificial, formal languages embodies a philosophy of human-computer interaction; whole family trees of dialects have evolved from the hacker’s eternal desire to improve, to implement better. Adherents of one language will criticize another’s choice with the fierce religiosity of those who are convinced that they are completely rational. The computer scientist Edsger Dijkstra, for instance, didn’t hold back his feelings in a famous 1975 memo: FORTRAN was an “infantile disorder”; PL/I was a “fatal disease”; programmers exposed to BASIC were “mentally mutilated beyond hope of recognition”; use of COBOL “cripples the mind; its teaching should, therefore, be regarded as a criminal offence”; APL was a “mistake, carried through to perfection.” 13
In any given year, there are a couple of languages that are cool – at the time of this writing, all the really hip kids are learning Clojure. And there are certain languages that smell of terminal uncoolness: users of Microsoft’s Visual Basic are regarded by many as the dorks of the computing world, the most Mort-ish of the Morts. Like all its predecessors in the long “Basic” lineage of languages, which goes back to 1964, Visual Basic has an English-like syntax. With its first release in 1991, Visual Basic provided a relatively easy way for non-experts to build programs for Windows; precisely for these reasons, it has attracted the contempt of the math-inclined Einsteins, who despise its verbosity and blame its accessibility for the plague of terrible programs that still afflicts Windows.
The evolution of computer languages also reflects the development of computer science and the craft of programming. Procedural programming called for the decomposition of complexity into simpler procedures which were then called sequentially. But this method was clearly inadequate: Big Balls of Mud were being built everywhere, software projects overshot budgets and failed at an alarming rate, there was an ever-present air of crisis. In the mid-eighties, a newly popular method, “Object Oriented Programming,” offered salvation. OOP – first conceived in the nineteen sixties – uses “objects,” code constructions that contain both data (attributes that describe the object) and behavior (methods or procedures that you can call to effect changes); objects interact by passing messages to each other. In procedural programming, information and functionality were scattered all over the place; a doctor’s addresses might be in a database, the knowledge of how to use these addresses in a procedure somewhere, and the ability to create new addresses in yet another procedure elsewhere. With OOP, the promise was that you could neatly encapsulate both data and behavior inside objects, and then use these objects to faithfully model the stuff of the real world, which of course is full of objects. Using OOP, you could write code like this to create an object representing a new doctor and store his phone number:
Doctor drKumar = new Doctor();
drKumar.PhoneNumber = "5105550568";
Here, “drKumar” is an object of the type “Doctor”; one of the data fields of this object is “PhoneNumber.” You may have another type called “Patient,” which might have a data field called “Diagnosis.” So with objects, when you needed the good Dr. Kumar’s address, you didn’t have to go searching through a master list of addresses. You could just retrieve the drKumar object and write something like the following to assign Dr. Kumar’s address to the “MailingAddress” field of the report the user wanted to print:
report.MailingAddress = drKumar.Address;
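Behavior lives inside the object alongside its data. A minimal hypothetical sketch of the Doctor type, in the same idiom as the snippets above:

class Doctor {
    // Data: attributes that describe the object.
    public String PhoneNumber;
    public String Address;

    // Behavior: a method that operates on the object's own data.
    public boolean HasValidPhoneNumber() {
        return PhoneNumber != null && PhoneNumber.length() == 10;
    }
}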
The dawn of OOP was a heady time. I imagined shiny metallic objects rising out of the primal Big Ball of Mud, pristine and clean, shooting messages at each other like bolts of energy. The rest of the programming world seemed equally excited: hundreds of books were written, OOP conferences were held, some programmers became superstar OOP gurus who charged large sums to teach the secrets of the OOP way of thinking. Every prominent programming language now acquired object orientation, if only as a possible method among others. And really, programming in this new idiom made it possible to solve certain problems with ease and a degree of elegance.
Yet – Big Balls of Mud continued to be born, to grow. Software projects continued to crash and burn, ruining budgets and careers. If OOP managed to avoid the inevitable snarls of procedural programming, it introduced new kinds of complexity. Even small systems now divided functionality into hundreds and thousands of objects. Thinking about all these pieces and layers and their interactions became increasingly difficult; subtle bugs arose from what felt like spooky action at a distance – entanglements between objects emerged from their whirling dance. In defense, programmers codified best practices into a smorgasbord of easily abbreviated rubrics: Separation of Concerns, Single Responsibility Principle, Don’t Repeat Yourself, Liskov Substitution Principle. Automated testing of each independent section of code became more crucial than ever, first to ensure that the code was actually doing what you wanted, and later to be sure that it was still doing what you wanted after you made changes in some other section of code – effects sometimes flowed unpredictably from one region to another.
In fact, adherents of Test-Driven Development (TDD) would have you write the tests before you ever write the code – that is, you write DoesMyAddTwoNumbersFunctionReallyReturnTheRightSum() before you write AddTwoNumbers(), thus forcing you to design AddTwoNumbers() to be easily testable. Usually, you write several tests for each section of code, checking for correct behavior under varying conditions, and so the lines of test code can easily outnumber the lines of program code by orders of magnitude. The open-source database SQLite, at the time of this writing, has 1,177 times the amount of test code as it does program code. 14 Most non-programmers have never heard of SQLite, but it is the most widely deployed database in the world. 15 SQLite is a tiny program. It runs within your Firefox browser, storing your bookmarks; it is used widely within the Mac operating system; it runs within each copy of Skype; it runs on your smartphone, storing contacts and appointments. SQLite’s vast suite of tests is an attempt to prevent bugs from creeping into a program that has become an essential, foundational component of the working memory of humanity.
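To make the test-first idea concrete, here is a hypothetical JUnit-flavored sketch built around the names in the text (everything else is invented for illustration):

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class ArithmeticTest {
    // Written first: the test specifies the behavior we want,
    // and checks it under more than one condition.
    @Test
    public void DoesMyAddTwoNumbersFunctionReallyReturnTheRightSum() {
        assertEquals(5, AddTwoNumbers(2, 3));
        assertEquals(0, AddTwoNumbers(-3, 3));
    }

    // Written second, shaped by the test: small, pure, easy to test.
    public static int AddTwoNumbers(int a, int b) {
        return a + b;
    }
}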
Programmers work doggedly toward correctness, but the sheer size and complexity of software ensures that bugs lurk within. A bug is, of course, a flaw or fault in a program that produces unexpected results. In 1986, the award-winning researcher and academic Jon Bentley published a book that is now widely regarded as a classic, Programming Pearls. One of the algorithms he implemented was for binary search, a method of finding a value in a sorted array, which was first published in 1946. In 2006, decades after the publication of Bentley’s book – by which time his particular implementation had been copied and used many thousands of times – one of his erstwhile students, Joshua Bloch, discovered that under certain conditions this technique could manifest a bug. 16 Bloch published his finding under a justly panic-raising headline, “Extra, Extra – Read All About It: Nearly All Binary Searches and Mergesorts are Broken.” He wrote:
The general lesson that I take away from this bug is humility: It is hard to write even the smallest piece of code correctly, and our whole world runs on big, complex pieces of code.
Careful design is great. Testing is great. Formal methods are great. Code reviews are great. Static analysis is great. But none of these things alone are sufficient to eliminate bugs: They will always be with us. A bug can exist for half a century despite our best efforts to exterminate it. 17
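The bug itself is tiny, which is Bloch’s point. The classic Java implementation computes the midpoint of the search range like this (a reconstruction of the well-known flaw, not Bentley’s exact code):

int mid = (low + high) / 2;
// When low and high are both large (an array of more than about a
// billion elements), low + high overflows Java's 32-bit int; mid
// goes negative, and the array access blows up.

The fix Bloch recommended keeps the arithmetic within bounds:

int mid = low + (high - low) / 2;
// The subtraction cannot overflow when 0 <= low <= high, so mid is
// always a valid index between low and high.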
That software algorithms are now running our whole world means that software faults or errors can send us down the wrong highway, injure or kill people, and cause disasters. Every programmer is familiar with the most infamous bugs: the French Ariane 5 rocket that went off course and self-destructed forty seconds after lift-off because of an error in converting between representations of number values; the Therac-25 radiation therapy machine that reacted to a combination of operator input and a “counter overflow” by delivering doses of radiation a hundred times more intense than required, resulting in the agonizing deaths of five people and injuries to many others; the “Flash Crash” of 2010, when the Dow Jones suddenly plunged a thousand points and recovered just as suddenly, apparently as a result of automatic trading programs reacting to a single large trade.
These are the notorious bugs, but there are bugs in every piece of software that you use today. A professional “cyber warrior,” whose job it is to find and exploit bugs for the US government, recently estimated that “most of the software written in the world has a bug every three to five lines of code.” 18 These bugs may not kill you, but they cause your system to freeze, they corrupt your data, and they expose your computers to hackers. The next great hope for more stable, bug-free software is functional programming, which is actually the oldest paradigm in computing – it uses the algebraic techniques of function evaluation used by the creators of the first computers. In functional programming, all computation is expressed as the evaluation of expressions; the radical simplicity of thinking about programming as only giving input to functions that produce outputs promotes legibility and predictability (a small sketch of this style follows the quotation below). There is again the same fervent proselytizing about functional programming that I remember from the early days of OOP, the same conviction that this time we’ve discovered the magic key to the kingdom. Functional languages like Clojure conjure up the clean symmetries of mathematics, and hold forth the promise of escape from all the jugaadu workarounds that turn so much code into a gunky, biological-seeming mess. In general, though, programmers are now skeptical of the notion that there’s any silver bullet for complexity. The programmer and popular blogger Steve Yegge, in his foreword to a book called The Joy of Clojure, describes the language as a “minor miracle” and “an astoundingly high-quality language … the best I’ve ever seen,” but he also notes that it is “fashionable,” and that
our industry, the global programming community, is fashion-driven to a degree that would embarrass haute couture designers from New York to Paris … Fashion dictates the programming languages people study in school, the languages employers hire for, the languages that get to be in books on shelves. A naive outsider might wonder if the quality of a language matters a little, just a teeny bit at least, but in the real world fashion trumps all. 19
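Fashion aside, the flavor of the functional style is easy to demonstrate. Here is a hypothetical sketch in Java’s stream idiom (not Clojure, and not from any source): the whole computation is a single expression, a pure function from input to output, with nothing mutated along the way.

import java.util.List;

static int sumOfSquaresOfEvens(List<Integer> xs) {
    return xs.stream()
             .filter(x -> x % 2 == 0)   // keep the even numbers
             .map(x -> x * x)           // square each one
             .reduce(0, Integer::sum);  // fold them into a single value
}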
In respect to programming languages and techniques, the programming industry has now been through many cycles of faith and disillusionment, and many of its members have acquired a sharp, necessary cynicism. “Hype Cycle” – a phrase coined by the analysts at Gartner, Inc. – adroitly captures the up-and-down fortunes of many a tech fad. 20
The tools and processes used to manage all this complexity engender another layer of complexity. All but the simplest programs must be written by teams of programmers, each working on a small portion of the system. Of course these people must be managed, housed, provided with equipment, but also their product – the code itself – must be distributed, shared, saved from overwriting or deletion, integrated, and tested.
Entire industries have grown around these necessities. Software tools for building software – particularly “Integrated Development Environments,” applications used to write applications – are some of the most complex programs being built today. They make the programmer’s job easier, but the programmer must learn how to use them, must educate herself in their idiosyncrasies and the workarounds for their faults. This is not a trivial task. For example, every programmer needs to use a revision control system to track changes and easily branch and merge versions of code. The best-regarded revision control system today is Git, created by Linus Torvalds (and named, incidentally, after his famous cantankerousness). 21 Git’s interface is command-line driven and famously UNIX-y and complex, and for the newbie its inner workings are mysterious. In response to a blog post titled “Git Is Simpler Than You Think,” an irritated Reddit commenter remarked, “Yes, also a nuclear submarine is simpler than you think … once you learn how it works.” 22 I myself made three separate attempts to learn how Git worked, gave up, was frustrated enough by other revision control systems to return, and finally had to read a 265-page book to acquire enough competence to use the thing. Git is immensely powerful and nimble, and I enjoy using it, but maneuvering it felt – at least initially – like a life achievement of sorts.
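For a sense of what “branch and merge” means at Git’s command line, the everyday workflow is only a handful of commands (standard Git; the branch and file names here are invented):

git checkout -b fix-address-lookup    # create a new branch and switch to it
git add report_builder.rb             # stage an edited file
git commit -m "Fix address lookup"    # record a snapshot on the branch
git checkout master                   # switch back to the main line
git merge fix-address-lookup          # weave the branch's history back in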