5 more timeless lessons of programming 'graybeards'
Alas, the computer industry has a strange, cultish fascination with new technologies, new paradigms, and of course, new programmers. It’s more fascination than reality because old tech never truly dies. Old inventions like the mainframe may stop getting headlines, but they run and run. As I write this, Dice shows more than five times as many job postings for the keyword "Cobol" (522) as for "OCaml," "Erlang," and "Haskell" combined (11, 52, and 27, respectively).
The stories of age discrimination are common, as are the rationalizations. Younger programmers’ heads aren’t filled with old ideas, so they learn faster. Whippersnappers are more focused and diligent. They don’t suffer distractions, like having families, or at least their distractions keep them yoked to their PCs and smartphones.
Even if these are true -- there’s evidence they aren’t -- programming geezers have valuable wisdom you can’t absorb simply by watching a TED talk on YouTube or fast-forwarding through a MOOC. They understand better how computers work because they had to back when computers had front panels with switches. They didn’t have the layers of IDEs, optimizing compilers, and continuous integration to save their bacon. If they didn’t build it right from the beginning, it wouldn’t run at all. The young punks won’t know this for years.
Our last story on “7 timeless lessons of programming ‘graybeards’” generated many responses, so we’re back with five more lessons everyone should learn, or relearn, from their wizened, hardened colleagues.
Most people younger than 50 can’t recognize a statement like mov ah, 09h or cmp eax, ebx. Many probably think that computers naturally demand lots of curly brackets because the major languages use them to delimit blocks of code. Even those who understand that languages like Java or C must be translated into binary often have little to no experience crafting it.
Many older programmers spent their days writing assembler code, the human-readable version of raw binary machine code. Some could assemble it by hand, translating the mnemonics into hexadecimal bytes. The very best could then flip the toggle switches on the front panel to program the computers.
It’s not that writing assembler is great or essential. It’s a long slog filled with repetition and lots of opportunities to make sloppy mistakes. The compilers have become good enough to recognize complex patterns that can be optimized; in fact, some compiler creators like to brag that they can create better code than humans can.
That may be true, but the advantage of learning even a sliver of assembler is that you understand how a computer works. The higher-order languages may offer lots of quick shortcuts for standard operations, such as concatenating strings, but these can be a trap because programmers start to think that the plus operator (“+”) takes the same amount of time whether it’s adding two integers or concatenating two strings. It doesn’t. One operation takes dramatically longer, and people who understand assembly code and the way the JMP (jump) operation works are going to make the right decision.
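To make that concrete, here is a minimal Java sketch (the class name and loop size are arbitrary) contrasting the same plus sign applied to integers and to strings. The integer version is a single cheap operation per pass; the string version quietly copies everything built so far on every pass, which is why a StringBuilder is the usual workaround.

// Sketch: the same "+" symbol hides very different costs.
public class PlusIsNotPlus {
    public static void main(String[] args) {
        int n = 20_000; // arbitrary size, just for illustration

        // Integer addition: one cheap machine operation per "+".
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }

        // String "+" in a loop: each pass copies the entire string so far,
        // so the total work grows roughly with n squared.
        String slow = "";
        for (int i = 0; i < n; i++) {
            slow += "x";
        }

        // The usual fix: append into one buffer and build the string once.
        StringBuilder fast = new StringBuilder();
        for (int i = 0; i < n; i++) {
            fast.append("x");
        }

        System.out.println(sum + " " + slow.length() + " " + fast.length());
    }
}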
Understanding how objects are packed in memory and loaded into the CPU when necessary is a big help in minimizing the copying and overcalculation that can produce slow code. Folks who grew up on assembler may not remember much about writing x86 code, but they still have instincts that tingle when they start to do something inherently slow. The whippersnappers don’t have these instincts, unless they train themselves through experience.
A long time ago, a programmer told me he hated Unix. Why? He started out programming single-user microcomputers like the Altair or the Sol 20, which only ran one block of code at a time.
“A Unix computer will start running something else at any time,” he told me. “You’ll hear the floppy disks start up and you’ll have no idea why.”
This upset him because he was losing a powerful means of understanding what the computer is doing. No one really knows what’s going on in a modern computer. There are countless layers of software running on four or eight cores. Viruses and worms can live forever without the user noticing the lag.
Old programmers still watch for visual and auditory clues that help them understand and debug the code. They watch the light on the RJ-45 Ethernet jack that flickers when data is flowing. They listen to the hard disk and can hear when the disk starts to change tracks, an indication that something is either reading from or writing to the disk. The really good ones can tell the difference between the paging that happens when memory is full and the sustained reading and writing that’s part of indexing.
The value of these clues is fading as hard disks are replaced with solid-state drives and more and more data moves wirelessly instead of through routers with blinking lights. But as long as smartphones have little indicators that show when data is flowing, there will be value in sleuthing skills like these.
In the good old days, the programmers would pack as many as eight different Boolean values into one byte. They flipped the individual bits because they didn’t want to waste any of them.
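For anyone who never had to count bits, here is a small Java sketch of the old trick; the flag names and class name are invented for illustration. Bitwise operators pack, set, clear, and test separate true/false values inside a single byte.

// Sketch: packing several boolean flags into one byte, old-school style.
public class PackedFlags {
    static final int BOLD      = 1;      // bit 0
    static final int ITALIC    = 1 << 1; // bit 1
    static final int UNDERLINE = 1 << 2; // bit 2
    static final int HIDDEN    = 1 << 3; // bit 3
    // ...four more bits still free in the same byte

    public static void main(String[] args) {
        byte flags = 0;

        flags |= BOLD | UNDERLINE;            // set two flags at once
        flags &= ~UNDERLINE;                  // clear one flag

        boolean isBold = (flags & BOLD) != 0; // test a single flag
        System.out.println("bold? " + isBold + ", raw byte: " + flags);
    }
}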
The modern data structures are incredibly wasteful. XML is filled with tags with long names, and each has a matching closing tag with an extra slash. It’s not uncommon to see modern XML files that are more than 90 percent fluff added to meet strict parsing rules.
JSON is considered an improvement because it’s a bit smaller, but only because there are no closing tags -- just curly brackets. There are still too many quotation marks around all the keys and strings.
The good news is that modern compression algorithms can often squeeze much of the fat out of data structures. But they can never get all of it. The graybeards know how to avoid putting it in from the beginning. That’s why code like MS-DOS 3.0 could run fast and light within a partition of no more than 32MB. Notice the modifier: no more than. That’s 32 million bytes and the maximum size of the disk partition.
That detail from MS-DOS 3.0 dates from the mid-1980s, a time when the personal computer was already common and the computer revolution was well past its infancy. If you go back a bit further, the code from the 1970s was even leaner. The code from the 1960s was amazing.
The operations for testing and flipping bits weren’t merely novelties for early programmers; they were necessities. Some operations were so slow that programmers had to look for any advantage they could find. One of the best tricks was recognizing that dividing by two is equivalent to shifting a binary number one place to the right, just as dividing by 10 is the same as shifting a decimal number one place to the right.
Shifting all of the bits is a standard operation on CPUs, and it was often blazingly fast compared to basic division. The good programmers used this advantage to write faster code that didn’t need to wait for multiplication and division when a shift could do the same.
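Here is a minimal Java sketch of the trick, with arbitrary numbers, comparing plain division against the equivalent right shift. The shortcut only holds cleanly for non-negative values; an arithmetic shift on a negative number rounds differently than division does.

// Sketch: a right shift does the job of dividing by a power of two.
public class ShiftVersusDivide {
    public static void main(String[] args) {
        int value = 1_000_000;

        int half     = value / 2;       // plain division
        int halfFast = value >> 1;      // shift right by one bit

        int sixteenth     = value / 16;
        int sixteenthFast = value >> 4; // dividing by 2^4 is a shift by 4

        System.out.println(half + " == " + halfFast);
        System.out.println(sixteenth + " == " + sixteenthFast);
    }
}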
We’re losing the connection to powers of two. It used to be that designers would instinctively choose numbers that were powers of two because they led to greater efficiencies. Numbers like 512 or 4,096 appeared frequently because hardware limits that fall on a power of two are easier to work with.
On many early processors, some operations took much longer than others. On the original 8086, dividing a number took anywhere from 80 to 190 clock cycles, while adding two numbers took only three cycles. Even when the CPU could run at 5MHz, that could still make a big difference when doing the operation again and again.
Older programmers know that not every line of code or every instruction executes in the same amount of time. They understand that computation is not free and that not every operation is equivalent. Choose the wrong kind of operation and your machine will slow down dramatically.
People forget that choosing the wrong data type can also have consequences. Using a double or a long variable can still be slower on some chips. Using the wrong data structure can turn the program into sludge when you scale.
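As a rough illustration, here is a short Java sketch (the class name and sizes are invented) that runs the same membership test against a list and a hash set. Each lookup in the list scans element by element, so the total work balloons as the data grows; the hash set answers each lookup in near-constant time.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: the same membership test against two data structures.
public class WrongStructure {
    public static void main(String[] args) {
        int n = 20_000; // arbitrary size; the shape of the cost is the point

        List<Integer> list = new ArrayList<>();
        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
            set.add(i);
        }

        // Each contains() on the list is a linear scan,
        // so n lookups cost on the order of n * n steps.
        long listHits = 0;
        for (int i = 0; i < n; i++) {
            if (list.contains(i)) listHits++;
        }

        // Each contains() on the hash set is a near-constant-time probe,
        // so the same n lookups stay roughly linear overall.
        long setHits = 0;
        for (int i = 0; i < n; i++) {
            if (set.contains(i)) setHits++;
        }

        System.out.println(listHits + " " + setHits);
    }
}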
Too many youngsters think that computation is instantaneous and CPUs can do an infinite number of calculations in the blink of the eye. Their elders remember the slow CPUs that would putter along doing addition and seize up when asked to divide. All of the little details gathered over the years of hacking, debugging, and rehacking their code add up. The only way you get this knowledge is with time.