History of Compiler Design
Pritesh Pawar
Oct 20, 2021
In this blog I’ll trace the history of compilers in detail, along with a basic introduction to compilers and their optimization.
Let’s start with what a compiler is. A compiler is a computer program that translates a source program written in a high-level programming language (such as C or Fortran) into machine code for a particular computer architecture (such as the x86 architecture used by Intel and AMD processors). The generated machine code can then be executed many times, against different data each time. The most common reason to use a compiler is to transform source code into machine code and create an executable program.
First Compiler
Software for early computers was written primarily in machine code or in a low-level language such as assembly language. Higher-level programming languages were not invented until the benefits of being able to reuse software on different kinds of CPUs became significantly greater than the cost of writing a compiler. The very limited memory capacity of early computers also created many technical problems when implementing a compiler.
Machine-independent programming languages were first proposed towards the end of the 1950s, and several experimental compilers followed. Earlier, in 1951, Grace Hopper had written what is often called the first compiler, for the A-0 programming language. Hopper coined the term “compiler” for her A-0 system, which functioned as a loader or linker rather than as a compiler in the modern sense.
The first Autocode and compiler in the modern sense were developed by Alick Glennie in 1952 at the University of Manchester for the Mark 1 computer. The FORTRAN team led by John Backus at IBM is generally credited as having introduced the first complete compiler in 1957. COBOL was an early language to be compiled on multiple architectures, in 1960. In many application domains the idea of using a higher level language quickly caught on. Because of the expanding functionality supported by newer programming languages and the increasing complexity of computer architectures, compilers have become more and more complex.
Self-Hosting Compilers
Self-hosting compilers appeared early in the history of compiler development. A compiler is self-hosting when it is written in the programming language that it compiles. Building one is a bootstrapping problem: the first such compiler for a language must be handwritten machine code, compiled by a compiler written in another language, or compiled by running the compiler in an interpreter.
- Corrado Böhm’s PhD dissertation
Corrado Böhm developed a language, a machine, and a translation method for compiling that language on the machine in his PhD dissertation dated 1951. He not only described a complete compiler, but also, for the first time, defined that compiler in its own language. The language was interesting in itself, because every statement (including input statements, output statements and control statements) was a special case of an assignment statement.
- NELIAC
The Navy Electronics Laboratory International ALGOL Compiler, or NELIAC, was a dialect and compiler implementation of the ALGOL 58 programming language developed by the Navy Electronics Laboratory in 1958. NELIAC was the brainchild of Harry Huskey, then Chairman of the ACM and a well-known computer scientist, and was supported by Maury Halstead. The earliest version was implemented on the prototype USQ-17 computer (called the Countess) at the laboratory. It was the world’s first self-compiling compiler: the compiler was first coded in simplified form in assembly language, then rewritten in its own language and compiled by the bootstrap, and finally recompiled by itself, making the bootstrap obsolete.
- Lisp
Another early self-hosting compiler was written for Lisp by Tim Hart and Mike Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.
High-Level Languages for System Programming
Compiler technology evolved from the need for a strictly defined transformation of the high-level source program into a low-level target program for the digital computer. The compiler could be viewed as a front end to deal with the analysis of the source code and a back end to synthesize the analysis into the target code.
Optimization between the front end and back end could produce more efficient target code. Early operating systems and software were written in assembly language. In the 1960s and early 1970s, the use of high-level languages for system programming was still controversial due to resource limitations. However, several research and industry efforts began the shift toward high-level systems programming languages, for example, BCPL, BLISS, B, and C.
- BCPL (Basic Combined Programming Language), designed in 1966 by Martin Richards at the University of Cambridge, was originally developed as a compiler-writing tool. Several compilers have been implemented, and Richards’ book provides insights into the language and its compiler. BCPL was not only an influential systems programming language that is still used in research but also provided a basis for the design of the B and C languages.
- BLISS (Basic Language for Implementation of System Software) was developed for a Digital Equipment Corporation (DEC) PDP-10 computer by W.A. Wulf’s Carnegie Mellon University (CMU) research team. The CMU team went on to develop the BLISS-11 compiler one year later, in 1970.
- Multics (Multiplexed Information and Computing Service), a time-sharing operating system project, involved MIT, Bell Labs, and General Electric (later Honeywell) and was led by Fernando Corbató from MIT. Multics was written in the PL/I language developed by IBM and the IBM User Group.
Compiler Construction
A compiler implements a formal transformation from a high-level source program to a low-level target program. Compiler design can define an end-to-end solution or tackle a defined subset that interfaces with other compilation tools e.g. preprocessors, assemblers, linkers. Design requirements include rigorously defined interfaces both internally between compiler components and externally between supporting toolsets.
In the early days, the approach taken to compiler design was directly affected by the complexity of the computer language to be processed, the experience of the person(s) designing it, and the resources available. Resource limitations led to the need to pass through the source code more than once.
A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. However, as the source language grows in complexity the design may be split into a number of interdependent phases. Separate phases provide design improvements that focus development on the functions in the compilation process.
One-pass versus multi-pass compilers
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing much work and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.
The ability to compile in a single pass has classically been seen as a benefit because it simplifies the job of writing a compiler and one-pass compilers generally perform compilations faster than multi-pass compilers. Thus, partly driven by the resource limitations of early systems, many early languages were specifically designed so that they could be compiled in a single pass (e.g., Pascal).
The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyze one expression many times but only analyze another expression once.
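One concrete reason a translator may need more than one pass is a forward reference: a name that is used before it is defined. Here is a minimal sketch in Python, assuming a made-up toy instruction format in which a line ending in `:` defines a label. The function name `assemble` and the instruction names are hypothetical and chosen only for illustration.

```python
def assemble(lines):
    """Resolve symbolic jump targets in two passes over a toy program."""
    # Pass 1: record the instruction index at which every label is defined.
    labels = {}
    instructions = []
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        elif line:
            instructions.append(line.split())

    # Pass 2: replace label operands with the addresses found in pass 1.
    resolved = []
    for op, *args in instructions:
        args = [labels.get(arg, arg) for arg in args]
        resolved.append((op, *args))
    return resolved


program = [
    "jmp end",   # forward reference: 'end' is only defined two lines later
    "push 1",
    "end:",
    "halt",
]
result = assemble(program)
print(result)
```

When the first pass reads `jmp end` it cannot yet know where `end` is, so the address is filled in by the second pass. A one-pass design has to work around this, for example by requiring declarations before use or by patching addresses after the fact.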
Three-stage compiler structure
Compilers bridge source programs in high-level languages with the underlying hardware. A compiler must 1) determine whether programs are syntactically correct, 2) generate correct and efficient object code, 3) organize the run-time environment, and 4) format its output according to assembler and/or linker conventions. A compiler consists of three main parts: the frontend, the middle-end, and the backend.
The frontend checks whether the program is correctly written in terms of the programming language syntax and semantics. Here legal and illegal programs are recognized. Errors are reported, if any, in a useful way. Type checking is also performed by collecting type information. The frontend then generates an intermediate representation or IR of the source code for processing by the middle-end.
The middle-end is where optimization takes place. Typical transformations for optimization are removal of useless or unreachable code, discovery and propagation of constant values, relocation of computation to a less frequently executed place (e.g., out of a loop), or specialization of computation based on the context. The middle-end generates another IR for the following backend. Most optimization efforts are focused on this part.
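Two of the middle-end transformations mentioned above, propagation of constant values and removal of useless code, can be sketched on a tiny intermediate representation. This is only an illustration under stated assumptions: the IR is a hypothetical list of `(dest, op, a, b)` tuples for straight-line code without branches or side effects, and the function names are made up for this example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold_constants(ir):
    """Propagate known constants and fold operations on them."""
    known = {}  # variable name -> constant value discovered so far
    out = []
    for dest, op, a, b in ir:
        # Substitute operands whose values are already known constants.
        a = known.get(a, a)
        b = known.get(b, b)
        if op == "const":
            known[dest] = a
            out.append((dest, "const", a, None))
        elif isinstance(a, int) and isinstance(b, int):
            known[dest] = OPS[op](a, b)
            out.append((dest, "const", known[dest], None))
        else:
            known.pop(dest, None)  # dest no longer holds a known constant
            out.append((dest, op, a, b))
    return out


def eliminate_dead_code(ir, live_out):
    """Drop assignments whose result is never read (backward scan)."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(ir):
        if dest in live:
            live.discard(dest)
            live.update(x for x in (a, b) if isinstance(x, str))
            kept.append((dest, op, a, b))
    kept.reverse()
    return kept


ir = [
    ("x", "const", 4, None),
    ("y", "const", 5, None),
    ("t", "*", "x", "y"),     # foldable: 4 * 5
    ("u", "+", "t", "n"),     # n is not known at compile time
    ("dead", "-", "x", "y"),  # result is never used
]
optimized = eliminate_dead_code(fold_constants(ir), live_out={"u"})
print(optimized)
```

After folding, `t` becomes the constant 20 and is substituted into the computation of `u`, so the assignments to `x`, `y`, `t`, and `dead` are no longer needed and the backward scan removes them. Real middle-ends work on control-flow graphs rather than straight-line code, which this sketch does not attempt.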
The backend is responsible for translating the IR from the middle-end into assembly code. Target instructions are chosen for each IR instruction, and register allocation assigns variables to registers. The backend makes good use of the hardware by figuring out how to keep parallel functional units busy, how to fill delay slots, and so on. Although most of these optimization problems are NP-hard, heuristic techniques for solving them are well developed.
Optimizing Compilers
“Programming Languages and their Compilers” by John Cocke and Jacob T. Schwartz, published early in 1970, devoted more than 200 pages to optimization algorithms. It included many of the now familiar techniques, such as redundant code elimination and strength reduction.
- Peephole optimization
Peephole optimization is a simple but effective optimization technique. It was invented by William M. McKeeman and published in 1965 in CACM. It was used in the XPL compiler that McKeeman helped develop.
- Capex COBOL optimizer
Capex Corporation developed the “COBOL Optimizer” in the mid-1970s for COBOL. This type of optimizer depended, in this case, upon knowledge of “weaknesses” in the standard IBM COBOL compiler, and actually replaced (or patched) sections of the object code with more efficient code. The replacement code might, for example, replace a linear table lookup with a binary search, or sometimes simply replace a relatively “slow” instruction with a known faster one that was functionally equivalent within its context. This technique is now known as “strength reduction”. For example, on IBM System/360 hardware the CLI instruction was, depending on the particular model, between two and five times as fast as a CLC instruction for single-byte comparisons.
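Both ideas in the list above, looking at a small window of instructions and swapping a slow instruction for a cheaper equivalent, can be sketched in a few lines of Python. The instruction set here (`store`, `load`, `add`, `mul`, `shl`, `ret`) is a hypothetical toy, not the instruction set of any real machine, and the sketch assumes that no jump lands on the `load` that gets removed.

```python
def peephole(code):
    """Rewrite short instruction windows until nothing changes."""
    code = list(code)
    changed = True
    while changed:
        changed = False
        out = []
        i = 0
        while i < len(code):
            ins = code[i]
            nxt = code[i + 1] if i + 1 < len(code) else None
            if ins[0] == "store" and nxt == ("load", ins[1], ins[2]):
                # Store followed by a load of the same location into the
                # same register: the register already holds the value.
                out.append(ins)
                i += 2
                changed = True
            elif ins[0] == "add" and ins[2] == 0:
                # Adding zero has no effect, so drop the instruction.
                i += 1
                changed = True
            elif ins[0] == "mul" and ins[2] == 2:
                # Strength reduction: multiply by 2 becomes a shift left by 1.
                out.append(("shl", ins[1], 1))
                i += 1
                changed = True
            else:
                out.append(ins)
                i += 1
        code = out
    return code


program = [
    ("mul", "r1", 2),
    ("store", "r1", "x"),
    ("load", "r1", "x"),
    ("add", "r1", 0),
    ("ret",),
]
optimized = peephole(program)
print(optimized)
```

The outer loop repeats because one rewrite can expose another. Each rule only ever looks at one or two neighbouring instructions, which is what keeps the technique simple.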
Modern compilers typically provide optimization options to allow programmers to choose whether or not to execute an optimization pass.
Conclusion
Compiler development began in the early 1950s: Grace Hopper wrote the first compiler in 1951 for her A-0 system, and the FORTRAN team delivered the first commercial compiler in 1957. Self-hosting compilers then developed rapidly, and by the 1970s plenty of them were available. Parsers were introduced in compilers along with parser generators such as XPL and Yacc. As more high-level languages appeared, new compilers were built around one-pass, multi-pass, and three-stage structures and around the idea of optimization. Work on optimizing compilers continues, with a great deal of research and development, and it has wide scope in the construction of new compilers.
Further Research and References
For the purpose of creating this blog, we referred to several sources, which can be found below. If you are interested in learning about compilers or keen to know their history, we’d encourage you to go through these references.
- “Digitized: The Science of Computers and how it Shapes Our World”, Oxford University Press, Peter J. Bentley, 2012
- “Computers Then and Now”, Journal of the Association for Computing Machinery, Maurice V. Wilkes, 1968
- “Compiler Construction before 1980”, Dick Grune, 2010
- “History of Compiler Construction”, Wikipedia Article
Compilers: A Brief History and How They Work
SkillMill
Feb 12, 2023
In the world of software development, compilers are essential tools that translate human-readable source code into machine-executable code. But have you ever wondered how these remarkable pieces of software came into being, or how they work their magic? In this post, we’ll explore the history of compilers and provide a high-level overview of their inner workings.
History of Compilers
The first high-level programming languages were developed in the 1950s; before then, programs were written in assembly language or machine language. These low-level languages were difficult to work with, requiring a deep understanding of the underlying hardware, and making programming a slow and error-prone process.
In the early 1950s, computer scientists began to work on a solution to this problem: a program that would translate high-level programming languages into machine language. The first successful compiler was created in 1952 by Grace Hopper, who is now recognized as one of the pioneers of computer science.
Hopper’s compiler was a breakthrough in the field of computer science, allowing programmers to write code in higher-level languages and significantly improving their productivity. Since then, compilers have become an essential tool for software development, used to translate a wide variety of programming languages into machine code.
How Compilers Work
Compilers take source code written in a high-level programming language and translate it into machine-executable code. The process can be broken down into three steps:
- Lexical Analysis: In this stage, the compiler scans the source code and breaks it down into a series of tokens, such as keywords, operators, and identifiers.
- Syntax Analysis: In this stage, the compiler analyzes the structure of the code and checks it against the rules of the programming language’s syntax, ensuring that the code is well-formed. The result is typically a syntax tree.
- Code Generation: In this stage, the compiler takes the tokens and the syntax tree generated in the previous stages and translates them into machine code.
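The three steps above can be sketched end to end for simple arithmetic expressions. This is a minimal illustration, not how a production compiler is organized: the grammar, the tuple-based syntax tree, and the stack-machine instructions (`push`, `load`, `add`, `sub`, `mul`, `div`) are all assumptions made up for this example, and the "machine code" is just a list of tuples.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>[-+*/()])|(?P<ws>\s+)"
)


def tokenize(src):
    """Lexical analysis: split the source text into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(tokens):
    """Syntax analysis: build a tree with a recursive-descent parser."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        kind, text = advance()
        if kind == "num":
            return ("num", int(text))
        if kind == "id":
            return ("var", text)
        if text == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {text!r}")

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return tree


def generate(node):
    """Code generation: emit stack-machine instructions for the tree."""
    kind = node[0]
    if kind == "num":
        return [("push", node[1])]
    if kind == "var":
        return [("load", node[1])]
    opcode = {"+": "add", "-": "sub", "*": "mul", "/": "div"}[kind]
    return generate(node[1]) + generate(node[2]) + [(opcode,)]


code = generate(parse(tokenize("a + 2 * (b - 1)")))
print(code)
```

Each function consumes the output of the previous one, which mirrors the stage-by-stage description. Malformed input such as `a + * 2` is rejected in `parse` with a `SyntaxError`, which is the "well-formed" check in miniature.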
Compilers are complex pieces of software, and a direct translation of the source code does not always produce efficient machine code. To improve the result, compilers use a technique called optimization. During optimization, the compiler analyzes the code to find ways to make it run more efficiently. For example, the compiler may look for code that can be simplified or eliminated, or it may reorder instructions to take advantage of the processor’s capabilities.
Conclusion
Compilers have played a critical role in the development of modern software, making it possible to write code in high-level programming languages that are easier to read, write, and maintain. The history of compilers is a testament to the creativity and ingenuity of the pioneers of computer science, who developed these powerful tools from scratch. As software development continues to evolve, compilers will remain an essential tool for translating high-level programming languages into machine code.
Compiling History: A brief tour of C compilers
Diego Crespo
Jan 04, 2024
As the story of C’s birth goes hand in hand with the creation of Unix, the first C compiler can be traced back to the early 1970s. I’ve detailed the history of C in my previous article Tracing the Lines: From the Telephone to Unix, which includes a brief summary of this history.
Around 1971, Ken Thompson decided that Unix needed to be ported to a higher-level language. Dennis Ritchie took on the task, evolving Ken’s B language into something more feature-rich. It was first called New B (NB), but each time Ken tried to rewrite the kernel in New B he would run into a roadblock. He would then ask Dennis to add more features. Eventually, after structures were invented, there were enough features that Ken could rewrite the whole Unix kernel in it for version 4 of Unix. After a new compiler was written for this new language, it was renamed C, and the rest is history. This was a significant breakthrough because, until then, kernels were written in assembly. For perspective, as late as 1983, Microsoft was still programming MS-DOS v2.0 in assembly. Unix was truly ahead of its time.
Since Unix was developed on the PDP-7 and then the PDP-11, it makes sense that the first compiler for C was created for the PDP-11. It is usually just referred to as the PDP C Compiler. The earliest known version of this compiler’s source code can still be viewed online, and is an interesting time capsule of computing history.
This was succeeded by the Portable C Compiler, developed by Stephen C. Johnson of Bell Labs, one of the first C compilers that could be readily adapted to generate code for different machine architectures. In Section 2.1 of Bjarne Stroustrup’s article titled Sibling Rivalry: C and C++, he details key aspects of this compiler and why it was so important:
Pre-ANSI C is often referred to as K&R C. However, that is slightly incorrect. The C described in [Kernighan,1978] lacks three features of the language used by almost all C programmers before the emergence of C89: void, enumerations, and structure assignment. These three features were added in PCC, the Portable C Compiler, developed by Steve Johnson and distributed as the C compiler by Bell Labs (with the “blessing” of Dennis Ritchie).
Adding void (used as a possible return type for functions only) allows a programmer to directly express that a function doesn’t return a value, and allows the compiler to check that. Similarly, adding enumerations allows a programmer to directly express that a group of values in some way belong together. It also supports the notion of manifest constants in a way that does not rely on macros.
Adding structure assignment (and also structure copy initialization, argument passing, and function return) makes struct values first-class citizens of C.
Thus, two of the three last additions to Classic C add to the expressive power of the type system without actually allowing a programmer to express any new computations. The third makes user-defined types, as then existing, equal to built-in types. In addition, one of the additions provides an alternative to the use of macros. These are all themes that recur in the design of C++.
The Portable C Compiler was distributed with version 7 of Unix, the last version of Unix released before it was commercialized. Its early-mover advantage, and the fact that it could be adapted to produce assembly for different architectures, meant that it enjoyed much success in the nascent years of C.
But it was not the only compiler to pop up during that time. The 1970s also saw the Small-C compiler created by Ron Cain. It implemented a minimalist subset of C and could run on 8-bit microcomputers. It’s hard to believe nowadays that computers at one point struggled to compile C code, or even a subset of it, but that was indeed the case. The PDP-11 that C was developed on was a 16-bit, broom-closet-sized computer that was still considerably more powerful than the 8-bit home computers of the day. This is often why programs from that era were written in assembly, BASIC, and Pascal instead of C.
Another commercial compiler to come out during this time was the Lattice C compiler, one of the first C compilers written for the IBM Personal Computer. It was created by Lifeboat Associates and retailed for $500 ($1,628 in today’s money), and it ran on PC-DOS and MS-DOS. Microsoft used it as the basis for their Microsoft C Compiler (MSC). Many other compilers were produced during this period, including the Mark Williams compiler, the Green Hills compiler, and the Aztec C compiler.
These developments, however, navigated a landscape devoid of an official C standard, leading to varied interpretations and implementations. They were based on the book “The C Programming Language” by Brian Kernighan and Dennis M. Ritchie, published on February 22, 1978. The eventual release of the standard, known as C89 or C90, brought much-needed uniformity and clarity to the language. The preface to the 2nd edition of the book, published in April 1988, highlights its importance:
The standard formalizes constructions that were hinted but not described in the first edition, particularly structure assignment and enumerations. It provides a new form of function declaration that permits cross-checking of definition with use. It specifies a standard library, with an extensive set of functions for performing input and output, memory management, string manipulation, and similar tasks. It makes precise the behavior of features that were not spelled out in the original definition, and at the same time states explicitly which aspects of the language remain machine-dependent.
With the publication of the standard, C became a much more consistent language to program in across environments.
Fast forward to the present, the GNU Compiler Collection (GCC) stands as a testament to the evolution of compilers, supporting not just multiple platforms, but also multiple languages.
One of my favorite things about computer lore is that many of the most instrumental people are still alive today, and we still have records of when they made history. This is no different for GCC, and we actually have the text Richard Stallman sent, introducing the GCC beta back in 1987.
Date: Sun, 22 Mar 87 10:56:56 EST
From: rms (Richard M. Stallman)
The GNU C compiler is now available for ftp from the file /u2/emacs/gcc.tar on prep.ai.mit.edu. This includes machine descriptions for vax and sun, 60 pages of documentation on writing machine descriptions (internals.texinfo, internals.dvi and Info file internals).
This also contains the ANSI standard (Nov 86) C preprocessor and 30 pages of reference manual for it.
This compiler compiles itself correctly on the 68020 and did so recently on the vax. It recently compiled Emacs correctly on the 68020, and has also compiled tex-in-C and Kyoto Common Lisp.
However, it probably still has numerous bugs that I hope you will find for me.
I will be away for a month, so bugs reported now will not be handled until then.
If you can't ftp, you can order a compiler beta-test tape from the Free Software Foundation for $150 (plus 5% sales tax in Massachusetts, or plus $15 overseas if you want air mail).
Free Software Foundation
1000 Mass Ave
Cambridge, MA 02138
This feels like computer archeology to me.
Today, GCC is more than just a C compiler. It’s a Compiler Collection, and it supports these programming languages: C, C++, Objective-C, Objective-C++, Fortran, Ada, D, and Go. It supports more platforms and CPU architectures than nearly any other compiler today, and it is still being actively developed.
But GCC isn’t the only cross platform industrial grade compiler on the block. LLVM provides a great experience as well, and benefits from decades of hindsight in compiler construction. It was created by Vikram Adve and his PhD student Chris Lattner at the University of Illinois at Urbana–Champaign in 2000. It started as a research project in December while Chris was on winter break. Over the course of the next year, Chris and Vikram continued to work on the compiler before publishing their first paper on it titled, Automatic Pool Allocation for Disjoint Data Structures. Though they didn’t know it at the time, they were making history. A lot had happened in the field of compilers by the early 2000s, and this allowed LLVM to enjoy many benefits.
但 GCC 并不是唯一的跨平台工业级编译器。LLVM 也提供了很好的体验,并受益于编译器构建方面数十年的后见之明。它由 Vikram Adve 和他的博士生 Chris Lattner 于 2000 年在伊利诺伊大学厄巴纳-香槟分校创建。它始于 12 月,当时 Chris 正在放寒假。在接下来的一年里,Chris 和 Vikram 继续研究编译器,然后发表了他们的第一篇论文,题为《不相交数据结构的自动池分配》。虽然他们当时并不知道,但他们正在创造历史。到 2000 年代初,编译器领域发生了很多事情,这使得 LLVM 享受了许多好处。
- LLVM IR
  Front-end developers only need to understand LLVM IR, how it works, and its invariants, which makes it easy to create new front ends for LLVM. Unlike the intermediate representations of other compilers such as GCC, LLVM IR is self-contained, so there is no need to manipulate complex data structures and global variables from other parts of the compiler.
  前端开发人员只需了解 LLVM IR、其工作原理和不变量,即可轻松为 LLVM 创建新的前端。与 GCC 等其他编译器不同,LLVM IR 是自包含的,无需从编译器的其他部分操作复杂的数据结构和全局变量。
- Modular Library-Based Design 基于库的模块化设计
  The LLVM infrastructure consists of loosely coupled libraries rather than a monolith. This includes the optimizer, so developers can choose and order optimization passes for their specific needs. Only the necessary optimization passes are linked into the final application, which keeps compile times down and avoids unnecessary bloat.
  LLVM 基础设施由松散耦合的库而不是整体组成,包括优化器,允许开发人员根据其特定需求选择和排列优化过程。只有必要的优化过程才会链接到最终应用程序中,从而优化编译时间并避免不必要的膨胀。
- Retargetable Code Generator 可重定向代码生成器
  The LLVM code generator transforms LLVM IR into target-specific machine code. It employs a modular approach with individual passes for instruction selection, register allocation, scheduling, code layout optimization, and assembly emission. This flexibility enables target-specific optimizations, such as register pressure reduction for x86 and latency optimization for PowerPC, without requiring a complete code generator rewrite.
  LLVM 代码生成器将 LLVM IR 转换为特定于目标的机器代码。它采用模块化方法,为指令选择、寄存器分配、调度、代码布局优化和汇编发射提供单独的过程。这种灵活性支持特定于目标的优化,例如降低 x86 的寄存器压力和 PowerPC 的延迟优化,而无需完全重写代码生成器。
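The idea of choosing and ordering passes can be sketched with a toy example in C++. This is not LLVM's API: `Inst`, `Function`, `Pass`, `fold_constants`, `remove_dead`, `run_pipeline` and `example` are all hypothetical names made up for illustration, and the "IR" is just a list of three kinds of instruction where each register is assigned at most once.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <vector>

// Toy instruction set (an assumption for this sketch, not LLVM IR):
//   Const: r[dst] = lhs (a literal)
//   Add:   r[dst] = r[lhs] + r[rhs]
//   Ret:   return r[lhs]
enum class Op { Const, Add, Ret };

struct Inst {
    Op op;
    int dst;
    int lhs;  // literal value for Const, register number otherwise
    int rhs;  // only used by Add
};

using Function = std::vector<Inst>;
using Pass = std::function<void(Function&)>;

// Pass 1: replace an Add of two known constants with a single Const.
void fold_constants(Function& f) {
    std::map<int, int> known;  // register -> constant value
    for (Inst& i : f) {
        if (i.op == Op::Add && known.count(i.lhs) && known.count(i.rhs)) {
            i = Inst{Op::Const, i.dst, known[i.lhs] + known[i.rhs], 0};
        }
        if (i.op == Op::Const) known[i.dst] = i.lhs;
    }
}

// Pass 2: drop instructions whose result is never read (backward scan).
void remove_dead(Function& f) {
    std::set<int> used;
    Function kept;
    for (auto it = f.rbegin(); it != f.rend(); ++it) {
        if (it->op == Op::Ret) {
            used.insert(it->lhs);
        } else if (used.count(it->dst) == 0) {
            continue;  // nobody reads this result, so skip the instruction
        } else if (it->op == Op::Add) {
            used.insert(it->lhs);
            used.insert(it->rhs);
        }
        kept.insert(kept.begin(), *it);
    }
    f = kept;
}

// The client, not the framework, decides which passes run and in what order.
void run_pipeline(Function& f, const std::vector<Pass>& passes) {
    for (const Pass& p : passes) p(f);
}

// r0 = 2; r1 = 3; r2 = r0 + r1; return r2
Function example() {
    return {{Op::Const, 0, 2, 0},
            {Op::Const, 1, 3, 0},
            {Op::Add, 2, 0, 1},
            {Op::Ret, -1, 2, 0}};
}
```

Calling `run_pipeline(f, {fold_constants, remove_dead})` on `example()` leaves only a constant and the return, whereas the reverse order leaves all four instructions in place, because nothing is dead until the add has been folded. A real pass manager is far more involved, but the point about pass selection and ordering is the same.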
And lastly, LLVM is 13 years younger than GCC and benefits from not having to support as many architectures. Together with its modular nature, this allows LLVM’s code base to be about 3 times smaller than GCC’s (1.6 million vs 5 million lines of code). While these numbers might seem staggering, they pale in comparison to the mighty Ford F150, which has 150 million lines of code, much of which is compiled with LLVM and GCC, I’m sure.
最后,它比 GCC 年轻 13 岁,并且受益于不必支持那么多架构。再加上 LLVM 的模块化特性,这使 LLVM 的代码库比 GCC 小约 3 倍(160 万行对 500 万行代码)。虽然这些数字可能看起来令人震惊,但与强大的福特 F150 相比,它们相形见绌,福特 F150 拥有 1.5 亿行代码,我敢肯定,其中大部分是用 LLVM 和 GCC 编译的。
But while LLVM might be newer, it still faces stiff competition from GCC. Both GCC and LLVM support many modern C and C++ standards, and both have a large suite of tools for working with their output. In his great article How Much Does a Compiler Cost?, Jeremy Bennett lists the many tools that these compilers bring, as well as the hundreds of thousands of lines of code it takes to create the supporting software suite.
但是,尽管 LLVM 可能较新,但它仍然面临来自 GCC 的激烈竞争。GCC 和 LLVM 都支持许多现代 C 和 C++ 标准,并且它们都有大量用于处理其输出的工具。在 Jeremy Bennett 的精彩文章中,题为编译器成本是多少?我们看到了这些编译器带来的许多工具,以及创建支持软件套件所需的数十万行代码。
- Debugger: either GDB (800k lines) or LLDB (600k lines)
  调试器:GDB(800k 行)或 LLDB(600k 行)
- Linker: GNU ld (160k lines), gold (140k lines) or lld (60k lines)
  链接器:GNU ld(160k 行)、gold(140k 行)或 lld(60k 行)
- Assembler/disassembler: GNU gas (850k lines) or the built-in LLVM assembler
  汇编器/反汇编器:GNU gas(850k 行)或内置 LLVM 汇编器
- Binary utilities: GNU (90k lines) and/or LLVM (included in main LLVM source)
  二进制实用程序:GNU(90k 行)和/或 LLVM(包含在主 LLVM 源代码中)
- Emulation library: libgcc (included in GCC source) or CompilerRT (340k lines)
  仿真库:libgcc(包含在 GCC 源代码中)或 CompilerRT(340k 行)
- Standard C library: newlib (850k lines), glibc (1.2M lines), musl (82k lines) or uClibC-ng (251k lines)
  标准 C 库:newlib(850k 行)、glibc(1.2M 行)、musl(82k 行)或 uClibC-ng(251k 行)
In Matt Godbolt’s CppCon talk, What Has My Compiler Done for Me Lately? Unbolting the Compiler’s Lid, we see how many of the clever optimization tricks we used to have to write by hand can now be done by the compiler. This lets us write simpler, easier-to-maintain code without trading away performance.
在 Matt Godbolt 的 CppCon 演讲中,题为“我的编译器最近为我做了什么?打开编译器的盖子,我们可以看到我们过去需要在代码中完成的聪明优化技巧有多少可以在编译器级别完成。这使我们能够拥有更简单、更容易维护的代码,而无需在性能上做出权衡
- CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler’s Lid” - YouTube
https://www.youtube-nocookie.com/embed/bSkpMdDe4g4
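As a small illustration of the kind of trick meant here, consider multiplying by a constant. The sketch below is my own example in the spirit of the talk, not code taken from it, and the function names `times65599_simple`, `times65599_manual` and `check_range` are made up. It only shows that the hand-tuned form and the plain form compute the same value, so the plain one can be written and the choice of instructions left to the optimizer, which is free to pick whatever sequence it judges best for the target.

```cpp
#include <cassert>
#include <cstdint>

// Straightforward version: say what you mean and let the compiler
// choose the instruction sequence.
constexpr std::uint32_t times65599_simple(std::uint32_t x) {
    return x * 65599u;
}

// Hand-tuned version: 65599 == 65536 + 64 - 1, so the multiply can be
// spelled as two shifts, an add and a subtract.
constexpr std::uint32_t times65599_manual(std::uint32_t x) {
    return (x << 16) + (x << 6) - x;
}

// Unsigned arithmetic wraps modulo 2^32, so the two forms agree for
// every input, including ones where the product overflows.
static_assert(times65599_simple(12345u) == times65599_manual(12345u),
              "forms must agree");
static_assert(times65599_simple(0xFFFFFFFFu) == times65599_manual(0xFFFFFFFFu),
              "forms must agree on wraparound");

// Runtime spot check over the half-open range [first, last).
inline void check_range(std::uint32_t first, std::uint32_t last) {
    for (std::uint32_t x = first; x != last; ++x) {
        assert(times65599_simple(x) == times65599_manual(x));
    }
}
```

Pasting both functions into a tool such as Compiler Explorer with optimizations enabled is an easy way to compare what a given compiler emits for each.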
I’m grateful for all the work compiler maintainers have put into optimizing our programs. We’ve come a long way to get here, and have built upon decades of hard-won programming experience to come up with the robust solutions we have today. As our programming languages continue to evolve, our compilers will be right there with us, tirelessly optimizing our code and catching bugs, so we can get the best performance we can.
我感谢编译器维护者为优化我们的程序所做的所有工作。我们走过了漫长的道路,并积累了数十年来之不易的编程经验,从而形成了我们今天拥有的强大解决方案。随着我们的编程语言不断发展,我们的编译器将与我们同在,不知疲倦地优化我们的代码并捕获错误,以便我们获得最佳性能
via:
- History of Compiler Design | by Pritesh Pawar | Medium, Oct 20, 2021
  https://medium.com/@PowerPP/history-of-compiler-design-c48bfa78122e
- Compilers: A Brief History and How They Work | by SkillMill | Medium, Feb 12, 2023
  https://medium.com/@nikitinsn6/compilers-a-brief-history-and-how-they-work-acfcd2faa063
- Compiling History: A brief tour of C compilers | Diego Crespo, Jan 04, 2024
  https://www.deusinmachina.net/p/compiling-history-a-brief-tour-of