What file does C preprocessing generate? (The 8 phases of C compilation)
The C source file is processed by the compiler as if the following phases take place in this exact order. An actual implementation may combine these actions or process them differently, as long as the behavior is the same.
Phase 1 Character mapping
The individual bytes of the source code file (which is generally a text file in some multibyte encoding such as UTF-8) are mapped, in an implementation-defined manner, to the characters of the source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters. (Different character sets may encode the same character, for example the same Chinese character, differently.)
The source character set is a multibyte character set which includes the basic source character set as a single-byte subset, consisting of the following 96 characters:
a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)
b) 10 digit characters from '0' to '9'
c) 52 letters from 'a' to 'z' and from 'A' to 'Z'
d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
Trigraph sequences are replaced by corresponding single-character representations.
Phase 2 Splicing physical lines into logical lines
1) Whenever a backslash appears at the end of a line (immediately followed by the newline character), both the backslash and the newline are deleted, combining two physical source lines into one logical source line. This is a single-pass operation: a line ending in two backslashes followed by an empty line does not combine three lines into one.
2) If a non-empty source file does not end with a newline character after this step (whether it had no newline originally, or it ended with a backslash), the behavior is undefined.
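The splicing rule can be observed with a small sketch. The names SUM3, spliced, splice_joins_literal and sum3_demo are made up for this illustration and do not come from the text above. The macro is written across three physical lines, and a string literal is split in the middle with a backslash-newline.

```c
#include <string.h>

/* A macro definition spread over three physical lines. After phase 2
   the preprocessor sees one logical line. */
#define SUM3(a, b, c) \
    ((a) + \
     (b) + (c))

/* Splicing happens before tokenization, so it also works in the middle
   of a string literal. The continuation starts in column 0 because any
   leading whitespace would become part of the string. */
static const char spliced[] = "abc\
def";

/* Returns 1 if the spliced literal equals "abcdef". */
int splice_joins_literal(void) {
    return strcmp(spliced, "abcdef") == 0;
}

/* Returns the value computed by the multi-line macro. */
int sum3_demo(void) {
    return SUM3(1, 2, 3);
}
```

Calling the two functions from a main of your own should show that the literal was joined and that the macro behaves as if it had been written on one line.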
Phase 3 Replacing each comment with one space
3.1 The source file is decomposed into comments, sequences of whitespace characters (space, horizontal tab, new-line, vertical tab, and form-feed), and preprocessing tokens, which are the following:
a) header names: <stdio.h> or "myfile.h"
b) identifiers
c) preprocessing numbers, which cover integer constants and floating constants, but also cover some invalid tokens such as 1..E, 3.foo or 0JBK
d) character constants and string literals
e) operators and punctuators, such as +, <<=, <%, or ##
f) individual non-whitespace characters that do not fit in any other category
3.2 Each comment is replaced by one space character.
3.3 Newlines are kept, and it is implementation-defined whether non-newline whitespace sequences may be collapsed into single space characters.
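One way to see that a comment really becomes a space, and does not simply vanish, is to stringify a macro argument that contains a comment. This is a minimal sketch: STR, with_comment, without_comment and comment_acts_as_separator are illustrative names, not part of the original text.

```c
#include <string.h>

/* Turns the macro argument into a string literal. */
#define STR(x) #x

/* The comment between a and b is replaced by one space in phase 3, so
   the argument consists of two tokens, a and b, separated by whitespace. */
static const char with_comment[] = STR(a/* gone */b);

/* Without the comment there is a single identifier token, ab. */
static const char without_comment[] = STR(ab);

/* Returns 1 if the comment separated the two identifiers. */
int comment_acts_as_separator(void) {
    return strcmp(with_comment, "a b") == 0 &&
           strcmp(without_comment, "ab") == 0;
}
```

The stringification operator writes any whitespace between argument tokens as a single space, which is why the first string contains a space where the comment used to be.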
If the input has been parsed into preprocessing tokens up to a given character, the next preprocessing token is generally taken to be the longest sequence of characters that could constitute a preprocessing token, even if that would cause subsequent analysis to fail. This is commonly known as maximal munch.
int foo = 1;
int bar = 0xE+foo; // error: invalid preprocessing number 0xE+foo
int baz = 0xE + foo; // OK
int pub = bar+++baz; // OK: bar++ + baz
int ham = bar++-++baz; // OK: bar++ - ++baz
int qux = bar+++++baz; // error: bar++ ++ +baz, not bar++ + ++baz.
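The effect of maximal munch on a+++b can be checked at run time. In this small sketch, the struct munch_result and the function munch_demo are hypothetical helpers introduced only for the illustration. Because the expression is tokenized as a ++ + b, the sum uses the old value of a, and only a is incremented.

```c
/* Holds the result of the expression and the final values of a and b. */
struct munch_result {
    int sum;
    int a;
    int b;
};

struct munch_result munch_demo(void) {
    int a = 1, b = 2;
    struct munch_result r;
    r.sum = a+++b; /* tokens: a ++ + b, so sum = 1 + 2 and a becomes 2 */
    r.a = a;
    r.b = b;
    return r;
}
```

Had the expression been read as a + ++b, the sum would be 4 and b, not a, would have changed.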
The sole exception to the maximal munch rule is:
Header name preprocessing tokens are only formed within a #include directive and in implementation-defined locations within a #pragma directive.
#define MACRO_1 1
#define MACRO_2 2
#define MACRO_3 3
#define MACRO_EXPR (MACRO_1 <MACRO_2> MACRO_3) // OK: <MACRO_2> is not a header-name
Phase 4 Preprocessing
1) The preprocessor is executed.
2) Each file introduced with the #include directive goes through phases 1 through 4, recursively.
3) At the end of this phase, all preprocessor directives are removed from the source.
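As a rough illustration of what phase 4 does, the sketch below uses an #include, a function-like macro and a conditional. SQUARE, USE_FAST_PATH and both functions are invented for the example. With gcc, the result of this phase can be inspected with the -E option, which stops after preprocessing and writes the preprocessed source to standard output; by convention such output is saved with an .i extension, for example gcc -E hello.c -o hello.i, where hello.c is a placeholder file name.

```c
#include <limits.h> /* pulled in textually during phase 4 */

#define SQUARE(x) ((x) * (x)) /* function-like macro */
#define USE_FAST_PATH 1       /* hypothetical feature switch */

int preprocess_demo(void) {
#if USE_FAST_PATH
    return SQUARE(3 + 1); /* expands to ((3 + 1) * (3 + 1)) */
#else
    return -1;            /* discarded before compilation */
#endif
}

int char_bit_demo(void) {
    return CHAR_BIT;      /* macro defined by <limits.h> */
}
```

After preprocessing, no line beginning with # from the original directives remains; the compiler proper only sees the expanded function bodies and the contents of the included header.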
Phase 5 Character set conversion
All characters and escape sequences in character constants and string literals are converted from the source character set to the execution character set (which may be a multibyte character set such as UTF-8, as long as all 96 characters from the basic source character set listed in phase 1 have single-byte representations). If the character specified by an escape sequence isn't a member of the execution character set, the result is implementation-defined, but is guaranteed to not be a null (wide) character.
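Once the source file has been translated into an executable program, the source code made up of these characters has been replaced by machine instructions that implement the same functionality. Text such as string literals and character constants, however, is not turned into machine instructions: it does not correspond to any operation and is generally used for printing, output, or transmission. It is therefore stored in the executable as character codes.
The next question is which character encoding is used to store string literals and character constants in the executable. The compiler decides: gcc on Linux uses UTF-8 by default for this text.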
Note: the conversion performed at this stage can be controlled by command-line options in some implementations: gcc and clang use -finput-charset to specify the encoding of the source character set, and -fexec-charset and -fwide-exec-charset to specify the encodings of the execution character set in string literals and character constants that don't have an encoding prefix (since C11).
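The bytes that end up in the executable can be inspected directly. The sketch below assumes the execution character set is UTF-8, which is the gcc and clang default; with a different -fexec-charset the stored bytes would differ. The character is written as a universal character name so that the encoding of the source file itself does not matter. The names e_acute, e_acute_length and e_acute_byte are illustrative only.

```c
#include <string.h>

/* U+00E9 (e with acute accent), converted to the execution character
   set during phase 5. */
static const char e_acute[] = "\u00e9";

/* Number of bytes stored for the character, terminator excluded. */
size_t e_acute_length(void) {
    return strlen(e_acute);
}

/* Returns the i-th stored byte; i must be less than sizeof e_acute. */
unsigned char e_acute_byte(size_t i) {
    return (unsigned char)e_acute[i];
}
```

Under the UTF-8 assumption the literal occupies two bytes, 0xC3 followed by 0xA9, which is the UTF-8 encoding of U+00E9.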
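Phase 6 Concatenation of adjacent string literals
Adjacent string literals are concatenated.
Since ANSI C, string literals that are adjacent, with nothing or only whitespace between them, are treated as one concatenated string literal. For example: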
char greeting[50] = "Hello and"" how are" " you"
" today!";
is equivalent to the following code:
char greeting[50] = "Hello and how are you today!";
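The equivalence can be verified with a short check. This is a sketch with made-up names (pieces, whole, literals_match, greeting_size); it compares the concatenated form with the single literal and reports the storage the greeting needs.

```c
#include <string.h>

/* Four adjacent literals, joined into one in phase 6. */
static const char pieces[] = "Hello and"" how are" " you"
                             " today!";

/* The same text written as a single literal. */
static const char whole[] = "Hello and how are you today!";

/* Returns 1 if both arrays have the same size and contents. */
int literals_match(void) {
    return sizeof pieces == sizeof whole && strcmp(pieces, whole) == 0;
}

/* Bytes needed to store the greeting: 28 characters plus the
   terminating null character. */
size_t greeting_size(void) {
    return sizeof whole;
}
```

Since the greeting needs 29 bytes, an array that holds it must have at least 29 elements.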
Phase 7 Compilation
Compilation takes place: the tokens are syntactically and semantically analyzed and translated as a translation unit.
Phase 8 Linking
Linking takes place: translation units and library components needed to satisfy external references are collected into a program image, which contains information needed for execution in its execution environment (the OS).