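I. The Problem
When code behaves differently with and without gcc optimization, you can use gcc's -Q --help=optimizers to see which optimization options gcc enables by default for a given build: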
tsecer@harry: gcc -Q --help=optimizers -O1 tsecer.cpp | more
The following options control optimizations:
-O
-Ofast
-Og
-Os
-faggressive-loop-optimizations [enabled]
-falign-functions [disabled]
-falign-jumps [disabled]
-falign-labels [disabled]
-falign-loops [disabled]
-fassociative-math [disabled]
-fasynchronous-unwind-tables [enabled]
-fauto-inc-dec [enabled]
-fbranch-count-reg [enabled]
-fbranch-probabilities [disabled]
-fbranch-target-load-optimize [disabled]
-fbranch-target-load-optimize2 [disabled]
-fbtr-bb-exclusive [disabled]
-fcaller-saves [disabled]
-fcode-hoisting [disabled]
-fcombine-stack-adjustments [enabled]
-fcompare-elim [enabled]
-fconserve-stack [disabled]
-fcprop-registers [enabled]
-fcrossjumping [disabled]
-fcse-follow-jumps [disabled]
-fcx-fortran-rules [disabled]
-fcx-limited-range [disabled]
-fdce [enabled]
-fdefer-pop [enabled]
-fdelayed-branch [disabled]
-fdelete-dead-exceptions [disabled]
-fdelete-null-pointer-checks [enabled]
-fdevirtualize [disabled]
-fdevirtualize-speculatively [disabled]
-fdse [enabled]
-fearly-inlining [enabled]
-fexceptions [disabled]
-fexpensive-optimizations [disabled]
-ffast-math
-ffinite-math-only [disabled]
-ffloat-store [disabled]
-fforward-propagate [enabled]
-ffp-contract=[off|on|fast] fast
-ffp-int-builtin-inexact [enabled]
-ffunction-cse [enabled]
-fgcse [disabled]
-fgcse-after-reload [disabled]
-fgcse-las [disabled]
-fgcse-lm [enabled]
-fgcse-sm [disabled]
-fgraphite [disabled]
-fgraphite-identity [disabled]
-fguess-branch-probability [enabled]
-fhandle-exceptions
-fhoist-adjacent-loads [disabled]
-fif-conversion [enabled]
-fif-conversion2 [enabled]
-findirect-inlining [disabled]
-finline [enabled]
-finline-atomics [enabled]
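To find which of these flags actually change between two optimization levels, one approach is to capture this output once per level (say -O0 and -O1) and compare the two captures. The following is a minimal C++ sketch of that comparison. It assumes the English-locale format shown above (option name, whitespace, then `[enabled]` or `[disabled]`); lines without such a marker, like `-Ofast` or `-ffp-contract=[off|on|fast] fast`, are skipped. The function names `parse_optimizers` and `changed_options` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parse lines like "  -fdce   [enabled]" into {"-fdce" -> true}.
// Lines without an [enabled]/[disabled] marker are ignored.
std::map<std::string, bool> parse_optimizers(const std::string& text) {
    std::map<std::string, bool> state;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, value;
        if (!(fields >> name >> value) || name[0] != '-')
            continue;  // header line, or an option with no on/off state
        if (value == "[enabled]")
            state[name] = true;
        else if (value == "[disabled]")
            state[name] = false;
    }
    return state;
}

// Options present in both captures whose state differs, e.g. between the
// -O0 output (before) and the -O1 output (after).
std::vector<std::string> changed_options(const std::string& before,
                                         const std::string& after) {
    std::map<std::string, bool> a = parse_optimizers(before);
    std::map<std::string, bool> b = parse_optimizers(after);
    std::vector<std::string> diff;
    for (const auto& [name, enabled] : b) {
        auto it = a.find(name);
        if (it != a.end() && it->second != enabled)
            diff.push_back(name);
    }
    return diff;  // sorted by name, since std::map iterates in key order
}
```

The inputs would typically be produced by something like `LC_ALL=C gcc -Q --help=optimizers -O0` and the same command with `-O1`; forcing the C locale keeps the markers in English so they match what the parser expects.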
For functions being inlined, judging by its name, this is most likely the relevant option:
-finline-functions-called-once Consider all "static" functions called once for inlining into their caller even if they are not marked "inline". If a call to a given function is integrated, then the function is not output as assembler code in its own right. Enabled at levels -O1, -O2, -O3 and -Os.
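A minimal input that should exercise this flag: `helper` below is `static` and has exactly one call site, so at -O1 and above gcc is expected to inline it into `compute` and, per the description above, not emit it separately. The names are hypothetical, and the expected `nm` result is an expectation to verify rather than a guarantee.

```cpp
#include <cassert>

// static and called exactly once: a candidate for -finline-functions-called-once
static int helper(int x) {
    int acc = 0;
    for (int i = 0; i < x; ++i)
        acc += i * i;
    return acc;
}

int compute(int x) {
    return helper(x) + 1;  // the only call to helper()
}
```

Compiling this with `g++ -O1 -c` and running `nm` on the object would be expected to list `compute(int)` but not `helper`; at -O0, `helper` should still appear as a local text symbol.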
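II. Which Calls Can Be Inlined
When optimization is not enabled, flag_inline_functions_called_once is not set, so such functions are not inlined.
There are, of course, other conditions that decide whether a function can be inlined, for example the caller and callee having different attributes (sanitize_attrs_match_for_inline_p) or different optimization levels.
The most common reason a call cannot be inlined is caller_growth_limits, i.e. inlining would make the calling function grow too much.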
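The idea behind a caller growth limit can be sketched as follows. This is a hypothetical, simplified model, not gcc's caller_growth_limits(): it only mirrors the notion behind the `--param large-function-insns` and `--param large-function-growth` knobs documented in the GCC manual (a function only counts as "large" above some size, and a large function may only grow by a given percentage). The struct, function name, and the parameter values used in any example are illustrative, not gcc's defaults, and the real check also considers stack-frame growth.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two limits; values are supplied by the caller.
struct GrowthParams {
    int large_function_insns;   // size above which a function counts as "large"
    int large_function_growth;  // allowed growth of a large function, in percent
};

// Model: returns true if adding `growth` instructions to a caller of
// `caller_size` instructions stays within the limits.
bool within_growth_limits(int caller_size, int growth, const GrowthParams& p) {
    int new_size = caller_size + growth;
    if (new_size <= p.large_function_insns)
        return true;  // the result is not "large", so growth is not limited
    int limit = caller_size + caller_size * p.large_function_growth / 100;
    return new_size <= limit;
}
```

With illustrative limits of 2700 instructions and 100% growth, a 3000-instruction caller may absorb up to 3000 more instructions in this model, and any larger callee body is rejected.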
/* Return true when NODE has uninlinable caller;
set HAS_HOT_CALL if it has hot call.
Worker for cgraph_for_node_and_aliases. */
static bool
check_callers (struct cgraph_node *node, void *has_hot_call)
{
struct cgraph_edge *e;
for (e = node->callers; e; e = e->next_caller)
{
if (!opt_for_fn (e->caller->decl, flag_inline_functions_called_once))
return true;
if (!can_inline_edge_p (e, true))
return true;
if (e->recursive_p ())
return true;
if (!(*(bool *)has_hot_call) && e->maybe_hot_p ())
*(bool *)has_hot_call = true;
}
return false;
}
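The loop in check_callers can be mirrored on a toy call graph to make its early-exit behaviour explicit: the first caller edge that fails any test makes the whole function "uninlinable", and `has_hot_call` is only updated for edges seen before that point. `CallerEdge` and `has_uninlinable_caller` are hypothetical names; each boolean stands in for the gcc call noted in its comment.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a cgraph caller edge; each field replaces one gcc query.
struct CallerEdge {
    bool caller_allows_called_once;  // opt_for_fn (caller, flag_inline_functions_called_once)
    bool can_inline;                 // can_inline_edge_p (e, true)
    bool recursive;                  // e->recursive_p ()
    bool maybe_hot;                  // e->maybe_hot_p ()
};

// Same control flow as check_callers(): true means some caller blocks inlining.
bool has_uninlinable_caller(const std::vector<CallerEdge>& callers,
                            bool& has_hot_call) {
    for (const CallerEdge& e : callers) {
        if (!e.caller_allows_called_once || !e.can_inline || e.recursive)
            return true;
        if (!has_hot_call && e.maybe_hot)
            has_hot_call = true;
    }
    return false;
}
```

Note that the first test alone explains the -O0 case mentioned above: when the caller was compiled without flag_inline_functions_called_once, the function returns true immediately.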
/* Decide if we can inline the edge and possibly update
inline_failed reason.
We check whether inlining is possible at all and whether
caller growth limits allow doing so.
if REPORT is true, output reason to the dump file.
if DISREGARD_LIMITS is true, ignore size limits.*/
static bool
can_inline_edge_p (struct cgraph_edge *e, bool report,
bool disregard_limits = false, bool early = false)
{
gcc_checking_assert (e->inline_failed);
if (cgraph_inline_failed_type (e->inline_failed) == CIF_FINAL_ERROR)
{
if (report)
report_inline_failed_reason (e);
return false;
}
bool inlinable = true;
enum availability avail;
cgraph_node *caller = e->caller->global.inlined_to
? e->caller->global.inlined_to : e->caller;
cgraph_node *callee = e->callee->ultimate_alias_target (&avail, caller);
tree caller_tree = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (caller->decl);
tree callee_tree
= callee ? DECL_FUNCTION_SPECIFIC_OPTIMIZATION (callee->decl) : NULL;
if (!callee->definition)
{
e->inline_failed = CIF_BODY_NOT_AVAILABLE;
inlinable = false;
}
else if (callee->calls_comdat_local)
{
e->inline_failed = CIF_USES_COMDAT_LOCAL;
inlinable = false;
}
else if (avail <= AVAIL_INTERPOSABLE)
{
e->inline_failed = CIF_OVERWRITABLE;
inlinable = false;
}
/* All edges with call_stmt_cannot_inline_p should have inline_failed
initialized to one of FINAL_ERROR reasons. */
else if (e->call_stmt_cannot_inline_p)
gcc_unreachable ();
/* Don't inline if the functions have different EH personalities. */
else if (DECL_FUNCTION_PERSONALITY (caller->decl)
&& DECL_FUNCTION_PERSONALITY (callee->decl)
&& (DECL_FUNCTION_PERSONALITY (caller->decl)
!= DECL_FUNCTION_PERSONALITY (callee->decl)))
{
e->inline_failed = CIF_EH_PERSONALITY;
inlinable = false;
}
/* TM pure functions should not be inlined into non-TM_pure
functions. */
else if (is_tm_pure (callee->decl) && !is_tm_pure (caller->decl))
{
e->inline_failed = CIF_UNSPECIFIED;
inlinable = false;
}
/* Check compatibility of target optimization options. */
else if (!targetm.target_option.can_inline_p (caller->decl,
callee->decl))
{
e->inline_failed = CIF_TARGET_OPTION_MISMATCH;
inlinable = false;
}
else if (!inline_summaries->get (callee)->inlinable)
{
e->inline_failed = CIF_FUNCTION_NOT_INLINABLE;
inlinable = false;
}
/* Don't inline a function with mismatched sanitization attributes. */
else if (!sanitize_attrs_match_for_inline_p (caller->decl, callee->decl))
{
e->inline_failed = CIF_ATTRIBUTE_MISMATCH;
inlinable = false;
}
/* Check if caller growth allows the inlining. */
else if (!DECL_DISREGARD_INLINE_LIMITS (callee->decl)
&& !disregard_limits
&& !lookup_attribute ("flatten",
DECL_ATTRIBUTES (caller->decl))
&& !caller_growth_limits (e))
inlinable = false;
/* Don't inline a function with a higher optimization level than the
caller. FIXME: this is really just tip of iceberg of handling
optimization attribute. */
else if (caller_tree != callee_tree)
{
bool always_inline =
(DECL_DISREGARD_INLINE_LIMITS (callee->decl)
&& lookup_attribute ("always_inline",
DECL_ATTRIBUTES (callee->decl)));
inline_summary *caller_info = inline_summaries->get (caller);
inline_summary *callee_info = inline_summaries->get (callee);
/* Until GCC 4.9 we did not check the semantics alterning flags
bellow and inline across optimization boundry.
Enabling checks bellow breaks several packages by refusing
to inline library always_inline functions. See PR65873.
Disable the check for early inlining for now until better solution
is found. */
if (always_inline && early)
;
/* There are some options that change IL semantics which means
we cannot inline in these cases for correctness reason.
Not even for always_inline declared functions. */
/* Strictly speaking only when the callee contains signed integer
math where overflow is undefined. */
else if ((check_maybe_up (flag_strict_overflow)
/* this flag is set by optimize. Allow inlining across
optimize boundary. */
&& (!opt_for_fn (caller->decl, optimize)
== !opt_for_fn (callee->decl, optimize) || !always_inline))
|| check_match (flag_wrapv)
|| check_match (flag_trapv)
/* When caller or callee does FP math, be sure FP codegen flags
compatible. */
|| ((caller_info->fp_expressions && callee_info->fp_expressions)
&& (check_maybe_up (flag_rounding_math)
|| check_maybe_up (flag_trapping_math)
|| check_maybe_down (flag_unsafe_math_optimizations)
|| check_maybe_down (flag_finite_math_only)
|| check_maybe_up (flag_signaling_nans)
|| check_maybe_down (flag_cx_limited_range)
|| check_maybe_up (flag_signed_zeros)
|| check_maybe_down (flag_associative_math)
|| check_maybe_down (flag_reciprocal_math)
|| check_maybe_down (flag_fp_int_builtin_inexact)
/* Strictly speaking only when the callee contains function
calls that may end up setting errno. */
|| check_maybe_up (flag_errno_math)))
/* We do not want to make code compiled with exceptions to be
brought into a non-EH function unless we know that the callee
does not throw.
This is tracked by DECL_FUNCTION_PERSONALITY. */
|| (check_maybe_up (flag_non_call_exceptions)
&& DECL_FUNCTION_PERSONALITY (callee->decl))
|| (check_maybe_up (flag_exceptions)
&& DECL_FUNCTION_PERSONALITY (callee->decl))
/* When devirtualization is diabled for callee, it is not safe
to inline it as we possibly mangled the type info.
Allow early inlining of always inlines. */
|| (!early && check_maybe_down (flag_devirtualize)))
{
e->inline_failed = CIF_OPTIMIZATION_MISMATCH;
inlinable = false;
}
/* gcc.dg/pr43564.c. Apply user-forced inline even at -O0. */
else if (always_inline)
;
/* When user added an attribute to the callee honor it. */
else if (lookup_attribute ("optimize", DECL_ATTRIBUTES (callee->decl))
&& opts_for_fn (caller->decl) != opts_for_fn (callee->decl))
{
e->inline_failed = CIF_OPTIMIZATION_MISMATCH;
inlinable = false;
}
/* If explicit optimize attribute are not used, the mismatch is caused
by different command line options used to build different units.
Do not care about COMDAT functions - those are intended to be
optimized with the optimization flags of module they are used in.
Also do not care about mixing up size/speed optimization when
DECL_DISREGARD_INLINE_LIMITS is set. */
else if ((callee->merged_comdat
&& !lookup_attribute ("optimize",
DECL_ATTRIBUTES (caller->decl)))
|| DECL_DISREGARD_INLINE_LIMITS (callee->decl))
;
/* If mismatch is caused by merging two LTO units with different
optimizationflags we want to be bit nicer. However never inline
if one of functions is not optimized at all. */
else if (!opt_for_fn (callee->decl, optimize)
|| !opt_for_fn (caller->decl, optimize))
{
e->inline_failed = CIF_OPTIMIZATION_MISMATCH;
inlinable = false;
}
/* If callee is optimized for size and caller is not, allow inlining if
code shrinks or we are in MAX_INLINE_INSNS_SINGLE limit and callee
is inline (and thus likely an unified comdat). This will allow caller
to run faster. */
else if (opt_for_fn (callee->decl, optimize_size)
> opt_for_fn (caller->decl, optimize_size))
{
int growth = estimate_edge_growth (e);
if (growth > 0
&& (!DECL_DECLARED_INLINE_P (callee->decl)
&& growth >= MAX (MAX_INLINE_INSNS_SINGLE,
MAX_INLINE_INSNS_AUTO)))
{
e->inline_failed = CIF_OPTIMIZATION_MISMATCH;
inlinable = false;
}
}
/* If callee is more aggressively optimized for performance than caller,
we generally want to inline only cheap (runtime wise) functions. */
else if (opt_for_fn (callee->decl, optimize_size)
< opt_for_fn (caller->decl, optimize_size)
|| (opt_for_fn (callee->decl, optimize)
> opt_for_fn (caller->decl, optimize)))
{
if (estimate_edge_time (e)
>= 20 + inline_edge_summary (e)->call_stmt_time)
{
e->inline_failed = CIF_OPTIMIZATION_MISMATCH;
inlinable = false;
}
}
}
if (!inlinable && report)
report_inline_failed_reason (e);
return inlinable;
}
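Because can_inline_edge_p is one long if/else-if chain, only the first failing check determines the reported reason. The sketch below reduces each check to a hypothetical boolean in the same order (TM-pure handling and the details of the optimization-mismatch branch are collapsed or omitted). The enum values echo gcc's CIF_* codes, but the types and function are illustrative only.

```cpp
#include <cassert>

// Reasons in the order can_inline_edge_p() tests them (subset).
enum class InlineFailed {
    None,
    BodyNotAvailable,      // CIF_BODY_NOT_AVAILABLE
    UsesComdatLocal,       // CIF_USES_COMDAT_LOCAL
    Overwritable,          // CIF_OVERWRITABLE
    EhPersonality,         // CIF_EH_PERSONALITY
    TargetOptionMismatch,  // CIF_TARGET_OPTION_MISMATCH
    FunctionNotInlinable,  // CIF_FUNCTION_NOT_INLINABLE
    AttributeMismatch,     // CIF_ATTRIBUTE_MISMATCH
    GrowthLimit,           // reason recorded inside caller_growth_limits()
    OptimizationMismatch,  // CIF_OPTIMIZATION_MISMATCH
};

struct EdgeFacts {
    bool callee_has_body = true;
    bool callee_calls_comdat_local = false;
    bool callee_interposable = false;
    bool eh_personality_differs = false;
    bool target_options_compatible = true;
    bool callee_inlinable = true;
    bool sanitize_attrs_match = true;
    bool disregard_limits = false;  // DECL_DISREGARD_INLINE_LIMITS, flatten, or the argument
    bool within_growth_limits = true;
    bool optimization_compatible = true;
};

// Like the else-if chain: the first failing check wins.
InlineFailed first_inline_failure(const EdgeFacts& f) {
    if (!f.callee_has_body) return InlineFailed::BodyNotAvailable;
    if (f.callee_calls_comdat_local) return InlineFailed::UsesComdatLocal;
    if (f.callee_interposable) return InlineFailed::Overwritable;
    if (f.eh_personality_differs) return InlineFailed::EhPersonality;
    if (!f.target_options_compatible) return InlineFailed::TargetOptionMismatch;
    if (!f.callee_inlinable) return InlineFailed::FunctionNotInlinable;
    if (!f.sanitize_attrs_match) return InlineFailed::AttributeMismatch;
    if (!f.disregard_limits && !f.within_growth_limits) return InlineFailed::GrowthLimit;
    if (!f.optimization_compatible) return InlineFailed::OptimizationMismatch;
    return InlineFailed::None;
}
```

One consequence visible in the model: an always_inline or flatten context skips the growth check entirely but can still be rejected by the optimization-compatibility check that follows it.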
- A brief note on calls_comdat_local
Some relevant code and comments:
/* Return true if NODE is local to a particular COMDAT group, and must not
be named from outside the COMDAT. This is used for C++ decloned
constructors. */
inline bool comdat_local_p (void)
{
return (same_comdat_group && !TREE_PUBLIC (decl));
}
- The description of the related gcc option:
-fdeclone-ctor-dtor The C++ ABI requires multiple entry points for constructors and destructors: one for a base subobject, one for a complete object, and one for a virtual destructor that calls operator delete afterwards. For a hierarchy with virtual bases, the base and complete variants are clones, which means two copies of the function. With this option, the base and complete variants are changed to be thunks that call a common implementation. Enabled by -Os.
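My rough reading: some constructors are bound to the definition of another constructor (and they all live in the same COMDAT group), so one of them cannot be inlined on its own. Which copy of these COMDAT functions is finally used is decided by the linker.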
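The two properties involved can be modelled in a few lines. comdat_local_p() is true for a symbol that shares a COMDAT group and is not public; presumably a callee that calls such a symbol is refused (CIF_USES_COMDAT_LOCAL) because inlining it elsewhere would name the group-private symbol from outside its group, and that copy may be discarded by the linker. `Symbol` and `calls_comdat_local` here are hypothetical simplifications of the cgraph fields.

```cpp
#include <cassert>
#include <vector>

// Toy symbol: only what comdat_local_p() looks at.
struct Symbol {
    bool in_shared_comdat_group;  // same_comdat_group != NULL
    bool is_public;               // TREE_PUBLIC (decl)
};

// Mirrors comdat_local_p(): in a COMDAT group but not public.
bool comdat_local_p(const Symbol& s) {
    return s.in_shared_comdat_group && !s.is_public;
}

// A function "calls a comdat-local" if any of its callees is comdat-local;
// can_inline_edge_p() then refuses to inline that function.
bool calls_comdat_local(const std::vector<Symbol>& callees) {
    for (const Symbol& c : callees)
        if (comdat_local_p(c))
            return true;
    return false;
}
```

In the decloned-constructor case, the common implementation would play the role of the non-public group member, and the base/complete constructor thunks the public ones that call it.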
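III. What Can Be Removed from the Symbol Table
1. Main logic
After some calls are inlined, certain functions end up with no callers at all; such functions may be removed from the symbol table, and their definitions can be dropped as well.
At the start of remove_unreachable_nodes, each function is checked with can_remove_if_no_direct_calls_and_refs_p and each variable with can_remove_if_no_refs_p; anything that cannot be removed is marked reachable up front.
Everything else becomes an unreachable node unless something reachable references it, and can then be removed from the output file.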
bool
symbol_table::remove_unreachable_nodes (FILE *file)
{
///...
/* Mark functions whose bodies are obviously needed.
This is mostly when they can be referenced externally. Inline clones
are special since their declarations are shared with master clone and thus
cgraph_can_remove_if_no_direct_calls_and_refs_p should not be called on them. */
FOR_EACH_FUNCTION (node)
{
node->used_as_abstract_origin = false;
node->indirect_call_target = false;
if (node->definition
&& !node->global.inlined_to
&& !node->in_other_partition
&& !node->can_remove_if_no_direct_calls_and_refs_p ())
{
gcc_assert (!node->global.inlined_to);
reachable.add (node);
enqueue_node (node, &first, &reachable);
}
else
gcc_assert (!node->aux);
}
/* Mark variables that are obviously needed. */
FOR_EACH_DEFINED_VARIABLE (vnode)
if (!vnode->can_remove_if_no_refs_p()
&& !vnode->in_other_partition)
{
reachable.add (vnode);
enqueue_node (vnode, &first, &reachable);
}
///...
}
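The overall shape of remove_unreachable_nodes is a worklist reachability pass: seed with the nodes that must be kept, follow references, and whatever is never reached can go. A minimal C++ sketch of that shape on a hypothetical symbol table; `Node` and `unreachable` are illustrative names, and real gcc additionally handles inline clones, aliases, partitions and abstract origins.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy node: may it be dropped once unreferenced, and what does it reference?
struct Node {
    bool removable_if_unreferenced;
    std::vector<std::string> refs;  // calls or address-taken uses
};

std::set<std::string> unreachable(const std::map<std::string, Node>& table) {
    std::set<std::string> reachable;
    std::deque<std::string> work;
    // Seed: nodes that are "obviously needed".
    for (const auto& entry : table)
        if (!entry.second.removable_if_unreferenced) {
            reachable.insert(entry.first);
            work.push_back(entry.first);
        }
    // Propagate through references.
    while (!work.empty()) {
        std::string cur = work.front();
        work.pop_front();
        for (const std::string& r : table.at(cur).refs)
            if (table.count(r) && reachable.insert(r).second)
                work.push_back(r);
    }
    std::set<std::string> dead;
    for (const auto& entry : table)
        if (!reachable.count(entry.first))
            dead.insert(entry.first);
    return dead;
}
```

Applied to the tsecer.cpp example in section IV, with harry(), tsecer() and hube kept and fry<int> already inlined into harry(), the sketch reports fry<int> and leela as removable; if harry() still referenced fry<int>, only leela would be.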
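2. What cannot be removed
Clearly, anything that is explicitly or implicitly externally visible (externally_visible) must not be removed, while an unreferenced static entity really can be removed. DECL_EXTERNAL works the other way round: as the code below shows, a DECL_EXTERNAL function (for example an extern inline) can always be dropped, because the external definition will be used instead.
However, the function contains a special remark about COMDAT: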
/* Only COMDAT functions can be removed if externally visible. */
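That is, an externally visible COMDAT function definition can still be removed. This is the root cause of template functions disappearing from the symbol table once they have been inlined.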
/* Return true when function can be removed from callgraph
if all direct calls are eliminated. */
inline bool
cgraph_node::can_remove_if_no_direct_calls_and_refs_p (void)
{
gcc_checking_assert (!global.inlined_to);
/* Instrumentation clones should not be removed before
instrumentation happens. New callers may appear after
instrumentation. */
if (instrumentation_clone
&& !chkp_function_instrumented_p (decl))
return false;
/* Extern inlines can always go, we will use the external definition. */
if (DECL_EXTERNAL (decl))
return true;
/* When function is needed, we can not remove it. */
if (force_output || used_from_other_partition)
return false;
if (DECL_STATIC_CONSTRUCTOR (decl)
|| DECL_STATIC_DESTRUCTOR (decl))
return false;
/* Only COMDAT functions can be removed if externally visible. */
if (externally_visible
&& (!DECL_COMDAT (decl)
|| forced_by_abi
|| used_from_object_file_p ()))
return false;
return true;
}
/* Return true when variable can be removed from variable pool
if all direct calls are eliminated. */
inline bool
varpool_node::can_remove_if_no_refs_p (void)
{
if (DECL_EXTERNAL (decl))
return true;
return (!force_output && !used_from_other_partition
&& ((DECL_COMDAT (decl)
&& !forced_by_abi
&& !used_from_object_file_p ())
|| !externally_visible
|| DECL_HAS_VALUE_EXPR_P (decl)));
}
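The function predicate reduces to a small decision over flags. The sketch below re-expresses can_remove_if_no_direct_calls_and_refs_p over a hypothetical `FunctionFlags` struct (the instrumentation-clone check is omitted) so that the cases from the examples can be checked directly; the variable predicate follows the same pattern.

```cpp
#include <cassert>

// Subset of the flags consulted by the gcc predicate; illustrative only.
struct FunctionFlags {
    bool decl_external = false;          // DECL_EXTERNAL
    bool force_output = false;
    bool used_from_other_partition = false;
    bool static_ctor_or_dtor = false;    // DECL_STATIC_CONSTRUCTOR/DESTRUCTOR
    bool externally_visible = false;
    bool decl_comdat = false;            // DECL_COMDAT
    bool forced_by_abi = false;
    bool used_from_object_file = false;  // used_from_object_file_p ()
};

bool can_remove_function(const FunctionFlags& f) {
    if (f.decl_external)
        return true;   // the external definition will be used instead
    if (f.force_output || f.used_from_other_partition)
        return false;
    if (f.static_ctor_or_dtor)
        return false;
    // Only COMDAT functions can be removed if externally visible.
    if (f.externally_visible &&
        (!f.decl_comdat || f.forced_by_abi || f.used_from_object_file))
        return false;
    return true;
}
```

In these terms, tsecer() (visible, not COMDAT) must stay, an implicitly instantiated fry<int> (visible, COMDAT) may go, and an explicit instantiation (visible, COMDAT cleared, forced_by_abi set) must stay.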
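IV. Examples
1. Symbol table after optimization
With optimization enabled, neither the unreferenced static variable leela nor the COMDAT template function fry (whose only call is inlined into harry()) appears in the generated object file, because nothing references them any more.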
tsecer@harry: cat tsecer.cpp
//default external
int hube;
//default external
int tsecer()
{
return 1;
}
//static, not used
static int leela;
//comdat
template<typename T>
int fry(T t)
{
return t + 1;
}
int harry()
{
return tsecer() + fry(1);
}
tsecer@harry: gcc -O2 -c tsecer.cpp
tsecer@harry: nm tsecer.o |c++filt -t
0000000000000000 B hube
0000000000000010 T harry()
0000000000000000 T tsecer()
tsecer@harry:
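This optimization is effective: because of it, most template classes and functions leave no trace in the output symbol table. For example, many functions of the STL container classes produce no symbols in the object file.
But this can also cause other problems: for example, for a template function (such as fry in the example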
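2. Possible problems when the definition is placed in a header
An explicit specialization of a function template has, unless declared otherwise, external linkage and is not inline.
In other words, a full specialization behaves like an ordinary external function. If it is declared inline, it becomes COMDAT, and once its calls are inlined during optimization it may end up with no symbol at all; if it is defined in a header without inline, every translation unit that includes the header emits its own definition, which can lead to multiple-definition errors at link time.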
An explicit specialization of a function template is inline only if it is declared with the inline specifier or defined as deleted, and independently of whether its function template is inline. [ Example:
template<class T> void f(T) { /* ... */ }
template<class T> inline T g(T) { /* ... */ }
template<> inline void f<>(int) { /* ... */ }   // OK: inline
template<> int g<>(int) { /* ... */ }           // OK: not inline
end example ]
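Applied to the fry example, the header-safe form of an explicit specialization needs an explicit `inline`. The sketch below is what such a header might contain; the `double` specialization and its body are hypothetical, chosen only to show the syntax. Without `inline`, each .cpp including this header would emit its own strong definition of fry<double>(double).

```cpp
#include <cassert>

// Primary template, as in tsecer.cpp.
template <typename T>
int fry(T t) {
    return t + 1;
}

// Explicit specialization: not implicitly inline, even though the primary
// template is defined in the header. The explicit `inline` makes identical
// definitions in several translation units legal.
template <>
inline int fry<double>(double t) {
    return static_cast<int>(t) + 2;
}
```

The flip side, as described above, is that once declared inline the specialization is COMDAT and may leave no symbol at all after its calls are inlined.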
3. Why explicit instantiation works
For an explicit instantiation, the COMDAT flag is cleared:
DECL_COMDAT (result) = 0;
In addition, mark_needed sets forced_by_abi, which guarantees that the symbol of such an explicit instantiation is always present in the generated object file.
/* Called if RESULT is explicitly instantiated, or is a member of an
explicitly instantiated class. */
void
mark_decl_instantiated (tree result, int extern_p)
{
SET_DECL_EXPLICIT_INSTANTIATION (result);
/* If this entity has already been written out, it's too late to
make any modifications. */
if (TREE_ASM_WRITTEN (result))
return;
/* For anonymous namespace we don't need to do anything. */
if (decl_anon_ns_mem_p (result))
{
gcc_assert (!TREE_PUBLIC (result));
return;
}
if (TREE_CODE (result) != FUNCTION_DECL)
/* The TREE_PUBLIC flag for function declarations will have been
set correctly by tsubst. */
TREE_PUBLIC (result) = 1;
/* This might have been set by an earlier implicit instantiation. */
DECL_COMDAT (result) = 0;
///...
}
/* DECL is a VAR_DECL or FUNCTION_DECL which, for whatever reason,
must be emitted in this translation unit. Mark it as such. */
void
mark_needed (tree decl)
{
TREE_USED (decl) = 1;
if (TREE_CODE (decl) == FUNCTION_DECL)
{
/* Extern inline functions don't become needed when referenced.
If we know a method will be emitted in other TU and no new
functions can be marked reachable, just use the external
definition. */
struct cgraph_node *node = cgraph_node::get_create (decl);
node->forced_by_abi = true;
/* #pragma interface and -frepo code can call mark_needed for
maybe-in-charge 'tors; mark the clones as well. */
tree clone;
FOR_EACH_CLONE (clone, decl)
mark_needed (clone);
}
else if (VAR_P (decl))
{
varpool_node *node = varpool_node::get_create (decl);
/* C++ frontend use mark_decl_references to force COMDAT variables
to be output that might appear dead otherwise. */
node->forced_by_abi = true;
}
}
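Putting this back into the tsecer.cpp example: adding an explicit instantiation definition for fry<int> should, according to mark_decl_instantiated() and mark_needed() above, keep the symbol even though harry()'s call to fry is inlined. This is a sketch of the modified source; the claim about the resulting symbol table is an expectation to check with `nm tsecer.o | c++filt`, not a recorded result.

```cpp
#include <cassert>

template <typename T>
int fry(T t) {
    return t + 1;
}

int harry() {
    return fry(1);  // implicit instantiation; likely inlined at -O2
}

// Explicit instantiation definition. Coming after the implicit instantiation
// is allowed; gcc then clears DECL_COMDAT and sets forced_by_abi, so
// fry<int>(int) is expected to be emitted in the object file.
template int fry<int>(int);
```

With this line in place, `int fry<int>(int)` is expected to show up alongside harry() in the `nm` output, giving other translation units a definition to link against.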
From: https://www.cnblogs.com/tsecer/p/16720607.html