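Jaywalking Around the Compiler

Jason Sachs, December 9, 2019 (3 comments)

Our team had another code review recently. I looked at one of the files, and bolted upright in horror when I saw a function that looked sort of like this: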

void some_function(SOMEDATA_T *psomedata)
{
    asm volatile("push CORCON");
    CORCON = 0x00E2; 

    do_some_other_stuff(psomedata);

    asm volatile("pop  CORCON");
}

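There is a serious bug here. Do you see what it is?

Stacking Your Own Deck

Okay, first of all, what is this function doing? The C code in question was written for use on a Microchip dsPIC33E device. The CORCON register in dsPIC33E devices is the “Core Control Register”; it controls a number of miscellaneous things the processor does. It's a set of mode bits. Some of the bits control DSP instructions; the E2 content essentially tells the processor to turn on arithmetic saturation and rounding, and use fractional multiplication, when working with the DSP accumulators.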

So the intent of the code is roughly as follows:

  • save the CORCON register on the stack
  • configure CORCON to turn on saturation and rounding, and use fractional multiplication for certain DSP instructions
  • perform some computations
  • restore the CORCON register from the stack

But this won't work as intended. There are a few bugs here.

One is that by setting CORCON = 0x00E2, all the bits of CORCON are being set a specific way, including some which affect the system interrupt priority level and how DO loops behave.

What the code should have done is set or clear only the relevant bits of CORCON, and leave the other bits alone.
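A minimal sketch of that read-modify-write, written as plain C that runs on a desktop machine. The variable corcon_sim is only a stand-in for the real register, and DSP_MODE_MASK is a hypothetical mask chosen for illustration; the actual bit positions have to come from the dsPIC33E documentation.

```c
#include <stdint.h>

/* Stand-in for the memory-mapped CORCON register (assumption: on a real
 * dsPIC33E this would be the SFR declared by the device header). */
static volatile uint16_t corcon_sim = 0xFF0C;

/* Hypothetical mask of the DSP mode bits we intend to control.
 * Illustrative only; take the real bit positions from the datasheet. */
#define DSP_MODE_MASK   0x00F3u
#define DSP_MODE_VALUE  0x00E2u

/* Return `reg` with only the bits selected by `mask` replaced by the
 * corresponding bits of `value`; every other bit is preserved. */
uint16_t merge_bits(uint16_t reg, uint16_t mask, uint16_t value)
{
    return (uint16_t)((reg & (uint16_t)~mask) | (value & mask));
}

/* Update only the DSP mode bits of the simulated register. */
void set_dsp_mode(void)
{
    corcon_sim = merge_bits(corcon_sim, DSP_MODE_MASK, DSP_MODE_VALUE);
}
```

With this shape, bits outside the mask (for example, whatever controls interrupt priority or DO loops) keep the values they had before the call.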

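The more serious bug has to do with the way that CORCON is saved and restored, and the way that the compiler interacts with inline assembly code.

You see, you have to be very careful when using inline assembly in C. It's kind of like doing heart surgery... while the patient is still awake and walking around... and the patient isn't aware of the surgeon. Okay, I hope that scared you, but it really doesn't explain the danger very well. So here goes: the important thing to note is that certain parts of the CPU and memory (like the stack) are managed by the compiler, and that the compiler assumes certain things. Those words, managed and assumes, are important, and as I write this article they trigger certain associations for me.

  • managed: You don't touch the stack. The compiler owns the stack. The stack is not something you can read or write directly. Hands off. If you mess with the stack, the compiler will hurt you. Please stay away from the stack. Do not rely on reading the contents of the stack. Please don't monkey around with the stack. It's the compiler's job to use the stack. Leave the stack alone. Be very, very afraid of C code that manipulates the stack. Don't even think about it. Danger, keep out.

(Actually, managed reminds me of managed code from around 2005-2008, some weird Microsoft term for C++ or .NET features that made no sense to me and made me just want to walk away. Microsoft is going to manage something for me and protect me from myself, okay, what is this, a bunch of stuff in square brackets, why would I want to use that?)

The stack is an implementation detail of the output program generated by the compiler, a section of memory reserved for the compiler to use as a LIFO data structure, with a so-called stack pointer which keeps track of the top of the stack. As the compiler generates code to save temporary data, it puts it on the stack and moves the stack pointer, until later when it no longer needs that data and it restores the stack pointer back. Strictly speaking, the compiler could use a completely different mechanism than the CPU stack, and often it optimizes things and places local variables in registers. If you write code like the excerpt below, you have no idea where the variable thing may be placed: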

int foo(int a)
{
    int thing = a*a;

    thing += do_some_stuff(a);
    return thing;
}

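It might be on the stack. It might be in a register. The compiler could allocate memory dynamically (but probably won't) or stick it in some other predefined place in memory (although it almost certainly won't). You have no way of knowing without looking at what the compiler is doing. The compiler has to behave in a certain predefined way, and we're told that it typically uses the stack for local variables, but that's up to the compiler.

  • assumes: The compiler not only manages the stack for you, it assumes that it is the sole owner of the stack. It assumes certain things about inline assembly code, namely that it has no responsibility to check what that code is doing, and that you, as the programmer, are responsible for cooperating properly with the compiler. Technically, unless you interface properly with the assembly code (as in GCC's Extended ASM syntax), you can barely do anything. The compiler treats what you write as a black box and assumes you aren't touching anything you shouldn't. That includes the stack. (Have I mentioned yet that you shouldn't modify the stack?)

What this means is processor- and compiler-dependent; let's look at the XC16 C Compiler User's Guide, which contains the following in Section 12.3, with my emphasis on managed and assumes:

12.3 Changing Register Contents

The assembly generated by the compiler from C source code will use certain registers that are present on the 16-bit device. Most importantly, the compiler assumes that nothing other than code it generates can alter the contents of these registers. So if the assembly loads a register with a value and no subsequent code generation requires this register, the compiler will assume that the contents of the register are still valid later in the output sequence.

The registers that are special and that are managed by the compiler are: W0-W15, RCOUNT, STATUS (SR), PSVPAG and DSRPAG. If fixed-point support is enabled, the compiler may allocate A and B, in which case the compiler may adjust CORCON.

The state of these registers must never be changed directly by C code, or by any assembly code in-line with C code. The following example shows a C statement and in-line assembly that violates these rules and changes the ZERO bit in the STATUS register.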

#include <xc.h>
void badCode(void)
{
 asm ("mov #0, w8");
 WREG9 = 0;
}

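The compiler is unable to interpret the meaning of in-line assembly code that is encountered in C code. Nor does it associate a variable mapped over an SFR to the actual register itself. Writing to an SFR register using either of these two methods will not flag the register as having changed and may lead to code failure.

That sounds nice, but what does it actually mean?

Three Mistakes Make a Disaster

Let's look at another situation where an autonomous entity assumes certain things. You may have heard about the fatal accident on March 18, 2018 in Tempe, Arizona, involving a pedestrian and a self-driving vehicle owned by Uber Technologies, Inc. You hear “pedestrian” and “self-driving vehicle”, and that perhaps conjures up the image of a happy fellow walking along a busy downtown street, just before an oncoming self-driving car decides to plow into him. Well, it wasn't like that. Here's a recent Washington Post article about the March 2018 accident, with my emphasis on “assume”: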

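However, documents released Tuesday by the National Transportation Safety Board show Uber's self-driving system was programmed using faulty assumptions about how some road users might behave. Despite having enough time to stop before hitting 49-year-old Elaine Herzberg (nearly 6 seconds), the system repeatedly failed to accurately classify her as a pedestrian or to understand she was pushing her bike across lanes of traffic on a Tempe, Ariz., street shortly before 10 p.m.

Uber's automated driving system “never classified her as a pedestrian, or predicted correctly her goal as a jaywalking pedestrian or a cyclist,” NTSB investigators found, because she was crossing in an area with no crosswalk. “The system design did not include a consideration for jaywalking pedestrians.”

The system vacillated on whether to classify Herzberg as a vehicle, a bike or “an other,” so “the system was unable to correctly predict the path of the detected object,” according to investigative reports from the NTSB, which is set to meet later this month to issue its determination on what caused Herzberg's death.

In one particularly problematic Uber assumption, given the chaos that is common on public roads, the system assumed that an object classified as “other” would stay in that “static location.”

That is, the autonomous vehicle had certain rules: it assumed that detected objects fell into a certain set of fixed foreseeable behaviors.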

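Assume?

Self-driving vehicles and compilers aren't really capable of assuming anything. That's something deterministic systems can't do, no matter how much we anthropomorphize them. They contain preconditions and assertions. It's their system designers who make the assumptions. (I was going to say that the self-driving car assumed there was no such thing as a jaywalking pedestrian, or that it assumed no pedestrian would walk across a busy street, but that phrasing implies that the designers were aware of the possibility and programmed their system to include those concepts, but refused to use them. More likely, they were just outside the set of potential behaviors the system was programmed for.)

For the record, the NTSB reports on the March 2018 crash, notably the Vehicle Automation Report, the Highway Group Factual Report, the Human Performance Group Chairman's Factual Report, and the Onboard Image & Data Recorder Group Chairman's Factual Report, do not use the word “assume” when referring to the automated driving system (ADS):

…however, certain object classifications (other) are not assigned goals. For such objects, their currently detected location is viewed as a static location; unless that location is directly on the path of the automated vehicle, that object is not considered as a possible obstacle...

At the time when the ADS detected the pedestrian for the first time, 5.6 seconds before impact, she was positioned approximately in the middle of the two left turn lanes (see figure 3). Although the ADS sensed the pedestrian nearly 6 seconds before the impact, the system never classified her as a pedestrian, or predicted correctly her goal as a jaywalking pedestrian or a cyclist, because she was crossing N. Mill Avenue at a location without a crosswalk; the system design did not include a consideration for jaywalking pedestrians. Instead, the system had initially classified her as an other object, which is not assigned a goal. As the ADS changed the classification of the pedestrian several times, alternating between vehicle, bicycle, and an other, the system was unable to correctly predict the path of the detected object.

On the other hand, I have no problem using the word “assume”: not because I want to imply any kind of volition, or because it is completely technically accurate, but because we can use it as a shortcut for drawing a conclusion from a limited set of data and a simplified set of rules.

In any case, this accident was a tragedy with many contributing factors: a street in an urban industrial area at night, near an elevated highway, with a 45 mph speed limit; a pedestrian who decided to cross this street, wearing dark clothing and walking a bicycle in an area not intended for pedestrian crossing; a system programmed with some questionable assumptions; a human “operator” who might have taken action to stop the car, but who was glancing down inside the car, watching streaming video of NBC's The Voice. Most relevant to my theme of assumption are these:

  • the pedestrian assumed it was safe enough to cross at a location with no crosswalk, presumably expecting that if a car did approach, the driver would see her and avoid a collision.
  • the vehicle assumed the detected object was not a pedestrian crossing the road, and that the object's trajectory was not one that would lead to a collision.
  • the operator assumed it was safe to watch streaming video and rely on the vehicle to drive itself.

All three of these assumptions were wrong, and a collision occurred.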


Arizona Revised Statutes, Title 28 - Transportation

28-793. Crossing at other than crosswalk

  1. A pedestrian crossing a roadway at any point other than within a marked crosswalk or within an unmarked crosswalk at an intersection shall yield the right-of-way to all vehicles on the roadway.
  2. A pedestrian crossing a roadway at a point where a pedestrian tunnel or overhead pedestrian crossing has been provided shall yield the right-of-way to all vehicles on the roadway.
  3. Between adjacent intersections at which traffic control signals are in operation, pedestrians shall not cross at any place except in a marked crosswalk.

28-794. Drivers to exercise due care

Notwithstanding the provisions of this chapter every driver of a vehicle shall:

  1. Exercise due care to avoid colliding with any pedestrian on any roadway.

  2. Give warning by sounding the horn when necessary.

  3. Exercise proper precaution on observing a child or a confused or incapacitated person on a roadway.

Collision in Assembly Land

Back to our CORCON example. The PUSH and POP instructions cause the stack pointer W15 to be modified and data to be written to and read from the stack. But the compiler manages the stack pointer W15, and uses it however it sees fit, assuming no one else is going to modify W15 or the memory it points to. It also uses register W14 as a frame pointer using the LNK and ULNK instructions. (If you want more of the technical details, look at the Programmer's Reference Manual.)

Let's look at a simple example. I'm using XC16 1.41:

import pyxc16

for optlevel in [1,0]:
    print "// --- -O%d ---" % optlevel
    pyxc16.compile('''
    #include <stdint.h>

    int16_t add(int16_t a, int16_t b)
    {
        return a+b;
    }
    ''', '-c','-O%d' % optlevel)
// --- -O1 ---
_add:
	add	w1,w0,w0
	return
// --- -O0 ---
_add:
	lnk	#4
	mov	w0,[w14]
	mov	w1,[w14+2]
	mov	[w14+2],w0
	add	w0,[w14],w0
	ulnk
	return

The compiler's output differs a lot, depending on whether we turn optimization on or off.

With optimization on (-O1) the compiler can reduce this function to a single ADD instruction, and the stack isn't used at all except for the return address (CALL pushes the return address onto the stack; RETURN pops it off.)

With optimization off (-O0), the compiler takes the following steps:

  • it allocates a new stack frame of 4 bytes using the LNK instruction: this pushes the old frame pointer W14 onto the stack, copies the stack pointer into W14 as the new frame pointer, and allocates 4 additional bytes on the stack
  • it copies the first argument a from W0 to its place in the stack frame [W14], and copies the second argument b from W1 to its place in the stack frame [W14+2]
  • it performs the required computation, taking its inputs from those places (a = [W14], b = [W14+2]) and puts the result into W0
  • it deallocates the stack frame with ULNK

A lot of this work is unnecessary, but that's what happens with -O0; you get a mechanical, predictable, and safe series of actions that implements what the C programmer asked for.
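To make the LNK/ULNK bookkeeping concrete, here is a small desktop-C model of the two instructions. It is a sketch, not an emulator: the Cpu struct, the RAM_BASE address, and the function names are all made up for illustration, and only the frame-pointer and stack-pointer effects are modeled.

```c
#include <stdint.h>

#define RAM_BASE 0x1000u   /* hypothetical base address of the modeled RAM */

/* Toy model of the registers and memory involved.
 * Addresses are byte addresses; the stack holds 16-bit words. */
typedef struct {
    uint16_t w14;        /* frame pointer */
    uint16_t w15;        /* stack pointer: address of the first free word */
    uint16_t ram[0x20];  /* small RAM window starting at RAM_BASE */
} Cpu;

/* Translate a byte address into a pointer to the modeled word. */
static uint16_t *word_at(Cpu *cpu, uint16_t addr)
{
    return &cpu->ram[(addr - RAM_BASE) / 2u];
}

/* LNK #n: push the old frame pointer, make the stack pointer the new
 * frame pointer, then reserve n bytes for locals. */
void lnk(Cpu *cpu, uint16_t n)
{
    *word_at(cpu, cpu->w15) = cpu->w14;
    cpu->w15 += 2;
    cpu->w14 = cpu->w15;
    cpu->w15 += n;
}

/* ULNK: release the locals and restore the caller's frame pointer. */
void ulnk(Cpu *cpu)
{
    cpu->w15 = cpu->w14;
    cpu->w15 -= 2;
    cpu->w14 = *word_at(cpu, cpu->w15);
}
```

After lnk(&cpu, 4) the locals live at [W14] and [W14+2], which is where the -O0 listing stores a and b; ulnk() puts both registers back exactly as they were, provided nothing else moved W15 in between.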

Now we're going to spice it up by adding some jaywalking to mess with CORCON in inline assembly.

for optlevel in [1,0]:
    print "// --- -O%d ---" % optlevel
    pyxc16.compile(r'''
    #include <stdint.h>

    extern volatile uint16_t CORCON;

    int16_t add(int16_t a, int16_t b)
    {
        asm volatile("\n_l1:\n        push CORCON\n_l2:");
        CORCON = 0x00e2;
        
        int16_t result = a+b;
        
        asm volatile("\n_l3:\n        pop CORCON\n_l4:");
        return result;
    }
    ''', '-c','-O%d' % optlevel)
// --- -O1 ---
_add:
_l1:
        push CORCON
_l2:
	mov	#226,w2
	mov	w2,_CORCON
_l3:
        pop CORCON
_l4:
	add	w1,w0,w0
	return
// --- -O0 ---
_add:
	lnk	#6
	mov	w0,[w14+2]
	mov	w1,[w14+4]
_l1:
        push CORCON
_l2:
	mov	#226,w0
	mov	w0,_CORCON
	mov	[w14+2],w1
	mov	[w14+4],w0
	add	w1,w0,[w14]
_l3:
        pop CORCON
_l4:
	mov	[w14],w0
	ulnk
	return

Here I've added labels _l1 through _l4 to help capture what's going on at certain instants.

Now, the -O1 case is fairly easy to understand; here we push CORCON onto the stack, write E2 = 226 into CORCON, pop CORCON back off the stack, then do our adding operation.

??

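The C code asked the compiler to add a+b in between the PUSH and POP instructions. So that's another bug, not in the compiler, but in the way we interact with it. The compiler assumes it can reorder certain things; it doesn't know what you're trying to do with this inline assembly, it just knows that you want to return a+b, and that's what it does. We'll look at that again in a bit. (Yes, I know that the ADD instruction isn't affected by CORCON, but if we were using a DSP instruction like MPY or MAC, then this bug could produce incorrect behavior.)

The -O0 case is a little more involved, and it does do the math between labels _l2 and _l3 after setting CORCON to E2. Here's what the stack looks like at different instants:

The yellow cells are what we've added manually through inline assembly; the rest have been handled by the compiler. “FP” stands for frame pointer (W14 in the dsPIC) and “SP” for stack pointer (W15); the “addr.lo” and “addr.hi” content is the return address which has been placed onto the stack when add() is reached via a CALL instruction.

The stack on the dsPIC grows upwards in memory with PUSH and LNK instructions, and contains either allocated or unallocated data:

  • Content at addresses below the stack pointer is allocated. The compiler (or we, if we're foolish enough to try to use it) modifies this content only when intentionally changing a specific value that has already been allocated on the stack, and is expected to deallocate it when done by restoring the stack pointer to its previous location.

  • Content at addresses at or above the stack pointer is unallocated. The compiler is allowed to use it for temporary data, and can allocate and deallocate memory there. We can't assume we know the contents of unallocated data, because an interrupt could occur at any instant and allocate, use, and deallocate memory on the stack. I've marked all the unknown content with ???.

So even though we just had the saved value of CORCON on the stack at _l3, we aren't allowed to assume that this value will still be there at _l4.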
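The point about unallocated content can be illustrated with a toy stack in desktop C. This is only a sketch under simple assumptions: the Stack struct and the simulated_interrupt() function are hypothetical stand-ins, not anything the dsPIC or XC16 provides.

```c
#include <stdint.h>

typedef struct {
    uint16_t mem[8];  /* stack memory, one element per 16-bit word */
    int sp;           /* index of the first unallocated word */
} Stack;

void stack_push(Stack *s, uint16_t v) { s->mem[s->sp++] = v; }
uint16_t stack_pop(Stack *s)          { return s->mem[--s->sp]; }

/* Stand-in for an interrupt handler: it borrows the unallocated region
 * for its own temporary data and releases it before returning. */
void simulated_interrupt(Stack *s)
{
    stack_push(s, 0xBEEF);
    (void)stack_pop(s);
}
```

Right after a pop, the old value happens to still sit at mem[sp], but nothing protects it: one call to simulated_interrupt() replaces it with the handler's temporary data while leaving sp unchanged.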

Otherwise, there are no problems here. The compiler does its thing, we do our thing, and everything is fine, right? Just like crossing the street when no cars are coming.

Let's raise it up another notch: below is a function foo() which is just like add(), but it calls some external function munge() to modify the result a+b before returning it. If you want to test it yourself, create a different file that contains something like

#include <stdint.h>

void munge(int16_t *px)
{
    (*px)++;   // add 1 to whatever px points to
}
for optlevel in [0,1]:
    print "// --- -O%d ---" % optlevel
    pyxc16.compile(r'''
    #include <stdint.h>

    extern volatile uint16_t CORCON;
    
    void munge(int16_t *px);
    
    int16_t foo(int16_t a, int16_t b)
    {
        asm volatile("\n_l1:\n        push CORCON\n_l2:");
        CORCON = 0x00e2;
        
        int16_t result = a+b;
        asm volatile("\n_l3:");
        munge(&result);
        
        asm volatile("\n_l4:\n        pop CORCON\n_l5:");
        return result;
    }
    ''', '-c','-O%d' % optlevel)
// --- -O0 ---
_foo:
	lnk	#6
	mov	w0,[w14+2]
	mov	w1,[w14+4]
_l1:
        push CORCON
_l2:
	mov	#226,w0
	mov	w0,_CORCON
	mov	[w14+2],w1
	mov	[w14+4],w0
	add	w1,w0,w0
	mov	w0,[w14]
_l3:
	mov	w14,w0
	rcall	_munge
_l4:
        pop CORCON
_l5:
	mov	[w14],w0
	ulnk
	return
// --- -O1 ---
_foo:
	lnk	#2
_l1:
        push CORCON
_l2:
	mov	#226,w2
	mov	w2,_CORCON
	add	w1,w0,w1
	mov	w1,[w15-2]
_l3:
	dec2	w15,w0
	rcall	_munge
_l4:
        pop CORCON
_l5:
	mov	[w15-2],w0
	ulnk
	return

Let's look at the unoptimized -O0 version first.

This looks a lot like the add() case, except here we rcall _munge with the value of W14 as an argument by placing it in W0; that is, we're passing in the address contained in the frame pointer. Then munge() can read and write this value as appropriate. After munge() completes:

  • pop off the saved value of CORCON and put it back into the CORCON register
  • copy the munged value of a+b into W0 as the return value
  • deallocate the stack frame and return to the caller

Again, no problem here.

But here's what happens when this is compiled in -O1:

  • _foo → _l1: allocate a two-byte stack frame to store a temporary value
  • _l1 → _l2: push the value of CORCON on the stack
  • _l2 → _l3: write E2 into CORCON, compute a+b, and store it in the location below the stack pointer = [W15-2]. Uh oh. Here's where the collision occurs. We saved our CORCON value on the stack, but the compiler doesn't know it's there and thinks that [W15-2] is where it allocated the two bytes on the stack, which it owns, and which it can safely modify. If the compiler were aware that we allocated two more bytes on the stack using inline assembly, then it should be storing a+b at [W15-4]… but it's not aware, and instead, the compiler-generated code overwrites the saved value of CORCON.
  • _l3 → _l4: call munge(), passing in W15-2 as an argument by placing it in W0
  • _l4 → _l5: our inline assembly is executed, and the CPU pops what we think is the saved value of CORCON back into the CORCON register. But instead, it's the munged value of a+b.
  • _l5 → return from foo(): copy these two allocated bytes on the stack into W0 to use as a return value, then deallocate the two-byte stack frame. Unfortunately the compiler thinks those bytes contain the munged value of a+b, whereas in reality they contain uninitialized memory.

You cannot use inline assembly to allocate stack memory unless you deallocate it before returning control to the compiler. That is, if you're going to execute PUSH in a section of inline assembly, that same section has to contain a corresponding POP. Otherwise, your inline assembly conflicts with what the compiler assumes about what's at the top of the allocated stack, and once that happens, all bets are off; pointers and memory managed by the compiler may be corrupted by our manually inserted inline assembly, and the result may be unexpected behavior. This is not a benign failure!

If you have a copy of MPLAB X and the XC16 compiler, and use the simulator for debugging, you can try running this yourself.

If you step through the code, you will find the results of the collision after foo() returns:

  • Instead of restoring its original value, CORCON will contain the munged version of a+b in the bits of CORCON that are writeable (some bits are read-only)
  • The result of foo() will be whatever value happened to be at the appropriate place on the stack, immediately below where CORCON gets PUSHed and the value of a+b gets stored. (So if the stack pointer W15 contained 0x1006 before the call to foo(), then the “addr.lo” word in the diagram is located at 0x1006 and the result of foo() will be whatever value is contained three words past it, at address 0x100c, whereas a+b will get stored at 0x100e, and munge() will modify the value located there.)
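For readers without the simulator, the -O1 trace can be replayed with a toy model in desktop C. This is a sketch under stated assumptions: the Machine struct and function names are invented for illustration, values are treated as unsigned 16-bit words, the stack is indexed by word rather than by byte address, and the read-only bits of the real CORCON are not modeled.

```c
#include <stdint.h>

typedef struct {
    uint16_t mem[16];  /* stack memory, one element per 16-bit word */
    int sp;            /* word index of the first free slot (models W15) */
    uint16_t corcon;   /* stand-in for the CORCON register */
} Machine;

static void m_push(Machine *m, uint16_t v) { m->mem[m->sp++] = v; }
static uint16_t m_pop(Machine *m)          { return m->mem[--m->sp]; }

/* Stand-in for munge(): add 1 to whatever px points to. */
static void munge_word(uint16_t *px) { (*px)++; }

/* Replay the -O1 listing of foo(); each step names the instruction it models. */
uint16_t replay_foo_O1(Machine *m, uint16_t a, uint16_t b)
{
    m_push(m, 0);                           /* lnk #2: save caller's frame pointer (value unused here) */
    m->sp += 1;                             /* lnk #2: reserve one word for `result` */
    m_push(m, m->corcon);                   /* inline asm: push CORCON */
    m->corcon = 0x00E2;                     /* mov #226,w2 ; mov w2,_CORCON */
    m->mem[m->sp - 1] = (uint16_t)(a + b);  /* mov w1,[w15-2]: lands on the saved CORCON */
    munge_word(&m->mem[m->sp - 1]);         /* dec2 w15,w0 ; rcall _munge */
    m->corcon = m_pop(m);                   /* inline asm: pop CORCON (receives munged a+b) */
    uint16_t ret = m->mem[m->sp - 1];       /* mov [w15-2],w0: reads the never-written slot */
    m->sp -= 1;                             /* ulnk: release the local... */
    (void)m_pop(m);                         /* ...and restore the caller's frame pointer */
    return ret;
}
```

If the stack memory is pre-filled with a marker value and the model is run with a = 3 and b = 4, the register ends up holding 8 instead of its original contents, and the function returns the marker rather than 8, matching the two symptoms listed above.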

The Right Way to Save CORCON

So how do we fix this? Well, there's still a way we can save CORCON without corrupting the compiler's understanding of the system state, and that's to put it in a local variable:

for optlevel in [0,1]:
    print "// --- -O%d ---" % optlevel
    pyxc16.compile(r'''
    #include <stdint.h>

    extern volatile uint16_t CORCON;
    
    void munge(int16_t *px);
    
    int16_t foo(int16_t a, int16_t b)
    {
        uint16_t tempCORCON = CORCON;
        CORCON = 0x00e2;
        
        int16_t result = a+b;
        munge(&result);
        
        CORCON = tempCORCON;
        return result;
    }
    ''', '-c','-O%d' % optlevel)
// --- -O0 ---
_foo:
	lnk	#8
	mov	w0,[w14+4]
	mov	w1,[w14+6]
	mov	_CORCON,w1
	mov	w1,[w14]
	mov	#226,w0
	mov	w0,_CORCON
	mov	[w14+4],w1
	mov	[w14+6],w0
	add	w1,w0,w0
	mov	w0,[w14+2]
	inc2	w14,w0
	rcall	_munge
	mov	[w14],w1
	mov	w1,_CORCON
	mov	[w14+2],w0
	ulnk
	return
// --- -O1 ---
_foo:
	lnk	#2
	mov	w8,[w15++]
	mov	_CORCON,w8
	mov	#226,w2
	mov	w2,_CORCON
	add	w1,w0,w1
	mov	w1,[w15-4]
	sub	w15,#4,w0
	rcall	_munge
	mov	w8,_CORCON
	mov	[w15-4],w0
	mov	[--w15],w8
	ulnk
	return

Here the compiler is managing everything, and it can assume that what it allocated on the stack will stay there in the state it intended, unless it modifies the allocated memory itself.
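The same save-in-a-local pattern can be wrapped in a helper so that the save and restore cannot be separated. The sketch below is desktop C with a simulated register; with_corcon(), dsp_work_fn, and add_and_observe() are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

/* Stand-in for the memory-mapped CORCON register (assumption). */
static volatile uint16_t corcon_sim = 0x0020;

/* Signature of the work to run while the mode bits are changed. */
typedef int16_t (*dsp_work_fn)(int16_t a, int16_t b);

/* Hypothetical helper: run `work` with the register set to `mode`.
 * The saved value lives in an ordinary local variable, so the compiler
 * decides whether it sits in a register or on the stack. */
int16_t with_corcon(uint16_t mode, dsp_work_fn work, int16_t a, int16_t b)
{
    uint16_t saved = corcon_sim;
    corcon_sim = mode;
    int16_t result = work(a, b);
    corcon_sim = saved;
    return result;
}

/* Example work function that records the mode it observed. */
static uint16_t observed_mode;

int16_t add_and_observe(int16_t a, int16_t b)
{
    observed_mode = corcon_sim;   /* volatile read of the simulated register */
    return (int16_t)(a + b);
}
```

A call such as with_corcon(0x00E2, add_and_observe, 3, 4) leaves the simulated register at its original value afterwards. This keeps the stack bookkeeping entirely in the compiler's hands, but it does not by itself settle how non-volatile computations are ordered relative to the register writes; that still has to be checked in the generated assembly.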

Other Subtleties

Computational Dependencies and Execution Order

There's still that other little bug we ran into in add() under -O1, namely that the addition happened outside of the section of code in which CORCON was saved and restored. This bug will still be there even if we get rid of our use of inline assembly; see below, where the ADD instruction takes place after we've restored CORCON:

pyxc16.compile(r'''
#include <stdint.h>

extern volatile uint16_t CORCON;

int16_t add(int16_t a, int16_t b)
{
    uint16_t tempCORCON = CORCON;
    CORCON = 0x00e2;

    int16_t result = a+b;

    CORCON = tempCORCON;
    return result;
}
''', '-c','-O1')
_add:
	mov	_CORCON,w2
	mov	#226,w3
	mov	w3,_CORCON
	mov	w2,_CORCON
	add	w1,w0,w0
	return

The problem here is that the compiler has no knowledge of data dependency between the content of the CORCON register and the instruction we want to execute. Again: yes, the ADD instruction isn't affected, but the same problem could occur if we use an accumulator instruction that depends on the CORCON content, like SAC.R:

pyxc16.compile(r'''
#include <stdint.h>

extern volatile uint16_t CORCON;
register int accA asm("A");    // accumulator A

int16_t bar(int16_t a, int16_t b)
{
    uint16_t tempCORCON = CORCON;
    CORCON = 0x00e2;

    accA = __builtin_lac(a+b, 3);
    int16_t result = __builtin_sacr(accA, 4);

    CORCON = tempCORCON;
    return result;
}
''', '-c','-O1')
_bar:
	mov	_CORCON,w2
	mov	#226,w3
	mov	w3,_CORCON
	add	w1,w0,w0
	lac	w0, #3, A
	sac.r	A, #4, w0
	mov	w2,_CORCON
	return

In this case it doesn't, but it's not clear to me whether you can rely on this C code to work; in other words, whether the compiler knows that it can't reorder an accumulator __builtin with respect to a volatile memory access.

We can force our add() function to not reorder by computing the result in a volatile local variable.

pyxc16.compile(r'''
#include <stdint.h>

extern volatile uint16_t CORCON;

int16_t add(int16_t a, int16_t b)
{
    uint16_t tempCORCON = CORCON;
    CORCON = 0x00e2;

    volatile int16_t result = a+b;

    CORCON = tempCORCON;
    return result;
}
''', '-c','-O1')
_add:
	lnk	#2
	mov	_CORCON,w2
	mov	#226,w3
	mov	w3,_CORCON
	add	w1,w0,w1
	mov	w1,[w15-2]
	mov	w2,_CORCON
	mov	[w15-2],w0
	ulnk
	return

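Unfortunately this causes the compiler to put the sum on the stack rather than just stick it in the W0 register, as desired. It's very hard to persuade the compiler to do what you want and to make sure it has done so correctly, which is why mode bits are such a pain in this kind of situation.

You can also try to use “barriers” to force the compiler to assume certain dependency constraints in its computations. These are basically empty inline assembly blocks that use the extended assembly syntax to express those constraints, but making sure your code is correct can be very tricky.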

pyxc16.compile(r'''
#include <stdint.h>

extern volatile uint16_t CORCON;

int16_t add(int16_t a, int16_t b)
{
    uint16_t tempCORCON = CORCON;
    CORCON = 0x00e2;

    asm volatile("" :"+r"(a));
    // don't actually do anything, but tell the compiler 
    // that the value of "a" might depend on this assembly code

    int16_t result = a+b;
    
    asm volatile("" ::"r"(result));
    // don't actually do anything, but tell the compiler 
    // that this assembly code might depend on the value of "result"
    
    CORCON = tempCORCON;
    return result;
}
''', '-c','-O1')
_add:
	mov	_CORCON,w2
	mov	#226,w3
	mov	w3,_CORCON
	add	w0,w1,w0
	mov	w2,_CORCON
	return

CORCON and Fixed-Point

Finally, one of the notes in Section 12.3 of the XC16 C Compiler User's Guide is that the compiler also manages CORCON in some cases:

The registers that are special and that are managed by the compiler are: W0-W15, RCOUNT, STATUS (SR), PSVPAG and DSRPAG. If fixed-point support is enabled, the compiler may allocate A and B, in which case the compiler may adjust CORCON.

In the code below, the compiler adds its own push _CORCON and pop _CORCON instructions at the beginning and end of the function, but doesn't seem to modify CORCON, and the ordering of the computations gets rearranged (the CORCON = tempCORCON translates into mov w2,_CORCON, which executes before any of the _Fract/_Accum code even runs):

pyxc16.compile(r'''
#include <stdint.h>

extern volatile uint16_t CORCON;

int16_t baz(int16_t a, int16_t b)
{
    uint16_t tempCORCON = CORCON;
    CORCON = 0x00e2;

    _Fract af = a;
    _Fract bf = b;
    _Accum acc = 0;
    acc += af*bf;
    _Fract result = acc >> 15;
    
    CORCON = tempCORCON;
    return (int16_t)result;
}
''', '-c','-O1', '-menable-fixed')
_baz:
	push	_CORCON
	mov	_CORCON,w2
	mov	#226,w3
	mov	w3,_CORCON
	mov	w2,_CORCON
	cp0	w1
	mov	#0x8000,w2
	btsc	_SR,#0
	mov	#0x7FFF,w2
	btsc	_SR,#1
	clr	w2
	cp0	w0
	mov	#0x8000,w1
	btsc	_SR,#0
	mov	#0x7FFF,w1
	btsc	_SR,#1
	clr	w1
	mul.ss	w2,w1,w4
	sl	w4,w4
	rlc	w5,w0
	mov	w0,w1
	clr	w0
	asr	w1,#15,w2
	mov	#15,w3
	dec	w3,w3
	bra	n,.LE18
	asr.b	w2,w2
	rrc	w1,w1
	rrc	w0,w0
	bra	.LB18
.LE18:
	mov	w1,w1
	asr	w1,#15,w0
	pop	_CORCON
	return

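The _Accum and _Fract types have certain semantics as defined in ISO C proposal N1169 (Extensions to support embedded processors), but I'm not familiar enough with them to give you advice on how to use them.

Wrapup

We talked about how not to interact with the C compiler in inline assembly: basically, don't mess with any of the CPU resources that the compiler manages, because the compiler assumes certain things about the CPU state at all times. Those resources include the core CPU registers and the stack.

As a metaphor for this dangerous behavior, I cited the March 2018 traffic fatality involving a jaywalking pedestrian and a self-driving car. Improper assumptions can be deadly. Don't jaywalk around the compiler. Be careful.


© 2019 Jason M. Sachs, all rights reserved.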


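Comment by mr_bandit, December 17, 2019

Excellent article! Basically:

  • Thou shalt not confuse the compiler
  • You should understand the results of optimization
  • You should understand that the compiler's actions can change from version to version
  • Thou shalt not do stupid things

Another safe way to solve this problem is to write a function in ASM and control everything. The problem with writing it in C is that the results are hidden, as Jason has clearly and expertly shown.

This is the kind of "technique" that can take days (if you are really lucky) to find, and might not even be looked for until something Particularly Bad (tm) happens, like someone dying.

Jason: well done!

Comment by 乔里克, December 17, 2019

I absolutely agree that you shouldn't mess with compiler-managed resources. The compiler I use, IAR, actually allows inline assembler with a syntax that tells the compiler what is used and what is clobbered. Even so, I have never found any reason to use inline assembler outside of a crash handler I wrote that performs a register and stack dump. Even the special registers have compiler-provided intrinsic functions. I have found that if I need to optimize code for speed, I can do it in C, and the optimizer will generate better code than I would get by trying to optimize in assembler (it knows a few tricks that I don't).

Comment by 纳德勒, December 17, 2019

Excellent, Jason, thanks!
Likewise, the only times I have been forced to use inline assembly were for OS internals and crash dumps.
And then, one must carefully check assembly output to ensure no missing barriers or other errors...
Thanks again!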
