Security and the Cortex-M MPU, part 5: Step-by-step MPU security

February 06, 2017

Previous blogs have presented an introduction to the MPU and terminology, MPU multitasking, defining MPU regions, and a software interrupt (SWI) API for use with an MPU. In the first blog, privileged tasks (ptasks) and unprivileged tasks (utasks) were defined. The former run in privileged thread mode and the latter in unprivileged thread mode. The mode of a task is determined by the umode flag in its task control block (TCB) and takes effect when the task is dispatched by the real-time operating system (RTOS) scheduler.

This blog presents a step-by-step procedure for adding memory protection unit (MPU) security to late- and post-project systems. It can, of course, also be applied to new projects. The goal is to achieve the reliability, security, and safety that modern embedded systems require as they become connected to the Internet of Things (IoT).

The discussion that follows assumes the SMX RTOS and EWARM tool suite for the sake of specificity. However, the MPU-Plus software package can be ported to other RTOSs and tool suites. Although the following may look like a cookbook recipe, it is intended as a feasibility demonstration of adding MPU security to an existing product.

1. Start

To start, it is assumed that xmpu.c and xmpu_iar.c have been added to the RTOS library and that mpu.c has been added to the application project. Add a call to sb_MPUInit() near the beginning of the startup code, and temporarily disable the loading of MPU[6] and MPU[7] inside sb_MPUInit(). sb_MPUInit() then turns on the MPU and enables its background region, so your application should run normally.
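To make the end state of this step concrete, here is a minimal sketch of an MPU init routine at this stage. It is not the MPU-Plus sb_MPUInit(); it assumes the ARMv7-M register layout (MPU_TYPE, MPU_CTRL, MPU_RNR, MPU_RBAR, MPU_RASR starting at 0xE000ED90) and takes the register block as a pointer so that it can run on a host. The name mpu_init_step1() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU registers; on target this block starts at 0xE000ED90. */
typedef struct {
    volatile uint32_t TYPE;   /* DREGION [15:8] = number of slots */
    volatile uint32_t CTRL;
    volatile uint32_t RNR;
    volatile uint32_t RBAR;
    volatile uint32_t RASR;
} mpu_regs_t;

#define MPU_CTRL_ENABLE      (1u << 0)
#define MPU_CTRL_PRIVDEFENA  (1u << 2)   /* background region for privileged code */

/* Hypothetical step-1 init: every slot disabled (MPU[6] and MPU[7] not yet
   loaded), then the MPU is turned on with the background region enabled. */
void mpu_init_step1(mpu_regs_t *mpu)
{
    uint32_t slots = (mpu->TYPE >> 8) & 0xFFu;
    assert(slots >= 8);                  /* MPU[6] and MPU[7] are needed later */

    mpu->CTRL = 0;                       /* MPU off while slots are changed */
    for (uint32_t n = 0; n < slots; n++) {
        mpu->RNR  = n;
        mpu->RASR = 0;                   /* RASR.ENABLE = 0: slot unused */
    }
    mpu->CTRL = MPU_CTRL_ENABLE | MPU_CTRL_PRIVDEFENA;
    /* On target, follow with __DSB(); __ISB(); before relying on the change. */
}
```

With PRIVDEFENA set, privileged code sees the default memory map wherever no enabled region matches, while unprivileged accesses that match no region fault. Since every task is still privileged at this point, the application should behave exactly as before.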

2. System regions

Next, define .sys_code and .sys_data sections. sys_code should contain all handler and interrupt service routine (ISR) shell code. If an ISR does not use a shell, then the ISR itself must be included. This is done as in the following examples for assembly code:

and for C code:

The background region (BR) macros are discussed in the next section. sys_data contains the system stack (also called the main stack by ARM Ltd.). Then, in the linker command file:

Of course, the actual sizes depend upon the application. Each size should be the smallest power of two that holds the section (if it is not large enough, the linker will complain), and each block's alignment must equal its size, because the MPU requires a region to be aligned on its size. Now enable loading sys_code into MPU[6] and sys_data into MPU[7] in sb_MPUInit(). These are permanent regions that are present for every task, and they allow privileged access only. Hence, they are not accessible by utasks.
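The sizing and alignment rules above can be checked mechanically. The following sketch, assuming the ARMv7-M RBAR/RASR encodings, rounds a section size up to a legal region size, checks base alignment, and builds privileged-only slot values of the kind described for sys_code (MPU[6]) and sys_data (MPU[7]). The function names and example addresses are hypothetical, and the memory-type bits (TEX, S, C, B) are deliberately left at zero; real code must set them for the target memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RBAR_VALID   (1u << 4)    /* use the REGION field of RBAR */
#define RASR_ENABLE  (1u << 0)
#define RASR_XN      (1u << 28)   /* execute never */
#define AP_PRIV_RW   (1u << 24)   /* AP=0b001: privileged RW, unprivileged none */
#define AP_PRIV_RO   (5u << 24)   /* AP=0b101: privileged RO, unprivileged none */

/* Smallest power of two >= used, never below the 32-byte MPU minimum. */
uint32_t region_size_for(uint32_t used)
{
    uint32_t size = 32;
    while (size < used) {
        assert(size < 0x80000000u);      /* a 4 GB region does not fit in 32 bits */
        size <<= 1;
    }
    return size;
}

/* RASR.SIZE field: region size = 2^(SIZE+1). */
uint32_t rasr_size_field(uint32_t size)
{
    assert(size >= 32 && (size & (size - 1)) == 0);
    uint32_t log2 = 0;
    while ((1u << log2) < size)
        log2++;
    return log2 - 1;
}

/* A region's base address must be aligned on its size. */
bool region_base_ok(uint32_t base, uint32_t size)
{
    return (base & (size - 1)) == 0;
}

uint32_t rbar_value(uint32_t base, uint32_t slot)
{
    assert(slot < 8 && (base & 0x1Fu) == 0);
    return base | RBAR_VALID | slot;
}

uint32_t rasr_value(uint32_t size, uint32_t attrs)
{
    return attrs | (rasr_size_field(size) << 1) | RASR_ENABLE;
}
```

For example, a 5000-byte .sys_code section needs an 8 KB block aligned on 8 KB. At a hypothetical base of 0x00010000 in slot 6, rbar_value(0x00010000, 6) is 0x00010016 and rasr_value(8192, AP_PRIV_RO) is 0x05000019. A 4 KB sys_data block would use RASR_XN | AP_PRIV_RW so that the stack can be written but never executed.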

3. BR switching

Unfortunately, the MPU does not have enough slots to serve utasks, ptasks, handlers, and ISRs. ARM Ltd. added the background region to permit privileged code (pcode) to access all memory, eliminating the need for privileged regions (pregions). However, using BR undercuts isolation and protection for ptasks. Therefore, we have developed a technique of switching BR on for handlers and ISRs and off for tasks.

The .sys_code and .sys_data regions allow exceptions and interrupts to be serviced up to the point where MPU_BR_ON() turns on the background region. From then on, the handler or ISR can access all the code and data it needs to perform its function. When done, MPU_BR_OFF() turns off the background region if the mpu_br_off flag is set. At this point in the conversion, mpu_br_off is never set, so BR is always enabled. Hence, the application should still run normally – nothing has changed. These macros add a total of 13 cycles, worst case, to each handler and ISR.
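A minimal sketch of the BR switching logic follows. The macros' real bodies are not given here, so br_on() and br_off() are hypothetical stand-ins for MPU_BR_ON() and MPU_BR_OFF(). They toggle MPU_CTRL.PRIVDEFENA (bit 2) and track nesting with a counter, which is an assumption made for clarity; a production shell might inspect EXC_RETURN instead. A plain variable stands in for MPU_CTRL so that the code runs on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MPU_CTRL_ENABLE     (1u << 0)
#define MPU_CTRL_PRIVDEFENA (1u << 2)

/* Host stand-in for MPU_CTRL (0xE000ED94 on target). On target, these
   variables would have to live in sys_data to be usable before BR is on. */
static volatile uint32_t mpu_ctrl = MPU_CTRL_ENABLE | MPU_CTRL_PRIVDEFENA;
static bool     mpu_br_off;   /* set at dispatch for tasks with a loaded MPA */
static uint32_t isr_nest;     /* hypothetical exception nesting depth */

static void br_on(void)       /* stand-in for MPU_BR_ON() */
{
    isr_nest++;
    mpu_ctrl |= MPU_CTRL_PRIVDEFENA;
}

static void br_off(void)      /* stand-in for MPU_BR_OFF() */
{
    assert(isr_nest > 0);
    isr_nest--;
    if (mpu_br_off && isr_nest == 0)
        mpu_ctrl &= ~MPU_CTRL_PRIVDEFENA;   /* back to the task's regions only */
}

static uint32_t ticks;
static void tick_isr_body(void) { ticks++; }   /* may use any memory via BR */

/* ISR shell placed in .sys_code: runs without BR until br_on(). */
void tick_isr_shell(void)
{
    br_on();
    tick_isr_body();
    br_off();
}
```

With mpu_br_off set, an inner br_off() leaves BR on for the interrupted ISR, so only the outermost exit returns the interrupted task to its own regions.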

4. Super regions

The next step is to define super regions covering the SRAM, ROM, and DRAM in your system. These serve as temporary replacements for BR until task-specific regions are defined. Consult the linker map to determine the starting address and how much memory is used in each of these memory areas. Then, round each amount up to the next power of two to get the region size. The following template is an example for an existing system:

Super regions encompass all other regions in their memory areas. Hence, it is simpler to use physical addresses and sizes as shown above.

This template is loaded into the memory protection array (MPA) of each task after the task is created. When a task's MPA is loaded, its mpav flag is set. When a task is started or resumed, the global flag mpu_br_off is set if the task's mpav is 1 and cleared if it is 0. PendSV_Handler() turns BR off before starting the current task if mpu_br_off is set; otherwise, BR is left on. Note that this handler runs in the sys_code region, so it does not need BR. Hence, a task with mpav set can access only the super regions and its own task stack.
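The template, MPA, mpav, and mpu_br_off sequence can be sketched as follows. task_t, mpa_region_t, mpa_load(), dispatch(), and MPA_SIZE = 5 are hypothetical simplifications, not SMX definitions, and copying the MPA into the MPU slots is omitted. Only the BR decision that PendSV_Handler() is described as making is modeled, with a variable standing in for MPU_CTRL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MPA_SIZE            5          /* hypothetical regions per task */
#define MPU_CTRL_ENABLE     (1u << 0)
#define MPU_CTRL_PRIVDEFENA (1u << 2)

typedef struct { uint32_t rbar, rasr; } mpa_region_t;

typedef struct {                        /* hypothetical TCB fragment */
    mpa_region_t mpa[MPA_SIZE];         /* memory protection array */
    bool         mpav;                  /* MPA valid (loaded) flag */
} task_t;

static volatile uint32_t mpu_ctrl = MPU_CTRL_ENABLE | MPU_CTRL_PRIVDEFENA;
static bool mpu_br_off;

/* After the task is created: copy the template into its MPA, mark it valid. */
void mpa_load(task_t *t, const mpa_region_t *tmplt)
{
    assert(t != NULL && tmplt != NULL);
    memcpy(t->mpa, tmplt, sizeof t->mpa);
    t->mpav = true;
}

/* The BR decision made when a task is started or resumed. */
void dispatch(const task_t *t)
{
    mpu_br_off = t->mpav;
    /* Loading t->mpa into the MPU task slots is omitted here. */
    if (mpu_br_off)
        mpu_ctrl &= ~MPU_CTRL_PRIVDEFENA;   /* task limited to its regions */
    else
        mpu_ctrl |= MPU_CTRL_PRIVDEFENA;    /* task keeps running with BR */
}
```

A task whose template is never loaded keeps mpav clear and therefore keeps running with BR, which is the fallback the article suggests for tasks that hit MMFs.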

If a task gets a memory manage fault (MMF), it needs access to something outside the super regions, such as a peripheral. In this case, put the additional region into MPA[3], as shown above. Alternatively, simply disable loading mpa_tmplt_app for that task so that it runs with BR, and deal with the problem later.
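If the missing access turns out to be a peripheral, a region for it can be built along these lines. This is a sketch assuming the ARMv7-M encoding for Device memory (TEX=000, C=0, B=1) with execute-never set; periph_region() and the example base address are hypothetical, and the MPU slot number that MPA[3] maps to is left as a parameter rather than guessed.

```c
#include <assert.h>
#include <stdint.h>

#define RBAR_VALID   (1u << 4)
#define RASR_ENABLE  (1u << 0)
#define RASR_B       (1u << 16)   /* with TEX=000, C=0: shareable Device memory */
#define RASR_XN      (1u << 28)   /* never execute from peripheral space */
#define AP_PRIV_RW   (1u << 24)   /* AP=0b001: privileged RW only */
#define AP_FULL      (3u << 24)   /* AP=0b011: RW for privileged and unprivileged */

typedef struct { uint32_t rbar, rasr; } mpa_region_t;

/* Build a Device-memory, execute-never region for a peripheral block. */
mpa_region_t periph_region(uint32_t base, uint32_t size, uint32_t slot, uint32_t ap)
{
    assert(size >= 32 && (size & (size - 1)) == 0);
    assert((base & (size - 1)) == 0 && slot < 8);
    uint32_t size_field = 0;
    while ((2u << size_field) < size)    /* region size == 2^(SIZE+1) */
        size_field++;
    mpa_region_t r = {
        .rbar = base | RBAR_VALID | slot,
        .rasr = RASR_XN | ap | RASR_B | (size_field << 1) | RASR_ENABLE,
    };
    return r;
}
```

AP_PRIV_RW is sufficient while the task is a ptask; if the task later moves to umode, the same peripheral needs AP_FULL (or another access permission that admits unprivileged code) in its template.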

Note that whenever an exception or interrupt occurs, BR is turned on for the handler or ISR. When the handler or ISR finishes, BR is turned off only if mpu_br_off is set and the handler or ISR is not nested; otherwise, BR is left on.

At this point, it is desirable, but not necessary, to have all tasks running with BR off.

5. Cropping region sizes

Using subregions allows tightening region boundaries and reducing wasted memory. In the example above, MPA[1] is 256 KB, but the SRAM actually used is 210 KB. Every region of 256 bytes or more is divided into eight equal subregions, so here the subregion size is 32 KB. Cropping subregion 7 (N7 above) reduces the usable size of MPA[1] to 224 KB, which is big enough. Only 14 KB is wasted – still a lot, but security costs!
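The cropping arithmetic can be checked with a short helper. This sketch assumes the ARMv7-M rule that a region of 256 bytes or more is split into eight equal subregions, with bit n of RASR.SRD (RASR bits [15:8]) disabling subregion n; srd_crop_top() and cropped_size() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* SRD mask that disables the top subregions not needed to cover `used`
   bytes from the region base. Subregion n spans [n*size/8, (n+1)*size/8).
   The result is shifted into RASR bits [15:8] by the caller. */
uint32_t srd_crop_top(uint32_t size, uint32_t used)
{
    assert(size >= 256 && (size & (size - 1)) == 0 && used <= size);
    uint32_t sub = size / 8;
    uint32_t needed = (used + sub - 1) / sub;   /* subregions holding data */
    uint32_t mask = 0;
    for (uint32_t n = needed; n < 8; n++)
        mask |= 1u << n;                        /* disable subregion n */
    return mask;
}

/* Bytes still accessible after applying an SRD mask. */
uint32_t cropped_size(uint32_t size, uint32_t srd)
{
    uint32_t enabled = 0;
    for (uint32_t n = 0; n < 8; n++)
        if (!(srd & (1u << n)))
            enabled++;
    return enabled * (size / 8);
}
```

With the numbers above, srd_crop_top(256 KB, 210 KB) returns 0x80 (only subregion 7 disabled), leaving 224 KB accessible and 14 KB unused.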

A significant gain has been made at this point: handlers and ISRs are running as they were before, but all or most tasks are running in reduced memory spaces with strictly controlled attributes (e.g., read-only (RO), execute-never (XN), etc.). This is likely to reveal errors you didn't know you had. In addition, large unused memory areas are protected from access by wild pointers and malware.

6. Task-specific regions

The next step is to identify the most untrusted or vulnerable task or group of tasks that you wish to isolate from the rest of the system. This might be a networking subsystem or third-party code. We recommend an incremental approach to improving system security. Significant gains can be made by isolating one bad actor at a time, and as you go, your skill at this will improve. So, start easy. For simplicity, in the following discussion, we will assume that a single task, taskA, is being isolated. See previous blogs for how to define sections, blocks, templates, MPAs, etc.

The first step is to group code and data into task-specific regions and to define blocks in the linker command file to hold these regions. These regions are separate from the app regions defined previously. It is convenient to name them after the task (e.g., taskA_code and taskA_data). If it is not already the case, it may help to put all task-specific code into a single module, with any task-specific data at the start of that module.

Next, define common code and data regions to hold RTOS and other system services and to hold common data needed by them. These might be named pcom_code and pcom_data, respectively. At this point, taskA is a ptask, so pcom_code needs to include RTOS and other system services needed by taskA, and pcom_data needs to include data needed by these services.

Then, create mpu_tmplt_taskA and add code to load it into the MPA for taskA. This code is normally placed after the smx_CreateTask() call for the task. At this point, mpu_tmplt_taskA has replaced mpa_tmplt_app for this task. taskA is standing alone and is partially isolated from all other tasks. Will it run? This is where the rubber meets the road. Memory manage faults (MMFs) from taskA are likely to be due to references outside of its regions or to attribute violations (e.g., writing to ROM).

The C-SPY debugger is helpful in tracking these down. Put a breakpoint at the start of MMF_Handler() so that execution stops immediately on an MMF. In the Registers window, open System Control Block. The CFSR register shows the causes of all faults (see ARM Application Note 209). At the breakpoint, the PC register points into MMF_Handler() itself; the address of the faulting instruction is the PC value saved in the exception frame on the faulting task's stack. The MMFAR register shows the address of a data violation, provided the MMARVALID bit in CFSR is set. The Memory Protection Unit group in the Registers window allows looking at the MPU. To see a specific slot, enter its number into MPU_RNR; RBAR and RASR then display that region in easy-to-read form.
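Decoding CFSR by hand gets tedious, so a small helper can summarize the MemManage bits. This sketch assumes the ARMv7-M layout of MMFSR (the low byte of CFSR at 0xE000ED28) and MMFAR at 0xE000ED34. mmf_describe() is a hypothetical name; on target it would be called from MMF_Handler() with the two register values.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* MMFSR: low byte of CFSR (0xE000ED28) on ARMv7-M. */
#define MMFSR_IACCVIOL  (1u << 0)  /* instruction access violation */
#define MMFSR_DACCVIOL  (1u << 1)  /* data access violation */
#define MMFSR_MUNSTKERR (1u << 3)  /* fault on exception-return unstacking */
#define MMFSR_MSTKERR   (1u << 4)  /* fault on exception-entry stacking */
#define MMFSR_MLSPERR   (1u << 5)  /* fault on lazy FP state save (FPU parts) */
#define MMFSR_MMARVALID (1u << 7)  /* MMFAR (0xE000ED34) holds the address */

/* Writes a one-line MMF summary into buf; returns snprintf's result. */
int mmf_describe(uint32_t cfsr, uint32_t mmfar, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    uint32_t m = cfsr & 0xFFu;
    char addr[24] = "addr invalid";
    if (m & MMFSR_MMARVALID)
        snprintf(addr, sizeof addr, "addr=0x%08lX", (unsigned long)mmfar);
    return snprintf(buf, len, "MMF:%s%s%s%s%s %s",
                    (m & MMFSR_IACCVIOL)  ? " IACCVIOL"  : "",
                    (m & MMFSR_DACCVIOL)  ? " DACCVIOL"  : "",
                    (m & MMFSR_MUNSTKERR) ? " MUNSTKERR" : "",
                    (m & MMFSR_MSTKERR)   ? " MSTKERR"   : "",
                    (m & MMFSR_MLSPERR)   ? " MLSPERR"   : "",
                    addr);
}
```

For example, CFSR = 0x82 with MMFAR = 0x20010000 produces "MMF: DACCVIOL addr=0x20010000": a data access to 0x20010000 that lies outside taskA's regions or violates their attributes.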

In many cases, solving MMFs consists of just moving taskA-specific code and data into the .taskA_code and .taskA_data regions, respectively. Which regions a task needs depends on the task: some tasks may not need some of the standard regions but may need others, such as I/O regions. MPU[0] is tentatively reserved for a system region common to all tasks that would hold common subroutines (e.g., a C library), common tables, text strings, etc. This would be a read-only region, so it should be safe from tampering.

However, if there are not enough task regions, MPA_SIZE may be increased to 6, which results in all MPAs having six regions. Another alternative is to split tasks into smaller tasks that require fewer regions. For example, a task doing both input and output might be split into an input task and an output task, linked by a message exchange or a pipe. Then, the input task requires only an input region and the output task only an output region.

7. umode operation

The final step is to move taskA to umode. This is done by setting its umode flag. Now, when taskA is dispatched, PendSV_Handler() sets CONTROL = 0x3, which causes the processor to run in unprivileged thread mode using the task's stack. In addition, add #include "xapiu.h" ahead of the task's code in its module. This forces the software interrupt (SWI) application programming interface (API) to be used for RTOS service calls and possibly other system service calls.
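The CONTROL values involved are easy to state in code. This sketch assumes the ARMv7-M bit meanings (nPRIV is bit 0, SPSEL is bit 1) and, as an assumption not stated in the article, that ptasks also run on their task stacks (CONTROL = 0x2). Both helper names are hypothetical. msr_control_thread() models why a utask cannot make itself privileged again: an MSR to CONTROL executed in unprivileged thread mode is ignored, so only a handler such as PendSV_Handler() can change the mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_NPRIV (1u << 0)   /* 1 = unprivileged thread mode */
#define CONTROL_SPSEL (1u << 1)   /* 1 = thread mode uses PSP (the task stack) */

/* CONTROL value a dispatcher would load for a task (hypothetical helper). */
uint32_t control_for_task(bool umode)
{
    return CONTROL_SPSEL | (umode ? CONTROL_NPRIV : 0u);
}

/* Models MSR CONTROL executed in thread mode: ignored when unprivileged. */
uint32_t msr_control_thread(uint32_t current, uint32_t requested)
{
    if (current & CONTROL_NPRIV)
        return current;           /* a utask cannot raise its own privilege */
    return requested;
}
```

This one-way behavior is why a ptask that wants to become a utask has to go back through the dispatcher, rather than simply rewriting CONTROL.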

Before actually running taskA, mpu_tmplt_taskA must be changed. The taskA_code, taskA_data, and taskA stack regions stay the same, but the pcom regions are replaced with ucom_code and ucom_data. ucom_code contains the system service shells in xmpu.c. You may need to move routines from pcom_code to ucom_code and data from pcom_data to ucom_data. This may not be possible if these routines and data are also used by other ptasks. Solving this problem may require remedies such as:

  • Simultaneously converting other ptasks to utasks.
  • Splitting taskA into a ptask and a utask.
  • Moving common routines to MPU[0] and making it accessible in umode.
  • Replicating common routines with different names.
  • Passing global values via messages or pipes.

When taskA first starts running as a utask, you are likely to see PRIVILEGE VIOLATION errors indicating that restricted service calls are being made. This may necessitate recoding to avoid these services. Or, it may work better to split taskA into a ptask, which calls these services (e.g., TaskCreate()), and a utask, which does not. Alternatively, taskA could start as a ptask, make all of the restricted service calls, then restart itself as a utask (it must restart itself so that PendSV_Handler() will change CONTROL to 0x3).
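The PRIVILEGE VIOLATION behavior can be pictured as a check in the SVC dispatcher. In this sketch, the service IDs, error codes, stub services, and table are all hypothetical, not the SMX SWI API. In practice, caller_umode would be derived from the caller's CONTROL.nPRIV (readable in the handler) or from the task's umode flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical service IDs and error codes, for illustration only. */
enum { SVC_MSG_SEND, SVC_MSG_RECEIVE, SVC_TASK_CREATE, SVC_COUNT };
#define ERR_PRIV_VIOL (-1)
#define ERR_BAD_SVC   (-2)

typedef int (*svc_fn)(uint32_t arg);

/* Stub services standing in for real RTOS calls. */
static int msg_send(uint32_t arg)    { return (int)(arg & 0xFFu); }
static int msg_receive(uint32_t arg) { (void)arg; return 0; }
static int task_create(uint32_t arg) { (void)arg; return 1; }

static const struct {
    svc_fn fn;
    bool   restricted;            /* callable from ptasks only */
} svc_table[SVC_COUNT] = {
    [SVC_MSG_SEND]    = { msg_send,    false },
    [SVC_MSG_RECEIVE] = { msg_receive, false },
    [SVC_TASK_CREATE] = { task_create, true  },
};

/* Body of the SVC handler after the shell has fetched id and arg. */
int svc_dispatch(uint32_t id, uint32_t arg, bool caller_umode)
{
    if (id >= (uint32_t)SVC_COUNT)
        return ERR_BAD_SVC;
    if (svc_table[id].restricted && caller_umode)
        return ERR_PRIV_VIOL;     /* reported as a PRIVILEGE VIOLATION */
    return svc_table[id].fn(arg);
}
```

Splitting taskA into a ptask and a utask amounts to ensuring that every restricted entry is called only from the ptask half.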

Once you get taskA running as a utask, you have a task that cannot harm critical system resources. It can access only its own code, data, and stack, plus common code, which may consist only of system service shells, and possibly some common data shared with other utasks in its subsystem. This final solution may not be perfect, but it is a big improvement over doing nothing.

8. Final tuning

If all has gone well, all untrusted code is running in utasks and trusted code is running in ptasks. Furthermore, ptasks are isolated and protected from each other, as well as from utasks. Unfortunately, there is a chink in the armor: handlers and ISRs can access everything via BR. Handlers are internal and therefore relatively trustworthy. Of particular concern are ISRs, which can be manipulated by hackers from outside of the system. It is therefore desirable to move ISRs entirely within the sys_code region, if possible, and to not enable BR for them. If this is not possible, try to do most of their work in tasks – preferably utasks.
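One common way to push ISR work into tasks is a single-producer, single-consumer ring buffer: the ISR only stores the data, and a task does the processing. This is a generic sketch with hypothetical names, assuming a single core. The buffer must sit where the ISR can reach it without BR; under the region layout described here, that means sys_data. A utask consumer could not read sys_data directly and would have to receive the data through a system service instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u                      /* power of two; holds RB_SIZE - 1 items */

static volatile uint8_t  rb_buf[RB_SIZE];
static volatile uint32_t rb_head;        /* written only by the ISR */
static volatile uint32_t rb_tail;        /* written only by the task */

/* ISR side: capture the data and return; no processing here. */
bool isr_push(uint8_t byte)
{
    uint32_t next = (rb_head + 1u) & (RB_SIZE - 1u);
    if (next == rb_tail)
        return false;                    /* full: the caller may count an overrun */
    rb_buf[rb_head] = byte;
    rb_head = next;                      /* publish only after the data is stored */
    return true;
}

/* Task side: the real work happens here, in thread mode. */
bool task_pop(uint8_t *out)
{
    if (rb_tail == rb_head)
        return false;                    /* empty */
    *out = rb_buf[rb_tail];
    rb_tail = (rb_tail + 1u) & (RB_SIZE - 1u);
    return true;
}
```

Keeping the ISR this small also makes it more practical to place it entirely in sys_code and leave BR off for it.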

Unfortunately, there is no way to block access to the private peripheral bus in privileged mode. Hence, if malware can gain access to pmode, it can change the MPU and then access anything it wants. The strongest protection is to move as much code into umode as possible, reducing vulnerability to a small amount of code that can be fortified.

Why ptasks?

The case for utasks is obvious, but what about ptasks? The following are reasons why ptasks may be necessary:

  • Avoid changing highly trusted, critical software with a high debug investment
  • Better performance
  • Direct access to all operating system (OS) and board support package (BSP) services
  • Direct access to hardware
  • Stepping stones to utasks

Conclusion

There is a step-by-step process to incrementally improve the security of Cortex-M embedded systems that have MPUs, and this process can be performed on already-released systems. Although some recoding and restructuring may be necessary, it is likely to be minor, and there are many remedies for problems that arise. Furthermore, proceeding in an incremental manner with frequent testing helps to ensure that new bugs are found and fixed as soon as they are introduced. Being able to easily shuttle tasks between umode and pmode helps further in tracking down problems.

Ralph Moore, President and Founder of Micro Digital, graduated with a degree in Physics from Caltech. He spent his early career in computer research, then moved into mainframe design and consulting.

Micro Digital

www.smxrtos.com/mpu


Ralph Moore, Micro Digital