RtOS - implementing it: be cooperative

In the previous part I showed You how little a task switch differs from an interrupt. In this part I would like to show You how a preemptive, interrupt-like task switch differs from a cooperative one.

The primary difference is…

… in the fact that a preemptive task switch is basically an interrupt, while a cooperative one is a call to a yield() subroutine. What could that subroutine look like? Well… maybe like this:

subroutine yield()
  push R0  ; store first register
  ....
  push R15 ; store last register
  save SP of current task to some variable
  ....
  load SP for next task from some variable
  pop R15
  ...
  pop R0
  pop PC  ; an equivalent of return from subroutine.

and it is used just like this:

call yield() ; which is in fact:
             ; push PC
             ; PC = address of yield()

Note: Do You remember what PC and SP are? PC is the Program Counter and SP is the Stack Pointer. Please refer to the previous part for an explanation.

Is there anything wrong with it?

No. It is a fine cooperative task switch.

The only problem with it is that it is unnecessarily expensive.
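To make the steps concrete, here is a minimal sketch of the same switch on a toy machine modelled in C. Everything in it is an assumption made for illustration and is not taken from a real CPU or kernel: the `Cpu` and `Task` structures, the 16-bit registers, the round-robin choice of the next task, and the names `task_init()`, `cpu_start()` and `call_yield()`.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_REGS    16   /* R0..R15, as in the pseudo-code above */
#define STACK_WORDS 32   /* hypothetical per-task stack size */
#define NUM_TASKS   2    /* hypothetical number of tasks */

typedef struct {
    uint16_t  reg[NUM_REGS]; /* R0..R15 */
    uint16_t  pc;            /* Program Counter */
    uint16_t *sp;            /* Stack Pointer: last pushed word, stack grows down */
} Cpu;

typedef struct {
    uint16_t  stack[STACK_WORDS];
    uint16_t *saved_sp;      /* the "some variable" holding SP of a suspended task */
} Task;

static void push(Cpu *cpu, uint16_t value)
{
    *--cpu->sp = value;
}

static uint16_t pop(Cpu *cpu)
{
    return *cpu->sp++;
}

/* Second half of yield(): restore R15..R0 and PC from the current stack. */
static void restore_context(Cpu *cpu)
{
    for (int r = NUM_REGS - 1; r >= 0; r--)
        cpu->reg[r] = pop(cpu);              /* pop R15 ... pop R0 */
    cpu->pc = pop(cpu);                      /* pop PC */
}

/* Prepare a stack that looks as if the task had called yield() at entry_pc. */
void task_init(Task *task, uint16_t entry_pc)
{
    Cpu tmp = { .sp = task->stack + STACK_WORDS };
    push(&tmp, entry_pc);
    for (int r = 0; r < NUM_REGS; r++)
        push(&tmp, 0);                       /* all registers start as zero */
    task->saved_sp = tmp.sp;
}

/* Start running the given task by restoring its prepared context. */
void cpu_start(Cpu *cpu, Task tasks[], int first)
{
    cpu->sp = tasks[first].saved_sp;
    restore_context(cpu);
}

/* Model of "call yield()" executed by task *current at address return_pc. */
void call_yield(Cpu *cpu, Task tasks[], int *current, uint16_t return_pc)
{
    assert(*current >= 0 && *current < NUM_TASKS);
    push(cpu, return_pc);                    /* call: push PC */
    for (int r = 0; r < NUM_REGS; r++)
        push(cpu, cpu->reg[r]);              /* push R0 ... push R15 */
    tasks[*current].saved_sp = cpu->sp;      /* save SP of current task */
    *current = (*current + 1) % NUM_TASKS;   /* round robin: an assumption */
    cpu->sp = tasks[*current].saved_sp;      /* load SP for next task */
    restore_context(cpu);                    /* pop R15 ... pop R0, pop PC */
}
```

In this model every suspended task keeps 17 words on its own stack: the 16 registers plus the return address pushed by the call.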

Calling convention

The “calling convention” is an agreement, made between Yourself and Yourself or between Your C compiler and itself, about how parameters are passed to subroutines and how registers and other CPU resources are used.

For example, the “interrupt calling convention” says:

  1. No parameters are passed.
  2. No registers may be changed by the called subroutine.
  3. No stack content may be changed by the called subroutine.

Our yield() subroutine takes no parameters and returns no value, so we don’t care about parameter passing. So what would a calling convention tell us?

For example, it may look like this:

  1. Registers R12…R15 can be freely changed by the called subroutine, and the caller may make no assumptions about their content.
  2. Registers R0…R11 may not be changed by the called subroutine; on return from the subroutine they must have the same values as before the call.
  3. The content of the stack may not be changed.

Caller save, called save…

The registers listed in the first point of that convention are the so-called “caller save” registers. If the code which calls a certain subroutine x() wants to preserve their content, it must save them itself, while the x() subroutine may make any use of them:

 ... some code
 push R12
 push R13
 push R14
 push R15
   call x()
   ; R12... R15 can be virtually anything.
 pop R15
 ....
 pop R12
 ...

subroutine x()
  R15 = 10    ; no need to preserve it.
  R14 = R15+5
  return

In contrast, the registers listed in point two of the calling convention are named “called save” registers (I used to use the name “callee save” until I figured out that I can’t spell it right in English). The code which calls a subroutine x() may assume that they are not changed, but if x() needs to use them, it is up to x() to preserve them.

R10 = 5
call x()
; R10 is still 5

subroutine x()
 push R10
  R10 = ....
 pop R10
 return
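The two listings above can also be modelled in C to check who pays for saving what. This is only a sketch on an imaginary machine: the `Machine` structure, the helper names and the value 1234 are assumptions for illustration, not part of any real convention.

```c
#include <assert.h>

#define NUM_REGS    16
#define STACK_WORDS 16

typedef struct {
    int reg[NUM_REGS];        /* R0..R15 */
    int stack[STACK_WORDS];
    int sp;                   /* number of words currently on the stack */
} Machine;

static void push(Machine *m, int value)
{
    assert(m->sp < STACK_WORDS);
    m->stack[m->sp++] = value;
}

static int pop(Machine *m)
{
    assert(m->sp > 0);
    return m->stack[--m->sp];
}

/* x() that uses only "caller save" registers: it saves nothing. */
void x_uses_scratch(Machine *m)
{
    m->reg[15] = 10;
    m->reg[14] = m->reg[15] + 5;
}

/* x() that needs a "called save" register: it must preserve R10 itself. */
void x_uses_r10(Machine *m)
{
    push(m, m->reg[10]);
    m->reg[10] = 1234;            /* any temporary use of R10 */
    m->reg[10] = pop(m);
}

/* A caller that wants R12..R15 to survive the call saves them itself. */
void caller_preserving_scratch(Machine *m)
{
    for (int r = 12; r <= 15; r++)
        push(m, m->reg[r]);       /* push R12 ... push R15 */
    x_uses_scratch(m);            /* R12..R15 can be anything now */
    for (int r = 15; r >= 12; r--)
        m->reg[r] = pop(m);       /* pop R15 ... pop R12 */
}
```

A caller that does not care about R12…R15 simply calls `x_uses_scratch()` directly and pays for no pushes at all, which is where the efficiency comes from.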

Why such a strange calling convention?

Because it is efficient.

If You have a CPU like the MSP430 or ARM, which has plenty of registers, You will usually end up with an even weirder convention, for example:

  1. Registers R0…R4 are reserved for interrupts only. No main code may use them, but interrupts may use them without needing to save anything on the stack.
  2. Registers R5…R11 are “called save”.
  3. Registers R12…R15 are “caller save”.
  4. The stack content cannot be changed.

This type of calling convention allows super-fast interrupt routines. In my experience four registers are usually enough for what happens within interrupts, and the fact that You do not have to save them on the stack allows You to:

  • save some microseconds on interrupt entry/exit code;
  • put less load on the stack or avoid stack switching.

Remember, if You need to save registers on the stack, there must be enough space for them. And since an interrupt happens at a more or less random moment, You need to have enough free space on the stack at each and every moment. Alternatively, You can switch to a stack dedicated to interrupts. I will return to this in subsequent blog posts.

The four caller save registers are usually used to pass arguments and return values. I have also observed that in code at the assembly level You usually need something that can be called a scratch-pad space: registers which You use to compute something and then throw away. Four registers is a good guess, and since they are often thrown away there is no point in preserving them.

And if You are a king of assembler…

… then You may define a dedicated calling convention for yield(). I usually used this one:

  1. All registers may be changed by the called subroutine.
  2. The content of the stack must be preserved.

In this calling convention no register is preserved, but I found that it is usually a very good convention. In most cases I called yield() in places where I had finished one part of the work and was preparing to start another, so there was nothing that needed to be preserved.

Why should I care about the calling convention anyway?

Because if You either obey it or force Your compiler to obey it, then yield() and a task switch can be used as a plain regular function call. For example, in pseudo-C:

extern void yield(void);
void task1()
{
  for(;;)
  {
    ...
    yield();
  }
}
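On a hosted system the same idea can be tried without writing any assembly. The sketch below is a stand-in built on assumptions, not the hand-written yield() discussed here: it uses the POSIX `ucontext` functions (`getcontext`, `makecontext`, `swapcontext`), which are obsolescent in POSIX but still provided by glibc on Linux, and it switches through a small round-robin scheduler loop rather than directly from task to task. The names `run_tasks()`, `task_a()` and `task_b()`, the stack size, and the finite loops in place of `for(;;)` are hypothetical and only there so that the example terminates.

```c
#include <assert.h>
#include <string.h>
#include <ucontext.h>

#define NUM_TASKS        2
#define TASK_STACK_BYTES (64 * 1024)   /* hypothetical, generous stack size */

static ucontext_t scheduler_ctx;               /* context of run_tasks() */
static ucontext_t task_ctx[NUM_TASKS];         /* one saved context per task */
static char task_stack[NUM_TASKS][TASK_STACK_BYTES];
static int  task_done[NUM_TASKS];
static int  current_task;
static char trace[64];                         /* which task ran, in order */

static void record(char c)
{
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace) {
        trace[n] = c;
        trace[n + 1] = '\0';
    }
}

/* Cooperative switch: from the task's point of view a plain function call. */
void yield(void)
{
    swapcontext(&task_ctx[current_task], &scheduler_ctx);
}

static void task_a(void)
{
    for (int i = 0; i < 3; i++) {
        record('A');
        yield();
    }
    task_done[current_task] = 1;   /* returning resumes uc_link (the scheduler) */
}

static void task_b(void)
{
    for (int i = 0; i < 3; i++) {
        record('B');
        yield();
    }
    task_done[current_task] = 1;
}

/* Run both tasks round robin until they finish; return the trace. */
const char *run_tasks(void)
{
    void (*entry[NUM_TASKS])(void) = { task_a, task_b };
    int remaining = NUM_TASKS;

    trace[0] = '\0';
    for (int i = 0; i < NUM_TASKS; i++) {
        getcontext(&task_ctx[i]);
        task_ctx[i].uc_stack.ss_sp = task_stack[i];
        task_ctx[i].uc_stack.ss_size = sizeof task_stack[i];
        task_ctx[i].uc_link = &scheduler_ctx;  /* where to go when the task returns */
        task_done[i] = 0;
        makecontext(&task_ctx[i], entry[i], 0);
    }
    while (remaining > 0) {
        for (int i = 0; i < NUM_TASKS; i++) {
            if (task_done[i])
                continue;
            current_task = i;
            swapcontext(&scheduler_ctx, &task_ctx[i]);  /* run task i until it yields */
            if (task_done[i])
                remaining--;
        }
    }
    return trace;
}
```

Calling `run_tasks()` interleaves the two tasks at every `yield()`, so the tasks themselves never need to know that a switch happened inside that call.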

How does the calling convention impact a task switch?

Directly and in a simple way: You need to save just the “called save” registers.

Let us compare the first calling convention and the one which is used by kings of assembler:

subroutine yield()            subroutine yield()
 push R0                       save SP somewhere
 push R1                       load SP from somewhere
 ....                          return
 push R11
 save SP somewhere
 load SP from somewhere
 pop R11
 ...
 pop R1
 pop R0
 return

The yield() on the left needs 13 elements on each task stack (12 registers + PC), while the one on the right needs just one element on each task stack. It does matter on memory-constrained devices, because this value must be multiplied by the number of tasks.

And how much would a preemptive switch need?

17 stack elements. 16 registers + PC.
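To see how the per-task cost scales, here is a small worked calculation in C. The element counts (17, 13 and 1) come from the text above; the function name `context_bytes()` and the example figures of 8 tasks with 2-byte stack elements are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stack elements needed per suspended task, as counted in the text. */
enum {
    ELEMS_PREEMPTIVE    = 17,  /* 16 registers + PC */
    ELEMS_CALLED_SAVE   = 13,  /* R0..R11 + PC */
    ELEMS_NOTHING_SAVED = 1    /* PC only */
};

/* Total memory spent on saved contexts across all task stacks. */
size_t context_bytes(size_t elems_per_task, size_t num_tasks, size_t elem_bytes)
{
    assert(elem_bytes > 0);
    return elems_per_task * num_tasks * elem_bytes;
}
```

With those assumed figures, `context_bytes(ELEMS_PREEMPTIVE, 8, 2)` gives 272 bytes, `context_bytes(ELEMS_CALLED_SAVE, 8, 2)` gives 208 bytes, and `context_bytes(ELEMS_NOTHING_SAVED, 8, 2)` gives 16 bytes.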

I dare say it is worth considering a calling convention and learning how to inform Your C compiler about Your custom calling convention for Your yield() routine.

Summary

In this blog entry You learned what the so-called “calling convention” is and how it impacts the cooperative task switch. You might also have noticed that if You can force a certain calling convention for the yield() routine, then the cooperative task switch may be extremely lightweight.

In the next blog entry I will show You exactly what a complete but still limited RtOS kernel looks like. And I assure You, You will be surprised how tiny it is.
