April 27

Pointers, Arrays, and Functions in Arduino C

[Figure: Array memory diagram]

Now that we’ve completed our introduction to pointers, I had really wanted to move on and wrap up our section on using an EEPROM with the I2C protocol today. However, I feel like I would be doing a disservice to you without elaborating further on why we would even want to use pointers in the first place.

Just to recap, let’s look at some simple code to demo the syntax of using a pointer:

int myVar = 10;
int *myPointer;
myPointer = &myVar;
*myPointer = 20;

If you were to compile and run this code, you would see that myVar ends up holding 20, even though we never assigned 20 to myVar directly. We accomplished this by pointing myPointer at myVar: the reference operator (&) gives us myVar’s memory address, which we store in myPointer. We then dereferenced the pointer with the dereference operator (*) and assigned 20 to the memory it points to, which is myVar.

Now, the obvious question you probably have is, “Why in the heck would I want to do that?”

The example above is more of a toy, obviously contrived, but there are very real reasons why you would want to do this, especially when you’re running a microcontroller like the Arduino and you have to handle a lot more low-level operations. To see the value in pointers, you’ll first need to know something about how functions work in C.

I want to keep this explanation of functions high-level so the concepts stay easy to understand. For now, just know there are two ways to pass arguments to a function: by value and by reference.

Function Call by Value:

[Figure: Pass by value lvalue-rvalue diagram. Note how the rvalues are copied.]

We’ll start with the easier one. It’s easier because it’s the one you’re most familiar with, though you’ll see that a function call by reference isn’t particularly difficult either.

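int sum(int x, int y) {
	int sumZ = x + y;
	return sumZ;
}

int main(void) {   // In an Arduino sketch, this body would go in setup() instead.
	int a = 2;
	int b = 3;
	int sumAB = sum(a, b);   // x receives a copy of 2, y a copy of 3
	return 0;
}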

Start by reading this kind of code the way a computer would: with the main function. We set a = 2 and b = 3, then compute sumAB by calling sum(a, b). When we call that function, a is replaced by its value (2) and b by its value (3), so we could just as easily write int sumAB = sum(2, 3);

Inside the sum function, x is set to 2 and y to 3 because of the arguments passed to it. This is called passing by value because we’ve merely passed the values of the variables: the values of a and b are copied into the function’s own variables (x and y in our example).

You’ll see this can actually be a problem if we want the function to do something with the original variables (like change them). A function whose arguments are passed by value can’t change the caller’s variables, because all it has access to is its own copy of those values (i.e. x and y have different lvalues from a and b even though their rvalues are the same).

So yes, the function can change the values of x and y, but that doesn’t affect a and b, because they reside at a different location in memory. Additionally, x and y cease to exist as soon as the function returns and its stack frame is discarded (i.e. right after we return sumZ).
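Here’s a minimal sketch of that limitation in plain C. It is written to compile on a desktop rather than an Arduino, so it returns a value instead of printing with Serial. The name tryToChange is hypothetical and used only for illustration. The function overwrites its parameters, but the caller’s variables keep their values.

```c
/* Hypothetical demo function: x and y are copies of the caller's values. */
int tryToChange(int x, int y) {
	x = 100;          /* changes only the local copy x */
	y = 200;          /* changes only the local copy y */
	return x + y;     /* x and y disappear once the function returns */
}
```

Calling tryToChange(a, b) with a = 2 and b = 3 returns 300, yet a and b still hold 2 and 3 afterward.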

Thankfully, there’s a way around this: enter call by reference.

Function Call by Reference:

[Figure: Pass by reference lvalue-rvalue diagram. Note how the lvalue (memory address) is copied.]

So what if we want to actually change the variables we pass to a function? Or what if we simply want to return more than one value? How can we escape the box that is the function’s stack frame? That’s where call by reference (pointers) comes to the rescue.

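void addOne(int *numA) {
	*numA = *numA + 1;   // update the int that numA points to
}

void setup() {
	Serial.begin(9600);    // start serial output so println has somewhere to go
	int varA = 15;
	addOne(&varA);         // pass varA's address, not its value
	Serial.println(varA);  // prints 16
}

void loop() {
}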

Let’s again think about what’s happening here. We take a variable, varA, and extract its memory address (its lvalue) with the reference operator, &. We then pass that memory address as the argument for the parameter numA of the function addOne. You can think of this as equivalent to declaring and initializing the pointer int *numA = &varA;. Inside the function, dereferencing numA gives us direct access to the value stored in varA. As a result, varA now holds 16, and that is what gets printed to the serial monitor.
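The same idea answers the earlier question about returning more than one value. Below is a minimal plain-C sketch that uses a hypothetical helper named divideWithRemainder. The caller passes the addresses of two variables, and the function writes one result into each.

```c
/* Hypothetical helper: "returns" two results by writing through pointers.
   Assumes divisor is not zero. */
void divideWithRemainder(int dividend, int divisor, int *quotient, int *remainder) {
	*quotient = dividend / divisor;   /* write into the caller's first variable */
	*remainder = dividend % divisor;  /* write into the caller's second variable */
}
```

After int q, r; divideWithRemainder(17, 5, &q, &r);, q holds 3 and r holds 2. The return type is void on purpose: the results travel back through the pointers instead.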

In a much more general (and, I dare say, enlightened) sense, you can think of this as really being the same as call by value: we simply pass the rvalue of our argument to the function’s parameter. The key difference is that a pointer’s rvalue is the memory address of what it points to, so a memory address is what gets copied into the function!
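Here is one way to convince yourself that the address really is copied. In this plain-C sketch (retarget is a hypothetical name), the function first changes the value its pointer refers to and then overwrites its own copy of the pointer. The caller sees the first change but not the second.

```c
#include <stddef.h>

/* Hypothetical demo: p is a copy of the caller's pointer (a copied address). */
void retarget(int *p) {
	*p = *p + 1;   /* changes the int p points to: visible to the caller */
	p = NULL;      /* changes only this function's copy of the address: invisible to the caller */
}
```

After int varA = 15; int *ptr = &varA; retarget(ptr);, varA is 16 and ptr still holds &varA.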

Advanced Topic: This is the perfect opportunity to introduce this. In programming, particularly in the C family of languages, variables are often described as falling into two categories: value type variables and reference type variables. Now that you’re an expert on function call by value, function call by reference, and pointers, you can appreciate where the terms come from. Value type variables are variables whose rvalue stores an actual value (like an int storing the value 10). They tend to be associated with the primitives: primitive types like int, char, byte, etc. Reference type variables store a memory address in their rvalue.

Arrays:

You’re undoubtedly familiar with the usual way of looping over an array, where we simply increment the index:

[Figure: Standard for loop over an array’s index]

But what if I told you, there was another way? Take a look at the following code:

[Figure: Looping over an array using pointers]

They have the exact same output!
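Here is a minimal plain-C sketch of the same comparison. It is an illustration, not the code from the screenshots, and it sums the elements instead of printing them so it can run on a desktop. Both loops visit exactly the same elements.

```c
#include <stddef.h>

/* Sum an array using the familiar index notation. */
long sumByIndex(const int numArray[], size_t count) {
	long total = 0;
	for (size_t i = 0; i < count; i++) {
		total += numArray[i];        /* element at index i */
	}
	return total;
}

/* Sum the same array using pointer arithmetic instead. */
long sumByPointer(const int numArray[], size_t count) {
	long total = 0;
	for (size_t i = 0; i < count; i++) {
		total += *(numArray + i);    /* same element: numArray[i] means *(numArray + i) */
	}
	return total;
}
```

For int numArray[] = {1, 2, 3, 4, 5};, both functions return 15.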

[Figure: Serial output showing the memory address and value of each array element]

But why?! How?!

Look closely at the line where we print our value: Serial.println(*(numArray + i));. That looks like a pointer being dereferenced, doesn’t it? Well, that’s because it is. Let’s dissect this a little more and look inside the parentheses. We know i is an int, so as this loop progresses we’re adding +0, +1, +2, etc. What does that tell us about numArray? It has to be a number of some sort for the addition to make any sense. But what kind of number? We’re dereferencing the result, so it must be what? That’s right, numArray is an address!

Now that you understand how pointers work, you can appreciate the implications. When you use an array’s name, such as numArray, in an expression, it evaluates to the memory address of the array’s first element, so it behaves just like a pointer. (The array itself is still a block of elements, which is why sizeof(numArray) gives the size of the whole array rather than the size of a pointer.) If you read the advanced topic blurb above, the implication is that arrays behave like reference type variables. When you pass an array to a function, what gets copied is the address of its first element, so the array is effectively passed by reference.
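Here is a minimal plain-C sketch of an array being passed to a function (doubleAll is a hypothetical name). The function modifies the caller’s array directly, because it receives the address of the first element rather than a copy of the elements.

```c
#include <stddef.h>

/* Hypothetical helper: doubles every element of the caller's array in place.
   The parameter int values[] is really an int *, so only the address of the
   first element is copied into the function. */
void doubleAll(int values[], size_t count) {
	for (size_t i = 0; i < count; i++) {
		values[i] *= 2;   /* writes straight into the caller's memory */
	}
}
```

Because the function only receives an address, it cannot work out the array’s length on its own. That is why the element count is passed as a separate parameter.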

Before we close this page of the notebook, I want to highlight a “gotcha”. Let’s say numArray has a memory address of 2288, as it apparently does in my screenshot above. If i = 1, why is the address on the second iteration of the loop 2290 and not 2289? The answer is how the compiler handles pointer arithmetic. When you define the array, the compiler allocates memory based on the size of the data type used. In our case we used ints, which are two bytes long on 8-bit AVR-based Arduinos such as the Uno (on 32-bit boards they are four). When you add an integer to a pointer, the compiler multiplies that integer by the size of the data type to find the next memory address. Therefore we start at 2288, and the next item in the array lives at 2290, followed by 2292, 2294, and so on:

[Figure: Array memory diagram showing memory addresses]

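You can check the scaling yourself with this plain-C sketch (bytesBetweenElements is a hypothetical helper). It converts neighboring element addresses to char pointers, whose arithmetic counts single bytes, and then measures the gap. On a desktop an int is typically four bytes, so the gap there will usually be 4 rather than the 2 seen on an AVR board. That is exactly why the code relies on sizeof(int) instead of a hard-coded number.

```c
#include <stddef.h>

/* Hypothetical helper: bytes between element i and element i + 1 of an int array.
   Adding 1 to an int pointer advances the address by sizeof(int) bytes. */
size_t bytesBetweenElements(const int *numArray, size_t i) {
	const char *here = (const char *)(numArray + i);      /* address of element i */
	const char *next = (const char *)(numArray + i + 1);  /* address of element i + 1 */
	return (size_t)(next - here);                         /* char arithmetic counts bytes */
}
```
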
April 25

Variables, Pointers, and Indirection in Arduino C

Before we continue on with learning about the I2C protocol and our EEPROM project, we need to discuss variables: what they are and what goes on behind the scenes. Knowledge of how variables work and the use of pointers and indirection with arrays will serve us well when it comes time to read from our EEPROM. Let’s begin.

Anatomy of a Variable:

1. What is a variable?

Simply put, variables hold data. More specifically, a variable holds data of a specific data type. For example, an int holds an integer, a string contains a collection of chars, etc.

2. What goes on behind the scenes when a variable is defined and when it is assigned?

When you declare a variable, the compiler checks the symbol table (basically a list of the variables that have already been declared) to see whether that variable already exists. If it doesn’t, the compiler adds the new variable to the list.

Say, for example, you add the following statement:

int myVar;

Since our variable has not already been declared (it doesn’t already exist in the table), the compiler updates the symbol table so it now looks like this:

[Figure: Symbol table after myVar is declared. Note the lack of an lvalue: myVar is not yet defined. The rvalue is also unknown because we haven’t assigned a value to myVar yet.]

Now, technically, the variable has only been declared at this point: it’s missing an actual location in memory. To get one, the compiler requests a place to put this variable from the system’s memory manager. The memory manager responds with a memory address, which the compiler then adds to the symbol table for that variable. This memory address is known as an lvalue (lvalue = location value), and it simply represents where the variable can be found in memory. With the lvalue added to the symbol table, our variable is now defined:

[Figure: Symbol table with myVar defined. The variable now has a location in memory (lvalue).]

With our new variable defined, we can move on to storing a value in it. Fortunately, assigning a value to a variable is rather straightforward: we navigate directly to the variable’s location in memory (the lvalue) and overwrite the memory at that address with the new value. The data that’s actually stored in memory is known as the rvalue.

Continuing our example with the following assignment statement:

myVar = 10;

With this assignment, our symbol table now looks like this:

[Figure: Symbol table after assignment. Note the updated rvalue, which holds our data value.]

Another way to visualize what we have just gone over is with an lvalue-rvalue diagram:

[Figure: lvalue-rvalue diagram for a value type variable]

This diagram is why you will see some people refer to the memory address as the “left value” and the actual data value as the “right value”.

  • There’s an important caveat here: in Arduino, and C in general, a local variable (one defined inside a function) is not cleared when it is defined. Its rvalue is whatever garbage was already sitting at that memory location until you explicitly assign a value, so don’t assume it’s 0 or null. (Global and static variables are the exception: C guarantees they start at zero.) It’s best to initialize your variables with a value when you define them.

Let’s summarize: whenever your program needs the value stored in a variable, it uses the variable’s lvalue to go to that memory address and retrieves the data (the rvalue) stored there.
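Here is a minimal plain-C sketch of where that caveat does and doesn’t apply (all the names are hypothetical). The C standard guarantees that variables at file scope and static locals start at zero. Ordinary locals get no such guarantee, which is why sumToN initializes total where it is defined.

```c
/* File-scope (global) variable: zero-initialized before the program starts. */
int globalCount;

/* A static local keeps its value between calls and also starts at zero. */
int nextTicket(void) {
	static int ticket;      /* zero-initialized once, not on every call */
	return ticket++;
}

/* A plain local has no such guarantee, so initialize it where you define it. */
int sumToN(int n) {
	int total = 0;          /* without "= 0", total would start with leftover garbage */
	for (int k = 1; k <= n; k++) {
		total += k;
	}
	return total;
}
```
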

Pointers:

Now that we’ve covered what variables are and how they really work, we’re ready to understand pointers. Simply put, a pointer is nothing more than a variable that references the memory address of another variable. Using the terminology that we’ve just learned, a pointer is a variable whose rvalue is the lvalue of another variable.

To visualize this, let’s take a look at two lvalue-rvalue diagrams representing the value type variable myVar and the reference type variable myPointer:

[Figure: myPointer referencing myVar. Notice how the rvalue of myPointer is the memory address of myVar.]

Declaring a Pointer:

Declaring a pointer variable is rather straightforward:

int *myPointer;

The type specifier (int in this case) must match the data type of the variable the pointer is to be used with. The asterisk indicates to the compiler that myPointer is a pointer. Since whitespace doesn’t really matter in C, the asterisk can be placed anywhere between the type specifier and the pointer variable name so you will sometimes also see: int* myPointer, int * myPointer, etc.
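Because the asterisk belongs to the variable name rather than to the type, declaring two variables in one statement can be surprising. In this plain-C sketch (the names are illustrative), only myPointer is a pointer. notAPointer is an ordinary int despite the int* spelling.

```c
int myVar = 10;

/* The asterisk binds to each name, not to the type: only myPointer is a pointer. */
int* myPointer = &myVar, notAPointer = 3;
```

This is one reason many C programmers prefer the int *myPointer spelling, or one declaration per line.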

The Address-Of Operator:

By itself, a pointer that is defined but does not actually point to anything is a pretty pointless pointer (ha!). To point it to the memory address of another variable we simply need to assign the pointer the memory address of that variable. But where do we get the memory address from? That is, where do we get the lvalue of myVar from? Enter the address-of operator (&).

The address-of operator is a unary operator that returns the lvalue of a variable.

Pointer Assignment:

To point our new pointer at the memory location of our value type variable, myVar, we use the following statement:

myPointer = &myVar;

This completes the link shown in the previous diagram and is known as referencing. This is also why the address-of operator (&) is known as the “reference operator”.

Whenever you are learning a new concept, it’s a good idea to try it out yourself to prove to yourself what you’ve read. Let’s mock up an example of what we’ve learned so far in the Arduino IDE:

void setup() {
  Serial.begin(9600);
  
  int myVar = 10;  // Initialize a variable.
  
  Serial.print("myVar's lvalue: ");
  Serial.println((long) &myVar, DEC);  // Grab myVar's lvalue
  Serial.print("myVar's rvalue: ");
  Serial.println(myVar, DEC);
  Serial.println();
  
  int *myPointer;   // Declare your pointer.
  myPointer = &myVar; //Assign myVar's memory address to pointer.
  
  Serial.print("myPointer's lvalue: ");
  Serial.println((long) &myPointer, DEC);  //myPointer's lvalue
  Serial.print("myPointer's rvalue: ");
  Serial.println((long) myPointer, DEC);  //myPointer's rvalue
}

void loop() {
}

Watching the serial monitor, what you should see is something like this:

[Figure: Serial log showing that the rvalue of myPointer is the same as myVar’s lvalue]

Notice that myPointer’s rvalue is the memory address of myVar (i.e. myVar’s lvalue), just like it shows in the diagram.

Indirection (Dereferencing):

We just saw that a pointer can reference a location in memory when we assign it a variable’s memory address using the reference operator (&). We can take this a step further and access (read or write) the actual value stored at that memory address by dereferencing the pointer. This is also known as indirection, and it is done by applying the indirection operator (*) to your pointer. Example:

*myPointer = 5; // Go to the memory address stored in myPointer's rvalue (myVar's lvalue) and place the value 5 there.
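Dereferencing works on both sides of an assignment. In this plain-C sketch (the helper names readMyVar and setMyVar are hypothetical), *myPointer on the right of = reads myVar’s rvalue, and *myPointer on the left of = writes a new one.

```c
int myVar = 10;
int *myPointer = &myVar;   /* myPointer's rvalue is myVar's lvalue */

/* Dereference on the right-hand side of = to read the value. */
int readMyVar(void) {
	return *myPointer;
}

/* Dereference on the left-hand side of = to write a new value. */
void setMyVar(int newValue) {
	*myPointer = newValue;
}
```

After setMyVar(5), myVar holds 5 even though the function never names myVar.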

Continuing off our previous Arduino code example:

void setup() {
  Serial.begin(9600);
  
  int myVar = 10;
  
  Serial.print("myVar's lvalue: ");
  Serial.println((long) &myVar, DEC);
  Serial.print("myVar's rvalue: ");
  Serial.println(myVar, DEC);
  Serial.println();
  
  int *myPointer;
  myPointer = &myVar;
  
  Serial.print("myPointer's lvalue: ");
  Serial.println((long) &myPointer, DEC);
  Serial.print("myPointer's rvalue: ");
  Serial.println((long) myPointer, DEC);
  Serial.println();

  *myPointer = 5;  // THIS IS OUR DEREFERENCING ADDITION.
  Serial.println("-----------------------");
  Serial.println("Updating *myPointer = 5");
  Serial.println();

  Serial.print("myPointer's lvalue: ");
  Serial.println((long) &myPointer, DEC);
  Serial.print("myPointer's rvalue: ");
  Serial.println((long) myPointer, DEC);
  Serial.println();

  Serial.print("myVar's lvalue: ");
  Serial.println((long) &myVar, DEC);
  Serial.print("myVar's rvalue: ");
  Serial.println(myVar, DEC);
  Serial.println();

}

void loop() {
}

[Figure: Serial log showing that by dereferencing the pointer and assigning a value, we are able to manipulate the data stored in myVar]

Notice that nothing about myPointer changed (blue): neither its lvalue nor its rvalue. Contrast that with myVar (red), whose rvalue was changed to 5 by the indirection operator we applied to our pointer.

That is the power of pointers and indirection. In my next journal entry, I will discuss pointers and arrays which will then allow us to finally move on to the last part of our EEPROM I2C project!