Tuesday, February 1, 2011

Reference and Value Types


I reckon that the day I give a lecture and don’t learn anything is the day that I will give up teaching. I always take something away from a lecture, although sometimes it is only a decision not to use that particular joke again…

Today I was telling the first year about reference and value types in C# and I learnt something as well. For those of you who are not familiar with programming in C# (and why should you be?)  this is all about how data is held in a program.

Once you get beyond programs that do simple sums you find yourself with a need to lump data together. This happens as soon as you have to do some work in the Real World™. For example, you might be creating an account management system for a customer and so you will need to have a way of holding information about a particular customer account. This will include the name of the customer, their address and their account balance, amongst other things.

Fortunately C# lets you design objects that can contain these items, for example a string for the name, a number for the balance, a string for the address and so on. In fact, C# provides two ways that you can lump data together. One of these is called a struct (short for structure) and the other is called a class (short for class). These two can be hard to tell apart, in that they are declared in almost exactly the same way. But they have one very important difference. Structures are managed by value, but classes are managed by reference.

Today is the point in the course where I have to explain the difference between the two.  I’ve got a routine for doing this which I’ve used in the past, and it usually gets there. If an item is managed by value (for example a struct) you can think of it as a box with a name painted on it.  If we move data between two variables managed by value:

destination = source;

- the result is that whatever value is in the source box is copied into the destination box. If my source is a structure value which contains lots of elements all of these are copied into the destination. This is how simple variables such as integers and floating point values are managed.
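If you want to see the box copying happen, here is a minimal sketch. The AccountValue struct, its fields and the ValueCopyDemo class are names made up for this illustration; they are not from the lecture slides.

```csharp
using System;

// Hypothetical struct for illustration: managed by value.
struct AccountValue
{
    public string Name;
    public decimal Balance;
}

static class ValueCopyDemo
{
    // Returns the balance held in source after the copy has been changed.
    public static decimal Run()
    {
        AccountValue source = new AccountValue();
        source.Name = "Rob";
        source.Balance = 100;

        // Every field in the source box is copied into the destination box.
        AccountValue destination = source;

        // This changes only the copy.
        destination.Balance = 0;

        return source.Balance;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 100
    }
}
```

Changing the destination leaves the source alone, because after the assignment there are two separate boxes.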

However, if an item is managed by reference the effect of the assignment above is different. You can think of a reference as a named tag which is tied to an object in memory. If I assign one reference to another:

destination = source;

- the result of this is that both reference tags are now tied to the same object in memory.  No data is actually copied anywhere.
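The same experiment with a class might look like the sketch below. Again, AccountRef and ReferenceCopyDemo are hypothetical names used only to show the behaviour.

```csharp
using System;

// Hypothetical class for illustration: managed by reference.
class AccountRef
{
    public string Name;
    public decimal Balance;
}

static class ReferenceCopyDemo
{
    // Returns the balance seen through source after a change made through destination.
    public static decimal Run()
    {
        AccountRef source = new AccountRef();
        source.Name = "Rob";
        source.Balance = 100;

        // Both tags are now tied to the same object. No account data is copied.
        AccountRef destination = source;

        // This changes the one and only object.
        destination.Balance = 0;

        return source.Balance;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints 0
    }
}
```

This time the change made through destination is visible through source, because there is only one object with two tags tied to it.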

At this point in the explanation I usually have a room full of people wondering why we bother with references. They just seem to be an added piece of confusion. Now that we have references we have the potential for problems when the equals behaviour doesn’t do what we expect. Why do we have these two ways of working with data? Why can’t we just use values for everything?

My answer to this is that using references allows us to provide different views of data. If I have a list of customers that I want to order by both customer name and account number then this is not possible with a single array of values. But if I use references it is much easier. I can have a list of references which is ordered by name and another list ordered by account number.
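As a rough sketch of the two views idea, assuming a made-up Customer class with just a name and an account number, two lists can hold references to the same three customers but in different orders:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical customer class for illustration.
class Customer
{
    public string Name;
    public int AccountNumber;

    public Customer(string name, int accountNumber)
    {
        Name = name;
        AccountNumber = accountNumber;
    }
}

static class ViewsDemo
{
    // Returns true if both lists refer to the very same Customer objects.
    public static bool Run()
    {
        List<Customer> byName = new List<Customer>
        {
            new Customer("Zoe", 1),
            new Customer("Adam", 3),
            new Customer("Mia", 2)
        };

        // The second list copies the references, not the customers themselves.
        List<Customer> byNumber = new List<Customer>(byName);

        byName.Sort((a, b) => string.CompareOrdinal(a.Name, b.Name));
        byNumber.Sort((a, b) => a.AccountNumber.CompareTo(b.AccountNumber));

        // Adam is first by name and last by account number,
        // but it is one object reached through two lists.
        return object.ReferenceEquals(byName[0], byNumber[2]);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints True
    }
}
```

There are only ever three Customer objects in memory. Each list is just a different ordering of tags tied to them.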

So far I’m going by the slides. But then it occurred to me to go a bit further, and think about the use of reference and value types from a design point of view. If I’m designing a data structure for a sprite in a game (for example a single alien in a Space Invaders game) the sprite will have to contain the image to be used to draw the sprite and the position of the sprite on the screen. I posed the question: which of these two elements should be managed by value, and which by reference?

After some discussion we came to the conclusion that it is best if the image to be used to draw the sprite is managed by reference. That means that a number of sprites can hold references to the same sprite design. You see this a lot in computer games. Where a game has multiple identical elements (soldiers, cars, spaceships etc.) it is often the case that they are all sharing a single graphic. However, the position of the sprite on the screen is a value that should be unique to each sprite. We are never going to want to share this, and so the position should be a value type.
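One way this design could be sketched is shown below. The SpriteImage, Position and Sprite types are invented for the example (a real game would use whatever texture and vector types its framework provides), and the file name is just a placeholder.

```csharp
using System;

// Hypothetical image class: a reference type, so it can be shared.
class SpriteImage
{
    public string FileName;

    public SpriteImage(string fileName)
    {
        FileName = fileName;
    }
}

// Hypothetical position struct: a value type, unique to each sprite.
struct Position
{
    public int X;
    public int Y;
}

class Sprite
{
    public SpriteImage Image; // shared between sprites
    public Position Where;    // each sprite has its own copy
}

static class SpriteDemo
{
    public static bool Run()
    {
        SpriteImage alienImage = new SpriteImage("alien.png");

        Sprite first = new Sprite();
        first.Image = alienImage;

        Sprite second = new Sprite();
        second.Image = alienImage;   // same image object, not a copy
        second.Where = first.Where;  // copies the X and Y values

        first.Where.X = 10;          // moves only the first alien

        return object.ReferenceEquals(first.Image, second.Image)
            && first.Where.X == 10
            && second.Where.X == 0;
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints True
    }
}
```

Both aliens are drawn from one image in memory, but moving one of them has no effect on the other.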

We then went through a bunch of other situations where an object contains things, and pondered for each thing whether it should be managed by value or by reference. Generally speaking we came to the conclusion that anything you want to share should be managed by reference, but stuff that is unique to you should be a value.

Of course references bring a lot of other benefits too, which we will explore in the next few weeks, but the thing I learnt was that the more you can show a context in which a particular language characteristic is applicable the more chance you have of getting the message across.

As a little puzzle, one thing we talked about was the storage of the address of a customer in our account database. Should that be managed by value or reference, and why?

Reader Comments (6)

Jon Skeet's book "C# in Depth" is very hot on this topic. See also: http://www.yoda.arachsys.com/csharp/parameters.html
February 2, 2011 | Unregistered Commenter Bri
Hey Rob, I was ready to say (Address = String = Reference Type!) but then I went back to your text and now I'm a bit confused. I guess I've never used structs before, and that is what's making me go back and forth on saying it should be managed by reference. If I've got a struct representing the whole customer and use a string to represent the customer's address, can I say that I'm managing it by value? If so, then I'd say it should be managed by value. If the fact that the customer address is represented by a string means it is managed by reference, then I can't see why/how it could be managed by value...

So as an answer, I'd say by reference, cause I don't know how to do it in any other way!

Well, I've rewritten this post at least 20 times. Excuse my bad English; I hope I've made my point clear. Curious to find out what the answer is...
February 2, 2011 | Unregistered Commenter Breno
Reminds me distinctly of "Replace Data Value with Object" in Martin Fowler's "Refactoring" book.

Am enjoying lurking on the blog btw, keep it up!
February 2, 2011 | Unregistered Commenter Tim Barrass
"Should that be managed by value or reference, and why?"

In my head a little voice is saying "Read the specification from the customer, how do they want to use the address data?"
The lecture reminded me so much of working with relational databases, in that referencing is a great way to stop data redundancy. In the case of a shopping website the database table for the shipped orders won’t hold the address each order was shipped to... it will hold a reference to the customer’s address that’s only held once in a different table.
We would do the same in our arrays, wouldn’t we? Have an array of customer details and an array of shipped orders and use a reference between the two? (I use orders and shipping just as an example.)
February 2, 2011 | Unregistered Commenter Duncan
Some good thoughts.

address = string = reference type is a good start, but strings are a bit strange. They are managed by reference, but they are also immutable, which is not quite the same thing as a pure reference. I'll do a post about immutable as soon as I've told the first year about it...

By address I really meant some collection of data that holds the street, the town and the postcode etc. In this situation I reckon the address objects should be managed by reference, since several people might live at the same address and it would be best not to store multiple copies. This has the useful side-effect of making it very easy to detect when two people live at the same address, since they will contain references that refer to the same instance.
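A minimal sketch of that idea, using made-up Address and Person classes, might look like this. The street, town and postcode values are placeholders.

```csharp
using System;

// Hypothetical address class: a reference type, so people can share one.
class Address
{
    public string Street;
    public string Town;
    public string Postcode;
}

// Hypothetical person class that refers to an address.
class Person
{
    public string Name;
    public Address Home;
}

static class AddressDemo
{
    // Two people live together if their references lead to the same Address instance.
    public static bool LiveTogether(Person a, Person b)
    {
        return object.ReferenceEquals(a.Home, b.Home);
    }

    public static bool Run()
    {
        Address house = new Address();
        house.Street = "1 Example Street";
        house.Town = "Hull";
        house.Postcode = "HU0 0XX";

        Person first = new Person();
        first.Name = "Ann";
        first.Home = house;

        Person second = new Person();
        second.Name = "Bob";
        second.Home = house; // same instance, no second copy stored

        return LiveTogether(first, second);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // prints True
    }
}
```

Note that this only detects a shared instance. Two separate Address objects that happen to contain the same text would not count as the same address here.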

The point about databases is also well made. A reference equates very well to the value of a primary key in a database table, in that it lets you get to a particular item in that table. One of the difficulties in programming is that we store our data in databases but we work on it as classes. This is where things like LINQ are so useful.
February 2, 2011 | Registered Commenter Rob
This gave me something to think about. I guess the answer lies in your text and I didn't realize it before: "Generally speaking we came to the conclusion that anything you want to share should be managed by reference, but stuff that is unique to you should be a value."
February 3, 2011 | Unregistered Commenter Breno
