www.delorie.com/gnu/docs/smalltalk/gst_92.html   search  
 
Buy GNU books!


GNU Smalltalk User's Guide

[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

4.11.1 How Arrays Work

Smalltalk provides a very adequate selection of predefined classes from which to choose. Eventually, however, you will find the need to code a new basic data structure. Because Smalltalk's most fundamental storage allocation facilities are arrays, it is important that you understand how to use them to gain efficient access to this kind of storage.

The Array Class. Our examples have already shown the Array class, and its use is fairly obvious. For many applications, it will fill all your needs--when you need an array in a new class, you keep an instance variable, allocate a new Array and assign it to the variable, and then send array accesses via the instance variable.

This technique even works for string-like objects, although it is wasteful of storage. An Array object uses a Smalltalk pointer for each slot in the array; its exact size is transparent to the programmer, but you can generally guess that it'll be roughly the word size of your machine. (36) For storing an array of characters, therefore, an Array works but is inefficient.

Arrays at a Lower Level. So let's step down to a lower level of data structure. A ByteArray is much like an Array, but each slot holds only an integer from 0 to 255-and each slot uses only a byte of storage. If you only needed to store small quantities in each array slot, this would therefore be a much more efficient choice than an Array. As you might guess, this is the type of array which a String uses.

Aha! But when you go back to chapter 9 and look at the Smalltalk hierarchy, you notice that String does not inherit from ByteArray. To see why, we must delve down yet another level, and arrive at the basic methods for creating a class.

For most example classes, we've used the message:
 
       subclass:
       instanceVariableNames:
       classVariableNames:
       poolDictionaries:
       category:

But when we implemented our CheckedArray example, we used variableSubclass: instead of just subclass:. The choice of these two kinds of class creation (and two more we'll show shortly) defines the fundamental structure of Smalltalk objects created within a given class. Let's consider the differences in the next sub-sections.

subclass:. This kind of class creation specifies the simplest Smalltalk object. The object consists only of the storage needed to hold the instance variables. In C, this would be a simple structure with zero or more scalar fields.(37).

variableSubclass:. All the other types of class are a superset of a subclass:. Storage is still allocated for any instance variables, but the objects of the class must be created with a new: message. The number passed as an argument to new: causes the new object, in addition to the space for instance variables, to also have that many slots of unnamed (indexed) storage allocated. The analog in C would be to have a dynamically allocated structure with some scalar fields, followed at its end by a array of pointers.

variableByteSubclass:. This is a special case of variableSubclass:; the storage age allocated as specified by new: is an array of bytes. The analog in C would be a dynamically allocated structure with scalar fields(38), followed by a array of char.

variableWordSubclass:. Once again, this is a special case of variableSubclass:; the storage age allocated as specified by new: is an array of C signed longs, which are represented in Smalltalk by Integer objects. The analog in C would be a dynamically allocated structure with scalar fields, followed by an array of long. This kind of subclass is only used in a few places in Smalltalk.

Accessing These New Arrays. You already know how to access instance variables--by name. But there doesn't seem to be a name for this new storage. The way an object accesses it is to send itself array-type messages like at:, at:put:, and so forth.

The problem is when an object wants to add a new level of interpretation to the at: and at:put: messages. Consider a Dictionary--it is a variableSubclass: type of object, but its at: message is in terms of a key, not an integer index of its storage. Since it has redefined the at: message, how does it access its fundamental storage?

The answer is that Smalltalk has defined basicAt: and basicAt:put:, which will access the basic storage even when the at: and at:put: messages have been defined to provide a different abstraction.

An Example. This can get pretty confusing in the abstract, so let's do an example to show how it's pretty simple in practice. Smalltalk arrays tend to start at 1; let's define an array type whose permissible range is arbitrary.

 
   ArrayedCollection variableSubclass: 'RangedArray'
       instanceVariableNames: 'base'
       classVariableNames: ''
       poolDictionaries: ''
       category: nil !
   RangedArray comment: 'I am an Array whose base is arbitrary' !
   !RangedArray class methodsFor: 'creation'!
   new
       ^self error: 'Use new:base:'
   !
   new: size
       ^self new: size base: 1
   !
   new: size base: b
       ^(super new: size) init: b
   ! !
   !RangedArray methodsFor: 'init'!
   init: b
       base := (b - 1).   "- 1 because basicAt: works with a 1 base"
       ^self
   ! !
   !RangedArray methodsFor: 'basic'!
   rangeCheck: i
       ((i <= base) | (i > (base + (self basicSize)))) ifTrue: [
           'Bad index value: ' printOn: stderr.
           i printOn: stderr.
           (Character nl) printOn: stderr.
           ^self error: 'illegal index'
       ]
   !
   at: i
       self rangeCheck: i.
       ^self basicAt: (i-base)
   !
   at: i put: v
       self rangeCheck: i.
       ^self basicAt: (i-base) put: v
   ! !

The code has two parts; an initialization, which simply records what index you wish the array to start with, and the at: messages, which adjust the requested index so that the underlying storage receives its 1-based index instead. We've included a range check; its utility will demonstrate itself in a moment:
 
   Smalltalk at: #a put: (RangedArray new: 10 base: 5) !
   a at: 5 put: 0 !
   a at: 4 put: 1 !

Since 4 is below our base of 5, a range check error occurs. But this check can catch more than just our own misbehavior!

 
   a do: [:x| x printNl] !

Our do: message handling is broken! The stack backtrace pretty much tells the story:

 
   RangedArray>>#rangeCheck:
   RangedArray>>#at:
   RangedArray>>#do:

Our code received a do: message. We didn't define one, so we inherited the existing do: handling. We see that an Integer loop was constructed, that a code block was invoked, and that our own at: code was invoked. When we range checked, we trapped an illegal index. Just by coincidence, this version of our range checking code also dumps the index. We see that do: has assumed that all arrays start at 1.

 
   The immediate fix is obvious; we implement our own do:
   !RangedArray methodsFor: 'basic'!
   do: aBlock
       1 to: (self basicSize) do: [:x|
           aBlock value: (self basicAt: x)
       ]
   ! !

But the issues start to run deep. If our parent class believed that it knew enough to assume a starting index of 1(39), why didn't it also assume that it could call basicAt:? The answer is that of the two choices, the designer of the parent class chose the one which was less likely to cause trouble; in fact all standard Smalltalk collections do have indices starting at 1, yet not all of them are implemented so that calling basicAt: would work.(40)

Object-oriented methodology says that one object should be entirely opaque to another. But what sort of privacy should there be between a higher class and its subclasses? How many assumption can a subclass make about its superclass, and how many can the superclass make before it begins infringing on the sovereignty of its subclasses? Alas, there are rarely easy answers.

Basic Allocation. In this chapter, we've seen the fundamental mechanisms used to allocate and index storage. When the storage need not be accessed with peak efficiency, you can use the existing array classes. When every access counts, having the storage be an integral part of your own object allows for the quickest access. When you move into this area of object development, inheritance and polymorphism become trickier; each level must coordinate its use of the underlying array with other levels.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

  webmaster     delorie software   privacy  
  Copyright 2003   by The Free Software Foundation     Updated Jun 2003