The difference between technical objects and semantic objects in programming


In programming, it's better to use semantic objects as much as possible rather than technical objects. Or so we're told (and so I believe). But what does that mean?

The problem isn't so much with technical objects as with implicit technical objects. Few good programmers would deliberately create an object like (number of coconuts, list of planes, age of captain). However, when I read (object-oriented) code, I often see a lot of them subtly hidden, notably in function parameters. In my opinion, this is because they look so much better there, while in reality they are not.

Example

Let's say you have payroll software, and you have to compute a benefit based on the average amount paid on the last 3 payslips. If the employee has been in the company for more than a year, it's 5%, otherwise it's 3%, so you need to know how long the employee has been in the company, and the last 3 payslips.

Now there are two approaches:
either you have

computeBenefits1(daysInCompany, last3average)

or

computeBenefits2(employee)
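
To make the comparison concrete, here is a minimal Python sketch of the two forms. The exact formula, the 365-day threshold and the Employee fields (hireDate, payslips) are only assumptions for illustration:

from datetime import date

def computeBenefits1(daysInCompany, last3average):
    # only the data needed by the computation is passed in
    rate = 0.05 if daysInCompany > 365 else 0.03
    return last3average * rate

def computeBenefits2(employee):
    # the whole employee record is passed; the function extracts what it needs
    daysInCompany = (date.today() - employee.hireDate).days
    last3average = sum(p.amount for p in employee.payslips[-3:]) / 3
    rate = 0.05 if daysInCompany > 365 else 0.03
    return last3average * rate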

Comparing the two solutions

Make the data needed by the computation explicit

computeBenefits1 might seem good, since you have a function that focuses purely on the computation. It goes along the line that a function shouldn't know more than it needs to do its job, because otherwise you might get confused when looking at the call. Alternatively, you can write something like computeBenefits2(employee), which will do the decomposition by itself.

Point goes to computeBenefits1

Separate the steps of the computation into various functions
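
With computeBenefits1, the extraction steps naturally become small functions of their own, which can be tested and reused independently. A possible decomposition, sketched in Python with the import and assumed Employee fields from the sketch above (the helper names are hypothetical):

def daysInCompany(employee):
    return (date.today() - employee.hireDate).days

def last3Average(employee):
    return sum(p.amount for p in employee.payslips[-3:]) / 3

def computeBenefitsFor(employee):
    return computeBenefits1(daysInCompany(employee), last3Average(employee))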

Point goes to computeBenefits1

Protection of the record

Another trouble with the second formulation is that, if there is a programming mistake, computeBenefits2 has the ability to modify the employee record. If you're using a language like C++ or Ada, there are constructions in the language that you can use to prevent this (like the const keyword), but they only work to some extent. For example, they may protect the record itself, but not some sub-structure accessed through a pointer.

If you're using a language like Python, nothing but your own discipline will stop you from making such a mistake. In that case, computeBenefits1 has the advantage that, when reading the call, you know nothing in the record is modified, because the record isn't passed at all. To be more concrete:

[...]
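
Here is a minimal Python sketch of the kind of slip that becomes possible when the whole record is passed (same assumed field names as above):

def computeBenefits2(employee):
    # oops: this was meant to be a local variable, but it silently
    # overwrites a field of the employee record owned by the caller
    employee.last3average = sum(p.amount for p in employee.payslips[-3:]) / 3
    rate = 0.05  # ... rest of the computation as before
    return employee.last3average * rate

A call to computeBenefits1(daysInCompany, last3average) cannot make that slip, because the record is simply not in scope.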

Point goes to computeBenefits1

Computing efficiency

computeBenefits1 is likely more efficient, because if you need to use the data repeatedly, you can extract it once, pass it to computeBenefits1 and pass it again to some other function. In the second case, since the average salary, for example, is computed inside the function, any other function that needs it will have to compute it again.
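
A sketch of the reuse argument, using the helpers assumed above (payEmployee and computeSavingsPlan are hypothetical consumers of the same value):

def payEmployee(employee):
    average = last3Average(employee)        # computed once
    benefits = computeBenefits1(daysInCompany(employee), average)
    savings = computeSavingsPlan(average)   # reused, not recomputed
    return benefits + savings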

Code factoring

When you have

f1(obj):
  x = obj.field
  (do something with x)

f2(obj):
  x = obj.field
  (do something else with x)

main(obj):
  v1 = f1(obj)
  v2 = f2(obj)
  (use v1 and v2)

It is so tempting to write instead:

f1(x):
  (do something with x)

f2(x):
  (do something else with x)

main(obj):
  v = obj.field
  v1 = f1(v)
  v2 = f2(v)
  (use v1 and v2)

After all, the second version does in some sense a better job at factoring code, since the extraction of the field is written only once (and thus needs to be modified in only one place if the field name changes). However, this raises the question of the semantic nature of obj.field. Is it an object in our problem space, or only a technical object?

Point goes to computeBenefits1

Value scope control

It is also tempting (and functional languages do promote it) to use the function call itself to bind values, so that the scope of the variables is controlled.

Consider the previous example:

f1(obj):
  x = obj.field
  (do something with x)

f2(obj):
  x = obj.field
  (do something else with x)

main(obj):
  v1 = f1(obj)
  v2 = f2(obj)
  (use v1 and v2)


versus the version where obj.field is passed directly in the call:

f1(x):
  (do something with x)

f2(x):
  (do something else with x)

main(obj):
  v1 = f1(obj.field)
  v2 = f2(obj.field)
  (use v1 and v2)

This gets rid of two assignments. Also, you don't have some misty zone in your functions f1 or f2 where x doesn't exist yet (before the assignment), so if you reorder your code lines, you don't risk a "non assigned value" error (assuming you're not using a language that can check this statically). Depending on your point of view, those were "bad" assignments or not:

  • If you're a believer in the functional philosophy, assignments (changes of state) are bad
  • However, in the particular case of function parameter passing, the value is assigned only once, and the function encapsulates the scope of the variable quite well from a semantic point of view: the value ceases to exist as soon as it is no longer needed, unlike an assignment, where the variable can survive long after the block in which it was used, as in:
temporary_result = some_computation
if temporary_result:
   use(temporary_result)
continue_your_business  # but temporary_result is still alive

Extensibility

We see here that extensibility sometimes runs against code factoring (quite counter-intuitively). If the benefit computation later needs another piece of information about the employee, the signature of computeBenefits2(employee) doesn't change, whereas computeBenefits1 needs an extra parameter and every call site has to be updated.
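
A sketch of that situation, assuming the rule later also depends on the employee's department:

# before the rule change:
benefits = computeBenefits1(daysInCompany, last3average)
benefits = computeBenefits2(employee)

# after the rule change:
benefits = computeBenefits1(daysInCompany, last3average, department)  # signature changed, every caller updated
benefits = computeBenefits2(employee)                                 # call unchanged, only the body changes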

Point goes to computeBenefits2

Why computeBenefits2 wins (in my opinion) in spite of all these problems

To summarize:

Form 1 (computeBenefits1):
  1. Makes explicit the data needed by the computation
  2. Leads to separating the steps of the computation into various functions
  3. Almost unbreakable protection of the record
  4. Likely more efficient

Form 2 (computeBenefits2):
  1. Programmers need to be disciplined
  2. Doesn't use funny, more or less implicit types like (number x list).

It may help to think of the passed parameters in terms of a single type (Haskell / Curry style).
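
Sketched with Python type hints (Employee stands for the assumed record type), the parameter list of form 1 is in effect an anonymous product type, while form 2 takes a single named, semantic type:

class Employee: ...   # the assumed record type

def computeBenefits1(daysInCompany: int, last3average: float) -> float:
    ...   # the parameters form, in effect, the anonymous type (int x float)

def computeBenefits2(employee: Employee) -> float:
    ...   # the parameter has the named, semantic type Employee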

Example 2, processing an HTML form

(should we pass REQUEST.startPage, REQUEST.numPage or just REQUEST?)
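
The same trade-offs apply. A minimal Python sketch, where the handler names and the shape of REQUEST are assumptions:

def renderListing1(startPage, numPage):
    # explicit and protected, but the signature grows with every new form field
    return f"page {startPage} of {numPage}"

def renderListing2(request):
    # the whole request is passed; the handler extracts what it needs
    return f"page {request.startPage} of {request.numPage}"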

The lesson

If you pass more than one subfield of an object to a function, that is

f(obj.field1, obj.field2)

even indirectly, as in

value1 = obj.method1(params)
(...)
value2 = obj.method2(otherParams)
f(value1, value2)

what you really should write is:

f(obj)