Java anti-pattern: non-final getters

In this post I would like to present to You the first in a series of Java anti-patterns. Here it is:

public class MyClass
{
    private double speed;

    public void setSpeed(double speed){ this.speed = speed; }
    public double getSpeed(){ return this.speed; }
    public double computeTraveledDistance(double time){ return this.speed * time; }
}

I took the liberty of marking the problematic pieces: the non-final getSpeed() and the direct use of this.speed in computeTraveledDistance().

What is wrong with that pattern?

We live in an object-oriented world, which means inheritance, overriding and virtual methods.
This means that anyone can legally write:

public class MyUniDirectionalMover extends MyClass
{
    @Override public double getSpeed(){ return Math.abs(super.getSpeed()); }
}

This is legal code. It makes sure that, regardless of what speed is set, the returned value is never negative, so we should always travel in one direction…

Is that true?

No, because:

....
public double computeTraveledDistance(double time){ return this.speed * time; }

does not call getSpeed() but reads the this.speed field directly. The user changed the behavior: the class, asked about its speed, will never return a negative number, yet the computed travel distance can still be negative.
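The inconsistency is easy to reproduce. The sketch below repeats the two classes from this post and adds a small driver class, NonFinalGetterDemo, which is my own name and exists only for this illustration:

```java
// Base class from the post: the getter is overridable, but
// computeTraveledDistance() reads the field directly.
class MyClass {
    private double speed;

    public void setSpeed(double speed) { this.speed = speed; }

    public double getSpeed() { return this.speed; }

    // Bypasses any overridden getSpeed().
    public double computeTraveledDistance(double time) { return this.speed * time; }
}

// Subclass from the post: tries to force a single direction of travel.
class MyUniDirectionalMover extends MyClass {
    @Override
    public double getSpeed() { return Math.abs(super.getSpeed()); }
}

// Hypothetical driver, not part of the original classes.
class NonFinalGetterDemo {
    public static void main(String[] args) {
        MyClass mover = new MyUniDirectionalMover();
        mover.setSpeed(-3.0);
        System.out.println(mover.getSpeed());                   // prints 3.0
        System.out.println(mover.computeTraveledDistance(2.0)); // prints -6.0
    }
}
```

The getter reports a speed of 3.0 while the distance is computed from -3.0, which is exactly the contradiction described above.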

How to cure it?

There are two possible ways: either block the user from overriding what has no effect, or prevent Yourself from using the fields directly.

Final getter

The easiest way is to prevent the user from overriding code in expectation of an effect which cannot take place:

public class MyClass
{
    private double speed;

    public void setSpeed(double speed){ this.speed = speed; }
    public final double getSpeed(){ return this.speed; }
    public double computeTraveledDistance(double time){ return this.speed * time; }
}

Hidden fields

If this is undesired and overriding the getter is expected to have an effect, one must prevent oneself from touching the fields directly:

class MyClassBase0
{
    private double speed;

    public void setSpeed(double speed){ this.speed = speed; }
    public double getSpeed(){ return this.speed; }
}
...
class MyClass extends MyClassBase0
{
    public double computeTraveledDistance(double time){ return this.getSpeed() * time; }
}

Where can I find such anti-patterns?

For example in java.awt.CardLayout, which refers directly to the package-private fields component.width and component.height,
while java.awt.Component#getWidth() and getHeight() are non-final.

Summary

What to say, what to say…

Just keep in mind that You can always use Your compiler to prevent Yourself from doing stupid things. Declaring a getter final clearly tells everyone that the only way to limit the speed to non-negative values is to override setSpeed:

public class MyUniDirectionalMover extends MyClass
{
    @Override public void setSpeed(double speed){ super.setSpeed(Math.abs(speed)); }
}
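As a quick check, here is a minimal sketch of the final-getter variant together with the setter override. The driver class FinalGetterDemo is a hypothetical name added only for this illustration:

```java
// Final-getter variant: subclasses cannot override getSpeed(),
// so reading the field directly is safe.
class MyClass {
    private double speed;

    public void setSpeed(double speed) { this.speed = speed; }

    public final double getSpeed() { return this.speed; }

    public double computeTraveledDistance(double time) { return this.speed * time; }
}

// The only hook left is the setter, so the value is normalized when stored.
class MyUniDirectionalMover extends MyClass {
    @Override
    public void setSpeed(double speed) { super.setSpeed(Math.abs(speed)); }
}

// Hypothetical driver, not part of the original classes.
class FinalGetterDemo {
    public static void main(String[] args) {
        MyClass mover = new MyUniDirectionalMover();
        mover.setSpeed(-3.0);
        System.out.println(mover.getSpeed());                   // prints 3.0
        System.out.println(mover.computeTraveledDistance(2.0)); // prints 6.0
    }
}
```

Because the stored field itself is never negative, the getter and the distance computation now agree.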

How not to translate the user interface

i18n – internationalization

Today I would like to throw a stone into someone else’s garden, and this stone is about translating the user interface. Or, to be more precise – how not to do it.

OK, what is it all about?

Just a few days ago I inspected some Qt GUI code with a line like this:

someObject->setText(tr("Close file"));

tr() is a function which takes the input text and translates it into the selected language.

Excellent, isn’t it? Standard, good and readable. And if no translation is found it obviously falls back to English. Good.

Right?

Well…

Not so good.

Each language has its own specifics. In each language some words have different meanings depending on where and how they are used. For example, in English “drive a nail” and “drive a car” use the same word: “drive”. But in Polish it is “wbić gwóźdź” and “prowadzić samochód”. You don’t have to know Polish to notice that those two phrases have no word in common.

Of course tr() does not translate the text word by word. Instead it is a simple look-up table, a map from one text to another. So there is less room for confusion.

Despite that, the confusion is still there, because You will have just one single translation for all the places where the phrase “Close file” is used in Your program.

It is all about context

This is all about context. Each text which is to be translated must be translated in a specific context. So I would recommend doing something like this:

someObject->setText(tr("SomeWindow::Close file"));

This way the tr() function is provided with additional context: the “Close file” text is used in the “SomeWindow” class. Now, if this text is used in another window or another context, it may have a different translation.

It is relatively easy to make tr() use the English translation as a fallback before it falls back to the passed key. So it may return “Zamknij plik” or “Close file” or, finally, if neither a Polish nor an English translation is present, “SomeWindow::Close file”. I agree that the last one is ugly, but it can still be understood by a human.
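The fallback chain can be sketched in a few lines. This is not Qt code: ContextTranslator, its two maps and the key “OtherWindow::Close file” are hypothetical names invented for this illustration, written in Java only to show the lookup order (selected language, then English, then the key itself).

```java
import java.util.Map;

// Hypothetical look-up table keyed by "Context::Text"; not a Qt API.
class ContextTranslator {
    private final Map<String, String> selected; // e.g. the Polish table
    private final Map<String, String> english;  // the fallback table

    ContextTranslator(Map<String, String> selected, Map<String, String> english) {
        this.selected = selected;
        this.english = english;
    }

    // Selected language first, then English, finally the raw key.
    String tr(String key) {
        String text = selected.get(key);
        if (text == null) {
            text = english.get(key);
        }
        return text != null ? text : key;
    }
}

class ContextTranslatorDemo {
    public static void main(String[] args) {
        ContextTranslator t = new ContextTranslator(
            Map.of("SomeWindow::Close file", "Zamknij plik"),
            Map.of("SomeWindow::Close file", "Close file",
                   "OtherWindow::Close file", "Close file"));

        System.out.println(t.tr("SomeWindow::Close file"));  // Zamknij plik
        System.out.println(t.tr("OtherWindow::Close file")); // Close file
        System.out.println(t.tr("ThirdWindow::Close file")); // ThirdWindow::Close file
    }
}
```

Each window gets its own entry, so a translator is free to give “OtherWindow::Close file” a different Polish text later without touching “SomeWindow”.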

It is all about ease of work

The second thing is the ease of actually performing the translation. Imagine You are a translator, and imagine You are provided with a file which is just a bunch of English sentences.

How can You correctly translate it?

Of course You can’t. You won’t take such a job. You would rather have screenshots of the GUI and write the translations over them, because then You would know the context.

Sadly, supplying such screenshots to You is a complex and expensive job. It is also error-prone, because somebody else has to type the translations into the resource files. So You will more likely be provided with a text file.

And now imagine this text file has some context information. It will still be bad. It will still be hard, but Your translation certainly has a chance to be better. And it will be easier (cheaper) to provide You with screenshots on which there is just an arrow pointing to a certain dialog and saying “this is SomeWindow”.

That’s all for today.

Binary file formats: how to screw it up

In this long and boring blog entry I will try to show You most of the mistakes I have encountered in specifications of binary file formats.

But first things first.

Binary data format

A binary data format is a 010101…0101 representation of some abstract data You have in Your program. At first glance it looks exactly like the data structures in memory, but there are subtle yet important differences.

The first and most important difference is that binary data stored in a file exist outside the program. They can be put on some data storage or travel through a wire or the air. They are used to move information across space, time and machines of different types and architectures. They may be specified in a way independent of the media they exist on, or they may be tightly bound to it.

If the binary format is independent of the media, it leaves some elementary data properties to the media format. In such a case we are most probably speaking about a “file format”, or about an “application layer” if the data travel over a wire.

If the binary format is dependent on the media, we usually speak about “protocols”. Both have their specific quirks, tips and tricks.

Since a “protocol” is both data and media specific, and a “file format” is just data specific, let me first talk about the “file format”.

Note: All the following text assumes that the file media uses “bytes” as its elementary unit of data transfer.

A bad example

Ok, so let me show how to do it wrong.

Assume now that I am a C programmer. I live in the C world (something unlike the A or B class worlds 😉), so when I was told to define a simple file format I did something like this:

typedef struct {
    char identifier[32];
    unsigned int number_of_entities;
} Header;

typedef struct {
    float X;
    float Y;
    float Z;
    int attributes;
    short signed int level;
} Entity;

and I said that a file consists of a Header and a number of Entities following it.

I then said that:

  • identifier is a text identifying the format, which is “Mój żałosny format”. Notice that I intentionally wrote it in a non-English language;
  • number_of_entities is a number of Entity elements following it;
  • X,Y,Z are some coordinates;
  • attributes are some attributes;
  • level is a level of importance assigned to an Entity.

Ignoring the meaning of the data, is it a good specification of how to represent them in a binary format?

What do You think?

I think it is very bad.

Characters

In the C world “char” is vague. Depending on the machine or compiler it may be a signed or unsigned integer of 8 bits or more. Notice that C, at least before C23, does not fix how signed integers are represented. Only unsigned integers are guaranteed to be plain binary.

Second, there is always the problem of how to represent an actual text and how to encode it in its binary form. Like how to turn “Mój żałosny format” into bits so that it can be read by any machine in the world and understood correctly.

The source of all the problems with character encoding is a typical Anglo-Saxon arrogance. Since the very beginning of computers in Poland we have always struggled with it. A “character” was mentally equal to ASCII and that was all. And we in Poland needed more than the mere arrogant ASCII. I assure You that those little dots and lines on the characters ąęźżć have a critical meaning. Like in the famous “Ona robi mi łaskę” (she does me a favor), which stripped of those little lines turns into “Ona robi mi laske” (she gives me a head). You may guess what difference it makes when Your wife receives an SMS with the second sentence instead of the first one. And yes, it still happens. The arrogance of telecoms and Google is set so high that Android smartphones by default strip the Polish letters from all SMS messages without a warning. They claim they do it to save Your money, because a telecom prices a Polish SMS at twice the ASCII SMS. Well… what is $0.01 compared to divorce costs?

But back to business.

Whenever You say “character” or “text” You must specify what character encoding is to be used. If You say ASCII then it is fine. But You may be polite and say “UTF-8”, which is, I think, a good compatibility path. UTF-8 text always looks acceptable when read as straightforward ASCII (just some “dumb letters” appear) and can be processed by any 8-bit character routines which are unaware of the UTF-8 encoding.

When You specify the encoding it is wise to avoid the “char” type and use “byte” instead. A byte is always a sequence of 8 bits. Just for clarity.

So the specs should be:

byte [32] identifier ; //An UTF-8 text.

Length of a text

The second element of any text specification is to say how to figure out where the text ends.

In my example I intentionally used 32 bytes and a shorter text. How should the end of the text be detected?

The standard C way is to add a zero at the end. So a 32-byte array may carry up to 31 ASCII characters and a hard-to-predict number of UTF-8 characters. Notice that this approach means a character of value zero is prohibited. If such a character is present in a text, it may result in a false, early “end of string”. And in a disaster, if the terminator was used to say “and after this string the next element follows”.

In Java, for example, binary zero is a fully valid “character”.

The other method of stating the length of a string is to add a “length field” which clearly specifies the length of the following text.

In the example I made:

byte [32] identifier ; //An UTF-8 text.

I did however decide to set the space for the text to a fixed size. I did it so that the header would have a fixed, known and finite size.

For short, bounded texts it is acceptable to do it that way and to say:

“The encoded text ends either with a zero byte or at the 32nd byte. If the encoded text is shorter than 32 bytes, all remaining bytes should be zero.”
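To make that rule concrete, here is a minimal sketch in Java of an encoder and decoder for such a field. The class name FixedText and the decision to reject zero bytes and over-long texts with an exception are my assumptions; the rule itself only says how a valid field looks.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch of the "32 bytes, UTF-8, zero padded" identifier field.
class FixedText {
    static final int FIELD_SIZE = 32;

    static byte[] encode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > FIELD_SIZE) {
            throw new IllegalArgumentException(
                "text needs " + utf8.length + " bytes, field holds " + FIELD_SIZE);
        }
        for (byte b : utf8) {
            if (b == 0) {
                throw new IllegalArgumentException("zero byte would end the text early");
            }
        }
        // copyOf pads the remaining bytes with zeros.
        return Arrays.copyOf(utf8, FIELD_SIZE);
    }

    static String decode(byte[] field) {
        if (field.length != FIELD_SIZE) {
            throw new IllegalArgumentException("field must be " + FIELD_SIZE + " bytes");
        }
        // The text ends at the first zero byte or at the end of the field.
        int length = 0;
        while (length < FIELD_SIZE && field[length] != 0) {
            length++;
        }
        return new String(field, 0, length, StandardCharsets.UTF_8);
    }
}

class FixedTextDemo {
    public static void main(String[] args) {
        // Unicode escapes keep this source independent of the file encoding.
        String id = "M\u00f3j \u017ca\u0142osny format";
        byte[] field = FixedText.encode(id);
        System.out.println(field.length);                      // prints 32
        System.out.println(FixedText.decode(field).equals(id)); // prints true
    }
}
```

The 18 characters of the identifier take 21 bytes in UTF-8, which is why the field size has to be stated in bytes and not in characters.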

Integers

We have:

   unsigned int  number_of_entities

and

   int attributes
   short signed int level

What exactly does it mean?

An integer is a signed number which can be iterated from -N to N in increments of “1”. That is all. In C, that is. So the first thing we have to clarify is: this is a binary integer. If we don’t say so, it may be a binary-coded decimal, for example.

Second, binary signed integers may be encoded with a sign bit, a bias, one’s complement or two’s complement. Please specify it.

The two points above can usually be skipped because binary two’s complement integers dominate the world. So if You do not specify it, You may be 99% sure that over the next 20 years or so (until quantum computers dominate) coders will read “signed integer” as a “binary two’s complement” number.

But how long are those numbers? How many bits or bytes do they have?

“int”, “short”, “char” etc. have only a lower bound on their size in C. If You are in Java or C# You are very specific when saying “int”. But if You do not say in the specs that “all types are Java types” then no, You haven’t said anything about the exact length.

unsigned int24 number_of_entities
unsigned int11 attributes
signed int24 level

This looks much better. But it is still incomplete. What is the order of bytes? Least significant byte first? Most significant? Or some other mash-up? Please say it.
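How much the byte order matters is easy to show. The sketch below decodes the same three bytes as an unsigned 24-bit integer in both orders; the class name UInt24 and its method names are hypothetical and exist only for this illustration.

```java
// Decoding an unsigned 24-bit integer from three bytes, in both byte orders.
class UInt24 {
    // Least significant byte first.
    static int readLittleEndian(byte[] data, int offset) {
        return (data[offset] & 0xFF)
             | (data[offset + 1] & 0xFF) << 8
             | (data[offset + 2] & 0xFF) << 16;
    }

    // Most significant byte first.
    static int readBigEndian(byte[] data, int offset) {
        return (data[offset] & 0xFF) << 16
             | (data[offset + 1] & 0xFF) << 8
             | (data[offset + 2] & 0xFF);
    }
}

class UInt24Demo {
    public static void main(String[] args) {
        byte[] raw = {0x01, 0x02, 0x03};
        System.out.println(UInt24.readLittleEndian(raw, 0)); // prints 197121 (0x030201)
        System.out.println(UInt24.readBigEndian(raw, 0));    // prints 66051  (0x010203)
    }
}
```

The masking with 0xFF is needed because Java bytes are signed; without it a byte like 0xFF would be sign-extended and corrupt the higher bits.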

Now You have probably noticed the int11 type. No, it was not a mistake. An 11-bit type. In 99% of cases You won’t need one, but if You do, it is wise to know what to do.

Now please consider: how would You interpret the above structure of three numbers? At which bit does the “level” start? At 24+11? Or at 24+16?

If Your data are aligned to a certain number of bytes or bits You must always specify that.

Floating points

Basically the screw-ups You can make are the same as with integers. You must specify the format (e.g. IEEE 754 32-bit), the byte order and the alignment.

read(buffer_ptr,sizeof(Header))

This is the most tempting expression to write in C when dealing with a binary file format. A nice, good, portable line to read the Header in one haul.

Never do it. Never.

Alright, I was joking. You can do it. I do it. But only when You are going to use such code on a fixed, known CPU architecture and with a fixed, well-known C compiler.

Why?

Because C compilers can arrange structure fields in memory the way they like. The only things they need to preserve are the ordering of the fields (the C standard does guarantee that) and their types. They may add gaps, empty spaces etc. For example, an MSP430 CPU can fetch 16-bit data from an even address with one instruction, but to do it from an odd, byte-aligned address it needs four instructions. So most C compilers will place all 16-bit data at 16-bit boundaries and will align all structures the same way.

So depending on CPU and compiler, even if You use the properly sized types for fields, the size of a structure in memory may differ from the sum of the size of all data in it.
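The portable alternative is to decode the record field by field from a byte buffer, with every property stated explicitly. The sketch below does it in Java for the Entity record. Everything about the layout is my assumption, because the specification above never fixed it: little-endian byte order, no padding, X/Y/Z as IEEE 754 32-bit floats, attributes as a 32-bit and level as a 16-bit two’s complement integer. EntityRecord is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Field-by-field decoder; the layout below is assumed, not taken from a spec.
final class EntityRecord {
    // 3 floats + 1 int + 1 short, no padding: 18 bytes in the file.
    static final int ENCODED_SIZE = 4 + 4 + 4 + 4 + 2;

    final float x, y, z;
    final int attributes;
    final short level;

    private EntityRecord(float x, float y, float z, int attributes, short level) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.attributes = attributes;
        this.level = level;
    }

    static EntityRecord decode(byte[] data, int offset) {
        // The byte order is stated explicitly instead of inherited from the CPU.
        ByteBuffer in = ByteBuffer.wrap(data, offset, ENCODED_SIZE)
                                  .order(ByteOrder.LITTLE_ENDIAN);
        float x = in.getFloat();
        float y = in.getFloat();
        float z = in.getFloat();
        int attributes = in.getInt();
        short level = in.getShort();
        return new EntityRecord(x, y, z, attributes, level);
    }
}

class EntityRecordDemo {
    public static void main(String[] args) {
        byte[] raw = {
            0x00, 0x00, (byte) 0x80, 0x3F,        // X = 1.0
            0x00, 0x00, 0x00, 0x40,               // Y = 2.0
            0x00, 0x00, (byte) 0x80, (byte) 0xBF, // Z = -1.0
            0x05, 0x00, 0x00, 0x00,               // attributes = 5
            (byte) 0xFE, (byte) 0xFF              // level = -2
        };
        EntityRecord e = EntityRecord.decode(raw, 0);
        System.out.println(e.x + " " + e.y + " " + e.z + " " + e.attributes + " " + e.level);
        // prints: 1.0 2.0 -1.0 5 -2
    }
}
```

The in-memory size of the object no longer matters: the 18 bytes in the file are defined by the decoder, not by the compiler.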

But if You, like me, are coding on micro-controllers of a fixed brand with a fixed compiler, then it pays off in terms of code size and speed to use a hack: define so-called “packed structures”, collect them in “packed unions” and lay them over the memory buffers used for data transfer. It is an excellent, foolproof, easy to maintain way of decoding incoming data at near-zero cost. Provided You cast Your compilation environment in stone.

Commenting Your code: How to kill Your software project.

In 2020, in my country (Poland, Europe), there were about 120’000 (one hundred twenty thousand) new regulatory acts. This count includes all local bills, EU directives and regulations, and all country-wide legal acts.

Over one hundred thousand new documents.

This is far, far beyond human ability to read, understand and connect all of them.

As a result there is no single person in Poland who can honestly say: “I do know The LAW”.

Hey, shouldn’t You talk about code?

And am I not doing that? What is a law anyway? Can’t we say it is a kind of program which is run on a machine called “the society”?

For a single average Joe the law is exactly like:

assert(!kill)

Joe can run any program, but when he hits the assertion the

throw new EGoToPrisonException("Joe")

is thrown.

On the other hand, for the government and official bureaucrats the law is the program they have to “run”, according to the fiction: “Citizens must not act against the law, government must act by the law”.

Obey law by “letter” or by “intent”

In March 2020, when Covid was spreading in Poland, our government produced a regulation which stated: “You can’t enter the forest”.

Yes, You read it right. Entering a large, green, vast space filled with trees was prohibited in order to prevent the spread of Covid.

The wording was clear. You can’t. Except, what would You have to do if You were living in a village which is inside the forest? Starve to death? Abandon Your job?

This single sentence only looks clear. In fact it is not clear. What is a “forest”? A place with trees? Then is a fruit plantation a forest? Or is a “forest” a piece of land which has the status “forest” on an official land map? If so, then how could anyone quickly check it?

And “entering”. On foot? By bicycle? By car? Is driving a car along a road passing through a forest “entering the forest” or is it not?

Nobody knew any of that, nobody understood it and nobody had any idea why this law was created.

Nobody, except a few thousand Warsaw citizens who, when the first lock-down was initiated, figured out that since they were barred from entertainment in the city, they could perfectly legally go and rest in the forests surrounding the city. This rapid movement created crowds of people, and since the law was already written in unclear and stupid language they did not feel any need to keep their distance.

That was a reason behind: “You can’t enter the forest“.

Later, around November 2020, the government produced a law which prohibited organizing mass protests. It is important to notice the wording: “organizing”. Not “taking part in protests”.

Due to some political reasons which are beyond the scope of this text it was a time when protests were spreading rapidly.

And the Police stopped and punished anyone who took part in such protests, even though the law clearly stated “organizing”.

In first case (“You can’t enter the forest”) citizens were punished because they broke the “letter of the law”. Police and institutions did not care about the idea behind this law (“keep distance, avoid making crowds”), probably because there was no way they could figure it out, and followed the somewhat clearer direct meaning of a law.

But in the second case the direct meaning was ignored and the “intent” or the idea (the same again) was what was followed by the Police.

Clear communication of “intent”

How does this apply to the code?

Consider the following pseudo-code:

boolean validateAge(Person person)
{
    if (person.age < 18 ) return false; else return true;
}

This code is a kind of “self-commenting” code. It is clear what it does. It checks if the age is below 18 and decides something.

You may say, that compared to the “law” it is a “letter of a law”.

Does it need any comments?

Looking at the vast amount of code which can be found on the web, many programmers will say: “No. A high-level programming language is self-commenting”.

And they would be right. I do agree with that.

They would be right in saying that the program specifies exactly what it does.

But what about what it should do?

Now let me extend this piece of code with a declaration comment:

/** A method which validates if person is in an age which 
allows one to consume an alcohol in taverns or similar places. */
boolean validateAge(Person person) 
{     
if (person.age < 18 ) return false; else return true;
}

The code does exactly the same thing, but now it is clear to us what it ought to do. The code says what it does, but the comment says what it was meant to do.

In this simple case You will immediately notice that 18 is an incorrect value in many regions of the world!

Would You be able to detect it without a comment? I do not think so.

Note: I do still find this comment to be bad and incomplete, but I think we should ignore it now.

The cast of “wise men”

Now let us again take a glance at the legal system and those two regulations I mentioned earlier. In both cases the government (mister M.Morawiecki) wrote the “code” but never clearly communicated the intent behind those program lines.

Now ask Yourself a question: who on the entire Earth knew what “You can’t enter the forest” was meant to achieve?

Maybe one or two persons in the government who wrote it.

Now take a look at Your software project. That big one. How many lines of code are there? 10’000? No… that’s small. My first big one-person project at high school had 30’000 lines. So I suppose it will be near 10’000 files and about 10’000’000 lines of code.

This is the limit, from my personal experience, at which a single person starts losing sight of it. The: “Hey, I already wrote it?! Really? When the hell did I do that?!” is something You may start to hear. Yet the person who wrote that code will still quite quickly get a grip on what it was expected to do and can fix a broken implementation or even a broken concept.

This way You have Your own hand-made “wise man”.

A “wise man” is a person in a software project who has been in it from the very beginning, who knows or at least remembers the ideas behind all the internal workings and can be used as a “quick access” database to point out which part of the code is responsible for what, and which part of the concept may be responsible for such and such behavior.

The important observation to make is that for such a “wise man” the code is indeed self commenting. Mainly because of the content of his or her brain.

Kill the “wise man” and see what would happen.

Short contract employees

So You have Your big project, You had Your team of “wise men”. Now they are all gone.

You hire a new person on a short contract. You ask him/her to fix something.

How the hell can that person guess that 18 in that example is the age for drinking alcohol and not for driving a car? He can’t, but he knows it is there, so he will do something like this:

boolean canDriveCar(Person p){ return validateAge(p); }

because he knows that 18 is the right age for it and validateAge() uses 18, right?

Then the law changes and now says that 16 is the valid age for driving a car, so You hire somebody else. This person will then fix:

boolean validateAge(Person person)
{
    if (person.age < 16 ) return false; else return true; // was: person.age < 18
}

and will break the alcohol test.

Of course those examples are oversimplified and exaggerated. But in real life similar things may happen. If there is no clear specification of what the intent of a certain piece of code was, then people will start using it according to what they see it does. In this example Your API did not specify clearly that it was about the age at which people may legally drink alcohol, so You will end up with code full of calls to this method in every place where an age of 18 was to be tested.
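One possible way out, sketched below under my own assumptions, is to give each intent its own documented method and constant, so that changing one rule cannot silently change the other. The names AgeRules, canConsumeAlcohol and both constants are hypothetical, and the values 18 and 16 are just the ones used in the story above, not legal advice for any region.

```java
// Minimal stand-in for the Person type used in the examples above.
class Person {
    final int age;

    Person(int age) { this.age = age; }
}

class AgeRules {
    /** Minimum age for consuming alcohol in taverns or similar places (assumed value). */
    static final int MIN_ALCOHOL_AGE = 18;

    /** Minimum age for driving a car, after the law change in the story (assumed value). */
    static final int MIN_DRIVING_AGE = 16;

    /** Tells whether the person is old enough to consume alcohol in a tavern. */
    static boolean canConsumeAlcohol(Person person) {
        return person.age >= MIN_ALCOHOL_AGE;
    }

    /** Tells whether the person is old enough to drive a car. */
    static boolean canDriveCar(Person person) {
        return person.age >= MIN_DRIVING_AGE;
    }
}

class AgeRulesDemo {
    public static void main(String[] args) {
        Person teenager = new Person(17);
        System.out.println(AgeRules.canDriveCar(teenager));       // prints true
        System.out.println(AgeRules.canConsumeAlcohol(teenager)); // prints false
    }
}
```

The comments carry the intent, and the next short-contract employee who changes MIN_DRIVING_AGE has no reason to touch the alcohol rule.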

Open source projects

Open source, community-based projects are full of short-term employees. Those projects are based on them. The entire idea was: “You spot the bug, You can fix it”.

I’m a programmer. I do it for a living. I can pin-point a bug in my own code within 0.5 to 24 working hours. In well-commented code written by somebody else I need double that time.

Without comments I need ten times as long or more.

If You are running an open source project, whom would You like to attract to it most? Inexperienced greenhorns who make thousands of mistakes, or old lions with years of experience?

If You think about quality You should, I think, focus on gathering a small herd of “lions” surrounded by youngsters playing around and trying to catch some butterflies.

Most of the “lions” do however work for a living. They code all day long and the last thing they would like to do in their free time is to bite through the next piece of code. They probably won’t make up the bulk of Your team. Yet sometimes they may get so annoyed by bugs that they will make an attempt to fix them.

In their free time.

What do You think: would they love to spend 100 hours digging through non-commented code with no information at all about the main ideas and algorithms in the program, or would they love to spend 2 hours on doing the fix in a well-described environment?

If You would like to kill Your open source project, follow the route with the big yellow sign: “High level code is self commenting”