Java serialization: what are You serializing Your data for?

I think that after this series of posts about Java serialization we should talk a bit about things which are not present in a standard serialization mechanism.

Once upon a time I wrote a GUI application. Swing based, of course. JavaFX wasn’t there at that time; in fact it collapsed to “abandon-ware” status for a while. Plus I am not a big fan of it…

But back to serialization.

Of course I created some very specific JComponent GUI components which were responsible for the job. It was basically a scientific dynamic data charting engine, so those components showed dynamically changing data. The visual effect depended on two sets of settings: one bound to the data itself, which of course was serialized through the data serialization mechanism, and a second set which was purely visual, for example what font size to use in a table. Pure presentation settings, having nothing in common with the data.

So I was thinking: why not serialize those GUI components?

The standard serialization is for…

I tried and failed. Miserably. My first and simplest approach “take and serialize it to disk” was conceptually wrong. Let me tell You why.

After trying and reading the available sources, I realized that the standard serialization was meant for “hot serialization” of living objects, while I wished to do a “cold serialization” to disk.

“Hot” serialization

Initially the Java GUI, as a part of the Sun enterprise, was meant (I believe that it was, I never worked for them) to run in a philosophy very much like the X-Server concept.

In that concept the body of a program runs on one machine (I will call it the “server”), while all GUI related commands, like drawing on a screen or handling the mouse and keyboard, are passed over a network to the machine the user sits at (I will call it the “client”). X itself names these roles the other way round, because the X server is the process which owns the display, but the picture stays the same. It is easy to imagine how much the network throughput is stressed by this approach.

Thus year after year, as the power of an average “client” machine grew, the X-Server protocol tried to move more and more work to the client side. This is only natural. Consider how much processing power it takes to draw TrueType fonts in Your LibreOffice Writer and compare it with the power required to manipulate in memory the UTF-8 characters which actually represent the document data. Clearly, GUI-rich applications spend nearly all of their power (say 99%) on the GUI, so it is natural to move this consumption as close to the end-user as possible.

But how much can You move if You can’t move code? You can only grow the command set, grow caches and the like, but each user action must still be passed to the “server”.

Note: Remember, the “server” and the “client” were often using different CPU architectures. Servers were SPARC or MIPS, clients were x86 or PowerPC. So no, You can’t pass binary code.

With the introduction of the “Java Virtual Machine” passing code became possible. Now it was possible not only to send commands to draw on a screen, but also to pass a bunch of class files responsible for the GUI and run them on the “client”. Of course it should be as transparent as possible. The server side should be able to build the GUI as if it ran locally, wrap it in RPC wrappers and pass it to a remote client. The client should just run it in the context of its own JVM and pass objects back only when necessary.

A part of this process was, I believe, to be handled by the standard serialization protocol.

What does it mean for us?

If You inspect Swing serialization You will notice two things:

  • first, the warning in documentation (JDK 11) that: “(…)Serialized objects of this class will not be compatible with future Swing releases. The current serialization support is appropriate for short term storage or RMI between applications running the same version of Swing.(…)”
  • second, that what is serialized includes all listeners, the whole parent-child tree up and down, and practically everything. From my experience an attempt to serialize any JComponent tends to serialize the entire application.

This is because of “(…)The current serialization support is appropriate for (…) RMI between applications(…)“.

In simpler words, for transferring objects which have to be alive at the other end of a stream and actively exchange data with the source end of a stream.

Note: I am now ignoring the part about “(…)support for long term storage of all JavaBeans™ has been added to the java.beans package. Please see XMLEncoder.(…)” which You may find in JavaDocs. It is intentional because this mechanism is far, far away from what generic serialization needs.

“Cold” serialization

“Cold” serialization is when You remove the object from its environment, stop it, and move it into a “dead” binary storage. Then, possibly the next year at the other end of the world, You take it out of the storage, put it in another (but compatible, of course) environment and revive it.

The reviving process will require binding the object to its new environment, but that is not a problem.

Example: Serial port data logger

Now imagine You wrote a data logger application for a hardware device which sends data to a PC through a serial port connection.

You have a class there which both keeps and tracks data from the serial port. Let’s say it is fixed to be COM1.

What would the “hot” and “cold” serialization look like?

“Hot” example

A “hot” serialization of this class basically needs to pass the already stored data to another machine and allow control of the logger device from that machine. It means that during serialization it must create a “pipe” over the network from the remote machine to the machine to which the hardware device is attached.

“Cold” example

A “cold” serialization of this class should save the stored data on a disk and do it in such a way that, when it is de-serialized, it will connect itself to the said serial port and continue the data logging. It means that it must save the information about which port to connect to and create this connection when de-serialized.
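The “cold” variant can be sketched with nothing more than a transient field and a private readObject method. This is only an illustration under my own assumptions: FakePort is a hypothetical stand-in for a real serial port handle, ColdSerialLogger and its methods are invented names, and a byte array plays the role of the disk file.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

class ColdLoggerDemo {
    /** Hypothetical stand-in for a real serial port handle; it only remembers its name. */
    static class FakePort {
        final String name;
        FakePort(String name) { this.name = name; }
    }

    /** "Cold" logger: stores the samples and the port name, re-opens the port when revived. */
    static class ColdSerialLogger implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String portName;                            // saved: which port to connect to
        private final List<Integer> samples = new ArrayList<>();  // saved: data logged so far
        private transient FakePort port;                          // not saved: the live connection

        ColdSerialLogger(String portName) {
            this.portName = portName;
            this.port = new FakePort(portName);
        }

        void log(int sample) { samples.add(sample); }
        List<Integer> samples() { return samples; }
        FakePort port() { return port; }

        /** Called by the standard mechanism while the object is being de-serialized. */
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();          // restores portName and samples
            port = new FakePort(portName);   // binds the revived object to its new environment
        }
    }

    /** Writes the logger to a byte array and reads it back (the array stands in for a file). */
    static ColdSerialLogger roundTrip(ColdSerialLogger logger) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(logger);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ColdSerialLogger) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ColdSerialLogger logger = new ColdSerialLogger("COM1");
        logger.log(42);
        ColdSerialLogger revived = roundTrip(logger);
        System.out.println(revived.samples() + " on " + revived.port().name); // prints "[42] on COM1"
    }
}
```

Notice that the decision is baked into the class: once readObject re-opens the local port, this class can no longer be used for the “hot” scenario.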

Multi purpose

It is clear that if I try to serialize this logger using standard serialization, I must decide on either the “hot” or the “cold” method. I can’t do both.

But I do need both methods!

Standard Java serialization is single-purpose.

Hey, we have writeReplace()/readResolve()!

Yea, we have them. Except we have either writeReplace() or readResolve(). We can’t use both at the same moment, but let me be silent about it now.

What are those two?

They are a “patch” for multi-purpose serialization. Quite a good one, which will work in the above example case, but not in every case.

We may easily imagine that our “hot” and “cold” serialization can be done by:

  class LocalLogger ...
  class HotLogger ...
  class ColdLogger ...

   HotLogger toHot(LocalLogger ...)
   ColdLogger toCold(LocalLogger ...)

That is, we provide an ability to construct “hot” and “cold” variants from the “local” variant. Both “cold” and “hot” are different objects, but such that when serialized by the standard mechanism they do what we need. Now if we are writing an application which needs “hot” serialization, then instead of LocalLogger we use a class like this:

 class LocalLogger_Hot extends LocalLogger
 {
    // Invoked by the standard mechanism before an instance is written;
    // the returned HotLogger goes into the stream instead of "this".
    private Object writeReplace(){ return toHot(this); }
 }

The standard serialization mechanism will notice it and invoke the writeReplace method before it starts to serialize any instance of LocalLogger_Hot. Thus the remote side will see a HotLogger in every place where a reference to LocalLogger_Hot was serialized.

We may also mirror the thing and decide that LocalLogger will serialize the information necessary for both creating a hot link and opening a local port connection, and that it is up to the reading application to act according to its needs. For that the remote side must use a different source for LocalLogger:
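A minimal runnable sketch of this write-side replacement is shown below. The bodies of the classes are my own assumptions made only for the demonstration: HotLogger just remembers a made-up host name and the port name, and roundTrip uses a byte array instead of a network connection.

```java
import java.io.*;

class WriteReplaceDemo {
    /** The object used inside the local application. */
    static class LocalLogger implements Serializable {
        private static final long serialVersionUID = 1L;
        final String portName;
        LocalLogger(String portName) { this.portName = portName; }
    }

    /** Hypothetical "hot" form: it only remembers which host owns the device. */
    static class HotLogger implements Serializable {
        private static final long serialVersionUID = 1L;
        final String ownerHost;
        final String portName;
        HotLogger(String ownerHost, String portName) {
            this.ownerHost = ownerHost;
            this.portName = portName;
        }
    }

    /** Builds the "hot" form; the host name is a made-up example value. */
    static HotLogger toHot(LocalLogger logger) {
        return new HotLogger("logger-host.example", logger.portName);
    }

    static class LocalLogger_Hot extends LocalLogger {
        private static final long serialVersionUID = 1L;
        LocalLogger_Hot(String portName) { super(portName); }

        // Found reflectively by ObjectOutputStream; its result is written instead of "this".
        private Object writeReplace() { return toHot(this); }
    }

    /** Serializes to a byte array and reads the result back. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object received = roundTrip(new LocalLogger_Hot("COM1"));
        System.out.println(received.getClass().getSimpleName()); // prints "HotLogger"
    }
}
```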

  class LocalLogger
  {
     Object readResolve(){ return toHot(this); }
  }

The de-serialization engine will notice this method and invoke readResolve after the LocalLogger has been de-serialized. From that moment on it will use the returned object instead of the original, which is achieved by modifying the “stream_reference_map” (see there).
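The mirrored, read-side variant can be tried out in the same way. Again this is just a sketch with assumed class bodies: HotLogger here is a hypothetical proxy which does not even need to be Serializable, because it never travels through the stream.

```java
import java.io.*;

class ReadResolveDemo {
    /** Hypothetical "hot" proxy which the reading application wants to work with. */
    static class HotLogger {
        final String portName;
        HotLogger(String portName) { this.portName = portName; }
    }

    /** The reading side's source of LocalLogger: after reading it resolves into a HotLogger. */
    static class LocalLogger implements Serializable {
        private static final long serialVersionUID = 1L;
        final String portName;
        LocalLogger(String portName) { this.portName = portName; }

        // Found reflectively by ObjectInputStream and invoked once the object is fully read.
        Object readResolve() { return new HotLogger(portName); }
    }

    /** Serializes to a byte array and reads the result back. */
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object received = roundTrip(new LocalLogger("COM1"));
        System.out.println(received.getClass().getSimpleName()); // prints "HotLogger"
    }
}
```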

Note: the underscores are intentional.

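When doesn’t it work?

So, having my special GUI components which by default are “hot” serialized, and needing to turn them into “cold” serialized ones, I added:

import java.io.Serializable;
import javax.swing.JComponent;

class MyGUI extends JComponent
{
    // The "cold" form which actually goes into the stream.
    static class MyColdStorage implements Serializable
    {
        // On reading: turn the cold form back into a live component.
        private Object readResolve(){ return new MyGUI(this); }
        ....
    }
    ...
    // On writing: put the cold form into the stream instead of "this".
    private Object writeReplace(){ return new MyColdStorage(this); }
}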

Basically the idea is that when the standard serialization serializes an instance of MyGUI, it will transform it to the “cold form” of MyColdStorage. Then, whenever it de-serializes a MyColdStorage, it will transform it back to MyGUI.

Nice, plain and simple, isn’t it?

Except it doesn’t work.

Cyclic data structures

The GUI is a heavily recursive and cyclic data structure. Each JComponent keeps a list of child GUI components (i.e. a panel keeps a list of the contained buttons). And each child component keeps a reference to its parent component (i.e. a label must know the enclosing panel to tell it that the size of the label changed, so the panel should recompute the layout of its children).

For simplicity let us define it like:

 class JComponent
 {
     private JComponent parent;
     private JComponent child;
 }

If You consider this post You will notice that such a structure will be serialized like this:

   JComponent Parent = ...
   JComponent Child  = ...
   serialize(Parent)
   ... →
      write new JComponent (Parent) //(A)
        write Parent.parent = null
        write new JComponent (Child)
             write Child.parent = refid(Parent) //that is, the stream reference to Parent set in (A)
             write Child.child = null
        write Parent.child = refid(Child)

and during de-serialization:

  x = deserialize(...)
    create new JComponent (Parent)
       set Parent.parent = null
       create new JComponent (Child)
             set Child.parent= Parent;
             set Child.child = null
       set Parent.child = Child
   return Parent
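That the standard mechanism copes with such a plain cycle can be checked with a few lines. The Node class below is my own stand-in for the simplified JComponent above, not real Swing code.

```java
import java.io.*;

class CycleDemo {
    /** Stand-in for the simplified JComponent: one parent and one child reference. */
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        Node parent;
        Node child;
    }

    /** Serializes to a byte array and reads the result back. */
    static Node roundTrip(Node root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Node) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node parent = new Node();
        Node child = new Node();
        parent.child = child;
        child.parent = parent;   // closes the cycle

        Node copy = roundTrip(parent);
        // The back-reference was written as a stream reference, so the cycle survives.
        System.out.println(copy.child.parent == copy); // prints "true"
    }
}
```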

Now arm it with writeReplace and readResolve exactly as it was defined above.

serialize(Parent)
   ... →
      call writeReplace(Parent)
      write new MyColdStorage (Parent_cold)
       write Parent_cold.parent = null
        call writeReplace(Child)
        write new MyColdStorage (Child_cold)
             write Child_cold.parent= refid(Parent);
             write Child_cold.child = null
        write Parent_cold.child = refid(Child)

and during de-serialization:

  x = deserialize(...)
    create new MyColdStorage (Parent_cold)
       set Parent_cold.parent = null
       create new MyColdStorage (Child_cold)
             set Child_cold.parent= Parent_cold;      // (1) Parent_cold is used here...
             set Child_cold.child = null
       call readResolve(Child_cold) (Child)
             new MyGUI (Child)
               Child.parent = Child_cold.parent (Parent_cold)
               Child.child = Child_cold.child (null)
       set Parent_cold.child = Child
   call readResolve(Parent_cold) (Parent)             // (2) ...but it is resolved only here
       new MyGUI (Parent)
         Parent.parent = Parent_cold.parent (null)
         Parent.child = Parent_cold.child (Child)
   return Parent

Noticed the lines marked (1) and (2)?

In this cyclic structure the first use of the de-serialized parent reference happens before the place in which its readResolve(Parent_cold) is invoked. It is because the designers of the standard Java serialization assumed that to resolve an object You need it to be fully read. And of course, since we have a cyclic structure, the process of reading the “Child” in this example will refer to the “Parent” before it has been fully read. Thus it will access an unresolved object.

In my case it would produce a ClassCastException, because MyColdStorage is not a JComponent.

It is even worse: we will now have two objects, one unresolved MyColdStorage and one resolved MyGUI, where we originally had a single object.

writeReplace/readResolve do not work in cyclic structures.

Note: This is specified and designed behavior. I can’t tell whether it was intentionally created like that, because a solution is trivial, but nevertheless You will find it in the serialization specs.
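The effect can be reproduced with the standard mechanism in a small sketch. It rests on one deliberate assumption of mine: the parent and child fields are typed as Object, so that the unresolved cold object can be observed leaking into the result instead of the run ending with a ClassCastException. GuiNode and ColdStorage are hypothetical stand-ins for MyGUI and MyColdStorage.

```java
import java.io.*;

class BrokenCycleDemo {
    /** Stand-in for MyGUI. Fields are Object on purpose, to let the wrong object through. */
    static class GuiNode implements Serializable {
        private static final long serialVersionUID = 1L;
        Object parent;
        Object child;
        private Object writeReplace() { return new ColdStorage(this); }
    }

    /** Stand-in for MyColdStorage: the form which actually goes into the stream. */
    static class ColdStorage implements Serializable {
        private static final long serialVersionUID = 1L;
        final Object parent;
        final Object child;
        ColdStorage(GuiNode live) { parent = live.parent; child = live.child; }

        private Object readResolve() {
            GuiNode live = new GuiNode();
            live.parent = parent;   // may still be an unresolved ColdStorage!
            live.child = child;
            return live;
        }
    }

    /** Serializes to a byte array and reads the result back. */
    static GuiNode roundTrip(GuiNode root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (GuiNode) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GuiNode parent = new GuiNode();
        GuiNode child = new GuiNode();
        parent.child = child;
        child.parent = parent;

        GuiNode copy = roundTrip(parent);
        Object parentSeenByChild = ((GuiNode) copy.child).parent;
        System.out.println(parentSeenByChild == copy);                     // prints "false"
        System.out.println(parentSeenByChild.getClass().getSimpleName());  // prints "ColdStorage"
    }
}
```

With the fields typed as GuiNode, the same run would fail while the stream is being read, which is the ClassCastException mentioned above.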

How to solve it?

The answer is simple: with standard serialization You can’t. Once it is “hot” it will be “hot” for an eternity.

But if You write Your own serialization engine the solution is simple. Instead of one readResolve use two:

class MyColdStorage
{
  Object readReplace() { ... }               // creates an empty object of the target type
  void fillReplacement(Object into) { ... }  // copies the data into that object
}

Now the readReplace is bound to create an “empty” object of correct type:

  Object readReplace(){ return new MyGUI(); }

and the fillReplacement is bound to transfer data from the stream form to the target form:

  void fillReplacement(Object into)
  {
    ((MyGUI)into).parent = this.parent;
    ((MyGUI)into).child = this.child;
  }

The readReplace is invoked right after the new instance is created, and the returned value is put into the “stream_reference_map” (see there) instead of the original.

The fillReplacement is invoked in exactly the same place where the standard readResolve() is invoked but, contrary to the original, the “stream_reference_map” is left untouched.

Then make de-serialization look like this:

 x = deserialize(...)
   create new MyColdStorage (Parent_cold)
    call readReplace() → since now each time "Parent" is referenced use returned value (Parent_R)
     set Parent_cold.parent = null
     create new MyColdStorage (Child_cold)
     call readReplace() → (Child_R)
       set Child_cold.parent= Parent_R;
       set Child_cold.child = null
       call fillReplacement(Child_R) (Child cold)
         set Child_R.parent = Child_cold.parent (Parent_R);
         set Child_R.child = Child_cold.child (null);
     set Parent_cold.child = Child_R
    call fillReplacement(Parent_R) (Parent)
      set Parent_R.parent = Parent_cold.parent (null);
      set Parent_R.child = Parent_cold.child (Child_R);
   return Parent_R
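The order of operations from this trace can be played through with a toy reader. Everything here is an assumption made for illustration and not a real engine: the “stream” is just an array in which record i holds the handles of its parent and child (-1 means null), ToyReader is an invented name, and readReplace/fillReplacement are typed with MyGUI instead of Object to avoid casts.

```java
import java.util.HashMap;
import java.util.Map;

class TwoPhaseReaderDemo {
    /** The live form. */
    static class MyGUI {
        MyGUI parent;
        MyGUI child;
    }

    /** The stream form with the two-step replacement. */
    static class MyColdStorage {
        MyGUI parent;
        MyGUI child;

        /** Step 1: create an "empty" object of the target type. */
        MyGUI readReplace() { return new MyGUI(); }

        /** Step 2: move the data from the stream form into the target form. */
        void fillReplacement(MyGUI into) {
            into.parent = parent;
            into.child = child;
        }
    }

    /** Toy de-serializer over records of the form {parentHandle, childHandle}. */
    static class ToyReader {
        private final int[][] records;
        // Plays the role of the "stream_reference_map": handle -> replacement object.
        private final Map<Integer, MyGUI> streamReferenceMap = new HashMap<>();

        ToyReader(int[][] records) { this.records = records; }

        MyGUI read(int handle) {
            if (handle < 0) return null;
            MyGUI known = streamReferenceMap.get(handle);
            if (known != null) return known;               // back-reference: already the replacement

            MyColdStorage cold = new MyColdStorage();
            MyGUI replacement = cold.readReplace();
            streamReferenceMap.put(handle, replacement);   // registered BEFORE any field is read

            cold.parent = read(records[handle][0]);
            cold.child = read(records[handle][1]);
            cold.fillReplacement(replacement);             // where readResolve would be invoked
            return replacement;
        }
    }

    public static void main(String[] args) {
        // Record 0 is the parent (no parent, child = 1); record 1 is the child (parent = 0).
        int[][] stream = { {-1, 1}, {0, -1} };
        MyGUI parent = new ToyReader(stream).read(0);
        System.out.println(parent.child.parent == parent); // prints "true"
    }
}
```

Because the replacement is registered before the fields are read, the child can only ever see the replacement, never the cold form.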

So is it multi-purpose now?

No.

It allows us to change the purpose, but not to serialize the same object once for one purpose and once for another in exactly the same application.

Can we do it?

Of course.

With “object transformers”. There is absolutely no need for the writeReplace/readReplace+fillReplacement trio to be private methods of the serialized class. They can be any methods defined anywhere, provided the serialization mechanism can find them. For example we may define:

public interface ITypeTransformer
{
  boolean isTransformableByMe(Object x);
  Object writeReplace(Object x);
  Object readReplace(Object x);
  void fillReplacement(Object from, Object into);
}

plug it into our serialization engine and be happy.
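How such a transformer could be plugged in is sketched below. Only the ITypeTransformer interface comes from the text. The two transformers, the ColdLogger and HotLogger stream forms, the host name and the toStreamForm/fromStreamForm helpers are all my assumptions, and a real engine would call readReplace and fillReplacement at the two different moments described earlier rather than back to back.

```java
class TransformerDemo {
    /** The interface from the text. */
    interface ITypeTransformer {
        boolean isTransformableByMe(Object x);
        Object writeReplace(Object x);
        Object readReplace(Object x);
        void fillReplacement(Object from, Object into);
    }

    static class LocalLogger { String portName; }
    static class ColdLogger  { String portName; }                    // hypothetical form for disk
    static class HotLogger   { String ownerHost; String portName; }  // hypothetical form for remote use

    /** Cold purpose: LocalLogger goes to disk as ColdLogger and comes back as LocalLogger. */
    static class ColdTransformer implements ITypeTransformer {
        public boolean isTransformableByMe(Object x) {
            return x instanceof LocalLogger || x instanceof ColdLogger;
        }
        public Object writeReplace(Object x) {
            ColdLogger cold = new ColdLogger();
            cold.portName = ((LocalLogger) x).portName;
            return cold;
        }
        public Object readReplace(Object x) { return new LocalLogger(); }
        public void fillReplacement(Object from, Object into) {
            ((LocalLogger) into).portName = ((ColdLogger) from).portName;
        }
    }

    /** Hot purpose: LocalLogger travels as HotLogger, which the remote side keeps as it is. */
    static class HotTransformer implements ITypeTransformer {
        public boolean isTransformableByMe(Object x) { return x instanceof LocalLogger; }
        public Object writeReplace(Object x) {
            HotLogger hot = new HotLogger();
            hot.ownerHost = "logger-host.example";   // made-up example value
            hot.portName = ((LocalLogger) x).portName;
            return hot;
        }
        public Object readReplace(Object x) { return x; }          // not used in this sketch
        public void fillReplacement(Object from, Object into) { }  // nothing to fill
    }

    /** What a hypothetical engine would do before writing x. */
    static Object toStreamForm(ITypeTransformer t, Object x) {
        return t.isTransformableByMe(x) ? t.writeReplace(x) : x;
    }

    /** What a hypothetical engine would do after reading x (no cycles here, so both steps at once). */
    static Object fromStreamForm(ITypeTransformer t, Object x) {
        if (!t.isTransformableByMe(x)) return x;
        Object replacement = t.readReplace(x);
        t.fillReplacement(x, replacement);
        return replacement;
    }

    public static void main(String[] args) {
        LocalLogger logger = new LocalLogger();
        logger.portName = "COM1";
        // The same object, in the same application, serialized for two purposes:
        System.out.println(toStreamForm(new ColdTransformer(), logger).getClass().getSimpleName()); // prints "ColdLogger"
        System.out.println(toStreamForm(new HotTransformer(), logger).getClass().getSimpleName());  // prints "HotLogger"
    }
}
```

The purpose is now selected by the transformer handed to the engine, not by the class being serialized.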

Can You do it with a standard serialization?

No. Absolutely not.

Summary

After reading this blog entry You should understand that different applications may need to serialize the same object in different ways. You should be aware that standard serialization is “cast in stone” in that respect, and that the writeReplace/readResolve mechanism is broken for cyclic structures and won’t help You there.

You should also know that if You decide on Your own serialization engine, then You can do it in a very easy way.
