Low-Memory Deep Copy Technique for Java Objects

“Faster Deep Copies of Java Objects” explains the concept of “deep copies” of Java objects and illustrates a common approach for copying objects using Java Object Serialization. A downside of the illustrated approach is that it requires creation of a byte array containing the entire serialized representation of the object being copied. Creation of this array is inherently wasteful, as it is thrown away once the copy has been made.

It would, of course, be possible to save the array and re-use it for subsequent object copies, though that may not be appropriate in some applications. An alternate approach is to use java.io.PipedInputStream and java.io.PipedOutputStream to serialize the object in one thread and simultaneously deserialize it in another thread. This technique is more complex, but allows the two halves of the operation to coordinate using a small buffer rather than a large byte array. Figure 1 shows an example copy utility that uses this approach.



import java.io.IOException;
import java.io.PipedOutputStream;
import java.io.PipedInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectInputStream;

/**
 * Utility for making deep copies (vs. clone()'s shallow copies) of objects
 * in a memory efficient way. Objects are serialized in the calling thread and
 * de-serialized in another thread.
 *
 * Error checking is fairly minimal in this implementation. If an object is
 * encountered that cannot be serialized (or that references an object
 * that cannot be serialized) an error is printed to System.err and
 * null is returned. Depending on your specific application, it might
 * make more sense to have copy(...) re-throw the exception.
 */
public class PipedDeepCopy {
    /**
     * Flag object used internally to indicate that deserialization failed.
     */
    private static final Object ERROR = new Object();

    /**
     * Returns a copy of the object, or null if the object cannot
     * be serialized.
     */
    public static Object copy(Object orig) {
        Object obj = null;
        try {
            // Make a connected pair of piped streams
            PipedInputStream in = new PipedInputStream();
            PipedOutputStream pos = new PipedOutputStream(in);

            // Make a deserializer thread (see inner class below)
            Deserializer des = new Deserializer(in);

            // Write the object to the pipe, then close the stream to
            // flush any remaining data and signal end-of-stream
            ObjectOutputStream out = new ObjectOutputStream(pos);
            out.writeObject(orig);
            out.close();

            // Wait for the object to be deserialized
            obj = des.getDeserializedObject();

            // See if something went wrong
            if (obj == ERROR)
                obj = null;
        }
        catch(IOException ioe) {
            ioe.printStackTrace();
        }

        return obj;
    }

    /**
     * Thread subclass that handles deserializing from a PipedInputStream.
     */
    private static class Deserializer extends Thread {
        /**
         * Object that we are deserializing
         */
        private Object obj = null;

        /**
         * Lock that we block on while deserialization is happening
         */
        private Object lock = null;

        /**
         * InputStream that the object is deserialized from.
         */
        private PipedInputStream in = null;

        public Deserializer(PipedInputStream pin) throws IOException {
            lock = new Object();
            this.in = pin;
            start();
        }

        public void run() {
            Object o = null;
            try {
                ObjectInputStream oin = new ObjectInputStream(in);
                o = oin.readObject();
            }
            catch(IOException e) {
                // This should never happen. If it does, o remains null
                // and obj is set below to the ERROR flag, indicating
                // that deserialization was not possible.
                e.printStackTrace();
            }
            catch(ClassNotFoundException cnfe) {
                // Same here...
                cnfe.printStackTrace();
            }

            synchronized(lock) {
                if (o == null)
                    obj = ERROR;
                else
                    obj = o;
                lock.notifyAll();
            }
        }

        /**
         * Returns the deserialized object. This method will block until
         * the object is actually available.
         */
        public Object getDeserializedObject() {
            // Wait for the object to show up
            try {
                synchronized(lock) {
                    while (obj == null) {
                        lock.wait();
                    }
                }
            }
            catch(InterruptedException ie) {
                // If we are interrupted we just return null
            }
            return obj;
        }
    }

}



Figure 1. Deep copy utility using PipedInputStream and PipedOutputStream

The thread that calls copy(…) creates two connected piped streams and a separate thread to handle deserialization. The object is written to one side of the pipe and read (in the Deserializer thread) from the other. Given the non-deterministic nature of thread scheduling, there is no guarantee that the Deserializer thread will finish reading immediately after the calling thread has finished writing, so the calling thread will block if necessary until reading has finished.
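The blocking handshake at the heart of this technique can be seen in isolation in the following minimal sketch. The `PipeSketch` class and its `pump` helper are illustrative names invented here, not part of the copy utility: one thread writes more data than the pipe's 1024-byte buffer can hold, blocking whenever the buffer fills, while a second thread drains it.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeSketch {

    /**
     * Pushes 'count' zero bytes through a pipe, writing on the calling
     * thread and reading on a second thread, and returns the number of
     * bytes the reader received. Only the pipe's internal buffer
     * (1024 bytes by default) is ever allocated for transport.
     */
    public static int pump(int count) throws IOException, InterruptedException {
        final PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        final int[] total = new int[1];

        Thread reader = new Thread() {
            public void run() {
                try {
                    // Read until end-of-stream, counting bytes
                    while (in.read() != -1)
                        total[0]++;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        };
        reader.start();

        // This write blocks whenever the pipe's buffer is full, until the
        // reader thread drains it -- the same handshake the copy utility
        // relies on.
        out.write(new byte[count]);
        out.close();   // signals end-of-stream to the reader
        reader.join();
        return total[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Read " + pump(4096) + " bytes"); // prints "Read 4096 bytes"
    }
}
```

Note that the writer transfers 4096 bytes through a 1024-byte buffer without ever holding more than the buffer's worth of data in transit.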

Figure 2 shows an example of copying a large object using the PipedDeepCopy class shown in Figure 1.



import java.util.Hashtable;
import java.util.Vector;
import java.util.Date;

public class Example1 {

    public static void main(String[] args) {
        // Make a reasonably large test object. Note that this doesn't
        // do anything useful -- it is simply intended to be large, have
        // several levels of references, and be somewhat random. We start
        // with a hashtable and add vectors to it, where each element in
        // the vector is a Date object (initialized to the current time),
        // a semi-random string, and a (circular) reference back to the
        // hashtable itself. In this case the resulting object produces
        // a serialized representation that is approximately 700KB.
        Hashtable obj = new Hashtable();
        for (int i = 0; i < 100; i++) {
            Vector v = new Vector();
            for (int j = 0; j < 100; j++) {
                v.addElement(new Object[] { 
                    new Date(), 
                    "A random number: " + Math.random(),
                    obj
                 });
            }
            obj.put(new Integer(i), v);
        } 

        int iterations = 10;

        // Run piped version
        long time = 0L;
        for (int i = 0; i < iterations; i++) {
            long start = System.currentTimeMillis();
            Object copy = PipedDeepCopy.copy(obj);
            time += (System.currentTimeMillis() - start);

            // Avoid having GC run while we are timing...
            copy = null;
            System.gc();
        }

        System.out.println("Piped time: " + time);
    }

}



Figure 2. Example of copying a large object

The good news is that the extra memory needed to make the copy is essentially bounded by the size of the buffer used by PipedInputStream. By default, this buffer is 1024 bytes. The bad news is that this approach is more than twice as slow as even the unoptimized variant of the deep copy utility illustrated in “Faster Deep Copies of Java Objects”. The reason for this is the added overhead of coordination between the two threads. In the example in Figure 2, copying the 700KB object requires that the buffer in PipedInputStream be read and overwritten roughly 700 times. The two threads must do a fair bit of handshaking, with the ObjectInputStream waiting for the ObjectOutputStream to write needed data to the buffer, or the ObjectOutputStream waiting for the ObjectInputStream to pull data from the buffer so that more can be written.
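To estimate how many buffer passes a particular object will require, its serialized size can be measured without materializing any bytes by streaming the serialized form into an OutputStream that only counts what passes through. This is a sketch for illustration; the `SerializedSizer` and `CountingOutputStream` names are invented here and are not part of the copy utility.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Date;

public class SerializedSizer {

    /**
     * OutputStream that discards all data and remembers only how many
     * bytes were written to it.
     */
    private static class CountingOutputStream extends OutputStream {
        long count = 0;
        public void write(int b) { count++; }
        public void write(byte[] b, int off, int len) { count += len; }
    }

    /**
     * Returns the size, in bytes, of the object's serialized form.
     * No copy of the serialized data is ever buffered.
     */
    public static long sizeOf(Serializable obj) throws IOException {
        CountingOutputStream counter = new CountingOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(counter);
        out.writeObject(obj);
        out.close();
        return counter.count;
    }

    public static void main(String[] args) throws IOException {
        long size = sizeOf(new Date());
        System.out.println("Serialized size: " + size + " bytes");
        // Number of times a 1024-byte pipe buffer would turn over
        System.out.println("Buffer passes: " + ((size + 1023) / 1024));
    }
}
```

Dividing the measured size by the pipe's buffer size gives the approximate number of buffer turnovers, and hence a rough sense of the handshaking overhead a piped copy will incur.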

Increasing the size of the buffer used by PipedInputStream can help. Unfortunately, PipedInputStream does not provide a way to change the buffer size. It can, however, be easily subclassed to provide this capability. Figure 3 illustrates this.



import java.io.PipedInputStream;

/**
 * PipedInputStream subclass that allows the buffer size to be set to
 * a value larger than the default 1024 bytes. This relies on the
 * protected 'buffer' field inherited from PipedInputStream.
 */
public class AdjustableBufferPipedInputStream extends PipedInputStream {

    public AdjustableBufferPipedInputStream(int bufSize) {
        super();
        buffer = new byte[bufSize];
    }
}



Figure 3. PipedInputStream subclass that allows buffer size to be specified in the constructor.

This helps some, but the overhead of thread coordination is still a limiting factor. If, for example, the buffer size is set to a value larger than the serialized size of the object being copied (effectively eliminating the memory advantages of the piped approach), the copy still takes nearly 50% longer than the single-threaded approach.
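The larger buffer plugs into the copy routine as sketched below. `BufferedPipedDeepCopy` is a hypothetical name used here for illustration, and the deserializing thread is inlined rather than using Figure 1's Deserializer class; on Java 6 and later, the standard `PipedInputStream(int pipeSize)` constructor can be used in place of the Figure 3 subclass, as this sketch does.

```java
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class BufferedPipedDeepCopy {

    /**
     * Returns a deep copy of the object, serializing through a pipe
     * whose buffer size is given by bufSize, or null on failure.
     */
    public static Object copy(Object orig, int bufSize) {
        try {
            // Java 6+: the pipe buffer size can be passed directly;
            // on older JDKs, substitute AdjustableBufferPipedInputStream
            final PipedInputStream in = new PipedInputStream(bufSize);
            PipedOutputStream pos = new PipedOutputStream(in);

            // Deserialize on a second thread, as in Figure 1
            final Object[] result = new Object[1];
            Thread des = new Thread() {
                public void run() {
                    try {
                        ObjectInputStream oin = new ObjectInputStream(in);
                        result[0] = oin.readObject();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            };
            des.start();

            // Serialize on the calling thread, then wait for the reader
            ObjectOutputStream out = new ObjectOutputStream(pos);
            out.writeObject(orig);
            out.close();
            des.join();
            return result[0];
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) {
        java.util.Vector v = new java.util.Vector();
        v.addElement("hello");
        java.util.Vector copy = (java.util.Vector) copy(v, 64 * 1024);
        System.out.println(copy.equals(v) && copy != v); // prints "true"
    }
}
```

Even with a 64KB buffer, the per-turnover handshaking is reduced but not eliminated, which is why the piped approach remains slower than the single-threaded one.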



Copyright 2003-2016 - Philip Isenhour - All code samples may be freely used under the MIT Open-Source License