Faster JTextPane Text Insertion (Part II)

In Part I we briefly examined two of the reasons why inserting large quantities of text with different styles (attributes) into a Swing JTextPane can be very slow: Each update can trigger a UI refresh, and the thread-safe design of DefaultStyledDocument imposes a small amount of locking overhead on each update.

As shown in Part I, simply “detaching” the document from its text pane before modifying it avoids the UI refresh problem. This can, for example, improve the speed of initializing a large, multi-style document by a factor of three or more, depending on document complexity. For large documents, however, this may not be enough. A little rummaging through the internals of DefaultStyledDocument reveals a workaround for the second speed issue, internal locking overhead.

Batch Text Insertion

A common way to initialize multi-styled content in a DefaultStyledDocument is to parse data from a file (or other external source) into a series of substrings and corresponding AttributeSet objects, where the attributes contain the font, color, and other style information for each substring. For example, a code editor for an IDE might provide syntax highlighting by parsing a source file to determine language keywords, variable names, and other relevant constructs, and give each a unique style. Each substring would then be added by calling the insertString(int offset, String str, AttributeSet attrs) method on the document object.
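As a minimal sketch of this conventional approach, the following appends pre-parsed runs one at a time. The class name PerRunInsert, the appendRuns(…) helper, and the sample styles are illustrative assumptions rather than part of the test code shown later.

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

/** Hypothetical helper: appends pre-parsed runs, one insertString call per run. */
class PerRunInsert {

    /** Appends each run at the end of the document and returns the new length. */
    public static int appendRuns(DefaultStyledDocument doc, String[] runs,
            AttributeSet[] styles) throws BadLocationException {
        int offs = doc.getLength();
        for (int i = 0; i < runs.length; i++) {
            // Each call is a separate modification of the document.
            doc.insertString(offs, runs[i], styles[i % styles.length]);
            offs += runs[i].length();
        }
        return offs;
    }

    public static void main(String[] args) throws BadLocationException {
        // Two sample styles standing in for whatever a real parser would assign.
        SimpleAttributeSet keyword = new SimpleAttributeSet();
        StyleConstants.setBold(keyword, true);
        SimpleAttributeSet plain = new SimpleAttributeSet();
        StyleConstants.setFontSize(plain, 12);

        DefaultStyledDocument doc = new DefaultStyledDocument();
        int length = appendRuns(doc,
            new String[] { "public ", "Foo", "\n" },
            new AttributeSet[] { keyword, plain });
        System.out.println("Document length = " + length);
    }
}
```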

Since document objects are thread-safe, insertString(…) first acquires a write-lock to ensure that only a single thread is modifying the underlying data representation, then makes the update, and finally releases the lock when it is finished. For modifications made by user input from the keyboard, this process is sufficiently fast. For the kind of batch updates needed to initialize a large document, however, the lock management overhead is significant.

In DefaultStyledDocument, most of the work of actually updating the document contents when a string is inserted is done by the protected method insertUpdate(…). The string and attributes to be inserted are used to create one or more instances of the ElementSpec class, which are then used to actually effect the modifications.
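To make the idea of an ElementSpec more concrete, here is a small sketch that builds the specs for one styled run followed by a paragraph break, without touching a document. ElementSpec and its constructors are part of the public Swing API; the class name ElementSpecDemo, the lineSpecs(…) helper, and the use of an empty attribute set for the new paragraph are assumptions made for illustration.

```java
import javax.swing.text.DefaultStyledDocument.ElementSpec;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

/** Hypothetical demo: describes "text + paragraph break" as ElementSpecs. */
class ElementSpecDemo {

    /** Builds the specs for one styled run followed by a paragraph break. */
    public static ElementSpec[] lineSpecs(String text, SimpleAttributeSet attrs) {
        char[] chars = text.toCharArray();
        return new ElementSpec[] {
            // The characters of the run, with their character attributes.
            new ElementSpec(attrs, ElementSpec.ContentType, chars, 0, chars.length),
            // The linefeed that ends the line.
            new ElementSpec(attrs, ElementSpec.ContentType, new char[] { '\n' }, 0, 1),
            // Close the current paragraph element...
            new ElementSpec(null, ElementSpec.EndTagType),
            // ...and open a new one (empty paragraph attributes assumed here).
            new ElementSpec(new SimpleAttributeSet(), ElementSpec.StartTagType)
        };
    }

    public static void main(String[] args) {
        SimpleAttributeSet bold = new SimpleAttributeSet();
        StyleConstants.setBold(bold, true);
        for (ElementSpec spec : lineSpecs("Hello", bold)) {
            System.out.println("type=" + spec.getType()
                + " length=" + spec.getLength());
        }
    }
}
```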

ElementSpec is a static nested class within DefaultStyledDocument. A quick search for its uses within DefaultStyledDocument reveals the method:

    protected void insert(int offset, ElementSpec[] data) throws 
        BadLocationException

This version of insert is also thread-safe (like insertString(…)), but it processes an array of ElementSpec objects that are to be inserted at a given offset. Unlike insertString(…), the lock is acquired only once for the whole array, rather than once for each modification. This gives us the tools we need to construct a custom subclass that supports batch inserts. Figure 1 shows a possible implementation of such a Document subclass.


import java.util.ArrayList;
import javax.swing.text.Element;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;

/**
 * DefaultStyledDocument subclass that supports batching inserts.
 */
public class BatchDocument extends DefaultStyledDocument {
    /**
     * EOL tag that we re-use when creating ElementSpecs
     */
    private static final char[] EOL_ARRAY = { '\n' };

    /**
     * Batched ElementSpecs
     */
    private ArrayList<ElementSpec> batch = null;

    public BatchDocument() {
        batch = new ArrayList<ElementSpec>();
    }

    /**
     * Adds a String (assumed to not contain linefeeds) for 
     * later batch insertion.
     */
    public void appendBatchString(String str, 
        AttributeSet a) {
        // We could synchronize this if multiple threads 
        // would be in here. Since we're trying to boost speed, 
        // we'll leave it off for now.

        // Make a copy of the attributes, since we will hang onto 
        // them indefinitely and the caller might change them 
        // before they are processed.
        a = a.copyAttributes();
        char[] chars = str.toCharArray();
        batch.add(new ElementSpec(
            a, ElementSpec.ContentType, chars, 0, str.length()));
    }

    /**
     * Adds a linefeed for later batch processing
     */
    public void appendBatchLineFeed(AttributeSet a) {
        // See sync notes above. In the interest of speed, this 
        // isn't synchronized.

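        // Add a spec with the linefeed character. As in 
        // appendBatchString, copy the attributes in case the 
        // caller changes them before the batch is processed.
        batch.add(new ElementSpec(a.copyAttributes(), 
            ElementSpec.ContentType, EOL_ARRAY, 0, 1));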

        // Then add attributes for element start/end tags. Ideally 
        // we'd get the attributes for the current position, but we 
        // don't know what those are yet if we have unprocessed 
        // batch inserts. Alternatives would be to get the last 
        // paragraph element (instead of the first), or to process 
        // any batch changes when a linefeed is inserted.
        Element paragraph = getParagraphElement(0);
        AttributeSet pattr = paragraph.getAttributes();
        batch.add(new ElementSpec(null, ElementSpec.EndTagType));
        batch.add(new ElementSpec(pattr, ElementSpec.StartTagType));
    }

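    /**
     * Inserts all batched content at the given offset and 
     * clears the batch.
     */
    public void processBatchUpdates(int offs) throws 
        BadLocationException {
        // As with appendBatchString, this could be synchronized if
        // there was a chance multiple threads would be in here.
        ElementSpec[] inserts = new ElementSpec[batch.size()];
        batch.toArray(inserts);

        // Clear the batch so that a later call does not insert 
        // the same content a second time.
        batch.clear();

        // Process all of the inserts in bulk
        super.insert(offs, inserts);
    }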
}

Figure 1. BatchDocument, a document subclass that supports batch insertion of text with different styles.

Use of this class differs slightly from a normal DefaultStyledDocument. Strings (and their attributes) that are to be inserted should be added by calling the appendBatchString(…) method. When a new line should be inserted, appendBatchLineFeed(…) should be called. Once all of the batched content has been added, processBatchUpdates(…) should be called to actually insert the text into the document. Note that it would be possible to add methods that would parse arbitrary strings and handle linefeeds automatically.
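A method that handles linefeeds automatically might look something like the sketch below. To stay self-contained it repeats a condensed form of the Figure 1 methods. The class name LineAwareBatchDocument and the appendBatchText(…) method are assumptions introduced here, only '\n' is treated as a line break, and the sketch has only been reasoned through for text that ends in a linefeed, as the test data in Figure 2 does.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;

/** Hypothetical BatchDocument variant with a line-splitting append method. */
class LineAwareBatchDocument extends DefaultStyledDocument {
    private static final char[] EOL_ARRAY = { '\n' };
    private final List<ElementSpec> batch = new ArrayList<ElementSpec>();

    /** Queues a string that is assumed to contain no linefeeds. */
    public void appendBatchString(String str, AttributeSet a) {
        char[] chars = str.toCharArray();
        batch.add(new ElementSpec(a.copyAttributes(),
            ElementSpec.ContentType, chars, 0, chars.length));
    }

    /** Queues a linefeed followed by a paragraph end/start pair. */
    public void appendBatchLineFeed(AttributeSet a) {
        batch.add(new ElementSpec(a.copyAttributes(),
            ElementSpec.ContentType, EOL_ARRAY, 0, 1));
        AttributeSet pattr = getParagraphElement(0).getAttributes();
        batch.add(new ElementSpec(null, ElementSpec.EndTagType));
        batch.add(new ElementSpec(pattr, ElementSpec.StartTagType));
    }

    /** Splits text on '\n' and queues every piece with the same attributes. */
    public void appendBatchText(String text, AttributeSet a) {
        int start = 0;
        int eol;
        while ((eol = text.indexOf('\n', start)) >= 0) {
            if (eol > start) {
                appendBatchString(text.substring(start, eol), a);
            }
            appendBatchLineFeed(a);
            start = eol + 1;
        }
        // Queue whatever follows the last linefeed, if anything.
        if (start < text.length()) {
            appendBatchString(text.substring(start), a);
        }
    }

    /** Inserts everything queued so far at offs and empties the queue. */
    public void processBatchUpdates(int offs) throws BadLocationException {
        ElementSpec[] inserts = batch.toArray(new ElementSpec[0]);
        batch.clear();
        super.insert(offs, inserts);
    }

    public static void main(String[] args) throws BadLocationException {
        LineAwareBatchDocument doc = new LineAwareBatchDocument();
        doc.appendBatchText("first line\nsecond line\n",
            new SimpleAttributeSet());
        doc.processBatchUpdates(0);
        System.out.println("Document length = " + doc.getLength());
    }
}
```

With a method like this, a caller that has whole styled blocks rather than pre-split lines can queue each block with a single call and still finish with one processBatchUpdates(…).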

Testing BatchDocument

Figure 2 shows an example test class that initializes a document with a long, multi-format string and reports the time required. It can use either a standard DefaultStyledDocument or a BatchDocument, and it can make the updates while the document is either attached to a visible text pane or detached.


import java.awt.Color;
import java.awt.BorderLayout;
import javax.swing.JFrame;
import javax.swing.JTextPane;
import javax.swing.JScrollPane;
import javax.swing.text.StyleConstants;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;

/**
 * Demonstration class for BatchDocuments. This class creates a
 * randomly formatted string and adds it to a document.
 */
public class Test {

    public static void main(String[] args) throws 
        BadLocationException {
        if (args.length != 3) {
            System.err.println("Please give 3 arguments:");
            System.err.println(" [true/false] for use batch " +
                "(true) vs. use default doc [false]");
            System.err.println(
                " [true/false] for update while visible");
            System.err.println(
                " [int] for number of strings to insert");
            System.exit(-1);
        }

        boolean useBatch = args[0].equalsIgnoreCase("true");
        boolean updateWhileVisible = args[1].equalsIgnoreCase("true");
        int iterations = Integer.parseInt(args[2]);
        System.out.println("Using batch = " + useBatch);
        System.out.println("Updating while pane visible = " + 
            updateWhileVisible);
        System.out.println("Strings to insert = " + iterations);

        JFrame f = new JFrame("Document Speed Test");
        f.getContentPane().setLayout(new BorderLayout());
        JTextPane jtp = new JTextPane();
        f.getContentPane().add(
            new JScrollPane(jtp), BorderLayout.CENTER);
        f.setSize(400, 400);
        f.setVisible(true);

        // Make one of each kind of document. 
        BatchDocument bDoc = new BatchDocument();
        DefaultStyledDocument doc = new DefaultStyledDocument();
        if (updateWhileVisible) {
            if (useBatch)
                jtp.setDocument(bDoc);
            else
                jtp.setDocument(doc);
        }
        long start = System.currentTimeMillis();

        // Make some test data. Normally the text pane 
        // content would come from another source, be parsed, and 
        // have styles applied based on appropriate application 
        // criteria. Here we are interested in the speed of updating 
        // a document, rather than parsing, so we pre-parse the data.
        String[] str = new String[] {
            "The ", "quick ", "brown ", "fox ", "jumps ", 
            "over ", "the ", "lazy ", "dog. " };
        Color[] colors = 
            new Color[] { Color.red, Color.blue, Color.green };
        int[] sizes = new int[] { 10, 14, 12, 9, 16 };

        // Add the test text repeatedly
        int offs = 0;
        int count = 0;
        SimpleAttributeSet attrs = new SimpleAttributeSet();
        for (int i = 0; i < iterations; i++) {
            for (int j = 0; j < str.length; j++) {
                // Make some random style changes
                StyleConstants.setFontSize(
                    attrs, sizes[count % sizes.length]);
                StyleConstants.setForeground(
                    attrs, colors[count % colors.length]);

                if (useBatch)
                    bDoc.appendBatchString(str[j], attrs);
                else
                    doc.insertString(offs, str[j], attrs);

                // Update our counters
                count++;
                offs += str[j].length();
            }

            // Add a linefeed after each instance of the string
            if (useBatch)
                bDoc.appendBatchLineFeed(attrs);
            else
                doc.insertString(offs, "\n", attrs);
            offs++;
        }

        // If we're testing the batch document, process all 
        // of the updates now.
        if (useBatch)
            bDoc.processBatchUpdates(0);

        System.out.println("Time to update = " +
            (System.currentTimeMillis() - start));
        System.out.println("Text size = " + offs);

        if (! updateWhileVisible) {
            if (useBatch) {
                jtp.setDocument(bDoc);
            }
            else {
                jtp.setDocument(doc);
            }
        }
    }

}

Figure 2. Simple test class for comparing large document initialization times

The Test class in Figure 2 should be run with three parameters:

1. “true” if the BatchDocument class should be used or “false” if a DefaultStyledDocument should be used.
2. “true” if the updates should be made while the document is attached to a visible JTextPane, or “false” if the updates should be made while the document is unattached.
3. The number of times that the test string should be repeated to build the document.

Figure 3 shows some sample results from running the Test class.

Test	Command line	Time (milliseconds)
Default document, updated while visible	`java Test false true 10000`	460827
Default document, updated while detached	`java Test false false 10000`	151591
Batch document, updated while visible	`java Test true true 10000`	30185
Batch document, updated while detached	`java Test true false 10000`	29444

Figure 3. Sample results from running the Test class.

As we noted in Part I, simply detaching a DefaultStyledDocument before inserting the text is roughly three times faster for this particular test. Switching to BatchDocument boosts the speed by another five times, producing an initialization time that is roughly 15 times faster than initializing a visible DefaultStyledDocument. With BatchDocument, visibility is less of a factor since only a single UI refresh will be triggered. The difference between the visible vs. detached times for BatchDocument shown in the results above is simply “noise”.
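One way to see why visibility matters less for the batch approach is to count the insert notifications that a document sends to its listeners, since those notifications are what prompt an attached text pane to update. The sketch below is an illustration under stated assumptions, not a measurement from the test above: the class name InsertEventCount, the nested helper classes, and the two count methods are all hypothetical, and the batch contains only content specs for simplicity.

```java
import javax.swing.event.DocumentEvent;
import javax.swing.event.DocumentListener;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.SimpleAttributeSet;

/** Hypothetical demo: insert notifications per call vs. per batch. */
class InsertEventCount {

    /** Counts the insert notifications a document sends. */
    static class Counter implements DocumentListener {
        int inserts;
        public void insertUpdate(DocumentEvent e) { inserts++; }
        public void removeUpdate(DocumentEvent e) { }
        public void changedUpdate(DocumentEvent e) { }
    }

    /** Exposes the protected insert(int, ElementSpec[]) for the demo. */
    static class BulkDocument extends DefaultStyledDocument {
        void insertAll(int offs, String[] runs, SimpleAttributeSet attrs)
                throws BadLocationException {
            ElementSpec[] specs = new ElementSpec[runs.length];
            for (int i = 0; i < runs.length; i++) {
                char[] chars = runs[i].toCharArray();
                specs[i] = new ElementSpec(attrs,
                    ElementSpec.ContentType, chars, 0, chars.length);
            }
            insert(offs, specs);
        }
    }

    /** Inserts each run with insertString and returns the event count. */
    public static int countPerCall(String[] runs) throws BadLocationException {
        DefaultStyledDocument doc = new DefaultStyledDocument();
        Counter counter = new Counter();
        doc.addDocumentListener(counter);
        SimpleAttributeSet attrs = new SimpleAttributeSet();
        int offs = 0;
        for (String run : runs) {
            doc.insertString(offs, run, attrs);
            offs += run.length();
        }
        return counter.inserts;
    }

    /** Inserts all runs with one bulk insert and returns the event count. */
    public static int countBulk(String[] runs) throws BadLocationException {
        BulkDocument doc = new BulkDocument();
        Counter counter = new Counter();
        doc.addDocumentListener(counter);
        doc.insertAll(0, runs, new SimpleAttributeSet());
        return counter.inserts;
    }

    public static void main(String[] args) throws BadLocationException {
        String[] runs = { "The ", "quick ", "brown ", "fox" };
        System.out.println("Per-call events = " + countPerCall(runs));
        System.out.println("Bulk events = " + countBulk(runs));
    }
}
```

The per-call path should report one notification for each non-empty run, while the bulk path should report a single notification for the whole array.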

Results will, of course, vary considerably based on machine speed and memory, JDK version, document size, and (perhaps most importantly) the complexity of the document content.