References for this article:

"JAVA9 String新特性，说说你不知道的东西" (new String features in Java 9), Java人在江湖, CSDN blog

Differences between StringBuffer and StringBuilder

This section covers the JDK 8 implementations of StringBuffer and StringBuilder. From JDK 9 onward, both classes changed slightly along with String.

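String is immutable: the String class is declared final, its internal value array is a private final field, and no method modifies or exposes that array. Concatenating strings therefore generally produces a new String object each time, which is inefficient when done repeatedly.
StringBuffer and StringBuilder exist to make concatenation cheaper. StringBuffer is thread-safe because its main methods are marked synchronized, but it pays the cost of acquiring and releasing the lock on every call. StringBuilder is the non-thread-safe counterpart of StringBuffer: it does not use synchronized, so it is the faster of the two when only one thread touches the builder.
Both StringBuffer and StringBuilder extend AbstractStringBuilder, which implements most of their methods. The principle is simple: the builder holds an internal char array, and append copies the String's char array onto the end of the builder's array with System.arraycopy. If the array is too small, it grows to twice its current size plus 2 (or to the required size, if that is larger). toString builds a String from this char array.
Both StringBuffer and StringBuilder accept an initial capacity for the internal char array in their constructors. If you know roughly how long the final string will be, pass an initial capacity to avoid the cost of growing the array, much as you would with ArrayList.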
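To make the capacity advice concrete, here is a minimal sketch. The class name BuilderCapacityDemo and the helper join are made up for illustration; only StringBuilder's public constructor, append and capacity() methods are used. The exact capacity after growth depends on the JDK version, so the sketch prints it rather than assuming a value.

```java
final class BuilderCapacityDemo {
    /** Joins the parts into one string, presizing the builder to the exact final length. */
    static String join(String[] parts) {
        int total = 0;
        for (String p : parts) {
            total += p.length();
        }
        // The internal array is allocated once at the right size and never has to grow.
        StringBuilder sb = new StringBuilder(total);
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();   // documented default capacity is 16
        System.out.println("initial capacity: " + sb.capacity());
        sb.append("0123456789abcdefg");           // 17 chars, one more than fits
        System.out.println("capacity after growth: " + sb.capacity());
        System.out.println(join(new String[] {"Hello", ", ", "world"}));
    }
}
```

Presizing only pays off when the final length is known or can be estimated cheaply; for a handful of short appends the default capacity is usually fine.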

The implementation of append in AbstractStringBuilder:

public AbstractStringBuilder append(String str) {
    if (str == null)
        return appendNull();
    int len = str.length();
    ensureCapacityInternal(count + len);
    str.getChars(0, len, value, count);
    count += len;
    return this;
}

The implementation of getChars in String (the String itself copies the contents of its internal array onto the end of the builder's array):

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > value.length) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}
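The two methods above can be put together in a small standalone sketch. TinyBuilder is a hypothetical class written for this article, not JDK code: it assumes the JDK 8 growth rule (double the capacity plus 2), uses only the public String.getChars method, and ignores integer overflow for very large strings.

```java
import java.util.Arrays;

final class TinyBuilder {
    private char[] value = new char[16]; // same default capacity as StringBuilder
    private int count;                   // number of chars actually in use

    TinyBuilder append(String str) {
        if (str == null) {
            str = "null";                // the JDK appends the four chars "null" here
        }
        int len = str.length();
        ensureCapacity(count + len);
        str.getChars(0, len, value, count); // the String copies itself to the end of value
        count += len;
        return this;
    }

    private void ensureCapacity(int minimumCapacity) {
        if (minimumCapacity > value.length) {
            int newCapacity = value.length * 2 + 2; // assumed JDK 8 growth rule
            if (newCapacity < minimumCapacity) {
                newCapacity = minimumCapacity;
            }
            value = Arrays.copyOf(value, newCapacity);
        }
    }

    int capacity() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, 0, count);
    }

    public static void main(String[] args) {
        TinyBuilder b = new TinyBuilder().append("0123456789").append("abcdefg");
        System.out.println(b + " capacity=" + b.capacity());
    }
}
```

The second append needs 17 chars in total, so the 16-char array is replaced by one of 16 * 2 + 2 = 34 chars before the copy happens.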

Understanding String in JDK 9

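A while ago, in my article 详解java中的unicode编码（码点） (a detailed look at Unicode encoding and code points in Java), I mentioned that Java's char uses UTF-16 and that String is backed by a char array.

As a refresher, here are the UTF-16 encoding rules:

Characters in the BMP (Basic Multilingual Plane) are each represented by a single 2-byte code unit, which is enough for most commonly used scripts.
Characters in the supplementary planes are represented by a pair of code units, a high surrogate followed by a low surrogate, for 4 bytes in total. This covers every character in Unicode.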
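The two rules can be observed through the public String API. The sketch below is only an illustration; the class name CodeUnitDemo, its constants and the describe helper are made up here. It compares length(), which counts 2-byte code units, with codePointCount, which counts characters.

```java
final class CodeUnitDemo {
    static final String BMP = "\u4E2D";                 // U+4E2D, inside the BMP
    static final String SUPPLEMENTARY = "\uD83D\uDE00"; // U+1F600, a surrogate pair

    /** Reports how many code units and how many code points the string contains. */
    static String describe(String s) {
        return s.length() + " code unit(s), "
                + s.codePointCount(0, s.length()) + " code point(s)";
    }

    public static void main(String[] args) {
        System.out.println(describe(BMP));           // 1 code unit(s), 1 code point(s)
        System.out.println(describe(SUPPLEMENTARY)); // 2 code unit(s), 1 code point(s)
        System.out.println(Character.isHighSurrogate(SUPPLEMENTARY.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(SUPPLEMENTARY.charAt(1)));  // true
        System.out.println(Integer.toHexString(SUPPLEMENTARY.codePointAt(0)));  // 1f600
    }
}
```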

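The problem with UTF-16 is that many characters in Latin-script text (ASCII characters, for example) fit in 1 byte, yet UTF-16 always spends at least 2 bytes on them, which wastes space.

So JDK 9 made sweeping changes to String and its related methods.

Note that in JDK 9 the value array became a byte array, which makes single-byte storage per character possible. There is also a new coder field, described in the javadoc as follows: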

The identifier of the encoding used to encode the bytes in value. The supported values in this implementation are LATIN1 UTF16

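In other words, coder indicates whether the byte array is encoded as UTF-16 or Latin-1 (in my tests, 1 means UTF16 and 0 means LATIN1).

For a string that contains only Latin-1 characters, the backing array therefore takes half the space it did before.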
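The internal array cannot be inspected through the public API, so the following is only a model of the saving, under the assumption that compact strings are enabled and that a string is stored as LATIN1 exactly when every char fits in one byte. CompactSizeModel, coderFor and storedBytes are hypothetical names; the constants mirror the values 0 and 1 reported above.

```java
final class CompactSizeModel {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Assumed rule: LATIN1 only if every char is at most 0xFF, otherwise UTF16. */
    static byte coderFor(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    /** Size of the modelled byte array: 1 byte per char for LATIN1, 2 for UTF16. */
    static int storedBytes(String s) {
        return s.length() << coderFor(s);
    }

    public static void main(String[] args) {
        System.out.println(storedBytes("hello"));       // 5  (a char array would need 10)
        System.out.println(storedBytes("hello\u4E2D")); // 12 (one wide char forces UTF16)
    }
}
```

Note that a single non-Latin-1 character is enough to make the whole string use 2 bytes per char, so the saving applies only to strings that are entirely Latin-1.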

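Because the byte array has to be interpreted differently depending on the coder, two helper classes, StringLatin1 and StringUTF16, were introduced to share the work of the String class.

Let's look at the source of a few String methods in JDK 9: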

public int codePointCount(int beginIndex, int endIndex) {
    if (beginIndex < 0 || beginIndex > endIndex ||
        endIndex > length()) {
        throw new IndexOutOfBoundsException();
    }
    if (isLatin1()) {
        return endIndex - beginIndex;
    }
    return StringUTF16.codePointCount(value, beginIndex, endIndex);
}

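This method counts code points. For a Latin-1 string every byte is one code point, so the count is simply endIndex - beginIndex. For UTF16 the idea is similar to JDK 8: start from the number of chars and subtract one for each surrogate pair: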

private static int codePointCount(byte[] value, int beginIndex, int endIndex, boolean checked) {
    assert beginIndex <= endIndex;
    int count = endIndex - beginIndex;
    int i = beginIndex;
    if (checked && i < endIndex) {
        checkBoundsBeginEnd(i, endIndex, value);
    }
    for (; i < endIndex - 1; ) {
        if (Character.isHighSurrogate(getChar(value, i++)) &&
            Character.isLowSurrogate(getChar(value, i))) {
            count--;
            i++;
        }
    }
    return count;
}
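The loop above relies on getChar, which reads one char out of two adjacent bytes. A rough standalone model of that step follows. Utf16Bytes is a hypothetical class, and it assumes the high byte is stored first (big-endian); the real StringUTF16 chooses the byte order according to the platform. The counting loop is the one shown above, minus the bounds check.

```java
import java.nio.charset.StandardCharsets;

final class Utf16Bytes {
    /** Reads the char at a logical index from an array holding 2 bytes per char, high byte first. */
    static char getChar(byte[] value, int index) {
        int i = index << 1; // char index -> byte index
        return (char) (((value[i] & 0xFF) << 8) | (value[i + 1] & 0xFF));
    }

    /** Counts code points in the char range [beginIndex, endIndex). */
    static int codePointCount(byte[] value, int beginIndex, int endIndex) {
        int count = endIndex - beginIndex;
        for (int i = beginIndex; i < endIndex - 1; ) {
            if (Character.isHighSurrogate(getChar(value, i++))
                    && Character.isLowSurrogate(getChar(value, i))) {
                count--; // a surrogate pair is two chars but one code point
                i++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', U+1F600 as a surrogate pair, 'b': 4 chars stored in 8 bytes
        byte[] bytes = "a\uD83D\uDE00b".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(codePointCount(bytes, 0, bytes.length >> 1)); // 3
    }
}
```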

Next, length():

public int length() {
    return value.length >> coder();
}

static final boolean COMPACT_STRINGS;

static {
    COMPACT_STRINGS = true;
}

byte coder() {
    return COMPACT_STRINGS ? coder : UTF16;
}

The javadoc explains COMPACT_STRINGS as follows:

If String compaction is disabled, the bytes in value are always encoded in UTF16

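That is, COMPACT_STRINGS is enabled by default, so both UTF16 and LATIN1 are supported. The JVM can turn it off (for example with the -XX:-CompactStrings option), in which case only UTF16 is used.

The length calculation is still simple: for LATIN1 it is the length of the byte array, and for UTF16 it is the length of the byte array divided by 2.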
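The shift works because the coder value doubles as a shift distance. Here is a small worked example that takes the constants 0 and 1 from the test mentioned earlier; LengthShiftDemo and its methods are illustrative stand-ins, not the JDK's own code.

```java
final class LengthShiftDemo {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Mirrors coder(): with compaction off, every string is treated as UTF16. */
    static byte effectiveCoder(boolean compactStrings, byte coder) {
        return compactStrings ? coder : UTF16;
    }

    /** Mirrors length(): byte count shifted right by the coder. */
    static int length(int byteCount, byte coder) {
        return byteCount >> coder;
    }

    public static void main(String[] args) {
        System.out.println(length(10, LATIN1)); // 10 >> 0 = 10 chars
        System.out.println(length(10, UTF16));  // 10 >> 1 = 5 chars
        // With compaction disabled, a 10-byte array always means 5 chars.
        System.out.println(length(10, effectiveCoder(false, LATIN1))); // 5
    }
}
```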

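What problem do the JDK 9 string changes bring? As some readers may know, arrays in Java have a maximum size, and many collection classes declare the following field: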

/**
* The maximum size of array to allocate.
* Some VMs reserve some header words in an array.
* Attempts to allocate larger arrays may result in
* OutOfMemoryError: Requested array size exceeds VM limit
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

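That is, an array should not be larger than Integer.MAX_VALUE - 8 elements. For an array of the same length, a UTF16-coded byte array in JDK 9 can therefore hold only half as many chars as the char array in JDK 8, while a LATIN1-coded array holds the same number.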
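The arithmetic behind this can be written out as follows. It is a back-of-the-envelope sketch that assumes the MAX_ARRAY_SIZE soft limit quoted above applies to the backing array; the JDK's own string classes may enforce a slightly different bound. MaxLengthDemo and maxChars are illustrative names.

```java
final class MaxLengthDemo {
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8; // limit quoted from the collections
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Number of chars that fit in a byte array of MAX_ARRAY_SIZE bytes for the given coder. */
    static int maxChars(byte coder) {
        return MAX_ARRAY_SIZE >> coder;
    }

    public static void main(String[] args) {
        System.out.println(maxChars(LATIN1)); // 2147483639, same as a char array of that length
        System.out.println(maxChars(UTF16));  // 1073741819, about half
    }
}
```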

Changes to StringBuffer/StringBuilder in JDK 9

The redesign of String in JDK 9 also brought small changes to the implementation of StringBuffer/StringBuilder, as shown below:

abstract class AbstractStringBuilder implements Appendable, CharSequence {
    /**
     * The value is used for character storage.
     */
    byte[] value;

    /**
     * The id of the encoding used to encode the bytes in {@code value}.
     */
    byte coder;
    //...
}

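The internal array of AbstractStringBuilder has also become a byte array, and a coder field has been added. The append path now looks like this (append and putStringAt are in AbstractStringBuilder, getBytes is in String):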

public AbstractStringBuilder append(String str) {
    if (str == null) {
        return appendNull();
    }
    int len = str.length();
    ensureCapacityInternal(count + len);
    putStringAt(count, str);
    count += len;
    return this;
}

private final void putStringAt(int index, String str) {
    if (getCoder() != str.coder()) {
        inflate();
    }
    str.getBytes(value, index, coder);
}

void getBytes(byte dst[], int dstBegin, byte coder) {
    if (coder() == coder) {
        System.arraycopy(value, 0, dst, dstBegin << coder, value.length);
    } else {    // this.coder == LATIN && coder == UTF16
        StringLatin1.inflate(value, 0, dst, dstBegin, value.length);
    }
}

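As you can see, some coder checks were added. When the builder's coder differs from the String's, the builder is first inflated to UTF16 if it is still LATIN1, and a LATIN1 String is inflated on the fly as it is copied into a UTF16 builder, so the bytes written always match the builder's coder.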
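What inflating means at the byte level can be modelled in a few lines. InflateSketch is a hypothetical class, not the JDK's inflate method, and it assumes the high byte comes first, whereas the JDK picks the byte order per platform. The sketch widens each Latin-1 byte to a 2-byte UTF-16 code unit, then shows the only part of this that is observable from outside: appending a wide character to a builder keeps its contents intact.

```java
import java.nio.charset.StandardCharsets;

final class InflateSketch {
    /** Widens each Latin-1 byte to two bytes: a zero high byte followed by the original byte. */
    static byte[] inflate(byte[] latin1) {
        byte[] utf16 = new byte[latin1.length << 1];
        for (int i = 0; i < latin1.length; i++) {
            utf16[i << 1] = 0;               // high byte is always 0 for Latin-1 chars
            utf16[(i << 1) + 1] = latin1[i]; // low byte carries the character
        }
        return utf16;
    }

    public static void main(String[] args) {
        byte[] narrow = "abc".getBytes(StandardCharsets.ISO_8859_1); // 3 bytes
        byte[] wide = inflate(narrow);                               // 6 bytes
        System.out.println(new String(wide, StandardCharsets.UTF_16BE)); // abc

        // Public-API view: the builder starts with Latin-1 content and then
        // receives a char that needs UTF16; the text is unchanged either way.
        StringBuilder sb = new StringBuilder("abc").append("\u4E2D");
        System.out.println(sb.length()); // 4
    }
}
```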

Original article by 彭晨涛. If you repost it, please credit the source: https://www.codetool.top/article/%e8%b0%88%e4%b8%80%e8%b0%88stringbuffer%e3%80%81stringbuilder%e5%8f%8astring%e5%9c%a8jdk9%e4%b8%ad%e7%9a%84%e5%8f%98%e5%8c%96/
