A Deep Dive into Go's sync.Pool Optimizations

Preface

Go 1.13 was released recently, bringing many new features and improvements. This article focuses on the optimizations to sync.Pool.

This article answers the following questions:

  1. Where do the sync.Pool optimizations show up?
  2. How are they implemented?
  3. What benefits do they bring?

Optimizations

The specific optimizations are:

  1. Lock-free operation
  2. A new GC strategy

Lock-free operation

In Go 1.13, sync.Pool drops its Mutex entirely. Compare the two versions:

The Go 1.12.1 implementation:

// Local per-P Pool appendix.
type poolLocalInternal struct {
    private interface{}   // Can be used only by the respective P.
    shared  []interface{} // Can be used by any P.
    Mutex                 // Protects shared.
}

The Go 1.13 implementation:

// Local per-P Pool appendix.
type poolLocalInternal struct {
    private interface{} // Can be used only by the respective P.
    shared  poolChain   // Local P can pushHead/popHead; any P can popTail.
}

Comparing the two, the Mutex from Go 1.12 is gone. So how does Go 1.13 get by without a lock?

The short answer: Go 1.13 replaces the mutex-protected slice with poolChain, a lock-free single-producer, multi-consumer (SPMC) queue.

So what exactly is poolChain?

No rush; code keeps no secrets. Let's look at it directly.

// poolChain is a dynamically-sized version of poolDequeue.
//
// This is implemented as a doubly-linked list queue of poolDequeues
// where each dequeue is double the size of the previous one. Once a
// dequeue fills up, this allocates a new one and only ever pushes to
// the latest dequeue. Pops happen from the other end of the list and
// once a dequeue is exhausted, it gets removed from the list.
type poolChain struct {
    // head is the poolDequeue to push to. This is only accessed
    // by the producer, so doesn't need to be synchronized.
    head *poolChainElt

    // tail is the poolDequeue to popTail from. This is accessed
    // by consumers, so reads and writes must be atomic.
    tail *poolChainElt
}


type poolChainElt struct {
    poolDequeue

    // next and prev link to the adjacent poolChainElts in this
    // poolChain.
    //
    // next is written atomically by the producer and read
    // atomically by the consumer. It only transitions from nil to
    // non-nil.
    //
    // prev is written atomically by the consumer and read
    // atomically by the producer. It only transitions from
    // non-nil to nil.
    next, prev *poolChainElt
} 

So how does poolChain implement a lock-free SPMC queue? The full details are in poolqueue.go; rather than walking through all of it, the key points are:

  1. The queue is SPMC: only the owning P pushes and pops at the head (pushHead/popHead), while any P may steal from the tail (popTail).
  2. The queue resizes dynamically: it is a doubly-linked list of ring buffers, each twice the size of the previous one (slice + doubly-linked list; reading the code with this structure in mind makes it much easier to follow).
  3. The lock-free operations are built on CAS (compare-and-swap).
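
The CAS mechanics from point 3 can be sketched with a toy fixed-size SPMC ring. Everything here (the spmcRing type and its method names) is made up for illustration and is much simpler than the real poolDequeue, which also handles slot reclamation and resizing:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// spmcRing is a toy fixed-size SPMC queue in the spirit of poolDequeue.
// head and tail are packed into a single uint64 so that a consumer can
// claim a slot with one CAS.
type spmcRing struct {
	headTail uint64          // head in the high 32 bits, tail in the low 32 bits
	slots    [8]atomic.Value // ring buffer; length must be a power of two
}

func pack(head, tail uint32) uint64 { return uint64(head)<<32 | uint64(tail) }

func unpack(ht uint64) (head, tail uint32) { return uint32(ht >> 32), uint32(ht) }

// pushHead is called only by the single producer (the owning P), so the
// head index needs an atomic publish but no CAS loop.
func (q *spmcRing) pushHead(v interface{}) bool {
	head, tail := unpack(atomic.LoadUint64(&q.headTail))
	if head-tail >= uint32(len(q.slots)) {
		return false // ring is full
	}
	q.slots[head%uint32(len(q.slots))].Store(v)
	atomic.AddUint64(&q.headTail, 1<<32) // publish the new head
	return true
}

// popTail may be called by any consumer. It reads the slot first, then
// claims it with a CAS on the packed head/tail word; if any other push
// or pop happened in between, the word changed, the CAS fails, and we retry.
func (q *spmcRing) popTail() (interface{}, bool) {
	for {
		ht := atomic.LoadUint64(&q.headTail)
		head, tail := unpack(ht)
		if tail == head {
			return nil, false // ring is empty
		}
		v := q.slots[tail%uint32(len(q.slots))].Load()
		if atomic.CompareAndSwapUint64(&q.headTail, ht, pack(head, tail+1)) {
			return v, true
		}
	}
}

func main() {
	q := &spmcRing{}
	q.pushHead(1)
	q.pushHead(2)
	for {
		v, ok := q.popTail()
		if !ok {
			break
		}
		fmt.Println(v) // prints 1, then 2 (FIFO from the tail)
	}
}
```

The essential trick is that a single CAS on the packed head/tail word both claims the slot and detects interference: no lock is ever taken.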

Benefits

  1. The overhead of taking a lock is avoided: the Mutex is replaced by atomic operations.
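
To get a feel for the difference, here is a rough micro-benchmark (illustrative only; nsPerOp is a hypothetical helper, and absolute numbers vary by machine) comparing an uncontended Lock/Unlock pair with a single atomic add:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

// nsPerOp times f with the standard library's benchmark harness and
// returns the measured nanoseconds per call.
func nsPerOp(f func()) int64 {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			f()
		}
	}).NsPerOp()
}

func main() {
	var mu sync.Mutex
	var n uint64

	mutexNs := nsPerOp(func() { mu.Lock(); n++; mu.Unlock() })
	atomicNs := nsPerOp(func() { atomic.AddUint64(&n, 1) })

	fmt.Println("mutex: ", mutexNs, "ns/op")
	fmt.Println("atomic:", atomicNs, "ns/op")
}
```

Even uncontended, the atomic path is cheaper; under contention the gap widens sharply, and contended access from many Ps is exactly the case sync.Pool cares about.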

GC strategy

Compared with Go 1.12, Go 1.13 introduces a victim cache. Concretely:

  1. During each GC, the objects held in oldPools (the victim caches) are released.
  2. Objects in allPools are not released immediately; they are moved to oldPools and only reclaimed in the next GC cycle.

The code is shown below (the first var block is from Go 1.12; the second, together with poolCleanup, is from Go 1.13):

var (
    allPoolsMu Mutex
    allPools   []*Pool
)
var (
    allPoolsMu Mutex

    // allPools is the set of pools that have non-empty primary
    // caches. Protected by either 1) allPoolsMu and pinning or 2)
    // STW.
    allPools []*Pool

    // oldPools is the set of pools that may have non-empty victim
    // caches. Protected by STW.
    oldPools []*Pool
)
func poolCleanup() {
    // This function is called with the world stopped, at the beginning of a garbage collection.
    // It must not allocate and probably should not call any runtime functions.

    // Because the world is stopped, no pool user can be in a
    // pinned section (in effect, this has all Ps pinned).

    // Drop victim caches from all pools.
    for _, p := range oldPools {
        p.victim = nil
        p.victimSize = 0
    }

    // Move primary cache to victim cache.
    for _, p := range allPools {
        p.victim = p.local
        p.victimSize = p.localSize
        p.local = nil
        p.localSize = 0
    }

    // The pools with non-empty primary caches now have non-empty
    // victim caches and no pools have primary caches.
    oldPools, allPools = allPools, nil
}
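
This two-cycle lifetime can be observed from user code. survivesGCs is a hypothetical helper written for this demo; note that the one-GC case also depends on which P the goroutine happens to run on, so only the two-GC result is fully deterministic:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// survivesGCs puts one object into a pool, forces n full GC cycles, and
// reports whether Get still finds a pooled object afterwards.
func survivesGCs(n int) bool {
	pool := &sync.Pool{New: func() interface{} { return nil }}
	pool.Put("cached")
	for i := 0; i < n; i++ {
		runtime.GC() // each GC runs poolCleanup during the STW phase
	}
	return pool.Get() != nil
}

func main() {
	// One GC only moves the object into the victim cache, so it can
	// usually still be recovered; two GCs reclaim it for certain.
	fmt.Println("after 1 GC:", survivesGCs(1))
	fmt.Println("after 2 GCs:", survivesGCs(2)) // after 2 GCs: false
}
```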

This changes the implementation of Get. The old flow was:

  1. Try the poolLocal bound to the current P: first its private slot, then its shared pool.
  2. If that fails, try to steal an object from other Ps' shared pools.
  3. If that also fails, allocate a new object from the heap.

With the victim cache, Get becomes:

  1. Try the poolLocal bound to the current P: first its private slot, then its shared pool.
  2. If that fails, try to steal an object from other Ps' shared pools.
  3. If that fails, try the victim cache.
  4. If that also fails, allocate a new object from the heap.

The corresponding code:

func (p *Pool) getSlow(pid int) interface{} {
    // See the comment in pin regarding ordering of the loads.
    size := atomic.LoadUintptr(&p.localSize) // load-acquire
    locals := p.local                        // load-consume
    // Try to steal one element from other procs.
    for i := 0; i < int(size); i++ {
        l := indexLocal(locals, (pid+i+1)%int(size))
        if x, _ := l.shared.popTail(); x != nil {
            return x
        }
    }

    // Try the victim cache. We do this after attempting to steal
    // from all primary caches because we want objects in the
    // victim cache to age out if at all possible.
    // (New in Go 1.13) Fall back to the victim cache.
    size = atomic.LoadUintptr(&p.victimSize)
    if uintptr(pid) >= size {
        return nil
    }
    locals = p.victim
    l := indexLocal(locals, pid)
    if x := l.private; x != nil {
        l.private = nil
        return x
    }
    for i := 0; i < int(size); i++ {
        l := indexLocal(locals, (pid+i)%int(size))
        if x, _ := l.shared.popTail(); x != nil {
            return x
        }
    }

    // Mark the victim cache as empty for future gets don't bother
    // with it.
    atomic.StoreUintptr(&p.victimSize, 0)

    return nil
}

Benefits

  1. In space: the victim cache gives Get one more place to find a reusable object, raising the hit rate.
  2. In time: reclamation is delayed by one GC cycle, lengthening the window in which objects can be reused.
  3. Together these lower GC overhead and make pooled objects more effective.
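
As a closing illustration of the reuse these optimizations serve, here is a typical sync.Pool usage pattern (the bufPool and render names are made up for this example): a bytes.Buffer pool that avoids allocating a fresh buffer on every call:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values; New is only called
// when the pool has nothing cached.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // reset before returning it, so the next user starts clean
		bufPool.Put(buf)
	}()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("gopher")) // hello, gopher
	fmt.Println(render("pool"))   // hello, pool
}
```

After warm-up, repeated calls on the same P tend to hit the private slot or shared queue, and thanks to the victim cache a lull spanning one GC cycle no longer throws the buffers away.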
