Fix/enhance THP integration [jemalloc]

jasone 2017-10-9 41

e98a620 (Mark partially purged arena chunks as non-hugepage.) attempts to explicitly interact with Linux's transparent huge page (THP) functionality, but it has two shortcomings. First, it makes the mistake of assuming new chunks are created as if madvise(... MADV_HUGEPAGE) had been applied, but that is not the case, so the chunk creation code needs to add an explicit call. More generally, THP requests can cause serious scalability issues depending on the kernel version and configuration. Prior to Linux 4.6 it wasn't even possible to tune the kernel to satisfy THP requests asynchronously. We need to provide a way to opt out of explicit THP requests, so that applications can work around kernel issues as necessary. This can just be an opt.thp option that defaults to true on relevant systems; it's hard to imagine use cases for finer-grained control.
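The fix the issue asks for can be sketched like this: instead of assuming a freshly mapped chunk already carries MADV_HUGEPAGE, the creation path issues the call itself, and only when an `opt.thp`-style switch is on. This is a minimal Python sketch, not jemalloc's code. `create_chunk`, `CHUNK_SIZE` (assumed to be 2 MiB) and the `opt_thp` parameter are hypothetical names. The sketch relies on Python 3.8+'s `mmap.madvise` and the Linux-only `mmap.MADV_HUGEPAGE` constant.

```python
import mmap
from typing import Tuple

# Hypothetical chunk size (assumption); 2 MiB matches the x86-64 huge page size.
CHUNK_SIZE = 2 * 1024 * 1024


def create_chunk(opt_thp: bool = True) -> Tuple[mmap.mmap, bool]:
    """Map a new anonymous chunk and, if opt_thp is set, explicitly request THP.

    Returns the mapping and whether the MADV_HUGEPAGE request succeeded.
    A real allocator would also align the chunk to CHUNK_SIZE so the kernel
    can back it with huge pages; plain mmap does not guarantee that.
    """
    chunk = mmap.mmap(-1, CHUNK_SIZE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    advised = False
    # Do not assume the kernel already applied MADV_HUGEPAGE: ask explicitly.
    if opt_thp and hasattr(mmap, "MADV_HUGEPAGE"):
        try:
            chunk.madvise(mmap.MADV_HUGEPAGE)
            advised = True
        except OSError:
            # Kernel built without THP support (EINVAL): fall back to defaults.
            pass
    return chunk, advised
```

Calling `create_chunk(opt_thp=False)` leaves the mapping in whatever default state `/sys/kernel/mm/transparent_hugepage/enabled` implies, which is the opt-out the issue asks for.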

Comments (5)
jasone 2017-10-9
1

A --disable-thp configure option is potentially worthwhile as well. See #526 for additional context and make sure the solution here covers those problems.
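One way to picture how a build-time switch and the runtime option might combine: the configure option removes explicit THP requests from the build entirely, while the runtime option only decides whether to use them and defaults to true where the kernel exposes THP. The sketch below is hypothetical. `BUILD_THP_ENABLED` stands in for the result of a `--disable-thp` configure step, and THP availability is inferred, as an assumption, from the presence of `/sys/kernel/mm/transparent_hugepage/enabled`.

```python
import os
from typing import Optional

# Hypothetical stand-in for a configure-time result: False if built with --disable-thp.
BUILD_THP_ENABLED = True

THP_SYSFS = "/sys/kernel/mm/transparent_hugepage/enabled"


def system_has_thp(sysfs_path: str = THP_SYSFS) -> bool:
    """Treat the system as THP-capable if the kernel exposes the sysfs knob."""
    return os.path.exists(sysfs_path)


def effective_thp(opt_thp: Optional[bool] = None,
                  build_enabled: bool = BUILD_THP_ENABLED,
                  has_thp: Optional[bool] = None) -> bool:
    """Decide whether explicit THP requests (madvise calls) should be made.

    opt_thp=None means "use the default", which is true only on THP-capable systems.
    """
    if not build_enabled:
        return False  # compiled out: the runtime option cannot turn it back on
    if has_thp is None:
        has_thp = system_has_thp()
    if opt_thp is None:
        return has_thp  # default: on wherever THP exists
    return opt_thp and has_thp
```

Keeping the runtime switch a single boolean mirrors the issue's point that finer-grained control is hard to justify.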

jasone 2017-10-9
2

See #524 for additional context.

jasone 2017-10-9
3

Integrated as d84d290.

thecrazylex 2017-10-9
4

@jasone Can you maybe elaborate on which exact kernel patches in Linux 4.6 you are referring to, and which kernel THP configuration you would recommend using together with full THP support in jemalloc?

Thank you!

jasone 2017-10-9
5

@TheCrazyLex, see https://www.kernel.org/doc/Documentation/vm/transhuge.txt for documentation on the "defer" option for /sys/kernel/mm/transparent_hugepage/defrag, which should make it possible to avoid blocking when no THPs are immediately available. Although I don't have personal experience with tuning Linux with this option, it appears to provide a solution to the blocking issues I've seen reports of in the context of jemalloc. (NB: the "defer+madvise" option appears to be brand new.)
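To check which defrag policy is active before and after switching to "defer", one can read the sysfs file. The kernel lists every supported value and brackets the active one (for example `always defer defer+madvise [madvise] never`). A small Python sketch; `active_setting` and `current_defrag` are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import Optional

DEFRAG_PATH = Path("/sys/kernel/mm/transparent_hugepage/defrag")


def active_setting(text: str) -> Optional[str]:
    """Return the bracketed (active) value from a THP sysfs file's contents."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def current_defrag(path: Path = DEFRAG_PATH) -> Optional[str]:
    """Read the live defrag policy, or None if the sysfs knob is unavailable."""
    try:
        return active_setting(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("defrag:", current_defrag())
```

Switching the policy itself is a root-only write such as `echo defer > /sys/kernel/mm/transparent_hugepage/defrag`; whether "defer+madvise" appears in the list depends on the kernel version.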

The THP support in jemalloc 4.5.0 is a pretty conservative approach, in that it leaves the default THP state alone for huge allocations (2+ MiB), and it also leaves the default state alone for each chunk from which small and large allocations are carved, up until the point where unused dirty pages are purged from within a chunk. Once that happens, the chunk is forced to be non-THP until/unless it is completely discarded, at which point the corresponding virtual memory is restored to the default THP state. The default state depends on /sys/kernel/mm/transparent_hugepage/enabled; see https://www.kernel.org/doc/Documentation/vm/transhuge.txt for details.
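The per-chunk behavior described above amounts to a two-state machine. This is an illustrative Python model, not jemalloc's implementation: `Chunk`, `ThpState` and the method names are hypothetical, and the note that "forced non-THP" corresponds to `madvise(MADV_NOHUGEPAGE)` on Linux is an assumption.

```python
from enum import Enum


class ThpState(Enum):
    DEFAULT = "default"  # whatever /sys/kernel/mm/transparent_hugepage/enabled implies
    NOHUGE = "nohuge"    # forced non-THP (assumed: madvise(MADV_NOHUGEPAGE))


class Chunk:
    """Hypothetical model of one arena chunk's THP state.

    Huge allocations (2+ MiB) are not modeled: they always keep DEFAULT.
    """

    def __init__(self) -> None:
        self.state = ThpState.DEFAULT

    def allocate_small_or_large(self) -> None:
        # Carving out small/large allocations leaves the THP state alone.
        pass

    def purge_dirty_pages(self) -> None:
        # Purging unused dirty pages forces the chunk to non-THP from now on.
        self.state = ThpState.NOHUGE

    def discard(self) -> None:
        # Only discarding the whole chunk restores the default THP state
        # for the corresponding virtual memory.
        self.state = ThpState.DEFAULT
```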

I would recommend experimenting with the /sys/kernel/mm/transparent_hugepage/enabled and /sys/kernel/mm/transparent_hugepage/defrag settings to figure out what works best for you. It may be that "always" and "defer" are a good approach, but depending on application behavior you may be better off with some other combination.
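A systematic way to run that experiment is to enumerate every enabled/defrag pair the running kernel supports and benchmark the application under each. The Python sketch below only builds the list of pairs and the corresponding root-only shell commands; `supported_values`, `candidate_settings` and `shell_commands` are hypothetical helpers, and the benchmarking step is not shown.

```python
import itertools
from typing import List, Tuple

ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"
DEFRAG = "/sys/kernel/mm/transparent_hugepage/defrag"


def supported_values(text: str) -> List[str]:
    """List every value in a THP sysfs file, dropping the brackets around the active one."""
    return [word.strip("[]") for word in text.split()]


def candidate_settings(enabled_text: str, defrag_text: str) -> List[Tuple[str, str]]:
    """All (enabled, defrag) pairs the kernel offers, in sysfs order."""
    return list(itertools.product(supported_values(enabled_text),
                                  supported_values(defrag_text)))


def shell_commands(enabled: str, defrag: str) -> List[str]:
    """Commands (to be run as root) that switch the kernel to one candidate pair."""
    return [f"echo {enabled} > {ENABLED}", f"echo {defrag} > {DEFRAG}"]


if __name__ == "__main__":
    try:
        with open(ENABLED) as e, open(DEFRAG) as d:
            pairs = candidate_settings(e.read(), d.read())
    except OSError:
        pairs = []  # no THP sysfs knobs on this system
    for enabled, defrag in pairs:
        print(" && ".join(shell_commands(enabled, defrag)))
```

The "always" plus "defer" pair suggested above appears in the list whenever the kernel supports "defer".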
