30% of RAM is “buffers”. What is it?












$ free -h
              total        used        free      shared  buff/cache   available
Mem:           501M        146M         19M        9.7M        335M        331M
Swap:          1.0G         85M        938M

$ free -w -h
              total        used        free      shared     buffers       cache   available
Mem:           501M        146M         19M        9.7M        155M        180M        331M
Swap:          1.0G         85M        938M


How can I describe or explain "buffers" in the output of free?



I don't have any (known) problem with this system. I am only surprised and curious to see that "buffers" is almost as high as "cache" (155M vs. 180M). I thought "cache" represented the page cache of file contents, and tended to be the most significant part of "cache/buffers". I am less clear on what "buffers" are for.



For example, I compared this to my laptop, which has more RAM. On my laptop the "buffers" figure is an order of magnitude smaller than "cache" (200M vs. 4G). If I had a proper understanding of what "buffers" were, I could start to ask why they grow to such a large proportion on the smaller system.
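(For comparing other machines the same way: both figures come straight from /proc/meminfo, which is where free reads them. A minimal sketch:)

$ awk '/^(Buffers|Cached):/ { printf "%-9s %4d MiB\n", $1, $2/1024 }' /proc/meminfo

On the system above this prints roughly "Buffers: 155 MiB" and "Cached: 151 MiB". Note that free's "cache" column is slightly larger (180M) because it also adds SReclaimable from the kernel slab cache.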



man proc (I ignore the hilariously outdated definition of "large"):




Buffers %lu
    Relatively temporary storage for raw disk blocks that shouldn't get tremendously large (20MB or so).

Cached %lu
    In-memory cache for files read from the disk (the page cache). Doesn't include SwapCached.






$ free -V
free from procps-ng 3.3.12
$ uname -r
4.9.0-6-marvell
$ systemd-detect-virt
none

$ cat /proc/meminfo
MemTotal: 513976 kB
MemFree: 20100 kB
MemAvailable: 339304 kB
Buffers: 159220 kB
Cached: 155536 kB
SwapCached: 2420 kB
Active: 215044 kB
Inactive: 216760 kB
Active(anon): 56556 kB
Inactive(anon): 73280 kB
Active(file): 158488 kB
Inactive(file): 143480 kB
Unevictable: 10760 kB
Mlocked: 10760 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 513976 kB
LowFree: 20100 kB
SwapTotal: 1048572 kB
SwapFree: 960532 kB
Dirty: 240 kB
Writeback: 0 kB
AnonPages: 126912 kB
Mapped: 40312 kB
Shmem: 9916 kB
Slab: 37580 kB
SReclaimable: 29036 kB
SUnreclaim: 8544 kB
KernelStack: 1472 kB
PageTables: 3108 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1305560 kB
Committed_AS: 1155244 kB
VmallocTotal: 507904 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB

$ sudo slabtop --once
Active / Total Objects (% used) : 186139 / 212611 (87.5%)
Active / Total Slabs (% used) : 9115 / 9115 (100.0%)
Active / Total Caches (% used) : 66 / 92 (71.7%)
Active / Total Size (% used) : 31838.34K / 35031.49K (90.9%)
Minimum / Average / Maximum Object : 0.02K / 0.16K / 4096.00K

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
59968 57222 0% 0.06K 937 64 3748K buffer_head
29010 21923 0% 0.13K 967 30 3868K dentry
24306 23842 0% 0.58K 4051 6 16204K ext4_inode_cache
22072 20576 0% 0.03K 178 124 712K kmalloc-32
10290 9756 0% 0.09K 245 42 980K kmalloc-96
9152 4582 0% 0.06K 143 64 572K kmalloc-node
9027 8914 0% 0.08K 177 51 708K kernfs_node_cache
7007 3830 0% 0.30K 539 13 2156K radix_tree_node
5952 4466 0% 0.03K 48 124 192K jbd2_revoke_record_s
5889 5870 0% 0.30K 453 13 1812K inode_cache
5705 4479 0% 0.02K 35 163 140K file_lock_ctx
3844 3464 0% 0.03K 31 124 124K anon_vma
3280 3032 0% 0.25K 205 16 820K kmalloc-256
2730 2720 0% 0.10K 70 39 280K btrfs_trans_handle
2025 1749 0% 0.16K 81 25 324K filp
1952 1844 0% 0.12K 61 32 244K kmalloc-128
1826 532 0% 0.05K 22 83 88K trace_event_file
1392 1384 0% 0.33K 116 12 464K proc_inode_cache
1067 1050 0% 0.34K 97 11 388K shmem_inode_cache
987 768 0% 0.19K 47 21 188K kmalloc-192
848 757 0% 0.50K 106 8 424K kmalloc-512
450 448 0% 0.38K 45 10 180K ubifs_inode_slab
297 200 0% 0.04K 3 99 12K eventpoll_pwq
288 288 100% 1.00K 72 4 288K kmalloc-1024
288 288 100% 0.22K 16 18 64K mnt_cache
287 283 0% 1.05K 41 7 328K idr_layer_cache
240 8 0% 0.02K 1 240 4K fscrypt_info









Tags: linux, memory, cache






asked Apr 28 '18 at 10:13 by sourcejedi; edited Oct 25 '18 at 15:01


• linuxatemyram.com is useful to read – Basile Starynkevitch, Apr 28 '18 at 17:48
2 Answers

  1. What is the difference between "buffers" and the other cache?

  2. Why might we expect Buffers in particular to be larger or smaller?




1. What is the difference between "buffers" and the other cache?



In Linux 2.4 (released in 2001) and above, Buffers reports the amount of page cache used for block devices. The kernel has to deliberately subtract this amount from the rest of the page cache when it reports Cached. See meminfo_proc_show():



cached = global_node_page_state(NR_FILE_PAGES) -
         total_swapcache_pages() - i.bufferram;
...

show_val_kb(m, "MemTotal:       ", i.totalram);
show_val_kb(m, "MemFree:        ", i.freeram);
show_val_kb(m, "MemAvailable:   ", available);
show_val_kb(m, "Buffers:        ", i.bufferram);
show_val_kb(m, "Cached:         ", cached);


The page cache is tied to the MMU page size, typically a minimum of 4096 bytes. This is essential for mmap(), i.e. memory-mapped file access.[1][2] It is used to share pages of loaded program/library code between independent processes, and to allow loading individual pages on demand (and unloading them again when something else needs the space and they haven't been used recently).



[1] Memory-mapped I/O - The GNU C Library manual.

[2] mmap - Wikipedia.
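A quick way to see the file side of this in practice; a rough sketch (the exact paths don't matter, any tree of ordinary files will do):

$ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches    # start from a clean cache
$ grep -E '^(Buffers|Cached):' /proc/meminfo
$ cat /usr/bin/* > /dev/null 2>&1                     # read ordinary files
$ grep -E '^(Buffers|Cached):' /proc/meminfo

Reading file contents through a filesystem should grow Cached substantially, while Buffers moves comparatively little (only the filesystem's metadata reads go through it).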



UNIX started with a "buffer cache" of disk blocks, and no mmap(). Apparently when mmap() was first added, they simply bolted the page cache on top of the buffer cache. This is as messy as it sounds, so eventually the UNIX-based OSes removed the separate buffer cache. Data is now cached in units of pages, and pages are looked up by (file, offset), not by location on disk. This became known as the "unified buffer cache", perhaps because people were more familiar with the term "buffer cache".[3]



[3] UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD



In Linux 2.2 there was a separate "buffer cache" used for writes, but not for reads. "The page cache used the buffer cache to write back its data, needing an extra copy of the data, and doubling memory requirements for some write loads" (?).[4] Let's not worry too much about the details, but this history would be one reason why Linux reports Buffers usage separately.



[4] Page replacement in Linux 2.4 memory management, Rik van Riel.



In Linux 2.4 and above, there is no extra copy. "The system does disk IO directly to and from the page cache page."[4]



("One interesting twist that Linux adds is that the device block numbers where a page is stored on disk are cached with the page in the form of a list of buffer_head structures. When a modified page is to be written back to disk, the I/O requests can be sent to the device driver right away, without needing to read any indirect blocks to determine where the page's data should be written."[3])



Block device files have page cache. This is used "for filesystem metadata and the caching of raw block devices".[4] But filesystems do not copy file contents through it, so there is no "double caching".
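You can watch that side directly, because reads of a block device file land in that device's page cache, which is exactly what Buffers counts. A sketch (assumes /dev/sda is your disk; adjust as needed):

$ grep '^Buffers:' /proc/meminfo
$ sudo dd if=/dev/sda of=/dev/null bs=1M count=100
$ grep '^Buffers:' /proc/meminfo

Buffers should rise by roughly the 100M that was read, until memory pressure evicts it again.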



I think of the Buffers part of the page cache as the Linux buffer cache. My use of this term is idiosyncratic, and conflicts with some cited sources.



It varies between filesystem types, but ext3/ext4 do use the Linux buffer cache for filesystem metadata, including for directory contents and the journal. The system in the question uses ext4.
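So on ext4, walking a large directory tree is one way to see Buffers grow while the file-content cache stays put. A sketch (-xdev keeps find on one filesystem):

$ grep '^Buffers:' /proc/meminfo
$ find / -xdev -type d > /dev/null 2>&1
$ grep '^Buffers:' /proc/meminfo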




Prior to Linux kernel version 2.4, Linux had separate page and buffer caches. Since 2.4, the page and buffer cache are unified and Buffers is raw disk blocks not represented in the page cache—i.e., not file data.



...



The buffer cache remains, however, as the kernel still needs to perform block I/O in terms of blocks, not pages. As most blocks represent file data, most of the buffer cache is represented by the page cache. But a small amount of block data isn't file backed—metadata and raw block I/O for example—and thus is solely represented by the buffer cache.




-- A pair of Quora answers by Robert Love, last updated 2013.




Certain file systems, including ext3, ext4, and ocfs2, use the jbd or
jbd2 layer to handle their physical block journalling, and this layer
fundamentally uses the buffer cache.




-- Email from Ted Ts'o, 2013



The first source is already cited on StackExchange as the most authoritative answer to this question: "linux - What is the buffers column in the output from free?". The second source gives more specific technical detail. Both writers are Linux developers who have worked on Linux kernel memory management.



It is true that filesystems may perform partial-page metadata writes, even though the cache is indexed in pages. Even user processes can perform partial-page writes when they use write() (as opposed to mmap()), at least when writing directly to a block device.
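For instance, a single 512-byte (one sector, so sub-page) write straight to a block device. A sketch using a scratch loop device so nothing real gets overwritten (the loop device path is an assumption; losetup -f would find a free one):

$ dd if=/dev/zero of=/tmp/scratch bs=1M count=10      # backing file for the scratch device
$ sudo losetup /dev/loop7 /tmp/scratch
$ sudo dd if=/dev/zero of=/dev/loop7 bs=512 count=1   # one sub-page write
$ sudo losetup -d /dev/loop7

The kernel services the 512-byte write through buffer_heads attached to the device's page cache, performing a read-modify-write of the containing page if needed.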



Linus likes to rant that the buffer cache is not required in order to do block-sized IO, and that filesystems can do partial-page metadata writes even when they attach page cache to virtual file(s) instead of the block device. I am sure he is right that ext2 does this. It is less clear what the difficulty was for ext4 and its journalling system; the people he was ranting at got tired of explaining.



ext4_readdir() has not been changed to satisfy Linus' rant. Nor do I see his desired approach used in the readdir() of other filesystems. I think XFS uses the buffer cache for directories as well. bcachefs does not use the page cache for readdir(); it uses its own cache for btrees. I might be missing something in btrfs.



2. Why might we expect Buffers in particular to be larger or smaller?



As mentioned above, it varies depending on whether (and how much) the type of filesystem uses the buffer cache.



It turns out the ext4 journal size for my filesystem is 128M. So this explains 1) why my buffer cache can stabilize at slightly over 128M, and 2) why the buffer cache does not scale proportionally with the larger amount of RAM on my laptop.
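(To check the journal size on your own ext4 filesystem; /dev/sda1 here is a placeholder for your device:)

$ sudo LANG=C dumpe2fs /dev/sda1 | grep '^Journal size'
Journal size:             128M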



To verify that journal writes use the buffer cache, simulate a filesystem in nice fast RAM (tmpfs), and compare the maximum buffer usage for different journal sizes.



# dd if=/dev/zero of=/tmp/t bs=1M count=1000
...
# mkfs.ext4 /tmp/t -J size=256
...
# LANG=C dumpe2fs /tmp/t | grep '^Journal size'
dumpe2fs 1.43.5 (04-Aug-2017)
Journal size:             256M
# mount /tmp/t /mnt
# cd /mnt
# free -w -m
              total        used        free      shared     buffers       cache   available
Mem:           7855        2521        4321         285          66         947        5105
Swap:          7995           0        7995

# for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
# free -w -m
              total        used        free      shared     buffers       cache   available
Mem:           7855        2523        3872         551         237        1223        4835
Swap:          7995           0        7995




# dd if=/dev/zero of=/tmp/t bs=1M count=1000
...
# mkfs.ext4 /tmp/t -J size=16
...
# LANG=C dumpe2fs /tmp/t | grep '^Journal size'
dumpe2fs 1.43.5 (04-Aug-2017)
Journal size:             16M
# mount /tmp/t /mnt
# cd /mnt
# free -w -m
              total        used        free      shared     buffers       cache   available
Mem:           7855        2507        4337         285          66         943        5118
Swap:          7995           0        7995

# for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
# free -w -m
              total        used        free      shared     buffers       cache   available
Mem:           7855        2509        4290         315          77         977        5086
Swap:          7995           0        7995


How I came to look at the journal



I found Ted Ts'o's email first, and was intrigued that it emphasized write caching. I would find it surprising if "dirty", unwritten data could reach 30% of RAM on my system. sudo atop shows that over a 10 second interval, the system in question consistently writes only 1MB. The filesystem concerned would be able to keep up with over 100 times that rate. (It's on a USB2 hard disk drive, max throughput ~20MB/s.)



Using blktrace (btrace -w 10 /dev/sda) confirms that the IOs being cached must be writes, because almost no data is being read. It also shows that mysqld is the only userspace process doing IO.



I stopped the service responsible for the writes (icinga2, writing to mysql) and re-checked. I saw "buffers" drop to under 20M (I have no explanation for that) and stay there. Restarting the writer shows "buffers" rising by ~0.1M for each 10 second interval. I watched it maintain this rate consistently, climbing back to 70M and above.



Running echo 3 | sudo tee /proc/sys/vm/drop_caches was sufficient to lower "buffers" again, to 4.5M. This proves that my accumulation of buffers is a "clean" cache, which Linux can drop immediately when required. This system is not accumulating unwritten data. (drop_caches does not perform any writeback, and hence cannot drop dirty pages. If you wanted a test that cleaned the cache first, you would run the sync command first.)
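Put together, the clean-slate sequence that paragraph implies looks like this:

$ sync                                          # write back dirty pages first
$ echo 3 | sudo tee /proc/sys/vm/drop_caches    # then drop clean page cache, dentries and inodes
$ free -w -h                                    # "buffers" and "cache" should now be near their floor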



The entire mysql directory is only 150M. The accumulating buffers must represent metadata blocks from the mysql writes, but I was surprised to think there would be so many metadata blocks for this data.






Your version of free has the right idea. By default it combines buffers and cache in its report. That is because they are basically the same thing: both are the computer remembering in RAM (which is faster than secondary storage: disks and SSDs) what it has already seen when reading from disk or SSD.

If the operating system decides the memory is better used for something else, it can free it. Therefore, don't worry about buffers and cache.

However, watching a DVD can cause buffers to go up, and evict other buffer/cache content. Therefore you may wish to use nocache to run the DVD player (if it is causing a problem).
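A sketch of that suggestion (assumes the nocache utility is installed, with mplayer standing in for your DVD player):

$ nocache mplayer dvd://

nocache interposes on the player's file I/O and uses posix_fadvise() to tell the kernel not to keep the data cached, so one big sequential read can't evict everything else.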






• I absolutely agree with your points :). I don't have any (known) problem with this system. I am only curious, after noticing that this system was behaving differently to what I see and remember seeing on other systems. Suggestions to improve the question are welcome. – sourcejedi, Apr 28 '18 at 21:17













    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "106"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f440558%2f30-of-ram-is-buffers-what-is-it%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    8















    1. What is the difference between "buffer", and the other cache?

    2. Why might we expect Buffers in particular to be larger or smaller?




    1. What is the difference between "buffer", and the other cache?



    In Linux 2.4 (released in 2001) and above, Buffers reports the amount of page cache used for block devices. The kernel has to deliberately subtract this amount from the rest of the page cache when it reports Cached. See meminfo_proc_show():



    cached = global_node_page_state(NR_FILE_PAGES) -
    total_swapcache_pages() - i.bufferram;
    ...

    show_val_kb(m, "MemTotal: ", i.totalram);
    show_val_kb(m, "MemFree: ", i.freeram);
    show_val_kb(m, "MemAvailable: ", available);
    show_val_kb(m, "Buffers: ", i.bufferram);
    show_val_kb(m, "Cached: ", cached);


    The page cache is tied to the MMU page size, typically a minimum of 4096 bytes. This is essential for mmap(), i.e. memory-mapped file access.[1][2] It is used to share pages of loaded program/library code between independent processes, and allow loading individual pages on demand. (Also for unloading pages when something else needs the space, and they haven't been used recently).



    [1] Memory-mapped I/O - The GNU C Library manual.

    [2] mmap - Wikipedia.



    UNIX started with a "buffer cache" of disk blocks, and no mmap(). Apparently when mmap() was first added, they simply bolted the page cache on top of the buffer cache. This is as messy as it sounds, so eventually all the UNIX-based OS's removed the buffer cache. Data is cached in units of pages. Pages are looked up by (file, offset), not by location on disk. This was called "unified buffer cache", perhaps because people were more familiar with "buffer cache".[3]



    [3] UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD



    In Linux 2.2 there was a separate "buffer cache" used for writes, but not for reads. "The page cache used the buffer cache to write back its data, needing an extra copy of the data, and doubling memory requirements for some write loads" (?).[4] Let's not worry too much about the details, but this history would be one reason why Linux reports Buffers usage separately.



    [4] Page replacement in Linux 2.4 memory management, Rik van Riel.



    In Linux 2.4 and above, there is no extra copy. "The system does disk IO directly to and from the page cache page."[4]



    ("One interesting twist that Linux adds is that the device block numbers where a page is stored on disk are cached with the page in the form of a list of buffer_head structures. When a modified page is to be written back to disk, the I/O requests can be sent to the device driver right away, without needing to read any indirect blocks to determine where the page's data should be written."[3])



    Block device files have page cache. This is used "for filesystem metadata and the caching of raw block devices".[4] But filesystems do not copy file contents through it, so there is no "double caching".



    I think of the Buffers part of the page cache as the Linux buffer cache. My use of this term is idiosyncratic, and conflicts with some cited sources.



    It varies between filesystem types, but ext3/ext4 do use the Linux buffer cache for filesystem metadata, including for directory contents and the journal. The system in the question uses ext4.




    Prior to Linux kernel version 2.4, Linux had separate page and buffer caches. Since 2.4, the page and buffer cache are unified and Buffers is raw disk blocks not represented in the page cache—i.e., not file data.



    ...



    The buffer cache remains, however, as the kernel still needs to perform block I/O in terms of blocks, not pages. As most blocks represent file data, most of the buffer cache is represented by the page cache. But a small amount of block data isn't file backed—metadata and raw block I/O for example—and thus is solely represented by the buffer cache.




    -- A pair of Quora answers by Robert Love, last updated 2013.




    Certain file systems, including ext3, ext4, and ocfs2, use the jbd or
    jbd2 layer to handle their physical block journalling, and this layer
    fundamentally uses the buffer cache.




    -- Email article by Ted Tso, 2013



    The first source is already cited on StackExchange, as the most authoritative answer for this question. linux - What is the buffers column in the output from free?. The second source has more specific technical detail. Both writers are Linux developers who worked with Linux kernel memory management.



    It is true that filesystems may perform partial-page metadata writes, even though the cache is indexed in pages. Even user processes can perform partial-page writes when they use write() (as opposed to mmap()), at least directly to a block device.



    Linus likes to rant that the buffer cache is not required in order to do block-sized IO, and that filesystems can do partial-page metadata writes even when they attach page cache to virtual file(s) instead of the block device. I am sure he is right that ext2 does this. It is less clear what the difficulty was for ext4 and its journalling system; the people he was ranting at got tired of explaining.



    ext4_readdir() has not been changed to satisfy Linus' rant. I don't see the desired approach used in readdir() of other filesystems either. I think XFS uses the buffer cache for directories as well. bcachefs does not use the page cache for readdir(); it uses its own cache for btrees. I might be missing something in btrfs.



    2. Why might we expect Buffers in particular to be larger or smaller?



    As mentioned above, it varies depending on whether (and how much) the type of filesystem uses the buffer cache.



    It turns out the ext4 journal size for my filesystem is 128M. So this explains why 1) my buffer cache can stabilize at slightly over 128M; 2) buffer cache does not scale proportionally with the larger amount of RAM on my laptop.



    To verify that journal writes use the buffer cache, simulate a filesystem in nice fast RAM (tmpfs), and compare the maximum buffer usage for different journal sizes.



    # dd if=/dev/zero of=/tmp/t bs=1M count=1000
    ...
    # mkfs.ext4 /tmp/t -J size=256
    ...
    # LANG=C dumpe2fs /tmp/t | grep '^Journal size'
    dumpe2fs 1.43.5 (04-Aug-2017)
    Journal size: 256M
    # mount /tmp/t /mnt
    # cd /mnt
    # free -w -m
    total used free shared buffers cache available
    Mem: 7855 2521 4321 285 66 947 5105
    Swap: 7995 0 7995

    # for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
    # free -w -m
    total used free shared buffers cache available
    Mem: 7855 2523 3872 551 237 1223 4835
    Swap: 7995 0 7995




    # dd if=/dev/zero of=/tmp/t bs=1M count=1000
    ...
    # mkfs.ext4 /tmp/t -J size=16
    ...
    # LANG=C dumpe2fs /tmp/t | grep '^Journal size'
    dumpe2fs 1.43.5 (04-Aug-2017)
    Journal size: 16M
    # mount /tmp/t /mnt
    # cd /mnt
    # free -w -m
    total used free shared buffers cache available
    Mem: 7855 2507 4337 285 66 943 5118
    Swap: 7995 0 7995

    # for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
    # free -w -m
    total used free shared buffers cache available
    Mem: 7855 2509 4290 315 77 977 5086
    Swap: 7995 0 7995


    How I came to look at the journal



    I had found Ted Tso's email first, and was intrigued that it emphasized write caching. I would find it surprising if "dirty", unwritten data was able to reach 30% of RAM on my system. sudo atop shows that over a 10 second interval, the system in question consistently writes only 1MB. The filesystem concerned would be able to keep up with over 100 times this rate. (It's on a USB2 hard disk drive, max throughput ~20MB/s).



    Using blktrace (btrace -w 10 /dev/sda) confirms that the IOs which are being cached must be writes, because there is almost no data being read. Also that mysqld is the only userspace process doing IO.



    I stopped the service responsible for the writes (icinga2 writing to mysql) and re-checked. I saw "buffers" drop to under 20M - I have no explanation for that - and stay there. Restarting the writer again shows "buffers" rising by ~0.1M for each 10 second interval. I observed it maintain this rate consistently, climbing back to 70M and above.



    Running echo 3 | sudo tee /proc/sys/vm/drop_caches was sufficient to lower "buffers" again, to 4.5M. This proves that my accumulation of buffers is a "clean" cache, which Linux can drop immediately when required. This system is not accumulating unwritten data. (drop_caches does not perform any writeback and hence cannot drop dirty pages. If you wanted to run a test which cleaned the cache first, you would use the sync command).



    The entire mysql directory is only 150M. The accumulating buffers must represent metadata blocks from mysql writes, but it surprised me to think there would be so many metadata blocks for this data.






    share|improve this answer






























      8















      1. What is the difference between "buffer", and the other cache?

      2. Why might we expect Buffers in particular to be larger or smaller?




      1. What is the difference between "buffer", and the other cache?



      In Linux 2.4 (released in 2001) and above, Buffers reports the amount of page cache used for block devices. The kernel has to deliberately subtract this amount from the rest of the page cache when it reports Cached. See meminfo_proc_show():



      cached = global_node_page_state(NR_FILE_PAGES) -
      total_swapcache_pages() - i.bufferram;
      ...

      show_val_kb(m, "MemTotal: ", i.totalram);
      show_val_kb(m, "MemFree: ", i.freeram);
      show_val_kb(m, "MemAvailable: ", available);
      show_val_kb(m, "Buffers: ", i.bufferram);
      show_val_kb(m, "Cached: ", cached);


      The page cache is tied to the MMU page size, typically a minimum of 4096 bytes. This is essential for mmap(), i.e. memory-mapped file access.[1][2] It is used to share pages of loaded program/library code between independent processes, and allow loading individual pages on demand. (Also for unloading pages when something else needs the space, and they haven't been used recently).



      [1] Memory-mapped I/O - The GNU C Library manual.

      [2] mmap - Wikipedia.



      UNIX started with a "buffer cache" of disk blocks, and no mmap(). Apparently when mmap() was first added, they simply bolted the page cache on top of the buffer cache. This is as messy as it sounds, so eventually all the UNIX-based OS's removed the buffer cache. Data is cached in units of pages. Pages are looked up by (file, offset), not by location on disk. This was called "unified buffer cache", perhaps because people were more familiar with "buffer cache".[3]



      [3] UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD



      In Linux 2.2 there was a separate "buffer cache" used for writes, but not for reads. "The page cache used the buffer cache to write back its data, needing an extra copy of the data, and doubling memory requirements for some write loads" (?).[4] Let's not worry too much about the details, but this history would be one reason why Linux reports Buffers usage separately.



      [4] Page replacement in Linux 2.4 memory management, Rik van Riel.



      In Linux 2.4 and above, there is no extra copy. "The system does disk IO directly to and from the page cache page."[4]



      ("One interesting twist that Linux adds is that the device block numbers where a page is stored on disk are cached with the page in the form of a list of buffer_head structures. When a modified page is to be written back to disk, the I/O requests can be sent to the device driver right away, without needing to read any indirect blocks to determine where the page's data should be written."[3])



      Block device files have page cache. This is used "for filesystem metadata and the caching of raw block devices".[4] But filesystems do not copy file contents through it, so there is no "double caching".



      I think of the Buffers part of the page cache as the Linux buffer cache. My use of this term is idiosyncratic, and conflicts with some cited sources.



      It varies between filesystem types, but ext3/ext4 do use the Linux buffer cache for filesystem metadata, including for directory contents and the journal. The system in the question uses ext4.




      Prior to Linux kernel version 2.4, Linux had separate page and buffer caches. Since 2.4, the page and buffer cache are unified and Buffers is raw disk blocks not represented in the page cache—i.e., not file data.



      ...



      The buffer cache remains, however, as the kernel still needs to perform block I/O in terms of blocks, not pages. As most blocks represent file data, most of the buffer cache is represented by the page cache. But a small amount of block data isn't file backed—metadata and raw block I/O for example—and thus is solely represented by the buffer cache.




      -- A pair of Quora answers by Robert Love, last updated 2013.




      Certain file systems, including ext3, ext4, and ocfs2, use the jbd or
      jbd2 layer to handle their physical block journalling, and this layer
      fundamentally uses the buffer cache.




      -- Email article by Ted Tso, 2013



      The first source is already cited on StackExchange, as the most authoritative answer for this question. linux - What is the buffers column in the output from free?. The second source has more specific technical detail. Both writers are Linux developers who worked with Linux kernel memory management.



      It is true that filesystems may perform partial-page metadata writes, even though the cache is indexed in pages. Even user processes can perform partial-page writes when they use write() (as opposed to mmap()), at least directly to a block device.



      Linus likes to rant that the buffer cache is not required in order to do block-sized IO, and that filesystems can do partial-page metadata writes even when they attach page cache to virtual file(s) instead of the block device. I am sure he is right that ext2 does this. It is less clear what the difficulty was for ext4 and its journalling system; the people he was ranting at got tired of explaining.



      ext4_readdir() has not been changed to satisfy Linus' rant. I don't see the desired approach used in readdir() of other filesystems either. I think XFS uses the buffer cache for directories as well. bcachefs does not use the page cache for readdir(); it uses its own cache for btrees. I might be missing something in btrfs.



      2. Why might we expect Buffers in particular to be larger or smaller?



      As mentioned above, it varies depending on whether (and how much) the type of filesystem uses the buffer cache.



      It turns out the ext4 journal size for my filesystem is 128M. So this explains why 1) my buffer cache can stabilize at slightly over 128M; 2) buffer cache does not scale proportionally with the larger amount of RAM on my laptop.



      To verify that journal writes use the buffer cache, simulate a filesystem in nice fast RAM (tmpfs), and compare the maximum buffer usage for different journal sizes.



      # dd if=/dev/zero of=/tmp/t bs=1M count=1000
      ...
      # mkfs.ext4 /tmp/t -J size=256
      ...
      # LANG=C dumpe2fs /tmp/t | grep '^Journal size'
      dumpe2fs 1.43.5 (04-Aug-2017)
      Journal size: 256M
      # mount /tmp/t /mnt
      # cd /mnt
      # free -w -m
      total used free shared buffers cache available
      Mem: 7855 2521 4321 285 66 947 5105
      Swap: 7995 0 7995

      # for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
      # free -w -m
      total used free shared buffers cache available
      Mem: 7855 2523 3872 551 237 1223 4835
      Swap: 7995 0 7995




      # dd if=/dev/zero of=/tmp/t bs=1M count=1000
      ...
      # mkfs.ext4 /tmp/t -J size=16
      ...
      # LANG=C dumpe2fs /tmp/t | grep '^Journal size'
      dumpe2fs 1.43.5 (04-Aug-2017)
      Journal size: 16M
      # mount /tmp/t /mnt
      # cd /mnt
      # free -w -m
      total used free shared buffers cache available
      Mem: 7855 2507 4337 285 66 943 5118
      Swap: 7995 0 7995

      # for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
      # free -w -m
      total used free shared buffers cache available
      Mem: 7855 2509 4290 315 77 977 5086
      Swap: 7995 0 7995


      How I came to look at the journal



      I had found Ted Tso's email first, and was intrigued that it emphasized write caching. I would find it surprising if "dirty", unwritten data was able to reach 30% of RAM on my system. sudo atop shows that over a 10 second interval, the system in question consistently writes only 1MB. The filesystem concerned would be able to keep up with over 100 times this rate. (It's on a USB2 hard disk drive, max throughput ~20MB/s).



      Using blktrace (btrace -w 10 /dev/sda) confirms that the IOs which are being cached must be writes, because there is almost no data being read. Also that mysqld is the only userspace process doing IO.



      I stopped the service responsible for the writes (icinga2 writing to mysql) and re-checked. I saw "buffers" drop to under 20M - I have no explanation for that - and stay there. Restarting the writer again shows "buffers" rising by ~0.1M for each 10 second interval. I observed it maintain this rate consistently, climbing back to 70M and above.



      Running echo 3 | sudo tee /proc/sys/vm/drop_caches was sufficient to lower "buffers" again, to 4.5M. This proves that my accumulation of buffers is a "clean" cache, which Linux can drop immediately when required. This system is not accumulating unwritten data. (drop_caches does not perform any writeback and hence cannot drop dirty pages. If you wanted to run a test which cleaned the cache first, you would use the sync command).



      The entire mysql directory is only 150M. The accumulating buffers must represent metadata blocks from mysql writes, but it surprised me to think there would be so many metadata blocks for this data.






      share|improve this answer




























        8












        8








        8








        1. What is the difference between "buffer", and the other cache?

        2. Why might we expect Buffers in particular to be larger or smaller?




        1. What is the difference between "buffer", and the other cache?



        In Linux 2.4 (released in 2001) and above, Buffers reports the amount of page cache used for block devices. The kernel has to deliberately subtract this amount from the rest of the page cache when it reports Cached. See meminfo_proc_show():



        cached = global_node_page_state(NR_FILE_PAGES) -
        total_swapcache_pages() - i.bufferram;
        ...

        show_val_kb(m, "MemTotal: ", i.totalram);
        show_val_kb(m, "MemFree: ", i.freeram);
        show_val_kb(m, "MemAvailable: ", available);
        show_val_kb(m, "Buffers: ", i.bufferram);
        show_val_kb(m, "Cached: ", cached);


        The page cache is tied to the MMU page size, typically a minimum of 4096 bytes. This is essential for mmap(), i.e. memory-mapped file access.[1][2] It is used to share pages of loaded program/library code between independent processes, and allow loading individual pages on demand. (Also for unloading pages when something else needs the space, and they haven't been used recently).



        [1] Memory-mapped I/O - The GNU C Library manual.

        [2] mmap - Wikipedia.



        UNIX started with a "buffer cache" of disk blocks, and no mmap(). Apparently when mmap() was first added, they simply bolted the page cache on top of the buffer cache. This is as messy as it sounds, so eventually all the UNIX-based OS's removed the buffer cache. Data is cached in units of pages. Pages are looked up by (file, offset), not by location on disk. This was called "unified buffer cache", perhaps because people were more familiar with "buffer cache".[3]



        [3] UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD



        In Linux 2.2 there was a separate "buffer cache" used for writes, but not for reads. "The page cache used the buffer cache to write back its data, needing an extra copy of the data, and doubling memory requirements for some write loads" (?).[4] Let's not worry too much about the details, but this history would be one reason why Linux reports Buffers usage separately.



        [4] Page replacement in Linux 2.4 memory management, Rik van Riel.



        In Linux 2.4 and above, there is no extra copy. "The system does disk IO directly to and from the page cache page."[4]



        ("One interesting twist that Linux adds is that the device block numbers where a page is stored on disk are cached with the page in the form of a list of buffer_head structures. When a modified page is to be written back to disk, the I/O requests can be sent to the device driver right away, without needing to read any indirect blocks to determine where the page's data should be written."[3])



        Block device files have page cache. This is used "for filesystem metadata and the caching of raw block devices".[4] But filesystems do not copy file contents through it, so there is no "double caching".



        I think of the Buffers part of the page cache as the Linux buffer cache. My use of this term is idiosyncratic, and conflicts with some cited sources.



        It varies between filesystem types, but ext3/ext4 do use the Linux buffer cache for filesystem metadata, including for directory contents and the journal. The system in the question uses ext4.




        Prior to Linux kernel version 2.4, Linux had separate page and buffer caches. Since 2.4, the page and buffer cache are unified and Buffers is raw disk blocks not represented in the page cache—i.e., not file data.



        ...



        The buffer cache remains, however, as the kernel still needs to perform block I/O in terms of blocks, not pages. As most blocks represent file data, most of the buffer cache is represented by the page cache. But a small amount of block data isn't file backed—metadata and raw block I/O for example—and thus is solely represented by the buffer cache.




        -- A pair of Quora answers by Robert Love, last updated 2013.




        Certain file systems, including ext3, ext4, and ocfs2, use the jbd or
        jbd2 layer to handle their physical block journalling, and this layer
        fundamentally uses the buffer cache.




        -- Email article by Ted Tso, 2013



        The first source is already cited on StackExchange, as the most authoritative answer for this question. linux - What is the buffers column in the output from free?. The second source has more specific technical detail. Both writers are Linux developers who worked with Linux kernel memory management.



        It is true that filesystems may perform partial-page metadata writes, even though the cache is indexed in pages. Even user processes can perform partial-page writes when they use write() (as opposed to mmap()), at least directly to a block device.



        Linus likes to rant that the buffer cache is not required in order to do block-sized IO, and that filesystems can do partial-page metadata writes even when they attach page cache to virtual file(s) instead of the block device. I am sure he is right that ext2 does this. It is less clear what the difficulty was for ext4 and its journalling system; the people he was ranting at got tired of explaining.



        ext4_readdir() has not been changed to satisfy Linus' rant. I don't see the desired approach used in readdir() of other filesystems either. I think XFS uses the buffer cache for directories as well. bcachefs does not use the page cache for readdir(); it uses its own cache for btrees. I might be missing something in btrfs.



        2. Why might we expect Buffers in particular to be larger or smaller?



        As mentioned above, it varies depending on whether (and how much) the type of filesystem uses the buffer cache.



        It turns out the ext4 journal size for my filesystem is 128M. So this explains why 1) my buffer cache can stabilize at slightly over 128M; 2) buffer cache does not scale proportionally with the larger amount of RAM on my laptop.



        To verify that journal writes use the buffer cache, simulate a filesystem in nice fast RAM (tmpfs), and compare the maximum buffer usage for different journal sizes.



        # dd if=/dev/zero of=/tmp/t bs=1M count=1000
        ...
        # mkfs.ext4 /tmp/t -J size=256
        ...
        # LANG=C dumpe2fs /tmp/t | grep '^Journal size'
        dumpe2fs 1.43.5 (04-Aug-2017)
        Journal size: 256M
        # mount /tmp/t /mnt
        # cd /mnt
        # free -w -m
        total used free shared buffers cache available
        Mem: 7855 2521 4321 285 66 947 5105
        Swap: 7995 0 7995

        # for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
        # free -w -m
        total used free shared buffers cache available
        Mem: 7855 2523 3872 551 237 1223 4835
        Swap: 7995 0 7995




        # dd if=/dev/zero of=/tmp/t bs=1M count=1000
        ...
        # mkfs.ext4 /tmp/t -J size=16
        ...
        # LANG=C dumpe2fs /tmp/t | grep '^Journal size'
        dumpe2fs 1.43.5 (04-Aug-2017)
        Journal size: 16M
        # mount /tmp/t /mnt
        # cd /mnt
        # free -w -m
        total used free shared buffers cache available
        Mem: 7855 2507 4337 285 66 943 5118
        Swap: 7995 0 7995

        # for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
        # free -w -m
        total used free shared buffers cache available
        Mem: 7855 2509 4290 315 77 977 5086
        Swap: 7995 0 7995


        How I came to look at the journal



        I had found Ted Tso's email first, and was intrigued that it emphasized write caching. I would find it surprising if "dirty", unwritten data was able to reach 30% of RAM on my system. sudo atop shows that over a 10 second interval, the system in question consistently writes only 1MB. The filesystem concerned would be able to keep up with over 100 times this rate. (It's on a USB2 hard disk drive, max throughput ~20MB/s).



        Using blktrace (btrace -w 10 /dev/sda) confirms that the IOs which are being cached must be writes, because there is almost no data being read. Also that mysqld is the only userspace process doing IO.



        I stopped the service responsible for the writes (icinga2 writing to mysql) and re-checked. I saw "buffers" drop to under 20M - I have no explanation for that - and stay there. Restarting the writer again shows "buffers" rising by ~0.1M for each 10 second interval. I observed it maintain this rate consistently, climbing back to 70M and above.



        Running echo 3 | sudo tee /proc/sys/vm/drop_caches was sufficient to lower "buffers" again, to 4.5M. This proves that my accumulation of buffers is a "clean" cache, which Linux can drop immediately when required. This system is not accumulating unwritten data. (drop_caches does not perform any writeback and hence cannot drop dirty pages. If you wanted to run a test which cleaned the cache first, you would use the sync command).



        The entire mysql directory is only 150M. The accumulating buffers must represent metadata blocks from mysql writes, but it surprised me to think there would be so many metadata blocks for this data.






        share|improve this answer
















        1. What is the difference between "buffer", and the other cache?

        2. Why might we expect Buffers in particular to be larger or smaller?




        1. What is the difference between "buffer", and the other cache?



        In Linux 2.4 (released in 2001) and above, Buffers reports the amount of page cache used for block devices. The kernel has to deliberately subtract this amount from the rest of the page cache when it reports Cached. See meminfo_proc_show():



        cached = global_node_page_state(NR_FILE_PAGES) -
        total_swapcache_pages() - i.bufferram;
        ...

        show_val_kb(m, "MemTotal: ", i.totalram);
        show_val_kb(m, "MemFree: ", i.freeram);
        show_val_kb(m, "MemAvailable: ", available);
        show_val_kb(m, "Buffers: ", i.bufferram);
        show_val_kb(m, "Cached: ", cached);


        The page cache is tied to the MMU page size, typically a minimum of 4096 bytes. This is essential for mmap(), i.e. memory-mapped file access.[1][2] It is used to share pages of loaded program/library code between independent processes, and allow loading individual pages on demand. (Also for unloading pages when something else needs the space, and they haven't been used recently).



        [1] Memory-mapped I/O - The GNU C Library manual.

        [2] mmap - Wikipedia.
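
You can check the page size on your own system; this invocation is my addition, and 4096 bytes is the typical value on x86 and many ARM systems:

$ getconf PAGESIZE
4096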



UNIX started with a "buffer cache" of disk blocks, and no mmap(). Apparently when mmap() was first added, the page cache was simply bolted on top of the buffer cache. This is as messy as it sounds, so eventually the UNIX-based OSes removed the buffer cache: data is cached in units of pages, and pages are looked up by (file, offset), not by location on disk. This became known as the "unified buffer cache", perhaps because people were more familiar with the term "buffer cache".[3]



        [3] UBC: An Efficient Unified I/O and Memory Caching Subsystem for NetBSD



        In Linux 2.2 there was a separate "buffer cache" used for writes, but not for reads. "The page cache used the buffer cache to write back its data, needing an extra copy of the data, and doubling memory requirements for some write loads" (?).[4] Let's not worry too much about the details, but this history would be one reason why Linux reports Buffers usage separately.



        [4] Page replacement in Linux 2.4 memory management, Rik van Riel.



        In Linux 2.4 and above, there is no extra copy. "The system does disk IO directly to and from the page cache page."[4]



        ("One interesting twist that Linux adds is that the device block numbers where a page is stored on disk are cached with the page in the form of a list of buffer_head structures. When a modified page is to be written back to disk, the I/O requests can be sent to the device driver right away, without needing to read any indirect blocks to determine where the page's data should be written."[3])



        Block device files have page cache. This is used "for filesystem metadata and the caching of raw block devices".[4] But filesystems do not copy file contents through it, so there is no "double caching".
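
One way to watch this happen (my sketch, not from the cited sources; substitute your own disk for /dev/sda, and expect approximate numbers, since other activity touches the cache too):

$ free -w -m                                        # note the "buffers" column
$ sudo dd if=/dev/sda of=/dev/null bs=1M count=100  # read 100M from the raw device
$ free -w -m                                        # "buffers" should be roughly 100M higher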



        I think of the Buffers part of the page cache as the Linux buffer cache. My use of this term is idiosyncratic, and conflicts with some cited sources.



        It varies between filesystem types, but ext3/ext4 do use the Linux buffer cache for filesystem metadata, including for directory contents and the journal. The system in the question uses ext4.
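
A rough way to see the metadata effect (my sketch; the growth depends on how much of /usr was already cached, and the dentry/inode slab caches will grow as well):

$ free -w -m
$ ls -lR /usr > /dev/null   # reads many directory blocks and inode table blocks
$ free -w -m                # on ext4, "buffers" grows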




        Prior to Linux kernel version 2.4, Linux had separate page and buffer caches. Since 2.4, the page and buffer cache are unified and Buffers is raw disk blocks not represented in the page cache—i.e., not file data.



        ...



        The buffer cache remains, however, as the kernel still needs to perform block I/O in terms of blocks, not pages. As most blocks represent file data, most of the buffer cache is represented by the page cache. But a small amount of block data isn't file backed—metadata and raw block I/O for example—and thus is solely represented by the buffer cache.




        -- A pair of Quora answers by Robert Love, last updated 2013.




        Certain file systems, including ext3, ext4, and ocfs2, use the jbd or
        jbd2 layer to handle their physical block journalling, and this layer
        fundamentally uses the buffer cache.




        -- Email article by Ted Tso, 2013



The first source is already cited on Stack Exchange as the most authoritative answer to this question: linux - What is the buffers column in the output from free?. The second source gives more specific technical detail. Both writers are Linux developers who have worked on Linux kernel memory management.



        It is true that filesystems may perform partial-page metadata writes, even though the cache is indexed in pages. Even user processes can perform partial-page writes when they use write() (as opposed to mmap()), at least directly to a block device.
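
To convince yourself that sub-page write()s to a block device work, here is a sketch using a loop device, so no real disk is at risk (my addition; the loop device name may differ on your system):

$ truncate -s 1M /tmp/blk
$ sudo losetup -f --show /tmp/blk
/dev/loop0
$ sudo dd if=/dev/zero of=/dev/loop0 bs=512 count=1   # one 512-byte write, 1/8 of a 4096-byte page
$ sudo losetup -d /dev/loop0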



        Linus likes to rant that the buffer cache is not required in order to do block-sized IO, and that filesystems can do partial-page metadata writes even when they attach page cache to virtual file(s) instead of the block device. I am sure he is right that ext2 does this. It is less clear what the difficulty was for ext4 and its journalling system; the people he was ranting at got tired of explaining.



        ext4_readdir() has not been changed to satisfy Linus' rant. I don't see the desired approach used in readdir() of other filesystems either. I think XFS uses the buffer cache for directories as well. bcachefs does not use the page cache for readdir(); it uses its own cache for btrees. I might be missing something in btrfs.



        2. Why might we expect Buffers in particular to be larger or smaller?



As mentioned above, it varies depending on whether, and how heavily, the filesystem type uses the buffer cache.



It turns out the ext4 journal size for my filesystem is 128M. This explains 1) why my buffer cache can stabilize at slightly over 128M, and 2) why the buffer cache does not scale proportionally with the larger amount of RAM on my laptop.



To verify that journal writes use the buffer cache, simulate a filesystem in nice fast RAM (the tests below assume /tmp is a tmpfs), and compare the maximum buffer usage for different journal sizes.



        # dd if=/dev/zero of=/tmp/t bs=1M count=1000
        ...
        # mkfs.ext4 /tmp/t -J size=256
        ...
        # LANG=C dumpe2fs /tmp/t | grep '^Journal size'
        dumpe2fs 1.43.5 (04-Aug-2017)
        Journal size: 256M
        # mount /tmp/t /mnt
        # cd /mnt
        # free -w -m
        total used free shared buffers cache available
        Mem: 7855 2521 4321 285 66 947 5105
        Swap: 7995 0 7995

        # for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
        # free -w -m
        total used free shared buffers cache available
        Mem: 7855 2523 3872 551 237 1223 4835
        Swap: 7995 0 7995




        # dd if=/dev/zero of=/tmp/t bs=1M count=1000
        ...
        # mkfs.ext4 /tmp/t -J size=16
        ...
        # LANG=C dumpe2fs /tmp/t | grep '^Journal size'
        dumpe2fs 1.43.5 (04-Aug-2017)
        Journal size: 16M
        # mount /tmp/t /mnt
        # cd /mnt
        # free -w -m
        total used free shared buffers cache available
        Mem: 7855 2507 4337 285 66 943 5118
        Swap: 7995 0 7995

        # for i in $(seq 40000); do dd if=/dev/zero of=t bs=1k count=1 conv=sync status=none; sync t; sync -f t; done
        # free -w -m
        total used free shared buffers cache available
        Mem: 7855 2509 4290 315 77 977 5086
        Swap: 7995 0 7995


        How I came to look at the journal



I had found Ted Tso's email first, and was intrigued that it emphasized write caching. I would find it surprising if "dirty", unwritten data were able to reach 30% of RAM on my system. sudo atop shows that over a 10-second interval, the system in question consistently writes only 1MB. The filesystem concerned would be able to keep up with over 100 times this rate. (It's on a USB2 hard disk drive, max throughput ~20MB/s.)
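
(To reproduce this measurement, note that atop takes the sample interval in seconds as its first argument; the DSK lines show per-disk read and write totals for each sample. This invocation note is my addition.)

$ sudo atop 10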



Using blktrace (btrace -w 10 /dev/sda) confirms that the IOs being cached must be writes, because almost no data is being read. It also shows that mysqld is the only userspace process doing IO.



I stopped the service responsible for the writes (icinga2 writing to mysql) and re-checked. I saw "buffers" drop to under 20M - I have no explanation for that - and stay there. Restarting the writer shows "buffers" rising by ~0.1M for each 10-second interval. I observed it maintain this rate consistently, climbing back to 70M and above.



        Running echo 3 | sudo tee /proc/sys/vm/drop_caches was sufficient to lower "buffers" again, to 4.5M. This proves that my accumulation of buffers is a "clean" cache, which Linux can drop immediately when required. This system is not accumulating unwritten data. (drop_caches does not perform any writeback and hence cannot drop dirty pages. If you wanted to run a test which cleaned the cache first, you would use the sync command).
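
Putting the two together, a minimal clean-cache sequence would look like this (my sketch, combining the commands just described):

$ sync                                         # write back dirty pages first
$ echo 3 | sudo tee /proc/sys/vm/drop_caches   # drop clean page cache, dentries and inodes
$ free -w -m                                   # both "buffers" and "cache" should shrink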



        The entire mysql directory is only 150M. The accumulating buffers must represent metadata blocks from mysql writes, but it surprised me to think there would be so many metadata blocks for this data.
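
(For reference, the size check itself; /var/lib/mysql is the usual default datadir, but adjust the path for your system:)

$ sudo du -sh /var/lib/mysql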







answered Apr 28 '18 at 10:40 by sourcejedi

Your version of free has the right idea. By default it combines buffers and cache in its report. This is because they are basically the same thing: both are the computer remembering in RAM (which is faster than secondary storage, i.e. disks and SSDs) what it has already seen when reading from disk or SSD.

If the operating system feels that the memory is better used by something else, then it can free it. Therefore don't worry about buffers and cache.

However, watching a DVD can cause buffers to go up, and evict other buffer/cache content. Therefore you may wish to use nocache to run the DVD player (if it is causing a problem).
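
For example (my illustration; nocache is a separate package, and the player command is only an example):

$ nocache mpv dvd://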






answered Apr 28 '18 at 12:54 by ctrl-alt-delor

• I absolutely agree with your points :). I don't have any (known) problem with this system. I am only curious, after noticing that this system was behaving differently to what I see and remembered seeing on other systems. Suggestions to improve the question are welcome.
  – sourcejedi
  Apr 28 '18 at 21:17