
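A read-only file system with high compression ratios and fast reads
DwarFS is a read-only file system focused on achieving high compression ratios, and it is particularly well suited to redundant data. It keeps reads fast while compressing better than other compressed file systems such as SquashFS. Its distinguishing features include file similarity clustering, segmentation analysis across blocks and a file categorization framework, and it makes full use of multi-core systems. It supports Linux and Windows and suits applications that need both high compression and fast access.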
The Deduplicating Warp-speed Advanced Read-only File System.
A fast high compression read-only file system for Linux and Windows.


DwarFS is a read-only file system with a focus on achieving very high compression ratios in particular for very redundant data.
This probably doesn't sound very exciting, because if it's redundant, it should compress well. However, I found that other read-only, compressed file systems don't do a very good job at making use of this redundancy. See here for a comparison with other compressed file systems.
DwarFS also doesn't compromise on speed, and for my use cases I've found it to be on par with or faster than SquashFS. For my primary use case, DwarFS compression is an order of magnitude better than SquashFS compression, building the file system is 6 times faster, accessing files on DwarFS is typically faster, and it uses fewer CPU resources.
To give you an idea of what DwarFS is capable of, here's a quick comparison of DwarFS and SquashFS on a set of video files with a total size of 39 GiB. The twist is that each unique video file has two sibling files with a different set of audio streams (this is an actual use case). So there's redundancy in both the video and audio data, but as the streams are interleaved and identical blocks are typically very far apart, it's challenging to make use of that redundancy for compression. SquashFS essentially fails to compress the source data at all, whereas DwarFS is able to reduce the size by almost a factor of 3, which is close to the theoretical maximum:
$ du -hs dwarfs-video-test
39G dwarfs-video-test
$ ls -lh dwarfs-video-test.*fs
-rw-r--r-- 1 mhx users 14G Jul 2 13:01 dwarfs-video-test.dwarfs
-rw-r--r-- 1 mhx users 39G Jul 12 09:41 dwarfs-video-test.squashfs
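As a rough check of the "almost a factor of 3" figure, the ratios can be recomputed from the sizes in the listing above. This is only a sketch: the sizes are the rounded values printed by `du -h` and `ls -lh`, and the upper bound of 3 is an assumption that follows from each unique video having two siblings, with the differing audio streams treated as negligible.

```python
def compression_ratio(original_gib: float, compressed_gib: float) -> float:
    """Return how many times smaller the image is than the source data."""
    return original_gib / compressed_gib


# Rounded sizes taken from the listing above (GiB).
source_gib = 39.0
dwarfs_gib = 14.0
squashfs_gib = 39.0

dwarfs_ratio = compression_ratio(source_gib, dwarfs_gib)
squashfs_ratio = compression_ratio(source_gib, squashfs_gib)

# Assumption: three files share one video stream, so perfect
# deduplication of the video data could shrink the set by at most 3x.
assumed_upper_bound = 3.0

print(f"DwarFS:   {dwarfs_ratio:.2f}x")
print(f"SquashFS: {squashfs_ratio:.2f}x")
print(f"DwarFS reaches {dwarfs_ratio / assumed_upper_bound:.0%} of the assumed bound")
```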
Furthermore, when mounting the SquashFS image and performing a random-read
throughput test using fio-3.34, both
squashfuse and squashfuse_ll top out at around 230 MiB/s:
$ fio --readonly --rw=randread --name=randread --bs=64k --direct=1 \
--opendir=mnt --numjobs=4 --ioengine=libaio --iodepth=32 \
--group_reporting --runtime=60 --time_based
[...]
READ: bw=230MiB/s (241MB/s), 230MiB/s-230MiB/s (241MB/s-241MB/s), io=13.5GiB (14.5GB), run=60004-60004msec
In comparison, DwarFS manages to sustain random read rates of 20 GiB/s:
READ: bw=20.2GiB/s (21.7GB/s), 20.2GiB/s-20.2GiB/s (21.7GB/s-21.7GB/s), io=1212GiB (1301GB), run=60001-60001msec
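To put the two fio results on the same scale, the bandwidth field of each summary line can be parsed and converted to MiB/s. The helper name `parse_bandwidth_mib` is made up for this sketch, and the regular expression only covers the `bw=<number><unit>/s` form seen in the two lines above.

```python
import re

# Conversion factors from fio's binary units to MiB/s.
UNIT_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def parse_bandwidth_mib(summary_line: str) -> float:
    """Extract the first bw= value from a fio summary line, in MiB/s."""
    match = re.search(r"bw=([0-9.]+)(KiB|MiB|GiB)/s", summary_line)
    if match is None:
        raise ValueError(f"no bandwidth field in: {summary_line!r}")
    value, unit = match.groups()
    return float(value) * UNIT_TO_MIB[unit]


# Summary lines copied from the fio runs above (shortened).
squashfuse_line = "READ: bw=230MiB/s (241MB/s), 230MiB/s-230MiB/s (241MB/s-241MB/s)"
dwarfs_line = "READ: bw=20.2GiB/s (21.7GB/s), 20.2GiB/s-20.2GiB/s (21.7GB/s-21.7GB/s)"

squashfuse_mib = parse_bandwidth_mib(squashfuse_line)
dwarfs_mib = parse_bandwidth_mib(dwarfs_line)
speedup = dwarfs_mib / squashfuse_mib

print(f"squashfuse: {squashfuse_mib:.0f} MiB/s")
print(f"DwarFS:     {dwarfs_mib:.0f} MiB/s")
print(f"speedup:    {speedup:.0f}x")
```

For this particular test, the DwarFS result works out to roughly 90 times the squashfuse throughput.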
Distinct features of DwarFS are:
Clustering of files by similarity using a similarity hash function. This makes it easier to exploit the redundancy across file boundaries.
Segmentation analysis across file system blocks in order to reduce the size of the uncompressed file system. This saves memory when using the compressed file system and thus potentially allows for higher cache hit rates as more data can be kept in the cache.
Categorization framework to categorize files or even fragments of files and then process individual categories differently. For example, this allows you to not waste time trying to compress incompressible files or to compress PCM audio data using FLAC compression.
Highly multi-threaded implementation. Both the file system creation tool as well as the FUSE driver are able to make good use of the many cores of your system.
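The idea behind the first feature, placing similar files next to each other so that a compressor sees their shared content close together, can be illustrated with a small sketch. This is not DwarFS's algorithm: instead of a similarity hash it compares exact sets of byte shingles using Jaccard similarity, and it orders files greedily. The function names and the sample files are invented for the example.

```python
from typing import Dict, List, Set


def shingles(data: bytes, size: int = 4) -> Set[bytes]:
    """Return the set of all overlapping byte windows of the given size."""
    if len(data) < size:
        return {data}
    return {data[i:i + size] for i in range(len(data) - size + 1)}


def jaccard(a: Set[bytes], b: Set[bytes]) -> float:
    """Jaccard similarity: shared shingles divided by all shingles."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def order_by_similarity(files: Dict[str, bytes]) -> List[str]:
    """Greedy ordering: start with the alphabetically first file, then
    keep appending the remaining file most similar to the last one."""
    if not files:
        return []
    sets = {name: shingles(data) for name, data in files.items()}
    remaining = sorted(sets)
    order = [remaining.pop(0)]
    while remaining:
        last = sets[order[-1]]
        best = max(remaining, key=lambda name: jaccard(last, sets[name]))
        remaining.remove(best)
        order.append(best)
    return order


# Hypothetical inputs: two near-identical text files and one unrelated file.
files = {
    "a_v1.txt": b"the quick brown fox jumps over the lazy dog",
    "b_other.bin": b"0123456789" * 4,
    "c_v2.txt": b"the quick brown fox jumps over the lazy cat",
}
print(order_by_similarity(files))
```

The two text files end up adjacent even though an unrelated file sorts between them by name. A similarity hash serves the same purpose while avoiding pairwise comparison of full shingle sets, which would not scale to a large number of files.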
I started working on DwarFS in 2013 and my main use case and major motivation was that I had several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space, and I was unwilling to spend more than 10% of my hard drive keeping them around for when I happened to need them.
Up until then, I had been using Cromfs for squeezing them into a manageable size. However, I was getting more and more annoyed by the time it took to build the filesystem image and, to make things worse, more often than not it was crashing after about an hour or so.
I had obviously also looked into SquashFS, but never got anywhere close to the compression rates of Cromfs.
This alone wouldn't have been enough to get me into writing DwarFS, but at around the same time, I was pretty obsessed with the recent developments and features of newer C++ standards and really wanted a C++ hobby project to work on. Also, I've wanted to do something with FUSE for quite some time. Last but not least, I had been thinking about the problem of compressed file systems for a bit and had some ideas that I definitely wanted to try.
The majority of the code was written in 2013, then I did a couple of cleanups, bugfixes and refactors every once in a while, but I never really got it to a state where I would feel happy releasing it. It was too awkward to build with its dependency on Facebook's (quite awesome) folly library and it didn't have any documentation.
Digging out the project again this year, things didn't look as grim as they used to. Folly now builds with CMake and so I just pulled it in as a submodule. Most other dependencies can be satisfied from packages that should be widely available. And I've written some rudimentary docs as well.
DwarFS should usually build fine with minimal changes out of the box.
If it doesn't, please file an issue. I've set up CI jobs using Docker
images for Ubuntu (22.04 and 24.04), Fedora Rawhide and Arch that can
help with determining an up-to-date set of dependencies.
Note that building from the release tarball requires fewer dependencies
than building from the git repository. Notably, the ronn tool as well
as Python and the mistletoe Python module are not required when
building from the release tarball.
There are some things to be aware of:
There's a tendency to try and unbundle the folly
and fbthrift libraries that
are included as submodules and are built along with DwarFS.
While I agree with the sentiment, it's unfortunately a bad idea.
Besides the fact that folly does not make any claims about ABI
stability (i.e. you can't just dynamically link a binary built
against one version of folly against another version), it's not
even possible to safely link against a folly library built with
different compile options. Even subtle differences, such as the
C++ standard version, can cause run-time errors.
See this issue
for details. Currently, it is not even possible to use external
versions of folly/fbthrift as DwarFS is building minimal subsets of
both libraries; these are bundled in the dwarfs_common library
and they are strictly used internally, i.e. none of the folly or
fbthrift headers are required to build against DwarFS' libraries.
Similar issues can arise when using a system-installed version
of GoogleTest. GoogleTest itself recommends that it be
downloaded as part of the build. However, you can use the
system-installed version by passing -DPREFER_SYSTEM_GTEST=ON to the
cmake call. Use at your own risk.
For other bundled libraries (namely fmt, parallel-hashmap,
range-v3), the system installed version is used as long as it
meets the minimum required version. Otherwise, the preferred
version is fetched during the build.
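The fallback rule for fmt, parallel-hashmap and range-v3 amounts to a version comparison, which might be sketched as follows. The version numbers used here are invented for illustration; the actual minimum versions are defined by the DwarFS build scripts and are not listed in this document.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '9.1.0' into (9, 1, 0)."""
    return tuple(int(part) for part in text.split("."))


def choose_source(system_version: Optional[str], minimum: str) -> str:
    """Return 'system' if an installed version satisfies the minimum,
    otherwise 'fetch' to indicate the library is downloaded at build time."""
    if system_version is not None:
        if parse_version(system_version) >= parse_version(minimum):
            return "system"
    return "fetch"


# Hypothetical version numbers, not the real requirements.
print(choose_source("10.1.1", "9.0.0"))
print(choose_source("8.1.1", "9.0.0"))
print(choose_source(None, "9.0.0"))
```

Comparing tuples of integers rather than raw strings matters here: as strings, "10.1.1" would sort before "9.0.0".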
Each release has pre-built,
statically linked binaries for Linux-x86_64, Linux-aarch64 and
Windows-AMD64 available for download. These should run without
any dependencies and can be useful especially on older distributions
where you can't easily build the tools from source.
In addition to the binary tarballs, there's a universal binary
available for each architecture. These universal binaries contain
all tools (mkdwarfs, dwarfsck, dwarfsextract and the dwarfs
FUSE driver) in a single executable. These executables are compressed
using upx, so they are much smaller than
the individual tools combined. However, it also means the binaries need
to be decompressed each time they are run, which can have a significant
overhead. If that is an issue, you can either stick to the "classic"
individual binaries or you can decompress the universal binary, e.g.:
upx -d dwarfs-universal-0.7.0-Linux-aarch64
The universal binaries can be run through symbolic links named after the proper tool, e.g.:
$ ln -s dwarfs-universal-0.7.0-Linux-aarch64 mkdwarfs
$ ./mkdwarfs --help
This also works on Windows if the file system supports symbolic links:
> mklink mkdwarfs.exe dwarfs-universal-0.7.0-Windows-AMD64.exe
> .\mkdwarfs.exe --help
Alternatively, you can select the tool by passing --tool=<name> as
the first argument on the command line:
> .\dwarfs-universal-0.7.0-Windows-AMD64.exe --tool=mkdwarfs --help
Note that just like the dwarfs.exe Windows binary, the universal
Windows binary depends on the winfsp-x64.dll from the
WinFsp project. However, for the
universal binary, the DLL is loaded lazily, so you can still use all
other tools without the DLL.
See the Windows Support section for more details.
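Selecting a tool either by the name of the symbolic link or by a leading --tool=<name> argument is a common pattern for multi-call binaries. The following sketch shows one way such a dispatch could work. It is an illustration only and not taken from the DwarFS sources: the function names are invented, and checking the program name before the --tool argument is an assumed order.

```python
import re
from typing import List, Optional, Tuple

# Tool names as listed for the universal binary.
TOOLS = ("mkdwarfs", "dwarfsck", "dwarfsextract", "dwarfs")


def program_stem(argv0: str) -> str:
    """Strip any directory part (POSIX or Windows) and a trailing .exe."""
    name = re.split(r"[\\/]", argv0)[-1]
    if name.lower().endswith(".exe"):
        name = name[:-4]
    return name


def select_tool(argv: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (tool name, remaining arguments); tool is None if unknown."""
    # Case 1: invoked through a link named after a tool.
    stem = program_stem(argv[0])
    if stem in TOOLS:
        return stem, argv[1:]
    # Case 2: --tool=<name> given as the first argument.
    if len(argv) > 1 and argv[1].startswith("--tool="):
        tool = argv[1][len("--tool="):]
        if tool in TOOLS:
            return tool, argv[2:]
    return None, argv[1:]


print(select_tool(["./mkdwarfs", "--help"]))
print(select_tool([".\\dwarfs-universal-0.7.0-Windows-AMD64.exe",
                   "--tool=mkdwarfs", "--help"]))
```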
DwarFS uses CMake as a build tool.
It uses both Boost and Folly, though the latter is included as a submodule since very few distributions actually offer packages for it. Folly itself has a number of dependencies, so please check here for an up-to-date list.
It also uses Facebook Thrift,
in particular the frozen library, for storing metadata in a highly
space-efficient, memory-mappable and well defined format. It's also
included as a submodule, and we only build the compiler and a very
reduced library that contains just enough for DwarFS to work.
Other than that, DwarFS really only depends on FUSE3 and on a set of compression libraries that Folly already depends on (namely lz4, zstd and liblzma).
The dependency on googletest will be automatically resolved if you build with tests.
A good starting point for apt-based systems is probably:
$ apt install \
gcc \
g++ \
clang \
git \
ccache \
ninja-build \
cmake \
make \
bison \
flex \
fuse3 \
pkg-config \
binutils-dev \
libacl1-dev \
libarchive-dev \
libbenchmark-dev \
libboost-chrono-dev \
libboost-context-dev \
libboost-filesystem-dev \
libboost-iostreams-dev \
libboost-program-options-dev \
libboost-regex-dev \
libboost-system-dev \
libboost-thread-dev \
libbrotli-dev \
libevent-dev \
libhowardhinnant-date-dev \
libjemalloc-dev \
libdouble-conversion-dev \
libiberty-dev \
liblz4-dev \
liblzma-dev \
libzstd-dev \
libxxhash-dev \
libmagic-dev \
libparallel-hashmap-dev \
librange-v3-dev \
libssl-dev \
libunwind-dev \
libdwarf-dev \
libelf-dev \
libfmt-dev \
libfuse3-dev \
libgoogle-glog-dev \
libutfcpp-dev \
libflac++-dev \
nlohmann-json3-dev
Note that when building with gcc, the optimization level will be
set to -O2 instead of the CMake default of -O3 for release
builds. At least with versions up to gcc-10, the -O3 build is
up to 70% slower than a
build with

