qsv

A versatile, high-performance CSV data-processing toolkit

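qsv is an efficient CSV data-processing toolkit that provides a wide range of commands for data manipulation. Its main functions include querying, slicing, indexing, analyzing, filtering and transforming CSV files. It supports advanced operations such as applying transformations, date formatting, deduplication and diffing, and it can also fetch data from web services and perform geocoding. qsv has a built-in Luau scripting engine for building complex data-processing pipelines. Its optimized design performs well on large CSV datasets, combining high performance with flexibility.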


qsv: Blazing-fast CSV data-wrangling toolkit


<div align="center">
Hi-ho "Quicksilver" away!<br/>
qsv is a command line program for querying, slicing,<br>
indexing, analyzing, filtering, enriching, transforming,<br>
sorting, validating, joining & converting CSV files.<br>
Commands are simple, composable & "blazing fast".<br><br>
Table of Contents<br>
* Commands<br>
* Installation Options<br>
* Whirlwind Tour / Notebooks / Lessons & Exercises<br>
* Cookbook<br>
* FAQ<br>
* Performance Tuning<br>
* 👉 Benchmarks 🚀<br>
* Environment Variables<br>
* Feature Flags<br>
* Goals/Non-goals<br>
* Testing<br>
* NYC School of Data 2022/csv,conf,v8 slides<br>
* Sponsor
</div> <div align="center">

Try it out at qsv.dathere.com! <!-- markdownlint-disable-line -->

</div>
<a name="available-commands"></a>

| Command | Description |
| :--- | :--- |
| apply<br>✨🚀🧠🤖🔣👆 | Apply series of string, date, math & currency transformations to given CSV column/s. It also has some basic NLP functions (similarity, sentiment analysis, profanity, eudex, language & name gender detection). |
| <a name="applydp_deeplink"></a>applydp<br>🚀🔣👆 CKAN | applydp is a slimmed-down version of apply with only Datapusher+ relevant subcommands/operations (qsvdp binary variant only). |
| behead | Drop headers from a CSV. |
| cat<br>🗄️ | Concatenate CSV files by row or by column. |
| clipboard | Provide input from the clipboard or save output to the clipboard. |
| count<br>📇🏎️🐻‍❄️ | Count the rows in a CSV file. (11.87 seconds for a 15gb, 27m row NYC 311 dataset without an index. Instantaneous with an index.) If the polars feature is enabled, uses Polars' multithreaded, mem-mapped CSV reader for fast counts even without an index. |
| datefmt<br>🚀👆 | Formats recognized date fields (19 formats recognized) to a specified date format using strftime date format specifiers. |
| dedup<br>🤯🚀👆 | Remove duplicate rows (See also extdedup, extsort, sort & sortcheck commands). |
| describegpt<br>🌐🤖🪄 | Infer extended metadata about a CSV using a GPT model from OpenAI's API or an LLM from another API compatible with the OpenAI API specification such as Ollama or Jan. |
| diff<br>🚀 | Find the difference between two CSVs with ludicrous speed!<br/>e.g. compare two CSVs with 1M rows x 9 columns in under 600ms! |
| enum<br>👆 | Add a new column enumerating rows by adding a column of incremental or uuid identifiers. Can also be used to copy a column or fill a new column with a constant value. |
| excel<br>🚀 | Exports a specified Excel/ODS sheet to a CSV file. |
| exclude<br>📇👆 | Removes a set of CSV data from another set based on the specified columns. |
| explode<br>🔣👆 | Explode rows into multiple ones by splitting a column value based on the given separator. |
| extdedup | Remove duplicate rows from an arbitrarily large CSV/text file using a memory-mapped, on-disk hash table. Unlike the dedup command, this command does not load the entire file into memory nor does it sort the deduped file. |
| extsort<br>🚀 | Sort an arbitrarily large CSV/text file using a multithreaded external merge sort algorithm. |
| fetch<br>✨🧠🌐 | Fetches data from web services for every row using HTTP Get. Comes with HTTP/2 adaptive flow control, jql JSON query language support, dynamic throttling (RateLimit) & caching with available persistent caching using Redis or a disk-cache. |
| fetchpost<br>✨🧠🌐 | Similar to fetch, but uses HTTP Post. (HTTP GET vs POST methods) |
| fill<br>👆 | Fill empty values. |
| fixlengths | Force a CSV to have same-length records by either padding or truncating them. |
| flatten | A flattened view of CSV records. Useful for viewing one record at a time.<br />e.g. `qsv slice -i 5 data.csv \| qsv flatten`. |
| fmt | Reformat a CSV with different delimiters, record terminators or quoting rules. (Supports ASCII delimited data.) |
| frequency<br>📇😣🏎️👆🪄 | Build frequency tables of each column. Uses multithreading to go faster if an index is present. |
| geocode<br>✨🧠🌐🚀🔣👆 | Geocodes a location against an updatable local copy of the Geonames cities database. With caching and multi-threading, it geocodes up to 360,000 records/sec! |
| headers<br>🗄️ | Show the headers of a CSV. Or show the intersection of all headers between many CSV files. |
| index | Create an index (📇) for a CSV. This is very quick (even the 15gb, 28m row NYC 311 dataset takes all of 14 seconds to index) & provides constant time indexing/random access into the CSV. With an index, count, sample & slice work instantaneously; random access mode is enabled in luau; and multithreading (🏎️) is enabled for the frequency, split, stats, schema & tojsonl commands. |
| input | Read CSV data with special commenting, quoting, trimming, line-skipping & non-UTF8 encoding handling rules. Typically used to "normalize" a CSV for further processing with other qsv commands. |
| join<br>👆 | Inner, outer, right, cross, anti & semi joins. Automatically creates a simple, in-memory hash index to make it fast. |
| joinp<br>✨🚀🐻‍❄️ | Inner, outer, right, cross, anti, semi & asof joins using the Pola.rs engine. Unlike the join command, joinp can process files larger than RAM, is multithreaded, has join key validation, pre-join filtering, supports asof joins (which is particularly useful for time series data) & its output columns can be coalesced. However, joinp doesn't have an --ignore-case option. |
| json<br>👆 | Convert JSON to CSV. |
| jsonl<br>🚀🔣 | Convert newline-delimited JSON (JSONL/NDJSON) to CSV. See tojsonl command to convert CSV to JSONL. |
| <a name="luau_deeplink"></a>luau 👑<br>✨📇🌐🔣 CKAN | Create multiple new computed columns, filter rows, compute aggregations and build complex data pipelines by executing a Luau 0.635 expression/script for every row of a CSV file (sequential mode), or using random access with an index (random access mode).<br>Can process a single Luau expression or full-fledged data-wrangling scripts using lookup tables with discrete BEGIN, MAIN and END sections.<br>It is not just another qsv command, it is qsv's Domain-specific Language (DSL) with numerous qsv-specific helper functions to build production data pipelines. |
| partition<br>👆 | Partition a CSV based on a column value. |
| prompt | Open a file dialog to either pick a file as input or save output to a file. |
| pseudo<br>🔣👆 | Pseudonymise the values of the given column by replacing them with an incremental identifier. |
| py<br>✨🔣 | Create a new computed column or filter rows by evaluating a python expression on every row of a CSV file. Python's f-strings are particularly useful for extended formatting, [with the ability to evaluate Python expressions as
