使用 QuickJS 进行数据处理
QuickJS 是一个用 C 编写的嵌入式 JS 引擎。它内置支持读取和写入存储在内存中的文件数据。
¥QuickJS is an embeddable JS engine written in C. It has built-in support for reading and writing file data stored in memory.
SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。
¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.
该演示使用 QuickJS 和 SheetJS 从电子表格中提取数据并打印 CSV 行。我们将探讨如何在 QuickJS 上下文中加载 SheetJS 并处理来自 C 程序的电子表格。
¥This demo uses QuickJS and SheetJS to pull data from a spreadsheet and print CSV rows. We'll explore how to load SheetJS in a QuickJS context and process spreadsheets from C programs.
"集成示例" 部分包括一个完整的命令行工具,用于从文件中读取数据。
¥The "Integration Example" section includes a complete command-line tool for reading data from files.
集成详情
¥Integration Details
许多 QuickJS 函数没有文档记录。该解释已根据最新版本(提交 d378a9f
)进行验证。
¥Many QuickJS functions are not documented. The explanation was verified against
the latest release (commit d378a9f
).
初始化 QuickJS
¥Initialize QuickJS
大多数 QuickJS API 函数与 JSContext
对象 [^1] 交互,该对象通常使用 JS_NewRuntime
和 JS_NewContext
创建:
¥Most QuickJS API functions interact with a JSContext
object[^1], which is
normally created with JS_NewRuntime
and JS_NewContext
:
#include "quickjs.h"
/* initialize context */
JSRuntime *rt = JS_NewRuntime();
JSContext *ctx = JS_NewContext(rt);
QuickJS 通过 JS_GetGlobalObject
提供了 global
对象:
¥QuickJS provides a global
object through JS_GetGlobalObject
:
/* obtain reference to global object */
JSValue global = JS_GetGlobalObject(ctx);
Cleanup (click to show)
Once finished, programs are expected to cleanup by using JS_FreeValue
to free
values, JS_FreeContext
to free the context pointer, and JS_FreeRuntime
to
free the runtime:
/* global is a JSValue */
JS_FreeValue(ctx, global);
/* cleanup */
JS_FreeContext(ctx);
JS_FreeRuntime(rt);
The Integration Example frees JS values after use.
加载 SheetJS 脚本
¥Load SheetJS Scripts
SheetJS 独立脚本 可以在 QuickJS 中加载并执行。
¥SheetJS Standalone scripts can be loaded and executed in QuickJS.
可以通过从文件系统读取脚本并使用 JS_Eval
在 QuickJS 上下文中进行评估来加载主库:
¥The main library can be loaded by reading the script from the file system and
evaluating in the QuickJS context using JS_Eval
:
static char *read_file(const char *filename, size_t *sz) {
FILE *f = fopen(filename, "rb");
if(!f) return NULL;
long fsize; { fseek(f, 0, SEEK_END); fsize = ftell(f); fseek(f, 0, SEEK_SET); }
char *buf = (char *)malloc(fsize * sizeof(char));
*sz = fread((void *) buf, 1, fsize, f);
fclose(f);
return buf;
}
// ...
{
/* Read `xlsx.full.min.js` from the filesystem */
size_t len; char *buf = read_file("xlsx.full.min.js", &len);
/* evaluate from the QuickJS context */
JS_Eval(ctx, buf, len, "<input>", 0);
/* Free the file buffer */
free(buf);
}
如果加载了库,XLSX.version
将是一个字符串。可以将该字符串拉入主 C 程序中。
¥If the library is loaded, XLSX.version
will be a string. This string can be
pulled into the main C program.
-
使用
JS_GetPropertyStr
获取全局对象的XLSX
属性:¥Get the
XLSX
property of the global object usingJS_GetPropertyStr
:
/* obtain reference to the XLSX object */
JSValue XLSX = JS_GetPropertyStr(ctx, global, "XLSX");
-
使用
JS_GetPropertyStr
获取XLSX
对象的version
属性:¥Get the
version
property of theXLSX
object usingJS_GetPropertyStr
:
/* obtain reference to `XLSX.version` */
JSValue version = JS_GetPropertyStr(ctx, XLSX, "version");
-
使用
JS_ToCStringLen
将字符串拉入 C 代码:¥Pull the string into C code with
JS_ToCStringLen
:
/* pull the version string into C */
size_t vlen; const char *vers = JS_ToCStringLen(ctx, &vlen, version);
printf("Version: %s\n", vers);
读取文件
¥Reading Files
JS_NewArrayBuffer
可以从 C 字节数组生成 ArrayBuffer
。函数签名需要 uint8_t *
而不是 char *
:
¥JS_NewArrayBuffer
can generate an ArrayBuffer
from a C byte array. The
function signature expects uint8_t *
instead of char *
:
/* read file */
size_t dlen; uint8_t * dbuf = (uint8_t *)read_file("pres.numbers", &dlen);
/* load data into array buffer */
JSValue ab = JS_NewArrayBuffer(ctx, dbuf, dlen, NULL, NULL, 0);
ArrayBuffer
将使用 SheetJS read
方法 [^2] 进行解析。CSV 行数据将使用 sheet_to_csv
[^3] 生成。
¥The ArrayBuffer
will be parsed with the SheetJS read
method[^2]. The CSV row
data will be generated with sheet_to_csv
[^3].
解析 ArrayBuffer
¥Parse the ArrayBuffer
目标是运行与以下 JavaScript 代码等效的内容:
¥The goal is to run the equivalent of the following JavaScript code:
/* `ab` is the `ArrayBuffer` from the previous step */
var wb = XLSX.read(ab);
-
获取全局对象的
XLSX
属性和XLSX
的read
属性:¥Get the
XLSX
property of the global object and theread
property ofXLSX
:
/* obtain reference to XLSX.read */
JSValue XLSX = JS_GetPropertyStr(ctx, global, "XLSX");
JSValue XLSX_read = JS_GetPropertyStr(ctx, XLSX, "read");
-
创建要传递给函数的参数数组。在这种情况下,将使用一个参数(
ArrayBuffer
数据)调用read
函数:¥Create an array of arguments to pass to the function. In this case, the
read
function will be called with one argument (ArrayBuffer
data):
/* prepare arguments */
JSValue args[] = { ab };
-
使用
JS_Call
来调用带参数的函数:¥Use
JS_Call
to call the function with the arguments:
/* call XLSX.read(ab) */
JSValue wb = JS_Call(ctx, XLSX_read, XLSX, 1, args);
获取第一个工作表
¥Get First Worksheet
目标是获得第一个工作表。在 JavaScript 中,工作簿的 SheetNames
属性是一个字符串数组,Sheets
属性保存工作表对象 [^4]。所需的操作如下所示:
¥The goal is to get the first worksheet. In JavaScript, the SheetNames
property
of the workbook is an array of strings and the Sheets
property holds worksheet
objects[^4]. The desired action looks like:
/* `wb` is the workbook from the previous step */
var wsname = wb.SheetNames[0];
var ws = wb.Sheets[wsname];
-
使用
JS_GetPropertyStr
将wb.SheetNames[0]
拉入 C 字符串:¥Pull
wb.SheetNames[0]
into a C string usingJS_GetPropertyStr
:
/* get `wb.SheetNames[0]` */
JSValue SheetNames = JS_GetPropertyStr(ctx, wb, "SheetNames");
JSValue Sheet1 = JS_GetPropertyStr(ctx, SheetNames, "0");
/* pull first sheet name into C code */
size_t wslen; const char *wsname = JS_ToCStringLen(ctx, &wslen, Sheet1);
-
获取工作表对象:
¥Get the worksheet object:
/* get wb.Sheets[wsname] */
JSValue Sheets = JS_GetPropertyStr(ctx, wb, "Sheets");
JSValue ws = JS_GetPropertyStr(ctx, Sheets, wsname);
转换为 CSV
¥Convert to CSV
目标是调用 sheet_to_csv
[^5] 并将结果提取到 C 代码中:
¥The goal is to call sheet_to_csv
[^5] and pull the result into C code:
/* `ws` is the worksheet from the previous step */
var csv = XLSX.utils.sheet_to_csv(ws);
-
创建对
XLSX.utils
和XLSX.utils.sheet_to_csv
的引用:¥Create a references to
XLSX.utils
andXLSX.utils.sheet_to_csv
:
/* obtain reference to XLSX.utils.sheet_to_csv */
JSValue utils = JS_GetPropertyStr(ctx, XLSX, "utils");
JSValue sheet_to_csv = JS_GetPropertyStr(ctx, utils, "sheet_to_csv");
-
创建参数数组:
¥Create arguments array:
/* prepare arguments */
JSValue args[] = { ws };
-
使用
JS_Call
调用该函数并使用JS_ToCStringLen
提取 CSV:¥Use
JS_Call
to call the function and useJS_ToCStringLen
to pull the CSV:
JSValue csv = JS_Call(ctx, sheet_to_csv, utils, 1, args);
size_t csvlen; const char *csvstr = JS_ToCStringLen(ctx, &csvlen, csv);
此时,csvstr
是一个可以打印到标准输出的 C 字符串。
¥At this point, csvstr
is a C string that can be printed to standard output.
完整示例
¥Complete Example
"集成示例" 涵盖了 C 应用中的传统集成,而 "CLI 测试" 使用 quickjs
CLI 工具演示了其他概念。
¥The "Integration Example" covers a traditional integration in a C application,
while the "CLI Test" demonstrates other concepts using the quickjs
CLI tool.
集成示例
¥Integration Example
该演示在以下部署中进行了测试:
¥This demo was tested in the following deployments:
架构 | Git 提交 | 日期 |
---|---|---|
darwin-x64 | 6a89d7c | 2024-03-15 |
darwin-arm | d378a9f | 2024-05-23 |
win10-x64 | 9e561d5 | 2024-03-04 |
win11-arm | d378a9f | 2024-05-25 |
linux-x64 | 3b45d15 | 2024-04-25 |
linux-arm | d378a9f | 2024-05-25 |
测试演示时,d378a9f
是 master
分支上的 HEAD 提交。
¥When the demo was tested, d378a9f
was the HEAD commit on the master
branch.
QuickJS 并不正式支持 Windows。win10-x64
和 win11-arm
测试完全在 Linux 的 Windows 子系统中运行。
¥QuickJS does not officially support Windows. The win10-x64
and win11-arm
tests were run entirely within Windows Subsystem for Linux.
-
构建
libquickjs.a
:¥Build
libquickjs.a
:
git clone https://github.com/bellard/quickjs
cd quickjs
git checkout d378a9f
make
cd ..
-
将
libquickjs.a
和quickjs.h
复制到工作目录中:¥Copy
libquickjs.a
andquickjs.h
into the working directory:
cp quickjs/libquickjs.a .
cp quickjs/quickjs.h .
-
下载
sheetjs.quick.c
:¥Download
sheetjs.quick.c
:
curl -LO https://xlsx.nodejs.cn/quickjs/sheetjs.quick.c
-
构建示例应用:
¥Build the sample application:
gcc -o sheetjs.quick -Wall sheetjs.quick.c libquickjs.a -lm
该程序尝试解析第一个参数指定的文件
¥This program tries to parse the file specified by the first argument
-
下载 SheetJS 独立脚本和测试文件。将这两个文件保存在项目目录中:
¥Download the SheetJS Standalone script and test file. Save both files in the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.numbers
-
运行测试程序:
¥Run the test program:
./sheetjs.quick pres.numbers
如果成功,程序将以 CSV 行的形式打印库版本号、文件大小、第一个工作表名称和第一个工作表的内容。
¥If successful, the program will print the library version number, file size, first worksheet name, and the contents of the first sheet as CSV rows.
CLI 测试
¥CLI Test
本 demo 在以下环境下进行了测试:
¥This demo was tested in the following environments:
Git 提交 | 日期 |
---|---|
d378a9f | 2024-05-23 |
测试演示时,d378a9f
是 master
分支上的 HEAD 提交。
¥When the demo was tested, d378a9f
was the HEAD commit on the master
branch.
-
从源代码构建
qjs
命令行实用程序:¥Build the
qjs
command line utility from source:
git clone https://github.com/bellard/quickjs
cd quickjs
git checkout d378a9f
make
cd ..
cp quickjs/qjs .
-
下载 SheetJS Standalone 脚本和测试文件。将这两个文件保存在项目目录中:
¥Download the SheetJS Standalone script and the test file. Save both files in the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.numbers
-
¥Download
SheetJSQuick.js
curl -LO https://xlsx.nodejs.cn/quickjs/SheetJSQuick.js
-
测试程序:
¥Test the program:
./qjs SheetJSQuick.js
如果成功,该脚本将打印 CSV 行并生成 SheetJSQuick.xlsx
.txt。生成的文件可以在 Excel 或其他电子表格编辑器中打开。
¥If successful, the script will print CSV rows and generate SheetJSQuick.xlsx
.
The generated file can be opened in Excel or another spreadsheet editor.
[^1]: 请参阅 QuickJS 文档中的 "运行时和上下文"
¥See "Runtime and Contexts" in the QuickJS documentation
[^2]: 见 read
于 "读取文件"
[^3]: 见 sheet_to_csv
于 "CSV 和文本"
¥See sheet_to_csv
in "CSV and Text"
[^4]: 见 "工作簿对象" 于 "SheetJS 数据模型"
¥See "Workbook Object" in "SheetJS Data Model"
[^5]: 见 sheet_to_csv
于 "CSV 和文本"