Skip to main content

使用 QuickJS 进行数据处理

QuickJS 是一个用 C 编写的嵌入式 JS 引擎。它内置支持读取和写入存储在内存中的文件数据。

¥QuickJS is an embeddable JS engine written in C. It has built-in support for reading and writing file data stored in memory.

SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。

¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.

该演示使用 QuickJS 和 SheetJS 从电子表格中提取数据并打印 CSV 行。我们将探讨如何在 QuickJS 上下文中加载 SheetJS 并处理来自 C 程序的电子表格。

¥This demo uses QuickJS and SheetJS to pull data from a spreadsheet and print CSV rows. We'll explore how to load SheetJS in a QuickJS context and process spreadsheets from C programs.

"集成示例" 部分包括一个完整的命令行工具,用于从文件中读取数据。

¥The "Integration Example" section includes a complete command-line tool for reading data from files.

集成详情

¥Integration Details

许多 QuickJS 函数没有文档记录。该解释已根据最新版本(提交 d378a9f)进行验证。

¥Many QuickJS functions are not documented. The explanation was verified against the latest release (commit d378a9f).

初始化 QuickJS

¥Initialize QuickJS

大多数 QuickJS API 函数与 JSContext 对象 [^1] 交互,该对象通常使用 JS_NewRuntimeJS_NewContext 创建:

¥Most QuickJS API functions interact with a JSContext object[^1], which is normally created with JS_NewRuntime and JS_NewContext:

#include "quickjs.h"

/* initialize context */
JSRuntime *rt = JS_NewRuntime();
JSContext *ctx = JS_NewContext(rt);

QuickJS 通过 JS_GetGlobalObject 提供了 global 对象:

¥QuickJS provides a global object through JS_GetGlobalObject:

/* obtain reference to global object */
JSValue global = JS_GetGlobalObject(ctx);
Cleanup (click to show)

Once finished, programs are expected to cleanup by using JS_FreeValue to free values, JS_FreeContext to free the context pointer, and JS_FreeRuntime to free the runtime:

/* global is a JSValue */
JS_FreeValue(ctx, global);

/* cleanup */
JS_FreeContext(ctx);
JS_FreeRuntime(rt);

The Integration Example frees JS values after use.

加载 SheetJS 脚本

¥Load SheetJS Scripts

SheetJS 独立脚本 可以在 QuickJS 中加载并执行。

¥SheetJS Standalone scripts can be loaded and executed in QuickJS.

可以通过从文件系统读取脚本并使用 JS_Eval 在 QuickJS 上下文中进行评估来加载主库:

¥The main library can be loaded by reading the script from the file system and evaluating in the QuickJS context using JS_Eval:

static char *read_file(const char *filename, size_t *sz) {
FILE *f = fopen(filename, "rb");
if(!f) return NULL;
long fsize; { fseek(f, 0, SEEK_END); fsize = ftell(f); fseek(f, 0, SEEK_SET); }
char *buf = (char *)malloc(fsize * sizeof(char));
*sz = fread((void *) buf, 1, fsize, f);
fclose(f);
return buf;
}

// ...
{
/* Read `xlsx.full.min.js` from the filesystem */
size_t len; char *buf = read_file("xlsx.full.min.js", &len);
/* evaluate from the QuickJS context */
JS_Eval(ctx, buf, len, "<input>", 0);
/* Free the file buffer */
free(buf);
}

如果加载了库,XLSX.version 将是一个字符串。可以将该字符串拉入主 C 程序中。

¥If the library is loaded, XLSX.version will be a string. This string can be pulled into the main C program.

  1. 使用 JS_GetPropertyStr 获取全局对象的 XLSX 属性:

    ¥Get the XLSX property of the global object using JS_GetPropertyStr:

/* obtain reference to the XLSX object */
JSValue XLSX = JS_GetPropertyStr(ctx, global, "XLSX");
  1. 使用 JS_GetPropertyStr 获取 XLSX 对象的 version 属性:

    ¥Get the version property of the XLSX object using JS_GetPropertyStr:

/* obtain reference to `XLSX.version` */
JSValue version = JS_GetPropertyStr(ctx, XLSX, "version");
  1. 使用 JS_ToCStringLen 将字符串拉入 C 代码:

    ¥Pull the string into C code with JS_ToCStringLen:

/* pull the version string into C */
size_t vlen; const char *vers = JS_ToCStringLen(ctx, &vlen, version);
printf("Version: %s\n", vers);

读取文件

¥Reading Files

JS_NewArrayBuffer 可以从 C 字节数组生成 ArrayBuffer。函数签名需要 uint8_t * 而不是 char *

¥JS_NewArrayBuffer can generate an ArrayBuffer from a C byte array. The function signature expects uint8_t * instead of char *:

/* read file */
size_t dlen; uint8_t * dbuf = (uint8_t *)read_file("pres.numbers", &dlen);

/* load data into array buffer */
JSValue ab = JS_NewArrayBuffer(ctx, dbuf, dlen, NULL, NULL, 0);

ArrayBuffer 将使用 SheetJS read 方法 [^2] 进行解析。CSV 行数据将使用 sheet_to_csv[^3] 生成。

¥The ArrayBuffer will be parsed with the SheetJS read method[^2]. The CSV row data will be generated with sheet_to_csv[^3].

解析 ArrayBuffer

¥Parse the ArrayBuffer

目标是运行与以下 JavaScript 代码等效的内容:

¥The goal is to run the equivalent of the following JavaScript code:

/* `ab` is the `ArrayBuffer` from the previous step */
var wb = XLSX.read(ab);
  1. 获取全局对象的 XLSX 属性和 XLSXread 属性:

    ¥Get the XLSX property of the global object and the read property of XLSX:

/* obtain reference to XLSX.read */
JSValue XLSX = JS_GetPropertyStr(ctx, global, "XLSX");
JSValue XLSX_read = JS_GetPropertyStr(ctx, XLSX, "read");
  1. 创建要传递给函数的参数数组。在这种情况下,将使用一个参数(ArrayBuffer 数据)调用 read 函数:

    ¥Create an array of arguments to pass to the function. In this case, the read function will be called with one argument (ArrayBuffer data):

/* prepare arguments */
JSValue args[] = { ab };
  1. 使用 JS_Call 来调用带参数的函数:

    ¥Use JS_Call to call the function with the arguments:

/* call XLSX.read(ab) */
JSValue wb = JS_Call(ctx, XLSX_read, XLSX, 1, args);

获取第一个工作表

¥Get First Worksheet

目标是获得第一个工作表。在 JavaScript 中,工作簿的 SheetNames 属性是一个字符串数组,Sheets 属性保存工作表对象 [^4]。所需的操作如下所示:

¥The goal is to get the first worksheet. In JavaScript, the SheetNames property of the workbook is an array of strings and the Sheets property holds worksheet objects[^4]. The desired action looks like:

/* `wb` is the workbook from the previous step */
var wsname = wb.SheetNames[0];
var ws = wb.Sheets[wsname];
  1. 使用 JS_GetPropertyStrwb.SheetNames[0] 拉入 C 字符串:

    ¥Pull wb.SheetNames[0] into a C string using JS_GetPropertyStr:

/* get `wb.SheetNames[0]` */
JSValue SheetNames = JS_GetPropertyStr(ctx, wb, "SheetNames");
JSValue Sheet1 = JS_GetPropertyStr(ctx, SheetNames, "0");

/* pull first sheet name into C code */
size_t wslen; const char *wsname = JS_ToCStringLen(ctx, &wslen, Sheet1);
  1. 获取工作表对象:

    ¥Get the worksheet object:

/* get wb.Sheets[wsname] */
JSValue Sheets = JS_GetPropertyStr(ctx, wb, "Sheets");
JSValue ws = JS_GetPropertyStr(ctx, Sheets, wsname);

转换为 CSV

¥Convert to CSV

目标是调用 sheet_to_csv[^5] 并将结果提取到 C 代码中:

¥The goal is to call sheet_to_csv[^5] and pull the result into C code:

/* `ws` is the worksheet from the previous step */
var csv = XLSX.utils.sheet_to_csv(ws);
  1. 创建对 XLSX.utilsXLSX.utils.sheet_to_csv 的引用:

    ¥Create a references to XLSX.utils and XLSX.utils.sheet_to_csv:

/* obtain reference to XLSX.utils.sheet_to_csv */
JSValue utils = JS_GetPropertyStr(ctx, XLSX, "utils");
JSValue sheet_to_csv = JS_GetPropertyStr(ctx, utils, "sheet_to_csv");
  1. 创建参数数组:

    ¥Create arguments array:

/* prepare arguments */
JSValue args[] = { ws };
  1. 使用 JS_Call 调用该函数并使用 JS_ToCStringLen 提取 CSV:

    ¥Use JS_Call to call the function and use JS_ToCStringLen to pull the CSV:

JSValue csv = JS_Call(ctx, sheet_to_csv, utils, 1, args);
size_t csvlen; const char *csvstr = JS_ToCStringLen(ctx, &csvlen, csv);

此时,csvstr 是一个可以打印到标准输出的 C 字符串。

¥At this point, csvstr is a C string that can be printed to standard output.

完整示例

¥Complete Example

"集成示例" 涵盖了 C 应用中的传统集成,而 "CLI 测试" 使用 quickjs CLI 工具演示了其他概念。

¥The "Integration Example" covers a traditional integration in a C application, while the "CLI Test" demonstrates other concepts using the quickjs CLI tool.

集成示例

¥Integration Example

测试部署

该演示在以下部署中进行了测试:

¥This demo was tested in the following deployments:

架构Git 提交日期
darwin-x646a89d7c2024-03-15
darwin-armd378a9f2024-05-23
win10-x649e561d52024-03-04
win11-armd378a9f2024-05-25
linux-x643b45d152024-04-25
linux-armd378a9f2024-05-25

测试演示时,d378a9fmaster 分支上的 HEAD 提交。

¥When the demo was tested, d378a9f was the HEAD commit on the master branch.

QuickJS 并不正式支持 Windows。win10-x64win11-arm 测试完全在 Linux 的 Windows 子系统中运行。

¥QuickJS does not officially support Windows. The win10-x64 and win11-arm tests were run entirely within Windows Subsystem for Linux.

  1. 构建 libquickjs.a

    ¥Build libquickjs.a:

git clone https://github.com/bellard/quickjs
cd quickjs
git checkout d378a9f
make
cd ..
  1. libquickjs.aquickjs.h 复制到工作目录中:

    ¥Copy libquickjs.a and quickjs.h into the working directory:

cp quickjs/libquickjs.a .
cp quickjs/quickjs.h .
  1. 下载 sheetjs.quick.c

    ¥Download sheetjs.quick.c:

curl -LO https://xlsx.nodejs.cn/quickjs/sheetjs.quick.c
  1. 构建示例应用:

    ¥Build the sample application:

gcc -o sheetjs.quick -Wall sheetjs.quick.c libquickjs.a -lm

该程序尝试解析第一个参数指定的文件

¥This program tries to parse the file specified by the first argument

  1. 下载 SheetJS 独立脚本和测试文件。将这两个文件保存在项目目录中:

    ¥Download the SheetJS Standalone script and test file. Save both files in the project directory:

curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.numbers
  1. 运行测试程序:

    ¥Run the test program:

./sheetjs.quick pres.numbers

如果成功,程序将以 CSV 行的形式打印库版本号、文件大小、第一个工作表名称和第一个工作表的内容。

¥If successful, the program will print the library version number, file size, first worksheet name, and the contents of the first sheet as CSV rows.

CLI 测试

¥CLI Test

测试部署

本 demo 在以下环境下进行了测试:

¥This demo was tested in the following environments:

Git 提交日期
d378a9f2024-05-23

测试演示时,d378a9fmaster 分支上的 HEAD 提交。

¥When the demo was tested, d378a9f was the HEAD commit on the master branch.

  1. 从源代码构建 qjs 命令行实用程序:

    ¥Build the qjs command line utility from source:

git clone https://github.com/bellard/quickjs
cd quickjs
git checkout d378a9f
make
cd ..
cp quickjs/qjs .
  1. 下载 SheetJS Standalone 脚本和测试文件。将这两个文件保存在项目目录中:

    ¥Download the SheetJS Standalone script and the test file. Save both files in the project directory:

curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.numbers
  1. 下载 SheetJSQuick.js

    ¥Download SheetJSQuick.js

curl -LO https://xlsx.nodejs.cn/quickjs/SheetJSQuick.js
  1. 测试程序:

    ¥Test the program:

./qjs SheetJSQuick.js

如果成功,该脚本将打印 CSV 行并生成 SheetJSQuick.xlsx.txt。生成的文件可以在 Excel 或其他电子表格编辑器中打开。

¥If successful, the script will print CSV rows and generate SheetJSQuick.xlsx. The generated file can be opened in Excel or another spreadsheet editor.

[^1]: 请参阅 QuickJS 文档中的 "运行时和上下文"

¥See "Runtime and Contexts" in the QuickJS documentation

[^2]: 见 read 于 "读取文件"

¥See read in "Reading Files"

[^3]: 见 sheet_to_csv 于 "CSV 和文本"

¥See sheet_to_csv in "CSV and Text"

[^4]: 见 "工作簿对象" 于 "SheetJS 数据模型"

¥See "Workbook Object" in "SheetJS Data Model"

[^5]: 见 sheet_to_csv 于 "CSV 和文本"

¥See sheet_to_csv in "CSV and Text"