Blazing Fast Data Processing with V8

V8 is an embeddable JavaScript engine written in C++. It powers Chromium and Chrome, NodeJS and Deno, Adobe UXP and other platforms.

SheetJS is a JavaScript library for reading and writing data from spreadsheets.

This demo uses V8 and SheetJS to read and write spreadsheets. We'll explore how to load SheetJS in a V8 context and process spreadsheets and structured data from C++ and Rust programs.

The "Complete Example" creates a C++ command-line tool for reading spreadsheet files and generating new workbooks. "Bindings" covers V8 engine bindings for other programming languages.

Integration Details

The SheetJS Standalone scripts can be parsed and evaluated in a V8 context.

This section describes a flow where the script is parsed and evaluated each time the program is run.

Using V8 snapshots, SheetJS libraries can be parsed and evaluated at build time. This greatly improves program startup time.

The "Snapshots" section includes a complete example.

Initialize V8

The official V8 hello-world example covers initialization and cleanup. For the purposes of this demo, the key variables are noted below:

v8::Isolate* isolate = v8::Isolate::New(create_params);
v8::Local<v8::Context> context = v8::Context::New(isolate);

The following helper function evaluates C strings as JS code:

v8::Local<v8::Value> eval_code(v8::Isolate *isolate, v8::Local<v8::Context> context, const char *code, int sz = -1) {
  /* a length of -1 tells V8 to treat `code` as a NUL-terminated string */
  v8::Local<v8::String> source = v8::String::NewFromUtf8(isolate, code, v8::NewStringType::kNormal, sz).ToLocalChecked();
  v8::Local<v8::Script> script = v8::Script::Compile(context, source).ToLocalChecked();
  return script->Run(context).ToLocalChecked();
}

Load SheetJS Scripts

The main library can be loaded by reading the script from the file system and evaluating it in the V8 context:

/* simple wrapper to read the entire script file */
static char *read_file(const char *filename, size_t *sz) {
  FILE *f = fopen(filename, "rb");
  if(!f) return NULL;
  long fsize; { fseek(f, 0, SEEK_END); fsize = ftell(f); fseek(f, 0, SEEK_SET); }
  char *buf = (char *)malloc(fsize * sizeof(char));
  *sz = fread((void *) buf, 1, fsize, f);
  fclose(f);
  return buf;
}

// ...
size_t sz; char *file = read_file("xlsx.full.min.js", &sz);
v8::Local<v8::Value> result = eval_code(isolate, context, file, sz);
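
The read_file helper above returns a malloc'd buffer that the caller must free, and it does not check the result of ftell. A minimal sketch of a stricter variant follows, assuming standard library containers are acceptable in the host program. The name read_file_vec is hypothetical and is not part of the demo source:

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

/* hypothetical helper (not part of the demo): read an entire file into a vector */
/* returns true on success; on failure `out` is left empty */
static bool read_file_vec(const char *filename, std::vector<char> &out) {
  assert(filename != nullptr);
  out.clear();
  FILE *f = fopen(filename, "rb");
  if(!f) return false;
  bool ok = false;
  if(fseek(f, 0, SEEK_END) == 0) {
    long fsize = ftell(f);
    if(fsize >= 0 && fseek(f, 0, SEEK_SET) == 0) {
      out.resize(static_cast<size_t>(fsize));
      /* skip fread for empty files; otherwise require a full read */
      size_t n = out.empty() ? 0 : fread(out.data(), 1, out.size(), f);
      ok = (n == out.size());
    }
  }
  fclose(f);
  if(!ok) out.clear();
  return ok;
}
```

With this variant, the script bytes could be handed to the eval_code helper as data.data() with length data.size(), and the vector releases its memory automatically when it goes out of scope.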

To confirm the library is loaded, XLSX.version can be inspected:

/* get version string */
v8::Local<v8::Value> result = eval_code(isolate, context, "XLSX.version");
v8::String::Utf8Value vers(isolate, result);
printf("SheetJS library version %s\n", *vers);

Reading Files

V8 supports ArrayBuffer natively. Assuming buf is a C byte array with length len, the following code stores the data in a global ArrayBuffer named buf:

Loading data into an ArrayBuffer in the V8 engine
/* load C char array and save to an ArrayBuffer */
std::unique_ptr<v8::BackingStore> back = v8::ArrayBuffer::NewBackingStore(isolate, len);
memcpy(back->Data(), buf, len);
v8::Local<v8::ArrayBuffer> ab = v8::ArrayBuffer::New(isolate, std::move(back));
v8::Maybe<bool> res = context->Global()->Set(context, v8::String::NewFromUtf8Literal(isolate, "buf"), ab);

Once the raw data is pulled into the engine, the SheetJS read method[^1] can parse the data. It is recommended to attach the result to a global variable:

/* parse with SheetJS */
v8::Local<v8::Value> result = eval_code(isolate, context, "globalThis.wb = XLSX.read(buf)");

wb, a SheetJS workbook object[^2], will be a variable in the JS environment that can be inspected using the various SheetJS API functions[^3].

Writing Files

The SheetJS write method[^4] generates file bytes from workbook objects. The array type[^5] instructs the library to generate ArrayBuffer objects:

/* write with SheetJS using type: "array" */
v8::Local<v8::Value> result = eval_code(isolate, context, "XLSX.write(wb, {type:'array', bookType:'xlsb'})");

The underlying memory from an ArrayBuffer can be pulled from the engine:

Pulling raw bytes from an ArrayBuffer
/* pull result back to C++ */
v8::Local<v8::ArrayBuffer> ab = v8::Local<v8::ArrayBuffer>::Cast(result);
size_t sz = ab->ByteLength();
char *buf = static_cast<char *>(ab->Data()); /* Data() returns void* */

The resulting buf can be written to file with fwrite.
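
The fwrite step can be sketched as a small helper. This is an illustration rather than the demo's actual code: the name write_file is hypothetical, and the sketch assumes buf and sz come from the snippet above. Because the pointer refers to memory that belongs to the ArrayBuffer, the bytes should be written (or copied) while the ArrayBuffer handle is still alive.

```cpp
#include <cassert>
#include <cstdio>

/* hypothetical helper (not part of the demo): write sz bytes from buf to filename */
/* returns true only if every byte was written and the file closed cleanly */
static bool write_file(const char *filename, const char *buf, size_t sz) {
  assert(filename != nullptr);
  assert(buf != nullptr || sz == 0);
  FILE *f = fopen(filename, "wb");
  if(!f) return false;
  size_t n = sz ? fwrite(buf, 1, sz, f) : 0;
  bool ok = (n == sz);
  if(fclose(f) != 0) ok = false;
  return ok;
}
```

A call such as write_file("sheetjsw.xlsb", buf, sz) would save the generated workbook under the file name used in the complete example.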

Complete Example

Tested Deployments

This demo was tested in the following deployments:

V8 Version | Platform | OS Version | Compiler | Date
13.3.228 | darwin-x64 | macOS 15.1.1 | clang 16.0.0 | 2024-12-03
13.5.92 | darwin-arm | macOS 14.5 | clang 16.0.0 | 2025-02-15
12.7.130 | win11-x64 | Windows 11 | CL 19.42.34435 | 2024-12-20
12.7.130 | linux-x64 | HoloOS 3.6.20 | gcc 13.2.1 | 2025-01-02
13.5.92 | linux-arm | Debian 12 | gcc 12.2.0 | 2025-02-15

This program parses a file and prints CSV data from the first worksheet. It also generates an XLSB file and writes it to the filesystem.

When the demo was last tested, there were errors in the official V8 embed guide. Corrected instructions are included below.

The build process is long and will test your patience.

Preparation

  1. Prepare /usr/local/lib:

mkdir -p /usr/local/lib
cd /usr/local/lib

If this step throws a permission error, run the following commands:

sudo mkdir -p /usr/local/lib
sudo chmod 777 /usr/local/lib

  1. Download and install depot_tools:

rm -rf depot_tools
git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

If this step throws a permission error, run the following commands and retry:

sudo mkdir -p /usr/local/lib
sudo chmod 777 /usr/local/lib

  1. Add the depot_tools path to the PATH environment variable:

export PATH="/usr/local/lib/depot_tools:$PATH"

At this point, it is strongly recommended to add the line to a shell startup script such as .bashrc or .zshrc.

  1. Run gclient once to update depot_tools:

gclient

Clone V8

  1. Create a base directory and fetch the V8 source:

mkdir -p ~/dev/v8
cd ~/dev/v8
fetch v8
cd v8

Note that the actual repo will be placed in ~/dev/v8/v8.

  1. Check out the desired version. The following command pulls 13.5.92:

git checkout tags/13.5.92 -b sample

The official documentation recommends:

git checkout refs/tags/13.5.92 -b sample -t

This command failed in local testing:

E:\v8\v8>git checkout refs/tags/13.5.92 -b sample -t
fatal: cannot set up tracking information; starting point 'refs/tags/13.5.92' is not a branch

Build V8

  1. Build the static library:

tools/dev/v8gen.py x64.release.sample
ninja -C out.gn/x64.release.sample v8_monolith

This does not work in newer Python releases due to a breaking change!

Python 3.13 removed the pipes module from the standard library[^9]. v8gen.py will fail on newer Python releases with the following traceback:

Traceback (most recent call last):
  File "/Users/sheetjs/dev/v8/v8/tools/dev/v8gen.py", line 53, in <module>
    import mb
  File "/Users/sheetjs/dev/v8/v8/tools/mb/mb.py", line 21, in <module>
    import pipes
ModuleNotFoundError: No module named 'pipes'

The recommended workaround is to use Homebrew to install and use Python 3.12:

brew install python@3.12
export PATH="$(brew --prefix)/opt/python@3.12/libexec/bin:$PATH"

After applying the workaround, the build commands will run.

  1. Ensure the sample hello-world compiles and runs:

g++ -I. -Iinclude samples/hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-ldl -Lout.gn/x64.release.sample/obj/ -pthread \
-std=c++20 -DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

In older V8 versions, the flags -lv8_libbase -lv8_libplatform were required.

Linking against libv8_libbase or libv8_libplatform in V8 version 12.4.253 elicited linker errors:

ld: multiple errors: unknown file type in '/Users/sheetjs/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/sheetjs/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

Prepare Project

  1. Make a new project folder:

cd ~/dev
mkdir -p sheetjs-v8
cd sheetjs-v8

  1. Copy the sample source:

cp ~/dev/v8/v8/samples/hello-world.cc .

  1. Create symbolic links to the include headers and obj library folders:

ln -s ~/dev/v8/v8/include
ln -s ~/dev/v8/v8/out.gn/x64.release.sample/obj

  1. Build and run the hello-world example from this folder:

g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-lv8_libbase -lv8_libplatform -ldl -Lobj/ -pthread -std=c++20 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

In some V8 versions, the command failed in the linker stage:

ld: multiple errors: unknown file type in '/Users/sheetjs/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/sheetjs/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

The build succeeds after removing the -lv8_libbase and -lv8_libplatform flags:

g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-ldl -Lobj/ -pthread -std=c++20 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX
./hello_world

On macOS, in some V8 versions, the Foundation framework is required:

g++ -I. -Iinclude hello-world.cc -o hello_world -fno-rtti -lv8_monolith \
-ldl -Lobj/ -pthread -std=c++20 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX -framework Foundation
./hello_world

Add SheetJS

  1. Download the SheetJS Standalone script and test file. Save both files in the project directory:

curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.numbers

  1. Download sheetjs.v8.cc:

curl -LO https://xlsx.nodejs.cn/v8/sheetjs.v8.cc

  1. Compile the standalone sheetjs.v8 binary:

g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
-lv8_libbase -lv8_libplatform -ldl -Lobj/ -pthread -std=c++20 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX

In some V8 versions, the command failed in the linker stage:

ld: multiple errors: unknown file type in '/Users/sheetjs/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libplatform.a'; unknown file type in '/Users/sheetjs/dev/v8/v8/out.gn/x64.release.sample/obj/libv8_libbase.a'

The build succeeds after removing the -lv8_libbase and -lv8_libplatform flags:

g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
-ldl -Lobj/ -pthread -std=c++20 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX

On macOS, in some V8 versions, the Foundation framework is required:

g++ -I. -Iinclude sheetjs.v8.cc -o sheetjs.v8 -fno-rtti -lv8_monolith \
-ldl -Lobj/ -pthread -std=c++20 \
-DV8_COMPRESS_POINTERS=1 -DV8_ENABLE_SANDBOX -framework Foundation

  1. Run the demo:

./sheetjs.v8 pres.numbers

If the program succeeds, the CSV contents will be printed to the console and the file sheetjsw.xlsb will be created. That file can be opened with Excel.

Bindings

Bindings exist for many languages. As these bindings require "native" code, they may not work on every platform.

Rust

The v8 crate[^6] provides binary builds and straightforward bindings. The Rust code is similar to the C++ code.

Pulling data from an ArrayBuffer back into Rust involves an unsafe operation:

/* assuming JS code returns an ArrayBuffer, copy result to a Vec<u8> */
fn eval_code_ab(scope: &mut v8::HandleScope, code: &str) -> Vec<u8> {
  let source = v8::String::new(scope, code).unwrap();
  let script = v8::Script::compile(scope, source, None).unwrap();
  let result: v8::Local<v8::ArrayBuffer> = script.run(scope).unwrap().try_into().unwrap();
  /* In C++, `Data` returns a pointer. Collecting data into Vec<u8> is unsafe */
  unsafe { return std::slice::from_raw_parts_mut(
    result.data().unwrap().cast::<u8>().as_ptr(),
    result.byte_length()
  ).to_vec(); }
}

Tested Deployments

This demo was last tested in the following deployments:

Architecture | V8 Crate | Date
darwin-x64 | 130.0.7 | 2025-01-19
darwin-arm | 134.3.0 | 2025-02-13
win11-x64 | 130.0.2 | 2024-12-20
linux-x64 | 130.0.7 | 2025-01-09
linux-arm | 134.4.0 | 2025-02-15

  1. Create a new project:

cargo new sheetjs-rustyv8
cd sheetjs-rustyv8
cargo run

  1. Add the v8 crate:

cargo add v8
cargo run

  1. Download the SheetJS Standalone script and test file. Save both files in the project directory:

curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.numbers

  1. Download main.rs and replace src/main.rs:

curl -L -o src/main.rs https://xlsx.nodejs.cn/v8/main.rs

There was a breaking change in version 0.102.0 affecting v8::Context::new. When targeting older versions of the crate, remove the second argument:

src/main.rs
let context = v8::Context::new(handle_scope); // v8 <= 0.101.0
//let context = v8::Context::new(handle_scope, Default::default()); // v8 >= 0.102.0

  1. Build and run the app:

cargo run pres.numbers

If the program succeeds, the CSV contents will be printed to the console and the file sheetjsw.xlsb will be created. That file can be opened with Excel.

Java

Javet is a Java binding to the V8 engine. Javet simplifies conversions between Java data structures and V8 equivalents.

Java byte arrays (byte[]) are projected in V8 as Int8Array. The SheetJS read method expects a Uint8Array. The following script snippet performs a zero-copy conversion:

Zero-copy conversion from Int8Array to Uint8Array
// assuming `i8` is an Int8Array
const u8 = new Uint8Array(i8.buffer, i8.byteOffset, i8.length);

Tested Deployments

This demo was last tested in the following deployments:

Architecture | V8 Version | Javet | Java | Date
darwin-x64 | 13.2.152.16 | 4.1.1 | 22 | 2025-01-19
darwin-arm | 12.6.228.13 | 3.1.3 | 11.0.23 | 2024-06-19
win11-x64 | 12.6.228.13 | 3.1.3 | 21.0.5 | 2024-12-20
linux-x64 | 12.6.228.13 | 3.1.3 | 17.0.7 | 2024-06-20
linux-arm | 13.2.152.16 | 4.1.1 | 17.0.14 | 2025-02-16

  1. Create a new project:

mkdir sheetjs-javet
cd sheetjs-javet

  1. Download the Javet JARs. There are different archives for different platforms. The following commands download the macOS x86_64 archive:

curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet/4.1.1/javet-4.1.1.jar
curl -LO https://repo1.maven.org/maven2/com/caoccao/javet/javet-v8-macos-x86_64/4.1.1/javet-v8-macos-x86_64-4.1.1.jar

  1. Download the SheetJS Standalone script and test file. Save both files in the project directory:

curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.xlsx

  1. Download SheetJSJavet.java:

curl -LO https://xlsx.nodejs.cn/v8/SheetJSJavet.java

  1. Build and run the Java application:

javac -cp ".:javet-4.1.1.jar:javet-v8-macos-x86_64-4.1.1.jar" SheetJSJavet.java
java -cp ".:javet-4.1.1.jar:javet-v8-macos-x86_64-4.1.1.jar" SheetJSJavet pres.xlsx

If the program succeeds, the CSV contents will be printed to the console.

C#

ClearScript is a .NET interface to the V8 engine.

C# byte arrays (byte[]) must be explicitly converted to JavaScript arrays of bytes:

/* read data into a byte array */
byte[] filedata = File.ReadAllBytes("pres.numbers");

/* generate a JS Array (variable name `buf`) from the data */
engine.Script.buf = engine.Script.Array.from(filedata);

/* parse data */
engine.Evaluate("var wb = XLSX.read(buf, {type: 'array'});");

Tested Deployments

This demo was last tested in the following deployments:

Architecture | V8 Version | Date
darwin-x64 | 12.3.219.12 | 2024-07-16
darwin-arm | 12.3.219.12 | 2024-07-16
win11-x64 | 12.3.219.12 | 2024-12-20
win11-arm | 12.3.219.12 | 2025-02-23
linux-x64 | 12.3.219.12 | 2025-01-10
linux-arm | 12.3.219.12 | 2025-02-16

  1. Set the DOTNET_CLI_TELEMETRY_OPTOUT environment variable to 1.

How to disable telemetry

Add the following line to .profile, .bashrc, and .zshrc:

export DOTNET_CLI_TELEMETRY_OPTOUT=1

Close and restart the Terminal to load the changes.

  1. Install .NET:

Installation Notes

For macOS x64 and ARM64, install the dotnet-sdk Cask with Homebrew:

brew install --cask dotnet-sdk

For Steam Deck Holo and other Arch Linux x64 distributions, the dotnet-sdk and dotnet-runtime packages should be installed using pacman:

sudo pacman -Syu dotnet-sdk dotnet-runtime

https://dotnet.microsoft.com/en-us/download/dotnet/6.0 is the official source for Windows and ARM64 Linux versions.

  1. Open a new Terminal window in macOS or a PowerShell window in Windows.

  2. Create a new project:

mkdir SheetJSClearScript
cd SheetJSClearScript
dotnet new console
dotnet run

  1. Add ClearScript to the project:

dotnet add package Microsoft.ClearScript.Complete --version 7.4.5

  1. Download the SheetJS Standalone script and test file. Move both files to the project directory:

curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.xlsx

  1. Replace Program.cs with the following:

Program.cs
using Microsoft.ClearScript.JavaScript;
using Microsoft.ClearScript.V8;

/* initialize ClearScript */
var engine = new V8ScriptEngine();

/* Load SheetJS Scripts */
engine.Evaluate(File.ReadAllText("xlsx.full.min.js"));
Console.WriteLine("SheetJS version {0}", engine.Evaluate("XLSX.version"));

/* Read and Parse File */
byte[] filedata = File.ReadAllBytes(args[0]);
engine.Script.buf = engine.Script.Array.from(filedata);
engine.Evaluate("var wb = XLSX.read(buf, {type: 'array'});");

/* Print CSV of first worksheet */
engine.Evaluate("var ws = wb.Sheets[wb.SheetNames[0]];");
var csv = engine.Evaluate("XLSX.utils.sheet_to_csv(ws)");
Console.Write(csv);

/* Generate XLSB file and save to SheetJSClearScript.xlsb */
var xlsb = (ITypedArray<byte>)engine.Evaluate("XLSX.write(wb, {bookType: 'xlsb', type: 'buffer'})");
File.WriteAllBytes("SheetJSClearScript.xlsb", xlsb.ToArray());

After saving, run the program and pass the test file name as an argument:

dotnet run pres.xlsx

If successful, the program will print the contents of the first sheet as CSV rows. It will also create SheetJSClearScript.xlsb, a workbook that can be opened in a spreadsheet editor.

Python

pyv8 is a Python wrapper for V8.

The stpyv8 package[^7] is an actively maintained fork with binary wheels.

When this demo was last tested, there was no direct conversion between Python bytes and JavaScript ArrayBuffer data.

This is a known issue[^8]. The current recommendation is to pass data as Base64 strings.

Python Base64 Strings

The SheetJS read[^1] and write[^4] methods support Base64 strings through the base64 type[^5].

Reading Files

It is recommended to create a global context with a special method that handles file reading from Python. The read_file helper in the following snippet will read bytes from sheetjs.xlsx and generate a Base64 string:

from base64 import b64encode
from STPyV8 import JSContext, JSClass

# Create context with methods for file i/o
class Base64Context(JSClass):
    def read_file(self, path):
        with open(path, "rb") as f:
            data = f.read()
        return b64encode(data).decode("ascii")

globals = Base64Context()

# The JSContext starts and cleans up the V8 engine
with JSContext(globals) as ctxt:
    print(ctxt.eval("read_file('sheetjs.xlsx')")) # read base64 data and print

Writing Files

When called with the base64 type, the SheetJS write method returns a Base64 string. The result can be decoded and written to file from Python:

from base64 import b64decode
from STPyV8 import JSContext

# The JSContext starts and cleans up the V8 engine
with JSContext() as ctxt:
    # ... initialization and workbook creation ...
    xlsb = ctxt.eval("XLSX.write(wb, {type: 'base64', bookType: 'xlsb'})")
    with open("SheetJSSTPyV8.xlsb", "wb") as f:
        f.write(b64decode(xlsb))

Python Demo

Tested Deployments

This demo was last tested in the following deployments:

Architecture | V8 Version | Python | Date
darwin-arm | 13.0.245.16 | 3.13.0 | 2024-10-20

  1. Make a new folder for the project:

mkdir sheetjs-stpyv8
cd sheetjs-stpyv8

  1. Install stpyv8:

pip install stpyv8

The install may fail with an externally-managed-environment error:

error: externally-managed-environment

× This environment is externally managed

The wheel can be downloaded and forcefully installed. The following commands download and install version 13.0.245.16 for Python 3.13 on darwin-arm:

curl -LO https://github.com/cloudflare/stpyv8/releases/download/v13.0.245.16/stpyv8-13.0.245.16-cp313-cp313-macosx_14_0_arm64.whl
sudo python -m pip install --upgrade stpyv8-13.0.245.16-cp313-cp313-macosx_14_0_arm64.whl --break-system-packages

  1. Download the SheetJS Standalone script and test file. Move both files to the project directory:

curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://xlsx.nodejs.cn/pres.xlsx

  1. Download sheetjs-stpyv8.py:

curl -LO https://xlsx.nodejs.cn/v8/sheetjs-stpyv8.py

  1. Run the script and pass pres.xlsx as the first argument:

python sheetjs-stpyv8.py pres.xlsx

The script will display CSV rows from the first worksheet. It will also create SheetJSSTPyV8.xlsb, a workbook that can be opened with a spreadsheet editor.

Snapshots

At a high level, V8 snapshots are raw dumps of the V8 engine state. It is much more efficient for programs to load snapshots than to evaluate code.

Snapshot Demo

There are two parts to this demo:

A) The snapshot command creates a snapshot with the SheetJS Standalone script and supplementary NUMBERS script. It will dump the snapshot to snapshot.bin.

B) The sheet2csv tool embeds snapshot.bin. The tool will parse a specified file, print CSV contents of a named worksheet, and export the workbook to NUMBERS.

Tested Deployments

This demo was last tested in the following deployments:

Architecture | V8 Version | V8 Crate | Date
darwin-x64 | 13.0.245.12 | 130.0.7 | 2025-01-19
darwin-arm | 12.6.228.3 | 0.92.0 | 2024-12-20
win11-x64 | 12.6.228.3 | 0.92.0 | 2024-12-20
linux-x64 | 12.6.228.3 | 0.92.0 | 2025-01-02
linux-arm | 13.4.114.9 | 134.4.0 | 2025-02-15

  1. Make a new folder for the project:

mkdir sheetjs2csv
cd sheetjs2csv

  1. Download the following scripts:

curl -o Cargo.toml https://xlsx.nodejs.cn/cli/Cargo.toml
curl -o snapshot.rs https://xlsx.nodejs.cn/cli/snapshot.rs
curl -o sheet2csv.rs https://xlsx.nodejs.cn/cli/sheet2csv.rs

  1. Download the SheetJS Standalone script and NUMBERS supplementary script. Move both scripts to the project directory:

curl -o xlsx.full.min.js https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -o xlsx.zahl.js https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.zahl.js

  1. Build the V8 snapshot:

cargo build --bin snapshot
cargo run --bin snapshot

In some tests, the Linux AArch64 build failed with an error:

error[E0080]: evaluation of constant value failed

|
1715 | assert!(size_of::<TypeId>() == size_of::<u64>());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'assertion failed: size_of::<TypeId>() == size_of::<u64>()'

Versions 0.75.1, 0.82.0, and 0.92.0 of the v8 crate are known to work.

  1. Build sheet2csv (sheet2csv.exe in Windows):

cargo build --release --bin sheet2csv

  1. Download the test file https://xlsx.nodejs.cn/pres.numbers:

curl -o pres.numbers https://xlsx.nodejs.cn/pres.numbers

  1. Test the application:

mv target/release/sheet2csv .
./sheet2csv pres.numbers

[^1]: See read in "Reading Files".

[^2]: See "SheetJS Data Model" for more details on the object representation.

[^3]: See "API Reference" for a list of functions that ship with the library. "Spreadsheet Features" covers workbook and worksheet features that can be modified directly.

[^4]: See write in "Writing Files".

[^5]: See "Supported Output Formats" type in "Writing Files".

[^6]: The project does not have an official website. The official Rust crate is hosted on crates.io.

[^7]: The project does not have a separate website. The source repository is hosted on GitHub.

[^8]: According to a maintainer, typed arrays were not supported in the original pyv8 project.

[^9]: pipes and other modules were removed from the standard library in Python 3.13 as part of "PEP 594".