Data Processing with Nashorn
Nashorn is a JS engine for Java. It shipped with Java distributions starting with Java 8 and was removed in Java 15. The project was spun off, and a compatible standalone release is available for Java 15+.
SheetJS is a JavaScript library for reading and writing data from spreadsheets.
The "Complete Example" section includes a complete Java command-line tool that reads data from a spreadsheet and prints CSV rows.
Integration Details
Initialize Nashorn
Nashorn does not provide a global variable. It must be created:
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
/* initialize nashorn engine */
ScriptEngine engine = (new ScriptEngineManager()).getEngineByName("javascript");
/* create global */
engine.eval("var global = (function(){ return this; }).call(null);");
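On Java 15+ without the standalone jars, getEngineByName returns null and the eval call above fails with a NullPointerException. A minimal sketch that checks for this first, assuming a readable error is preferred; the class NashornInit and the method createEngine are hypothetical names, not part of the demo:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class NashornInit {
    /* Returns an engine with a `global` variable, or null if no JS engine is registered. */
    static ScriptEngine createEngine() {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
        if (engine == null) {
            /* typical cause: Java 15+ without nashorn-core and the ASM jars on the class path */
            return null;
        }
        try {
            /* non-strict function called with null: `this` is the global object */
            engine.eval("var global = (function(){ return this; }).call(null);");
        } catch (ScriptException e) {
            throw new IllegalStateException("could not create the global variable", e);
        }
        return engine;
    }

    public static void main(String[] args) {
        ScriptEngine engine = createEngine();
        if (engine == null) {
            System.err.println("No JavaScript engine found: add nashorn-core and the ASM jars to the class path");
            return;
        }
        System.out.println("Using " + engine.getFactory().getEngineName());
    }
}
```

createEngine can stand in for the two statements above; the rest of the integration code works the same on the returned engine.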
Load SheetJS Scripts
The SheetJS Standalone scripts can be parsed and evaluated in a Nashorn context.
The main library can be loaded by reading the script as a class path resource and evaluating it in the Nashorn context:
/* read xlsx.full.min.js from the class path and evaluate it */
engine.eval(new Scanner(
  SheetJSNashorn.class.getResourceAsStream("/xlsx.full.min.js")
).useDelimiter("\\Z").next());
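new Scanner(InputStream) decodes bytes with the platform default charset, and it throws a NullPointerException if getResourceAsStream cannot find the resource. A sketch of a Java 8 compatible helper that reports a missing resource by name and decodes as UTF-8 (assuming the script files are UTF-8, which is an assumption rather than something stated above); ScriptLoader, readResource and readAll are hypothetical names:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ScriptLoader {
    /* Reads a class path resource such as "/xlsx.full.min.js" into a String. */
    static String readResource(Class<?> anchor, String name) {
        InputStream in = anchor.getResourceAsStream(name);
        if (in == null) {
            throw new IllegalArgumentException("resource not found on class path: " + name);
        }
        return readAll(in);
    }

    /* Java 8 compatible: InputStream.readAllBytes only exists in Java 9+. */
    static String readAll(InputStream in) {
        try (InputStream src = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = src.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this helper, engine.eval(ScriptLoader.readResource(SheetJSNashorn.class, "/xlsx.full.min.js")); would replace the Scanner expression.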
To confirm the library is loaded, XLSX.version can be printed using the Nashorn print built-in:
engine.eval("print('SheetJS Version ' + XLSX.version);");
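print only writes to standard output. When the Java side needs the value (for logging or a startup check), ScriptEngine.eval returns the result of the last evaluated expression. A minimal sketch with the hypothetical helper evalString; String.valueOf is used because the returned object is not guaranteed to be a java.lang.String for every engine:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class EvalToJava {
    /* Evaluates a JS expression and converts the result to a Java String. */
    static String evalString(ScriptEngine engine, String expr) {
        try {
            return String.valueOf(engine.eval(expr));
        } catch (ScriptException e) {
            throw new IllegalStateException("evaluation failed: " + expr, e);
        }
    }

    public static void main(String[] args) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
        if (engine == null) {
            System.err.println("No JavaScript engine available");
            return;
        }
        /* with xlsx.full.min.js loaded, this would be evalString(engine, "XLSX.version") */
        System.out.println(evalString(engine, "'Hello from ' + 'JS'"));
    }
}
```

With the library loaded, evalString(engine, "XLSX.version") returns the same version string that the print call above writes.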
Reading Files
Nashorn does not properly project byte[] into a JS array or Int8Array. The recommended workaround is to copy the data in the JS context using the following JS code:
function b2a(b) {
var out = typeof Uint8Array == 'function' ? new Uint8Array(b.length) : new Array(b.length);
/* `b` is similar to Int8Array (values in the range -128 .. 127 ) */
for(var i = 0; i < out.length; i++) out[i] = (b[i] + 256) & 0xFF;
return out;
}
This function should be embedded in the Java code:
/* read spreadsheet bytes */
engine.put("bytes", Files.readAllBytes(Paths.get(args[0])));
/* convert signed byte array to JS Uint8Array or unsigned byte array */
engine.eval(
"function b2a(b) {" +
"var out = typeof Uint8Array == 'function' ? new Uint8Array(b.length) : new Array(b.length);" +
"for(var i = 0; i < out.length; i++) out[i] = b[i] & 0xFF;" +
"return out;" +
"}" +
"var u8a = b2a(bytes)"
);
/* parse workbook */
engine.eval("var wb = XLSX.read(u8a, {type: 'array'})");
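The two b2a versions above use different expressions: (b[i] + 256) & 0xFF in the first and b[i] & 0xFF in the embedded one. They agree because JS bitwise operators convert operands to 32-bit two's-complement integers, the same representation Java uses for int. A quick Java check over all 256 byte values, with Byte.toUnsignedInt (Java 8+) as the reference; ByteProjection and its methods are hypothetical names:

```java
public class ByteProjection {
    /* Mirrors the first JS version: (b[i] + 256) & 0xFF */
    static int viaOffset(byte b) {
        return (b + 256) & 0xFF;
    }

    /* Mirrors the embedded JS version: b[i] & 0xFF */
    static int viaMask(byte b) {
        return b & 0xFF;
    }

    /* Java-side mirror of b2a, mapping -128..127 to 0..255 */
    static int[] toUnsigned(byte[] data) {
        int[] out = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = Byte.toUnsignedInt(data[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            if (viaOffset(b) != viaMask(b) || viaMask(b) != Byte.toUnsignedInt(b)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        System.out.println("all 256 byte values agree");
    }
}
```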
Complete Example
This demo was tested in the following deployments:
OpenJDK | Nashorn | Date |
---|---|---|
22.0.1 | 15.4 (standalone) | 2024-06-24 |
21.0.3 | 15.4 (standalone) | 2024-06-24 |
20.0.2 | 15.4 (standalone) | 2024-06-24 |
19.0.2 | 15.4 (standalone) | 2024-06-24 |
18.0.2 | 15.4 (standalone) | 2024-06-24 |
17.0.11 | 15.4 (standalone) | 2024-06-24 |
16.0.1 | 15.4 (standalone) | 2024-06-24 |
15.0.10 | 15.4 (standalone) | 2024-06-24 |
14.0.2 | Built-in | 2024-06-24 |
13.0.14 | Built-in | 2024-06-24 |
12.0.2 | Built-in | 2024-06-24 |
11.0.23 | Built-in | 2024-06-24 |
10.0.2 | Built-in | 2024-06-24 |
9 | Built-in | 2024-06-24 |
1.8.0 | Built-in | 2024-06-24 |
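The table splits at Java 15: earlier releases use the built-in engine, later ones need the standalone 15.4 jars. A small sketch, assuming you want to check which case applies on a given machine; NashornCheck and its methods are hypothetical names. The java.specification.version property reports "1.8" on Java 8 and the bare major number ("11", "22") on Java 9+. On Java 11 - 14 the built-in engine is deprecated and may print a warning when it is created.

```java
import javax.script.ScriptEngineManager;

public class NashornCheck {
    /* "1.8" -> 8, "11" -> 11, "22" -> 22 */
    static int majorVersion(String spec) {
        String s = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot >= 0 ? s.substring(0, dot) : s);
    }

    /* true if a Nashorn engine (built-in or standalone) is visible to this JVM */
    static boolean nashornAvailable() {
        return new ScriptEngineManager().getEngineByName("nashorn") != null;
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        String expected = major >= 15
            ? "standalone nashorn-core jar required"
            : "built-in Nashorn expected";
        System.out.println("Java " + major + ": " + expected
            + "; available now: " + nashornAvailable());
    }
}
```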
Compilation
- Java 8 - 14: Nashorn is available without additional dependencies.
- Java 15+: download Nashorn and its dependencies:
curl -L -o nashorn-core-15.4.jar "https://search.maven.org/remotecontent?filepath=org/openjdk/nashorn/nashorn-core/15.4/nashorn-core-15.4.jar"
curl -L -o asm-9.5.jar "https://search.maven.org/remotecontent?filepath=org/ow2/asm/asm/9.5/asm-9.5.jar"
curl -L -o asm-tree-9.5.jar "https://search.maven.org/remotecontent?filepath=org/ow2/asm/asm-tree/9.5/asm-tree-9.5.jar"
curl -L -o asm-commons-9.5.jar "https://search.maven.org/remotecontent?filepath=org/ow2/asm/asm-commons/9.5/asm-commons-9.5.jar"
curl -L -o asm-analysis-9.5.jar "https://search.maven.org/remotecontent?filepath=org/ow2/asm/asm-analysis/9.5/asm-analysis-9.5.jar"
curl -L -o asm-util-9.5.jar "https://search.maven.org/remotecontent?filepath=org/ow2/asm/asm-util/9.5/asm-util-9.5.jar"
1. Download the SheetJS Standalone script, shim script and test file. Move all three files to the project directory:
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js
curl -LO https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/shim.min.js
curl -LO https://xlsx.nodejs.cn/pres.xlsx
2. Download SheetJSNashorn.java:
curl -LO https://xlsx.nodejs.cn/nashorn/SheetJSNashorn.java
3. Build the sample class:
javac SheetJSNashorn.java
This program tries to parse the file specified by the first argument and prints CSV rows from the first worksheet.
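The CSV step itself is not shown in the sections above. A sketch of how it could look, assuming the engine already holds the parsed workbook in the JS variable wb (as in "Reading Files") and uses the SheetJS utility XLSX.utils.sheet_to_csv; FirstSheetCsv is a hypothetical class, and this is not necessarily how the downloaded SheetJSNashorn.java is written:

```java
import javax.script.ScriptEngine;
import javax.script.ScriptException;

public class FirstSheetCsv {
    /*
     * Assumes `XLSX` (the SheetJS library) is loaded and `wb` holds a workbook
     * returned by XLSX.read. The function wrapper avoids creating extra globals.
     */
    static String firstSheetCsv(ScriptEngine engine) {
        try {
            Object csv = engine.eval(
                "(function(){" +
                "  var ws = wb.Sheets[wb.SheetNames[0]];" +
                "  return XLSX.utils.sheet_to_csv(ws);" +
                "})()");
            return String.valueOf(csv);
        } catch (ScriptException e) {
            throw new IllegalStateException("could not convert the first worksheet", e);
        }
    }
}
```

The program could then call System.out.println(FirstSheetCsv.firstSheetCsv(engine)); to print the rows.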
Standalone Test
4. Run the command directly:
- Java 8 - 14:
java SheetJSNashorn pres.xlsx
- Java 15+: the Nashorn and ASM jars must be on the class path. The class path separator depends on the operating system (":" on Linux/MacOS, ";" on Windows):
  - Linux/MacOS:
java -cp ".:asm-9.5.jar:asm-tree-9.5.jar:asm-commons-9.5.jar:asm-analysis-9.5.jar:asm-util-9.5.jar:nashorn-core-15.4.jar" SheetJSNashorn pres.xlsx
  - Windows:
java -cp ".;asm-9.5.jar;asm-tree-9.5.jar;asm-commons-9.5.jar;asm-analysis-9.5.jar;asm-util-9.5.jar;nashorn-core-15.4.jar" SheetJSNashorn pres.xlsx
If successful, CSV rows from the first worksheet will be displayed.
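The two Java 15+ commands differ only in the class path separator. Java exposes it as File.pathSeparator (":" on Linux/MacOS, ";" on Windows), so a launcher or build script written in Java can assemble the class path once. A minimal sketch; ClasspathBuilder is a hypothetical name and the jar list is copied from the Compilation step:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClasspathBuilder {
    /* Jars downloaded in the Compilation step (needed on Java 15+ only) */
    static final List<String> NASHORN_JARS = Arrays.asList(
        "asm-9.5.jar", "asm-tree-9.5.jar", "asm-commons-9.5.jar",
        "asm-analysis-9.5.jar", "asm-util-9.5.jar", "nashorn-core-15.4.jar");

    /* Joins entries with the platform separator */
    static String join(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    /* "." followed by the Nashorn and ASM jars */
    static String nashornClasspath() {
        List<String> cp = new ArrayList<>();
        cp.add(".");
        cp.addAll(NASHORN_JARS);
        return join(cp);
    }

    public static void main(String[] args) {
        System.out.println("java -cp \"" + nashornClasspath() + "\" SheetJSNashorn pres.xlsx");
    }
}
```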
Java Archive Test
5. Assemble a Java Archive:
jar -cf SheetJSNashorn.jar SheetJSNashorn.class xlsx.full.min.js shim.min.js
6. Create a new directory and copy the archives and test file:
mkdir -p sheethorn
cp *.jar pres.xlsx sheethorn
cd sheethorn
7. Run the program using the Java Archive:
- Java 8 - 14: the class path separator depends on the operating system (":" on Linux/MacOS, ";" on Windows):
  - Linux/MacOS:
java -cp ".:SheetJSNashorn.jar" SheetJSNashorn pres.xlsx
  - Windows:
java -cp ".;SheetJSNashorn.jar" SheetJSNashorn pres.xlsx
- Java 15+: the class path must also include the Nashorn and ASM jars, and the separator depends on the operating system:
  - Linux/MacOS:
java -cp ".:asm-9.5.jar:asm-tree-9.5.jar:asm-commons-9.5.jar:asm-analysis-9.5.jar:asm-util-9.5.jar:nashorn-core-15.4.jar:SheetJSNashorn.jar" SheetJSNashorn pres.xlsx
  - Windows:
java -cp ".;asm-9.5.jar;asm-tree-9.5.jar;asm-commons-9.5.jar;asm-analysis-9.5.jar;asm-util-9.5.jar;nashorn-core-15.4.jar;SheetJSNashorn.jar" SheetJSNashorn pres.xlsx
This should print the same CSV rows as Step 4.
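getResourceAsStream("/xlsx.full.min.js") resolves the name from the root of the class path, and jar -cf stores the listed files at the root of the archive. If the script cannot be found at run time, one way to debug it is to list the archive entries. A sketch using java.util.jar.JarFile; JarCheck and missingEntries are hypothetical names. Entry names inside a jar have no leading slash:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

public class JarCheck {
    /* Returns the names that are NOT present as entries in the given jar. */
    static List<String> missingEntries(String jarPath, List<String> names) {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (String name : names) {
                if (jar.getJarEntry(name) == null) {
                    missing.add(name);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return missing;
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "SheetJSNashorn.jar";
        try {
            List<String> missing = missingEntries(path,
                Arrays.asList("SheetJSNashorn.class", "xlsx.full.min.js", "shim.min.js"));
            System.out.println(missing.isEmpty() ? "all entries present" : "missing: " + missing);
        } catch (UncheckedIOException e) {
            System.err.println("cannot open " + path + ": " + e.getCause().getMessage());
        }
    }
}
```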