合成 DOM
SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。
¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.
SheetJS 提供了三种直接处理 HTML DOM TABLE 元素的方法:
¥SheetJS offers three methods to directly process HTML DOM TABLE elements:
-
table_to_sheet
[^1] 从 TABLE 元素生成 SheetJS 工作表 [^2]¥
table_to_sheet
[^1] generates a SheetJS worksheet[^2] from a TABLE element -
table_to_book
[^3] 从 TABLE 元素生成 SheetJS 工作簿 [^4]¥
table_to_book
[^3] generates a SheetJS workbook[^4] from a TABLE element -
sheet_add_dom
[^5] 将 TABLE 元素中的数据添加到现有工作表¥
sheet_add_dom
[^5] adds data from a TABLE element to an existing worksheet
这些方法适用于网络浏览器。NodeJS 和其他服务器端平台传统上缺乏 DOM 实现,但第三方模块填补了空白。
¥These methods work in the web browser. NodeJS and other server-side platforms traditionally lack a DOM implementation, but third-party modules fill the gap.
该演示涵盖了非浏览器平台的合成 DOM 实现。我们将探索如何在服务器端环境中使用 SheetJS DOM 方法来解析表格并将数据导出到电子表格。
¥This demo covers synthetic DOM implementations for non-browser platforms. We'll explore how to use SheetJS DOM methods in server-side environments to parse tables and export data to spreadsheets.
服务器端处理最强大的方法是自动化无头 Web 浏览器。"浏览器自动化" 包括演示。
¥The most robust approach for server-side processing is to automate a headless web browser. "Browser Automation" includes demos.
集成详情
¥Integration Details
合成 DOM 实现通常提供一个函数,该函数接受 HTML 字符串并返回代表 document
的对象。诸如 getElementsByTagName
或 querySelector
之类的 API 方法可以提取 TABLE 元素。
¥Synthetic DOM implementations typically provide a function that accept an HTML
string and return an object that represents document
. An API method such as
getElementsByTagName
or querySelector
can pull TABLE elements.
SheetJS 方法使用某些 DOM 实现中可能缺少的功能。
¥SheetJS methods use features that may be missing from some DOM implementations.
表格行数
¥Table rows
TABLE 元素的 rows
属性是 TR 行子级的列表。添加和删除行时,此列表会自动更新。
¥The rows
property of TABLE elements is a list of TR row children. This list
automatically updates when rows are added and deleted.
SheetJS 方法不会改变 rows
。假设没有嵌套表,则可以使用 getElementsByTagName
创建 rows
属性:
¥SheetJS methods do not mutate rows
. Assuming there are no nested tables, the
rows
property can be created using getElementsByTagName
:
tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
行单元格
¥Row cells
TR 元素的 cells
属性是 TD 子单元的列表。添加和删除单元格时,此列表会自动更新。
¥The cells
property of TR elements is a list of TD cell children. This list
automatically updates when cells are added and deleted.
SheetJS 方法不会改变 cells
。假设没有嵌套表,则可以使用 getElementsByTagName
创建 cells
属性:
¥SheetJS methods do not mutate cells
. Assuming there are no nested tables, the
cells
property can be created using getElementsByTagName
:
tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")));
NodeJS
JSDOM
JSDOM 是 NodeJS 的 DOM 实现。合成 DOM 元素与 SheetJS 方法兼容。
¥JSDOM is a DOM implementation for NodeJS. The synthetic DOM elements are compatible with SheetJS methods.
以下示例从文件 SheetJSTable.html
中抓取第一个表并生成 XLSX 工作簿:
¥The following example scrapes the first table from the file SheetJSTable.html
and generates a XLSX workbook:
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");
/* obtain HTML string. This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
该演示在以下部署中进行了测试:
¥This demo was tested in the following deployments:
JSDOM | 日期 |
---|---|
25.0.1 | 2024-10-30 |
24.1.3 | 2024-10-30 |
23.2.0 | 2024-10-30 |
22.1.0 | 2024-10-30 |
21.1.2 | 2024-10-30 |
20.0.3 | 2024-10-30 |
19.0.0 | 2024-10-30 |
18.1.1 | 2024-10-30 |
17.0.0 | 2024-10-30 |
16.7.0 | 2024-10-30 |
15.2.1 | 2024-10-30 |
14.1.0 | 2024-10-30 |
13.2.0 | 2024-10-30 |
12.2.0 | 2024-10-30 |
11.12.0 | 2024-10-30 |
10.1.0 | 2024-10-30 |
Complete Demo (click to show)
- Install SheetJS and JSDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz jsdom@24.0.0
-
Save the previous codeblock to
SheetJSDOM.js
. -
Download the sample
SheetJSTable.html
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
- Run the script:
node SheetJSDOM.js
The script will create a file SheetJSDOM.xlsx
that can be opened.
HappyDOM
HappyDOM 为 NodeJS 提供了一个 DOM 框架。旧版本需要以下补丁:
¥HappyDOM provides a DOM framework for NodeJS. Older versions required the following patches:
-
表
rows
特性(如上所述)¥TABLE
rows
property (explained above) -
TR
cells
属性(如上所述)¥TR
cells
property (explained above)
HappyDOM 15.7.4
不需要任何解决方法。
¥HappyDOM 15.7.4
did not require any workarounds.
该演示在以下部署中进行了测试:
¥This demo was tested in the following deployments:
HappyDOM | 日期 |
---|---|
15.7.4 | 2024-10-30 |
14.12.3 | 2024-10-30 |
13.10.1 | 2024-10-30 |
12.10.3 | 2024-10-30 |
11.2.0 | 2024-10-30 |
10.11.2 | 2024-10-30 |
9.20.3 | 2024-10-30 |
8.9.0 | 2024-10-30 |
7.8.1 | 2024-10-30 |
6.0.4 | 2024-10-30 |
5.4.0 | 2024-10-30 |
4.1.0 | 2024-10-30 |
3.2.2 | 2024-10-30 |
2.55.0 | 2024-10-30 |
Complete Demo (click to show)
- Install SheetJS and HappyDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz happy-dom@13.3.1
- Download the sample script
SheetJSHappyDOM.js
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSHappyDOM.js
- Download the sample
SheetJSTable.html
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
- Run the script:
node SheetJSHappyDOM.js
The script will create a file SheetJSHappyDOM.xlsx
that can be opened.
XMLDOM
XMLDOM 为 NodeJS 提供了一个 DOM 框架。对于测试版本(0.8.10
),需要以下补丁:
¥XMLDOM provides a DOM framework for NodeJS. For the
tested version (0.8.10
), the following patches were needed:
-
表
rows
特性(如上所述)¥TABLE
rows
property (explained above) -
TR
cells
属性(如上所述)¥TR
cells
property (explained above) -
元素
innerHTML
属性:¥Element
innerHTML
property:
Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
var outerHTML = new XMLSerializer().serializeToString(this);
if(outerHTML.match(/</g).length == 1) return "";
return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});
该演示在以下部署中进行了测试:
¥This demo was tested in the following deployments:
XMLDOM | 日期 |
---|---|
0.9.5 | 2024-10-30 |
0.8.10 | 2024-10-30 |
Complete Demo (click to show)
- Install SheetJS and XMLDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @xmldom/xmldom@0.9.5
- Download the sample script
SheetJSXMLDOM.js
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSXMLDOM.js
- Run the script:
node SheetJSXMLDOM.js
The script will create a file SheetJSXMLDOM.xlsx
that can be opened.
CheerioJS
Cheerio 不支持许多开箱即用的基本属性。它们可以进行填充,但强烈建议使用更兼容的库。
¥Cheerio does not support a number of fundamental properties out of the box. They can be shimmed, but it is strongly recommended to use a more compliant library.
CheerioJS 为 NodeJS 提供了一个类似 DOM 的框架。SheetJSCheerio.js
实现了缺失的功能,以确保 SheetJS DOM 方法可以处理 TABLE 元素。
¥CheerioJS provides a DOM-like framework for NodeJS.
SheetJSCheerio.js
implements the missing
features to ensure that SheetJS DOM methods can process TABLE elements.
该演示在以下部署中进行了测试:
¥This demo was tested in the following deployments:
CheerioJS | 日期 |
---|---|
1.0.0 | 2024-10-30 |
1.0.0-rc.12 | 2024-10-30 |
Complete Demo (click to show)
- Install SheetJS and CheerioJS libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz cheerio@1.0.0
- Download the sample script
SheetJSCheerio.js
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSCheerio.js
- Download the sample
SheetJSTable.html
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
- Run the script:
node SheetJSCheerio.js
The script will create a file SheetJSCheerio.xlsx
that can be opened.
其他平台
¥Other Platforms
DenoDOM
DenoDOM 为 Deno 提供了一个 DOM 框架。对于测试版本(0.1.48
),需要以下补丁:
¥DenoDOM provides a DOM framework for Deno. For
the tested version (0.1.48
), the following patches were needed:
-
表
rows
特性(如上所述)¥TABLE
rows
property (explained above) -
TR
cells
属性(如上所述)¥TR
cells
property (explained above)
此示例获取 样本表:
¥This example fetches a sample table:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';
import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.48/deno-dom-wasm.ts';
const doc = new DOMParser().parseFromString(
await (await fetch('https://xlsx.nodejs.cn/dom/SheetJSTable.html')).text(),
"text/html",
)!;
const tbl = doc.querySelector("table");
/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"))
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
该演示在以下部署中进行了测试:
¥This demo was tested in the following deployments:
架构 | DenoDOM | Deno | 日期 |
---|---|---|---|
darwin-x64 | 0.1.48 | 2.0.4 | 2024-10-30 |
darwin-arm | 0.1.48 | 2.0.4 | 2024-10-30 |
win11-x64 | 0.1.48 | 2.0.4 | 2024-10-30 |
win11-arm | 0.1.48 | 2.2.1 | 2025-02-23 |
linux-x64 | 0.1.48 | 2.0.5 | 2025-01-10 |
linux-arm | 0.1.48 | 2.1.10 | 2025-02-16 |
Complete Demo (click to show)
-
Save the previous codeblock to
SheetJSDenoDOM.ts
. -
Run the script with
--allow-net
and--allow-write
entitlements:
deno run --allow-net --allow-write SheetJSDenoDOM.ts
The script will create a file SheetJSDenoDOM.xlsx
that can be opened.
Deno 2 additionally requires the --allow-import
entitlement:
deno run --allow-net --allow-write --allow-import SheetJSDenoDOM.ts
[^1]: 见 "HTML" 实用程序中的 table_to_sheet
¥See table_to_sheet
in "HTML" Utilities
[^2]: 详细信息请参见 "工作表对象" 于 "SheetJS 数据模型"。
¥See "Worksheet Object" in "SheetJS Data Model" for more details.
[^3]: 见 "HTML" 实用程序中的 table_to_book
¥See table_to_book
in "HTML" Utilities
[^4]: 详细信息请参见 "工作簿对象" 于 "SheetJS 数据模型"。
¥See "Workbook Object" in "SheetJS Data Model" for more details.
[^5]: 见 "HTML" 实用程序中的 sheet_add_dom