合成 DOM
SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。
¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.
SheetJS 提供了三种直接处理 HTML DOM TABLE 元素的方法:
¥SheetJS offers three methods to directly process HTML DOM TABLE elements:
-
table_to_sheet
[^1] 从 TABLE 元素生成 SheetJS 工作表 [^2]¥
table_to_sheet
[^1] generates a SheetJS worksheet[^2] from a TABLE element -
table_to_book
[^3] 从 TABLE 元素生成 SheetJS 工作簿 [^4]¥
table_to_book
[^3] generates a SheetJS workbook[^4] from a TABLE element -
sheet_add_dom
[^5] 将 TABLE 元素中的数据添加到现有工作表¥
sheet_add_dom
[^5] adds data from a TABLE element to an existing worksheet
这些方法适用于网络浏览器。NodeJS 和其他服务器端平台传统上缺乏 DOM 实现,但第三方模块填补了空白。
¥These methods work in the web browser. NodeJS and other server-side platforms traditionally lack a DOM implementation, but third-party modules fill the gap.
该演示涵盖了非浏览器平台的合成 DOM 实现。我们将探索如何在服务器端环境中使用 SheetJS DOM 方法来解析表格并将数据导出到电子表格。
¥This demo covers synthetic DOM implementations for non-browser platforms. We'll explore how to use SheetJS DOM methods in server-side environments to parse tables and export data to spreadsheets.
服务器端处理最强大的方法是自动化无头 Web 浏览器。"浏览器自动化" 包括演示。
¥The most robust approach for server-side processing is to automate a headless web browser. "Browser Automation" includes demos.
集成详情
¥Integration Details
合成 DOM 实现通常提供一个函数,该函数接受 HTML 字符串并返回表示 document
的对象。诸如 getElementsByTagName
或 querySelector
之类的 API 方法可以提取 TABLE 元素。
¥Synthetic DOM implementations typically provide a function that accept a HTML
string and return an object that represents document
. An API method such as
getElementsByTagName
or querySelector
can pull TABLE elements.
SheetJS 方法使用某些 DOM 实现中可能缺少的功能。
¥SheetJS methods use features that may be missing from some DOM implementations.
表格行数
¥Table rows
TABLE 元素的 rows
属性是 TR 行子级的列表。添加和删除行时,此列表会自动更新。
¥The rows
property of TABLE elements is a list of TR row children. This list
automatically updates when rows are added and deleted.
SheetJS 方法不会改变 rows
。假设没有嵌套表,则可以使用 getElementsByTagName
创建 rows
属性:
¥SheetJS methods do not mutate rows
. Assuming there are no nested tables, the
rows
property can be created using getElementsByTagName
:
tbl.rows = Array.from(tbl.getElementsByTagName("tr"));
行单元格
¥Row cells
TR 元素的 cells
属性是 TD 子单元的列表。添加和删除单元格时,此列表会自动更新。
¥The cells
property of TR elements is a list of TD cell children. This list
automatically updates when cells are added and deleted.
SheetJS 方法不会改变 cells
。假设没有嵌套表,则可以使用 getElementsByTagName
创建 cells
属性:
¥SheetJS methods do not mutate cells
. Assuming there are no nested tables, the
cells
property can be created using getElementsByTagName
:
tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")));
NodeJS
JSDOM
JSDOM 是 NodeJS 的 DOM 实现。合成 DOM 元素与 SheetJS 方法兼容。
¥JSDOM is a DOM implementation for NodeJS. The synthetic DOM elements are compatible with SheetJS methods.
以下示例从文件 SheetJSTable.html
中抓取第一个表并生成 XLSX 工作簿:
¥The following example scrapes the first table from the file SheetJSTable.html
and generates a XLSX workbook:
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");
/* obtain HTML string. This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");
/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");
/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);
XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
该演示在以下部署中进行了测试:
¥This demo was tested in the following deployments:
JSDOM | 日期 |
---|---|
24.1.0 | 2024-06-24 |
23.2.0 | 2024-06-24 |
22.1.0 | 2024-06-24 |
21.1.2 | 2024-06-24 |
20.0.3 | 2024-06-24 |
19.0.0 | 2024-06-24 |
18.1.1 | 2024-06-24 |
17.0.0 | 2024-06-24 |
16.7.0 | 2024-06-24 |
15.2.1 | 2024-06-24 |
14.1.0 | 2024-06-24 |
13.2.0 | 2024-06-24 |
12.2.0 | 2024-06-24 |
11.12.0 | 2024-06-24 |
10.1.0 | 2024-06-24 |
Complete Demo (click to show)
- Install SheetJS and JSDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz jsdom@24.0.0
-
Save the previous codeblock to
SheetJSDOM.js
. -
Download the sample
SheetJSTable.html
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
- Run the script:
node SheetJSDOM.js
The script will create a file SheetJSDOM.xlsx
that can be opened.
HappyDOM
HappyDOM 为 NodeJS 提供了一个 DOM 框架。对于测试版本(13.3.1
),需要以下补丁:
¥HappyDOM provides a DOM framework for NodeJS. For the tested version (13.3.1
),
the following patches were needed:
-
表
rows
特性(如上所述)¥TABLE
rows
property (explained above) -
TR
cells
属性(如上所述)¥TR
cells
property (explained above)
该演示在以下部署中进行了测试:
¥This demo was tested in the following deployments:
HappyDOM | 日期 |
---|---|
14.12.3 | 2024-06-24 |
13.10.1 | 2024-06-24 |
12.10.3 | 2024-06-24 |
11.2.0 | 2024-06-24 |
10.11.2 | 2024-06-24 |
9.20.3 | 2024-06-24 |
8.9.0 | 2024-06-24 |
7.8.1 | 2024-06-24 |
6.0.4 | 2024-06-24 |
5.4.0 | 2024-06-24 |
4.1.0 | 2024-06-24 |
3.2.2 | 2024-06-24 |
2.55.0 | 2024-06-24 |
Complete Demo (click to show)
- Install SheetJS and HappyDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz happy-dom@13.3.1
- Download the sample script
SheetJSHappyDOM.js
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSHappyDOM.js
- Download the sample
SheetJSTable.html
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
- Run the script:
node SheetJSHappyDOM.js
The script will create a file SheetJSHappyDOM.xlsx
that can be opened.
XMLDOM
XMLDOM 为 NodeJS 提供了一个 DOM 框架。对于测试版本(0.8.10
),需要以下补丁:
¥XMLDOM provides a DOM framework for NodeJS. For the
tested version (0.8.10
), the following patches were needed:
-
表
rows
特性(如上所述)¥TABLE
rows
property (explained above) -
TR
cells
属性(如上所述)¥TR
cells
property (explained above) -
元素
innerHTML
属性:¥Element
innerHTML
property:
Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
var outerHTML = new XMLSerializer().serializeToString(this);
if(outerHTML.match(/</g).length == 1) return "";
return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});
Complete Demo (click to show)
This demo was last tested on 2024 March 12 against XMLDOM 0.8.10
- Install SheetJS and XMLDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @xmldom/xmldom@0.8.10
- Download the sample script
SheetJSXMLDOM.js
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSXMLDOM.js
- Run the script:
node SheetJSXMLDOM.js
The script will create a file SheetJSXMLDOM.xlsx
that can be opened.
CheerioJS
Cheerio 不支持许多开箱即用的基本属性。它们可以进行填充,但强烈建议使用更兼容的库。
¥Cheerio does not support a number of fundamental properties out of the box. They can be shimmed, but it is strongly recommended to use a more compliant library.
CheerioJS 为 NodeJS 提供了一个类似 DOM 的框架。SheetJSCheerio.js
实现了缺失的功能,以确保 SheetJS DOM 方法可以处理 TABLE 元素。
¥CheerioJS provides a DOM-like framework for NodeJS.
SheetJSCheerio.js
implements the missing
features to ensure that SheetJS DOM methods can process TABLE elements.
Complete Demo (click to show)
This demo was last tested on 2024 March 12 against Cheerio 1.0.0-rc.12
- Install SheetJS and CheerioJS libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz cheerio@1.0.0-rc.12
- Download the sample script
SheetJSCheerio.js
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSCheerio.js
- Download the sample
SheetJSTable.html
:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
- Run the script:
node SheetJSCheerio.js
The script will create a file SheetJSCheerio.xlsx
that can be opened.
其他平台
¥Other Platforms
DenoDOM
DenoDOM 为 Deno 提供了一个 DOM 框架。对于测试版本(0.1.46
),需要以下补丁:
¥DenoDOM provides a DOM framework for Deno. For
the tested version (0.1.46
), the following patches were needed:
-
表
rows
特性(如上所述)¥TABLE
rows
property (explained above) -
TR
cells
属性(如上所述)¥TR
cells
property (explained above)
此示例获取 样本表:
¥This example fetches a sample table:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';
import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.46/deno-dom-wasm.ts';
const doc = new DOMParser().parseFromString(
await (await fetch('https://xlsx.nodejs.cn/dom/SheetJSTable.html')).text(),
"text/html",
)!;
const tbl = doc.querySelector("table");
/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"))
/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
Complete Demo (click to show)
This demo was tested in the following deployments:
Architecture | DenoDOM | Deno | Date |
---|---|---|---|
darwin-x64 | 0.1.46 | 1.44.4 | 2024-06-19 |
darwin-arm | 0.1.46 | 1.44.4 | 2024-06-19 |
-
Save the previous codeblock to
SheetJSDenoDOM.ts
. -
Run the script with
--allow-net
and--allow-write
entitlements:
deno run --allow-net --allow-write SheetJSDenoDOM.ts
The script will create a file SheetJSDenoDOM.xlsx
that can be opened.
[^1]: 见 "HTML" 实用程序中的 table_to_sheet
¥See table_to_sheet
in "HTML" Utilities
[^2]: 详细信息请参见 "工作表对象" 于 "SheetJS 数据模型"。
¥See "Worksheet Object" in "SheetJS Data Model" for more details.
[^3]: 见 "HTML" 实用程序中的 table_to_book
¥See table_to_book
in "HTML" Utilities
[^4]: 详细信息请参见 "工作簿对象" 于 "SheetJS 数据模型"。
¥See "Workbook Object" in "SheetJS Data Model" for more details.
[^5]: 见 "HTML" 实用程序中的 sheet_add_dom