Skip to main content

合成 DOM

SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。

¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.

SheetJS 提供了三种直接处理 HTML DOM TABLE 元素的方法:

¥SheetJS offers three methods to directly process HTML DOM TABLE elements:

  • table_to_sheet[^1] 从 TABLE 元素生成 SheetJS 工作表 [^2]

    ¥table_to_sheet[^1] generates a SheetJS worksheet[^2] from a TABLE element

  • table_to_book[^3] 从 TABLE 元素生成 SheetJS 工作簿 [^4]

    ¥table_to_book[^3] generates a SheetJS workbook[^4] from a TABLE element

  • sheet_add_dom[^5] 将 TABLE 元素中的数据添加到现有工作表

    ¥sheet_add_dom[^5] adds data from a TABLE element to an existing worksheet

这些方法适用于网络浏览器。NodeJS 和其他服务器端平台传统上缺乏 DOM 实现,但第三方模块填补了空白。

¥These methods work in the web browser. NodeJS and other server-side platforms traditionally lack a DOM implementation, but third-party modules fill the gap.

该演示涵盖了非浏览器平台的合成 DOM 实现。我们将探索如何在服务器端环境中使用 SheetJS DOM 方法来解析表格并将数据导出到电子表格。

¥This demo covers synthetic DOM implementations for non-browser platforms. We'll explore how to use SheetJS DOM methods in server-side environments to parse tables and export data to spreadsheets.

服务器端处理最强大的方法是自动化无头 Web 浏览器。"浏览器自动化" 包括演示。

¥The most robust approach for server-side processing is to automate a headless web browser. "Browser Automation" includes demos.

集成详情

¥Integration Details

合成 DOM 实现通常提供一个函数,该函数接受 HTML 字符串并返回代表 document 的对象。诸如 getElementsByTagNamequerySelector 之类的 API 方法可以提取 TABLE 元素。

¥Synthetic DOM implementations typically provide a function that accept an HTML string and return an object that represents document. An API method such as getElementsByTagName or querySelector can pull TABLE elements.

SheetJS 方法使用某些 DOM 实现中可能缺少的功能。

¥SheetJS methods use features that may be missing from some DOM implementations.

表格行数

¥Table rows

TABLE 元素的 rows 属性是 TR 行子级的列表。添加和删除行时,此列表会自动更新。

¥The rows property of TABLE elements is a list of TR row children. This list automatically updates when rows are added and deleted.

SheetJS 方法不会改变 rows。假设没有嵌套表,则可以使用 getElementsByTagName 创建 rows 属性:

¥SheetJS methods do not mutate rows. Assuming there are no nested tables, the rows property can be created using getElementsByTagName:

tbl.rows = Array.from(tbl.getElementsByTagName("tr"));

行单元格

¥Row cells

TR 元素的 cells 属性是 TD 子单元的列表。添加和删除单元格时,此列表会自动更新。

¥The cells property of TR elements is a list of TD cell children. This list automatically updates when cells are added and deleted.

SheetJS 方法不会改变 cells。假设没有嵌套表,则可以使用 getElementsByTagName 创建 cells 属性:

¥SheetJS methods do not mutate cells. Assuming there are no nested tables, the cells property can be created using getElementsByTagName:

tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")));

NodeJS

JSDOM

JSDOM 是 NodeJS 的 DOM 实现。合成 DOM 元素与 SheetJS 方法兼容。

¥JSDOM is a DOM implementation for NodeJS. The synthetic DOM elements are compatible with SheetJS methods.

以下示例从文件 SheetJSTable.html 中抓取第一个表并生成 XLSX 工作簿:

¥The following example scrapes the first table from the file SheetJSTable.html and generates a XLSX workbook:

SheetJSDOM.js
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string. This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");

/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");

/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);

XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
测试部署

该演示在以下部署中进行了测试:

¥This demo was tested in the following deployments:

JSDOM日期
25.0.12024-10-30
24.1.32024-10-30
23.2.02024-10-30
22.1.02024-10-30
21.1.22024-10-30
20.0.32024-10-30
19.0.02024-10-30
18.1.12024-10-30
17.0.02024-10-30
16.7.02024-10-30
15.2.12024-10-30
14.1.02024-10-30
13.2.02024-10-30
12.2.02024-10-30
11.12.02024-10-30
10.1.02024-10-30
Complete Demo (click to show)
  1. Install SheetJS and JSDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz jsdom@24.0.0
  1. Save the previous codeblock to SheetJSDOM.js.

  2. Download the sample SheetJSTable.html:

curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
  1. Run the script:
node SheetJSDOM.js

The script will create a file SheetJSDOM.xlsx that can be opened.

HappyDOM

HappyDOM 为 NodeJS 提供了一个 DOM 框架。旧版本需要以下补丁:

¥HappyDOM provides a DOM framework for NodeJS. Older versions required the following patches:

  • rows 特性(如上所述)

    ¥TABLE rows property (explained above)

  • TR cells 属性(如上所述)

    ¥TR cells property (explained above)

HappyDOM 15.7.4 不需要任何解决方法。

¥HappyDOM 15.7.4 did not require any workarounds.

测试部署

该演示在以下部署中进行了测试:

¥This demo was tested in the following deployments:

HappyDOM日期
15.7.42024-10-30
14.12.32024-10-30
13.10.12024-10-30
12.10.32024-10-30
11.2.02024-10-30
10.11.22024-10-30
9.20.32024-10-30
8.9.02024-10-30
7.8.12024-10-30
6.0.42024-10-30
5.4.02024-10-30
4.1.02024-10-30
3.2.22024-10-30
2.55.02024-10-30
Complete Demo (click to show)
  1. Install SheetJS and HappyDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz happy-dom@13.3.1
  1. Download the sample script SheetJSHappyDOM.js:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSHappyDOM.js
  1. Download the sample SheetJSTable.html:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
  1. Run the script:
node SheetJSHappyDOM.js

The script will create a file SheetJSHappyDOM.xlsx that can be opened.

XMLDOM

XMLDOM 为 NodeJS 提供了一个 DOM 框架。对于测试版本(0.8.10),需要以下补丁:

¥XMLDOM provides a DOM framework for NodeJS. For the tested version (0.8.10), the following patches were needed:

  • rows 特性(如上所述)

    ¥TABLE rows property (explained above)

  • TR cells 属性(如上所述)

    ¥TR cells property (explained above)

  • 元素 innerHTML 属性:

    ¥Element innerHTML property:

Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
var outerHTML = new XMLSerializer().serializeToString(this);
if(outerHTML.match(/</g).length == 1) return "";
return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});
测试部署

该演示在以下部署中进行了测试:

¥This demo was tested in the following deployments:

XMLDOM日期
0.9.52024-10-30
0.8.102024-10-30
Complete Demo (click to show)
  1. Install SheetJS and XMLDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @xmldom/xmldom@0.9.5
  1. Download the sample script SheetJSXMLDOM.js:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSXMLDOM.js
  1. Run the script:
node SheetJSXMLDOM.js

The script will create a file SheetJSXMLDOM.xlsx that can be opened.

CheerioJS

Cheerio 不支持许多开箱即用的基本属性。它们可以进行填充,但强烈建议使用更兼容的库。

¥Cheerio does not support a number of fundamental properties out of the box. They can be shimmed, but it is strongly recommended to use a more compliant library.

CheerioJS 为 NodeJS 提供了一个类似 DOM 的框架。SheetJSCheerio.js 实现了缺失的功能,以确保 SheetJS DOM 方法可以处理 TABLE 元素。

¥CheerioJS provides a DOM-like framework for NodeJS. SheetJSCheerio.js implements the missing features to ensure that SheetJS DOM methods can process TABLE elements.

测试部署

该演示在以下部署中进行了测试:

¥This demo was tested in the following deployments:

CheerioJS日期
1.0.02024-10-30
1.0.0-rc.122024-10-30
Complete Demo (click to show)
  1. Install SheetJS and CheerioJS libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz cheerio@1.0.0
  1. Download the sample script SheetJSCheerio.js:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSCheerio.js
  1. Download the sample SheetJSTable.html:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
  1. Run the script:
node SheetJSCheerio.js

The script will create a file SheetJSCheerio.xlsx that can be opened.

其他平台

¥Other Platforms

DenoDOM

DenoDOM 为 Deno 提供了一个 DOM 框架。对于测试版本(0.1.48),需要以下补丁:

¥DenoDOM provides a DOM framework for Deno. For the tested version (0.1.48), the following patches were needed:

  • rows 特性(如上所述)

    ¥TABLE rows property (explained above)

  • TR cells 属性(如上所述)

    ¥TR cells property (explained above)

此示例获取 样本表

¥This example fetches a sample table:

SheetJSDenoDOM.ts
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';

import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.48/deno-dom-wasm.ts';

const doc = new DOMParser().parseFromString(
await (await fetch('https://xlsx.nodejs.cn/dom/SheetJSTable.html')).text(),
"text/html",
)!;
const tbl = doc.querySelector("table");

/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"))

/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
测试部署

该演示在以下部署中进行了测试:

¥This demo was tested in the following deployments:

架构DenoDOMDeno日期
darwin-x640.1.482.0.42024-10-30
darwin-arm0.1.482.0.42024-10-30
win11-x640.1.482.0.42024-10-30
win11-arm0.1.482.2.12025-02-23
linux-x640.1.482.0.52025-01-10
linux-arm0.1.482.1.102025-02-16
Complete Demo (click to show)
  1. Save the previous codeblock to SheetJSDenoDOM.ts.

  2. Run the script with --allow-net and --allow-write entitlements:

deno run --allow-net --allow-write SheetJSDenoDOM.ts

The script will create a file SheetJSDenoDOM.xlsx that can be opened.

Deno 2 additionally requires the --allow-import entitlement:

deno run --allow-net --allow-write --allow-import SheetJSDenoDOM.ts

[^1]: 见 "HTML" 实用程序中的 table_to_sheet

¥See table_to_sheet in "HTML" Utilities

[^2]: 详细信息请参见 "工作表对象" 于 "SheetJS 数据模型"

¥See "Worksheet Object" in "SheetJS Data Model" for more details.

[^3]: 见 "HTML" 实用程序中的 table_to_book

¥See table_to_book in "HTML" Utilities

[^4]: 详细信息请参见 "工作簿对象" 于 "SheetJS 数据模型"

¥See "Workbook Object" in "SheetJS Data Model" for more details.

[^5]: 见 "HTML" 实用程序中的 sheet_add_dom

¥See sheet_add_dom in "HTML" Utilities