Skip to main content

合成 DOM

SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。

¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.

SheetJS 提供了三种直接处理 HTML DOM TABLE 元素的方法:

¥SheetJS offers three methods to directly process HTML DOM TABLE elements:

  • table_to_sheet[^1] 从 TABLE 元素生成 SheetJS 工作表 [^2]

    ¥table_to_sheet[^1] generates a SheetJS worksheet[^2] from a TABLE element

  • table_to_book[^3] 从 TABLE 元素生成 SheetJS 工作簿 [^4]

    ¥table_to_book[^3] generates a SheetJS workbook[^4] from a TABLE element

  • sheet_add_dom[^5] 将 TABLE 元素中的数据添加到现有工作表

    ¥sheet_add_dom[^5] adds data from a TABLE element to an existing worksheet

这些方法适用于网络浏览器。NodeJS 和其他服务器端平台传统上缺乏 DOM 实现,但第三方模块填补了空白。

¥These methods work in the web browser. NodeJS and other server-side platforms traditionally lack a DOM implementation, but third-party modules fill the gap.

该演示涵盖了非浏览器平台的合成 DOM 实现。我们将探索如何在服务器端环境中使用 SheetJS DOM 方法来解析表格并将数据导出到电子表格。

¥This demo covers synthetic DOM implementations for non-browser platforms. We'll explore how to use SheetJS DOM methods in server-side environments to parse tables and export data to spreadsheets.

服务器端处理最强大的方法是自动化无头 Web 浏览器。"浏览器自动化" 包括演示。

¥The most robust approach for server-side processing is to automate a headless web browser. "Browser Automation" includes demos.

集成详情

¥Integration Details

合成 DOM 实现通常提供一个函数,该函数接受 HTML 字符串并返回表示 document 的对象。诸如 getElementsByTagNamequerySelector 之类的 API 方法可以提取 TABLE 元素。

¥Synthetic DOM implementations typically provide a function that accept a HTML string and return an object that represents document. An API method such as getElementsByTagName or querySelector can pull TABLE elements.

SheetJS 方法使用某些 DOM 实现中可能缺少的功能。

¥SheetJS methods use features that may be missing from some DOM implementations.

表格行数

¥Table rows

TABLE 元素的 rows 属性是 TR 行子级的列表。添加和删除行时,此列表会自动更新。

¥The rows property of TABLE elements is a list of TR row children. This list automatically updates when rows are added and deleted.

SheetJS 方法不会改变 rows。假设没有嵌套表,则可以使用 getElementsByTagName 创建 rows 属性:

¥SheetJS methods do not mutate rows. Assuming there are no nested tables, the rows property can be created using getElementsByTagName:

tbl.rows = Array.from(tbl.getElementsByTagName("tr"));

行单元格

¥Row cells

TR 元素的 cells 属性是 TD 子单元的列表。添加和删除单元格时,此列表会自动更新。

¥The cells property of TR elements is a list of TD cell children. This list automatically updates when cells are added and deleted.

SheetJS 方法不会改变 cells。假设没有嵌套表,则可以使用 getElementsByTagName 创建 cells 属性:

¥SheetJS methods do not mutate cells. Assuming there are no nested tables, the cells property can be created using getElementsByTagName:

tbl.rows.forEach(row => row.cells = Array.from(row.getElementsByTagName("td")));

NodeJS

JSDOM

JSDOM 是 NodeJS 的 DOM 实现。合成 DOM 元素与 SheetJS 方法兼容。

¥JSDOM is a DOM implementation for NodeJS. The synthetic DOM elements are compatible with SheetJS methods.

以下示例从文件 SheetJSTable.html 中抓取第一个表并生成 XLSX 工作簿:

¥The following example scrapes the first table from the file SheetJSTable.html and generates a XLSX workbook:

SheetJSDOM.js
const XLSX = require("xlsx");
const { readFileSync } = require("fs");
const { JSDOM } = require("jsdom");

/* obtain HTML string. This example reads from SheetJSTable.html */
const html_str = readFileSync("SheetJSTable.html", "utf8");

/* get first TABLE element */
const doc = new JSDOM(html_str).window.document.querySelector("table");

/* generate workbook */
const workbook = XLSX.utils.table_to_book(doc);

XLSX.writeFile(workbook, "SheetJSDOM.xlsx");
测试部署

该演示在以下部署中进行了测试:

¥This demo was tested in the following deployments:

JSDOM日期
24.1.02024-06-24
23.2.02024-06-24
22.1.02024-06-24
21.1.22024-06-24
20.0.32024-06-24
19.0.02024-06-24
18.1.12024-06-24
17.0.02024-06-24
16.7.02024-06-24
15.2.12024-06-24
14.1.02024-06-24
13.2.02024-06-24
12.2.02024-06-24
11.12.02024-06-24
10.1.02024-06-24
Complete Demo (click to show)
  1. Install SheetJS and JSDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz jsdom@24.0.0
  1. Save the previous codeblock to SheetJSDOM.js.

  2. Download the sample SheetJSTable.html:

curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
  1. Run the script:
node SheetJSDOM.js

The script will create a file SheetJSDOM.xlsx that can be opened.

HappyDOM

HappyDOM 为 NodeJS 提供了一个 DOM 框架。对于测试版本(13.3.1),需要以下补丁:

¥HappyDOM provides a DOM framework for NodeJS. For the tested version (13.3.1), the following patches were needed:

  • rows 特性(如上所述)

    ¥TABLE rows property (explained above)

  • TR cells 属性(如上所述)

    ¥TR cells property (explained above)

测试部署

该演示在以下部署中进行了测试:

¥This demo was tested in the following deployments:

HappyDOM日期
14.12.32024-06-24
13.10.12024-06-24
12.10.32024-06-24
11.2.02024-06-24
10.11.22024-06-24
9.20.32024-06-24
8.9.02024-06-24
7.8.12024-06-24
6.0.42024-06-24
5.4.02024-06-24
4.1.02024-06-24
3.2.22024-06-24
2.55.02024-06-24
Complete Demo (click to show)
  1. Install SheetJS and HappyDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz happy-dom@13.3.1
  1. Download the sample script SheetJSHappyDOM.js:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSHappyDOM.js
  1. Download the sample SheetJSTable.html:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
  1. Run the script:
node SheetJSHappyDOM.js

The script will create a file SheetJSHappyDOM.xlsx that can be opened.

XMLDOM

XMLDOM 为 NodeJS 提供了一个 DOM 框架。对于测试版本(0.8.10),需要以下补丁:

¥XMLDOM provides a DOM framework for NodeJS. For the tested version (0.8.10), the following patches were needed:

  • rows 特性(如上所述)

    ¥TABLE rows property (explained above)

  • TR cells 属性(如上所述)

    ¥TR cells property (explained above)

  • 元素 innerHTML 属性:

    ¥Element innerHTML property:

Object.defineProperty(tbl.__proto__, "innerHTML", { get: function() {
var outerHTML = new XMLSerializer().serializeToString(this);
if(outerHTML.match(/</g).length == 1) return "";
return outerHTML.slice(0, outerHTML.lastIndexOf("</")).replace(/<[^"'>]*(("[^"]*"|'[^']*')[^"'>]*)*>/, "");
}});
Complete Demo (click to show)
Tested Deployments

This demo was last tested on 2024 March 12 against XMLDOM 0.8.10

  1. Install SheetJS and XMLDOM libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @xmldom/xmldom@0.8.10
  1. Download the sample script SheetJSXMLDOM.js:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSXMLDOM.js
  1. Run the script:
node SheetJSXMLDOM.js

The script will create a file SheetJSXMLDOM.xlsx that can be opened.

CheerioJS

Cheerio 不支持许多开箱即用的基本属性。它们可以进行填充,但强烈建议使用更兼容的库。

¥Cheerio does not support a number of fundamental properties out of the box. They can be shimmed, but it is strongly recommended to use a more compliant library.

CheerioJS 为 NodeJS 提供了一个类似 DOM 的框架。SheetJSCheerio.js 实现了缺失的功能,以确保 SheetJS DOM 方法可以处理 TABLE 元素。

¥CheerioJS provides a DOM-like framework for NodeJS. SheetJSCheerio.js implements the missing features to ensure that SheetJS DOM methods can process TABLE elements.

Complete Demo (click to show)
Tested Deployments

This demo was last tested on 2024 March 12 against Cheerio 1.0.0-rc.12

  1. Install SheetJS and CheerioJS libraries:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz cheerio@1.0.0-rc.12
  1. Download the sample script SheetJSCheerio.js:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSCheerio.js
  1. Download the sample SheetJSTable.html:
curl -LO https://xlsx.nodejs.cn/dom/SheetJSTable.html
  1. Run the script:
node SheetJSCheerio.js

The script will create a file SheetJSCheerio.xlsx that can be opened.

其他平台

¥Other Platforms

DenoDOM

DenoDOM 为 Deno 提供了一个 DOM 框架。对于测试版本(0.1.46),需要以下补丁:

¥DenoDOM provides a DOM framework for Deno. For the tested version (0.1.46), the following patches were needed:

  • rows 特性(如上所述)

    ¥TABLE rows property (explained above)

  • TR cells 属性(如上所述)

    ¥TR cells property (explained above)

此示例获取 样本表

¥This example fetches a sample table:

SheetJSDenoDOM.ts
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';

import { DOMParser } from 'https://deno.land/x/deno_dom@v0.1.46/deno-dom-wasm.ts';

const doc = new DOMParser().parseFromString(
await (await fetch('https://xlsx.nodejs.cn/dom/SheetJSTable.html')).text(),
"text/html",
)!;
const tbl = doc.querySelector("table");

/* patch DenoDOM element */
tbl.rows = tbl.querySelectorAll("tr");
tbl.rows.forEach(row => row.cells = row.querySelectorAll("td, th"))

/* generate workbook */
const workbook = XLSX.utils.table_to_book(tbl);
XLSX.writeFile(workbook, "SheetJSDenoDOM.xlsx");
Complete Demo (click to show)
Tested Deployments

This demo was tested in the following deployments:

ArchitectureDenoDOMDenoDate
darwin-x640.1.461.44.42024-06-19
darwin-arm0.1.461.44.42024-06-19
  1. Save the previous codeblock to SheetJSDenoDOM.ts.

  2. Run the script with --allow-net and --allow-write entitlements:

deno run --allow-net --allow-write SheetJSDenoDOM.ts

The script will create a file SheetJSDenoDOM.xlsx that can be opened.

[^1]: 见 "HTML" 实用程序中的 table_to_sheet

¥See table_to_sheet in "HTML" Utilities

[^2]: 详细信息请参见 "工作表对象" 于 "SheetJS 数据模型"

¥See "Worksheet Object" in "SheetJS Data Model" for more details.

[^3]: 见 "HTML" 实用程序中的 table_to_book

¥See table_to_book in "HTML" Utilities

[^4]: 详细信息请参见 "工作簿对象" 于 "SheetJS 数据模型"

¥See "Workbook Object" in "SheetJS Data Model" for more details.

[^5]: 见 "HTML" 实用程序中的 sheet_add_dom

¥See sheet_add_dom in "HTML" Utilities