数据导入
"导入教程" 是对数据导入和分析的简单介绍。
¥The "Import Tutorial" is a gentle introduction to data import and analysis.
解析工作簿
¥Parsing Workbooks
API
从电子表格字节中提取数据
¥Extract data from spreadsheet bytes
var workbook = XLSX.read(data, opts);
read
方法可以从存储在 JS 字符串、"二进制字符串"、NodeJS 缓冲区或类型化数组(Uint8Array
或 ArrayBuffer
)中的电子表格字节中提取数据。
¥The read
method can extract data from spreadsheet bytes stored in a JS string,
"binary string", NodeJS buffer or typed array (Uint8Array
or ArrayBuffer
).
从本地文件读取电子表格字节并提取数据
¥Read spreadsheet bytes from a local file and extract data
var workbook = XLSX.readFile(filename, opts);
readFile
方法尝试读取所提供路径中的电子表格文件。
¥The readFile
method attempts to read a spreadsheet file at the supplied path.
第二个 opts
参数是可选的。"解析选项" 涵盖了支持的属性和行为。
¥The second opts
argument is optional. "Parsing Options"
covers the supported properties and behaviors.
浏览器一般不允许通过指定文件名的方式读取文件(存在安全风险),在浏览器中运行 XLSX.readFile
会抛出错误。
¥Browsers generally do not allow reading files by specifying filename (it is a
security risk), and running XLSX.readFile
in the browser will throw an error.
必须使用 --allow-read
调用 Deno 脚本才能从文件系统读取。
¥Deno scripts must be invoked with --allow-read
to read from the filesystem.
示例
¥Examples
以下是一些常见的场景(点击每个字幕即可查看代码)。
¥Here are a few common scenarios (click on each subtitle to see the code).
demos 更详细地介绍了特殊部署。
¥The demos cover special deployments in more detail.
示例:本地文件
¥Example: Local File
XLSX.readFile
支持在 NodeJS 等平台读取本地文件。在 React Native 等其他平台中,应该使用文件数据调用 XLSX.read
。
¥XLSX.readFile
supports reading local files in platforms like NodeJS. In other
platforms like React Native, XLSX.read
should be called with file data.
"用户提交内容" 示例。 中介绍了用户拖放文件或使用文件元素的浏览器内处理
¥In-browser processing where users drag-and-drop files or use a file element are covered in the "User Submissions" example.
- NodeJS
- Electron
- React Native
- Photoshop
- Deno
- Bun
readFile
在底层使用 fs.readFileSync
:
¥readFile
uses fs.readFileSync
under the hood:
var XLSX = require("xlsx");
var workbook = XLSX.readFile("test.xlsx");
对于 Node ESM,必须手动加载 fs
:
¥For Node ESM, fs
must be loaded manually:
import * as fs from "fs";
import { readFile, set_fs } from "xlsx";
set_fs(fs);
const workbook = readFile("test.xlsx");
readFile
可以在渲染器进程中使用:
¥readFile
can be used in the renderer process:
/* From the renderer process */
var XLSX = require("xlsx");
var workbook = XLSX.readFile(path);
Electron API 随着时间的推移而发生变化。electron
演示 显示了完整的示例并详细说明了所需的特定于版本的设置。
¥Electron APIs have changed over time. The electron
demo
shows a complete example and details the required version-specific settings.
React Native 演示 涵盖经过测试的插件。
¥The React Native Demo covers tested plugins.
readFile
在 Photoshop 和其他 ExtendScript 目标中封装了 File
逻辑。指定的路径应该是绝对路径:
¥readFile
wraps the File
logic in Photoshop and other ExtendScript targets.
The specified path should be an absolute path:
#include "xlsx.extendscript.js"
/* Read test.xlsx from the Documents folder */
var workbook = XLSX.readFile(Folder.myDocuments + "/test.xlsx");
对于用户可配置的路径,openDialog
可以显示文件选择器:
¥For user-configurable paths, openDialog
can show a file picker:
#include "xlsx.extendscript.js"
/* Ask user to select path */
var thisFile = File.openDialog("Select a spreadsheet");
var workbook = XLSX.readFile(thisFile.absoluteURI);
extendscript
演示 包括 Photoshop 和 InDesign 的完整示例。
¥The extendscript
demo includes complete
examples for Photoshop and InDesign.
readFile
在底层使用 Deno.readFileSync
:
¥readFile
uses Deno.readFileSync
under the hood:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';
const workbook = XLSX.readFile("test.xlsx");
读取文件的应用必须使用 --allow-read
标志来调用。
¥Applications reading files must be invoked with the --allow-read
flag.
Bun readFileSync
输出应该封装在 Buffer
中:
¥Bun readFileSync
output should be wrapped in a Buffer
:
import { readFileSync } from 'fs'
import { read } from './xlsx.mjs'
const workbook = read(Buffer.from(readFileSync(path)));
示例:用户提交内容
¥Example: User Submissions
此示例重点关注用户通过拖放事件、HTML 文件输入元素或网络请求提交的文件。
¥This example focuses on user-submitted files through a drag-and-drop event, HTML file input element, or network request.
- Browser
- NodeJS
- Deno
对于针对 Chrome 76+ 的现代网站,建议使用 File#arrayBuffer
:
¥For modern websites targeting Chrome 76+, File#arrayBuffer
is recommended:
- Drag and Drop
- HTML File Input Element
假设 drop_dom_element
是监听变化的 DOM 元素:
¥Assume drop_dom_element
is the DOM element that will listen for changes:
<div id="drop_dom_element">Drop files here</div>
事件属性为 e.dataTransfer
。该代码片段高亮了拖放示例和文件输入示例之间的区别:
¥The event property is e.dataTransfer
. The code snippet highlights the
difference between the drag-and-drop example and the file input example:
// XLSX is a global from the standalone script
async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
const f = e.dataTransfer.files[0];
/* f is a File */
const data = await f.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);
从带有 type="file"
的 HTML INPUT 元素开始:
¥Starting with an HTML INPUT element with type="file"
:
<input type="file" id="input_dom_element">
事件属性为 e.target
。该代码片段高亮了拖放示例和文件输入示例之间的区别:
¥The event property is e.target
. The code snippet highlights the difference
between the drag-and-drop example and the file input example:
// XLSX is a global from the standalone script
async function handleFileAsync(e) {
const file = e.target.files[0];
const data = await file.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
}
input_dom_element.addEventListener("change", handleFileAsync, false);
https://oss.sheetjs.com/sheetjs/ 演示了 FileReader 技术。
¥https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.
为了获得最大兼容性 (IE10+),建议使用 FileReader
方法:
¥For maximal compatibility (IE10+), the FileReader
approach is recommended:
- Drag and Drop
- HTML File Input Element
假设 drop_dom_element
是监听变化的 DOM 元素:
¥Assume drop_dom_element
is the DOM element that will listen for changes:
<div id="drop_dom_element">Drop files here</div>
事件属性为 e.dataTransfer
。该代码片段高亮了拖放示例和文件输入示例之间的区别:
¥The event property is e.dataTransfer
. The code snippet highlights the
difference between the drag-and-drop example and the file input example:
function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
var f = e.dataTransfer.files[0];
/* f is a File */
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);
从带有 type="file"
的 HTML INPUT 元素开始:
¥Starting with an HTML INPUT element with type="file"
:
<input type="file" id="input_dom_element">
事件属性为 e.target
。该代码片段高亮了拖放示例和文件输入示例之间的区别:
¥The event property is e.target
. The code snippet highlights the difference
between the drag-and-drop example and the file input example:
function handleFile(e) {
var file = e.target.files[0];
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(e.target.result);
/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(file);
}
input_dom_element.addEventListener("change", handleFile, false);
oldie
演示 显示了 IE 兼容的后备方案。
¥The oldie
demo shows an IE-compatible fallback scenario.
read
可以接受 NodeJS 缓冲区。readFile
可以像 formidable
一样读取由 HTTP POST 请求正文解析器生成的文件:
¥read
can accept a NodeJS buffer. readFile
can read files generated by a
HTTP POST request body parser like formidable
:
const XLSX = require("xlsx");
const http = require("http");
const formidable = require("formidable");
const server = http.createServer((req, res) => {
const form = new formidable.IncomingForm();
form.parse(req, (err, fields, files) => {
/* grab the first file */
const f = Object.entries(files)[0][1];
const path = f.filepath;
const workbook = XLSX.readFile(path);
/* DO SOMETHING WITH workbook HERE */
});
}).listen(process.env.PORT || 7262);
server
演示 包括更高级的示例。
¥The server
demo includes more advanced examples.
Drash 是 Deno 的 HTTP 服务器框架。在 POST
请求处理程序中,主体解析器可以将文件数据拉入 Uint8Array
:
¥Drash is a HTTP server framework for Deno. In a POST
request handler, the
body parser can pull file data into a Uint8Array
:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';
/* load the codepage support library for extended support with older formats */
import * as cptable from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/cpexcel.full.mjs';
XLSX.set_cptable(cptable);
import * as Drash from "https://cdn.jsdelivr.net/gh/drashland/drash@v2.8.1/mod.ts";
class SheetResource extends Drash.Resource {
public paths = ["/"];
public POST(request: Drash.Request, response: Drash.Response) {
const file = request.bodyParam<Drash.Types.BodyFile>("file");
if (!file) throw new Error("File is required!");
var wb = XLSX.read(file.content);
var html = XLSX.utils.sheet_to_html(wb.Sheets[wb.SheetNames[0]]);
return response.html(html);
}
}
const server = new Drash.Server({ hostname: "", port: 7262, protocol: "http",
resources: [
SheetResource,
],
});
server.run();
Deno 必须使用 --allow-net
标志运行才能启用网络请求:
¥Deno must be run with the --allow-net
flag to enable network requests:
deno run --allow-net test-server.ts
要进行测试,请向 http://localhost:7262
提交带有附件的 POST 请求:
¥To test, submit a POST request to http://localhost:7262
with an attachment:
curl -X POST -F "file=@test.xlsx" http://localhost:7262/
示例:远程文件
¥Example: Remote File
此示例重点介绍使用 XMLHttpRequest
和 fetch
等 API 以及第三方库来获取文件(浏览器用语中的 "Ajax")。
¥This example focuses on fetching files ("Ajax" in browser parlance) using APIs
like XMLHttpRequest
and fetch
as well as third-party libraries.
- Browser
- NodeJS
- Bun
- Deno
- Electron
对于针对 Chrome 42+ 的现代网站,建议使用 fetch
:
¥For modern websites targeting Chrome 42+, fetch
is recommended:
// XLSX is a global from the standalone script
(async() => {
const url = "https://xlsx.nodejs.cn/pres.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
/* DO SOMETHING WITH workbook HERE */
})();
为了获得更广泛的支持,建议使用 XMLHttpRequest
方法:
¥For broader support, the XMLHttpRequest
approach is recommended:
var url = "https://xlsx.nodejs.cn/pres.xlsx";
/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";
req.onload = function(e) {
var workbook = XLSX.read(req.response);
/* DO SOMETHING WITH workbook HERE */
};
req.send();
HTTP 下载演示 包括使用浏览器 API 和封装器库的示例。
¥The HTTP Downloads demo includes examples using browser APIs and wrapper libraries.
https://oss.sheetjs.com/sheetjs/ajax.html 显示了 IE6+ 的后备方法。
¥https://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.
NodeJS 从 18.0 版开始发布,原生支持 fetch:
¥NodeJS releases starting from version 18.0 have native support for fetch:
const XLSX = require("xlsx");
const url = "https://xlsx.nodejs.cn/pres.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
为了获得更广泛的兼容性,建议使用第三方模块。
¥For broader compatibility, third-party modules are recommended.
request
需要 null
编码才能产生缓冲区:
¥request
requires a null
encoding to yield Buffers:
var XLSX = require("xlsx");
var request = require("request");
var url = "https://xlsx.nodejs.cn/pres.xlsx";
request({url: url, encoding: null}, function(err, resp, body) {
var workbook = XLSX.read(body);
/* DO SOMETHING WITH workbook HERE */
});
axios
在浏览器和 NodeJS 中的工作方式相同:
¥axios
works the same way in browser and in NodeJS:
const XLSX = require("xlsx");
const axios = require("axios");
const url = "https://xlsx.nodejs.cn/pres.xlsx";
(async() => {
const res = await axios.get(url, {responseType: "arraybuffer"});
/* res.data is a Buffer */
const workbook = XLSX.read(res.data);
/* DO SOMETHING WITH workbook HERE */
})();
Bun 原生支持 fetch
。使用 NodeJS 包:
¥Bun has native support for fetch
. Using the NodeJS package:
import * as XLSX from 'xlsx';
/* load the codepage support library for extended support with older formats */
import * as cptable from 'xlsx/dist/cpexcel.full.mjs';
XLSX.set_cptable(cptable);
const url = "https://xlsx.nodejs.cn/pres.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
Deno 原生支持 fetch
。
¥Deno has native support for fetch
.
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';
/* load the codepage support library for extended support with older formats */
import * as cptable from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/cpexcel.full.mjs';
XLSX.set_cptable(cptable);
const url = "https://xlsx.nodejs.cn/pres.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);
Deno 必须使用 --allow-net
标志运行才能启用网络请求:
¥Deno must be run with the --allow-net
flag to enable network requests:
deno run --allow-net test-fetch.ts
主进程中的 net
模块可以向外部资源发送 HTTP/HTTPS 请求。应使用 Buffer.concat
手动连接响应:
¥The net
module in the main process can make HTTP/HTTPS requests to external
resources. Responses should be manually concatenated using Buffer.concat
:
const XLSX = require("xlsx");
const { net } = require("electron");
const url = "https://xlsx.nodejs.cn/pres.xlsx";
const req = net.request(url);
req.on("response", (res) => {
const bufs = []; // this array will collect all of the buffers
res.on("data", (chunk) => { bufs.push(chunk); });
res.on("end", () => {
const workbook = XLSX.read(Buffer.concat(bufs));
/* DO SOMETHING WITH workbook HERE */
});
});
req.end();
示例:可读流
¥Example: Readable Streams
推荐的方法是在内存中缓冲流,并在收集完所有数据后进行处理。正确的流解析在技术上是不可能的。
¥The recommended approach is to buffer streams in memory and process once all of the data has been collected. A proper streaming parse is technically impossible.
Technical details (click to show)
XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary and XML entries. The ZIP file format stores the table of contents ("end of central directory" record) at the end of the file, so a proper parse of a ZIP file requires scanning from the end. Streams do not provide random access into the data, so the only correct approach involves buffering the entire stream.
XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format". It is a container format that can hold multiple "files" and "folders". It also has a table of contents ("directory sectors") but these can be placed anywhere in the file! The only correct approach involves buffering enough of the stream to find the full table of contents, but the added complexity has little benefit when testing against real-world files generated by various versions of Excel and other tools.
- Browser
- NodeJS
- Deno
处理 ReadableStream
时,最简单的方法是缓冲流并在最后处理整个事情:
¥When dealing with ReadableStream
, the easiest approach is to buffer the stream
and process the whole thing at the end:
// XLSX is a global from the standalone script
async function buffer_RS(stream) {
/* collect data */
const buffers = [];
const reader = stream.getReader();
for(;;) {
const res = await reader.read();
if(res.value) buffers.push(res.value);
if(res.done) break;
}
/* concat */
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));
let off = 0;
for(const u8 of buffers) {
out.set(u8, off);
off += u8.length;
}
return out;
}
const data = await buffer_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data);
处理可读流时,最简单的方法是缓冲流并在最后处理整个事情:
¥When dealing with Readable Streams, the easiest approach is to buffer the stream and process the whole thing at the end:
var XLSX = require("xlsx");
function process_RS(stream, cb) {
var buffers = [];
stream.on("data", function(data) { buffers.push(data); });
stream.on("end", function() {
var buffer = Buffer.concat(buffers);
var workbook = XLSX.read(buffer);
/* DO SOMETHING WITH workbook IN THE CALLBACK */
cb(workbook);
});
}
在 NodeJS 的最新版本中,Promise 是首选:
¥In recent versions of NodeJS, Promises are preferred:
var XLSX = require("xlsx");
/* async_RS reads a stream and returns a Promise resolving to a workbook */
const async_RS = (stream) => new Promise((res, rej) => {
var buffers = [];
stream.on("data", function(data) { buffers.push(data); });
stream.on("end", function() {
const buf = Buffer.concat(buffers);
const wb = XLSX.read(buf);
res(wb);
});
});
除了浏览器 ReadableStream
API 之外,Deno 还有一个 Reader
类。
¥In addition to the browser ReadableStream
API, Deno has a Reader
class.
对于这些流,std
提供了 readAll
方法来将数据收集到 Uint8Array
中。此示例使用 Deno.open
从文件中读取并打印工作表名称数组:
¥For these streams, std
provides a readAll
method to collect data into a
Uint8Array
. This example reads from a file using Deno.open
and prints the
worksheet names array:
// @deno-types="https://cdn.sheetjs.com/xlsx-0.20.3/package/types/index.d.ts"
import * as XLSX from 'https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs';
import { readAll } from "https://deno.land/std/streams/conversion.ts";
/* Simple Deno.Reader from a file */
const file = await Deno.open("test.xlsx", {read: true});
/* `content` will be a Uint8Array holding the full contents of the stream */
const content = await readAll(file);
/* Since this is a Uint8Array, `XLSX.read` "just works" */
const wb = XLSX.read(content);
console.log(wb.SheetNames);
包含演示 中介绍了更详细的示例
¥More detailed examples are covered in the included demos
处理 JSON 和 JS 数据
¥Processing JSON and JS Data
JSON 和 JS 数据往往代表单个工作表。本节将使用一些实用函数来生成工作簿。
¥JSON and JS data tend to represent single worksheets. This section will use a few utility functions to generate workbooks.
创建新工作簿
¥Create a new Workbook
var workbook = XLSX.utils.book_new();
book_new
效用函数 创建一个没有工作表的空工作簿。
¥The book_new
utility function creates an empty
workbook with no worksheets.
API
从 JS 值数组创建工作表
¥Create a worksheet from an array of arrays of JS values
var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);
aoa_to_sheet
实用函数按行主序遍历 "数组的数组",生成工作表对象。以下代码片段生成一个工作表,其中单元格 A1
设置为字符串 A1
,单元格 B1
设置为 B1
,等等:
¥The aoa_to_sheet
utility function walks an "array of arrays" in row-major
order, generating a worksheet object. The following snippet generates a sheet
with cell A1
set to the string A1
, cell B1
set to B1
, etc:
var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]);
"数组的数组输入" 更详细地描述了该函数和可选的 opts
参数。
¥"Array of Arrays Input" describes
the function and the optional opts
argument in more detail.
从 JS 对象数组创建工作表
¥Create a worksheet from an array of JS objects
var worksheet = XLSX.utils.json_to_sheet(jsa, opts);
json_to_sheet
实用函数按顺序遍历 JS 对象数组,生成工作表对象。默认情况下,它将生成一个标题行和数组中每个对象一行。可选的 opts
参数具有控制列顺序和标题输出的设置。
¥The json_to_sheet
utility function walks an array of JS objects in order,
generating a worksheet object. By default, it will generate a header row and
one row per object in the array. The optional opts
argument has settings to
control the column order and header output.
"对象数组输入" 更详细地描述了该函数和可选的 opts
参数。
¥"Array of Objects Input" describes
the function and the optional opts
argument in more detail.
示例
¥Examples
"导出教程" 包含从 JSON 端点获取数据并生成工作簿的详细示例。
¥"Export Tutorial" contains a detailed example of fetching data from a JSON Endpoint and generating a workbook.
x-spreadsheet
是一个交互式数据网格,用于在网络浏览器中预览和修改结构化数据。
¥x-spreadsheet
is an interactive data grid for
previewing and modifying structured data in the web browser.
"TensorFlow.js" 涵盖了从 ML 库导出(存储在类型化数组中的数据集)创建工作表的策略。
¥"TensorFlow.js" covers strategies for creating worksheets from ML library exports (datasets stored in Typed Arrays).
Records from a database query (SQL or no-SQL) (click to show)
The data
demo includes examples of working with
databases and query results.
处理 HTML 表格
¥Processing HTML Tables
API
通过抓取页面中的 HTML TABLE 创建工作表
¥Create a worksheet by scraping an HTML TABLE in the page
var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);
table_to_sheet
实用程序函数采用 DOM TABLE 元素并迭代行以生成工作表。opts
参数是可选的。"HTML 表格输入" 更详细地描述了该功能。
¥The table_to_sheet
utility function takes a DOM TABLE element and iterates
through the rows to generate a worksheet. The opts
argument is optional.
"HTML Table Input" describes the
function in more detail.
通过抓取页面中的 HTML TABLE 创建工作簿
¥Create a workbook by scraping an HTML TABLE in the page
var workbook = XLSX.utils.table_to_book(dom_element, opts);
table_to_book
效用函数遵循与 table_to_sheet
相同的逻辑。生成工作表后,它会创建一个空白工作簿并附加电子表格。
¥The table_to_book
utility function follows the same logic as table_to_sheet
.
After generating a worksheet, it creates a blank workbook and appends the
spreadsheet.
options 参数支持与 table_to_sheet
相同的选项,并添加了 sheet
属性来控制工作表名称。如果缺少属性或未指定选项,则使用默认名称 Sheet1
。
¥The options argument supports the same options as table_to_sheet
, with the
addition of a sheet
property to control the worksheet name. If the property
is missing or no options are specified, the default name Sheet1
is used.
示例
¥Examples
无头演示 包括使用无头 Chromium ("Puppeteer") 和其他浏览器 ("Playwright") 从 HTML TABLE 元素生成服务器端电子表格的示例
¥The Headless Demo includes examples of server-side spreadsheet generation from HTML TABLE elements using headless Chromium ("Puppeteer") and other browsers ("Playwright")
以下是一些常见的场景(点击各个字幕即可查看代码):
¥Here are a few common scenarios (click on each subtitle to see the code):
HTML TABLE element in a webpage (click to show)
<!-- include the standalone script and shim -->
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/shim.min.js"></script>
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"></script>
<!-- example table with id attribute -->
<table id="tableau">
<tr><td>Sheet</td><td>JS</td></tr>
<tr><td>12345</td><td>67</td></tr>
</table>
<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));
/* DO SOMETHING WITH workbook HERE */
XLSX.writeFile(workbook, "SheetJSHTMLExport.xlsx");
</script>
Multiple tables on a web page can be converted to individual worksheets:
/* create new workbook */
var workbook = XLSX.utils.book_new();
/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");
/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");
/* workbook now has 2 worksheets */
Alternatively, the HTML code can be extracted and parsed:
var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});
Chrome/Chromium Extension (click to show)
The "Chrome and Chromium" demo includes a complete example and enumerates the required permissions and other settings.
In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:
/* in the worker script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
/* pass a message like { sheetjs: true } from the extension to scrape */
if(!msg || !msg.sheetjs) return;
/* create a new workbook */
var workbook = XLSX.utils.book_new();
/* loop through each table element */
var tables = document.getElementsByTagName("table")
for(var i = 0; i < tables.length; ++i) {
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
}
/* pass back to the extension */
return cb(workbook);
});
NodeJS HTML Tables without a browser (click to show)
NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. The "Synthetic DOM" demo includes examples for NodeJS.