Skip to main content

数据导入

"导入教程" 是对数据导入和分析的简单介绍。

¥The "Import Tutorial" is a gentle introduction to data import and analysis.

解析工作簿

¥Parsing Workbooks

API

从电子表格字节中提取数据

¥Extract data from spreadsheet bytes

var workbook = XLSX.read(data, opts);

read 方法可以从存储在 JS 字符串、"二进制字符串"、NodeJS 缓冲区或类型化数组(Uint8ArrayArrayBuffer)中的电子表格字节中提取数据。

¥The read method can extract data from spreadsheet bytes stored in a JS string, "binary string", NodeJS buffer or typed array (Uint8Array or ArrayBuffer).

从本地文件读取电子表格字节并提取数据

¥Read spreadsheet bytes from a local file and extract data

var workbook = XLSX.readFile(filename, opts);

readFile 方法尝试读取所提供路径中的电子表格文件。

¥The readFile method attempts to read a spreadsheet file at the supplied path.

第二个 opts 参数是可选的。"解析选项" 涵盖了支持的属性和行为。

¥The second opts argument is optional. "Parsing Options" covers the supported properties and behaviors.

浏览器一般不允许通过指定文件名的方式读取文件(存在安全风险),在浏览器中运行 XLSX.readFile 会抛出错误。

¥Browsers generally do not allow reading files by specifying filename (it is a security risk), and running XLSX.readFile in the browser will throw an error.

必须使用 --allow-read 调用 Deno 脚本才能从文件系统读取。

¥Deno scripts must be invoked with --allow-read to read from the filesystem.

示例

¥Examples

以下是一些常见的场景(点击每个字幕即可查看代码)。

¥Here are a few common scenarios (click on each subtitle to see the code).

demos 更详细地介绍了特殊部署。

¥The demos cover special deployments in more detail.

示例:本地文件

¥Example: Local File

XLSX.readFile 支持在 NodeJS 等平台读取本地文件。在 React Native 等其他平台中,应该使用文件数据调用 XLSX.read

¥XLSX.readFile supports reading local files in platforms like NodeJS. In other platforms like React Native, XLSX.read should be called with file data.

"用户提交内容" 示例。 中介绍了用户拖放文件或使用文件元素的浏览器内处理

¥In-browser processing where users drag-and-drop files or use a file element are covered in the "User Submissions" example.

readFile 在底层使用 fs.readFileSync

¥readFile uses fs.readFileSync under the hood:

var XLSX = require("xlsx");

var workbook = XLSX.readFile("test.xlsx");

对于 Node ESM,必须手动加载 fs

¥For Node ESM, fs must be loaded manually:

import * as fs from "fs";
import { readFile, set_fs } from "xlsx";
set_fs(fs);

const workbook = readFile("test.xlsx");

示例:用户提交内容

¥Example: User Submissions

此示例重点关注用户通过拖放事件、HTML 文件输入元素或网络请求提交的文件。

¥This example focuses on user-submitted files through a drag-and-drop event, HTML file input element, or network request.

对于针对 Chrome 76+ 的现代网站,建议使用 File#arrayBuffer

¥For modern websites targeting Chrome 76+, File#arrayBuffer is recommended:

假设 drop_dom_element 是监听变化的 DOM 元素:

¥Assume drop_dom_element is the DOM element that will listen for changes:

<div id="drop_dom_element">Drop files here</div>

事件属性为 e.dataTransfer。该代码片段高亮了拖放示例和文件输入示例之间的区别:

¥The event property is e.dataTransfer. The code snippet highlights the difference between the drag-and-drop example and the file input example:

// XLSX is a global from the standalone script

async function handleDropAsync(e) {
e.stopPropagation(); e.preventDefault();
const f = e.dataTransfer.files[0];
/* f is a File */
const data = await f.arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
}
drop_dom_element.addEventListener("drop", handleDropAsync, false);

https://oss.sheetjs.com/sheetjs/ 演示了 FileReader 技术。

¥https://oss.sheetjs.com/sheetjs/ demonstrates the FileReader technique.

为了获得最大兼容性 (IE10+),建议使用 FileReader 方法:

¥For maximal compatibility (IE10+), the FileReader approach is recommended:

假设 drop_dom_element 是监听变化的 DOM 元素:

¥Assume drop_dom_element is the DOM element that will listen for changes:

<div id="drop_dom_element">Drop files here</div>

事件属性为 e.dataTransfer。该代码片段高亮了拖放示例和文件输入示例之间的区别:

¥The event property is e.dataTransfer. The code snippet highlights the difference between the drag-and-drop example and the file input example:

function handleDrop(e) {
e.stopPropagation(); e.preventDefault();
var f = e.dataTransfer.files[0];
/* f is a File */
var reader = new FileReader();
reader.onload = function(e) {
var data = e.target.result;
/* reader.readAsArrayBuffer(file) -> data will be an ArrayBuffer */
var workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
};
reader.readAsArrayBuffer(f);
}
drop_dom_element.addEventListener("drop", handleDrop, false);

oldie 演示 显示了 IE 兼容的后备方案。

¥The oldie demo shows an IE-compatible fallback scenario.

示例:远程文件

¥Example: Remote File

此示例重点介绍使用 XMLHttpRequestfetch 等 API 以及第三方库来获取文件(浏览器用语中的 "Ajax")。

¥This example focuses on fetching files ("Ajax" in browser parlance) using APIs like XMLHttpRequest and fetch as well as third-party libraries.

对于针对 Chrome 42+ 的现代网站,建议使用 fetch

¥For modern websites targeting Chrome 42+, fetch is recommended:

// XLSX is a global from the standalone script

(async() => {
const url = "https://xlsx.nodejs.cn/pres.xlsx";
const data = await (await fetch(url)).arrayBuffer();
/* data is an ArrayBuffer */
const workbook = XLSX.read(data);

/* DO SOMETHING WITH workbook HERE */
})();

为了获得更广泛的支持,建议使用 XMLHttpRequest 方法:

¥For broader support, the XMLHttpRequest approach is recommended:

var url = "https://xlsx.nodejs.cn/pres.xlsx";

/* set up async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";

req.onload = function(e) {
var workbook = XLSX.read(req.response);

/* DO SOMETHING WITH workbook HERE */
};

req.send();

HTTP 下载演示 包括使用浏览器 API 和封装器库的示例。

¥The HTTP Downloads demo includes examples using browser APIs and wrapper libraries.

https://oss.sheetjs.com/sheetjs/ajax.html 显示了 IE6+ 的后备方法。

¥https://oss.sheetjs.com/sheetjs/ajax.html shows fallback approaches for IE6+.

示例:可读流

¥Example: Readable Streams

推荐的方法是在内存中缓冲流,并在收集完所有数据后进行处理。正确的流解析在技术上是不可能的。

¥The recommended approach is to buffer streams in memory and process once all of the data has been collected. A proper streaming parse is technically impossible.

Technical details (click to show)

XLSX, XLSB, NUMBERS, and ODS files are ultimately ZIP files that contain binary and XML entries. The ZIP file format stores the table of contents ("end of central directory" record) at the end of the file, so a proper parse of a ZIP file requires scanning from the end. Streams do not provide random access into the data, so the only correct approach involves buffering the entire stream.

XLS, XLR, QPW, and Works 4 for Mac files use the "Compound File Binary Format". It is a container format that can hold multiple "files" and "folders". It also has a table of contents ("directory sectors") but these can be placed anywhere in the file! The only correct approach involves buffering enough of the stream to find the full table of contents, but the added complexity has little benefit when testing against real-world files generated by various versions of Excel and other tools.

处理 ReadableStream 时,最简单的方法是缓冲流并在最后处理整个事情:

¥When dealing with ReadableStream, the easiest approach is to buffer the stream and process the whole thing at the end:

// XLSX is a global from the standalone script

async function buffer_RS(stream) {
/* collect data */
const buffers = [];
const reader = stream.getReader();
for(;;) {
const res = await reader.read();
if(res.value) buffers.push(res.value);
if(res.done) break;
}

/* concat */
const out = new Uint8Array(buffers.reduce((acc, v) => acc + v.length, 0));

let off = 0;
for(const u8 of buffers) {
out.set(u8, off);
off += u8.length;
}

return out;
}

const data = await buffer_RS(stream);
/* data is Uint8Array */
const workbook = XLSX.read(data);

包含演示 中介绍了更详细的示例

¥More detailed examples are covered in the included demos

处理 JSON 和 JS 数据

¥Processing JSON and JS Data

JSON 和 JS 数据往往代表单个工作表。本节将使用一些实用函数来生成工作簿。

¥JSON and JS data tend to represent single worksheets. This section will use a few utility functions to generate workbooks.

创建新工作簿

¥Create a new Workbook

var workbook = XLSX.utils.book_new();

book_new 效用函数 创建一个没有工作表的空工作簿。

¥The book_new utility function creates an empty workbook with no worksheets.

API

从 JS 值数组创建工作表

¥Create a worksheet from an array of arrays of JS values

var worksheet = XLSX.utils.aoa_to_sheet(aoa, opts);

aoa_to_sheet 实用函数按行主序遍历 "数组的数组",生成工作表对象。以下代码片段生成一个工作表,其中单元格 A1 设置为字符串 A1,单元格 B1 设置为 B1,等等:

¥The aoa_to_sheet utility function walks an "array of arrays" in row-major order, generating a worksheet object. The following snippet generates a sheet with cell A1 set to the string A1, cell B1 set to B1, etc:

var worksheet = XLSX.utils.aoa_to_sheet([
["A1", "B1", "C1"],
["A2", "B2", "C2"],
["A3", "B3", "C3"]
]);

"数组的数组输入" 更详细地描述了该函数和可选的 opts 参数。

¥"Array of Arrays Input" describes the function and the optional opts argument in more detail.

从 JS 对象数组创建工作表

¥Create a worksheet from an array of JS objects

var worksheet = XLSX.utils.json_to_sheet(jsa, opts);

json_to_sheet 实用函数按顺序遍历 JS 对象数组,生成工作表对象。默认情况下,它将生成一个标题行和数组中每个对象一行。可选的 opts 参数具有控制列顺序和标题输出的设置。

¥The json_to_sheet utility function walks an array of JS objects in order, generating a worksheet object. By default, it will generate a header row and one row per object in the array. The optional opts argument has settings to control the column order and header output.

"对象数组输入" 更详细地描述了该函数和可选的 opts 参数。

¥"Array of Objects Input" describes the function and the optional opts argument in more detail.

示例

¥Examples

"导出教程" 包含从 JSON 端点获取数据并生成工作簿的详细示例。

¥"Export Tutorial" contains a detailed example of fetching data from a JSON Endpoint and generating a workbook.

x-spreadsheet 是一个交互式数据网格,用于在网络浏览器中预览和修改结构化数据。

¥x-spreadsheet is an interactive data grid for previewing and modifying structured data in the web browser.

"TensorFlow.js" 涵盖了从 ML 库导出(存储在类型化数组中的数据集)创建工作表的策略。

¥"TensorFlow.js" covers strategies for creating worksheets from ML library exports (datasets stored in Typed Arrays).

Records from a database query (SQL or no-SQL) (click to show)

The data demo includes examples of working with databases and query results.

处理 HTML 表格

¥Processing HTML Tables

API

通过抓取页面中的 HTML TABLE 创建工作表

¥Create a worksheet by scraping an HTML TABLE in the page

var worksheet = XLSX.utils.table_to_sheet(dom_element, opts);

table_to_sheet 实用程序函数采用 DOM TABLE 元素并迭代行以生成工作表。opts 参数是可选的。"HTML 表格输入" 更详细地描述了该功能。

¥The table_to_sheet utility function takes a DOM TABLE element and iterates through the rows to generate a worksheet. The opts argument is optional. "HTML Table Input" describes the function in more detail.

通过抓取页面中的 HTML TABLE 创建工作簿

¥Create a workbook by scraping an HTML TABLE in the page

var workbook = XLSX.utils.table_to_book(dom_element, opts);

table_to_book 效用函数遵循与 table_to_sheet 相同的逻辑。生成工作表后,它会创建一个空白工作簿并附加电子表格。

¥The table_to_book utility function follows the same logic as table_to_sheet. After generating a worksheet, it creates a blank workbook and appends the spreadsheet.

options 参数支持与 table_to_sheet 相同的选项,并添加了 sheet 属性来控制工作表名称。如果缺少属性或未指定选项,则使用默认名称 Sheet1

¥The options argument supports the same options as table_to_sheet, with the addition of a sheet property to control the worksheet name. If the property is missing or no options are specified, the default name Sheet1 is used.

示例

¥Examples

无头演示 包括使用无头 Chromium ("Puppeteer") 和其他浏览器 ("Playwright") 从 HTML TABLE 元素生成服务器端电子表格的示例

¥The Headless Demo includes examples of server-side spreadsheet generation from HTML TABLE elements using headless Chromium ("Puppeteer") and other browsers ("Playwright")

以下是一些常见的场景(点击各个字幕即可查看代码):

¥Here are a few common scenarios (click on each subtitle to see the code):

HTML TABLE element in a webpage (click to show)
<!-- include the standalone script and shim -->
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/shim.min.js"></script>
<script src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"></script>

<!-- example table with id attribute -->
<table id="tableau">
<tr><td>Sheet</td><td>JS</td></tr>
<tr><td>12345</td><td>67</td></tr>
</table>

<!-- this block should appear after the table HTML and the standalone script -->
<script type="text/javascript">
var workbook = XLSX.utils.table_to_book(document.getElementById("tableau"));

/* DO SOMETHING WITH workbook HERE */
XLSX.writeFile(workbook, "SheetJSHTMLExport.xlsx");
</script>

Multiple tables on a web page can be converted to individual worksheets:

/* create new workbook */
var workbook = XLSX.utils.book_new();

/* convert table "table1" to worksheet named "Sheet1" */
var sheet1 = XLSX.utils.table_to_sheet(document.getElementById("table1"));
XLSX.utils.book_append_sheet(workbook, sheet1, "Sheet1");

/* convert table "table2" to worksheet named "Sheet2" */
var sheet2 = XLSX.utils.table_to_sheet(document.getElementById("table2"));
XLSX.utils.book_append_sheet(workbook, sheet2, "Sheet2");

/* workbook now has 2 worksheets */

Alternatively, the HTML code can be extracted and parsed:

var htmlstr = document.getElementById("tableau").outerHTML;
var workbook = XLSX.read(htmlstr, {type:"string"});
Chrome/Chromium Extension (click to show)

The "Chrome and Chromium" demo includes a complete example and enumerates the required permissions and other settings.

In an extension, it is recommended to generate the workbook in a content script and pass the object back to the extension:

/* in the worker script */
chrome.runtime.onMessage.addListener(function(msg, sender, cb) {
/* pass a message like { sheetjs: true } from the extension to scrape */
if(!msg || !msg.sheetjs) return;
/* create a new workbook */
var workbook = XLSX.utils.book_new();
/* loop through each table element */
var tables = document.getElementsByTagName("table")
for(var i = 0; i < tables.length; ++i) {
var worksheet = XLSX.utils.table_to_sheet(tables[i]);
XLSX.utils.book_append_sheet(workbook, worksheet, "Table" + i);
}
/* pass back to the extension */
return cb(workbook);
});
NodeJS HTML Tables without a browser (click to show)

NodeJS does not include a DOM implementation and Puppeteer requires a hefty Chromium build. The "Synthetic DOM" demo includes examples for NodeJS.