HTTP 下载
SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。
¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.
许多 JavaScript API(包括 XMLHttpRequest
和 fetch
)允许脚本下载电子表格以供进一步处理。
¥A number of JavaScript APIs, including XMLHttpRequest
and fetch
, allow
scripts to download spreadsheets for further processing.
此演示使用各种 API 和封装器库下载工作簿并将原始二进制数据传递给 SheetJS 库。
¥This demo uses various APIs and wrapper libraries to download workbooks and pass raw binary data to SheetJS libraries.
-
"浏览器演示" 完全在 Web 浏览器中运行。将在 Web 浏览器中下载和解析测试工作簿。
¥"Browser Demos" run entirely within the web browser. A test workbook will be downloaded and parsed in the web browser.
-
"NodeJS 演示" 在 NodeJS 和其他服务器端平台中运行。
¥"NodeJS Demos" run in NodeJS and other server-side platforms.
这个演示主要是下载文件。其他演示涵盖其他 HTTP 用例:
¥This demo focuses on downloading files. Other demos cover other HTTP use cases:
-
"HTTP 上传" 涵盖上传文件
¥"HTTP Uploads" covers uploading files
-
"HTTP 服务器处理" 涵盖 HTTP 服务器
¥"HTTP Server Processing" covers HTTP servers
AWS 等第三方云平台可能会通过以 UTF-8 字符串对请求和响应进行编码来破坏原始二进制下载。
¥Third-party cloud platforms such as AWS may corrupt raw binary downloads by encoding requests and responses in UTF-8 strings.
对于 AWS,在 API Gateway 控制台的 "二进制媒体类型" 部分中,应添加 "application/vnd.ms-excel"
类型,以确保 AWS Lambda 函数可以将文件发送到客户端。
¥For AWS, in the "Binary Media Types" section of the API Gateway console, the
"application/vnd.ms-excel"
type should be added to ensure that AWS Lambda
functions functions can send files to clients.
二进制数据
¥Binary Data
最有趣的电子表格文件是包含表示无效 UTF-8 字符的字节序列的二进制数据。
¥Most interesting spreadsheet files are binary data that contain byte sequences that represent invalid UTF-8 characters.
API 通常提供选项来控制如何解释下载的数据。arraybuffer
响应类型通常强制将数据呈现为 ArrayBuffer
对象,该对象可以使用 SheetJS read
方法 [^1] 进行解析。
¥APIs generally provide options to control how downloaded data is interpreted.
The arraybuffer
response type usually forces the data to be presented as an
ArrayBuffer
object which can be parsed with the SheetJS read
method[^1].
下面的例子展示了使用 fetch
下载文件的数据流:
¥The following example shows the data flow using fetch
to download files:
/* download data into an ArrayBuffer object */
const res = await fetch("https://xlsx.nodejs.cn/pres.numbers");
const ab = await res.arrayBuffer(); // recover data as ArrayBuffer
/* parse file */
const wb = XLSX.read(ab);
浏览器演示
¥Browser Demos
当页面被访问时,https://xlsx.nodejs.cn/pres.numbers 将被获取并解析。旧表将替换为内容与第一个工作表匹配的表。SheetJS sheet_to_html
方法 [^2] 创建 HTML 表。
¥When the page is accessed, https://xlsx.nodejs.cn/pres.numbers will be fetched
and parsed. The old table will be replaced with a table whose contents match the
first worksheet. The SheetJS sheet_to_html
method[^2] creates the HTML table.
每个浏览器演示都在以下环境中进行了测试:
¥Each browser demo was tested in the following environments:
浏览器 | 日期 |
---|---|
Chrome 126 | 2024-06-19 |
Safari 17.3 | 2024-06-19 |
XMLHttp 请求
¥XMLHttpRequest
对于下载数据,arraybuffer
响应类型生成 ArrayBuffer
,可以将其视为 Uint8Array
并馈送到 SheetJS read
方法。对于旧版浏览器,应指定选项 type: "array"
:
¥For downloading data, the arraybuffer
response type generates an ArrayBuffer
that can be viewed as an Uint8Array
and fed to the SheetJS read
method. For
legacy browsers, the option type: "array"
should be specified:
/* set up an async GET request */
var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer";
req.onload = function(e) {
/* parse the data when it is received */
var data = new Uint8Array(req.response);
var workbook = XLSX.read(data, {type:"array"});
/* DO SOMETHING WITH workbook HERE */
};
req.send();
Live Download demo (click to show)
This demo uses XMLHttpRequest
to fetch https://xlsx.nodejs.cn/pres.numbers
and show the data in an HTML table.
function SheetJSXHRDL() { const [__html, setHTML] = React.useState(""); /* Fetch and update HTML */ React.useEffect(() => { (async() => { /* Fetch file */ const req = new XMLHttpRequest(); req.open("GET", "https://xlsx.nodejs.cn/pres.numbers", true); req.responseType = "arraybuffer"; req.onload = e => { /* Parse file */ const wb = XLSX.read(new Uint8Array(req.response)); const ws = wb.Sheets[wb.SheetNames[0]]; /* Generate HTML */ setHTML(XLSX.utils.sheet_to_html(ws)); }; req.send(); })(); }, []); return ( <div dangerouslySetInnerHTML={{ __html }}/> ); }
fetch
为了下载数据,Response#arrayBuffer
解析为 ArrayBuffer
,可以将其转换为 Uint8Array
并传递给 SheetJS read
方法:
¥For downloading data, Response#arrayBuffer
resolves to an ArrayBuffer
that
can be converted to Uint8Array
and passed to the SheetJS read
method:
fetch(url).then(function(res) {
/* get the data as a Blob */
if(!res.ok) throw new Error("fetch failed");
return res.arrayBuffer();
}).then(function(ab) {
/* parse the data when it is received */
var data = new Uint8Array(ab);
var workbook = XLSX.read(data, {type:"array"});
/* DO SOMETHING WITH workbook HERE */
});
Live Download demo (click to show)
This demo uses fetch
to download https://xlsx.nodejs.cn/pres.numbers and
show the data in an HTML table.
function SheetJSFetchDL() { const [__html, setHTML] = React.useState(""); /* Fetch and update HTML */ React.useEffect(() => { (async() => { /* Fetch file */ const res = await fetch("https://xlsx.nodejs.cn/pres.numbers"); const ab = await res.arrayBuffer(); /* Parse file */ const wb = XLSX.read(ab); const ws = wb.Sheets[wb.SheetNames[0]]; /* Generate HTML */ setHTML(XLSX.utils.sheet_to_html(ws)); })(); }, []); return ( <div dangerouslySetInnerHTML={{ __html }}/> ); }
jQuery
jQuery 是一个 JavaScript 库,其中包含用于执行 "Ajax" 网络请求的辅助程序。jQuery.ajax
($.ajax
) 不支持开箱即用的二进制数据 [^3]。定制 ajaxTransport
可以添加支持。
¥jQuery is a JavaScript library that includes helpers for
performing "Ajax" network requests. jQuery.ajax
($.ajax
) does not support
binary data out of the box[^3]. A custom ajaxTransport
can add support.
SheetJS 用户报告在 IE10 中成功使用 jquery.binarytransport.js
[^4]。
¥SheetJS users have reported success with jquery.binarytransport.js
[^4] in IE10.
在包含主要的 jquery.js
和 jquery.binarytransport.js
脚本后,$.ajax
将支持 dataType: "binary"
和 processData: false
。
¥After including the main jquery.js
and jquery.binarytransport.js
scripts,
$.ajax
will support dataType: "binary"
and processData: false
.
在 GET 请求中,默认行为是返回 Blob
对象。传递 responseType: "arraybuffer"
在 IE10 中返回正确的 ArrayBuffer
对象:
¥In a GET request, the default behavior is to return a Blob
object. Passing
responseType: "arraybuffer"
returns a proper ArrayBuffer
object in IE10:
$.ajax({
type: "GET", url: "https://xlsx.nodejs.cn/pres.numbers",
/* suppress jQuery post-processing */
processData: false,
/* use the binary transport */
dataType: "binary",
/* pass an ArrayBuffer in the callback */
responseType: "arraybuffer",
success: function (ab) {
/* at this point, ab is an ArrayBuffer */
var wb = XLSX.read(ab);
/* do something with workbook here */
var ws = wb.Sheets[wb.SheetNames[0]];
var html = XLSX.utils.sheet_to_html(ws);
$("#out").html(html);
}
});
封装库
¥Wrapper Libraries
在 fetch
随浏览器一起发布之前,有各种封装器库来简化 XMLHttpRequest
。由于 fetch
的限制,这些库仍然相关。
¥Before fetch
shipped with browsers, there were various wrapper libraries to
simplify XMLHttpRequest
. Due to limitations with fetch
, these libraries are
still relevant.
axios
axios
提供了一个基于 Promise 的接口。将 responseType
设置为 arraybuffer
可确保返回类型是 ArrayBuffer。结果的 data
属性可以传递给 SheetJS read
方法:
¥axios
presents a Promise based interface. Setting
responseType
to arraybuffer
ensures the return type is an ArrayBuffer. The
data
property of the result can be passed to the SheetJS read
method:
async function workbook_dl_axios(url) {
const res = await axios(url, {responseType:'arraybuffer'});
const workbook = XLSX.read(res.data);
return workbook;
}
Live Download demo (click to show)
This demo uses axios
to download https://xlsx.nodejs.cn/pres.numbers and
show the data in an HTML table.
If the live demo shows a message
ReferenceError: axios is not defined
please refresh the page. This is a known bug in the documentation generator.
function SheetJSAxiosDL() { const [__html, setHTML] = React.useState(""); /* Fetch and update HTML */ React.useEffect(() => { (async() => { if(typeof axios != "function") return setHTML("ReferenceError: axios is not defined"); /* Fetch file */ const res = await axios("https://xlsx.nodejs.cn/pres.numbers", {responseType: "arraybuffer"}); /* Parse file */ const wb = XLSX.read(res.data); const ws = wb.Sheets[wb.SheetNames[0]]; /* Generate HTML */ setHTML(XLSX.utils.sheet_to_html(ws)); })(); }, []); return ( <div dangerouslySetInnerHTML={{ __html }}/> ); }
superagent
superagent
是一个带有 "流畅的界面" 的网络请求库。使用 "arraybuffer"
调用 responseType
方法将确保最终响应对象是 ArrayBuffer
:
¥superagent
is a network request library
with a "Fluent Interface". Calling the responseType
method with
"arraybuffer"
will ensure the final response object is an ArrayBuffer
:
/* set up an async GET request with superagent */
superagent
.get(url)
.responseType('arraybuffer')
.end(function(err, res) {
/* parse the data when it is received */
var data = new Uint8Array(res.body);
var workbook = XLSX.read(data, {type:"array"});
/* DO SOMETHING WITH workbook HERE */
});
Live Download demo (click to show)
This demo uses superagent
to download https://xlsx.nodejs.cn/pres.numbers
and show the data in an HTML table.
If the live demo shows a message
ReferenceError: superagent is not defined
please refresh the page. This is a known bug in the documentation generator.
function SheetJSSuperAgentDL() { const [__html, setHTML] = React.useState(""); /* Fetch and update HTML */ React.useEffect(() => { (async() => { if(typeof superagent == "undefined" || typeof superagent.get != "function") return setHTML("ReferenceError: superagent is not defined"); /* Fetch file */ superagent .get("https://xlsx.nodejs.cn/pres.numbers") .responseType("arraybuffer") .end((err, res) => { /* Parse file */ const wb = XLSX.read(res.body); const ws = wb.Sheets[wb.SheetNames[0]]; /* Generate HTML */ setHTML(XLSX.utils.sheet_to_html(ws)); }); })(); }, []); return ( <div dangerouslySetInnerHTML={{ __html }}/> ); }
NodeJS 演示
¥NodeJS Demos
这些示例展示了如何在 NodeJS 中下载数据。
¥These examples show how to download data in NodeJS.
HTTPS 获取
¥HTTPS GET
https
模块为 HTTPS GET 请求提供底层 get
方法:
¥The https
module provides a low-level get
method for HTTPS GET requests:
var https = require("https"), XLSX = require("xlsx");
https.get('https://xlsx.nodejs.cn/pres.xlsx', function(res) {
var bufs = [];
res.on('data', function(chunk) { bufs.push(chunk); });
res.on('end', function() {
var buf = Buffer.concat(bufs);
var wb = XLSX.read(buf);
/* print the first worksheet to console */
var ws = wb.Sheets[wb.SheetNames[0]];
console.log(XLSX.utils.sheet_to_csv(ws));
});
});
本 demo 在以下环境下进行了测试:
¥This demo was tested in the following environments:
NodeJS | 日期 | 解决方法 |
---|---|---|
0.10.48 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
0.12.18 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
4.9.1 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
6.17.1 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
8.17.0 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
10.24.1 | 2024-06-21 | |
12.22.12 | 2024-06-21 | |
14.21.3 | 2024-06-21 | |
16.20.2 | 2024-06-21 | |
18.20.3 | 2024-06-21 | |
20.15.0 | 2024-06-21 | |
22.3.0 | 2024-06-21 |
NODE_TLS_REJECT_UNAUTHORIZED
解决方法将值设置为 '0'
:
¥The NODE_TLS_REJECT_UNAUTHORIZED
workaround sets the value to '0'
:
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';
Complete Example (click to show)
- Install the NodeJS module
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
-
Copy the
SheetJSHTTPSGet.js
code snippet to a fileSheetJSHTTPSGet.js
-
Run the script:
node SheetJSHTTPSGet.js
If successful, the script will print CSV contents of the test file.
For older versions of NodeJS, the script will fail due to a certificate error. The error can be suppressed by prepending the following line to the script:
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';
It is strongly encouraged to upgrade to a newer NodeJS version!
fetch
NodeJS 16.15.0
中引入了对 fetch
的实验性支持。它在 NodeJS LTS 版本 22
中被认为是稳定的。
¥Experimental support for fetch
was introduced in NodeJS 16.15.0
. It will be
considered stable in NodeJS LTS version 22
.
fetch
实现具有与浏览器版本相同的返回类型:
¥The fetch
implementation has the same return types as the browser version:
async function parse_from_url(url) {
const res = await fetch(url);
if(!res.ok) throw new Error("fetch failed");
const ab = await res.arrayBuffer();
const workbook = XLSX.read(ab);
return workbook;
}
本 demo 在以下环境下进行了测试:
¥This demo was tested in the following environments:
NodeJS | 日期 |
---|---|
18.20.3 | 2024-06-21 |
20.15.0 | 2024-06-21 |
22.3.0 | 2024-06-21 |
Complete Example (click to show)
- Install the NodeJS module
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz
- Save the following to
SheetJSFetch.js
:
var XLSX = require("xlsx");
async function parse_from_url(url) {
const res = await fetch(url);
if(!res.ok) throw new Error("fetch failed");
const ab = await res.arrayBuffer();
const workbook = XLSX.read(ab);
return workbook;
}
(async() => {
const wb = await parse_from_url('https://xlsx.nodejs.cn/pres.numbers');
/* print the first worksheet to console */
var ws = wb.Sheets[wb.SheetNames[0]];
console.log(XLSX.utils.sheet_to_csv(ws));
})();
- Run the script:
node SheetJSFetch.js
If successful, the script will print CSV contents of the test file.
封装库
¥Wrapper Libraries
NodeJS 的最新版本原生支持 fetch
。在将 fetch
支持添加到平台之前,第三方模块封装了原生 API。
¥The latest releases of NodeJS support fetch
natively. Before fetch
support
was added to the platform, third party modules wrapped the native APIs.
request
request
已被弃用,仅应在旧部署中使用。
¥request
has been deprecated and should only be used in legacy deployments.
设置选项 encoding: null
传递原始缓冲区:
¥Setting the option encoding: null
passes raw buffers:
var XLSX = require('xlsx'), request = require('request');
var url = 'https://xlsx.nodejs.cn/pres.xlsx';
/* call `request` with the option `encoding: null` */
request(url, {encoding: null}, function(err, res, data) {
if(err || res.statusCode !== 200) return;
/* if the request was successful, parse the data */
var wb = XLSX.read(data);
/* print the first worksheet to console */
var ws = wb.Sheets[wb.SheetNames[0]];
console.log(XLSX.utils.sheet_to_csv(ws));
});
本 demo 在以下环境下进行了测试:
¥This demo was tested in the following environments:
NodeJS | 日期 | 解决方法 |
---|---|---|
0.10.48 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
0.12.18 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
4.9.1 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
6.17.1 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
8.17.0 | 2024-06-21 | NODE_TLS_REJECT_UNAUTHORIZED |
10.24.1 | 2024-06-21 | |
12.22.12 | 2024-06-21 | |
14.21.3 | 2024-06-21 | |
16.20.2 | 2024-06-21 | |
18.20.3 | 2024-06-21 | |
20.15.0 | 2024-06-21 | |
22.3.0 | 2024-06-21 |
NODE_TLS_REJECT_UNAUTHORIZED
解决方法将值设置为 '0'
:
¥The NODE_TLS_REJECT_UNAUTHORIZED
workaround sets the value to '0'
:
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';
Complete Example (click to show)
- Install the NodeJS module
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz request@2.88.2
-
Copy the
SheetJSRequest.js
code snippet to a fileSheetJSRequest.js
-
Run the script:
node SheetJSRequest.js
If successful, the script will print CSV contents of the test file.
For older versions of NodeJS, the script will fail due to a certificate error. The error can be suppressed by prepending the following line to the script:
process.env.NODE_TLS_REJECT_UNAUTHORIZED = '0';
It is strongly encouraged to upgrade to a newer NodeJS version!
axios
当 responseType
为 "arraybuffer"
时,axios
实际上捕获的是 NodeJS Buffer 中的数据。SheetJS read
方法处理 NodeJS Buffer 对象:
¥When the responseType
is "arraybuffer"
, axios
actually captures the data
in a NodeJS Buffer. The SheetJS read
method handles NodeJS Buffer objects:
const XLSX = require("xlsx"), axios = require("axios");
async function workbook_dl_axios(url) {
const res = await axios(url, {responseType:'arraybuffer'});
/* at this point, res.data is a Buffer */
const workbook = XLSX.read(res.data);
return workbook;
}
本 demo 在以下环境下进行了测试:
¥This demo was tested in the following environments:
NodeJS | Axios | 日期 |
---|---|---|
10.24.1 | 0.28.1 | 2024-06-21 |
12.22.12 | 1.7.2 | 2024-06-21 |
14.21.3 | 1.7.2 | 2024-06-21 |
16.20.2 | 1.7.2 | 2024-06-21 |
18.20.3 | 1.7.2 | 2024-06-21 |
20.15.0 | 1.7.2 | 2024-06-21 |
22.3.0 | 1.7.2 | 2024-06-21 |
Complete Example (click to show)
- Install the NodeJS module
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz axios@1.7.2
- Save the following to
SheetJSAxios.js
:
const XLSX = require("xlsx"), axios = require("axios");
async function workbook_dl_axios(url) {
const res = await axios(url, {responseType:'arraybuffer'});
/* at this point, res.data is a Buffer */
const workbook = XLSX.read(res.data);
return workbook;
}
(async() => {
const wb = await workbook_dl_axios('https://xlsx.nodejs.cn/pres.numbers');
/* print the first worksheet to console */
var ws = wb.Sheets[wb.SheetNames[0]];
console.log(XLSX.utils.sheet_to_csv(ws));
})();
- Run the script:
node SheetJSAxios.js
If successful, the script will print CSV contents of the test file.
其他平台
¥Other Platforms
其他演示展示了特殊平台上的网络操作:
¥Other demos show network operations in special platforms:
[^1]: 见 read
于 "读取文件"
[^2]: 见 sheet_to_html
于 "实用工具"
¥See sheet_to_html
in "Utilities"
[^3]: 请参阅 jQuery 官方文档中的 dataType
于 jQuery.ajax
。
¥See dataType
in jQuery.ajax
in the official jQuery documentation.
[^4]: 详细信息请参见 官方 jquery.binarytransport.js
存储库。
¥See the official jquery.binarytransport.js
repo for more details.