Web Worker
解析和编写大型电子表格需要时间。在此过程中,如果 SheetJS 库正在 Web 浏览器中运行,则网站可能会冻结。
¥Parsing and writing large spreadsheets takes time. During the process, if the SheetJS library is running in the web browser, the website may freeze.
工作线程提供了一种减轻繁重工作的方法,以便网站在处理过程中不会冻结。这项工作仍在当地进行。没有数据发送到远程服务器。
¥Workers provide a way to off-load the hard work so that the website does not freeze during processing. The work is still performed locally. No data is sent to a remote server.
下图显示了导出数据集时的正常流程和 Web Worker 流程。当浏览器冻结时,带有红色背景的区域标记。
¥The following diagrams show the normal and Web Worker flows when exporting a dataset. The regions with a red background mark when the browser is frozen.
Normal Export | Web Worker Export |
---|---|
IE10+ 和现代浏览器支持基本的 Web Workers。一些 API 如 fetch
是后来添加的。强烈建议进行功能测试。
¥IE10+ and modern browsers support basic Web Workers. Some APIs like fetch
were
added later. Feature testing is strongly recommended.
由于实时代码块的限制,本节中的所有工作线程都是内联的。该代码嵌入在模板字面量中。对于生产站点,通常将 worker 编写在单独的 JS 文件中。
¥Due to limitations of the live code blocks, all of the workers in this section are in-line. The code is embedded in template literals. For production sites, typically workers are written in separate JS files.
Example (click to show)
For example, an in-line worker like
const worker = new Worker(URL.createObjectURL(new Blob([`\
/* load standalone script from CDN */
importScripts("https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js");
/* this callback will run once the main context sends a message */
self.addEventListener('message', (e) => {
/* Pass the version string back */
postMessage({ version: XLSX.version });
}, false);
`])));
would typically be stored in a separate JS file like "worker.js":
/* load standalone script from CDN */
importScripts("https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js");
/* this callback will run once the main context sends a message */
self.addEventListener('message', (e) => {
/* Pass the version string back */
postMessage({ version: XLSX.version });
}, false);
and the main script would pass a URL to the Worker
constructor:
const worker = new Worker("./worker.js");
安装
¥Installation
在所有情况下,Worker 中的 importScripts
都可以加载 SheetJS 独立脚本
¥In all cases, importScripts
in a Worker can load the
SheetJS Standalone scripts
importScripts("https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js");
对于生产用途,强烈建议下载并托管脚本。
¥For production use, it is highly encouraged to download and host the script.
ECMAScript Module Support (click to hide)
Chromium 系列浏览器(包括 Chrome 和 Edge)以及由 WebKit 支持的浏览器(包括 Safari)中的 Web Workers 支持 ESM。
¥ESM is supported in Web Workers in the Chromium family of browsers (including Chrome and Edge) as well as in browsers powered by WebKit (including Safari).
对于包括 Firefox 和 IE 在内的旧版浏览器,应使用 importScripts
。
¥For legacy browsers including Firefox and IE, importScripts
should be used.
浏览器 ESM 导入需要包含 .mjs
扩展名的完整 URL:
¥Browser ESM imports require a complete URL including the .mjs
extension:
import * as XLSX from "https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs";
使用 Worker ESM 时,Worker 构造函数必须设置 type
选项:
¥When using Worker ESM, the Worker constructor must set the type
option:
const worker = new Worker(
url_to_worker_script,
{ type: "module" } // second argument to Worker constructor
);
内联工作线程还需要 Blob MIME 类型 text/javascript
:
¥Inline workers additionally require the Blob MIME type text/javascript
:
const worker_code = `\
/* load standalone script from CDN */
import * as XLSX from "https://cdn.sheetjs.com/xlsx-0.20.3/package/xlsx.mjs";
// ... do something with XLSX here ...
`;
const worker = new Worker(
URL.createObjectURL(
new Blob(
[ worker_code ],
{ type: "text/javascript" } // second argument to the Blob constructor
)
),
{ type: "module" } // second argument to Worker constructor
);
在线演示
¥Live Demos
每个浏览器演示都在以下环境中进行了测试:
¥Each browser demo was tested in the following environments:
浏览器 | 日期 | 注释 |
---|---|---|
Chrome 122 | 2024-04-25 | |
边缘 122 | 2024-03-12 | |
Safari 17.3 | 2024-03-12 | 不支持文件系统访问 API |
勇敢 1.59 | 2024-03-12 | 不支持文件系统访问 API |
火狐 122 | 2024-03-12 | 不支持文件系统访问 API |
下载远程文件
¥Downloading a Remote File
Chrome 42 和 Safari 10.3 中的 Web Workers 中启用了 fetch
¥fetch
was enabled in Web Workers in Chrome 42 and Safari 10.3
通常,Web Worker 执行 fetch
操作,处理工作簿,并将最终结果(HTML 表或原始数据)发送到主浏览器上下文:
¥Typically the Web Worker performs the fetch
operation, processes the workbook,
and sends a final result (HTML table or raw data) to the main browser context:
Live Demo (click to show)
In the following example, the script:
- downloads https://xlsx.nodejs.cn/pres.numbers in a Web Worker
- loads the SheetJS library and parses the file in the Worker
- generates an HTML string of the first table in the Worker
- sends the string to the main browser context
- adds the HTML to the page in the main browser context
function SheetJSFetchDLWorker() { const [__html, setHTML] = React.useState(""); return ( <> <button onClick={() => { /* this mantra embeds the worker source in the function */ const worker = new Worker(URL.createObjectURL(new Blob([`\ /* load standalone script from CDN */ importScripts("https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"); /* this callback will run once the main context sends a message */ self.addEventListener('message', async(e) => { try { /* Fetch file */ const res = await fetch("https://xlsx.nodejs.cn/pres.numbers"); const ab = await res.arrayBuffer(); /* Parse file */ const wb = XLSX.read(ab, {dense: true}); const ws = wb.Sheets[wb.SheetNames[0]]; /* Generate HTML */ const html = XLSX.utils.sheet_to_html(ws); /* Reply with result */ postMessage({ html }); } catch(e) { /* Pass the error message back */ postMessage({html: String(e.message || e).bold() }); } }, false); `]))); /* when the worker sends back the HTML, add it to the DOM */ worker.onmessage = function(e) { setHTML(e.data.html); }; /* post a message to the worker */ worker.postMessage({}); }}><b>Click to Start</b></button> <div dangerouslySetInnerHTML={{ __html }}/> </> ); }
创建本地文件
¥Creating a Local File
XLSX.writeFile
在 Web Workers 中不起作用!原始文件数据可以从 Web Worker 传递到主浏览器上下文以供下载。
¥XLSX.writeFile
will not work in Web Workers! Raw file data can be passed from
the Web Worker to the main browser context for downloading.
通常,Web Worker 接收 JS 对象数组,生成工作簿,并将 URL 发送到主浏览器上下文以供下载:
¥Typically the Web Worker receives an array of JS objects, generates a workbook, and sends a URL to the main browser context for download:
Live Demo (click to show)
In the following example, the script:
- sends a dataset (array of JS objects) to the Web Worker
- generates a workbook object in the Web Worker
- generates a XLSB file using
XLSX.write
in the Web Worker - generates an object URL in the Web Worker
- sends the object URL to the main browser context
- performs a download action in the main browser context
function SheetJSWriteFileWorker() { const [__html, setHTML] = React.useState(""); const data = [ { "SheetJS": "வணக்கம்", "in": "สวัสดี", "Web": "你好", "Workers": "가지마" }, { "SheetJS": 1, "in": 2, "Web": 3, "Workers": 4 }, ]; return ( <> <button onClick={() => { setHTML(""); /* this mantra embeds the worker source in the function */ const worker = new Worker(URL.createObjectURL(new Blob([`\ /* load standalone script from CDN */ importScripts("https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"); /* this callback will run once the main context sends a message */ self.addEventListener('message', async(e) => { try { /* Create a new workbook from the data */ const ws = XLSX.utils.json_to_sheet(e.data.data); const wb = XLSX.utils.book_new(); XLSX.utils.book_append_sheet(wb, ws, "Data"); /* Write XLSB data (Uint8Array) */ const u8 = XLSX.write(wb, { bookType: "xlsb", type: "buffer" }); /* Generate URL */ const url = URL.createObjectURL(new Blob([u8])); /* Reply with result */ postMessage({ url }); } catch(e) { /* Pass the error message back */ postMessage({error: String(e.message || e).bold() }); } }, false); `]))); /* when the worker sends back the data, create a download */ worker.onmessage = function(e) { if(e.data.error) return setHTML(e.data.error); /* this mantra is the standard HTML5 download attribute technique */ const a = document.createElement("a"); a.download = "SheetJSWriteFileWorker.xlsb"; a.href = e.data.url; document.body.appendChild(a); a.click(); document.body.removeChild(a); }; /* post a message to the worker */ worker.postMessage({ data }); }}><b>Click to Start</b></button> <div dangerouslySetInnerHTML={{ __html }}/> </> ); }
用户提交的文件
¥User-Submitted File
通常,FileReader
用于主浏览器上下文。在 Web Workers 中,同步版本 FileReaderSync
效率更高。
¥Typically FileReader
is used in the main browser context. In Web Workers, the
synchronous version FileReaderSync
is more efficient.
通常,Web Worker 接收文件指针,读取并解析文件,并将最终结果(HTML 表或原始数据)发送到主浏览器上下文:
¥Typically the Web Worker receives a file pointer, reads and parses the file, and sends a final result (HTML table or raw data) to the main browser context:
Live Demo (click to show)
In the following example, when a file is dropped over the DIV or when the INPUT element is used to select a file, the script:
- sends the
File
object to the Web Worker - loads the SheetJS library and parses the file in the Worker
- generates an HTML string of the first table in the Worker
- sends the string to the main browser context
- adds the HTML to the page in the main browser context
function SheetJSDragDropWorker() { const [__html, setHTML] = React.useState(""); /* suppress default behavior for drag and drop */ function suppress(e) { e.stopPropagation(); e.preventDefault(); } /* this worker is shared between drag-drop and file input element */ const worker = new Worker(URL.createObjectURL(new Blob([`\ /* load standalone script from CDN */ importScripts("https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"); /* this callback will run once the main context sends a message */ self.addEventListener('message', (e) => { try { /* Read file data */ const ab = new FileReaderSync().readAsArrayBuffer(e.data.file); /* Parse file */ const wb = XLSX.read(ab, {dense: true}); const ws = wb.Sheets[wb.SheetNames[0]]; /* Generate HTML */ const html = XLSX.utils.sheet_to_html(ws); /* Reply with result */ postMessage({ html }); } catch(e) { /* Pass the error message back */ postMessage({html: String(e.message || e).bold() }); } }, false); `]))); /* when the worker sends back the HTML, add it to the DOM */ worker.onmessage = function(e) { setHTML(e.data.html); }; return ( <> <div onDragOver={suppress} onDragEnter={suppress} onDrop={(e) => { suppress(e); /* post a message with the first File to the worker */ worker.postMessage({ file: e.dataTransfer.files[0] }); }}>Drag a file to this DIV to process! (or use the file input)</div> <input type="file" onChange={(e) => { suppress(e); /* post a message with the first File to the worker */ worker.postMessage({ file: e.target.files[0] }); }}/> <div dangerouslySetInnerHTML={{ __html }}/> </> ); }
流式写入
¥Streaming Write
"流导出" 部分更详细地介绍了流式写入方法。
¥The "Stream Export" section covers the streaming write methods in more detail.
"大型数据集" 演示包括浏览器实时示例。
¥The "Large Datasets" demo includes browser live examples.
文件系统访问 API
¥File System Access API
截至撰写本文时,文件系统访问 API 仅在 Chromium 和基于 Chromium 的浏览器(例如 Chrome 和 Edge)中可用。
¥At the time of writing, the File System Access API is only available in Chromium and Chromium-based browsers like Chrome and Edge.
在本地测试中,在生成每个 CSV 行时提交它比在最后累积并写入一次要慢得多。
¥In local testing, committing each CSV row as it is generated is significantly slower than accumulating and writing once at the end.
当已知目标的 CSV 小于 500MB 时,最好进行批处理。较大的文件可能会达到浏览器长度限制。
¥When the target CSV is known to be less than 500MB, it is preferable to batch. Larger files may hit browser length limits.
Live Demo (click to show)
The following live demo fetches and parses a file in a Web Worker. The script:
- prompts user to save file (
window.showSaveFilePicker
in the main thread) - passes the URL and the file object to the Web Worker
- loads the SheetJS library in the Web Worker
- fetches the requested URL and parses the workbook from the Worker
- creates a Writable Stream from the file object.
- uses
XLSX.stream.to_csv
to generate CSV rows of the first worksheet- every 100th row, a progress message is sent back to the main thread
- at the end, a completion message is sent back to the main thread
The demo has a checkbox. If it is not checked (default), the Worker will collect each CSV row and write once at the end. If it is checked, the Worker will try to commit each row as it is generated.
The demo also has a URL input box. Feel free to change the URL. For example:
https://raw.githubusercontent.com/SheetJS/test_files/master/large_strings.xls
is an XLS file over 50 MB. The generated CSV file is about 55 MB.
https://raw.githubusercontent.com/SheetJS/libreoffice_test-files/master/calc/xlsx-import/perf/8-by-300000-cells.xlsx
is an XLSX file with 300000 rows (approximately 20 MB) yielding a CSV of 10 MB.
function SheetJSFetchCSVStreamFile() { const [state, setState] = React.useState(""); const [__html, setHTML] = React.useState(""); const [cnt, setCnt] = React.useState(0); const [hz, setHz] = React.useState(0); const [url, setUrl] = React.useState("https://xlsx.nodejs.cn/test_files/large_strings.xlsx"); const ref = React.useRef(null); return ( <> <b>URL: </b><input type="text" value={url} onChange={(e) => setUrl(e.target.value)} size="80"/><br/> <b>Commit each row: </b><input type="checkbox" ref={ref}/><br/> <button onClick={async() => { /* this mantra embeds the worker source in the function */ const worker = new Worker(URL.createObjectURL(new Blob([`\ /* load standalone script from CDN */ importScripts("https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"); function sheet_to_csv_cb(ws, cb, opts, batch = 1000) { XLSX.stream.set_readable(() => ({ __done: false, // this function will be assigned by the SheetJS stream methods _read: function() { this.__done = true; }, // this function is called by the stream methods push: function(d) { if(!this.__done) cb(d); if(d == null) this.__done = true; }, resume: function pump() { for(var i = 0; i < batch && !this.__done; ++i) this._read(); if(!this.__done) setTimeout(pump.bind(this), 0); } })); return XLSX.stream.to_csv(ws, opts); } /* this callback will run once the main context sends a message */ self.addEventListener('message', async(e) => { try { /* Fetch file */ postMessage({state: "fetching"}); var t = Date.now(); const res = await fetch(e.data.url); const ab = await res.arrayBuffer(); postMessage({time: "fetch", ts: Date.now() - t}); /* Parse file */ postMessage({state: "parsing"}); t = Date.now(); const wb = XLSX.read(ab, {dense: true}); const ws = wb.Sheets[wb.SheetNames[0]]; postMessage({time: "parse", ts: Date.now() - t}); /* Generate CSV rows */ postMessage({state: "begin"}); t = Date.now(); const wstream = await e.data.wFile.createWritable(); let c = 0, buf = "", each = !!e.data.each; const strm = sheet_to_csv_cb(ws, async(csv) => { if(csv != null) { if(each) await wstream.write(csv); else buf += csv; if(!(++c % 100)) postMessage({ state: "writing", c, ts: Date.now() - t }); } else { if(buf) await wstream.write(buf); await wstream.close(); postMessage({state: "done", c, ts: Date.now() - t }); } }); strm.resume(); } catch(e) { /* Pass the error message back */ postMessage({error: String(e.message || e) }); } }, false); `]))); /* when the worker sends back data, add it to the DOM */ const log = (s, t) => setHTML(h => h + `${s}: ${(t/1000).toFixed(3).padStart(8)} sec\n`); worker.onmessage = function(e) { if(e.data.error) return setState(`Processing Error: ${e.data.error}`); else if(e.data.state) { setState(e.data.state); if(e.data.c) setCnt(e.data.c); if(e.data.ts) setHz((e.data.c || cnt) * 1000 / e.data.ts); if(e.data.state == "done") log("write", e.data.ts); } else if(e.data.time) log(e.data.time, e.data.ts); }; setCnt(0); setHz(0); setState(""); setHTML(""); if(!window.showSaveFilePicker) setState("Browser missing API support!"); else try { /* Show picker and get handle to file */ const wFile = await window.showSaveFilePicker({ suggestedName: "SheetJSStream.csv", types: [ { description: 'csv', accept: { 'text/csv': ['.csv'] } } ] }); /* post a message to the worker with the URL to fetch */ worker.postMessage({url, wFile, each: !!ref.current.checked}); } catch(e) { setState(`Selection Error: ${e && e.message || e}`); } }}><b>Click to Start</b></button> <pre>State: <b>{state}</b><br/>Count: <b>{cnt}</b> <b>({hz|0} Hz)</b></pre> <pre dangerouslySetInnerHTML={{__html}}/> </> ); }