读取文件
SheetJS 读取文件的主要方法是 read
。它希望开发者以受支持的表示形式提供实际数据。
¥The main SheetJS method for reading files is read
. It expects developers to
supply the actual data in a supported representation.
readFile
辅助方法接受文件名并尝试使用标准 API 读取指定的文件。它在网络浏览器中不起作用!
¥The readFile
helper method accepts a filename and tries to read the specified
file using standard APIs. It does not work in web browsers!
解析文件数据并生成 SheetJS 工作簿对象
¥Parse file data and generate a SheetJS workbook object
var wb = XLSX.read(data, opts);
read
尝试解析 data
并返回 工作簿对象
¥read
attempts to parse data
and return a workbook object
opts
对象的 type
属性控制如何解释 data
。对于字符串数据,默认解释是 Base64。
¥The type
property of the opts
object controls how data
is
interpreted. For string data, the default interpretation is Base64.
读取指定文件并生成 SheetJS 工作簿对象
¥Read a specified file and generate a SheetJS workbook object
var wb = XLSX.readFile(filename, opts);
readFile
尝试读取指定 filename
的本地文件。
¥readFile
attempts to read a local file with specified filename
.
readFile
适用于特定平台。它不支持网络浏览器!
¥readFile
works in specific platforms. It does not support web browsers!
NodeJS 安装注意事项 包含针对非标准用例的附加说明。
¥The NodeJS installation note includes additional instructions for non-standard use cases.
解析选项
¥Parsing Options
读取函数接受一个选项参数:
¥The read functions accept an options argument:
选项名称 | 默认 | 描述 |
---|---|---|
type | 输入数据表示 | |
raw | false | 如果为 true,纯文本解析将不会解析值 ** |
dense | false | 如果为真,请使用 密集工作表表示 |
codepage | 如果指定,则在适当时使用代码页** | |
cellFormula | true | 保存 .f 字段的公式 |
cellHTML | true | 解析富文本并将 HTML 保存到 .h 字段 |
cellNF | false | 将数字格式字符串保存到 .z 字段 |
cellStyles | false | 将样式/主题信息保存到 .s 字段 |
cellText | true | 生成格式化文本到 .w 字段 |
cellDates | false | 将日期存储为 d 类型(默认为 n ) |
dateNF | 如果指定,请使用日期代码 14 ** 的字符串 | |
sheetStubs | false | 为存根单元创建 z 类型的单元对象 |
sheetRows | 0 | 如果 >0,则读取 指定行数 |
bookDeps | false | 如果为 true,则解析计算链 |
bookFiles | false | 如果为 true,则将原始文件添加到图书对象 ** |
bookProps | false | 如果为真,则仅解析足以获取图书元数据** |
bookSheets | false | 如果为 true,则仅解析足以获取工作表名称的内容 |
bookVBA | false | 如果为 true,则生成 VBA 块 |
password | "" | 如果已定义且文件已加密,则使用密码 ** |
WTF | false | 如果为 true,则对意外的文件功能抛出错误 ** |
sheets | 如果指定,则仅解析指定的工作表 ** | |
nodim | false | 如果为真,则计算 工作表范围 |
PRN | false | 如果为 true,则允许解析 PRN 文件 ** |
xlfn | false | 如果为 true,则公式中的 保留前缀 |
FS | DSV 字段分隔符覆盖 | |
UTC | true | 如果明确为 false,则解析本地时间的文本日期 |
-
即使
cellNF
为假,也会生成格式化文本并保存到.w
¥Even if
cellNF
is false, formatted text will be generated and saved to.w
-
在某些情况下,即使
bookSheets
为 false,也可能会解析工作表。¥In some cases, sheets may be parsed even if
bookSheets
is false. -
Excel 积极尝试解释 CSV 和其他纯文本中的值。这会导致令人惊讶的行为!
raw
选项禁止值解析。¥Excel aggressively tries to interpret values from CSV and other plain text. This leads to surprising behavior! The
raw
option suppresses value parsing. -
bookSheets
和bookProps
结合起来给出两组信息¥
bookSheets
andbookProps
combine to give both sets of information -
如果
bookDeps
为 false,则Deps
将为空对象¥
Deps
will be an empty object ifbookDeps
is false -
bookFiles
行为取决于文件类型:¥
bookFiles
behavior depends on file type:-
基于 ZIP 的格式的
keys
数组(ZIP 中的路径)¥
keys
array (paths in the ZIP) for ZIP-based formats -
ZIP 的
files
哈希(将路径映射到代表文件的对象)¥
files
hash (mapping paths to objects representing the files) for ZIP -
使用 CFB 容器的格式的
cfb
对象¥
cfb
object for formats using CFB containers
-
-
默认情况下,所有工作表都会被解析。
sheets
根据输入类型进行限制:¥By default all worksheets are parsed.
sheets
restricts based on input type:-
数字:要解析的工作表的从零开始的索引(
0
是第一个工作表)¥number: zero-based index of worksheet to parse (
0
is first worksheet) -
字符串:要解析的工作表名称(不区分大小写)
¥string: name of worksheet to parse (case insensitive)
-
用于选择多个工作表的数字和字符串数组。
¥array of numbers and strings to select multiple worksheets.
-
-
codepage
应用于 BIFF2 - 没有CodePage
记录的 BIFF5 文件和type:"binary"
中没有 BOM 的 CSV 文件。BIFF8 XLS 始终默认为 1200。¥
codepage
is applied to BIFF2 - BIFF5 files withoutCodePage
records and to CSV files without BOM intype:"binary"
. BIFF8 XLS always defaults to 1200. -
PRN
影响对没有公共分隔符的文本文件的解析。¥
PRN
affects parsing of text files without a common delimiter character. -
目前仅支持 XOR 加密。使用其他加密方法的文件将引发不受支持的错误。
¥Currently only XOR encryption is supported. Unsupported error will be thrown for files employing other encryption methods.
-
WTF
主要是为了开发。默认情况下,解析器将抑制单个工作表上的读取错误,从而允许你从正确解析的工作表中进行读取。设置WTF:true
会强制抛出这些错误。¥
WTF
is mainly for development. By default, the parser will suppress read errors on single worksheets, allowing you to read from the worksheets that do parse properly. SettingWTF:true
forces those errors to be thrown. -
UTC
适用于 CSV、文本和 HTML 格式。当显式设置为false
时,解析器将假定文件是在本地时间指定的。默认情况下,与其他文件格式一样,日期和时间以 UTC 格式解释。¥
UTC
applies to CSV, Text and HTML formats. When explicitly set tofalse
, the parsers will assume the files are specified in local time. By default, as is the case for other file formats, dates and times are interpreted in UTC.
密集
¥Dense
SheetJS 数据模型文档的 "单元格储存" 部分更详细地解释了工作表表示。
¥The "Cell Storage" section of the SheetJS Data Model documentation explains the worksheet representation in more detail.
处理 SheetJS 工作簿对象的实用函数 通常处理稀疏和密集工作表。
¥Utility functions that process SheetJS workbook objects typically process both sparse and dense worksheets.
范围
¥Range
某些文件格式(包括 XLSX 和 XLS)可以自行报告工作表范围。默认情况下使用自我报告的范围。
¥Some file formats, including XLSX and XLS, can self-report worksheet ranges. The self-reported ranges are used by default.
如果设置了 sheetRows
选项,则将从工作表中解析最多 sheetRows
行。查看 JSON 对象输出时将生成 sheetRows-1
行(因为解析数据时标题行被计为一行)。工作表的 !ref
属性将保存调整后的范围。对于自行报告工作表范围的格式,!fullref
属性将保存原始范围。
¥If the sheetRows
option is set, up to sheetRows
rows will be parsed from the
worksheets. sheetRows-1
rows will be generated when looking at the JSON object
output (since the header row is counted as a row when parsing the data). The
!ref
property of the worksheet will hold the adjusted range. For formats that
self-report sheet ranges, the !fullref
property will hold the original range.
nodim
选项指示解析器忽略自我报告的范围并使用工作表中的实际单元格来确定范围。这解决了不合规第三方导出器的已知问题。
¥The nodim
option instructs the parser to ignore self-reported ranges and use
the actual cells in the worksheet to determine the range. This addresses known
issues with non-compliant third-party exporters.
公式
¥Formulae
对于某些文件格式,必须显式启用 cellFormula
选项以确保提取公式。
¥For some file formats, the cellFormula
option must be explicitly enabled to
ensure that formulae are extracted.
较新的 Excel 函数使用 _xlfn.
前缀进行序列化,对用户隐藏。SheetJS 会正常剥离 _xlfn.
。xlfn
选项保留它们。"公式" 文档 更详细地介绍了这一点。
¥Newer Excel functions are serialized with the _xlfn.
prefix, hidden from the
user. SheetJS will strip _xlfn.
normally. The xlfn
option preserves them.
The "Formulae" docs
covers this in more detail.
"公式" 更详细地介绍了这些功能。
¥"Formulae" covers the features in more detail.
VBA
解析启用宏的文件时,如果 bookVBA
选项为 true
,则原始 VBA Blob 将存储在工作簿的 vbaraw
属性中。
¥When a macro-enabled file is parsed, if the bookVBA
option is true
, the raw
VBA blob will be stored in the vbaraw
property of the workbook.
"VBA 和宏" 更详细地介绍了这些功能。
¥"VBA and Macros" covers the features in more detail.
Implementation Details (click to show)
The bookVBA
merely exposes the raw VBA CFB object. It does not parse the data.
XLSM and XLSB store the VBA CFB object in xl/vbaProject.bin
. BIFF8 XLS mixes
the VBA entries alongside the core Workbook entry, so the library generates a
new blob from the XLS CFB container that works in XLSM and XLSB files.
输入类型
¥Input Type
read
的 type
参数控制如何解释数据:
¥The type
parameter for read
controls how data is interpreted:
type | 预期输入 |
---|---|
"base64" | 字符串:文件的 Base64 编码 |
"binary" | 字符串:二进制字符串(字节 n 是 data.charCodeAt(n) ) |
"string" | 字符串:JS 字符串(仅适用于 UTF-8 文本格式) |
"buffer" | Node.js 缓冲区 |
"array" | 数组8 位无符号整数数组(字节 n 是 data[n] ) |
"file" | 字符串:将读取的文件的路径(仅限 nodejs) |
一些常见类型是从数据输入类型自动推导出来的,包括 NodeJS Buffer
对象、Uint8Array
和 ArrayBuffer
对象以及数字数组。
¥Some common types are automatically deduced from the data input type, including
NodeJS Buffer
objects, Uint8Array
and ArrayBuffer
objects, and arrays of
numbers.
当传递 JS string
而没有 type
时,库假定数据是 Base64 字符串。FileReader#readAsBinaryString
或 ASCII 数据需要 "binary"
类型。包括 FileReader#readAsText
的 DOM 字符串应使用类型 "string"
。
¥When a JS string
is passed with no type
, the library assumes the data is a
Base64 string. FileReader#readAsBinaryString
or ASCII data requires "binary"
type. DOM strings including FileReader#readAsText
should use type "string"
.
猜测文件类型
¥Guessing File Type
Implementation Details (click to show)
Excel and other spreadsheet tools read the first few bytes and apply other
heuristics to determine a file type. This enables file type punning: renaming
files with the .xls
extension will tell your computer to use Excel to open the
file but Excel will know how to handle it. This library applies similar logic:
Byte 0 | Raw File Type | Spreadsheet Types |
---|---|---|
0xD0 | CFB Container | BIFF 5/8 or protected XLSX/XLSB or WQ3/QPW or XLR |
0x09 | BIFF Stream | BIFF 2/3/4/5 |
0x3C | XML/HTML | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x50 | ZIP Archive | XLSB or XLSX/M or ODS or UOS2 or NUMBERS or text |
0x49 | Plain Text | SYLK or plain text |
0x54 | Plain Text | DIF or plain text |
0xEF | UTF-8 Text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0xFF | UTF-16 Text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x00 | Record Stream | Lotus WK* or Quattro Pro or plain text |
0x7B | Plain text | RTF or plain text |
0x0A | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x0D | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
0x20 | Plain text | SpreadsheetML / Flat ODS / UOS1 / HTML / plain text |
DBF files are detected based on the first byte as well as the third and fourth bytes (corresponding to month and day of the file date)
Works for Windows files are detected based on the BOF
record with type 0xFF
Plain text format guessing follows the priority order:
Format | Test |
---|---|
XML | <?xml appears in the first 1024 characters |
HTML | starts with < and HTML tags appear in the first 1024 characters |
XML | starts with < and the first tag is valid |
RTF | starts with {\rt |
DSV | starts with sep= followed by field delimiter and line separator |
DSV | more unquoted | chars than ; \t or , in the first 1024 |
DSV | more unquoted ; chars than \t or , in the first 1024 |
TSV | more unquoted \t chars than , chars in the first 1024 |
CSV | one of the first 1024 characters is a comma "," |
ETH | starts with socialcalc:version: |
PRN | PRN option is set to true |
CSV | (fallback) |
HTML tags include html
, table
, head
, meta
, script
, style
, div
Why are random text files valid? (click to hide)
Excel 在读取文件时非常积极。将 XLS 扩展名添加到任何文本文件(其中唯一的字符是 ANSI 显示字符)会诱骗 Excel 将文件视为 CSV 或 TSV 文件,即使结果没有用!这个库试图复制这种行为。
¥Excel is extremely aggressive in reading files. Adding the XLS extension to any text file (where the only characters are ANSI display chars) tricks Excel into processing the file as if it were a CSV or TSV file, even if the result is not useful! This library attempts to replicate that behavior.
最好的方法是验证所需的工作表并确保其具有预期的行数或列数。提取范围非常简单:
¥The best approach is to validate the desired worksheet and ensure it has the expected number of rows or columns. Extracting the range is extremely simple:
var range = XLSX.utils.decode_range(worksheet['!ref']);
var ncols = range.e.c - range.s.c + 1, nrows = range.e.r - range.s.r + 1;