Skip to main content

文件格式

SheetJS 支持读取和写入多种电子表格文件格式。

¥SheetJS supports reading and writing a number of spreadsheet file formats.

格式
Excel 工作表/工作簿格式:-----::-----:
Excel 2007+ XML 格式 (XLSX/XLSM)
Excel 2007+ 二进制格式 (XLSB BIFF12)
Excel 2003-2004 XML 格式 (XML "电子表格 ML")
Excel 97-2004 (XLS BIFF8)
Excel 5.0/95 (XLS BIFF5)
Excel 4.0(XLS/XLW BIFF4)
Excel 3.0 (XLS BIFF3)
Excel 2.0/2.1 / Multiplan 4.x DOS (XLS BIFF2)
Excel 支持的文本格式:-----::-----:
分隔符分隔值 (CSV/TXT)
数据交换格式 (DIF)
符号链接 (SYLK/SLK)
Lotus Formatted Text (PRN)
UTF-16 Unicode 文本 (TXT)
其他工作簿/工作表格式:-----::-----:
Numbers 3.0+ / iWork 2013+ Spreadsheet (NUMBERS)
WPS 电子表格 (ET)
开放文档电子表格 (ODS)
扁平 XML ODF 电子表格 (FODS)
统一 Office 格式电子表格(标文通 UOS1/UOS2)
dBASE II/III/IV / Visual FoxPro (DBF)
莲花 1-2-3 (WK1/WK3)
莲花 1-2-3 (WKS/WK2/WK4/123)
Quattro Pro 电子表格 (WQ1/WQ2/WB1/WB2/WB3/QPW)
适用于 1.x-3.x DOS / 2.x-5.x Windows 电子表格 (WKS)
适用于 6.x-9.x 电子表格 (XLR)
其他常见的电子表格输出格式:-----::-----:
HTML 表格
富文本格式表 (RTF)
Ethercalc 记录格式 (ETH)

graph of format support

graph legend

将不会写入给定文件格式不支持的功能。

¥Features not supported by a given file format will not be written.

工作表范围限制

¥Worksheet Range Limits

有范围限制的格式将被静默截断。例如,Lotus WKS 格式有 2048 行的限制,因此第 2048 行之后的数据将不会被保存。

¥Formats with range limits will be silently truncated. For example, the Lotus WKS format has a limit of 2048 rows, so data after the 2048th row will not be saved.

格式最后一个单元格最大列数最大行数
Excel 2007+ XML 格式 (XLSX/XLSM)XFD1048576163841048576
Excel 2007+ 二进制格式 (XLSB BIFF12)XFD1048576163841048576
民数记 13.1(数字)ALL100000010001000000
Quattro Pro 9+ (QPW)IV1000000 2561000000
Excel 97-2004 (XLS BIFF8)IV65536 25665536
Excel 5.0/95 (XLS BIFF5)IV16384 25616384
Excel 4.0 (XLS BIFF4)IV16384 25616384
Excel 3.0 (XLS BIFF3)IV16384 25616384
Excel 2.0/2.1 (XLS BIFF2)IV16384 25616384
莲花 1-2-3 R2 - R5 (WK1/WK3/WK4)IV8192 2568192
莲花 1-2-3 R1 (WKS)IV2048 2562048

Excel 2003 SpreadsheetML 范围限制由 Excel 版本控制,并非由编写者强制执行。

¥Excel 2003 SpreadsheetML range limits are governed by the version of Excel and are not enforced by the writer.

常见文件格式

¥Common File Formats

Excel 2007+ XML (XLSX/XLSM)

XLSX 和 XLSM 文件是包含一系列符合开放打包约定 (OPC) 的 XML 文件的 ZIP 容器。XLSM 格式与 XLSX 几乎相同,用于包含宏的文件。

¥XLSX and XLSM files are ZIP containers containing a series of XML files in accordance with the Open Packaging Conventions (OPC). The XLSM format, almost identical to XLSX, is used for files containing macros.

该格式在 ECMA-376ISO/IEC 29500 中标准化。Excel 不遵循该规范,并且还有其他文档讨论 Excel 如何偏离该规范。

¥The format is standardized in ECMA-376 and ISO/IEC 29500. Excel does not follow the specification, and there are additional documents discussing how Excel deviates from the specification.

Excel 2.0-95(BIFF2/BIFF3/BIFF4/BIFF5)

BIFF 2/3 XLS 是单页二进制记录流。Excel 4 引入了工作簿(XLW 文件)的概念,但也具有单工作表 XLS 格式。该结构与 Lotus 1-2-3 文件格式非常相似。BIFF5/8/12 以各种方式扩展了格式,但很大程度上坚持相同的记录格式。

¥BIFF 2/3 XLS are single-sheet streams of binary records. Excel 4 introduced the concept of a workbook (XLW files) but also had single-sheet XLS format. The structure is largely similar to the Lotus 1-2-3 file formats. BIFF5/8/12 extended the format in various ways but largely stuck to the same record format.

Multiplan 4 "普通的" 文件在结构上与 BIFF2 相同,并使用相同的单元值记录。有一些不同的记录类型可用于更高级的功能,例如打印设置。BIFF2 编写器生成可以在 Multiplan 4 中读取的文件,解析器可以从 "普通的" 文件中提取值。

¥Multiplan 4 "Normal" files are identical in structure to BIFF2 and use the same cell value records. There are some different record types for more advanced features like Print Settings. The BIFF2 writer generates files that can be read in Multiplan 4 and the parser can extract values from "Normal" files.

这些格式都没有官方规范。Excel 95 可以以这些格式写入文件,因此记录长度和字段是通过以所有支持的格式写入并比较文件来确定的。Excel 2016 可以生成 BIFF5 文件,从而启用从 XLSX 或 BIFF2 开始的全套文件测试。

¥There is no official specification for any of these formats. Excel 95 can write files in these formats, so record lengths and fields were determined by writing in all of the supported formats and comparing files. Excel 2016 can generate BIFF5 files, enabling a full suite of file tests starting from XLSX or BIFF2.

Excel 97-2004 Binary (BIFF8)

BIFF8 专门使用复合文件二进制容器格式,将一些内容拆分为文件内的流。从本质上讲,它仍然使用旧版本 BIFF 的二进制记录格式的扩展版本。

¥BIFF8 exclusively uses the Compound File Binary container format, splitting some content into streams within the file. At its core, it still uses an extended version of the binary record format from older versions of BIFF.

MS-XLS 规范涵盖了文件格式的基础知识,其他规范则扩展了属性等功能的序列化。

¥The MS-XLS specification covers the basics of the file format, and other specifications expand on serialization of features like properties.

Excel 2003-2004(SpreadsheetML)

SpreadsheetML 文件早于 XLSX,是简单的 XML 文件。尽管微软已经发布了有关该格式的文档,但还没有正式且全面的规范。由于 Excel 2016 可以生成 SpreadsheetML 文件,因此映射功能非常简单。

¥Predating XLSX, SpreadsheetML files are simple XML files. There is no official and comprehensive specification, although MS has released documentation on the format. Since Excel 2016 can generate SpreadsheetML files, mapping features is pretty straightforward.

Excel 2007+ Binary (XLSB, BIFF12)

XLSB 格式与 XLSX 并行推出,将 BIFF 架构与 XLSX 的内容分离和 ZIP 容器相结合。大多数情况下,XLSX 子文件中的节点可以映射到相应子文件中的 XLSB 记录。

¥Introduced in parallel with XLSX, the XLSB format combines the BIFF architecture with the content separation and ZIP container of XLSX. For the most part nodes in an XLSX sub-file can be mapped to XLSB records in a corresponding sub-file.

MS-XLSB 规范涵盖了文件格式的基础知识,其他规范则扩展了属性等功能的序列化。

¥The MS-XLSB specification covers the basics of the file format, and other specifications expand on serialization of features like properties.

分隔符分隔值 (CSV/TXT)

¥Delimiter-Separated Values (CSV/TXT)

Excel CSV 在许多重要方面与 RFC4180 有所不同。生成的 CSV 文件通常应在 Excel 中运行,但可能无法在 RFC4180 兼容阅读器中运行。解析器通常应该理解 Excel CSV。如果值不可用,编写器会主动生成公式单元格。

¥Excel CSV deviates from RFC4180 in a number of important ways. The generated CSV files should generally work in Excel although they may not work in RFC4180 compatible readers. The parser should generally understand Excel CSV. The writer proactively generates cells for formulae if values are unavailable.

Excel TXT 使用制表符作为分隔符,代码页为 1200。

¥Excel TXT uses tab as the delimiter and code page 1200.

与 Excel 中一样,以 0x49 0x44 ("ID") 开头的文件被视为符号链接文件。与 Excel 不同,如果文件没有有效的 SYLK 标头,它将主动重新解释为 CSV。有些带有分号分隔符的文件与有效的 SYLK 文件对齐。为了实现最广泛的兼容性,所有值为 ID 的单元格都会自动用双引号引起来。

¥Like in Excel, files starting with 0x49 0x44 ("ID") are treated as Symbolic Link files. Unlike Excel, if the file does not have a valid SYLK header, it will be proactively reinterpreted as CSV. There are some files with semicolon delimiter that align with a valid SYLK file. For the broadest compatibility, all cells with the value of ID are automatically wrapped in double-quotes.

HTML

Excel HTML 工作表包含以样式编码的特殊元数据。例如,mso-number-format 是包含数字格式的本地化字符串。尽管有元数据,但输出是有效的 HTML,尽管它确实接受裸 & 符号。

¥Excel HTML worksheets include special metadata encoded in styles. For example, mso-number-format is a localized string containing the number format. Despite the metadata the output is valid HTML, although it does accept bare & symbols.

作者通过 t 标签将类型元数据添加到 TD 元素。解析器查找这些标签并覆盖默认解释。例如,像 <td>12345</td> 这样的文本将被解析为数字,但 <td t="s">12345</td> 将被解析为文本。

¥The writer adds type metadata to the TD elements via the t tag. The parser looks for those tags and overrides the default interpretation. For example, text like <td>12345</td> will be parsed as numbers but <td t="s">12345</td> will be parsed as text.

各种工作簿格式

¥Miscellaneous Workbook Formats

对其他格式的支持通常远远落后于 XLS/XLSB/XLSX 支持,部分原因是缺乏公开可用的文档。测试文件在各自的应用中生成,并与其 XLS 导出进行比较以确定结构。主要关注点是数据提取。

¥Support for other formats is generally far behind XLS/XLSB/XLSX support, due in part to a lack of publicly available documentation. Test files were produced in the respective apps and compared to their XLS exports to determine structure. The main focus is data extraction.

Lotus 1-2-3 (WKS/WK1/WK2/WK3/WK4/123)

Lotus 格式由类似于 BIFF 结构的二进制记录组成。Lotus 确实在几十年前发布了涵盖原始 WK1 格式的规范。其他功能是通过生成文件并与 Excel 支持进行比较来推断的。

¥The Lotus formats consist of binary records similar to the BIFF structure. Lotus did release a specification decades ago covering the original WK1 format. Other features were deduced by producing files and comparing to Excel support.

生成的 WK1 工作表与 Lotus 1-2-3 R2 和 Excel 5.0 兼容。

¥Generated WK1 worksheets are compatible with Lotus 1-2-3 R2 and Excel 5.0.

生成的 WK3 工作簿与 Lotus 1-2-3 R9 和 Excel 5.0 兼容。

¥Generated WK3 workbooks are compatible with Lotus 1-2-3 R9 and Excel 5.0.

Quattro Pro (WQ1/WQ2/WB1/WB2/WB3/QPW)

Quattro Pro 格式使用二进制记录的方式与 BIFF 和 Lotus 相同。一些较新的格式(即 WB3 和 QPW)使用 CFB 外壳,就像 BIFF8 XLS 一样。

¥The Quattro Pro formats use binary records in the same way as BIFF and Lotus. Some of the newer formats (namely WB3 and QPW) use a CFB enclosure just like BIFF8 XLS.

适用于 DOS / Windows Spreadsheet (WKS/XLR)

¥Works for DOS / Windows Spreadsheet (WKS/XLR)

Works 的所有版本都仅限于单个工作表。

¥All versions of Works were limited to a single worksheet.

适用于 DOS 1.x - 3.x 和 Works for Windows 2.x 使用其他记录类型扩展了 Lotus WKS 格式。

¥Works for DOS 1.x - 3.x and Works for Windows 2.x extends the Lotus WKS format with additional record types.

适用于 Windows 3.x - 5.x 使用相同的格式和 WKS 扩展名。BOF 记录的类型为 0xFF

¥Works for Windows 3.x - 5.x uses the same format and WKS extension. The BOF record has type 0xFF

适用于 Windows 6.x - 9.x 使用 XLR 格式。XLR 与 BIFF8 XLS 几乎相同:它使用带有 Workbook 流的 CFB 容器。Works 9 为 XLR 和 97-2003 XLS 导出保存准确的工作簿流。Works 6 XLS 包括两个空工作表,但主工作表具有相同的编码。XLR 还包括类似于 Lotus FM3/FMT 文件的 WksSSWorkBook 流。

¥Works for Windows 6.x - 9.x use the XLR format. XLR is nearly identical to BIFF8 XLS: it uses the CFB container with a Workbook stream. Works 9 saves the exact Workbook stream for the XLR and the 97-2003 XLS export. Works 6 XLS includes two empty worksheets but the main worksheet has an identical encoding. XLR also includes a WksSSWorkBook stream similar to Lotus FM3/FMT files.

Numbers 3.0+ / iWork 2013+ Spreadsheet (NUMBERS)

iWork 2013(Numbers 3.0 / Pages 5.0 / Keynote 6.0)从基于 XML 的专有格式切换到基于 iWork Archive (IWA) 的当前文件格式。此格式已在当前版本 (Numbers 13.1) 以及 Numbers 的 iCloud.com Web 界面中使用完毕。

¥iWork 2013 (Numbers 3.0 / Pages 5.0 / Keynote 6.0) switched from a proprietary XML-based format to the current file format based on the iWork Archive (IWA). This format has been used up through the current release (Numbers 13.1) as well as the iCloud.com web interface to Numbers.

解析器专注于从表中提取原始数据。从技术上讲,Numbers 支持逻辑工作表中的多个表格,包括自定义标题。该解析器将为每个 Numbers 表生成一个工作表。

¥The parser focuses on extracting raw data from tables. Numbers technically supports multiple tables in a logical worksheet, including custom titles. This parser will generate one worksheet per Numbers table.

作者每页生成一个表格。

¥The writer generates one table per sheet.

OpenDocument 电子表格 (ODS/FODS)

¥OpenDocument Spreadsheet (ODS/FODS)

ODS 是一种类似于 XLSX 的 XML-in-ZIP 格式,而 FODS 是一种类似于 SpreadsheetML 的 XML 格式。两者在 OASIS 标准中都有详细说明,但 LibreOffice 添加了未记录的扩展。解析器和编写器没有实现完整的标准,而是专注于提取和存储原始数据所需的部分。

¥ODS is an XML-in-ZIP format akin to XLSX while FODS is an XML format akin to SpreadsheetML. Both are detailed in the OASIS standard, but LibreOffice adds undocumented extensions. The parsers and writers do not implement the full standard, instead focusing on parts necessary to extract and store raw data.

统一办公电子表格(UOS1/2)

¥Uniform Office Spreadsheet (UOS1/2)

UOS 是一种非常相似的格式,它有两种类型,分别对应于 ODS 和 FODS。大多数情况下,格式之间的差异在于标签和属性的名称。

¥UOS is a very similar format, and it comes in 2 varieties corresponding to ODS and FODS respectively. For the most part, the difference between the formats is in the names of tags and attributes.

WPS Office 电子表格 (ET)

¥WPS Office Spreadsheet (ET)

ET 是 WPS Office 电子表格的原生格式。它通过专有扩展扩展了 BIFF8 XLS 格式。

¥ET is the native format for WPS Office Spreadsheet. It extends the BIFF8 XLS format with proprietary extensions.

各种工作表格式

¥Miscellaneous Worksheet Formats

许多旧格式仅支持一张工作表:

¥Many older formats supported only one worksheet:

dBASE 和 Visual FoxPro (DBF)

¥dBASE and Visual FoxPro (DBF)

DBF 是一种类型表格式:每列只能保存一种数据类型,并且每条记录都会省略类型信息。解析器生成标题行并从工作表的第二行开始插入记录。作者使文件与 Visual FoxPro 扩展兼容。

¥DBF is a typed table format: each column can only hold one data type and each record omits type information. The parser generates a header row and inserts records starting at the second row of the worksheet. The writer makes files compatible with Visual FoxPro extensions.

目前不支持外部备忘录和表格等多文件扩展名,受限于在 Web 浏览器中读取任意文件的一般能力。读者了解 DBF 7 级扩展,如 DATETIME

¥Multi-file extensions like external memos and tables are currently unsupported, limited by the general ability to read arbitrary files in the web browser. The reader understands DBF Level 7 extensions like DATETIME.

¥Symbolic Link (SYLK)

https://oss.sheetjs.com/notes/sylk 是基于我们的实验和之前的文档工作的非正式规范。

¥https://oss.sheetjs.com/notes/sylk is an informal specification based on our experimentation and previous documentation efforts.

Lotus Formatted Text (PRN)

没有真正的文档,事实上 Excel 将 PRN 视为仅输出文件格式。尽管如此,我们可以猜测列宽并对原始布局进行逆向工程。Excel 的 240 个字符宽度限制不是强制执行的。

¥There is no real documentation, and in fact Excel treats PRN as an output-only file format. Nevertheless we can guess the column widths and reverse-engineer the original layout. Excel's 240 character width limitation is not enforced.

数据交换格式 (DIF)

¥Data Interchange Format (DIF)

没有统一的定义。Visicalc DIF 不同于 Lotus DIF,也不同于 Excel DIF。如果不明确,解析器/编写器将遵循 Excel 的预期行为。特别是,Excel 以不兼容的方式扩展了 DIF:

¥There is no unified definition. Visicalc DIF differs from Lotus DIF, and both differ from Excel DIF. Where ambiguous, the parser/writer follows the expected behavior from Excel. In particular, Excel extends DIF in incompatible ways:

  • 由于 Excel 会自动将数字字符串转换为数字,因此数字字符串常量会转换为公式:"0.3" -> "=""0.3""

    ¥Since Excel automatically converts numbers-as-strings to numbers, numeric string constants are converted to formulae: "0.3" -> "=""0.3""

  • 从技术上讲,DIF 期望数字单元格保存原始数字数据,但 Excel 允许格式化数字(包括日期)

    ¥DIF technically expects numeric cells to hold the raw numeric data, but Excel permits formatted numbers (including dates)

  • DIF 从技术上讲不支持公式,但 Excel 会自动转换普通公式。不保留数组公式。

    ¥DIF technically has no support for formulae, but Excel will automatically convert plain formulae. Array formulae are not preserved.

富文本格式 (RTF)

¥Rich Text Format (RTF)

从工作表复制单元格或范围时,Excel RTF 工作表存储在剪贴板中。支持的代码是 Word RTF 支持的子集。

¥Excel RTF worksheets are stored in clipboard when copying cells or ranges from a worksheet. The supported codes are a subset of the Word RTF support.

Ethercalc 记录格式 (ETH)

¥Ethercalc Record Format (ETH)

Ethercalc 是一个开源网络电子表格,其记录格式让人想起封装在 MIME 多部分消息中的 SYLK。

¥Ethercalc was an open source web spreadsheet powered by a record format reminiscent of SYLK wrapped in a MIME multi-part message.