Skip to main content

Mathematica 中的电子表格处理

Mathematica 是一个数学和科学计算的软件系统。它支持命令行工具和 JavaScript 扩展。

¥Mathematica is a software system for mathematics and scientific computing. It supports command-line tools and JavaScript extensions.

SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。

¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.

该演示使用 SheetJS 从电子表格中提取数据,以便在 Mathematica 中进行进一步分析。我们将探讨如何运行外部工具从不透明电子表格生成 CSV 数据并解析 Mathematica 中的数据。

¥This demo uses SheetJS to pull data from a spreadsheet for further analysis within Mathematica. We'll explore how to run an external tool to generate CSV data from opaque spreadsheets and parse the data from Mathematica.

测试部署

此演示由 SheetJS 用户在以下部署中进行了测试:

¥This demo was tested by SheetJS users in the following deployments:

架构版本日期
darwin-x6414.02024-06-05
win10-x6414.02024-06-05

集成详情

¥Integration Details

SheetJS NodeJS 模块 可以在 NodeJS 脚本中加载,包括使用 ExternalEvaluate[^1] Mathematica 函数的 "NodeJS" 模式调用的脚本。

¥The SheetJS NodeJS module can be loaded in NodeJS scripts, including scripts invoked using the "NodeJS" mode of the ExternalEvaluate[^1] Mathematica function.

然而,当前的跨平台建议涉及一个专用的命令行工具,该工具利用 SheetJS 库来执行电子表格处理。

¥However, the current cross-platform recommendation involves a dedicated command line tool that leverages SheetJS libraries to to perform spreadsheet processing.

外部引擎

¥External Engines

下图描绘了练习册华尔兹:

¥The following diagram depicts the workbook waltz:

Mathematica

可以使用 RegisterExternalEvaluator[^2] 从 Mathematica 激活 NodeJS。激活后,JavaScript 代码可以使用 ExternalEvaluate[^3] 运行。如果 NodeJS 代码返回 CSV 数据,ImportString[^4] 可以生成 Dataset[^5]。

¥NodeJS can be activated from Mathematica using RegisterExternalEvaluator[^2]. Once activated, JavaScript code can be run using ExternalEvaluate[^3]. If the NodeJS code returns CSV data, ImportString[^4] can generate a Dataset[^5].

SheetJS

对于驻留在文件系统上的文件,SheetJS readFile 函数 [^6] 可以生成工作簿对象。确切的位置可以通过在 ExternalEvaluate 中打印 require("process").cwd()[^7] 来确定:

¥For a file residing on the filesystem, the SheetJS readFile function[^6] can generate a workbook object. The exact location can be determined by printing require("process").cwd()[^7] in ExternalEvaluate:

In[1]:= ExternalEvaluate["NodeJS", "require('process').cwd()"]
Out[1]= "C:\Users\Me\Documents"

拉取第一个工作表 [^8] 后,SheetJS sheet_to_csv 函数 [^9] 生成一个 CSV 字符串。

¥After pulling the first worksheet[^8], the SheetJS sheet_to_csv function[^9] generates a CSV string.

功能齐全

¥Complete Function

以下函数读取文件,解析第一个工作表并返回一个数据集对象(假设有一个标题行)。

¥The following function reads a file, parses the first worksheet and returns a Dataset object assuming one header row.

SheetJSImportFileEE
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];

(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];

(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]

命令行工具

¥Command-Line Tools

"命令行工具" 演示 创建 xlsx-cli,这是一个命令行工具,用于读取电子表格文件并从第一个工作表生成 CSV 行。

¥The "Command-Line Tools" demo creates xlsx-cli, a command-line tool that reads a spreadsheet file and generates CSV rows from the first worksheet.

ExternalEvaluate[^10] 可以运行命令行工具并捕获标准输出。以下代码片段处理 ~/Downloads/pres.numbers 并将 CSV 数据拉入 Mathematica 中的变量中:

¥ExternalEvaluate[^10] can run command-line tools and capture standard output. The following snippet processes ~/Downloads/pres.numbers and pulls CSV data into a variable in Mathematica:

cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];

ImportString[^11] 可以将 CSV 数据解释为 Dataset[^12]。通常,CSV 输出的第一行是标题行。HeaderLines[^13] 选项控制 Mathematica 如何解析数据:

¥ImportString[^11] can interpret the CSV data as a Dataset[^12]. Typically the first row of the CSV output is the header row. The HeaderLines[^13] option controls how Mathematica parses the data:

data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]

下图描绘了练习册华尔兹:

¥The following diagram depicts the workbook waltz:

完整演示

¥Complete Demo

此演示测试 NodeJS 外部引擎和专用命令行工具。

¥This demo tests the NodeJS external engine and dedicated command line tools.

NodeJS 引擎

¥NodeJS Engine

  1. 安装 NodeJS。测试演示时,安装了版本 20.14.0

    ¥Install NodeJS. When the demo was tested, version 20.14.0 was installed.

  2. 在主文件夹(~$HOME%HOMEPATH%)中安装依赖:

    ¥Install dependencies in the Home folder (~ or $HOME or %HOMEPATH%):

npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz zeromq@6.0.0-beta.19
  1. 打开一个新的 Mathematica Notebook 并注册 NodeJS。当在 Windows 中测试该示例时,命令为:

    ¥Open a new Mathematica Notebook and register NodeJS. When the example was tested in Windows, the commands were:

RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"]
FindExternalEvaluators["NodeJS"]

RegisterExternalEvaluator 的第二个参数应该是 node 程序的路径,可以通过在新的终端窗口中运行以下命令找到:

¥The second argument to RegisterExternalEvaluator should be the path to the node program, which can be found by running the following command in a new terminal window:

which node

如果注册了 NodeJS,则 "挂号的" 列中的值为 "真的"。

¥If NodeJS is registered, the value in the "Registered" column will be "True".

  1. 要确定基本文件夹,请从 NodeJS 运行 require("process").cwd()

    ¥To determine the base folder, run require("process").cwd() from NodeJS:

ExternalEvaluate["NodeJS", "require('process').cwd()"]
  1. 下载 pres.numbers 并将文件移动到基本文件夹,如上一步所示。

    ¥Download pres.numbers and move the file to the base folder as shown in the previous step.

  2. 将以下代码片段复制(但不要运行)到正在运行的注意本中:

    ¥Copy, but do not run, the following snippet into the running notebook:

SheetJSImportFileEE
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];

(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];

(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]

粘贴后,编辑高亮的行以反映 nodenode.exe 二进制文件的路径。此路径是在步骤 2 中发现的。

¥After pasting, edit the highlighted line to reflect the path of the node or node.exe binary. This path was discovered in Step 2.

编辑代码片段后,运行表达式。

¥After editing the snippet, run the expression.

  1. 运行该函数并确认结果是正确的数据集:

    ¥Run the function and confirm the result is a proper Dataset:

SheetJSImportFileEE["pres.numbers"]

SheetJSImportFileEE result

独立二进制文件

¥Standalone Binary

  1. 创建独立的 xlsx-cli 二进制文件 [^14]。命令应在终端或 PowerShell 窗口中运行:

    ¥Create the standalone xlsx-cli binary[^14]. The commands should be run in a Terminal or PowerShell window:

npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz exit-on-epipe commander@2
curl -LO https://xlsx.nodejs.cn/cli/xlsx-cli.js
npx -y nexe -t 14.15.3 xlsx-cli.js
  1. 将生成的 xlsx-cli 移动到 /usr/local/bin 中的固定位置:

    ¥Move the generated xlsx-cli to a fixed location in /usr/local/bin:

mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/

如果出现权限错误,则应使用 root 用户运行该命令:

¥If there are permission errors, the command should be run with the root user:

sudo mv xlsx-cli /usr/local/bin/

读取本地文件

¥Reading a Local File

  1. 在新的 Mathematica 注意本中,运行以下代码片段:

    ¥In a new Mathematica notebook, run the following snippet:

SheetJSImportFile
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
  1. 下载 https://xlsx.nodejs.cn/pres.numbers 并保存到下载文件夹:

    ¥Download https://xlsx.nodejs.cn/pres.numbers and save to Downloads folder:

cd ~/Downloads/
curl -LO https://xlsx.nodejs.cn/pres.numbers
  1. 在 Mathematica 注意本中,运行新函数。如果文件保存到 Downloads 文件夹,则 macOS 中的路径将为 "~/Downloads/pres.numbers"

    ¥In the Mathematica notebook, run the new function. If the file was saved to the Downloads folder, the path will be "~/Downloads/pres.numbers" in macOS:

data = SheetJSImportFile["~/Downloads/pres.numbers"]

结果应显示在简洁的表格中。

¥The result should be displayed in a concise table.

SheetJSImportFile result

从 URL 读取

¥Reading from a URL

FetchURL[^15] 从指定的 URL 下载文件并返回该文件的路径。该函数将被封装在一个名为 SheetJSImportURL 的新函数中。

¥FetchURL[^15] downloads a file from a specified URL and returns a path to the file. This function will be wrapped in a new function called SheetJSImportURL.

  1. 在同一注意本中,运行以下命令:

    ¥In the same notebook, run the following:

SheetJSImportURL
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
path = FetchURL[x];
SheetJSImportFile[path]
)];
  1. 通过下载注意本中的测试文件进行测试:

    ¥Test by downloading the test file in the notebook:

data = SheetJSImportURL["https://xlsx.nodejs.cn/pres.numbers"]

SheetJSImportURL result

[^1]: 请参阅 Mathematica 文档中的 ExternalEvaluate Node.js 示例

¥See the ExternalEvaluate Node.js example in the Mathematica documentation.

[^2]: 请参阅 Mathematica 文档中的 RegisterExternalEvaluator

¥See RegisterExternalEvaluator in the Mathematica documentation.

[^3]: 请参阅 Mathematica 文档中的 ExternalEvaluate

¥See ExternalEvaluate in the Mathematica documentation.

[^4]: 请参阅 Mathematica 文档中的 ImportString

¥See ImportString in the Mathematica documentation.

[^5]: 使用 ImportString 中的 "Dataset" 元素 时将创建 Dataset

¥A Dataset will be created when using the "Dataset" element in ImportString

[^6]: 见 readFile 于 "读取文件"

¥See readFile in "Reading Files"

[^7]: 请参阅 NodeJS 文档中的 process.cwd()

¥See process.cwd() in the NodeJS documentation.

[^8]: 工作簿对象的 SheetsSheetNames 属性在 "工作簿对象" 中描述

¥The Sheets and SheetNames properties of workbook objects are described in "Workbook Object"

[^9]: 见 sheet_to_csv 于 "CSV 和文本"

¥See sheet_to_csv in "CSV and Text"

[^10]: 请参阅 Mathematica 文档中的 ExternalEvaluate

¥See ExternalEvaluate in the Mathematica documentation.

[^11]: 请参阅 Mathematica 文档中的 ImportString

¥See ImportString in the Mathematica documentation.

[^12]: 使用 ImportString 中的 "Dataset" 元素 时将创建 Dataset

¥A Dataset will be created when using the "Dataset" element in ImportString

[^13]: 请参阅 Mathematica 文档中的 HeaderLines

¥See HeaderLines in the Mathematica documentation.

[^14]: 详细信息请参见 "命令行工具"

¥See "Command-line Tools" for more details.

[^15]: Mathematica 11 引入了包括 URLRead 在内的新方法。

¥Mathematica 11 introduced new methods including URLRead.