Mathematica 中的电子表格处理
Mathematica 是一个数学和科学计算的软件系统。它支持命令行工具和 JavaScript 扩展。
¥Mathematica is a software system for mathematics and scientific computing. It supports command-line tools and JavaScript extensions.
SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。
¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.
该演示使用 SheetJS 从电子表格中提取数据,以便在 Mathematica 中进行进一步分析。我们将探讨如何运行外部工具从不透明电子表格生成 CSV 数据并解析 Mathematica 中的数据。
¥This demo uses SheetJS to pull data from a spreadsheet for further analysis within Mathematica. We'll explore how to run an external tool to generate CSV data from opaque spreadsheets and parse the data from Mathematica.
此演示由 SheetJS 用户在以下部署中进行了测试:
¥This demo was tested by SheetJS users in the following deployments:
架构 | 版本 | 日期 |
---|---|---|
darwin-x64 | 14.0 | 2024-06-05 |
win10-x64 | 14.0 | 2024-06-05 |
集成详情
¥Integration Details
SheetJS NodeJS 模块 可以在 NodeJS 脚本中加载,包括使用 ExternalEvaluate
[^1] Mathematica 函数的 "NodeJS"
模式调用的脚本。
¥The SheetJS NodeJS module can be
loaded in NodeJS scripts, including scripts invoked using the "NodeJS"
mode
of the ExternalEvaluate
[^1] Mathematica function.
然而,当前的跨平台建议涉及一个专用的命令行工具,该工具利用 SheetJS 库来执行电子表格处理。
¥However, the current cross-platform recommendation involves a dedicated command line tool that leverages SheetJS libraries to to perform spreadsheet processing.
外部引擎
¥External Engines
下图描绘了练习册华尔兹:
¥The following diagram depicts the workbook waltz:
Mathematica
可以使用 RegisterExternalEvaluator
[^2] 从 Mathematica 激活 NodeJS。激活后,JavaScript 代码可以使用 ExternalEvaluate
[^3] 运行。如果 NodeJS 代码返回 CSV 数据,ImportString
[^4] 可以生成 Dataset
[^5]。
¥NodeJS can be activated from Mathematica using RegisterExternalEvaluator
[^2].
Once activated, JavaScript code can be run using ExternalEvaluate
[^3]. If the
NodeJS code returns CSV data, ImportString
[^4] can generate a Dataset
[^5].
SheetJS
对于驻留在文件系统上的文件,SheetJS readFile
函数 [^6] 可以生成工作簿对象。确切的位置可以通过在 ExternalEvaluate
中打印 require("process").cwd()
[^7] 来确定:
¥For a file residing on the filesystem, the SheetJS readFile
function[^6] can
generate a workbook object. The exact location can be determined by printing
require("process").cwd()
[^7] in ExternalEvaluate
:
In[1]:= ExternalEvaluate["NodeJS", "require('process').cwd()"]
Out[1]= "C:\Users\Me\Documents"
拉取第一个工作表 [^8] 后,SheetJS sheet_to_csv
函数 [^9] 生成一个 CSV 字符串。
¥After pulling the first worksheet[^8], the SheetJS sheet_to_csv
function[^9]
generates a CSV string.
功能齐全
¥Complete Function
以下函数读取文件,解析第一个工作表并返回一个数据集对象(假设有一个标题行)。
¥The following function reads a file, parses the first worksheet and returns a Dataset object assuming one header row.
- Linux/MacOS
- Windows
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];
(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];
(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"];
(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];
(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
命令行工具
¥Command-Line Tools
"命令行工具" 演示 创建 xlsx-cli
,这是一个命令行工具,用于读取电子表格文件并从第一个工作表生成 CSV 行。
¥The "Command-Line Tools" demo creates xlsx-cli
, a
command-line tool that reads a spreadsheet file and generates CSV rows from the
first worksheet.
ExternalEvaluate
[^10] 可以运行命令行工具并捕获标准输出。以下代码片段处理 ~/Downloads/pres.numbers
并将 CSV 数据拉入 Mathematica 中的变量中:
¥ExternalEvaluate
[^10] can run command-line tools and capture standard output.
The following snippet processes ~/Downloads/pres.numbers
and pulls CSV data
into a variable in Mathematica:
cmd = "/usr/local/bin/xlsx-cli ~/Downloads/pres.numbers"
csvdata = ExternalEvaluate["Shell" -> "StandardOutput", cmd];
ImportString
[^11] 可以将 CSV 数据解释为 Dataset
[^12]。通常,CSV 输出的第一行是标题行。HeaderLines
[^13] 选项控制 Mathematica 如何解析数据:
¥ImportString
[^11] can interpret the CSV data as a Dataset
[^12]. Typically the
first row of the CSV output is the header row. The HeaderLines
[^13] option
controls how Mathematica parses the data:
data = ImportString[csvdata, "Dataset", "HeaderLines" -> 1]
下图描绘了练习册华尔兹:
¥The following diagram depicts the workbook waltz:
完整演示
¥Complete Demo
此演示测试 NodeJS 外部引擎和专用命令行工具。
¥This demo tests the NodeJS external engine and dedicated command line tools.
NodeJS 引擎
¥NodeJS Engine
-
安装 NodeJS。测试演示时,安装了版本
20.14.0
。¥Install NodeJS. When the demo was tested, version
20.14.0
was installed. -
在主文件夹(
~
或$HOME
或%HOMEPATH%
)中安装依赖:¥Install dependencies in the Home folder (
~
or$HOME
or%HOMEPATH%
):
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz zeromq@6.0.0-beta.19
-
打开一个新的 Mathematica Notebook 并注册 NodeJS。当在 Windows 中测试该示例时,命令为:
¥Open a new Mathematica Notebook and register NodeJS. When the example was tested in Windows, the commands were:
- Linux/MacOS
- Windows
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"]
FindExternalEvaluators["NodeJS"]
RegisterExternalEvaluator
的第二个参数应该是 node
程序的路径,可以通过在新的终端窗口中运行以下命令找到:
¥The second argument to RegisterExternalEvaluator
should be the path to the
node
program, which can be found by running the following command in a new
terminal window:
which node
RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"]
FindExternalEvaluators["NodeJS"]
RegisterExternalEvaluator
的第二个参数应该是 node.exe
程序的路径,可以通过在新的 PowerShell 窗口中运行以下命令找到:
¥The second argument to RegisterExternalEvaluator
should be the path to the
node.exe
program, which can be found by running the following command in a new
PowerShell window:
Get-Command node.exe
如果注册了 NodeJS,则 "挂号的" 列中的值为 "真的"。
¥If NodeJS is registered, the value in the "Registered" column will be "True".
-
要确定基本文件夹,请从 NodeJS 运行
require("process").cwd()
:¥To determine the base folder, run
require("process").cwd()
from NodeJS:
ExternalEvaluate["NodeJS", "require('process').cwd()"]
-
下载
pres.numbers
并将文件移动到基本文件夹,如上一步所示。¥Download
pres.numbers
and move the file to the base folder as shown in the previous step. -
将以下代码片段复制(但不要运行)到正在运行的注意本中:
¥Copy, but do not run, the following snippet into the running notebook:
- Linux/MacOS
- Windows
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","/usr/local/bin/node"];
(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];
(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
(* Import file stored in the Documents folder (e.g. C:\Users\Me\Documents) *)
SheetJSImportFileEE[filename_]:=Module[{csv}, (
(* This was required in local testing *)
RegisterExternalEvaluator["NodeJS","C:\\Program Files\\nodejs\\node.exe"];
(* Generate CSV from first sheet *)
csv:=ExternalEvaluate["NodeJS", StringJoin[
(* module installed in home directory *)
"var XLSX = require('xlsx');",
(* read specified filename *)
"var wb = XLSX.readFile('",filename,"');",
(* grab first worksheet *)
"var ws = wb.Sheets[wb.SheetNames[0]];",
(* convert to CSV *)
"XLSX.utils.sheet_to_csv(ws)"
]];
(* Parse CSV into a dataset *)
Return[ImportString[csv, "Dataset", "HeaderLines"->1]];
)]
粘贴后,编辑高亮的行以反映 node
或 node.exe
二进制文件的路径。此路径是在步骤 2 中发现的。
¥After pasting, edit the highlighted line to reflect the path of the node
or
node.exe
binary. This path was discovered in Step 2.
编辑代码片段后,运行表达式。
¥After editing the snippet, run the expression.
-
运行该函数并确认结果是正确的数据集:
¥Run the function and confirm the result is a proper Dataset:
SheetJSImportFileEE["pres.numbers"]
独立二进制文件
¥Standalone Binary
-
创建独立的
xlsx-cli
二进制文件 [^14]。命令应在终端或 PowerShell 窗口中运行:¥Create the standalone
xlsx-cli
binary[^14]. The commands should be run in a Terminal or PowerShell window:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz exit-on-epipe commander@2
curl -LO https://xlsx.nodejs.cn/cli/xlsx-cli.js
npx -y nexe -t 14.15.3 xlsx-cli.js
- Linux/MacOS
- Windows
-
将生成的
xlsx-cli
移动到/usr/local/bin
中的固定位置:¥Move the generated
xlsx-cli
to a fixed location in/usr/local/bin
:
mkdir -p /usr/local/bin
mv xlsx-cli /usr/local/bin/
如果出现权限错误,则应使用 root 用户运行该命令:
¥If there are permission errors, the command should be run with the root user:
sudo mv xlsx-cli /usr/local/bin/
-
找到当前目录:
¥Find the current directory:
pwd
生成的二进制文件在显示的路径中将为 xlsx-cli.exe
。
¥The generated binary will be xlsx-cli.exe
in the displayed path.
读取本地文件
¥Reading a Local File
- Linux/MacOS
- Windows
-
在新的 Mathematica 注意本中,运行以下代码片段:
¥In a new Mathematica notebook, run the following snippet:
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
-
在新的 Mathematica 注意本中,复制但不运行以下代码片段:
¥In a new Mathematica notebook, copy but do not run the following snippet:
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"/usr/local/bin/xlsx-cli " <> x
]], "Dataset", "HeaderLines" -> 1]
将字符串中的 /usr/local/bin/xlsx-cli
更改为生成的 xlsx-cli.exe
二进制文件的路径。例如,如果步骤 2 中的路径为 C:\Users\Me\Documents\
,则代码应为:
¥Change /usr/local/bin/xlsx-cli
in the string to the path to the generated
xlsx-cli.exe
binary. For example, if the path in step 2 was
C:\Users\Me\Documents\
, then the code should be:
SheetJSImportFile[x_] := ImportString[Block[{Print}, ExternalEvaluate[
"Shell" -> "StandardOutput",
"C:\\Users\\Me\\Documents\\xlsx-cli.exe " <> x
]], "Dataset", "HeaderLines" -> 1]
Mathematica 要求 \
字符必须加倍。
¥Mathematica requires the \
characters must be doubled.
进行更改后,运行代码片段。
¥After making the change, run the snippet.
-
下载 https://xlsx.nodejs.cn/pres.numbers 并保存到下载文件夹:
¥Download https://xlsx.nodejs.cn/pres.numbers and save to Downloads folder:
cd ~/Downloads/
curl -LO https://xlsx.nodejs.cn/pres.numbers
-
在 Mathematica 注意本中,运行新函数。如果文件保存到 Downloads 文件夹,则 macOS 中的路径将为
"~/Downloads/pres.numbers"
:¥In the Mathematica notebook, run the new function. If the file was saved to the Downloads folder, the path will be
"~/Downloads/pres.numbers"
in macOS:
- Linux/MacOS
- Windows
data = SheetJSImportFile["~/Downloads/pres.numbers"]
在 Windows 上,必须使用文件的绝对路径。要找到此路径,请在 PowerShell 中运行以下命令:
¥On Windows, the absolute path to the file must be used. To find this path, run the following commands in PowerShell:
cd $HOME\Downloads
pwd
将 \\pres.numbers
附加到显示的路径。例如,如果路径是 C:\Users\Me\Downloads
,则命令将是
¥Append \\pres.numbers
to the displayed path. For example, if the path was
C:\Users\Me\Downloads
, the command will be
data = SheetJSImportFile["C:\\Users\\Me\\Downloads\\pres.numbers"]
\
字符必须加倍。
¥The \
characters must be doubled.
结果应显示在简洁的表格中。
¥The result should be displayed in a concise table.
从 URL 读取
¥Reading from a URL
FetchURL
[^15] 从指定的 URL 下载文件并返回该文件的路径。该函数将被封装在一个名为 SheetJSImportURL
的新函数中。
¥FetchURL
[^15] downloads a file from a specified URL and returns a path to the
file. This function will be wrapped in a new function called SheetJSImportURL
.
-
在同一注意本中,运行以下命令:
¥In the same notebook, run the following:
Needs["Utilities`URLTools`"];
SheetJSImportURL[x_] := Module[{path},(
path = FetchURL[x];
SheetJSImportFile[path]
)];
-
通过下载注意本中的测试文件进行测试:
¥Test by downloading the test file in the notebook:
data = SheetJSImportURL["https://xlsx.nodejs.cn/pres.numbers"]
[^1]: 请参阅 Mathematica 文档中的 ExternalEvaluate
Node.js 示例。
¥See the ExternalEvaluate
Node.js example in the Mathematica documentation.
[^2]: 请参阅 Mathematica 文档中的 RegisterExternalEvaluator
。
¥See RegisterExternalEvaluator
in the Mathematica documentation.
[^3]: 请参阅 Mathematica 文档中的 ExternalEvaluate
。
¥See ExternalEvaluate
in the Mathematica documentation.
[^4]: 请参阅 Mathematica 文档中的 ImportString
。
¥See ImportString
in the Mathematica documentation.
[^5]: 使用 ImportString
中的 "Dataset"
元素 时将创建 Dataset
¥A Dataset
will be created when using the "Dataset"
element in ImportString
[^6]: 见 readFile
于 "读取文件"
¥See readFile
in "Reading Files"
[^7]: 请参阅 NodeJS 文档中的 process.cwd()
。
¥See process.cwd()
in the NodeJS documentation.
[^8]: 工作簿对象的 Sheets
和 SheetNames
属性在 "工作簿对象" 中描述
¥The Sheets
and SheetNames
properties of workbook objects are described in "Workbook Object"
[^9]: 见 sheet_to_csv
于 "CSV 和文本"
¥See sheet_to_csv
in "CSV and Text"
[^10]: 请参阅 Mathematica 文档中的 ExternalEvaluate
。
¥See ExternalEvaluate
in the Mathematica documentation.
[^11]: 请参阅 Mathematica 文档中的 ImportString
。
¥See ImportString
in the Mathematica documentation.
[^12]: 使用 ImportString
中的 "Dataset"
元素 时将创建 Dataset
¥A Dataset
will be created when using the "Dataset"
element in ImportString
[^13]: 请参阅 Mathematica 文档中的 HeaderLines
。
¥See HeaderLines
in the Mathematica documentation.
[^14]: 详细信息请参见 "命令行工具"。
¥See "Command-line Tools" for more details.
[^15]: Mathematica 11 引入了包括 URLRead
在内的新方法。
¥Mathematica 11 introduced new methods including URLRead
.