Skip to main content

TensorFlow 中的工作表

TensorFlow.js(简称 TF.js)是一个 JavaScript 机器学习库。

¥TensorFlow.js (shortened to TF.js) is a library for machine learning in JavaScript.

SheetJS 是一个用于从电子表格读取和写入数据的 JavaScript 库。

¥SheetJS is a JavaScript library for reading and writing data from spreadsheets.

该演示使用 TensorFlow.js 和 SheetJS 处理电子表格中的数据。我们将探讨如何将电子表格数据加载到 TF.js 数据集中以及如何将结果导出回电子表格。

¥This demo uses TensorFlow.js and SheetJS to process data in spreadsheets. We'll explore how to load spreadsheet data into TF.js datasets and how to export results back to spreadsheets.

  • "CSV 数据交换" 使用 SheetJS 处理 sheet 并生成 TF.js 可以导入的 CSV 数据。

    ¥"CSV Data Interchange" uses SheetJS to process sheets and generate CSV data that TF.js can import.

  • "JS 数组交换" 使用 SheetJS 处理工作表并生成可进行后处理的对象行。

    ¥"JS Array Interchange" uses SheetJS to process sheets and generate rows of objects that can be post-processed.

测试部署

每个浏览器演示都在以下环境中进行了测试:

¥Each browser demo was tested in the following environments:

浏览器TF.js 版本日期
Chrome 1274.20.02024-08-16
Safari 17.44.20.02024-08-16

NodeJS 演示在以下环境中进行了测试:

¥The NodeJS demo was tested in the following environments:

NodeJSTF.js 版本日期
22.3.04.20.0(@tensorflow/tfjs)2024-08-16

Kaioken 演示在以下环境中进行了测试:

¥The Kaioken demo was tested in the following environments:

KaiokenTF.js 版本日期
0.25.34.20.02024-08-16

安装

¥Installation

独立浏览器脚本

¥Standalone Browser Scripts

此页面中的实时代码块使用 TF.js 4.20.0 独立构建。

¥Live code blocks in this page use the TF.js 4.20.0 standalone build.

独立脚本可在包括 UNPKG 在内的各种 CDN 上使用。可以使用以下 SCRIPT 标签加载最新版本。

¥Standalone scripts are available on various CDNs including UNPKG. The latest version can be loaded with the following SCRIPT tag.

SheetJS 独立脚本 可以在 TF.js 独立脚本之后加载。

¥The SheetJS Standalone scripts can be loaded after the TF.js standalone script.

<!-- latest version of TF.js -->
<script src="https://unpkg.com/@tensorflow/tfjs@latest/dist/tf.min.js"></script>
<!-- use version 0.20.3 -->
<script lang="javascript" src="https://cdn.sheetjs.com/xlsx-0.20.3/package/dist/xlsx.full.min.js"></script>

框架和打包器

¥Frameworks and Bundlers

"构架" 章节 涵盖了 Yarn 和其他包管理器的安装。

¥The "Frameworks" section covers installation with Yarn and other package managers.

应使用包管理器安装 @tensorflow/tfjs 和 SheetJS 模块:

¥@tensorflow/tfjs and SheetJS modules should be installed using a package manager:

yarn add https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @tensorflow/tfjs

Yarn 的较新版本可能会引发错误:

¥Newer releases of Yarn may throw an error:

Usage Error: It seems you are trying to add a package using a https:... url; we now require package names to be explicitly specified.
Try running the command again with the package name prefixed: yarn add my-package@https:...

解决方法是将 xlsx@ 添加到 URL 前面:

¥The workaround is to prepend the URL with xlsx@:

yarn add xlsx@https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @tensorflow/tfjs

NodeJS

SheetJS NodeJS 模块 可以在使用 TF.js 的 NodeJS 脚本中导入。

¥The SheetJS NodeJS module can be imported in NodeJS scripts that use TF.js.

NodeJS 有两个选项:

¥There are two options for NodeJS:

  • 纯 JavaScript 绑定模块是 @tensorflow/tfjs

    ¥the pure JavaScript bindings module is @tensorflow/tfjs

  • 原生绑定模块是 @tensorflow/tfjs-node

    ¥the native bindings module is @tensorflow/tfjs-node

上次测试此演示时,原生绑定存在问题:

¥When this demo was last tested, there were issues with the native binding:

Error: The specified module could not be found.
\\?\C:\Users\SheetJS\node_modules\@tensorflow\tfjs-node\lib\napi-v8\tfjs_binding.node

为了实现一般兼容性,演示使用纯 @tensorflow/tfjs 绑定。

¥For general compatibility, the demos use the pure @tensorflow/tfjs binding.

yarn add https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @tensorflow/tfjs @tensorflow/tfjs-node

Yarn 的较新版本可能会引发错误:

¥Newer releases of Yarn may throw an error:

Usage Error: It seems you are trying to add a package using a https:... url; we now require package names to be explicitly specified.
Try running the command again with the package name prefixed: yarn add my-package@https:...

解决方法是将 xlsx@ 添加到 URL 前面:

¥The workaround is to prepend the URL with xlsx@:

yarn add xlsx@https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @tensorflow/tfjs @tensorflow/tfjs-node

CSV 数据交换

¥CSV Data Interchange

tf.data.csv[^1] 从 CSV 数据生成数据集。该函数需要一个 URL。

¥tf.data.csv[^1] generates a Dataset from CSV data. The function expects a URL.

上次测试此演示时,没有直接方法将 CSV 字符串传递给底层解析器。

¥When this demo was last tested, there was no direct method to pass a CSV string to the underlying parser.

幸运的是,支持 blob URL。

¥Fortunately blob URLs are supported.

SheetJS sheet_to_csv 方法 [^2] 从工作表对象生成 CSV 字符串。使用标准 JavaScript 技术,可以构建 blob URL:

¥The SheetJS sheet_to_csv method[^2] generates a CSV string from a worksheet object. Using standard JavaScript techniques, a blob URL can be constructed:

function worksheet_to_csv_url(worksheet) {
/* generate CSV */
const csv = XLSX.utils.sheet_to_csv(worksheet);

/* CSV -> Uint8Array -> Blob */
const u8 = new TextEncoder().encode(csv);
const blob = new Blob([u8], { type: "text/csv" });

/* generate a blob URL */
return URL.createObjectURL(blob);
}

CSV 演示

¥CSV Demo

该演示展示了使用 TensorFlow 中的 "cars" 数据集进行的简单模型拟合。示例 XLS 文件 包含数据。数据处理镜像官方 "根据 2D 数据进行预测" demo[^3]。

¥This demo shows a simple model fitting using the "cars" dataset from TensorFlow. The sample XLS file contains the data. The data processing mirrors the official "Making Predictions from 2D Data" demo[^3].

该演示构建了一个模型,用于根据马力数据预测 MPG。它:

¥The demo builds a model for predicting MPG from Horsepower data. It:

  • 获取 https://xlsx.nodejs.cn/cd.xls

    ¥fetches https://xlsx.nodejs.cn/cd.xls

  • 使用 SheetJS read[^4] 方法解析数据

    ¥parses the data with the SheetJS read[^4] method

  • 选择第一个工作表 [^5] 并使用 sheet_to_csv[^6] 转换为 CSV

    ¥selects the first worksheet[^5] and converts to CSV using sheet_to_csv[^6]

  • 从 CSV 文本生成 blob URL

    ¥generates a blob URL from the CSV text

  • 生成 tf.data.csv[^7] 的 TF.js 数据集并选择数据列

    ¥generates a TF.js dataset with tf.data.csv[^7] and selects data columns

  • 构建模型并使用 fitDataset[^8] 进行训练

    ¥builds a model and trains with fitDataset[^8]

  • 根据一组样本输入预测 MPG 并将结果显示在表格中

    ¥predicts MPG from a set of sample inputs and displays results in a table

在线演示

¥Live Demo

Live Demo (click to show)

In some test runs, the results did not make sense given the underlying data. The dependent and independent variables are expected to be anti-correlated.

This is a known issue in TF.js and affects the official demos

If the live demo shows a message

ReferenceError: tf is not defined

please refresh the page. This is a known bug in the documentation generator.

Result
Loading...
Live Editor
function SheetJSToTFJSCSV() {
  const [output, setOutput] = React.useState("");
  const [results, setResults] = React.useState([]);
  const [disabled, setDisabled] = React.useState(false);

  function worksheet_to_csv_url(worksheet) {
    /* generate CSV */
    const csv = XLSX.utils.sheet_to_csv(worksheet);

    /* CSV -> Uint8Array -> Blob */
    const u8 = new TextEncoder().encode(csv);
    const blob = new Blob([u8], { type: "text/csv" });

    /* generate a blob URL */
    return URL.createObjectURL(blob);
  }

  const doit = React.useCallback(async () => {
    setResults([]); setOutput(""); setDisabled(true);
    try {
    /* fetch file */
    const f = await fetch("https://xlsx.nodejs.cn/cd.xls");
    const ab = await f.arrayBuffer();
    /* parse file and get first worksheet */
    const wb = XLSX.read(ab);
    const ws = wb.Sheets[wb.SheetNames[0]];

    /* generate blob URL */
    const url = worksheet_to_csv_url(ws);

    /* feed to tf.js */
    const dataset = tf.data.csv(url, {
      hasHeader: true,
      configuredColumnsOnly: true,
      columnConfigs:{
        "Horsepower": { required: false, default: 0},
        "Miles_per_Gallon": { required: false, default: 0, isLabel: true }
      }
    });

    /* pre-process data */
    let flat = dataset
      .map(({xs,ys}) =>({xs: Object.values(xs), ys: Object.values(ys)}))
      .filter(({xs,ys}) => [...xs,...ys].every(v => v>0));

    /* normalize manually :( */
    let minX = Infinity, maxX = -Infinity, minY = Infinity, maxY = -Infinity;
    await flat.forEachAsync(({xs, ys}) => {
      minX = Math.min(minX, xs[0]); maxX = Math.max(maxX, xs[0]);
      minY = Math.min(minY, ys[0]); maxY = Math.max(maxY, ys[0]);
    });
    flat = flat.map(({xs, ys}) => ({xs:xs.map(v => (v-minX)/(maxX - minX)),ys:ys.map(v => (v-minY)/(maxY-minY))}));
    flat = flat.batch(32);

    /* build and train model */
    const model = tf.sequential();
    model.add(tf.layers.dense({inputShape: [1], units: 1}));
    model.compile({ optimizer: tf.train.sgd(0.000001), loss: 'meanSquaredError' });
    await model.fitDataset(flat, { epochs: 100, callbacks: { onEpochEnd: async (epoch, logs) => {
      setOutput(`${epoch}:${logs.loss}`);
    }}});

    /* predict values */
    const inp = tf.linspace(0, 1, 9);
    const pred = model.predict(inp);
    const xs = await inp.dataSync(), ys = await pred.dataSync();
    setResults(Array.from(xs).map((x, i) => [ x * (maxX - minX) + minX, ys[i] * (maxY - minY) + minY ]));
    setOutput("");

    } catch(e) { setOutput(`ERROR: ${String(e)}`); } finally { setDisabled(false);}
  });
  return ( <>
    <button onClick={doit} disabled={disabled}>Click to run</button><br/>
    {output && <pre>{output}</pre> || <></>}
    {results.length && <table><thead><tr><th>Horsepower</th><th>MPG</th></tr></thead><tbody>
    {results.map((r,i) => <tr key={i}><td>{r[0]}</td><td>{r[1].toFixed(2)}</td></tr>)}
    </tbody></table> || <></>}
  </> );
}

NodeJS 演示

¥NodeJS Demo

Demo Steps (click to show)
  1. Create a new project:
mkdir sheetjs-tfjs-csv
cd sheetjs-tfjs-csv
npm init -y
  1. Download SheetJSTF.js:
curl -LO https://xlsx.nodejs.cn/tfjs/SheetJSTF.js
  1. Install SheetJS and TF.js dependencies:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @tensorflow/tfjs @tensorflow/tfjs-node
  1. Run the script:
node SheetJSTF.js

Kaioken 演示

¥Kaioken Demo

Kaioken 是一个流行的前端框架,它使用 ReactJS 开发者熟悉的模式。

¥Kaioken is a popular front-end framework that uses patterns that will be familiar to ReactJS developers.

SheetJS 团队强烈建议在使用 TF.js 的项目中采用 Kaioken。

¥The SheetJS team strongly recommends using Kaioken in projects using TF.js.

Demo Steps (click to show)
  1. Create a new site.
npm create vite sheetjs-tfjs-kaioken -- --template vanilla-ts
cd sheetjs-tfjs-kaioken
npm add --save kaioken
npm add --save vite-plugin-kaioken -D
  1. Create a new file vite.config.ts with the following content:
vite.config.ts (create new file)
import { defineConfig } from "vite"
import kaioken from "vite-plugin-kaioken"

export default defineConfig({
plugins: [kaioken()],
})
  1. Edit tsconfig.json and add "jsx": "preserve" within compilerOptions:
tsconfig.json (add highlighted line)
{
"compilerOptions": {
"jsx": "preserve",
  1. Replace src/main.ts with the following codeblock:
src/main.ts
import { mount } from "kaioken";
import App from "./SheetJSTF";

const root = document.getElementById("app");
mount(App, root!);
  1. Download SheetJSTF.tsx to the src directory:
curl -L -o src/SheetJSTF.tsx https://xlsx.nodejs.cn/tfjs/SheetJSTF.tsx
  1. Install SheetJS and TF.js dependencies:
npm i --save https://cdn.sheetjs.com/xlsx-0.20.3/xlsx-0.20.3.tgz @tensorflow/tfjs
  1. Start the development server:
npm run dev

The process will display a URL:

  ➜  Local:   http://localhost:5173/

Open the displayed URL (http://localhost:5173/ in this example) with a web browser. Click the "Click to Run" button to see the results.

JS 数组交换

¥JS Array Interchange

官方线性回归教程 从 JSON 文件加载数据:

¥The official Linear Regression tutorial loads data from a JSON file:

[
{
"Name": "chevrolet chevelle malibu",
"Miles_per_Gallon": 18,
"Cylinders": 8,
"Displacement": 307,
"Horsepower": 130,
"Weight_in_lbs": 3504,
"Acceleration": 12,
"Year": "1970-01-01",
"Origin": "USA"
},
// ...
]

在实际用例中,数据存储在 spreadsheets

¥In real use cases, data is stored in spreadsheets

cd.xls screenshot

遵循本教程,数据获取方法可以适应处理对象数组,例如由 SheetJS sheet_to_json 方法 [^9] 生成的数组。

¥Following the tutorial, the data fetching method can be adapted to handle arrays of objects, such as those generated by the SheetJS sheet_to_json method[^9].

与官方示例的差异如下:

¥Differences from the official example are highlighted below:

/**

* Get the car data reduced to just the variables we are interested

* and cleaned of missing data.
*/
async function getData() {
/* fetch file and pull data into an ArrayBuffer */
const carsDataResponse = await fetch('https://xlsx.nodejs.cn/cd.xls');
const carsDataAB = await carsDataResponse.arrayBuffer();

/* parse */
const carsDataWB = XLSX.read(carsDataAB);

/* get first worksheet */
const carsDataWS = carsDataWB.Sheets[carsDataWB.SheetNames[0]];
/* generate array of JS objects */
const carsData = XLSX.utils.sheet_to_json(carsDataWS);

const cleaned = carsData.map(car => ({
mpg: car.Miles_per_Gallon,
horsepower: car.Horsepower,
}))
.filter(car => (car.mpg != null && car.horsepower != null));

return cleaned;
}

底层操作

¥Low-Level Operations

数据转置

¥Data Transposition

电子表格中的典型数据集将从一个标题行开始,并在其自己的行中表示每个数据记录。例如,Iris 数据集可能看起来像

¥A typical dataset in a spreadsheet will start with one header row and represent each data record in its own row. For example, the Iris dataset might look like

Iris dataset

SheetJS sheet_to_json 方法 [^10] 会将工作表对象转换为行对象数组:

¥The SheetJS sheet_to_json method[^10] will translate worksheet objects into an array of row objects:

const aoo = [
{"sepal length": 5.1, "sepal width": 3.5, ...},
{"sepal length": 4.9, "sepal width": 3, ...},
...
];

TF.js 和其他库倾向于对单独的列进行操作,相当于:

¥TF.js and other libraries tend to operate on individual columns, equivalent to:

const sepal_lengths = [5.1, 4.9, ...];
const sepal_widths = [3.5, 3, ...];

tensor2d 可以导出时,它看起来与电子表格不同:

¥When a tensor2d can be exported, it will look different from the spreadsheet:

const data_set_2d = [
[5.1, 4.9, /*...*/],
[3.5, 3, /*...*/],
// ...
];

这是人们使用电子表格的方式的转置!

¥This is the transpose of how people use spreadsheets!

将数据集导出到工作表

¥Exporting Datasets to a Worksheet

aoa_to_sheet 方法 [^11] 可以从数组的数组生成工作表。ML 库通常提供 API 来提取数组数组,但它会被转置。以下函数转置普通数组和类型数组的数组:

¥The aoa_to_sheet method[^11] can generate a worksheet from an array of arrays. ML libraries typically provide APIs to pull an array of arrays, but it will be transposed. The following function transposes arrays of normal and typed arrays:

Transpose array of arrays
/* `data` is an array of (typed or normal) arrays */
function transpose_array_of_arrays(data) {
const aoa = [];
for(let i = 0; i < data.length; ++i) {
for(let j = 0; j < data[i].length; ++j) {
if(!aoa[j]) aoa[j] = [];
aoa[j][i] = data[i][j];
}
}
return aoa;
}

建议从标题行创建新的工作表并使用 sheet_add_aoa 方法添加转置数据。选项 origin: -1[^12] 确保数据写入标题后:

¥It is recommended to create a new worksheet from the header row and add the transposed data using the sheet_add_aoa method. The option origin: -1[^12] ensures that the data is written after the headers:

const headers = [ "sepal length", "sepal width"];
const data_set_2d = [
[5.1, 4.9, /*...*/],
[3.5, 3, /*...*/],
// ...
];

/* transpose data */
const transposed_data = transpose_array_of_arrays(data_set_2d);

/* create worksheet from headers */
const ws = XLSX.utils.aoa_to_sheet([ headers ])

/* add the transposed data starting on row 2 */
XLSX.utils.sheet_add_aoa(ws, transposed_data, { origin: 1 });

从电子表格导入数据

¥Importing Data from a Spreadsheet

带有选项 header: 1[^13] 的 sheet_to_json 将生成一个可以转置的行优先数组。但是,手动遍历工作表效率更高。以下函数接受要跳过的标题行数:

¥sheet_to_json with the option header: 1[^13] will generate a row-major array of arrays that can be transposed. However, it is more efficient to walk the sheet manually. The following function accepts a number of header rows to skip:

Worksheet to transposed array of typed arrays
function sheet_to_array_of_f32(ws, header_row_count) {
const out = [];

/* find worksheet range */
const range = XLSX.utils.decode_range(ws['!ref']);

/* skip specified number of headers */
range.s.r += (header_row_count | 0);

/* walk the columns */
for(let C = range.s.c; C <= range.e.c; ++C) {
/* create the typed array */
const ta = new Float32Array(range.e.r - range.s.r + 1);

/* walk the rows */
for(let R = range.s.r; R <= range.e.r; ++R) {
/* find the cell, skip it if the cell isn't numeric or boolean */
const cell = ws["!data"] ? (ws["!data"][R]||[])[C] : ws[XLSX.utils.encode_cell({r:R, c:C})];
if(!cell || cell.t != 'n' && cell.t != 'b') continue;

/* assign to the typed array */
ta[R - range.s.r] = cell.v;
}

/* add typed array to output */
out.push(ta);
}

return out;
}

TF.js 张量

¥TF.js Tensors

单个 Array#map 可以从结果中提取单独的命名字段,这些字段可用于构造 TensorFlow.js 张量对象:

¥A single Array#map can pull individual named fields from the result, which can be used to construct TensorFlow.js tensor objects:

const aoo = XLSX.utils.sheet_to_json(worksheet);
const lengths = aoo.map(row => row["sepal length"]);
const tensor = tf.tensor1d(lengths);

tf.Tensor 对象可以直接使用 transpose 转置:

¥tf.Tensor objects can be directly transposed using transpose:

const aoo = XLSX.utils.sheet_to_json(worksheet);
// "x" and "y" are the fields we want to pull from the data
con st data = aoo.map(row => ([row["x"], row["y"]]));

// create a tensor representing two column datasets
const tensor = tf.tensor2d(data).transpose();

// individual columns can be accessed
const col1 = tensor.slice([0,0], [1,tensor.shape[1]]).flatten();
const col2 = tensor.slice([1,0], [1,tensor.shape[1]]).flatten();

对于导出,stack 可用于将列折叠成线性数组:

¥For exporting, stack can be used to collapse the columns into a linear array:

/* pull data into a Float32Array */
const result = tf.stack([col1, col2]).transpose();
const shape = tensor.shape;
const f32 = tensor.dataSync();

/* construct an array of arrays of the data in spreadsheet order */
const aoa = [];
for(let j = 0; j < shape[0]; ++j) {
aoa[j] = [];
for(let i = 0; i < shape[1]; ++i) aoa[j][i] = f32[j * shape[1] + i];
}

/* add headers to the top */
aoa.unshift(["x", "y"]);

/* generate worksheet */
const worksheet = XLSX.utils.aoa_to_sheet(aoa);

[^1]: 请参阅 TensorFlow.js 文档中的 tf.data.csv

¥See tf.data.csv in the TensorFlow.js documentation

[^2]: 见 sheet_to_csv 于 "CSV 和文本"

¥See sheet_to_csv in "CSV and Text"

[^3]: "根据 2D 数据进行预测" 示例 使用托管 JSON 文件。示例 XLS 文件 包含相同的数据。

¥The "Making Predictions from 2D Data" example uses a hosted JSON file. The sample XLS file includes the same data.

[^4]: 见 read 于 "读取文件"

¥See read in "Reading Files"

[^5]: 见 "工作簿对象"

¥See "Workbook Object"

[^6]: 见 sheet_to_csv 于 "CSV 和文本"

¥See sheet_to_csv in "CSV and Text"

[^7]: 请参阅 TensorFlow.js 文档中的 tf.data.csv

¥See tf.data.csv in the TensorFlow.js documentation

[^8]: 请参阅 TensorFlow.js 文档中的 tf.LayersModel.fitDataset

¥See tf.LayersModel.fitDataset in the TensorFlow.js documentation

[^9]: 见 sheet_to_json 于 "实用工具"

¥See sheet_to_json in "Utilities"

[^10]: 见 sheet_to_json 于 "实用工具"

¥See sheet_to_json in "Utilities"

[^11]: 见 aoa_to_sheet 于 "实用工具"

¥See aoa_to_sheet in "Utilities"

[^12]: 见 "实用工具" 中 sheet_add_aoaorigin 选项

¥See the origin option of sheet_add_aoa in "Utilities"

[^13]: 见 sheet_to_json 于 "实用工具"

¥See sheet_to_json in "Utilities"