Skip to main content

数学与统计

Excel 中的每个有效数字都可以表示为 "IEEE754 双"[^1]。

¥Each valid number in Excel can be represented as an "IEEE754 double"[^1].

JavaScript 完全支持 IEEE754 双打和单打,是数学和统计分析的优秀语言。它也被证明是一个可行的机器学习平台。

¥With full support for IEEE754 doubles and singles, JavaScript is an excellent language for mathematics and statistical analysis. It has also proven to be a viable platform for machine learning.

演示

¥Demos

各种库的演示包含在单独的页面中:

¥Demos for various libraries are included in separate pages:

类型化数组

¥Typed Arrays

现代 JavaScript 数学和统计库通常使用 Float64ArrayFloat32Array 对象来有效存储数据变量。

¥Modern JavaScript math and statistics libraries typically use Float64Array or Float32Array objects to efficiently store data variables.

Technical details (click to show)

Under the hood, ArrayBuffer objects represent raw binary data. "Typed arrays" such as Float64Array and Float32Array are objects designed for efficient interpretation and mutation of ArrayBuffer data.

ArrayBuffer objects are roughly analogous to heap-allocated memory. Typed arrays behave like typed pointers.

JavaScript

const buf = new ArrayBuffer(16);
const dbl = new Float64Array(buf);
dbl[1] = 3.14159;
const u8 = new Uint8Array(buf);
for(let i = 0; i < 8; ++i)
console.log(u8[i+8]);

Equivalent C

void *const buf = malloc(16);
double *const dbl = (double *)buf;
dbl[1] = 3.14159;
uint8_t *const u8 = (uint8_t *)buf;
for(uint8_t i = 0; i < 8; ++i)
printf("%u\n", u8[i+8]);

从表格中读取

¥Reading from Sheets

每个类型化数组类都有一个 from 静态方法,用于将数据转换为类型化数组。Float64Array.from 返回 double 类型数组(每个值 8 字节),Float32Array.from 生成 float 类型数组(4 字节)。

¥Each typed array class has a from static method for converting data into a typed array. Float64Array.from returns a double typed array (8 bytes per value) and Float32Array.from generates a float typed array (4 bytes).

const column_f32 = Float32Array.from(arr); // 4-byte floats
const column_f64 = Float64Array.from(arr); // 8-byte doubles

数组中的值将被强制为相关的数据类型。不支持的条目将转换为安静的 NaN 值。

¥Values in the array will be coerced to the relevant data type. Unsupported entries will be converted to quiet NaN values.

提取工作表数据

¥Extracting Worksheet Data

带有选项 header: 1 的 SheetJS sheet_to_json[^2] 方法从工作表对象生成数组的数组。结果按行优先顺序排列:

¥The SheetJS sheet_to_json[^2] method with the option header: 1 generates an array of arrays from a worksheet object. The result is in row-major order:

const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1});

分类变量

¥Categorical Variables

二分变量通常表示为电子表格 TRUEFALSE。SheetJS sheet_to_json 方法会将这些值转换为 truefalse。类型化数组方法将分别将值解释为 10

¥Dichotomous variables are commonly represented as spreadsheet TRUE or FALSE. The SheetJS sheet_to_json method will translate these values to true and false. Typed array methods will interpret values as 1 and 0 respectively.

多分变量必须手动映射到数值。例如,使用 Iris 数据集:

¥Polychotomous variables must be manually mapped to numeric values. For example, using the Iris dataset:

Iris dataset

[
["sepal length", "sepal width", "petal length", "petal width", "class"],
[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
[4.9, 3, 1.4, 0.2, "Iris-setosa"],
]

E 列 (class) 是一个多分变量,必须手动翻译:

¥Column E (class) is a polychotomous variable and must be manually translated:

const aoa = XLSX.utils.sheet_to_json(worksheet, {header: 1});

/* index_to_class will be needed to recover the values later */
const index_to_class = [];

/* map from class name to number */
const class_to_index = new Map();

/* loop over the data */
for(let R = 1; R < aoa.length; ++R) {
/* Column E = SheetJS row 4 */
const category = aoa[R][4];
const val = class_to_index.get(category);
if(val == null) {
/* assign a new index */
class_to_index.set(category, index_to_class.length);
aoa[R][4] = index_to_class.length;
index_to_class.push(category);
} else aoa[R][4] = val;
}
Live Demo (click to show)

This example fetches and parses iris.xlsx. The first worksheet is processed and the new data and mapping are printed.

Result
Loading...
Live Editor
function SheetJSPolychotomy() {
  const [cat, setCat] = React.useState([]);
  const [aoa, setAoA] = React.useState([]);

  React.useEffect(() => { (async() => {
    const ab = await (await fetch("/typedarray/iris.xlsx")).arrayBuffer();
    const wb = XLSX.read(ab);
    const aoa = XLSX.utils.sheet_to_json(wb.Sheets[wb.SheetNames[0]], {header:1});

    const index_to_class = [];
    const class_to_index = new Map();
    for(let R = 1; R < aoa.length; ++R) {
      const category = aoa[R][4];
      const val = class_to_index.get(category);
      if(val == null) {
        class_to_index.set(category, index_to_class.length);
        aoa[R][4] = index_to_class.length;
        index_to_class.push(category);
      } else aoa[R][4] = val;
    }

    /* display every 25 rows, skipping the header row */
    setAoA(aoa.filter((_, i) => (i % 25) == 1));
    setCat(index_to_class);
  })(); }, []);

  return ( <>
  <b>Mapping</b><br/>
  <table><thead><tr><th>Index</th><th>Name</th></tr></thead><tbody>
    {cat.map((name, i) => (<tr><td>{i}</td><td>{name}</td></tr>))}
  </tbody></table>
  <b>Sample Data</b><br/>
  <table><thead><tr>{"ABCDE".split("").map(c => (<th>{c}</th>))}</tr></thead><tbody>
    {aoa.map(row => (<tr>{row.map(col => (<td>{col}</td>))}</tr>))}
  </tbody></table>
  </>
  );
}

每列一个变量

¥One Variable per Column

存储数据集时,每行代表一个观察值,每列代表一个变量是很常见的:

¥It is common to store datasets where each row represents an observation and each column represents a variable:

Iris dataset

var aoa = [
["sepal length", "sepal width", "petal length", "petal width", "class"],
[5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
[4.9, 3, 1.4, 0.2, "Iris-setosa"],
]

数组 map 操作可以从单个列中提取数据。映射后,slice 可以删除标头标签。例如,以下代码片段将列 C ("花瓣长度") 拉入 Float64Array

¥An array map operation can pull data from an individual column. After mapping, a slice can remove the header label. For example, the following snippet pulls column C ("petal length") into a Float64Array:

const C = XLSX.utils.decode_col("C"); // Column "C" = SheetJS index 2
const petal_length = Float64Array.from(aoa.map(row => row[C]).slice(1));

每行一个变量

¥One Variable per Row

一些数据集存储在表中,其中每行代表一个变量,每列代表一个观察值:

¥Some datasets are stored in tables where each row represents a variable and each column represents an observation:

JavaScriptSpreadsheet
var aoa = [
["sepal length", 5.1, 4.9],
["sepal width", 3.5, 3],
["petal length", 1.4, 1.4],
["petal width", 0.2, 0.2],
["class", "setosa", "setosa"]
]

Single column of data

在行优先数组中,外部数组的每个条目都是一行。

¥From the row-major array of arrays, each entry of the outer array is a row.

许多工作表都包含标题列。slice 方法可以删除标头。去掉头部后,Float64Array.from 可以生成一个类型化数组。例如,此代码片段将第 3 行 ("花瓣长度") 拉入 Float64Array

¥Many sheets include header columns. The slice method can remove the header. After removing the header, Float64Array.from can generate a typed array. For example, this snippet pulls row 3 ("petal length") into a Float64Array:

const petal_length = Float64Array.from(aoa[2].slice(1));

写入表格

¥Writing to Sheets

SheetJS aoa_to_sheet[^3] 方法可以从数组数组生成工作表。同样,sheet_add_aoa[^4] 可以将数据数组的数组添加到现有工作表对象中。origin 选项 [^5] 控制数据将写入工作表中的位置。

¥The SheetJS aoa_to_sheet[^3] method can generate a worksheet from an array of arrays. Similarly, sheet_add_aoa[^4] can add an array of arrays of data into an existing worksheet object. The origin option[^5] controls where data will be written in the worksheet.

这两种方法都不理解类型化数组,因此数据列必须转换为数组的数组。

¥Neither method understands typed arrays, so data columns must be converted to arrays of arrays.

每行一个变量

¥One Variable per Row

可以使用 Array.from 将单个类型数组转换为纯 JS 数组:

¥A single typed array can be converted to a pure JS array with Array.from:

const arr = Array.from(row);

可以从数组创建数组的数组:

¥An array of arrays can be created from the array:

const aoa = [
arr // this array is the first element of the array literal
];

aoa_to_sheetsheet_add_aoa 将其视为一行。默认情况下,数据将写入工作表第一行的单元格中。

¥aoa_to_sheet and sheet_add_aoa treat this as one row. By default, data will be written to cells in the first row of the worksheet.

可以使用 unshift 操作将标题添加到数据行,但使用 aoa_to_sheet 构建工作表效率更高:

¥Titles can be added to data rows with an unshift operation, but it is more efficient to build up the worksheet with aoa_to_sheet:

/* sample data */
const data = new Float64Array([54337.95, 3.14159, 2.718281828]);
const title = "Values";

/* convert sample data to array */
const arr = Array.from(data);
/* create worksheet from title (array of arrays) */
const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
/* add data starting at B1 */
XLSX.utils.sheet_add_aoa(ws, [ arr ], { origin: "B1" });

Typed Array to single row with title

Live Demo (click to hide)

在此示例中,导出两个类型化数组。aoa_to_sheet 创建工作表,sheet_add_aoa 将数据添加到工作表中。

¥In this example, two typed arrays are exported. aoa_to_sheet creates the worksheet and sheet_add_aoa will add the data to the sheet.

Result
Loading...
Live Editor
function SheetJSeriesToRows() { return (<button onClick={() => {
  /* typed arrays */
  const ta1 = new Float64Array([54337.95, 3.14159, 2.718281828]);
  const ta2 = new Float64Array([281.3308004, 201.8675309, 1900.6492568]);

  /* create worksheet from first typed array */
  const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
  const arr1 = Array.from(ta1);
  XLSX.utils.sheet_add_aoa(ws, [ arr1 ], { origin: "B1" });

  /* add second title to cell A2 */
  XLSX.utils.sheet_add_aoa(ws, [["Value2"]], { origin: "A2" });

  /* add second typed array starting from cell B2 */
  const arr2 = Array.from(ta2);
  XLSX.utils.sheet_add_aoa(ws, [ arr2 ], { origin: "B2" });

  /* export to file */
  const wb = XLSX.utils.book_new(ws, "Export");
  XLSX.writeFile(wb, "SheetJSeriesToRows.xlsx");
}}><b>Click to export</b></button>); }

每列一个变量

¥One Variable per Column

可以使用 Array.from 将单个类型数组转换为纯 JS 数组。对于列,每个值应单独封装在一个数组中:

¥A single typed array can be converted to a pure JS array with Array.from. For columns, each value should be individually wrapped in an array:

JavaScriptSpreadsheet
var data = [
[54337.95],
[3.14159],
[2.718281828]
];

Single column of data

Array.from 采用第二个参数。如果它是一个函数,则将在每个元素上调用该函数,并且将使用该值代替原始值(实际上,映射数据)。要生成数据列,每个元素必须封装在数组字面量中:

¥Array.from takes a second argument. If it is a function, the function will be called on each element and the value will be used in place of the original value (in effect, mapping over the data). To generate a data column, each element must be wrapped in an array literal:

var arr = Array.from(column, (value) => ([ value ]));

aoa_to_sheetsheet_add_aoa 将此视为每行一列数据的行。默认情况下,数据将写入 "A" 列中的单元格。

¥aoa_to_sheet and sheet_add_aoa treat this as rows with one column of data per row. By default, data will be written to cells in column "A".

可以使用 unshift 操作将标题添加到数据列,但使用 aoa_to_sheet 构建工作表效率更高:

¥Titles can be added to data columns with an unshift operation, but it is more efficient to build up the worksheet with aoa_to_sheet:

/* sample data */
const data = new Float64Array([54337.95, 3.14159, 2.718281828]);
const title = "Values";

/* convert sample data to array */
const arr = Array.from(data, (value) => ([value]));
/* create worksheet from title (array of arrays) */
const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);
/* add data starting at B1 */
XLSX.utils.sheet_add_aoa(ws, arr, { origin: "A2" });

Typed Array to single column with title

Live Demo (click to hide)

在此示例中,导出两个类型化数组。aoa_to_sheet 创建工作表,sheet_add_aoa 将数据添加到工作表中。

¥In this example, two typed arrays are exported. aoa_to_sheet creates the worksheet and sheet_add_aoa will add the data to the sheet.

Result
Loading...
Live Editor
function SheetJSeriesToCols() { return (<button onClick={() => {
  /* typed arrays */
  const ta1 = new Float64Array([54337.95, 3.14159, 2.718281828]);
  const ta2 = new Float64Array([281.3308004, 201.8675309, 1900.6492568]);

  /* create worksheet from first title */
  const ws = XLSX.utils.aoa_to_sheet([ [ "Values" ] ]);

  /* add first typed array starting from cell B1 */
  const arr1 = Array.from(ta1, (value) => ([value]));
  XLSX.utils.sheet_add_aoa(ws, arr1, { origin: "A2" });

  /* add second title to cell B1 */
  XLSX.utils.sheet_add_aoa(ws, [["Value2"]], { origin: "B1" });

  /* add second typed array starting from cell B2 */
  const arr2 = Array.from(ta2, (value) => ([value]));
  XLSX.utils.sheet_add_aoa(ws, arr2, { origin: "B2" });

  /* export to file */
  const wb = XLSX.utils.book_new(ws, "Export");
  XLSX.writeFile(wb, "SheetJSeriesToCols.xlsx");
}}><b>Click to export</b></button>); }

[^1]: 详情请参阅 "基本值" 于 "单元格对象"

¥See "Underlying Values" in "Cell Objects" for more details

[^2]: 见 sheet_to_json 于 "实用工具"

¥See sheet_to_json in "Utilities"

[^3]: 见 aoa_to_sheet 于 "实用工具"

¥See aoa_to_sheet in "Utilities"

[^4]: 见 sheet_add_aoa 于 "实用工具"

¥See sheet_add_aoa in "Utilities"

[^5]: 见 "实用工具" 中 sheet_add_aoaorigin 选项

¥See the origin option of sheet_add_aoa in "Utilities"