Basic Statistics - RDD-based API
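We provide column summary statistics for RDD[Vector] through the colStats function available in Statistics.
colStats() returns a MultivariateStatisticalSummary instance, which contains column-wise statistics such as the mean, variance, and number of nonzeros.
Refer to the MultivariateStatisticalSummary Python docs for more details on the API.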
import numpy as np
from pyspark.mllib.stat import Statistics
mat = sc.parallelize(
[np.array([1.0, 10.0, 100.0]), np.array([2.0, 20.0, 200.0]), np.array([3.0, 30.0, 300.0])]
) # an RDD of Vectors
# Compute column summary statistics.
summary = Statistics.colStats(mat)
print(summary.mean()) # a dense vector containing the mean value for each column
print(summary.variance()) # column-wise variance
print(summary.numNonzeros()) # number of nonzeros in each column
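To make the three reported quantities concrete, here is a minimal plain-Python sketch (no Spark required) that recomputes the column-wise mean, variance, and nonzero count for the same three rows. It assumes the variance is the unbiased sample variance (divisor n - 1); the helper name `column_summary` is hypothetical and not part of the Spark API.

```python
def column_summary(rows):
    """Return (means, variances, nonzero counts), one entry per column.

    Assumption: the variance uses the unbiased n - 1 divisor.
    """
    n = len(rows)
    columns = list(zip(*rows))  # transpose: one tuple per column
    means = [sum(col) / n for col in columns]
    variances = [
        sum((x - m) ** 2 for x in col) / (n - 1)
        for col, m in zip(columns, means)
    ]
    nonzeros = [sum(1 for x in col if x != 0) for col in columns]
    return means, variances, nonzeros


rows = [[1.0, 10.0, 100.0], [2.0, 20.0, 200.0], [3.0, 30.0, 300.0]]
means, variances, nonzeros = column_summary(rows)
print(means)      # column-wise mean
print(variances)  # column-wise sample variance
print(nonzeros)   # number of nonzeros in each column
```

Each row is one observation and each position in a row is one column, which is the same layout the RDD of vectors uses.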
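colStats() returns a MultivariateStatisticalSummary instance, which contains column-wise statistics such as the mean, variance, and number of nonzeros.
Refer to the MultivariateStatisticalSummary Scala docs for details on the API.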
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
val observations = sc.parallelize(
  Seq(
    Vectors.dense(1.0, 10.0, 100.0),
    Vectors.dense(2.0, 20.0, 200.0),
    Vectors.dense(3.0, 30.0, 300.0)
  )
)
// Compute column summary statistics.
val summary: MultivariateStatisticalSummary = Statistics.colStats(observations)
println(summary.mean) // a dense vector containing the mean value for each column
println(summary.variance) // column-wise variance
println(summary.numNonzeros) // number of nonzeros in each column
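colStats() returns a MultivariateStatisticalSummary instance, which contains column-wise statistics such as the mean, variance, and number of nonzeros.
Refer to the MultivariateStatisticalSummary Java docs for details on the API.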
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;
JavaRDD<Vector> mat = jsc.parallelize(
  Arrays.asList(
    Vectors.dense(1.0, 10.0, 100.0),
    Vectors.dense(2.0, 20.0, 200.0),
    Vectors.dense(3.0, 30.0, 300.0)
  )
); // an RDD of Vectors
// Compute column summary statistics.
MultivariateStatisticalSummary summary = Statistics.colStats(mat.rdd());
System.out.println(summary.mean()); // a dense vector containing the mean value for each column
System.out.println(summary.variance()); // column-wise variance
System.out.println(summary.numNonzeros()); // number of nonzeros in each column
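Calculating the correlation between two series of data is a common operation in statistics. In spark.mllib, the supported correlation methods are currently Pearson's and Spearman's correlation.
Statistics provides methods to calculate correlations between series. Depending on the type of input, two RDD[Double]s or an RDD[Vector], the output will be a Double or the correlation Matrix, respectively.
Refer to the Statistics Python docs for more details on the API.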
import numpy as np
from pyspark.mllib.stat import Statistics
seriesX = sc.parallelize([1.0, 2.0, 3.0, 3.0, 5.0]) # a series
# seriesY must have the same number of partitions and cardinality as seriesX
seriesY = sc.parallelize([11.0, 22.0, 33.0, 33.0, 555.0])
# Compute the correlation using Pearson's method. Enter "spearman" for Spearman's method.
# If a method is not specified, Pearson's method will be used by default.
print("Correlation is: " + str(Statistics.corr(seriesX, seriesY, method="pearson")))
data = sc.parallelize(
[np.array([1.0, 10.0, 100.0]), np.array([2.0, 20.0, 200.0]), np.array([5.0, 33.0, 366.0])]
) # an RDD of Vectors
# calculate the correlation matrix using Pearson's method. Use "spearman" for Spearman's method.
# If a method is not specified, Pearson's method will be used by default.
print(Statistics.corr(data, method="pearson"))
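As a rough illustration of what the two methods measure, the sketch below computes Pearson's and Spearman's correlation for the same two series in plain Python, without Spark. The helper names (`pearson`, `average_ranks`, `spearman`) are hypothetical, and giving tied values the average of their ranks is an assumption about tie handling, not a statement about Spark's internals.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


series_x = [1.0, 2.0, 3.0, 3.0, 5.0]
series_y = [11.0, 22.0, 33.0, 33.0, 555.0]
print("Pearson: ", pearson(series_x, series_y))
print("Spearman:", spearman(series_x, series_y))
```

The outlier 555.0 pulls Pearson's value below 1, while the ranks of the two series agree exactly, so Spearman's value is 1 up to floating-point rounding.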
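Statistics provides methods to calculate correlations between series. Depending on the type of input, two RDD[Double]s or an RDD[Vector], the output will be a Double or the correlation Matrix, respectively.
Refer to the Statistics Scala docs for details on the API.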
import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD
val seriesX: RDD[Double] = sc.parallelize(Array(1, 2, 3, 3, 5)) // a series
// must have the same number of partitions and cardinality as seriesX
val seriesY: RDD[Double] = sc.parallelize(Array(11, 22, 33, 33, 555))
// compute the correlation using Pearson's method. Enter "spearman" for Spearman's method. If a
// method is not specified, Pearson's method will be used by default.
val correlation: Double = Statistics.corr(seriesX, seriesY, "pearson")
println(s"Correlation is: $correlation")
val data: RDD[Vector] = sc.parallelize(
  Seq(
    Vectors.dense(1.0, 10.0, 100.0),
    Vectors.dense(2.0, 20.0, 200.0),
    Vectors.dense(5.0, 33.0, 366.0))
) // note that each Vector is a row and not a column
// calculate the correlation matrix using Pearson's method. Use "spearman" for Spearman's method
// If a method is not specified, Pearson's method will be used by default.
val correlMatrix: Matrix = Statistics.corr(data, "pearson")
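Statistics provides methods to calculate correlations between series. Depending on the type of input, two JavaDoubleRDDs or a JavaRDD<Vector>, the output will be a Double or the correlation Matrix, respectively.
Refer to the Statistics Java docs for details on the API.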
import java.util.Arrays;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.stat.Statistics;
JavaDoubleRDD seriesX = jsc.parallelizeDoubles(
Arrays.asList(1.0, 2.0, 3.0, 3.0, 5.0)); // a series
// must have the same number of partitions and cardinality as seriesX
JavaDoubleRDD seriesY = jsc.parallelizeDoubles(
Arrays.asList(11.0, 22.0, 33.0, 33.0, 555.0));
// compute the correlation using Pearson's method. Enter "spearman" for Spearman's method.
// If a method is not specified, Pearson's method will be used by default.
double correlation = Statistics.corr(seriesX.srdd(), seriesY.srdd(), "pearson");
System.out.println("Correlation is: " + correlation);
// note that each Vector is a row and not a column
JavaRDD<Vector> data = jsc.parallelize(
  Arrays.asList(
    Vectors.dense(1.0, 10.0, 100.0),
    Vectors.dense(2.0, 20.0, 200.0),
    Vectors.dense(5.0, 33.0, 366.0)
  )
);
// calculate the correlation matrix using Pearson's method.
// Use "spearman" for Spearman's method.
// If a method is not specified, Pearson's method will be used by default.
Matrix correlMatrix = Statistics.corr(data.rdd(), "pearson");
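Unlike the other statistics functions, which reside in spark.mllib, the stratified sampling methods sampleByKey and sampleByKeyExact can be performed on RDDs of key-value pairs. For stratified sampling, the keys can be thought of as a label and the value as a specific attribute. For example, the key can be man or woman, or a document ID, and the respective values can be the list of ages of the people in the population or the list of words in the documents. The sampleByKey method flips a coin to decide whether an observation will be sampled, so it requires one pass over the data and provides an expected sample size. sampleByKeyExact requires significantly more resources than the per-stratum simple random sampling used in sampleByKey, but it provides the exact sample size with 99.99% confidence. sampleByKeyExact is currently not supported in Python.
sampleByKey() allows users to sample approximately $\lceil f_k \cdot n_k \rceil \, \forall k \in K$ items, where $f_k$ is the desired fraction for key $k$, $n_k$ is the number of key-value pairs for key $k$, and $K$ is the set of keys.
Note: sampleByKeyExact() is currently not supported in Python.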
# an RDD of any key value pairs
data = sc.parallelize([(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (2, 'e'), (3, 'f')])
# specify the exact fraction desired from each key as a dictionary
fractions = {1: 0.1, 2: 0.6, 3: 0.3}
approxSample = data.sampleByKey(False, fractions)
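The difference between the coin-flip strategy and the exact per-key sample size can be sketched locally in plain Python. This is only an illustration under simplifying assumptions (all data fits in one list, sampling is without replacement); it is not how Spark implements either method, and the function names below are hypothetical.

```python
import math
import random
from collections import defaultdict


def sample_by_key(pairs, fractions, rng):
    """One pass: keep each pair with probability fractions[key] (a coin flip)."""
    return [(k, v) for k, v in pairs if rng.random() < fractions[k]]


def sample_by_key_exact(pairs, fractions, rng):
    """Group by key, then draw exactly ceil(f_k * n_k) pairs without replacement."""
    strata = defaultdict(list)
    for k, v in pairs:
        strata[k].append((k, v))
    sample = []
    for k, items in strata.items():
        size = math.ceil(fractions[k] * len(items))
        sample.extend(rng.sample(items, size))
    return sample


data = [(1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (2, 'e'), (3, 'f')]
fractions = {1: 0.1, 2: 0.6, 3: 0.3}
rng = random.Random(42)  # fixed seed only so that runs are repeatable

approx_sample = sample_by_key(data, fractions, rng)
exact_sample = sample_by_key_exact(data, fractions, rng)
print(len(approx_sample), "pairs from coin flips (the size varies with the seed)")
print(len(exact_sample), "pairs from exact per-key sizes")
```

With these fractions the exact variant always returns one pair for key 1, two for key 2, and one for key 3, whereas the coin-flip variant only matches those sizes in expectation.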
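sampleByKeyExact() allows users to sample exactly $\lceil f_k \cdot n_k \rceil \, \forall k \in K$ items, where $f_k$ is the desired fraction for key $k$, $n_k$ is the number of key-value pairs for key $k$, and $K$ is the set of keys. Sampling without replacement requires one additional pass over the RDD to guarantee the sample size, whereas sampling with replacement requires two additional passes.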
// an RDD[(K, V)] of any key value pairs
val data = sc.parallelize(
Seq((1, 'a'), (1, 'b'), (2, 'c'), (2, 'd'), (2, 'e'), (3, 'f')))
// specify the exact fraction desired from each key
val fractions = Map(1 -> 0.1, 2 -> 0.6, 3 -> 0.3)
// Get an approximate sample from each stratum
val approxSample = data.sampleByKey(withReplacement = false, fractions = fractions)
// Get an exact sample from each stratum
val exactSample = data.sampleByKeyExact(withReplacement = false, fractions = fractions)
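sampleByKeyExact() allows users to sample exactly $\lceil f_k \cdot n_k \rceil \, \forall k \in K$ items, where $f_k$ is the desired fraction for key $k$, $n_k$ is the number of key-value pairs for key $k$, and $K$ is the set of keys. Sampling without replacement requires one additional pass over the RDD to guarantee the sample size, whereas sampling with replacement requires two additional passes.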
import java.util.*;
import scala.Tuple2;
import com.google.common.collect.ImmutableMap;
import org.apache.spark.api.java.JavaPairRDD;
List<Tuple2<Integer, Character>> list = Arrays.asList(
  new Tuple2<>(1, 'a'),
  new Tuple2<>(1, 'b'),
  new Tuple2<>(2, 'c'),
  new Tuple2<>(2, 'd'),
  new Tuple2<>(2, 'e'),
  new Tuple2<>(3, 'f')
);
JavaPairRDD<Integer, Character> data = jsc.parallelizePairs(list);
// specify the exact fraction desired from each key Map<K, Double>
ImmutableMap<Integer, Double> fractions = ImmutableMap.of(1, 0.1, 2, 0.6, 3, 0.3);
// Get an approximate sample from each stratum
JavaPairRDD<Integer, Character> approxSample = data.sampleByKey(false, fractions);
// Get an exact sample from each stratum
JavaPairRDD<Integer, Character> exactSample = data.sampleByKeyExact(false, fractions);
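Hypothesis testing is a powerful tool in statistics to determine whether a result is statistically significant, that is, whether it occurred by chance or not. spark.mllib currently supports Pearson's chi-squared ($\chi^2$) tests for goodness of fit and independence. The input data types determine whether the goodness of fit test or the independence test is conducted. The goodness of fit test requires an input type of Vector, whereas the independence test requires a Matrix as input.
spark.mllib also supports the input type RDD[LabeledPoint], which runs the chi-squared independence test of every feature against the label.
Refer to the Statistics Python docs for more details on the API.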
from pyspark.mllib.linalg import Matrices, Vectors
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.stat import Statistics
vec = Vectors.dense(0.1, 0.15, 0.2, 0.3, 0.25) # a vector composed of the frequencies of events
# compute the goodness of fit. If a second vector to test against
# is not supplied as a parameter, the test runs against a uniform distribution.
goodnessOfFitTestResult = Statistics.chiSqTest(vec)
# summary of the test including the p-value, degrees of freedom,
# test statistic, the method used, and the null hypothesis.
print("%s\n" % goodnessOfFitTestResult)
mat = Matrices.dense(3, 2, [1.0, 3.0, 5.0, 2.0, 4.0, 6.0]) # a contingency matrix
# conduct Pearson's independence test on the input contingency matrix
independenceTestResult = Statistics.chiSqTest(mat)
# summary of the test including the p-value, degrees of freedom,
# test statistic, the method used, and the null hypothesis.
print("%s\n" % independenceTestResult)
obs = sc.parallelize(
[LabeledPoint(1.0, [1.0, 0.0, 3.0]),
LabeledPoint(1.0, [1.0, 2.0, 0.0]),
LabeledPoint(1.0, [-1.0, 0.0, -0.5])]
) # LabeledPoint(label, feature)
# The contingency table is constructed from an RDD of LabeledPoint and used to conduct
# the independence test. Returns an array containing the ChiSquaredTestResult for every feature
# against the label.
featureTestResults = Statistics.chiSqTest(obs)
for i, result in enumerate(featureTestResults):
    print("Column %d:\n%s" % (i + 1, result))
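For readers who want to see where the test statistic comes from, the following plain-Python sketch recomputes Pearson's chi-squared statistic and the degrees of freedom for the same frequency vector and contingency matrix. The function names are hypothetical, and the p-value is deliberately left out because it needs the chi-squared distribution function, which the standard library does not provide.

```python
def goodness_of_fit_statistic(observed, expected=None):
    """Pearson's chi-squared statistic and degrees of freedom for a frequency vector.

    If no expected vector is given, a uniform distribution is assumed.
    """
    total = sum(observed)
    if expected is None:
        expected = [total / len(observed)] * len(observed)
    else:
        # rescale the expected vector so both vectors have the same total
        scale = total / sum(expected)
        expected = [e * scale for e in expected]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


def independence_statistic(table):
    """Pearson's chi-squared statistic and degrees of freedom for a contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, (len(row_sums) - 1) * (len(col_sums) - 1)


vec = [0.1, 0.15, 0.2, 0.3, 0.25]
gof_statistic, gof_df = goodness_of_fit_statistic(vec)
print(gof_statistic, gof_df)

# the same contingency matrix as above, written row by row: ((1, 2), (3, 4), (5, 6))
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ind_statistic, ind_df = independence_statistic(table)
print(ind_statistic, ind_df)
```

Note that Matrices.dense takes its values in column-major order, which is why the flat list [1.0, 3.0, 5.0, 2.0, 4.0, 6.0] corresponds to the rows written out here.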
import org.apache.spark.mllib.linalg._
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.mllib.stat.test.ChiSqTestResult
import org.apache.spark.rdd.RDD
// a vector composed of the frequencies of events
val vec: Vector = Vectors.dense(0.1, 0.15, 0.2, 0.3, 0.25)
// compute the goodness of fit. If a second vector to test against is not supplied
// as a parameter, the test runs against a uniform distribution.
val goodnessOfFitTestResult = Statistics.chiSqTest(vec)
// summary of the test including the p-value, degrees of freedom, test statistic, the method
// used, and the null hypothesis.
println(s"$goodnessOfFitTestResult\n")
// a contingency matrix. Create a dense matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))
val mat: Matrix = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
// conduct Pearson's independence test on the input contingency matrix
val independenceTestResult = Statistics.chiSqTest(mat)
// summary of the test including the p-value, degrees of freedom
println(s"$independenceTestResult\n")
val obs: RDD[LabeledPoint] =
  sc.parallelize(
    Seq(
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 2.0, 0.0)),
      LabeledPoint(-1.0, Vectors.dense(-1.0, 0.0, -0.5))
    )
  ) // (label, feature) pairs.
// The contingency table is constructed from the raw (label, feature) pairs and used to conduct
// the independence test. Returns an array containing the ChiSquaredTestResult for every feature
// against the label.
val featureTestResults: Array[ChiSqTestResult] = Statistics.chiSqTest(obs)
featureTestResults.zipWithIndex.foreach { case (k, v) =>
  println(s"Column ${(v + 1)} :")
  println(k)
} // summary of the test
Refer to the ChiSqTestResult Java docs for details on the API.
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.linalg.Matrices;
import org.apache.spark.mllib.linalg.Matrix;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.stat.Statistics;
import org.apache.spark.mllib.stat.test.ChiSqTestResult;
// a vector composed of the frequencies of events
Vector vec = Vectors.dense(0.1, 0.15, 0.2, 0.3, 0.25);
// compute the goodness of fit. If a second vector to test against is not supplied
// as a parameter, the test runs against a uniform distribution.
ChiSqTestResult goodnessOfFitTestResult = Statistics.chiSqTest(vec);
// summary of the test including the p-value, degrees of freedom, test statistic,
// the method used, and the null hypothesis.
System.out.println(goodnessOfFitTestResult + "\n");
// Create a contingency matrix ((1.0, 2.0), (3.0, 4.0), (5.0, 6.0))
Matrix mat = Matrices.dense(3, 2, new double[]{1.0, 3.0, 5.0, 2.0, 4.0, 6.0});
// conduct Pearson's independence test on the input contingency matrix
ChiSqTestResult independenceTestResult = Statistics.chiSqTest(mat);
// summary of the test including the p-value, degrees of freedom...
System.out.println(independenceTestResult + "\n");
// an RDD of labeled points
JavaRDD<LabeledPoint> obs = jsc.parallelize(
  Arrays.asList(
    new LabeledPoint(1.0, Vectors.dense(1.0, 0.0, 3.0)),
    new LabeledPoint(1.0, Vectors.dense(1.0, 2.0, 0.0)),
    new LabeledPoint(-1.0, Vectors.dense(-1.0, 0.0, -0.5))
  )
);
// The contingency table is constructed from the raw (label, feature) pairs and used to conduct
// the independence test. Returns an array containing the ChiSquaredTestResult for every feature
// against the label.
ChiSqTestResult[] featureTestResults = Statistics.chiSqTest(obs.rdd());
int i = 1;
for (ChiSqTestResult result : featureTestResults) {
  System.out.println("Column " + i + ":");
  System.out.println(result + "\n"); // summary of the test
  i++;
}
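spark.mllib provides a 1-sample, 2-sided implementation of the Kolmogorov-Smirnov (KS) test for equality of probability distributions. By providing the name of a theoretical distribution (currently only the normal distribution is supported) and its parameters, or a function that calculates the cumulative distribution according to a given theoretical distribution, the user can test the null hypothesis that their sample is drawn from that distribution. If the user tests against the normal distribution (distName="norm") but does not provide distribution parameters, the test initializes to the standard normal distribution and logs an appropriate message.
Statistics provides methods to run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run and interpret the hypothesis tests.
Refer to the Statistics Python docs for more details on the API.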
from pyspark.mllib.stat import Statistics
parallelData = sc.parallelize([0.1, 0.15, 0.2, 0.3, 0.25])
# run a KS test for the sample versus a standard normal distribution
testResult = Statistics.kolmogorovSmirnovTest(parallelData, "norm", 0, 1)
# summary of the test including the p-value, test statistic, and null hypothesis;
# if our p-value indicates significance, we can reject the null hypothesis.
# Note that the Scala functionality of calling Statistics.kolmogorovSmirnovTest with
# a lambda to calculate the CDF is not made available in the Python API
print(testResult)
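The test statistic itself is simple enough to compute by hand. The sketch below, in plain Python without Spark, measures the largest gap between the empirical CDF of the sample and the standard normal CDF. The function names are hypothetical, and converting the statistic into a p-value (which needs the Kolmogorov distribution) is left out.

```python
import math


def standard_normal_cdf(x):
    """CDF of N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of the sample and the theoretical cdf."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        theoretical = cdf(x)
        # the empirical CDF jumps from i/n to (i+1)/n at x, so check both sides
        gap = max(gap, theoretical - i / n, (i + 1) / n - theoretical)
    return gap


sample = [0.1, 0.15, 0.2, 0.3, 0.25]
statistic = ks_statistic(sample, standard_normal_cdf)
print(statistic)
```

For this sample the largest gap occurs at the smallest value, 0.1, where the normal CDF is already above 0.5 while the empirical CDF is still 0.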
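Statistics provides methods to run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run and interpret the hypothesis tests.
Refer to the Statistics Scala docs for details on the API.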
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD
val data: RDD[Double] = sc.parallelize(Seq(0.1, 0.15, 0.2, 0.3, 0.25)) // an RDD of sample data
// run a KS test for the sample versus a standard normal distribution
val testResult = Statistics.kolmogorovSmirnovTest(data, "norm", 0, 1)
// summary of the test including the p-value, test statistic, and null hypothesis;
// if our p-value indicates significance, we can reject the null hypothesis.
println(testResult)
// perform a KS test using a cumulative distribution function of our making
val myCDF = Map(0.1 -> 0.2, 0.15 -> 0.6, 0.2 -> 0.05, 0.3 -> 0.05, 0.25 -> 0.1)
val testResult2 = Statistics.kolmogorovSmirnovTest(data, myCDF)
println(testResult2)
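Statistics provides methods to run a 1-sample, 2-sided Kolmogorov-Smirnov test. The following example demonstrates how to run and interpret the hypothesis tests.
Refer to the Statistics Java docs for details on the API.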
import java.util.Arrays;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.mllib.stat.Statistics;
import org.apache.spark.mllib.stat.test.KolmogorovSmirnovTestResult;
JavaDoubleRDD data = jsc.parallelizeDoubles(Arrays.asList(0.1, 0.15, 0.2, 0.3, 0.25));
KolmogorovSmirnovTestResult testResult =
Statistics.kolmogorovSmirnovTest(data, "norm", 0.0, 1.0);
// summary of the test including the p-value, test statistic, and null hypothesis;
// if our p-value indicates significance, we can reject the null hypothesis.
System.out.println(testResult);
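spark.mllib provides online implementations of some tests to support use cases like A/B testing. These tests may be performed on a Spark Streaming DStream[(Boolean, Double)], where the first element of each tuple indicates the control group (false) or the treatment group (true) and the second element is the value of an observation.
Streaming significance testing supports the following parameters:
- peacePeriod - The number of initial data points from the stream to ignore, used to mitigate novelty effects.
- windowSize - The number of past batches to perform hypothesis testing over. Setting it to 0 performs cumulative processing using all prior batches.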
import org.apache.spark.mllib.stat.test.{BinarySample, StreamingTest}
val data = ssc.textFileStream(dataDir).map(line => line.split(",") match {
  case Array(label, value) => BinarySample(label.toBoolean, value.toDouble)
})
val streamingTest = new StreamingTest()
val out = streamingTest.registerStream(data)
out.print()
import org.apache.spark.mllib.stat.test.BinarySample;
import org.apache.spark.mllib.stat.test.StreamingTest;
import org.apache.spark.mllib.stat.test.StreamingTestResult;
import org.apache.spark.streaming.api.java.JavaDStream;
JavaDStream<BinarySample> data = ssc.textFileStream(dataDir).map(line -> {
  String[] ts = line.split(",");
  boolean label = Boolean.parseBoolean(ts[0]);
  double value = Double.parseDouble(ts[1]);
  return new BinarySample(label, value);
});
StreamingTest streamingTest = new StreamingTest();
JavaDStream<StreamingTestResult> out = streamingTest.registerStream(data);
out.print();
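Random data generation is useful for randomized algorithms, prototyping, and performance testing. spark.mllib supports generating random RDDs with i.i.d. values drawn from a given distribution: uniform, standard normal, or Poisson.
RandomRDDs provides factory methods to generate random double RDDs or vector RDDs. A typical use is to generate a random double RDD whose values follow the standard normal distribution N(0, 1) and then map it to N(1, 4).
Refer to the RandomRDDs Python docs for more details on the API.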
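The mapping from N(0, 1) to N(1, 4) is a shift and a scale: if X follows N(0, 1), then 1 + 2X has mean 1 and variance 4. The sketch below shows that transformation in plain Python with the standard library's generator standing in for a random RDD; it does not call Spark, and the sample size and seed are arbitrary choices made for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed only so that runs are repeatable

# 50,000 i.i.d. draws from the standard normal distribution N(0, 1)
u = [rng.gauss(0.0, 1.0) for _ in range(50_000)]

# shift by the target mean (1) and scale by the target standard deviation (2)
v = [1.0 + 2.0 * x for x in u]

print(statistics.mean(v))      # close to 1
print(statistics.variance(v))  # close to 4
```

The second parameter of N(1, 4) is the variance, which is why the scale factor is 2 rather than 4.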
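RandomRDDs provides factory methods to generate random double RDDs or vector RDDs. A typical use is to generate a random double RDD whose values follow the standard normal distribution N(0, 1) and then map it to N(1, 4).
Refer to the RandomRDDs Scala docs for details on the API.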
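RandomRDDs provides factory methods to generate random double RDDs or vector RDDs. A typical use is to generate a random double RDD whose values follow the standard normal distribution N(0, 1) and then map it to N(1, 4).
Refer to the RandomRDDs Java docs for details on the API.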
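Kernel density estimation is a technique useful for visualizing empirical probability distributions without requiring assumptions about the particular distribution that the observed samples are drawn from. It computes an estimate of the probability density function of a random variable, evaluated at a given set of points. It achieves this estimate by expressing the PDF of the empirical distribution at a particular point as the mean of the PDFs of normal distributions centered around each of the samples.
KernelDensity provides methods to compute kernel density estimates from an RDD of samples. The following example demonstrates how to do so.
Refer to the KernelDensity Python docs for more details on the API.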
from pyspark.mllib.stat import KernelDensity
# an RDD of sample data
data = sc.parallelize([1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0])
# Construct the density estimator with the sample data and a standard deviation for the Gaussian
# kernels
kd = KernelDensity()
kd.setSample(data)
kd.setBandwidth(3.0)
# Find density estimates for the given values
densities = kd.estimate([-1.0, 2.0, 5.0])
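The estimate described above, the mean of normal PDFs centered on each sample, can be written out directly. The following plain-Python sketch does so for the same sample and evaluation points, assuming the bandwidth is the standard deviation of each Gaussian kernel and using 3.0 as in the Java example; the function names are hypothetical and no Spark call is involved.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def kernel_density(sample, bandwidth, points):
    """Average of Gaussian PDFs centered on each sample, evaluated at each point."""
    n = len(sample)
    return [sum(gaussian_pdf(p, s, bandwidth) for s in sample) / n for p in points]


sample = [1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0]
densities = kernel_density(sample, bandwidth=3.0, points=[-1.0, 2.0, 5.0])
print(densities)
```

A larger bandwidth smooths the estimate and a smaller one follows the individual samples more closely, so the value is a modelling choice rather than something derived from the data here.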
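KernelDensity provides methods to compute kernel density estimates from an RDD of samples. The following example demonstrates how to do so.
Refer to the KernelDensity Scala docs for details on the API.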
import org.apache.spark.mllib.stat.KernelDensity
import org.apache.spark.rdd.RDD
// an RDD of sample data
val data: RDD[Double] = sc.parallelize(Seq(1, 1, 1, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9))
// Construct the density estimator with the sample data and a standard deviation
// for the Gaussian kernels
val kd = new KernelDensity().setSample(data).setBandwidth(3.0)
// Find density estimates for the given values
val densities = kd.estimate(Array(-1.0, 2.0, 5.0))
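KernelDensity provides methods to compute kernel density estimates from an RDD of samples. The following example demonstrates how to do so.
Refer to the KernelDensity Java docs for details on the API.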
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.mllib.stat.KernelDensity;
// an RDD of sample data
JavaRDD<Double> data = jsc.parallelize(
Arrays.asList(1.0, 1.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.0, 6.0, 7.0, 8.0, 9.0, 9.0));
// Construct the density estimator with the sample data
// and a standard deviation for the Gaussian kernels
KernelDensity kd = new KernelDensity().setSample(data).setBandwidth(3.0);
// Find density estimates for the given values
double[] densities = kd.estimate(new double[]{-1.0, 2.0, 5.0});