Spark - Dense Vector

About

DenseVector is a class in the pyspark.mllib.linalg module.

DenseVector is used to store arrays of values for use in PySpark.

Implementation

DenseVector stores its values in a NumPy array and delegates calculations to that array.
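
The delegation described above can be pictured with a small stand-in class. This is a minimal sketch, not PySpark's source code: TinyDenseVector is a hypothetical name, and the use of 64-bit floats is an assumption of the sketch. The only idea carried over from the text is that the values live in a NumPy array and that NumPy does the arithmetic.

```python
import numpy as np


class TinyDenseVector(object):
    """Hypothetical stand-in for illustration only; not the PySpark class."""

    def __init__(self, values):
        # Keep the values in a NumPy array of 64-bit floats
        # (an assumption of this sketch).
        self.array = np.asarray(values, dtype=np.float64)

    def dot(self, other):
        # Delegate the calculation to NumPy instead of looping in Python.
        return np.dot(self.array, other)

    def __len__(self):
        return len(self.array)


tiny = TinyDenseVector([3, 4, 5])
tinyDot = tiny.dot([1.0, 0.0, 2.0])
print(tiny.array.dtype)  # float64
print(tinyDot)           # 13.0
```

Wrapping the array this way keeps the vector's interface small while the numerical work stays in NumPy.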

Note that:

Construction

You can create a new DenseVector by calling DenseVector() with a NumPy array or a Python list.

from pyspark.mllib.linalg import DenseVector

# Create a DenseVector consisting of the values [3.0, 4.0, 5.0]
myDenseVector = DenseVector([3.0, 4.0, 5.0])

Function

dot

dot operates just like np.ndarray.dot().

import numpy as np
from pyspark.mllib.linalg import DenseVector

# NumPy vector
numpyVector = np.array([-3, -4, 5])
print('\nnumpyVector:\n{0}'.format(numpyVector))

# Dense vector
myDenseVector = DenseVector([3.0, 4.0, 5.0])
print('myDenseVector:\n{0}'.format(myDenseVector))

# The dot product between the two vectors
denseDotProduct = myDenseVector.dot(numpyVector)
print('\ndenseDotProduct:\n{0}'.format(denseDotProduct))

Output:

numpyVector:
[-3 -4  5]
myDenseVector:
[3.0,4.0,5.0]

denseDotProduct:
0.0