Python Data Analysis and Visualization – Week One/Unit Two/NumPy Data Storage and Functions

Chinese version: https://today2tmr.com/2017/08/29/python数据分析与展示-第一周单元二numpy数据存取与函数/

CSV File

  • CSV (Comma-Separated Values)
  • CSV is a common file format for storing batches of data.
  • It is suitable for one- or two-dimensional data.

Write into CSV

  • np.savetxt(frame, array, fmt='%.18e', delimiter=' ')
    • frame: File name or file object; a file name ending in .gz is written gzip-compressed.
    • array: The array to be stored in the file.
    • fmt: Format for each value, such as %d, %.2f or %.18e.
    • delimiter: String that separates the values; the default is a single space. For CSV it should be a comma.
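As a minimal sketch of the call above, the following writes a small integer array as CSV and reads the raw text back. The file name a.csv and the temporary directory are arbitrary choices for this example, not part of the course material.

```python
import os
import tempfile

import numpy as np

# A small 2-D integer array to store.
a = np.arange(12).reshape(3, 4)

# Write it to a CSV file in a temporary directory (hypothetical path).
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "a.csv")
    np.savetxt(csv_path, a, fmt="%d", delimiter=",")

    # Read the raw text back to see what was written.
    with open(csv_path) as f:
        text = f.read()

print(text)
```

Each row of the array becomes one line of the file, with the values separated by commas.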

     

Read File

  • np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)
    • frame: File, file name or generator; may be a .gz or .bz2 compressed file.
    • dtype: Data type of the resulting array; the default is float.
    • delimiter: String that separates the values; the default is any whitespace.
    • unpack: By default a single array is returned. If True, the array is transposed so that each column can be unpacked into its own variable.
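The effect of dtype and unpack can be sketched as follows. To keep the example self-contained it reads from an in-memory io.StringIO object instead of a file on disk; the data values are made up for illustration.

```python
import io

import numpy as np

# In-memory stand-in for a CSV file with two rows and three columns.
csv_file = io.StringIO("1,2,3\n4,5,6\n")

# Default: one float array holding the whole table.
b = np.loadtxt(csv_file, delimiter=",")

# Rewind, then read again with unpack=True: one variable per column.
csv_file.seek(0)
x, y, z = np.loadtxt(csv_file, dtype=int, delimiter=",", unpack=True)

print(b)
print(x, y, z)
```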

     

Limitations for CSV

  • CSV can only store one- or two-dimensional data.
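A short sketch of this limitation: np.savetxt rejects a three-dimensional array with a ValueError. One possible workaround, shown here as an assumption rather than something the course prescribes, is to flatten the array to two dimensions before saving and restore the shape after loading.

```python
import io

import numpy as np

c = np.arange(24).reshape(2, 3, 4)

# Saving a 3-D array as text fails.
try:
    np.savetxt(io.StringIO(), c, fmt="%d", delimiter=",")
    failed = False
except ValueError:
    failed = True

# Workaround: store a 2-D view and reshape after loading.
# The original shape (2, 3, 4) must be known separately.
buf = io.StringIO()
np.savetxt(buf, c.reshape(2, -1), fmt="%d", delimiter=",")
buf.seek(0)
restored = np.loadtxt(buf, dtype=int, delimiter=",").reshape(2, 3, 4)

print(failed, np.array_equal(restored, c))
```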

File Access for Multidimensional Data

Store in File

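  • a.tofile(frame, sep='', format='%s')
    • frame: File object or file name.
    • sep: String that separates the elements. If it is empty, the file is written in binary.
      • A binary file is usually more compact than a text file, especially for floating-point data.
    • format: Format string applied to each element in text mode; it has no effect on binary output.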
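The two output modes can be sketched like this. The file names are placeholders chosen for the example. Note that neither file records the shape of the array: the text file is one flat list of values, and the binary file is the raw bytes of the elements.

```python
import os
import tempfile

import numpy as np

a = np.arange(24).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    text_path = os.path.join(tmp_dir, "a_text.dat")  # hypothetical name
    bin_path = os.path.join(tmp_dir, "a_bin.dat")    # hypothetical name

    # Text mode: a non-empty sep, each element formatted with format.
    a.tofile(text_path, sep=",", format="%d")
    with open(text_path) as f:
        text = f.read()

    # Binary mode: the default empty sep writes the raw element bytes.
    a.tofile(bin_path)
    bin_size = os.path.getsize(bin_path)

print(text)
print(bin_size, a.nbytes)
```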

     

     

Read File

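  • np.fromfile(frame, dtype=float, count=-1, sep='')
    • frame: File object or file name.
    • dtype: Data type of the stored elements; it must match the type used when writing.
    • count: Number of elements to read; -1 reads the whole file.
    • sep: String that separates the elements; an empty string means the file is binary.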

     

    • The original shape and data type must be known in order to restore the array.
    • a.tofile() and np.fromfile() are meant to be used together.
    • The extra information can be kept in a separate metadata file.
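A minimal round trip, assuming the writer and the reader agree on the dtype (np.int32 here) and the shape (2, 3, 4): np.fromfile always returns a flat array, so the shape has to be reapplied by hand.

```python
import os
import tempfile

import numpy as np

a = np.arange(24, dtype=np.int32).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a.dat")  # hypothetical file name
    a.tofile(path)

    # The file holds only raw values: dtype must be supplied again.
    flat = np.fromfile(path, dtype=np.int32)

# The shape is not stored either, so restore it explicitly.
restored = flat.reshape(2, 3, 4)

print(flat.shape, restored.shape)
```

Reading the same file with a different dtype would not raise an error; it would silently reinterpret the bytes, which is why the metadata matters.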

File Access Easily by NumPy

  • np.save(frame, array) or np.savez(frame, array)
    • frame: File name. np.save writes a .npy file; np.savez writes a .npz archive, which can hold several arrays (np.savez_compressed writes a compressed archive).
  • np.load(frame)
  • Metadata such as shape and data type is stored in a header at the start of the file.
  • To exchange data with other programs, use tofile and fromfile.
  • When the data stays within NumPy, use np.save and np.load.
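The following sketch shows that shape and dtype survive a save/load round trip without any extra bookkeeping. The file names and the keyword names first and second are arbitrary choices for this example.

```python
import os
import tempfile

import numpy as np

a = np.arange(24).reshape(2, 3, 4)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Single array in a .npy file.
    npy_path = os.path.join(tmp_dir, "a.npy")
    np.save(npy_path, a)
    loaded = np.load(npy_path)

    # Several named arrays in one .npz archive.
    npz_path = os.path.join(tmp_dir, "arrays.npz")
    np.savez(npz_path, first=a, second=a * 2)
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        second = archive["second"]

print(loaded.shape, loaded.dtype)
print(names)
```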

 

Random Functions

  • np.random
Function                      Explanation
rand(d0, d1, …, dn)           Create a float array of shape (d0, …, dn), uniformly distributed over [0, 1)
randn(d0, d1, …, dn)          Create a float array of shape (d0, …, dn), drawn from the standard normal distribution
randint(low[, high, size])    Create an integer array of shape size with values in [low, high)
seed(s)                       Set the seed of the random number generator
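A brief sketch of these four functions. The seed value 10 and the array shapes are arbitrary; the point is that resetting the same seed reproduces the same draws.

```python
import numpy as np

# Same seed, same call: identical results.
np.random.seed(10)
first = np.random.randint(100, 200, (3, 4))
np.random.seed(10)
second = np.random.randint(100, 200, (3, 4))

# Uniform floats in [0, 1) and standard normal floats, both of shape (3, 4, 5).
u = np.random.rand(3, 4, 5)
n = np.random.randn(3, 4, 5)

print(np.array_equal(first, second))
print(u.shape, n.dtype)
```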

 

  • Setting the same seed again reproduces the same sequence of random numbers.
Function                         Explanation
shuffle(a)                       Shuffle a in place along its first axis; a itself is changed
permutation(a)                   Return a shuffled copy of a along its first axis; a is unchanged
choice(a[, size, replace, p])    Draw elements from the one-dimensional array a with probabilities p and return a new array of shape size; replace states whether an element may be drawn more than once, default is True
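The difference between the in-place and the copying variant, and the effect of replace and p, can be sketched as follows. The probabilities used here (proportional to the element values) are only an illustration.

```python
import numpy as np

np.random.seed(0)
a = np.arange(10)
original = a.copy()

# permutation returns a new array and leaves a untouched.
p = np.random.permutation(a)
unchanged = np.array_equal(a, original)

# shuffle reorders a itself and returns None.
np.random.shuffle(a)

# Five distinct elements, since replace=False forbids repeats.
picks = np.random.choice(original, size=5, replace=False)

# Weighted draw: larger values are more likely, 0 has probability 0.
weighted = np.random.choice(original, size=(2, 3), p=original / original.sum())

print(p, a, picks, weighted, sep="\n")
```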

 

 

Function                   Explanation
uniform(low, high, size)   Create an array of shape size from the uniform distribution over [low, high)
normal(loc, scale, size)   Create an array of shape size from the normal distribution with mean loc and standard deviation scale
poisson(lam, size)         Create an array of shape size from the Poisson distribution; lam is the expected number of events per interval
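As a rough check of what the parameters mean, the sketch below draws large samples and compares the sample statistics with the requested ones. The parameter values and sample sizes are arbitrary.

```python
import numpy as np

np.random.seed(0)

# Uniform values in [0, 10).
u = np.random.uniform(0, 10, (3, 4))

# Normal distribution with mean 10 and standard deviation 5.
n = np.random.normal(10, 5, 100000)

# Poisson distribution with an expected count of 3 events per interval.
k = np.random.poisson(3.0, 100000)

print(u.min(), u.max())
print(n.mean(), n.std())
print(k.mean(), k.dtype)
```

With this many samples the printed means and standard deviation should land close to 10, 5 and 3 respectively, though the exact digits depend on the seed.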

 

Statistical Functions

Function                             Explanation
sum(a, axis=None)                    Calculate the sum of the elements along the given axis; axis is an integer or a tuple of integers
mean(a, axis=None)                   Calculate the arithmetic mean of the elements along the given axis
average(a, axis=None, weights=None)  Calculate the weighted average of the elements along the given axis
std(a, axis=None)                    Calculate the standard deviation of the elements along the given axis
var(a, axis=None)                    Calculate the variance of the elements along the given axis
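A small worked example of the axis and weights arguments, using a 3 x 5 array chosen for easy mental arithmetic. The weights [10, 5, 1] are arbitrary.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

total = np.sum(a)                # all elements: 0 + 1 + ... + 14
col_means = np.mean(a, axis=0)   # one mean per column
row_means = np.mean(a, axis=1)   # one mean per row

# Weighted average down each column: the first row counts 10 times,
# the second 5 times, the third once.
weighted = np.average(a, axis=0, weights=[10, 5, 1])

spread = np.std(a)
variance = np.var(a)

print(total)       # 105
print(col_means)   # [5. 6. 7. 8. 9.]
print(row_means)   # [ 2.  7. 12.]
print(weighted[0]) # (0*10 + 5*5 + 10*1) / 16 = 2.1875
```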

 

Function                     Explanation
min(a) max(a)                Calculate the minimum and maximum value in a
argmin(a) argmax(a)          Calculate the index of the minimum and maximum value in the flattened (one-dimensional) array
unravel_index(index, shape)  Convert a flattened one-dimensional index into a multidimensional index for the given shape
ptp(a)                       Calculate the difference between the maximum and the minimum value
median(a)                    Calculate the median of the elements in a
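The sketch below shows how argmax and unravel_index combine to locate the largest element of a two-dimensional array. The array values are made up for the example.

```python
import numpy as np

a = np.array([[3, 9, 1],
              [7, 2, 8]])

largest = np.max(a)
flat_index = np.argmax(a)   # position in the flattened array

# Convert the flat position into (row, column) for this shape.
position = tuple(int(i) for i in np.unravel_index(flat_index, a.shape))

value_range = np.ptp(a)     # max - min
middle = np.median(a)       # mean of the two middle values, 3 and 7

print(largest, flat_index, position)  # 9 1 (0, 1)
print(value_range, middle)            # 8 5.0
```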

 

Gradient Function

  • np.gradient(f)
  • Calculate the gradient of array f. If f is multidimensional, one gradient array is returned for each dimension.
  • Gradient: the rate of change between consecutive values, that is, the slope.
  • For three consecutive Y values a, b, c, the gradient at b is (c - a)/2.
    • With neighbors on both sides: (next value - previous value) / distance between the two values
    • With a neighbor on one side only (at the edges): (current value - previous value) or (next value - current value)
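The two rules above can be checked by hand on a short array, assuming the default spacing of 1 between samples. The input values are arbitrary.

```python
import numpy as np

f = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
g = np.gradient(f)

# Edges use one neighbor:       4 - 1 = 3   and   25 - 16 = 9
# Interior points use both:     (9 - 1) / 2 = 4,  (16 - 4) / 2 = 6,  (25 - 9) / 2 = 8
print(g)  # [3. 4. 6. 8. 9.]

# For a 2-D array, one gradient array is returned per dimension.
c = np.arange(12.0).reshape(3, 4)
g_rows, g_cols = np.gradient(c)
print(g_rows[0, 0], g_cols[0, 0])  # 4.0 1.0
```

In the two-dimensional case, values grow by 4 from one row to the next and by 1 from one column to the next, so the two gradient arrays are constant.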

 

  • Gradients can help locate edges in images and abrupt changes in sound signals.

Unit Summary

  • File access for one- or two-dimensional data: CSV
    • np.loadtxt()
    • np.savetxt()
  • File access for multidimensional data
    • a.tofile()
    • np.fromfile()
    • np.save()
    • np.savez()
    • np.load()
  • Random functions
    • np.random.rand()
    • np.random.randn()
    • np.random.randint()
    • np.random.seed()
    • np.random.shuffle()
    • np.random.permutation()
    • np.random.choice()
  • Statistical functions
    • np.sum()
    • np.mean()
    • np.average()
    • np.std()
    • np.var()
    • np.median()
    • np.min()
    • np.argmin()
  • Gradient functions
    • np.gradient()
