Showing posts with label numpy. Show all posts

Wednesday, February 13, 2019

Anaconda, Python

Anaconda versions 2018.12, 5.3.1 and 5.3.0 raise the error "cannot load mkl_intel_thread.dll" on Windows.

This cost me two days to resolve.

Lesson learnt: don't update packages blindly.

Tuesday, February 27, 2018

Data Analysis

When there are millions of rows of data, looping over the rows with condition checks and assigning column values one cell at a time is not recommended. It is very slow.

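# df: a pandas DataFrame with columns "x" and "a" and a default integer index
for index, row in df.iterrows():
    # guard against index + 1 running past the last row (KeyError)
    if df.loc[index, "x"] == "y" and index + 1 in df.index:
        if df.loc[index + 1, "x"] == "z":
            df.loc[index, "k"] = df.loc[index + 1, "a"]
# ......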

Some advise avoiding iterrows() where possible. Well, I need to find an alternative.
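One common alternative, sketched here as an assumption rather than the approach used in this post, is to compare each row with the next one using Series.shift(-1) and assign through a boolean mask, so pandas does the work column-wise instead of row by row. The function name fill_k_vectorized and the sample data are hypothetical; it assumes the same columns "x", "a" and "k" as the loop above and a default integer index, so that "next row" means index + 1.

```python
import pandas as pd


def fill_k_vectorized(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: set k to the next row's a wherever x == "y"
    and the next row's x == "z" (same rule as the iterrows() loop)."""
    out = df.copy()
    next_x = out["x"].shift(-1)  # x of the following row; NaN on the last row
    next_a = out["a"].shift(-1)  # a of the following row; NaN on the last row
    mask = (out["x"] == "y") & (next_x == "z")  # NaN == "z" evaluates to False
    # Creates column k if missing; rows outside the mask stay NaN
    out.loc[mask, "k"] = next_a[mask]
    return out


# Illustrative sample data (not from the post)
df = pd.DataFrame({"x": ["y", "z", "y", "q", "y"], "a": [1, 2, 3, 4, 5]})
result = fill_k_vectorized(df)
print(result)
```

Only row 0 matches here (its x is "y" and the next row's x is "z"), so k becomes 2.0 there and NaN elsewhere. Because shift() introduces NaN on the last row, the integer column "a" is upcast to float in the result.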

Previously, a simple problem bugged me for a few months: read_csv was slow when looping over multiple CSV files, with sizes in the GiB range, whose data had originally been generated and extracted from text files. The problem turned out to be the datetime format; it seems pandas parses certain datetime formats much faster than others.
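A minimal sketch of one way to avoid slow datetime parsing, assuming the slowdown comes from pandas having to infer a non-standard layout: read the column as plain text, then convert it in a single vectorized pd.to_datetime call with an explicit format string. The column names, the day-first format "%d/%m/%Y %H:%M:%S" and the helper load_with_explicit_format are all hypothetical; the post does not say which format the files used.

```python
import io

import pandas as pd

# Hypothetical sample: day-first timestamps, as might come from text logs
CSV_TEXT = """timestamp,value
13/02/2019 08:15:30,1.5
13/02/2019 08:15:31,2.0
"""


def load_with_explicit_format(source) -> pd.DataFrame:
    # Keep the timestamp column as strings while reading
    df = pd.read_csv(source, dtype={"timestamp": str})
    # Convert in one call with an explicit format so pandas does not
    # have to guess the layout of each value
    df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d/%m/%Y %H:%M:%S")
    return df


df = load_with_explicit_format(io.StringIO(CSV_TEXT))
print(df.dtypes)
```

When looping over many files, the same helper can be called per file and the results combined once with pd.concat, rather than growing one DataFrame inside the loop.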

The fun of data analysis lies in problem solving, logic, and getting statistics as evidence to support my own suggestions.

Others may do data analysis to gain insight into business performance; I do it to validate a system's structure and performance.

It's fun when I find the solutions, but a headache along the way.