The Basics of Python¶
import warnings
# Ignore all warnings
warnings.filterwarnings("ignore")
Strings and Basic Functions¶
A = 10
A.bit_length()
4
"ABC"=='ABC'
True
S='A B C'
S.lower()
'a b c'
'A' + 'B'
'AB'
' '.join('ABC')
'A B C'
The argument to join must be an iterable of strings. Triple quotes can be used for multiline strings.
S = """
A
B
C
"""
print(S)
A B C
SS = """
A
\tB
C
"""
print(SS)
A B C
SSS = """
A
\\tB
C
"""
print(SSS)
A \tB C
Format¶
format accepts as many arguments as needed and replaces each pair of braces with the corresponding argument, which must be convertible to a string. With empty braces the order of the arguments is crucial, and this form is not the most convenient when the same value has to be substituted repeatedly.
"{}$${}".format(1, 2)
'1$$2'
"{} = {} + {} +{}".format('a', 1, 2, 1)
'a = 1 + 2 +1'
#"{} = {} + {} +{}".format('a', 1, 2)
code = "{} + {} + {} ".format(1, 2, 1)
code
'1 + 2 + 1 '
eval(code)
4
code = "{} = {} + {} +{}".format('a', 1, 2, 1)
"{var} = {arg1} + {arg2} + {arg1}".format(arg1 = 1, arg2 = 2, var = 'a')
'a = 1 + 2 + 1'
Named placeholders are matched by keyword rather than by position, so the order of the keyword arguments within "format" is irrelevant.
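As a small sketch of the two substitution styles discussed above (the variable names template, code, and named are only illustrative), numbered fields let one argument fill several placeholders, and named fields do the same while ignoring keyword order:

```python
# Numbered fields: argument 1 is reused without being passed twice.
template = "{0} = {1} + {2} + {1}"
code = template.format('a', 1, 2)
print(code)

# Named fields: the keyword order in the call does not matter.
named = "{var} = {arg1} + {arg2} + {arg1}".format(arg2=2, var='a', arg1=1)
print(named)
```

Both calls build the same string, which is why the named form is usually easier to read once a value appears more than once.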
We want a better looking way to do this, so we will use a new type of data structure.
Dictionary¶
Dictionaries are lookup (hash) tables. They associate keys with values in an efficient way.
args = dict(arg1 = 1, arg2 = 2, var = 'a')
Dictionaries are indexed by key, as shown below. Typing a bare arg1 raises a NameError because arg1 is not a variable; it is only a key in a dictionary. args[arg1] raises the same error for the same reason.
args['arg1']
1
When a dictionary is built with dict(name=value), each keyword name becomes a string key.
Dictionaries can also be defined using curly brackets, and note that lists can be used as values in a dictionary.
args_ = {
'arg1' : 1,
'arg2' : 2,
'var' : 'a',
2 : [10, 11],
}
args_ == args
False
args_['arg2']
2
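Dictionary lookup takes roughly constant time on average: Python hashes the key instead of enumerating through the entries as it would in a list. Looking things up through an index rather than scanning everything is the same general idea that keeps large-scale search fast.
A repeated key overwrites the earlier value, but a single key CAN be associated with a list!
Keyword unpacking with ** requires every key to be a string, so we unpack args here (args_ also contains the integer key 2):
"{var} = {arg1} + {arg2} + {arg1}".format(**args)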
'a = 1 + 2 + 1'
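Returning to the remarks on keys: the sketch below, with made-up names (grades, scores), shows a repeated key overwriting the earlier value and one way to keep several values under one key by storing a list.

```python
# The second "alice" entry replaces the first one.
grades = {"alice": 90, "alice": 95}
print(grades)

# One key can hold several values if the value is a list.
scores = {}
for name, score in [("alice", 90), ("bob", 70), ("alice", 95)]:
    # setdefault returns the existing list, or inserts a new empty one.
    scores.setdefault(name, []).append(score)
print(scores)
```

setdefault is just one convenient option; an explicit "if name not in scores" check would work equally well.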
Rounding Operations with Format¶
a = 1/3
print(a)
0.3333333333333333
What if you only want to see the first couple of digits?
print("{}".format(a))
0.3333333333333333
print("result = {:.2f}".format(a))
result = 0.33
print("result = {:.40f}".format(a))
result = 0.3333333333333333148296162562473909929395
^What happened there? The computer stores floating-point numbers in binary, not decimal, so 1/3 cannot be represented exactly. Suggestion: don't trust more than about the first 16 significant digits!
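A short sketch of what the binary storage means in practice, using only the standard library. Decimal(a) shows the value that is actually stored, and math.isclose is one reasonable way to compare floats; the name exact is just illustrative.

```python
import math
from decimal import Decimal

a = 1 / 3
exact = Decimal(a)   # exact decimal expansion of the stored binary value
print(exact)

# Exact comparison of floats is fragile; compare with a tolerance instead.
print(0.1 + 0.2 == 0.3)
print(math.isclose(0.1 + 0.2, 0.3))
```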
We can also print in exponential form:
print("result = {:.2e}".format(a))
result = 3.33e-01
Lists¶
S = ["A", "B", "C", "D", "E"]
S[0]
'A'
S[-2]
'D'
A = list(range(10))
A
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
A[-1]
9
Slicing Lists¶
A slice is written A[start:end:step]; the third part (the step) is optional. If start or end is omitted, the slice runs from the beginning or to the end of the list.
A[1::2]
[1, 3, 5, 7, 9]
A[::2]
[0, 2, 4, 6, 8]
A negative step size walks through the list from the back.
A[::-2]
[9, 7, 5, 3, 1]
A[:]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
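A few more slices on the same kind of list, as a quick sketch of how start, end, and step combine. The name every_other is only illustrative; slice() is the built-in object that the colon notation creates behind the scenes.

```python
A = list(range(10))

middle = A[2:8:3]      # start at index 2, stop before index 8, step 3
backwards = A[::-1]    # the whole list, reversed
last_three = A[-3:]    # from the third-to-last element to the end
print(middle, backwards, last_three)

# A slice can also be stored as an object and reused.
every_other = slice(1, None, 2)
print(A[every_other] == A[1::2])
```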
IMPORTANT NOTE: A variable is a reference (a pointer) to a block of memory where a value is stored. This has important consequences:
B = A
B
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
A[0] = -100
A
[-100, 1, 2, 3, 4, 5, 6, 7, 8, 9]
B
[-100, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Remember: B is just another name pointing to the same list!
A == B
True
C = [-100, 1, 2, 3, 4, 5, 6, 7, 8, 9]
A == C
True
C is A
False
Now we are going to make a copy of A, which lives in a different place in memory.
C = A[:]
C is A
False
B is A
True
type(A)
list
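One caveat worth sketching: A[:] copies only the outer list. With nested lists the inner lists are still shared, and copy.deepcopy from the standard library is one way to copy everything. The names alias, shallow, and deep are illustrative.

```python
import copy

A = [[1, 2], [3, 4]]
alias = A                 # same object as A
shallow = A[:]            # new outer list, but the same inner lists
deep = copy.deepcopy(A)   # new outer list and new inner lists

A[0][0] = -100            # change an inner list
A.append([5, 6])          # change the outer list

print(alias[0][0], shallow[0][0], deep[0][0])
print(len(alias), len(shallow), len(deep))
```

The alias sees both changes, the shallow copy sees only the change to the shared inner list, and the deep copy sees neither.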
The Append Function¶
L=[1,]
L
[1]
len(L)
1
L.append(["a"])
L
[1, ['a']]
This is a list within a list! List elements don't have to be homogeneous (this is unlike a numpy array; lists are made for this!)
L.append("b")
L
[1, ['a'], 'b']
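Appending takes (amortized) constant time regardless of the size of the list and doesn't take much memory. Deleting from the end is just as cheap, but deleting from the middle, as below, has to shift every later element.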
del L[1]
L
[1, 'b']
Inserting Elements in Lists¶
L
[1, 'b']
L.insert(1, 'f')
L
[1, 'f', 'b']
Tuple¶
T= (1, 1, 2)
A tuple is like a list but with very few methods. It is immutable, so changing it requires making a new tuple.
Adding an element to a tuple:
T1 = T + (3,)
T1
(1, 1, 2, 3)
T1 == T
False
T1 is T
False
They are not the same block in memory (here we had to make a new object just to add an element). Compare to lists:
L.append(3)
L
[1, 'f', 'b', 3]
^Lists are modified in place: the same object is reused.
Now we want to add one element within the tuple:
T2 = (T[0],) + (3,) + T[1:]
T2
(1, 3, 1, 2)
Why bother? These seems like a pain in the butt! Because tuples take less space!
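To make the trade-off concrete, here is a hedged sketch. sys.getsizeof reports sizes that are specific to the CPython build, so only the comparison matters, and the distances dictionary is a made-up example of using tuples as keys (possible because tuples are immutable).

```python
import sys

L = [1, 1, 2]
T = (1, 1, 2)
# Exact byte counts depend on the interpreter; the tuple is the smaller one in CPython.
print(sys.getsizeof(L), sys.getsizeof(T))

# Tuples are hashable, so they can serve as dictionary keys.
distances = {(0, 0): 0.0, (3, 4): 5.0}
print(distances[(3, 4)])

# Item assignment is not allowed on a tuple.
try:
    T[0] = 5
except TypeError as err:
    print("tuples are immutable:", err)
```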
Looping¶
A = list(range(10))
A
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
task: double each of the elements in the list¶
first we will learn the way not to do it¶
result = [] #result = list()
for i in range(len(A)):
    result.append(2*A[i])
print(result)
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
Why is this the worst way to do it? It's the slowest! The result list grows on each iteration of the loop. Indexing also costs time: every A[i] is an extra lookup on top of the loop itself. We can measure the time it takes to run the code! To do that we need to express the code as a single line, so we convert it into a function.
Only append when you don't know the size of the list in advance!
def worst(A):
    result = []
    for i in range(len(A)):
        result.append(2*A[i])
    return result
This is the worst way to do it! The function is slow compared with the alternatives below.
r = worst(A)
r
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
r == result
True
Inside the function, result is only a local variable: it cannot be accessed from outside. We bound the returned list to a new variable r to get around this issue.
del result
#result
The result inside the function is a local variable, so it cannot be used from outside the function. This does not mean that the block of memory it pointed to got cleaned up: the local name result was eliminated when the function returned, but the list still exists because r refers to it. (cool!)
Returning from the function does not create a new block of memory: it hands back a reference to the same list!
Now, we can analyze the time:
%time worst(A)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs Wall time: 5.01 µs
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
^This time depends on the hardware, so it will be different for others.
Now, redefine A:
A = list(range(1000000))
Careful: having to restart the kernel at this point would be bad.
%time is an unscientific way to time something: it runs the code only once and prints out the result. For repeated trials, use %timeit.
A = list(range(10))
%timeit r=worst(A)
717 ns ± 2.07 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Note: %timeit runs the statement in its own scope, so do not count on r being defined afterwards.
this is the best way to do it... (for now)¶
r1 = [2 * a for a in A]
%timeit r1 = [2 * a for a in A]
365 ns ± 1.17 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
r = worst(A)
r == r1
True
B = [2 * a for a in A if a not in [0, 1j, 11]]
print(B)
[2, 4, 6, 8, 10, 12, 14, 16, 18]
22 in B
False
the double of 11 is not in B
mediocre method¶
def mid(A):
    result = []
    for a in A:
        result.append(2 * a)
    return result
%timeit r = mid(A)
542 ns ± 6.84 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
A list comprehension does not call append explicitly; it reads like the math notation "2a for every a in A".
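The %time and %timeit magics only exist inside IPython/Jupyter. As a sketch of the same comparison in plain Python, the standard-library timeit module can be used. The function name best and the repetition count are choices made for this example, and the measured times will differ from machine to machine.

```python
import timeit

def worst(A):
    result = []
    for i in range(len(A)):
        result.append(2 * A[i])
    return result

def mid(A):
    result = []
    for a in A:
        result.append(2 * a)
    return result

def best(A):
    # hypothetical name for the list-comprehension version
    return [2 * a for a in A]

A = list(range(1000))
for func in (worst, mid, best):
    # Total time for 1000 calls; no particular value is guaranteed.
    seconds = timeit.timeit(lambda: func(A), number=1000)
    print("{:>5}: {:.4f} s".format(func.__name__, seconds))
```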
Profiler¶
cProfile is part of the Python standard library. Python ships with a lot of great libraries; bring them in with the import statement.
import cProfile
A = list(range(100000))
cProfile.run("worst(A)")
100005 function calls in 0.012 seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.009    0.009    0.011    0.011  2330539353.py:1(worst)
     1    0.000    0.000    0.012    0.012  <string>:1(<module>)
     1    0.000    0.000    0.012    0.012  {built-in method builtins.exec}
     1    0.000    0.000    0.000    0.000  {built-in method builtins.len}
100000    0.002    0.000    0.002    0.000  {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
cProfile.run("mid(A)")
100004 function calls in 0.010 seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
     1    0.007    0.007    0.010    0.010  2930335151.py:1(mid)
     1    0.000    0.000    0.010    0.010  <string>:1(<module>)
     1    0.000    0.000    0.010    0.010  {built-in method builtins.exec}
100000    0.003    0.000    0.003    0.000  {method 'append' of 'list' objects}
     1    0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
*Note: the size of A must be sufficiently large so that the timings do not all show up as 0.
To make loops efficient, we will find the following module very useful:
from itertools import permutations
for p in permutations([1, 2, 3]):
    print(p)
(1, 2, 3) (1, 3, 2) (2, 1, 3) (2, 3, 1) (3, 1, 2) (3, 2, 1)
Sometimes we will need to iterate over a couple of containers at the same time: use zip.
for a, b in zip([1, 2, 3], "abc"):
    print("a = {}, b={}\n".format(a, b))
a = 1, b=a a = 2, b=b a = 3, b=c
zip groups all the elements with the same index very efficiently, and it does so without explicit indexing, because "indexing is the enemy of the people".
If a task is semi-routine, there is likely an optimized library already made to do it (like permutations and combinations).
Do not use the notation "itertools.permutations()" if calling it many times in a loop, because the extra attribute lookup on every call makes it slightly slower than the directly imported name.
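A brief sketch of the other tools mentioned above. combinations comes from itertools just like permutations, and the built-in enumerate supplies an index for the rare cases where one is really needed; the names letters and pairs are illustrative.

```python
from itertools import combinations, permutations

letters = "abc"
pairs = list(combinations(letters, 2))   # unordered pairs, no repeats
print(pairs)
print(len(list(permutations(letters))))  # 3! orderings

# enumerate gives the index without writing letters[i] anywhere.
for i, (a, b) in enumerate(zip([1, 2, 3], letters)):
    print(i, a, b)
```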
Functions¶
def f(a, b, *args):
    print(args)
    print(sum(args))
    return a + b + sum(args)
f(1, 2, )
() 0
3
^Suppose we want to generalize the function to accept an arbitrary number of arguments. Python does not know in advance how many arguments will be passed. f has two mandatory arguments, and everything else goes into the tuple args.
f(1, 2, 3, )
(3,) 3
6
Now args is a tuple of one element.
f(1, 2, 3, 4, )
(3, 4) 7
10
L = [1, 2, 3, 4]
f(*L)
(3, 4) 7
10
The * essentially unpacks a container. Here it is a list, but it could be a tuple.
#f(L)
This gives an error message because f needs more than one argument.
We can also set a default value for some arguments.
def f(a, b=0, *args):
    print(args)
    print(sum(args))
    return a + b + sum(args)
f(1)
() 0
1
f(a=1)
() 0
1
#f(c=1)
The ** is even more interesting than the *!
def f(a, b=0, *args, **kwargs):
    print(args)
    print(sum(args))
    print(kwargs)
    return a + b + sum(args)
f(a=0, c=1)
() 0 {'c': 1}
0
kwargs collects keyword arguments that don't have a name in the declaration of the function. a does not go into kwargs.
f(0, 7, c=1)
() 0 {'c': 1}
7
The 7 clearly goes somewhere! It actually goes into b: the b=0 in the definition only specifies its default value.
f(0, 7, 8, c=1, blah=11, blahblah=12, )
(8,) 8 {'c': 1, 'blah': 11, 'blahblah': 12}
15
args will always be a tuple and kwargs will always be a dictionary!
Here, a is a positional argument with no default value, so it must be specified. b has a default value, but it can be overridden and it is still positional! The argument 8 is positional, but there is no named parameter left for it, so it is bundled into args. On the other hand, c, blah, and blahblah go into kwargs because they are passed by name; together they become a dictionary.
f(0, 7, 8, c=1, blahblah=12, blah=11, )
(8,) 8 {'c': 1, 'blahblah': 12, 'blah': 11}
15
See above: the order does not matter for arguments passed by name.
What if we pass b=7 by name in the middle of the call? Will it go into kwargs or not? It will give an error.
#f(0, b=7, 8, c=1, blah=11, blahblah=12, )
Positional arguments cannot follow keyword arguments: all keyword arguments must be at the end. The order matters here!
New topic: looking up the scikit-learn neural network classes will show the documentation and what goes on behind the scenes. There are a lot of default values for the functions. You can read the source code as well. Don't be scared by the official documentation! Many of these library functions use the *args and **kwargs conventions. Names other than args and kwargs are allowed, but they are not conventional, so avoid them to keep the code easy for others to read.
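The rules above can be sketched with a variant of f that returns its pieces instead of printing them (a change made only for this example). A keyword may name b directly, but giving b both positionally and by name is a TypeError at call time, while a positional argument written after a keyword is rejected before the code even runs.

```python
def f(a, b=0, *args, **kwargs):
    # Return the pieces so they can be inspected.
    return a + b + sum(args), args, kwargs

print(f(0, 7, 8, c=1))   # 8 lands in args, c lands in kwargs
print(f(0, b=7, c=1))    # b given by name; args stays empty

try:
    f(0, 7, b=8)         # b supplied twice
except TypeError as err:
    print("TypeError:", err)

# f(0, b=7, 8) cannot even be compiled:
# SyntaxError: positional argument follows keyword argument
```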
Classes¶
Everything in Python is an instance of a class!¶
Classes are essentially "types" with specified operations. We will create a very simple class below: a point on a plane (it is good practice to capitalize the name of a class). The indented lines below describe the class; they must say everything about it because it is entirely new to Python! The class will have properties and methods.
class Point(object):
    """
    This is a class for a 2D point.
    """
    def __init__(self, x=0., y=0.):
        """
        Constructor: it is going to be called once at the creation of an instance of the class.
        """
        self.x = float(x)
        self.y = float(y)
    def __add__(self, p):
        """
        adds self to p.
        """
        if not isinstance(p, Point):
            #Exceptions: professionally handling errors
            raise ValueError("incorrect argument")
        return Point(self.x + p.x, self.y + p.y)
    def __repr__(self):
        return "{}, {}".format(self.x, self.y)
To create the point: it is described by two numbers, which must be initialized upon creation of a point. The first argument of a method is conventionally "self". self refers to the particular instance of the class, that is, the actual block of memory where it is stored. You should write a description of every function! To save the variables x and y, use the "self.___" notation. This allows the x and y attributes to show up after typing "p." and pressing Tab.
p1 = Point()
p2 = Point('1', 2)
p2.x, p2.y
(1.0, 2.0)
The "float" in the definition allows for some error correction: the string '1' is converted to 1.0. It would obviously raise an error if given "1a".
In the "add" function, we want to make it professional and check that the objects are Points.
isinstance(p1, Point)
True
#p1.add(p2)
^This code is a remnant of an old method "add": note that we changed the code to look better.
p1.x, p1.y
(0.0, 0.0)
Go back to the definition and change "add" to "__add__" (two underscores on each side); then "add" will no longer show up when you hit Tab, and the + operator will work on Points.
p3 = p1 + p2
p3.x
1.0
Because __add__ returns a new Point, p1 and p2 are not modified. See below:
p1.x, p1.y
(0.0, 0.0)
p2.x, p2.y
(1.0, 2.0)
p1
0.0, 0.0
Without the "repr" function, this would tell us literally where the object is stored in memory. The purpose of the repr function is to make the representation of the data pretty.
p1.k = None
This is called "binding": you bind a new property to the object here. There is a lot of flexibility in manipulating objects.
Now, when inspecting p2, k will not show up: p1.k only modifies that particular instance of the class.
p1.k = 3
p1.k
3
p1.__doc__
'\n This is a class for a 2D point. \n '
Typing p1._ and pressing Tab will reveal many more properties that the class has.
Why do we want to write classes?¶
Membership: properties of a class are members of another class¶
class Rect(object):
    """
    This has 4 points
    """
    def __init__(self, p1, p2, p3, p4):
        if not (isinstance(p1, Point) and isinstance(p2, Point)
                and isinstance(p3, Point) and isinstance(p4, Point)):
            raise ValueError("Args must be Points")
        self.p1 = p1
        self.p2 = p2
        self.p3 = p3
        self.p4 = p4
For the init function: we must make sure that all of the arguments in the constructor are Points, using the isinstance function.
Using multiple "or" statements: the result is true if any one of them is true.
p4 = Point(3, 4)
R = Rect(p1, p2, p3, p4)
This works because all of them are points! Now let's mess up the second argument
# Rect(p1, A, p3, p4)
Now we will rewrite the "init" check to use only one "not". We do this with the parentheses! This is essentially De Morgan's laws from logic: not(a and b)=(not a) or (not b), not(a or b)=(not a) and (not b). Under negation, "and" and "or" swap roles.
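As a sketch of the same check, the built-in all() expresses "every argument is a Point" for any number of arguments. The helper check_points is hypothetical, and Point is reduced to a minimal stand-in so that the block runs on its own. The loop at the end verifies De Morgan's laws over every combination of truth values.

```python
class Point:
    """Minimal stand-in for the Point class defined earlier."""
    def __init__(self, x=0., y=0.):
        self.x = float(x)
        self.y = float(y)

def check_points(*points):
    # hypothetical helper: one "not" in front of a single combined test
    if not all(isinstance(p, Point) for p in points):
        raise ValueError("Args must be Points")
    return True

print(check_points(Point(), Point(1, 2)))

# De Morgan's laws, checked for all four combinations.
for a in (True, False):
    for b in (True, False):
        assert (not (a and b)) == ((not a) or (not b))
        assert (not (a or b)) == ((not a) and (not b))
```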
Inheritance: really, really important¶
Our class "Point" is pretty good. Now assume we want to write a software to display, or draw, those points. Then it is obviously not sufficient. We would also want to know things like color! We want to do this without rewriting the whole class. We will reuse what is already written, but add one more feature. The original class is the "parent class", creating a new class in the following way.
class ColoredPoint(Point):
    """
    Colored point
    """
    def __init__(self, color='black', **kwargs):
        # constructor of a parent class
        Point.__init__(self, **kwargs)
        self.color = color
    def __repr__(self):
        return "{}, {}".format(Point.__repr__(self), self.color)
    def learning_glob_and_loc(self, A):
        print(A)
        A = 11
        print(A)
Let's mention the "pass" statement: it is basically a placeholder for an empty body.
cp = ColoredPoint()
It already initialized! We want to find out what's under the hood: typing cp. and pressing Tab shows that it has x and y, which are initialized to 0.0, and that it will print nicely. This is because a class with an empty body inherits all of the code from the Point class. Such a ColoredPoint object behaves the same as a Point object with no other changes to the code.
Note: when we create an init function inside the ColoredPoint class definition, the parent already has an init function with the same name! The init inside ColoredPoint overrides the init inside Point. Inside the new constructor, we can call the old constructor and pass along all of the old arguments with the **kwargs syntax.
cp.x
0.0
cp
0.0, 0.0, black
Without a __repr__ of its own, the class would not display color information because the representation method would be taken from the parent. That is why we redefine the representation in ColoredPoint.
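An alternative spelling of the same inheritance pattern, offered only as a sketch: super() looks up the parent class automatically, so Point does not have to be named inside the child's methods. Point is reduced here to what the example needs.

```python
class Point:
    """Minimal version of the Point class from above."""
    def __init__(self, x=0., y=0.):
        self.x = float(x)
        self.y = float(y)
    def __repr__(self):
        return "{}, {}".format(self.x, self.y)

class ColoredPoint(Point):
    def __init__(self, color='black', **kwargs):
        super().__init__(**kwargs)   # same effect as Point.__init__(self, **kwargs)
        self.color = color
    def __repr__(self):
        # reuse the parent's representation and append the color
        return "{}, {}".format(super().__repr__(), self.color)

cp = ColoredPoint(color='red', x=1, y=2)
print(cp)
print(isinstance(cp, Point))   # a ColoredPoint is also a Point
```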
Global vs. Local Variables¶
This will be illustrated by defining the method learning_glob_and_loc. The function "locals" will return a dictionary of local variables as seen below. Local variables are only defined within the function or the method.
x = 9
cp.learning_glob_and_loc(x)
9 11
^If the method prints locals(), the dictionary looks nice because print uses its "repr" function. Note that a dictionary is not sorted; since Python 3.7 it preserves insertion order.
A
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ..., 990, 991, 992, 993, 994, 995, 996, 997, 998, 999, ...]
^Before, we defined A outside of the class: this is a GLOBAL variable. Calling globals() returns a dictionary of all global variables in the code. A global statement inside a function lets an assignment there overwrite the global variable A.
"A" in (globals())
True
If a global and a local variable share the same name, the local variable will always be preferred inside the function.
Now change the function to accept an argument A
x = 9
cp.learning_glob_and_loc(x)
9 11
This shows that the global variable x was not modified, and that inside the method the local variable A takes precedence over the global variable A.
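A compact sketch of these scoping rules, with hypothetical function names (shadow, show_locals, mutate). Rebinding a parameter changes only the local name, locals() lists what the function can see, and mutating a list argument in place is a different matter because no new name is bound.

```python
def shadow(A):
    A = 11            # rebinds the local name only
    return A

def show_locals(A):
    B = 2 * A
    return locals()   # dictionary of the local variables at this point

def mutate(values):
    values[0] = -100  # changes the caller's list in place

x = 9
print(shadow(x), x)   # the caller's x is untouched
print(show_locals(x))

data = [0, 1, 2]
mutate(data)
print(data)
```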
This will do it for our general overview of Python¶
Numpy¶
Numpy is an external library, not a part of Python itself, so it must be installed and imported.
import numpy as np
We will create an array of random values to make things more interesting. np.random has several functions; rand draws from a uniform distribution on [0, 1).
A = np.random.rand(100)
A
array([7.01599762e-01, 1.89995637e-01, 7.05020518e-01, 6.21967396e-01, 7.21618063e-01, 7.55808344e-01, 5.52010748e-01, 7.11532084e-01, 4.37597828e-01, 3.89407207e-01, 6.70570362e-01, 9.29731072e-01, 7.36024896e-01, 7.16652331e-02, 5.18327639e-01, 7.26035999e-01, 1.34479213e-01, 5.60728067e-01, 8.76632921e-01, 4.30041125e-01, 6.32426794e-01, 8.13249706e-01, 7.23359320e-01, 3.16946953e-01, 2.26697214e-02, 2.96834241e-01, 3.34539788e-01, 1.11991954e-01, 7.13158092e-02, 7.69036304e-01, 7.27578905e-02, 5.93617785e-01, 1.41733101e-01, 7.52722568e-01, 1.92786885e-01, 7.65249928e-01, 4.33754918e-01, 7.13216812e-01, 2.67752545e-01, 7.28149995e-01, 1.80095005e-04, 1.46590132e-01, 6.19436406e-01, 1.17386760e-01, 7.27105902e-01, 4.50617248e-02, 7.50944474e-01, 3.43272077e-01, 6.54274224e-01, 8.15289512e-01, 5.64277069e-01, 6.98988021e-01, 5.02034176e-01, 4.95584303e-01, 4.83951381e-01, 6.34671623e-01, 1.08022211e-01, 8.02994525e-01, 8.33295166e-01, 1.90984038e-01, 8.54703057e-02, 6.41526652e-01, 9.32832035e-01, 2.92443628e-01, 7.35945603e-01, 9.13068819e-01, 5.51458657e-01, 8.68538468e-01, 4.18990006e-02, 4.55244690e-01, 9.44412755e-02, 1.18704128e-01, 4.51150239e-01, 1.35182741e-01, 6.42216732e-02, 1.47258831e-01, 4.61746189e-01, 7.31770551e-01, 2.44117358e-01, 3.50988901e-01, 4.19935284e-01, 4.99500443e-01, 3.25240494e-01, 7.22386893e-01, 1.72100757e-01, 5.07295797e-01, 8.75989768e-01, 2.58309017e-01, 6.06169816e-01, 3.38139738e-01, 4.11637693e-01, 7.81001118e-01, 3.55074483e-01, 2.44517915e-02, 1.91380812e-01, 9.87586070e-01, 4.20631912e-01, 3.06743367e-01, 4.36637030e-01, 2.63174083e-01])
Arrays are similar to lists: an array is essentially a list tailored to numbers.
A.max(), A.min()
(0.9875860699632983, 0.00018009500475590912)
type(A)
numpy.ndarray
rand takes a flexible number of arguments: it creates an n-dimensional array when given n sizes.
numpy.random.uniform(low=0, high=1, size=None) is a more general form of rand.
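A minimal sketch of the two calls side by side. The shapes and bounds are arbitrary choices for illustration, and the printed values differ on every run because no seed is set.

```python
import numpy as np

M = np.random.rand(3, 4)                                  # one size per dimension
U = np.random.uniform(low=-1.0, high=1.0, size=(3, 4))    # shape passed as a tuple

print(M.shape, U.shape)
print(M.min() >= 0.0, M.max() < 1.0)   # rand stays inside [0, 1)
print(U.min() >= -1.0, U.max() <= 1.0)
```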
A.shape
(100,)
^This results in a tuple, which is not changeable. The number of elements in the tuple is the number of dimensions. We can use all of the indexing tricks with arrays.
A[::2]
array([7.01599762e-01, 7.05020518e-01, 7.21618063e-01, 5.52010748e-01, 4.37597828e-01, 6.70570362e-01, 7.36024896e-01, 5.18327639e-01, 1.34479213e-01, 8.76632921e-01, 6.32426794e-01, 7.23359320e-01, 2.26697214e-02, 3.34539788e-01, 7.13158092e-02, 7.27578905e-02, 1.41733101e-01, 1.92786885e-01, 4.33754918e-01, 2.67752545e-01, 1.80095005e-04, 6.19436406e-01, 7.27105902e-01, 7.50944474e-01, 6.54274224e-01, 5.64277069e-01, 5.02034176e-01, 4.83951381e-01, 1.08022211e-01, 8.33295166e-01, 8.54703057e-02, 9.32832035e-01, 7.35945603e-01, 5.51458657e-01, 4.18990006e-02, 9.44412755e-02, 4.51150239e-01, 6.42216732e-02, 4.61746189e-01, 2.44117358e-01, 4.19935284e-01, 3.25240494e-01, 1.72100757e-01, 8.75989768e-01, 6.06169816e-01, 4.11637693e-01, 3.55074483e-01, 1.91380812e-01, 4.20631912e-01, 4.36637030e-01])
Say you want to change the shape of the array into a 10 x 10 (this is OK because the number of elements stays the same):
B = A.reshape(10, 10)
B.shape
(10, 10)
#B
^Note that this is 10 rows of 10 elements.
B[0]
array([0.70159976, 0.18999564, 0.70502052, 0.6219674 , 0.72161806, 0.75580834, 0.55201075, 0.71153208, 0.43759783, 0.38940721])
^This is the first row of 10 numbers
A[0]
0.7015997620036596
^This is the first element in A: here it is just one element
C = A.reshape(5, 5, 4)
C.shape
(5, 5, 4)
#C
C[0]
array([[0.70159976, 0.18999564, 0.70502052, 0.6219674 ], [0.72161806, 0.75580834, 0.55201075, 0.71153208], [0.43759783, 0.38940721, 0.67057036, 0.92973107], [0.7360249 , 0.07166523, 0.51832764, 0.726036 ], [0.13447921, 0.56072807, 0.87663292, 0.43004112]])
C is an array of arrays of arrays. The first component is a 2-D array that is 5 x 4. It also has a lot of properties.
Notice that for numpy arrays all of the elements have to be of the same type (unlike lists).
Now let's talk about the dtype attribute: the type of each element in an array.
C.dtype
dtype('float64')
F = np.random.rand(100).astype(np.float32)
F.dtype
dtype('float32')
F = np.zeros(100, dtype=np.float32)
F[0]
0.0
A[0]
0.7015997620036596
A[0] = -10
#A
#B
Changing the first element of A will also change the first element of B! On a hardware level, it means that B points to the same block of memory as A. The "is" operator asks whether two names point to the same object. For numpy, though, there is something in the middle:
A is B
False
So A and B are not the same object! There is a layer in between: the "view". A and B are two distinct array objects that both refer to the same block of memory; B is a view onto the data that A owns. Compare this with a "driver", a special program whose job is to communicate with microchips and translate things from the software to the hardware. Between the program and the actual hardware there is an invisible intermediary which makes writing programs very smooth. The "view" is similar: it manages the access to the block of memory. The views are distinguishable because they can do very different things, such as presenting different shapes!
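The sharing can be checked directly. This sketch uses np.shares_memory and the .base attribute, and it builds A with np.zeros (a choice made here so that the values are predictable); C is an explicit copy added for contrast.

```python
import numpy as np

A = np.zeros(100)
B = A.reshape(10, 10)          # a view: new array object, same data
C = A.copy().reshape(10, 10)   # an independent copy of the data

A[0] = -10
print(B[0, 0], C[0, 0])        # the view sees the change, the copy does not
print(A is B)                  # two different Python objects
print(np.shares_memory(A, B), np.shares_memory(A, C))
print(B.base is A)             # B records which array owns its data
```

When an independent array is needed, calling .copy() is the usual way to break the link.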
x = np.linspace(-5, 5, 300)
#x
x.shape
(300,)
We want to plot the very simple Gaussian exp(-x^2/2):
y = np.exp( - x ** 2 / 2.)
import matplotlib.pyplot as plt
plt.plot(x, y)
plt.show()
Note that y is also an array! If we were to do x*y, it would return an array of element-wise multiplication (same as +, etc). Example: if z=x+y, zi=xi+yi. Or, if z=np.exp(x), then zi=np.exp(xi)
Numerical python helps us because it can save us from writing any loops! Or if we feel that we need to write a loop we can consult numpy documentation, because it is likely that a function for that already exists!
Much more interesting things can happen with multi-dimensional arrays: want to plot a 2-D Gaussian of f=exp(-x^2-y^2), x from -1 to 1 and y from -2 to 0.5.
The methods we will learn are scalable: we can do this with as many variables as we like. The resulting array f (made from two arrays of length N) will have N^2 elements. The memory required will grow very quickly with more dimensions! Note that the two arrays do not need to be the same size.
2-D plot formation¶
First we need to generate the x and y arrays:
x = np.linspace(-1., 1., 100)
y = np.linspace(-2., 0.5, 50)
x.shape, y.shape
((100,), (50,))
# f = np.exp(-x**2 - y**2)
This does not work when y has 50 elements because the computer is trying to do element-wise subtraction, which would mean subtracting 50 elements from 100 elements.
One fix would be to change y to have 100 elements as well. At that point, though, f is a 1-dimensional array only! We want it to be 2-dimensional, so we will write f in an element-wise way:
f[i,j] = np.exp(-x[i]**2 - y[j]**2)
Element-wise operations on 1-dimensional arrays only produce the form f[i] = np.exp(-x[i]**2 - y[i]**2). To get two indices we need to make both x and y into 2-dimensional arrays! We will add some fake additional axes (only run this once so that it doesn't keep adding fake axes!)
x = x[:, np.newaxis]
y = y[np.newaxis, :]
x.shape, y.shape
((100, 1), (1, 50))
#x
x has 100 sub-arrays, each of size 1. Essentially, it is a column!
f = np.exp(-x**2 - y**2)
f.shape
(100, 50)
Now, the x dimension is 100 and the y dimension is 50! x and y are no longer 1-dimensional arrays: they are 2-dimensional. The element-wise operations become: z[i,j]=x[i,j]+y[i,j], where an index along an axis of length 1 is treated as 0.
Why do we not get an error now? It is a concept called broadcasting (described well in the NumPy quickstart tutorial). Two arrays can be combined as long as, along each axis, their lengths are either equal or one of them is 1. So 100 rows of length 1 and a single row of 50 elements are OK: the axes of length 1 are stretched to match. For f[i,j], broadcasting essentially creates a matrix!
x = np.linspace(-1., 1., 3)
y = np.linspace(-2., 0.5, 2)
x, y
(array([-1., 0., 1.]), array([-2. , 0.5]))
x = x[:, np.newaxis]
y = y[np.newaxis, :]
x, x.shape
(array([[-1.], [ 0.], [ 1.]]), (3, 1))
y, y.shape
(array([[-2. , 0.5]]), (1, 2))
x + y
array([[-3. , -0.5], [-2. , 0.5], [-1. , 1.5]])
This is easier to see in this format: z[i,j]=x[i,0] + y[0,j], because x has only one column and y has only one row. This has the same structure as an outer product in linear algebra, with addition in place of multiplication!
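To connect the formula z[i,j] = x[i,0] + y[0,j] with code, this sketch compares the broadcast sum against an explicit double loop and against np.add.outer, which computes the same outer-style table. The array named loop exists only for the comparison.

```python
import numpy as np

x = np.linspace(-1., 1., 3)
y = np.linspace(-2., 0.5, 2)

# Broadcast sum of a (3, 1) column and a (1, 2) row.
z = x[:, np.newaxis] + y[np.newaxis, :]
print(z.shape)

# The same table written out index by index.
loop = np.empty((3, 2))
for i in range(3):
    for j in range(2):
        loop[i, j] = x[i] + y[j]
print(np.array_equal(z, loop))
print(np.array_equal(z, np.add.outer(x, y)))

# Without the extra axes the shapes (3,) and (2,) are incompatible.
try:
    x + y
except ValueError as err:
    print("ValueError:", err)
```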
x = np.linspace(-1., 1., 100)
y = np.linspace(-2., 0.5, 50)
x = x[:, np.newaxis]
y = y[np.newaxis, :]
x.shape, y.shape
((100, 1), (1, 50))
f = np.exp(-x**2 - y**2)
f.shape
(100, 50)
plt.imshow(f, origin='upper')
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x10fa86ee0>
This is an example of imshow (image show). The vertical axis may look upside down at first, which is why the origin argument is given explicitly; origin='upper' follows the convention in computer graphics, where the upper left corner is the point (0,0). We could make it a 3-D plot, but this is actually better quantitatively because it is easier to read off numbers here.
Now, we want to add a 3rd dimension¶
x = np.linspace(-1., 1., 100)
y = np.linspace(-2., 0.5, 50)
z = np.linspace(-1., 1., 20)
x = x[:, np.newaxis, np.newaxis]
y = y[np.newaxis, :, np.newaxis]
z = z[np.newaxis, np.newaxis, :]
x.shape, y.shape, z.shape
((100, 1, 1), (1, 50, 1), (1, 1, 20))
f = np.exp(-x ** 2 - y ** 2 - z)
^Discussion: what happens when we say C = A * B? Python looks up A and B first, then allocates an empty array C where the results of the operation will be stored (it creates an instance of a class and then fills it up). What if you also want to multiply by 2, as in C = A * B * 2? First Python multiplies A and B and creates temporary storage that we don't have access to; then it creates a second block of memory that will be C. The data from the temporary array gets multiplied by 2 and stored there. TWO MEMORY LOCATIONS ARE CREATED! If the arrays are small, we don't care. So what do we do if we have a big data set? The longer the expression, the more memory is going to be used. But it is easy to overcome!
f.shape
(100, 50, 20)
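As a rough sketch of the discussion above about C = A x B x 2, the snippet below computes the same result twice: once as a one-liner (conceptually a hidden temporary plus C) and once with a single explicit allocation. The small arrays and the name C_simple are assumptions made for illustration only.

```python
import numpy as np

A = np.arange(5, dtype=np.float64)
B = np.arange(5, dtype=np.float64)

# One-liner: conceptually, A * B lands in a hidden temporary,
# and "* 2" then allocates the array that becomes C_simple.
C_simple = A * B * 2

# Same result with a single allocation:
C = np.empty_like(A)        # the only new block of memory
np.multiply(A, B, out=C)    # write A * B straight into C
C *= 2                      # double in place, no new array
```

For arrays this small the difference does not matter; the point is only that the second form never needs more than one extra array, however long the expression gets.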
The most common operation that we have is the "." to find methods/properties of classes. So the question is: how are classes implemented in Python? What are they actually? Underneath everything is a dictionary! That makes sense for immediate lookup time, etc. The class itself stores its attributes and methods in a dictionary.
ColoredPoint.__dict__
mappingproxy({'__module__': '__main__', '__doc__': '\n Colored point\n ', '__init__': <function __main__.ColoredPoint.__init__(self, color='black', **kwargs)>, '__repr__': <function __main__.ColoredPoint.__repr__(self)>, 'learning_glob_and_loc': <function __main__.ColoredPoint.learning_glob_and_loc(self, A)>})
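Here is a small sketch of how the "." lookup relates to these dictionaries. Point is a hypothetical stand-in class invented for this example (it is not the ColoredPoint class from the notebook), and the "roughly" in the comments is deliberate: real attribute lookup has more steps than this.

```python
class Point:
    """Hypothetical stand-in class for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm2(self):
        return self.x ** 2 + self.y ** 2


p = Point(3, 4)

# Instance attributes live in the instance's own dictionary
instance_attrs = p.__dict__                     # {'x': 3, 'y': 4}

# Methods live in the class dictionary, not in the instance
method_in_class = 'norm2' in Point.__dict__     # True
method_in_instance = 'norm2' in p.__dict__      # False

# p.norm2() is roughly: find 'norm2' in the class dict, then call it with p
same_result = Point.__dict__['norm2'](p) == p.norm2()
```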
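Element-wise multiplication of a 2-D array: if C=AxB, then C[i,j]=A[i,j] x B[i,j]. In these examples A and B always have the same number of dimensions (the length of the shape tuple), but with broadcasting the ranges of i and j in the individual A and B arrays can differ, as long as one of the two mismatched axes has length 1. (NumPy would also pad a shorter shape with leading 1s.)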
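A quick way to check whether two shapes are compatible is sketched below. The arrays are made-up examples; np.broadcast reports the result shape without doing the arithmetic.

```python
import numpy as np

A = np.ones((3, 1))
B = np.ones((1, 4))
C = A * B                                  # C[i, j] = A[i, 0] * B[0, j]

# Shape the broadcast result would have
shape_ok = np.broadcast(A, B).shape        # (3, 4)

# Axes of length 3 and 4 cannot be matched up: neither one is 1
try:
    np.ones((3, 2)) * np.ones((4, 2))
    failed = False
except ValueError:
    failed = True
```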
Everything above can be on the quiz, the next section will not¶
Improving Memory Usage of Numpy (speeding up!)¶
Often when some new method is claimed to be "faster", it is being compared to an old method that is implemented in an unintentionally crappy way. We can speed up code by improving memory usage.
What if you just want to double x? x *= 2 is much faster than x = x * 2 because it does not create a new array!
# x *= y
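This gives an error because of the dimensions! The broadcast result would have shape (100, 50, 1), which does not fit into the memory of x with shape (100, 1, 1). It would need to create new memory.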
f *= x
This is OK because x broadcasts to the shape of f, so the result fits in the memory f already has and no new memory needs to be created.
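The two cases can be checked side by side in a small sketch with 2-D arrays (smaller than the ones in the notebook, chosen only for illustration). The memory address of the data buffer is read from __array_interface__ to confirm that the in-place update does not move f.

```python
import numpy as np

x = np.linspace(-1., 1., 4)[:, np.newaxis]     # shape (4, 1)
y = np.linspace(-2., 0.5, 3)[np.newaxis, :]    # shape (1, 3)
f = np.exp(-x**2 - y**2)                       # shape (4, 3)

buffer_before = f.__array_interface__['data'][0]   # address of f's data
f *= x            # (4, 3) *= (4, 1): the result still fits in f
buffer_after = f.__array_interface__['data'][0]

try:
    x *= y        # result would be (4, 3), which cannot fit in x's (4, 1)
    in_place_failed = False
except ValueError:
    in_place_failed = True
```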
A = np.arange(100)
B = np.arange(100)
So A and B are arrays with the same elements but are physically different blocks of memory.
#np.exp(A)
What if you want to save this into B?
# np.exp(A, out=B)
Why did we get this error? The data in A and B are integers, but exp returns floats! We need to make B into an array of type float!
np.exp(A).dtype
dtype('float64')
B = np.arange(100, dtype=np.float64)  # np.float was removed in NumPy 1.24
#np.exp(A, out=B)
Even if we had defined B before and then said B=exp(A), there would have been ANOTHER memory spot created! This is much better because it just inserts the data into the array B.
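A minimal sketch of the out= behaviour described above, using short made-up arrays (B_int and B_float are illustrative names). The integer buffer is rejected, while the float buffer is filled in place and handed back as the return value.

```python
import numpy as np

A = np.arange(5)                          # integer input
B_int = np.arange(5)                      # integer output buffer
B_float = np.zeros(5, dtype=np.float64)   # float output buffer

try:
    np.exp(A, out=B_int)      # float result cannot be safely cast to int
    int_out_failed = False
except TypeError:
    int_out_failed = True

# Fills B_float in place; the returned object is B_float itself
result = np.exp(A, out=B_float)
```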
Now, say that this is ugly and we are spending too much thought on how we design the code! We are going to use a new module as an example (numexpr, short for "numerical expressions").
from numexpr import evaluate
evaluate accepts a string containing the right-hand side (RHS) of what you want to calculate.
C1 = evaluate("exp(1e-20 * (1e10 * A) ** 2) * B * 2")
C2 = np.exp(A ** 2) * B * 2
all(C1 == C2)
False
By expanding and evaluating on paper, the expressions are equal. So why are these not equal?? It is rounding error (machine precision). Multiplying by a huge number and then by a small number accumulates errors. Compare it to pumping up a balloon until it almost explodes and then to a tiny bit less: the two will look the same! They become indistinguishable because the difference is smaller than what you can resolve.
All methods are bound by this artifact! Every time you do anything, you accumulate errors. The numbers you get will become less and less trustworthy. This is sometimes called "numerically unstable". The stable version of the code is usually a mess by comparison.
Obviously, C1 is equal to C2 though! There is a function that essentially checks for roundoff error. It compares C1 and C2 up to a small tolerance (by default rtol=1e-05 and atol=1e-08) rather than exactly. ALWAYS use this when comparing floats!
np.allclose(C1, C2)
True
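The classic 0.1 + 0.2 case shows the same effect on a tiny scale. This is only an illustrative sketch with made-up arrays a and b; the tolerance rule in the comment is the one given in the NumPy documentation.

```python
import numpy as np

a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

exact = bool(np.all(a == b))       # 0.1 + 0.2 is not exactly 0.3 in floating point
close = bool(np.allclose(a, b))    # the difference is far below the default tolerance

# Tolerances are adjustable; the test is |a - b| <= atol + rtol * |b|
strict = bool(np.allclose(a, b, rtol=0.0, atol=1e-17))
```

Here exact is False, close is True, and strict is False again, because the tolerance was pushed below the size of the rounding error.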
Remember that "eval" will take in a string in python. The "evaluate" is like that but optimized for numpy arrays.
C1 does not create any intermediate arrays! It is still doing it element-wise. It is not creating any problems with memory!
We will not use this method, even though it's pretty cute. We are going to use something else.
Numba¶
from numba import jit
jit = just in time
All of these issues with Python have arisen because it is an interpreted language (e.g. evaluate actually compiles the expression!). So it would be nice to compile some of the code, and this is what "jit" does.
Just start a new function to show how it's done
def f(a, b):
return np.exp(1e-20 * (1e10 * a) ** 2) * b * 2
f(0., 10.)
20.0
#f(A, B) this will still work
Now we will see how to compile it:
@jit
def f(a, b):
return np.exp(1e-20 * (1e10 * a) ** 2) * b * 2
f(0., 10.)
20.0
Let's discuss what @ means (jit is a function): Python will first define f and then set f = g(f), where g is the decorator (here jit). So g is another function that takes a function as an argument and returns a function. Note: adding another jit(f) inside would make it infinitely recursive. JIT means the code gets compiled when you first try to run it! So defining the function like that would be fine, but the computer would freak out as soon as you tried to run it with any inputs.
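To see the f = g(f) rule without involving numba, here is a small sketch with a hand-written decorator. shout, greet and greet_plain are hypothetical names made up for this example.

```python
def shout(func):
    """Hypothetical decorator: wraps func and upper-cases its result."""
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    return "hello " + name


def greet_plain(name):
    return "hello " + name


# Written out by hand: this is what the @ line does
greet_manual = shout(greet_plain)
```

Calling greet("bob") and greet_manual("bob") gives the same string, because the @ line is just shorthand for the manual assignment.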
Now let's time the 3 approaches:
A = np.arange(1000000, dtype=np.float64)  # np.float was removed in NumPy 1.24
B = np.arange(1000000, dtype=np.float64)
%timeit C2 = np.exp(A ** 2) * B * 2
3.8 ms ± 61.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit C1 = evaluate("exp(1e-20 * (1e10 * A) ** 2) * B * 2")
1.15 ms ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%timeit f(A, B)
2.58 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)