Python 3: The Next Generation
Python was created in 1991 -- it's 20 years old now! Lots of time for cruft.
Python 3 is backwards incompatible. Stuff will break. Hopefully the migration will not be too painful; for most programs, the main change will likely be that strings are now Unicode by default.
In 2002 Guido gave a talk called "Python Regrets", which, along with other ideas, later became the basis for Python 3000.
A few of the changes (note from me: they were not all mentioned and I didn't take note of everything either!):
- The print statement becomes the print() function.
- Division: / now performs true division by default (1/2 == 0.5, as opposed to 1/2 == 0 in current Python 2; // gives floor division). This is easier to teach, especially to young learners.
- Dictionary methods such as keys(), values() and items() return lightweight views instead of building new lists, to save memory
- Likewise, built-ins such as map() and filter() return iterators instead of lists, for better speed and/or memory efficiency
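A minimal Python 3 sketch of the changes listed above. The variable names are illustrative only.

```python
# Python 3 only: illustrates the changes listed above.

# print is now a function, so it accepts keyword arguments such as sep.
print("Python", 3, sep=" ")

# / is true division; // keeps the old floor-division behaviour.
half = 1 / 2        # 0.5
floored = 1 // 2    # 0

# keys() returns a view that reflects later changes to the dict.
ages = {"ann": 31, "bob": 27}
names = ages.keys()
ages["cy"] = 45
has_cy = "cy" in names   # True: the view sees the new key

# map() returns a lazy iterator, not a list; it can be consumed only once.
squares = map(lambda n: n * n, range(4))
first_pass = list(squares)    # [0, 1, 4, 9]
second_pass = list(squares)   # []: the iterator is exhausted
```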
With regard to migrating:
- Wait for your dependencies to port
- Have a good test suite
- Move your code to at least 2.6, which is when Python 3 functionality started to be backported
- The -3 switch tells you about incompatibilities
- The 2to3 tool generates diffs showing what needs to change for Python 3
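For code that has to run on both Python 2.6+ and Python 3, one common approach (a sketch, not something the talk prescribed) is to opt in to Python 3 behaviour early with __future__ imports, which must come before any other statement in the module:

```python
# Runs unchanged on Python 2.6+ and Python 3.
# __future__ imports must be the first statements in the module.
from __future__ import division, print_function

ratio = 1 / 2              # 0.5 on both versions, thanks to `division`
print("ratio:", ratio)     # print() behaves as a function on both versions
```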
Be careful with Python 3 books: if they cover 3.0, they are already obsolete. Lots of changes!
This talk aims to fill gaps in the knowledge of people who are not quite beginners anymore. We'll look at the memory model, mutability, and methods.
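Python always passes references to objects (sometimes called "call by sharing"). Changes made to a mutable object inside a function are visible to the caller; immutable objects (like a string or number) cannot be changed in place, so they behave as if they were passed by value.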
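A minimal sketch of this behaviour (the function names are hypothetical): mutating a list argument is visible to the caller, while rebinding a parameter, whether it holds a list or an int, only changes the function's local name.

```python
def add_item(items):
    items.append(4)            # mutates the caller's list in place

def replace_list(items):
    items = [0]                # rebinds the local name only; the caller is unaffected
    return items

def increment(number):
    number += 1                # ints are immutable, so this binds a new int object
    return number

nums = [1, 2, 3]
add_item(nums)                 # nums is now [1, 2, 3, 4]
replaced = replace_list(nums)  # returns [0]; nums is still [1, 2, 3, 4]

count = 10
bigger = increment(count)      # bigger is 11; count is still 10
```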
```python
class Stuff:
    version = 1.0

john = Stuff()
```
As a shortcut, you can read john.version rather than Stuff.version. However, if you assign john.version = "blah", you are not changing the class attribute: you create a new instance attribute on john that hides (shadows) the class 'version' attribute for that instance only.
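Continuing the Stuff example, this sketch (jane is an extra, illustrative instance) shows that assigning through an instance creates an instance attribute that shadows the class attribute, and that deleting it makes the class attribute visible again:

```python
class Stuff:
    version = 1.0              # class attribute, shared by all instances

john = Stuff()
jane = Stuff()

john.version = "blah"          # new instance attribute on john only
john_sees = john.version       # "blah"
jane_sees = jane.version       # 1.0: still reads the class attribute
class_value = Stuff.version    # 1.0: the class attribute is untouched

del john.version               # remove the instance attribute...
john_after_del = john.version  # ...and john sees the class attribute (1.0) again
```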
To initialise a subclass that has a single parent, you can call Parent.__init__(self) directly. Deeper in a hierarchy, and especially with multiple inheritance, prefer super(), which follows the method resolution order (MRO). Extra reading: "Python's super() considered super!" and "Python's Super Considered Harmful".
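A minimal sketch of why super() matters with multiple inheritance. The class names are hypothetical, and zero-argument super() is Python 3 syntax (Python 2 needs super(Child, self)). In this "diamond", super() follows the MRO, so Base.__init__ runs exactly once; calling each parent's __init__ explicitly would run it twice.

```python
class Base:
    def __init__(self):
        self.calls = ["Base"]

class Left(Base):
    def __init__(self):
        super().__init__()         # next class in the MRO, not necessarily Base
        self.calls.append("Left")

class Right(Base):
    def __init__(self):
        super().__init__()
        self.calls.append("Right")

class Child(Left, Right):
    def __init__(self):
        super().__init__()
        self.calls.append("Child")

mro_names = [cls.__name__ for cls in Child.__mro__]
# ['Child', 'Left', 'Right', 'Base', 'object']
child = Child()
# child.calls == ['Base', 'Right', 'Left', 'Child']: each __init__ ran once
```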
You can make your own classes act like built-in Python types by implementing special methods that overload operators and built-in functions. There are loads of them! __init__, __new__ (for immutable objects), __len__, __eq__, __getitem__, __contains__ (for the 'in' keyword), __getslice__ (Python 2 only; Python 3 passes slices to __getitem__), ...
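As a sketch, here is a hypothetical Playlist class that implements a few of these special methods so that len(), indexing, slicing, `in` and `==` work on it. In Python 3, slices arrive in __getitem__ as slice objects.

```python
class Playlist:
    """Hypothetical example class wrapping a list of song titles."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):                 # len(playlist)
        return len(self._songs)

    def __getitem__(self, index):      # playlist[i] and playlist[a:b]
        result = self._songs[index]
        if isinstance(index, slice):
            return Playlist(result)    # slicing returns another Playlist
        return result

    def __contains__(self, song):      # song in playlist
        return song in self._songs

    def __eq__(self, other):           # playlist == other
        if not isinstance(other, Playlist):
            return NotImplemented      # let Python try the other operand
        return self._songs == other._songs

songs = Playlist(["intro", "verse", "outro"])
```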
How to (not) use the stack - performance
The **timeit** module runs a snippet many times (one million by default) and helps you find which implementation is faster.
The second version is faster; that is the penalty of a stack frame. The result of len() is not cached, because it could change (the object may be modified, or len() may be overloaded via __len__), so every call costs a full function call. The second version removes that call.
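The talk's own examples aren't reproduced here. As a hypothetical stand-in, this sketch uses timeit to compare a loop that calls len() on every iteration with one that calls it once up front. Timings depend on the machine and Python version, so none are shown.

```python
import timeit

def count_with_len_call(items):
    i = 0
    while i < len(items):   # len() is called on every iteration
        i += 1
    return i

def count_with_cached_len(items):
    n = len(items)          # call len() once, outside the loop
    i = 0
    while i < n:
        i += 1
    return i

data = list(range(1000))
# number=1000 keeps the run short; timeit's default is 1,000,000 runs.
slow = timeit.timeit(lambda: count_with_len_call(data), number=1000)
fast = timeit.timeit(lambda: count_with_cached_len(data), number=1000)
print(f"len() each time: {slow:.3f}s, cached: {fast:.3f}s")
```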
Objects and references
All Python objects have:
- An identity (similar to the memory address)
- A type
- A value (the only one of the three that can change, and only if the object is mutable)
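A short sketch of these three properties using id(), type() and ordinary comparisons. The variable names are illustrative; in CPython, id() returns the object's memory address.

```python
numbers = [1, 2]
identity_before = id(numbers)
numbers.append(3)                              # the value changes in place...
same_identity = id(numbers) == identity_before # ...but the identity stays the same
same_type = type(numbers) is list              # ...and so does the type

greeting = "hi"
alias = greeting
greeting += "!"      # strings are immutable: a new object is created and bound
# alias still refers to the original "hi" object
```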
References are also called aliases. Each object keeps a reference count tracking how many references point to it. Variables do not hold data; they point to objects. For instance, if two variables are both assigned 1, they both point to the same (immutable) int object representing 1.
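A CPython-specific sketch of aliasing and reference counting. sys.getrefcount() counts a temporary reference for its own argument, so only the change between two calls is meaningful.

```python
import sys

data = [1, 2, 3]
before = sys.getrefcount(data)
alias = data                    # a second reference (alias) to the same list
after = sys.getrefcount(data)   # higher than before

alias.append(4)                 # mutating through the alias...
# ...is visible through data too: data == [1, 2, 3, 4]

x = 1
y = 1
both_point_to_one = x is y      # True in CPython: the same int object
```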
Three ways to classify the standard types
This is the speaker's own classification, nothing official.
- Storage: Literal/Scalar (number, string) vs. Container (list, tuple, dictionary)
- Update: Mutable vs. Immutable
- Access: Direct (number) vs. Sequence (string, list) vs. Mapping (dictionary)
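One rough way to express the three axes in code, as a hypothetical sketch built on collections.abc and numbers (not the speaker's method). The mutability check only recognises types registered with the standard abstract base classes, so treat it as a heuristic.

```python
from collections.abc import Mapping, MutableMapping, MutableSequence, MutableSet, Sequence
from numbers import Number

def storage_model(obj):
    # Numbers and strings hold a single literal value; everything else is
    # treated as a container (a simplification).
    return "literal/scalar" if isinstance(obj, (Number, str, bytes)) else "container"

def update_model(obj):
    mutable_types = (MutableSequence, MutableMapping, MutableSet)
    return "mutable" if isinstance(obj, mutable_types) else "immutable"

def access_model(obj):
    if isinstance(obj, Number):
        return "direct"
    if isinstance(obj, Mapping):
        return "mapping"
    if isinstance(obj, Sequence):
        return "sequence"
    return "other"

for value in (42, "text", [1, 2], (1, 2), {"k": "v"}):
    print(repr(value), storage_model(value), update_model(value), access_model(value))
```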
In CPython, the small integers in range(-5, 257), i.e. -5 to 256, are "interned": they are preallocated, shared by every reference to those values, and never deallocated.
"is" is an operator that checks whether two variables point to the same object (identity), whereas == compares values.
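A CPython-specific sketch of the small-integer cache. int() builds the numbers at run time, because identical literals in the same code block may be merged into one constant by the compiler, which would hide the effect.

```python
a = int("256")
b = int("256")
c = int("257")
d = int("257")

small_shared = a is b    # True: 256 comes from the small-integer cache
large_shared = c is d    # False: two distinct int objects with the same value
same_value = c == d      # True: == compares values, "is" compares identity
```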
Beware of shallow vs. deep copies when you have a list of lists: a shallow copy duplicates only the outer list, so the inner lists are still shared.
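A minimal sketch with the standard copy module, showing that only the deep copy is protected from changes to the inner lists:

```python
import copy

grid = [[0, 0], [0, 0]]
shallow = copy.copy(grid)     # new outer list, but the same inner lists
deep = copy.deepcopy(grid)    # new outer list and new inner lists

grid[0][0] = 99
# shallow[0][0] is now 99 as well; deep[0][0] is still 0
```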
Growing memory: when you call .append() on a list that is full, CPython over-allocates. It first makes room for 4 elements, then 8, then 16; after that, each resize adds roughly 12.5% (one eighth) extra free slots in advance. Because this over-allocation is proportionally largest for small lists, you should try not to keep a lot of short lists around.
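A CPython-specific sketch that records the list lengths at which sys.getsizeof() reports a larger allocation. The exact sizes and growth points vary between Python versions and platforms, so no output is shown.

```python
import sys

items = []
last_size = sys.getsizeof(items)
growth_points = []   # (length after append, new size in bytes)

for n in range(100):
    items.append(n)
    size = sys.getsizeof(items)
    if size != last_size:        # the allocated capacity changed
        growth_points.append((len(items), size))
        last_size = size

for length, size in growth_points:
    print(f"grew at length {length:3d}: {size} bytes")
```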
Shrinking a list is a fairly inexpensive operation.
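Inserting into the middle (or the front) of a list shifts every element after the insertion point. A deque (collections.deque) is better for adding and removing items at either end (append()/appendleft(), pop()/popleft()), though slower for indexed access towards the middle.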
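A sketch comparing repeated front insertion on a list and on a deque (the function names are hypothetical); timings vary by machine, so none are shown.

```python
from collections import deque
import timeit

def fill_list_front(n):
    items = []
    for i in range(n):
        items.insert(0, i)      # shifts every existing element one slot right
    return items

def fill_deque_front(n):
    items = deque()
    for i in range(n):
        items.appendleft(i)     # constant time at either end of a deque
    return items

list_time = timeit.timeit(lambda: fill_list_front(5000), number=10)
deque_time = timeit.timeit(lambda: fill_deque_front(5000), number=10)
print(f"list.insert(0, x): {list_time:.3f}s  deque.appendleft(x): {deque_time:.3f}s")

queue = deque([1, 2, 3])
queue.append(4)           # add on the right
first = queue.popleft()   # remove from the left
```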