Archives for July 2011


EuroPython 2011: Simon Willison on Advanced Aspects of the Django Ecosystem

Links: Talk description, video and slides.

This talk covered 3 tools that can be considered secret weapons: they offer great payoffs for little effort.

(Note: the slides enhance most of these concepts with lots of code examples, have a look!)

Haystack

Haystack provides modular full-text search for Django. It's very easy to get a nice search interface if you already use the Django ORM, and you can define the queryset to restrict search results to what you want (e.g. only published entries).

You can have different templates/html bits depending on the type of objects returned by the search.

Scaling/Backend

  • Whoosh (Python) - Good if you have no write traffic, and not a lot of traffic in general
  • Xapian (C)
  • Solr (Java) tends to be the default choice. It has an HTTP interface, and lots of features are already baked in, like filtering by object type. Objects can be associated with a template, although this seems to be more about relevance than display: the speaker mentioned including the title twice in the template to increase its weight in search results. Combined with SearchQuerySet it scales, and it's faster than complicated, crazy SQL.

Search indexes usually don't like being updated much, and Haystack offers several solutions. Sites with low write traffic can update the index in real time on every change, or changes can be batched every 6 hours. At a higher scale, you have to roll your own solution. For Lanyrd, they added a "needs_indexing" boolean to their models that defaults to True and is also set in the save() hook. Then a management command (or something similar) can look at what needs to be indexed, process it and set the flag back to False.
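Here's a minimal sketch of the "needs_indexing" pattern in plain Python, assuming a toy `Entry` class in place of a Django model and a dict in place of the search index. All names are hypothetical, not Lanyrd's code. In a real Django app, `save()` would also call the parent's `save()`, and the batch job would query `filter(needs_indexing=True)` instead of scanning a list.

```python
# Toy stand-in for a Django model: `Entry` and `reindex_pending` are
# hypothetical names, not Lanyrd's or Haystack's actual code.
class Entry:
    def __init__(self, pk, title):
        self.pk = pk
        self.title = title
        self.needs_indexing = True  # new rows start out dirty

    def save(self):
        # Any change marks the row as needing a (re)index.
        self.needs_indexing = True


def reindex_pending(entries, index):
    """Batch job: index dirty entries, then clear their flag."""
    count = 0
    for entry in entries:
        if entry.needs_indexing:
            index[entry.pk] = entry.title  # stand-in for a search backend update
            entry.needs_indexing = False
            count += 1
    return count
```

Running the job twice in a row indexes everything the first time and nothing the second; only rows saved in between get picked up again.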

Solr has master/slave capabilities and knows how to balance reads between the slaves; writes should be sent to the master. Haystack only knows how to talk to one URL, but with nginx it's possible to load balance and to set up different proxies depending on the URL, to make sure the writes go where they should -- remember, Solr speaks HTTP.
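To make the routing rule concrete, here's a small Python sketch of the decision nginx would be making. It assumes writes go through Solr's `/update` handler and everything else is a read; the hostnames and the `pick_backend` helper are hypothetical, and in the setup described above this logic lives in the nginx config, not in Python.

```python
from itertools import cycle

# Hypothetical hostnames; the talk did this routing in nginx, not Python.
MASTER = "http://solr-master:8983"
SLAVES = cycle(["http://solr-slave1:8983", "http://solr-slave2:8983"])


def pick_backend(path):
    """Route index updates to the master and spread reads over the slaves."""
    segments = path.split("?", 1)[0].split("/")
    if "update" in segments:  # assumes writes go through Solr's /update handler
        return MASTER
    return next(SLAVES)
```

Reads alternate between the two slaves in round-robin order, while any update request always lands on the master.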

Celery

Celery is a fully featured, distributed task queue manager. Any task that would take more than 200ms should go on the queue! For instance...

  • Updating the search index
  • Resizing images
  • Hitting external APIs
  • Generating reports

With the @task decorator, the function works normally when called directly, but also gains a delay() method that adds the call to the queue, to be picked up by workers.
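The shape of that API can be illustrated with a toy decorator built on the standard library's `queue.Queue`. This is only an imitation of the calling convention, not Celery's implementation: real Celery serialises the call, sends it to a broker, and `delay()` returns a result handle, whereas this sketch just records the call in-process. The `task`, `run_worker_once` and `resize_image` names are hypothetical.

```python
import queue

# Toy imitation of the decorator shape described above; not Celery itself.
task_queue = queue.Queue()


def task(func):
    def delay(*args, **kwargs):
        # Record the call instead of running it now.
        task_queue.put((func, args, kwargs))
    func.delay = delay  # the plain function stays callable as before
    return func


def run_worker_once():
    """What a worker does: pop one queued call and execute it."""
    func, args, kwargs = task_queue.get_nowait()
    return func(*args, **kwargs)


@task
def resize_image(name, width):
    return f"{name} resized to {width}px"
```

Calling `resize_image("cat.jpg", 200)` runs immediately, while `resize_image.delay("cat.jpg", 200)` only queues the work for a worker.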

For tasks launched by users (such as uploading a picture or figuring out what's at a url):

  • To handle users with or without JavaScript: if 'ajax' in request.POST, show an inline loading indicator; otherwise, redirect.
  • Use memcached for automatic housekeeping: if the user closes the browser and doesn't come back, the task isn't kept around forever. The oldest entries get dropped automatically after a few hours.
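The housekeeping idea relies on cache entries expiring on their own. Here's a minimal in-memory stand-in that models only memcached's per-key timeout (not its memory-based eviction); the `ExpiringCache` class and the `task:42` key are hypothetical. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time


class ExpiringCache:
    """In-memory stand-in for memcached's per-key timeout (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def set(self, key, value, timeout):
        self._data[key] = (value, self._clock() + timeout)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if self._clock() >= expires_at:
            # Expired: drop it, so abandoned task statuses clean themselves up.
            del self._data[key]
            return default
        return value
```

Storing a task's status with, say, a 3-hour timeout means a user who never comes back leaves nothing behind once those hours pass.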

Use celerybeat for scheduling, celerymon for monitoring the worker cluster and celerycam to take snapshots -- this helps figure out when and where things go wrong.

The activity stream pattern gives everyone an "inbox": when something needs to reach many people, like a tweet, it gets written to every recipient's stream. Redis can handle 100,000 writes/second and is a handy tool for this; it's also the kind of task that's a great candidate for queueing.
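A minimal fan-out-on-write sketch, using Python's `deque` with `maxlen` to keep each inbox capped. The `ActivityStreams` class and the inbox size of 3 are hypothetical choices for illustration. With Redis, each `appendleft` would roughly correspond to an `LPUSH` followed by an `LTRIM` on the follower's list, and the whole `publish` loop is the part you'd push onto a queue.

```python
from collections import defaultdict, deque

INBOX_SIZE = 3  # keep only the latest items, like a capped Redis list


class ActivityStreams:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> set of follower ids
        self.inboxes = defaultdict(lambda: deque(maxlen=INBOX_SIZE))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, author, message):
        # Fan-out on write: copy the message into every follower's inbox.
        for follower in self.followers[author]:
            self.inboxes[follower].appendleft(message)

    def inbox(self, user):
        return list(self.inboxes[user])  # newest first
```

Reading an inbox is then a single cheap lookup, at the cost of one write per follower each time someone publishes.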

Fabric

Fabric is great for automated, repeatable deployments, and it also makes rolling back easier. You could use Chef or Puppet, which are ridiculously powerful but quite complex to set up. Fabric generally fits the developer's mental model better: it more or less wraps your current processes in Python.

For instance, you can create a clear_cache() that calls flush_all() on the cache. Then, to clear the cache on your server, call from your machine:

fab -H host1,host2 clear_cache

The fabfile (fabfile.py) is version controlled and therefore documents your process -- so you don't forget how to do it 6 months from now!

env is a global variable used by Fabric; you can add your own variables to it and reuse them in other commands. For instance, env.deploy_date can store the deployment date and time, making it easier to roll back and roll forward (he uses symlinks).
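The symlink approach can be sketched locally with the standard library. This assumes a hypothetical layout of `releases/<deploy_date>` directories plus a `current` symlink pointing at the active one. In a fabfile the same steps would run on the servers as shell commands, so treat this as an illustration of the idea, not the speaker's fabfile.

```python
import os
import tempfile


def _switch(base, target):
    # Create the new link next to the old one, then rename it over
    # `current` (an atomic replace on POSIX systems).
    tmp = os.path.join(base, "current.tmp")
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, os.path.join(base, "current"))


def deploy(base, deploy_date):
    """Create releases/<deploy_date> and point `current` at it."""
    release = os.path.join(base, "releases", deploy_date)
    os.makedirs(release, exist_ok=True)
    _switch(base, release)
    return release


def rollback(base):
    """Point `current` back at the release before the active one."""
    releases = sorted(os.listdir(os.path.join(base, "releases")))
    active = os.path.basename(os.readlink(os.path.join(base, "current")))
    index = releases.index(active)
    if index == 0:
        raise ValueError("no earlier release to roll back to")
    previous = releases[index - 1]
    _switch(base, os.path.join(base, "releases", previous))
    return previous


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    deploy(base, "20110701120000")  # deploy_date as YYYYMMDDHHMMSS sorts by time
    deploy(base, "20110702093000")
    print(rollback(base))  # 20110701120000
```

Because old releases stay on disk, rolling forward again is the same symlink switch in the other direction.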

They use a servers.json configuration file that documents the instance id, public dns, private dns, names and roles (solr, memcached, etc). Fabric can use this to deploy, nginx can use it to load balance, Django can import it in the settings to know what to talk to.
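Here's a sketch of how one such file can feed different consumers. The JSON schema below is an assumption: the talk only said the file lists instance ids, public and private DNS, names and roles. The `hosts_for_role` helper is hypothetical.

```python
import json

# Hypothetical schema based on the fields mentioned in the talk.
SERVERS_JSON = """
[
  {"instance_id": "i-0001", "name": "web1", "public_dns": "web1.example.com",
   "private_dns": "10.0.0.1", "roles": ["web", "memcached"]},
  {"instance_id": "i-0002", "name": "search1", "public_dns": "search1.example.com",
   "private_dns": "10.0.0.2", "roles": ["solr"]}
]
"""


def hosts_for_role(servers, role, field="private_dns"):
    """Return the chosen address of every server that has `role`."""
    return [s[field] for s in servers if role in s["roles"]]


servers = json.loads(SERVERS_JSON)
```

A fabfile might deploy to `hosts_for_role(servers, "web", "public_dns")`, while Django settings would use the private addresses to find the Solr or memcached boxes.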

Squishy Circuits Workshop: A success!

The Squishy Circuits workshop organised in Tog yesterday was a success! Many thanks to awesome teacher Triona for showing us how to make a mess, I mean, have a lot of fun with conductive dough and learn a few basics of electricity and electronics :) Triona also shared tips on teaching the material, especially to kids.

Picture of a group of people during the workshop

Thanks a lot also to all our attendees for making the event such a success (and for helping clean up afterwards :-))! Notably, this was a very kid-friendly event, and I hope we can have more of these in the hackerspace, to show the next generation of could-be scientists first-hand how fun and interesting science and technology are!

A tiny sample of the afternoon's creations:

(Update: You can see more pictures over at the Tog website)

EuroPython 2011: Simone Federici on Django Productivity Tips and Tricks

Links: Talk description, video and slides


Know the environment

Use Linux with the Terminator terminal emulator.

Use a version control system.

Use virtualenv to manage different Python versions and dependencies, e.g.

virtualenv path --python=python2.7 --no-site-packages
system libs: ./configure --prefix=envpath ; export LD_LIBRARY_PATH=envpath/lib

Use yolk, a handy tool to query PyPI and the status of installed packages:

yolk -l (to see installed packages)
yolk -U (to see if there are updates on pypi)
yolk -a

Use Django's bash completion script.

Use djangodevtools (see its PyPI page and site for a description of the new commands) for lots of useful things, such as adding test coverage (./manage.py cover test myapp).

Continuous integration

Use Buildbot and Django-Buildbot (note: there was a configuration example on the speaker slides).

In settings.py:

try:
    # Optional per-machine overrides
    from setting_local import *
except ImportError:
    pass

setting_local.py shouldn't be shared or checked in.

Thanks to the alias feature of djangodevtools, you can easily create commands to do whatever you want, e.g. clean up RabbitMQ queues. The commands are stored in a shared manage.cfg file.

uWSGI is an application server with many options. --auto-snapshot sounds quite interesting. It supports clustering. It'll be in the official Django deployment documentation from the next release (ticket 16057).

Coding

get_profile() tends to be a problem. It's possible to monkey patch User.get_profile() (to make sure a new profile is created if it doesn't exist), but you have to be careful about where the patch is loaded. Another option is a proxy model (class Meta: proxy = True) together with a new middleware, set up after the authentication middleware.

Django model form factory (django.forms.models.modelform_factory) sounds interesting to create forms more quickly.

uWSGI

There was an on-the-fly short talk on uWSGI after the talk, by someone whose name I didn't pick up. It can talk to many protocols, it has lots of plugins so you can only use what you need. It's not the fastest but speed isn't the main factor that should make you decide to use it.

EuroPython 2011: Brian Fitzpatrick on the Myth of the Genius Programmer

Link: talk description and video

A lot of user questions for the Google Code project are along the lines of "how do I hide code until it's ready", "how can I wipe the history and start from scratch" and so on: they are about people's insecurities.

When you combine elitism and anonymity, suddenly everyone is elite. There's a whole mythology built around programming heroes (Torvalds, van Rossum, Wall).

There are no geniuses. The only thing this mythology has created is a fear of showing mistakes. This insecurity inhibits progress, slows down the process, and leads to lower quality and a lower bus factor.

Avoiding the trap

  • Lose the ego
  • Criticism is not evil: give and take it politely. You are not your code. People criticising your code are not out to get you.
  • Embrace failure (just don't fail the same way over and over!)

The speaker shared an interesting story (probably apocryphal?) of an executive who makes a mistake that costs his company 10 million dollars. The next day, he starts packing up his things, and when the CEO summons him to his office, he's ready to hand in his resignation, saying there is no need to fire him. The CEO replies "Fire you? Why would I do that? I just spent 10 million dollars training you!"

  • Iterate quickly
  • Practice is key
  • Be a small fish, as in don't be the smartest person in your company. You'll learn much more and faster
  • Be influenced. "Alpha" programmers think they know everything and won't ever listen -- you may find that you actually gain more influence, by being willing to be influenced!
  • Be vulnerable, repeated vulnerability becomes a strength long term.

Tools

Tools won't solve sociological problems, but they can influence behaviour. Pay attention to your tools: they can influence culture and morale, for instance by encouraging "cave" behaviour, where developers work alone for a long time and then dump a big chunk of code. That's bad for collaboration, reviewing, etc.

You don't need to hide a project until it's "ready." Simply don't advertise it. People may find you because they are looking for something like this.

Don't wait until it's too late to let people collaborate: they may help with code reviews, or point out existing libraries you missed. If you bring them in too late in the project, they have no chance to help steer it or to become a strong part of it, and they are less likely to contribute.

Certainly get a prototype ready, with some running code and some design, but keep it at a stage where you're still happy to backtrack on it.

Conclusion

  • Don't try to be a genius
  • Collaborate early and often
  • Pay attention to default behaviours (the ones encouraged by tools especially)
  • Pay attention to timing

...and if you do all this, people will think you're a genius!

Some of the questions/responses

Make sure to write a manifesto with the direction you want for your project from early development, so there are no major clashes or misunderstandings later on when people get involved.

If you don't care about credit, wow will you go places. If someone "steals" your idea and they have more reach (e.g. more clout/connections), it's great! It means it's more likely the idea will be implemented, and you'll have more ideas anyway.

On influencing a new team you just joined with best/better practices: start by doing good work and building up a good reputation, then you get to pay it back on something you believe matters. You have to choose your battles, you can't step in front of every train (resistance to change is like a very fast train! Hard to stop.)

EuroPython 2011: Raymond Hettinger on Python Tips, Tricks and Idioms

Link: Talk description and video

I couldn't find the slides online but please do link me if you find them, they were a treasure trove of awesome tips and very well laid out!

The talk touched on many things, the following tips are unrelated to each other and in no particular order.


Beware the corner cases of negative slicing -- if you use a variable for slicing there's likely a bug lurking in your code!
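The speaker's exact example isn't in my notes, but a classic instance of the problem is taking the last `n` items with `seq[-n:]`: when `n` is 0, `-0` is just `0`, so the slice returns the whole sequence instead of nothing. The helper names below are hypothetical.

```python
seq = [10, 20, 30, 40]


def last_n_naive(seq, n):
    return seq[-n:]  # looks right, but seq[-0:] is seq[0:], the whole list


def last_n(seq, n):
    # Compute the start index explicitly; max() also handles n > len(seq).
    return seq[max(len(seq) - n, 0):]
```

Both versions agree for `n = 2`, but only `last_n` returns an empty list for `n = 0`.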

mytuple = 1, 2, 3 # Tuple declaration also works without parentheses

Learn Python the way Guido intended it and indented it :P

for x in reversed(seq): # Clearer than seq[::-1]: there's no need to do a double take

for i, x in enumerate(seq): # That's Pythonic! Forget about for i in range(len(seq)) and seq[i]

for x, y in zip(seqx, seqy): # Walk two sequences in lockstep

for/else: a for loop with an else, the else executes if there was no break in the loop, that is, when the loop runs to completion. It's only useful when there is a break (the searching use case).
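The searching use case looks like this; `find_index` is just an illustrative name. The `else` clause acts as the "not found" branch, which saves you from keeping a separate `found` flag.

```python
def find_index(seq, target):
    for i, x in enumerate(seq):
        if x == target:
            break  # found it: the else clause is skipped
    else:
        # Only runs when the loop finished without hitting break
        # (including when seq is empty).
        return -1
    return i
```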

None is always smaller than everything else in comparisons (in Python 2; Python 3 raises a TypeError when comparing None with, say, an int).
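If you need the old "None sorts first" behaviour in Python 3, a key function can reproduce it. This is a workaround I'm suggesting, not something from the talk: the key turns each value into a tuple whose first element is `False` for None, so the Nones are never compared against ints.

```python
values = [3, None, 1, None, 2]

# (False, None) sorts before (True, <number>), so None values come first
# and are never compared with ints.
none_first = sorted(values, key=lambda x: (x is not None, x))
```

Flipping the key to `(x is None, x)` moves the Nones to the end instead.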

Sorting a list of sets doesn't give you a guaranteed order, because for sets __lt__ has been overridden to indicate subsets and supersets.
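A quick illustration: `<` on sets means "is a proper subset of", which is only a partial order. Two unrelated sets are neither smaller nor larger than each other, so `sorted()` has nothing meaningful to work with. An explicit key restores a well-defined order.

```python
a, b, c = {1, 2}, {3}, {1}

is_subset = c < a            # True: {1} is a proper subset of {1, 2}
unrelated = (a < b, b < a)   # (False, False): neither contains the other

# Sort by an explicit key instead of relying on set comparisons.
by_size = sorted([a, b, c], key=len)        # stable for equal sizes
by_contents = sorted([a, b, c], key=sorted)  # compares sorted element lists
```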

If you intend to define __lt__ you must also implement the 6 ordering functions.
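The standard library can fill in most of that boilerplate: `functools.total_ordering` derives the remaining ordering methods from `__eq__` and one of `__lt__`/`__le__`/`__gt__`/`__ge__`. I'm not sure it was shown in the talk; the `Version` class below is just an illustration.

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major, self.minor = major, minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()
    # __le__, __gt__ and __ge__ are generated by total_ordering;
    # in Python 3, __ne__ is derived from __eq__ automatically.
```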

# Key function:
sorted(s, key=str.lower) # (awesome!)

If a class takes in an iterable and emits an iterable, lots of awesome can happen with the prebuilt tools, and in other surprising ways (it works a bit like pipes and filters).
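A small pipes-and-filters sketch of that idea, using generator functions instead of classes. Each stage consumes an iterable and yields an iterable, so stages can be chained freely, or combined with the prebuilt tools in `itertools`. The stage names are hypothetical.

```python
def read_lines(text):
    for line in text.splitlines():
        yield line


def strip_comments(lines):
    for line in lines:
        code = line.split("#", 1)[0].strip()
        if code:  # drop blank and comment-only lines
            yield code


def numbered(lines):
    for i, line in enumerate(lines, 1):
        yield f"{i}: {line}"


source = "x = 1  # set x\n# only a comment\n\ny = 2\n"
result = list(numbered(strip_comments(read_lines(source))))
```

Nothing runs until `list()` pulls items through the chain, and each line flows through all three stages one at a time.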

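from collections import defaultdict

d = defaultdict(dict)
d[stuff][otherstuff] = "blah"  # the inner dict is created on first access
# as opposed to
d = dict()
if stuff not in d:
    d[stuff] = dict()
d[stuff][otherstuff] = "blah"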

d.update(dict(d))

EuroPython 2011: Mark Ramm on Relate or !Relate

Link: Talk description and video

This talk was about non-relational databases. I didn't take a lot of notes :o) The most important moral is probably: don't keep the mind-altering substances and the tools in the same shed.

With 2 decades behind them, relational databases are pretty robust by now. They cover a spectrum of ACID compliance: for instance MySQL is faster, Postgres is more reliable (though becoming faster... if you tick off the reliability options!). Relational databases are supposed to be normalised, except they are not really: there is a spectrum here too, as databases tend to get denormalised for performance reasons.

Amazon uses an "eventually consistent" system, which they can pull off by only charging at shipping time. Conflicts are rare; if 2 orders are placed and there is only 1 item available, someone might get a gift certificate instead.

The NoSQL taxonomy includes wildly different tools that don't have much in common except for the fact that they don't use SQL: key-value stores, document stores, and so on.

CAP: Consistency, Availability, Partition tolerance. You can have 1 or 2, not all 3. (Brewer's Theorem)

There was only 1 Postgres database for all of SourceForge for a long time, while they were in the top 100 sites. Don't obsess about scale you'll never achieve.

One of the questions was about how difficult it is to convert from a relational database to NoSQL. The answer: from something like Postgres to MongoDB, it wouldn't be that much work (though he did suggest 4 people for 6 weeks, which doesn't sound that trivial to me). Changing to Cassandra, on the other hand, would be a huge effort.

Successful origami workshop

The origami workshop in Tog on Friday night was a success \o/

Turnout was great despite a couple of late cancellations, another Tog member with strong origami powers stopped by to help out if needed, and everybody seemed to be having fun. I think everybody learnt something new -- I certainly did, though I could do with being shown again how to fold some of the shapes we made!

People came with varied levels of origami skills, and of maturity :-)

I need to start bringing a real camera to these events; my phone doesn't do them justice. I'm really enjoying these beginner-friendly workshops on different topics. If you're around Dublin and have some interesting technological or artistic skills you'd like to share over an evening or weekend afternoon, please get in touch with me!

EuroPython 2011: Mark Ramm on A Python Takeover

Link: Talk description and video

Two and a half years ago, SourceForge was all PHP except for one little Python service. SourceForge was originally written in 1999, when Python wasn't so great at the web, and after 10 years it had all started to atrophy.

So SourceForge decided to run a little experiment and assigned 2 people to create FossFor.us. For the web framework they picked Django, thinking that Python was good and impressed by the Django documentation: there was no real learning curve. CouchDB was a bit slow, though.

The experiment was a success (although the site is now gone, due to managerial reasons if I understood correctly) and so they were then given 6 weeks to redo the download page/files of SourceForge. It had to be dynamic, to offer you a download relevant to your platform: they use the user agent string to figure out the operating system, and borrowed code from setuptools to figure out which release is the latest.
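Both pieces of logic can be sketched in a few lines, though the real code was surely more thorough. The OS patterns below are deliberately crude assumptions, and `version_key` is a naive stand-in for the setuptools version parsing they borrowed (`pkg_resources.parse_version` also handles pre-release tags and other cases this ignores). All names are hypothetical.

```python
import re

# Hypothetical, deliberately crude rules; real user agent parsing is messier.
OS_PATTERNS = [
    ("windows", re.compile(r"Windows")),
    ("mac", re.compile(r"Mac OS X|Macintosh")),
    ("linux", re.compile(r"Linux|X11")),
]


def detect_os(user_agent):
    for name, pattern in OS_PATTERNS:
        if pattern.search(user_agent):
            return name
    return "unknown"


def version_key(version):
    # Naive numeric comparison, so "1.10" sorts after "1.9".
    return tuple(int(part) for part in re.findall(r"\d+", version))


def latest(releases):
    return max(releases, key=version_key)
```

Comparing versions as tuples of integers is what avoids the string-comparison trap where "1.10" would come before "1.9".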

Their admins loved Apache so they went with mod_wsgi, which worked out well.

The whole system worked fine on a laptop handling all the traffic from SourceForge: that was the only load testing! Unfortunately, when the system went live, it took about 8 seconds to load a page, with hardly any CPU or memory usage. It turned out they had saturated their gigabit Ethernet card by loading all the releases, which some projects have a lot of (e.g. JBoss). Memcached turned out to be slower than MongoDB because of pickling, the 4-megabyte objects and CPU load.

Finally they updated the list to not include all releases and the project was deemed a success! Thus everything shall become Python. They are developing their new platform openly, Allura. They use a FUSE filesystem in Python to control permissions.

From now on, they have an internal mandate that everything should be written using Python unless there is a good reason not to. And now SourceForge can compete again! As well as explore new directions.

EuroPython 2011: Lightning Talks

The lightning talks were very fast-paced (5 minutes each), so I only jotted down some project names I want to check out and interesting tidbits, and missed the speakers' names just about every time. Sorry!

To easily create diagrams, check out blockdiag. It includes different shapes that make me hopeful it might be a less painful way to do nice topologies.

Someone's project to remove the GIL in PyPy (future work):

global_lock.acquire() / .release()
object.acquire()
with Transaction():

Learning a language in 60 hours: http://sotospeak.se/ (English homepage). It's a piece of software for your mobile phone, that encourages you to learn a language like children do. It's written in Python.

shlex for simple lexical analysis.
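The most common entry point is `shlex.split`, which tokenises a string the way a POSIX shell would, honouring quotes and backslash escapes. The command string below is just an example.

```python
import shlex

# Quotes group words together and a backslash escapes the following space.
args = shlex.split('cp -v "my file.txt" backup\\ dir/')
```

This gives `['cp', '-v', 'my file.txt', 'backup dir/']`, which is what a naive `str.split()` would get wrong.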

Python Edinburgh are a bunch of cool folks with their own conference :)

DjangoZoom, effortless deployment for Django (like Heroku?)

pip install null, if the need to use the Null object pattern arises.
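For reference, here's a hand-rolled version of the pattern itself. It does not reproduce the API of the `null` package on PyPI. A Null object accepts any call or attribute access and does nothing, which lets code drop `if logger is not None:` checks. The `process` function is a hypothetical example of where it helps.

```python
class Null:
    """Hand-rolled Null object: swallows calls and attribute access."""

    def __call__(self, *args, **kwargs):
        return self

    def __getattr__(self, name):
        # Only called for attributes that don't exist, i.e. all of them here.
        return self

    def __bool__(self):
        return False

    def __repr__(self):
        return "Null()"


def process(items, logger=Null()):
    for item in items:
        logger.info("processing %s", item)  # no-op unless a real logger is passed
    return len(items)
```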

EuroPython 2011: Nicholas Tollervey on the London Python Code Dojo

Link: Talk description and video

The Python Code Dojo is a community organised monthly meeting.

Dojo

A dojo is a place where you go to practice stuff, learning is a continuous process. It's based on the idea of deliberate practice.

Paris

Codingdojo.org was started in Paris, where it follows a very structured format.

Katas are forms that you practice to prepare yourself. You learn how to solve a problem using baby steps. In Paris they do this in silence, unless you really don't understand and have to ask a question. "Randori kata" is public pair programming, with a pilot and a co-pilot that solve a problem on stage.

London

The London Dojo works more like a seminar, and attendees are encouraged to interrupt. Participation is expected. They run team dojos where each team must solve a problem within a timeframe. Problems are written on a blackboard, people vote for one, and then everyone works on solving it in teams of 5 or 6 people over 1h30. Finally there is a show, tell, review and question session where each team presents their solution/approach.

Why participate in a dojo?

  • The educational benefit, of learning by doing
  • You can fail safely in a sympathetic environment, and experiment
  • People teach one another, all levels can attend
  • You build a community: in London, that's relaxing with pizza and beer

What's a good dojo?

From the attendee's perspective: it's fun, you get to solve problems, it's safe to make mistakes, show and tell is encouraged which is good to get feedback.

From the organiser's perspective: it self-organises, mostly.

To see if it's going well: see that there is a positive aim, something is done to reach this aim, with some sort of feedback at the end.

Personal observations

Beware of systems and gurus. Ignore systems if something else works for you, you can actually do damage otherwise. Learn to practice learning!

Q&A tidbits

When they (or another dojo?) started using meetup.com, they doubled their numbers! Eventbrite works too: the idea is to have a centralised system, with tickets to predict attendance.