While it’s not your typical nerds-in-T-shirts meet-up, if you use Python to hack data, this conference is
Attendance
Curiously, for a ‘data science’ conference, the attendance list (which I would have crawled LinkedIn with…heh) was not available. Based on my (biased) observations, the attendance was roughly as follows…
Type | Sub-type | Percentage (%) |
---|---|---|
Industry | (total) | 70 |
Industry | Self-employed | 20 |
Industry | SME | 40 |
Industry | Sponsors | 10 |
Industry | Large | <1 |
Academia | (total) | 30 |
Academia | Ugrad | <1 |
Academia | Masters | <1 |
Academia | PhD | 15 |
Academia | Postdoc | 5 |
Academia | Professor | 10 |
Government | (total) | <1 |
* Self-employed contractors and consultants were very well represented.
Conference feel
A ‘your data’ conference, not a ‘big data’ conference
“Hadoop has delivered value for <10% of the companies that have installed it”
- Paraphrase, anon

This conference is data focused, i.e. focused on using the Python ecosystem to solve your data challenges. The focus is on practice, and practical tools, not theory.
Type | Approx Size | Appropriate tools |
---|---|---|
Micro-data | <1 GB | IPython |
Small-data (Memory-limited) | ~10 GB | pandas |
Medium-data (Disk-limited) | <1 TB | Ad-hoc databases |
Big-data | TB - PB | Consider enterprise solutions, or grep |
Paul Agapow’s message was very clear. The life sciences tend to have very detailed, very heterogeneous data in hundreds to thousands of rows (small/medium data), so let the data guide the solutions. You probably don’t need enterprise software, so just don’t waste your money.
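To make the table concrete, here is a minimal sketch that maps a dataset size onto the tiers above. The function name `suggest_tooling` and the exact cut-offs are my own assumptions (the table only gives approximate sizes), so treat it as an illustration rather than a rule.

```python
# Illustrative only: cut-offs are assumptions based on the approximate
# sizes in the table above, not hard limits.
GB = 10**9
TB = 10**12


def suggest_tooling(size_bytes: int) -> str:
    """Return the table's tool suggestion for a dataset of the given size."""
    if size_bytes < 1 * GB:
        return "IPython"
    if size_bytes <= 10 * GB:
        return "pandas"
    if size_bytes < 1 * TB:
        return "ad-hoc databases"
    return "consider enterprise solutions, or grep"


# A few thousand wide, heterogeneous rows is roughly 10 MB here,
# which is nowhere near the big-data tier.
print(suggest_tooling(5_000 * 2_000))  # prints "IPython"
```

The point is less the code than the order of the checks: start from the size of the data you actually have and only reach for heavier tools once the lighter ones stop fitting.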
A “Python is useful” conference, not a “Python is deity” conference
“All tools are shyte, but some tools (Python!) are useful.”
- Paraphrase, anon

Speakers like Russel Winder, with his talk on the lack of computational efficiency in Python even when using libraries like numpy, set a memento mori undertone against some of the more blatant Python triumphalism.
An interpersonal conference, not a Cloister
The very high level of interpersonal interaction is yet another way in which the conference belies the nerds-in-T-shirts stereotype. This is very much a conference that one goes to in order to seek guidance and solve problems.
While there are always stragglers who don’t head down the pub, a good two-thirds of the conference went for fruitful discussion and drink on Saturday. Unsurprisingly, pub attendance was lower on Sunday, but still fruitful.
A place to get hired/take action, not heavy on theory
Folks were hiring like crazy, and it was very much a seller’s market.
If you’re a job seeker anywhere on the Python+data spectrum, I’d strongly recommend attending. Companies were recruiting along the entire spectrum, from AWS-ineering to user-focused commercial data analysis with IPython notebooks (or re-dash; see Arik’s talk for more details on this user-friendly database interaction framework).
In line with the action-oriented nature of the conference, the Pivigo Recruitment founders were there, doing resume/CV screens and offering advice to both students and established professionals.
If you are a PhD/Postdoc looking to make the transition, I highly recommend taking a look at their Science to Data Science training program.
Continuum may also be prototyping a training programme of its own through its Client Facing Consultant position. I’m not entirely sure, but 6 months of training via a 3rd-party consultancy followed by an intentional poach (Continuum –> 3rd party) could be an interesting model.
Talks
I found the spread of talks fantastic. At least amongst the talks I attended…
Type | Percentage (%) |
---|---|
Tools | 40 |
War story | 30 |
Skills | 20 |
Under the hood | 10 |
Tools
Tools talks were the most common. They covered ‘non-brand name’ and upcoming tools with emerging communities.
- Will Usher: Sensitivity analysis with SALib
- David MacIver: Randomly test initial conditions in your code simply with Hypothesis. And while you’re at it, why not use contracts to enforce. Wasn’t a talk, but it should be!
(i) You want to learn about specific tools that may be applicable to your problem.
(ii) You want to collaborate on extending / adopting new tools.
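For a flavour of what “randomly test your code” means, here is a hand-rolled sketch using only the standard library’s `random` module. It is not Hypothesis’s API (Hypothesis generates inputs for you and shrinks failing cases, among much else); the run-length encoder and the helper names are hypothetical examples of mine.

```python
import random


def run_length_encode(text):
    """Collapse runs of repeated characters into [char, count] pairs."""
    encoded = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1][1] += 1
        else:
            encoded.append([ch, 1])
    return encoded


def run_length_decode(pairs):
    """Expand [char, count] pairs back into a string."""
    return "".join(ch * count for ch, count in pairs)


def check_round_trip(trials=500, seed=0):
    """Feed random inputs to the code and check a property that must always hold."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 20)
        # A two-letter alphabet makes repeated runs (the interesting case) likely.
        text = "".join(rng.choice("ab") for _ in range(length))
        # Property: decoding an encoded string gives back the original.
        assert run_length_decode(run_length_encode(text)) == text, text
    return trials


check_round_trip()
```

Instead of writing a handful of hand-picked cases, you state a property (“decode undoes encode”) and let random inputs hunt for a counterexample.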
War story
These talks gave the horrifying and nitty-gritty details of a specific problem the speaker faced, and how they went about solving it (including gotchas and failures). The focus isn’t ‘wow, look at me’ but rather ‘this was some B.S., and I want no one to go through what I went through ever again’.
- Paul Agapow: Don’t use ‘big data’ tools when simpler solutions will do, particularly in the life sciences.
(i) You want help with the problems you are immediately facing
(ii) You want exposure to problems you’ve never thought of.
Skills
These were high-level talks that focused more on skills and knowledge than specific tools.
- Ian Ozsvald: Writing code for you is only the beginning, let’s see what it takes to push a Bloomberg model to production.
(i) You want to learn what you need to know in a new area.
(ii) You want an overview of a topic you’ve never heard of.
(iii) You want to chat with the speaker about specific War Stories, after the talk.
Under the hood
These talks focused on low-level implementation details of numpy, pandas, Cython, Numba, etc., with a particular focus on performance and appropriateness. Personally, I found these talks the most useful. Where else can one gather such concentrated information from the mouths of the open-source contributors?
- Russel Winder: If you want performance, use Python as a glue-language, and write your computationally intensive functions in a ‘real’ language.
- Jeff Reback: In pandas, think about idioms and built-in vectorization to get the most out of your code (then write in a ‘real’ language if you still need to go faster).
- James Powell: Why does writing good numpy feel so different from writing good Python? Because the styles have diverged, and will probably continue to do so.
(i) You want a fire-hose of information about low-level topics.
(ii) You want to know how the ‘magic’ happens.
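As a small illustration of the “styles have diverged” point, here is the same computation written loop-style and numpy-style. This is my own toy example rather than anything shown in the talks; the function names are hypothetical and it assumes numpy is installed.

```python
import numpy as np


def squared_distance_loop(xs, ys):
    """Plain-Python style: iterate element by element in the interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += (x - y) ** 2
    return total


def squared_distance_vectorized(xs, ys):
    """numpy style: express the whole computation on arrays at once."""
    diff = np.asarray(xs, dtype=float) - np.asarray(ys, dtype=float)
    return float(np.dot(diff, diff))


xs = np.arange(1000, dtype=float)
ys = xs + 0.5
# Both give the same answer; the vectorized version runs its loop in compiled code.
assert np.isclose(squared_distance_loop(xs, ys), squared_distance_vectorized(xs, ys))
```

The vectorized version is the “glue” idea in miniature: Python describes what to compute, and the heavy lifting happens in the compiled routines underneath.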
Take-home
This is very much a conference focused on solutions. If you have a problem, don’t be shy! Ask around, and there will be people there who have faced similar problems and are eager to help.
As for me, I look forward to attending next year!