Any tips to help a scientist become a better programmer?
Hey there!
I'm a chemical physicist who has been using Python (as well as MATLAB and R) for a lot of different tasks over the last ~10 years, mostly for data analysis but also to automate certain tasks. I am almost completely self-taught, and though I have gotten help and tips from professors throughout my degrees, I have never really been educated in best practices when it comes to coding.
I have some friends who work as developers but have an academic background similar to mine, and through them I have become painfully aware of how bad my code is. When I write code, it simply needs to do the thing, conventions be damned. I do try to read up on the "right" way to do things, but the holes in my knowledge become apparent pretty quickly.
For example, I have never written a class and I wouldn't know why or where to start (something to do with the init method, right?). I mostly just write functions and scripts that perform the tasks that I need, plus some work with Jupyter notebooks from time to time. I only recently got started with git and uploading my projects to GitHub, just as a way to try to teach myself the workflow.
So, I would like to learn to be better. Can anyone recommend good resources for learning programming that are aimed at people who already know a language? It'd be nice to find a guide that assumes you already know more than a beginner. Any help would be appreciated.
As a researcher: all the professional software engineers here have no idea about the requirements for code in a research setting.
I recommend you use
git. It's nice to be able to revert changes without worry.
descriptive variable names. The meaning of descriptive is highly dependent on your situation. Single letters can have an obvious meaning, but err on the side of longer names if you're unsure. The goal is to be able to look at a variable and instantly know what it represents.
virtual environments and requirements.txt. When you have your code working, you should have pip (or conda, or whatever you use) take a snapshot of the packages in your current Python environment. Then you can install the exact same versions when you want to revive your code a few months or years down the line. I didn't do that and it's kinda biting me in the ass right now.
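For the snapshot part, the usual route is to run `pip freeze > requirements.txt` inside the virtual environment. If you'd rather do it from Python, here is a minimal sketch using only the standard library. The function names `snapshot_requirements` and `write_requirements` are made up for illustration, and the output is only roughly what `pip freeze` gives you (it doesn't handle editable or VCS installs).

```python
from importlib.metadata import distributions


def snapshot_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that have no name in their metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Write the snapshot to a file (hypothetical helper, default path assumed)."""
    lines = snapshot_requirements()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return lines


if __name__ == "__main__":
    for line in write_requirements():
        print(line)
```

Later on, `pip install -r requirements.txt` in a fresh virtual environment should bring back the same versions, assuming they are still available on the package index.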
As a researcher: all the professional software engineers here have no idea about the requirements for code in a research setting.
As someone with extensive experience in both: my first requirement would be readability. A single Python file? Fine with that. A single 1k+ line Python file without functions or other means of structuring the code? Please no.
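To give a rough idea of what I mean by structure, here is a small sketch of a single-file script split into functions with descriptive names. The data and the function names (`subtract_baseline`, `find_peak_position`) are invented for the example, not taken from any real analysis.

```python
def subtract_baseline(intensities, baseline):
    """Return the intensities with a constant baseline removed."""
    return [value - baseline for value in intensities]


def find_peak_position(wavelengths_nm, intensities):
    """Return the wavelength at which the intensity is largest."""
    peak_index = max(range(len(intensities)), key=intensities.__getitem__)
    return wavelengths_nm[peak_index]


def main():
    # Made-up example data; in a real script this would be loaded from a file.
    wavelengths_nm = [400.0, 410.0, 420.0, 430.0]
    raw_intensities = [1.2, 3.5, 2.1, 1.0]

    corrected = subtract_baseline(raw_intensities, baseline=1.0)
    peak_nm = find_peak_position(wavelengths_nm, corrected)
    print(f"Peak at {peak_nm} nm")


if __name__ == "__main__":
    main()
```

It is still one file, but each step has a name, can be tested on its own, and `main()` reads like a summary of the analysis.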
The nice thing about Python is that your IDE lets you jump into the code of the libraries you're using. I find that to be a good way to see how experienced Python devs write code.
You can jump to definition in any language. In fact, Python may be one of the worst ones, because compiled libraries are so common. "Real signature unknown" is all you will get sometimes. E.g. much of NumPy is implemented in C, not Python.
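You can see the same limit from inside Python itself. A quick sketch, assuming a standard CPython install where the stdlib `.py` files are on disk (`can_read_source` is just a helper name I made up):

```python
import inspect
import json
import math


def can_read_source(obj):
    """Return True if Python source for obj can be retrieved, else False."""
    try:
        inspect.getsource(obj)
    except (TypeError, OSError):
        # TypeError: obj is a built-in implemented in C.
        # OSError: obj is Python code but its source file is not available.
        return False
    return True


if __name__ == "__main__":
    print("json.dumps:", can_read_source(json.dumps))  # pure-Python stdlib function
    print("math.sqrt:", can_read_source(math.sqrt))    # implemented in C
```

For the C-implemented function there is simply no Python source to show, which is the same wall the IDE runs into.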
My point about jumping into the code was that you can immediately start reading the sources. Most alternative languages are compiled in some form or other, so all you'll see is an API, not the implementation.
My comment was not asking for clarification; I am contradicting your claim.
Granted, my experience is mostly limited to Python and Rust. But I find that in Python you reach the end of "jump to definition" much, much sooner. Fundamental core libraries in Python are written in C, simply because the required performance cannot be reached with Python alone. So after jumping two levels you are through the thin wrapper layer and your IDE will give you an "I don't know, it's compiled code".
In Rust I have yet to encounter this. Precompiled binaries are rarely used as dependencies, because compiling whatever is needed from source is no issue (you're compiling anyway) and can actually allow a few more optimizations to be performed.
Edit: since wasm is not yet widespread, JavaScript may be the best language to dig deep into libraries.
That applies mostly to ML or data processing libraries, I would assume. I've read tons of REST server and ORM Python code, for instance, and none of that is written in C.
Wrt Rust: no experience with that. I do write a lot of C++, and there you quickly reach the end, as you typically consume quite a few libraries whose complete sources aren't part of what the IDE parses, since keeping all of that in memory would be unworkable.