What Is Python?
In our June BenchMarq, we profiled our new Python for Finance Professionals course and subsequently held our very first public class in September! The course also gained traction over the summer with many of our banking and pension fund clients.
Since then, we have received more inbound requests for private sessions from financial professionals who have little or no coding experience. A recent Globe and Mail article, titled “Investment managers need to become coders, says former CPPIB CEO”, has added fuel to the fire, and more and more people are starting to ask about Python and how it could help them in their roles.
This article outlines what Python is and why it has become one of the most popular programming languages amongst finance professionals.
Python is an open-source, interpreted programming language that has grown in popularity over the last decade thanks to its versatility and its data science tools. Its real-time debugging and ready-made packages make it quick to learn, and a working program often takes only a few lines of code (some other languages need several lines just to load data).
Companies use Python in a variety of ways such as running primary business applications, filling holes in business systems (moving data from one system to another), automating systems (robotics), facilitating machine learning, scraping web data, and conducting data analysis.
How Is Python Used in Finance?
You don’t need a lot of programming experience to start seeing the usefulness and efficiency of Python. One of its simplest uses is automating day-to-day data manipulation that could take hours in Excel or in other programming languages:
- Data Gathering – either from Excel files or web “scraping”
- Data Aggregation and Cleaning – combining multiple data sets into a merged table, cleaning up “messy” or incorrect data
- Data Analysis – summarizing data by categories, adding custom columns/formulas, etc.
- Data Visualization – plotting key data in graphs and charts
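To make the gathering, cleaning, and summarizing steps concrete, here is a minimal sketch that uses only Python's standard library. The tickers, sectors, and market values are made up for illustration, and the function names `load_and_clean` and `total_by_sector` are our own; in practice the text would come from a file or a website rather than a string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data: inconsistent sector labels, stray spaces, one missing value.
RAW_CSV = """ticker,sector,market_value
AAA, Energy ,1200.50
BBB,energy,800.00
CCC,Financials,
DDD,FINANCIALS,1500.25
EEE,Technology,2300.00
"""


def load_and_clean(text):
    """Parse CSV text, normalize sector names, and drop rows with no market value."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        value = row["market_value"].strip()
        if not value:
            continue  # skip incomplete records
        rows.append({
            "ticker": row["ticker"].strip(),
            "sector": row["sector"].strip().title(),
            "market_value": float(value),
        })
    return rows


def total_by_sector(rows):
    """Sum market value for each sector."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["sector"]] += row["market_value"]
    return dict(totals)


clean_rows = load_and_clean(RAW_CSV)
sector_totals = total_by_sector(clean_rows)
for sector, total in sorted(sector_totals.items()):
    print(f"{sector}: {total:,.2f}")
```

The same pattern scales to real files by passing an open file to `csv.DictReader` instead of the in-memory string.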
In the era of Big Data, many investment analysts at hedge funds, private equity firms, and pension funds are finding and using alternative data sets to gain an edge on their investment theses. The Globe and Mail article described how BlackRock was purchasing mapping data from commercial satellites to look into the yards of steel manufacturers in China. This example just scratches the surface. There is an extensive list of data providers that sell proprietary data to hedge funds and other investment managers looking for an edge on their investments. In addition, a plethora of public information is available for free on websites; however, aggregating it usually takes hours unless you use a bit of programming to speed up the process.
So, where does Python fit into all of this? Python really shines when you start running into limitations with large data sets in Excel. Each Excel sheet holds only 1,048,576 rows (and that is in the latest version of Excel; older versions had even fewer). It’s not uncommon today to see data sets that reach tens of millions or even billions of rows. However, you don’t need to hit the Excel limit before the application slows down. Even at around a hundred thousand rows, Excel can bog down when you start running Pivot Tables, charts, and other summary analysis. Most of the time, financial analysts are not doing anything overly complex with their data sets; they are seeking an easy way to summarize the information and reveal useful insights.
With Python, you can import large data sets from Excel files, .csv files, websites, and other sources, and then run different analyses in a fraction of the time it would take in Excel. Once you complete the analysis, you can export the summary results to Excel or continue working in Python (for example, with data visualization, regressions, or other types of analysis).
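As a rough illustration of working past the Excel row limit, the sketch below summarizes 1,500,000 records one at a time and writes only the small summary table as CSV text that Excel could open. The trade records are simulated (the desks and notionals are invented), and `simulated_trades`, `summarise`, and `to_csv` are hypothetical helper names; with real data the generator would be replaced by a reader over a file.

```python
import csv
import io


def simulated_trades(n_rows):
    """Yield made-up trade records one at a time (stands in for a large file)."""
    desks = ("Equities", "Credit", "Rates")
    for i in range(n_rows):
        yield {"desk": desks[i % 3], "notional": 100.0 + (i % 10)}


def summarise(records):
    """Accumulate trade count and total notional per desk without storing every row."""
    summary = {}
    for rec in records:
        count, total = summary.get(rec["desk"], (0, 0.0))
        summary[rec["desk"]] = (count + 1, total + rec["notional"])
    return summary


def to_csv(summary):
    """Render the per-desk summary as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["desk", "trades", "total_notional", "average_notional"])
    for desk in sorted(summary):
        count, total = summary[desk]
        writer.writerow([desk, count, f"{total:.2f}", f"{total / count:.2f}"])
    return buffer.getvalue()


# 1,500,000 rows is more than a single Excel sheet can hold.
summary = summarise(simulated_trades(1_500_000))
report = to_csv(summary)
print(report)
```

Because only one record is held in memory at a time, the approach is limited by processing time rather than by a row cap.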
Python is great not only for loading and analyzing large data sets, but also for merging data. Some of our clients are large organizations that have numerous input files from different investment companies in their portfolio, or from different departments in their firm. Every week, month, or quarter, they must open and aggregate these files to obtain a consolidated view. In Python, these steps can be executed in a few lines of code and then automated for new files in future time periods.
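The consolidation step might look something like the following sketch. The three "files" are in-memory strings with invented company names and figures, and `consolidate` and `totals` are hypothetical helpers; a real workflow would loop over the files in a folder instead.

```python
import csv
import io

# Hypothetical quarterly submissions from three portfolio companies.
SUBMISSIONS = {
    "alpha_q3.csv": "metric,value\nrevenue,120.0\nebitda,30.0\n",
    "beta_q3.csv": "metric,value\nrevenue,85.5\nebitda,12.5\n",
    "gamma_q3.csv": "metric,value\nrevenue,240.0\nebitda,61.0\n",
}


def consolidate(files):
    """Stack every file into one table, tagging each row with its source company."""
    combined = []
    for name, text in files.items():
        company = name.split("_")[0]
        for row in csv.DictReader(io.StringIO(text)):
            combined.append({
                "company": company,
                "metric": row["metric"],
                "value": float(row["value"]),
            })
    return combined


def totals(rows):
    """Add up each metric across all companies."""
    result = {}
    for row in rows:
        result[row["metric"]] = result.get(row["metric"], 0.0) + row["value"]
    return result


table = consolidate(SUBMISSIONS)
portfolio_totals = totals(table)
print(portfolio_totals)
```

Next quarter, the same two functions can be pointed at the new set of files without any changes.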
All these tasks are fairly easy to learn with a bit of practice, and we teach them during the first day of our Python course. This course is intended for financial professionals with very little or no programming background. For most investment analysts and managers, this is usually more than enough to gain efficiency in their day-to-day tasks.

Our second Python day delves into more complex uses of Python in finance. With Python, you can perform many of the tasks that analysts currently do in MATLAB, R, or other statistical programs, such as running linear and time series regressions and solving optimization problems. These techniques are used to build and test financial market analyses such as capital asset pricing, time series forecasting, multi-factor models, and portfolio optimization. Such analyses can sometimes take hours to configure in Excel or in other programming languages. However, thanks to Python’s versatility and the open-source packages that can be imported, many of these tasks can sometimes be done in just a few lines of code.
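As one small example of the regression work mentioned above, the sketch below estimates a capital asset pricing style beta with an ordinary least squares fit written in plain Python. The return series are invented, and the asset series is deliberately constructed as 0.001 + 1.2 times the market series so the expected answer is known; `ols_slope_intercept` is our own helper, and a statistics package would normally do this job.

```python
def ols_slope_intercept(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up monthly excess returns for the market.
market_excess = [0.010, -0.020, 0.015, 0.030, -0.005, 0.020]

# Asset returns built by construction with beta 1.2 and alpha 0.001 (an assumption
# chosen so the fit can be checked, not real data).
asset_excess = [0.001 + 1.2 * m for m in market_excess]

beta, alpha = ols_slope_intercept(market_excess, asset_excess)
print(f"beta = {beta:.3f}, alpha = {alpha:.4f}")
```

With real return data the points would not lie exactly on a line, and the fitted beta would be an estimate with its own uncertainty.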
How Does Python Compare to Other Programming Languages or Software?
In our discussions with group heads over the past year, we’ve learned that more and more teams are moving their analysis and programs from VBA, R, or MATLAB to Python. Two of the main reasons for this are:
- Python is a more versatile, full-fledged programming language
- Python is much easier to learn due to the simplicity of writing out code and the pre-existing packages that can be imported
Python vs R
There is still significant debate in the data science community over which language is better for statistical analysis. Both languages have pre-built packages for regressions and time series functions. R was built by statisticians, while Python has more recently become popular with data scientists for advanced regressions. However, since Python is a general-purpose language with a more readable syntax (the “grammar” and “spelling” of writing code), it has gained a broader reach outside the statistician community and has become more popular in the finance community due to its ease of data manipulation.
Python vs VBA
VBA is still considered the go-to programming language for financial professionals due to its integration with Microsoft Office applications. VBA is great for manipulating specific objects inside Excel (such as inserting or deleting sheets and cells, or formatting tables and charts). However, when it comes to data manipulation, aggregation, or more complex analysis, Python is easier to use. Not only is Python easier to write (it looks more like “plain English” than VBA does), but some programs could take hundreds of lines of code in VBA and only a dozen in Python. The primary reason is that Python’s popular packages can easily be imported for data manipulation, web scraping, and statistical analysis. VBA also runs into the same issues as Excel when working with larger data sets because, at the end of the day, it is still manipulating the data inside Excel.
Future of Python in Finance
Marquee has watched the finance industry move towards Python because it offers a single language and environment for everything from data manipulation to more complex forecasting and statistical analysis. As the community becomes better versed in its applications, we will start seeing more advanced uses such as machine learning. Some of our more experienced contacts in the industry are already using it to create efficient investment portfolios and to optimize investment weights on the fly each quarter.
If you are interested in learning more about Python or how your team could use it at your organization, let’s chat. If you are interested in taking one of our courses, we will also be hosting another public session on January 22, 2019. Please stay tuned for our follow-up article in January that will demonstrate how to retrieve financial data from a financial website.