Alpha pursuit: how we build machine learning investment products

Let’s start with the good news:

Companies in North America added a record number of robots in the first nine months of this year as they rushed to speed up assembly lines and struggled to add human workers.

Factories and other industrial users ordered 29,000 robots, 37% more than during the same period last year, valued at $1.48 billion, according to data compiled by the industry group the Association for Advancing Automation. That surpassed the previous peak set in the same time period in 2017, before the global pandemic upended economies.

In the real estate industry the excitement around ML/AI keeps growing. It’s not hard to understand: robots work faster than humans, they work 24/7, and they don’t take vacations or call in sick.

I recently spoke at Realcomm and my friends at Zetta Advisory conducted a quick survey during the data analytics panel. The results showed that over 20% of the institutional managers regularly use advanced data analytics now (compared to ~10% last year). This is encouraging.

At Capital Brain we’ve built, tested and launched several investment products, and the product methodology has evolved around two approaches: ranking-based (beta-generating) and probabilistic (alpha-generating).

Ranking-based approach – find your next investment:

This idea is pretty straightforward. You build a library of factors that you think affect the value of real estate in the MSAs you are after (better yet, you build a model that tells you which MSAs to go after). Then you run correlations between each factor and property values and determine which factors actually matter.

We learned that the same factor affects properties differently depending on the MSA. For example, office buildings in Kentucky could be affected by per capita income to a much larger extent than office buildings in Manhattan.
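To make the "run correlations per MSA" step concrete, here is a minimal sketch. It is not our production code: the MSA labels, factor names and numbers are all made up for illustration, and plain Pearson correlation stands in for whatever statistic you prefer.

```python
from collections import defaultdict
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def factor_correlations_by_msa(rows, factors, target="value"):
    """Return {msa: {factor: correlation with target}}, computed separately per MSA."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["msa"]].append(row)
    return {
        msa: {
            f: pearson([r[f] for r in group], [r[target] for r in group])
            for f in factors
        }
        for msa, group in grouped.items()
    }

# Hypothetical observations: one row per property and period.
rows = [
    {"msa": "msa_a", "per_capita_income": 50, "vacancy": 12, "value": 100},
    {"msa": "msa_a", "per_capita_income": 52, "vacancy": 11, "value": 104},
    {"msa": "msa_a", "per_capita_income": 54, "vacancy": 13, "value": 108},
    {"msa": "msa_b", "per_capita_income": 80, "vacancy": 9, "value": 900},
    {"msa": "msa_b", "per_capita_income": 82, "vacancy": 10, "value": 880},
    {"msa": "msa_b", "per_capita_income": 84, "vacancy": 8, "value": 920},
]

result = factor_correlations_by_msa(rows, ["per_capita_income", "vacancy"])
for msa, corr in result.items():
    print(msa, {f: round(c, 2) for f, c in corr.items()})
```

In this toy data, per capita income tracks value closely in one MSA and only loosely in the other, which is the kind of difference the ranking has to account for.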

If you are investing in physical assets, such an approach will let you understand which markets to go after, or stress-test your existing portfolios. This predetermined, user-driven approach is rather simplistic: the analyst makes an assumption, and the model determines whether the factor’s effect is positive or negative and to what extent. Once you add a feedback loop, the model’s knowledge of how the various factors perform improves over time.

This approach works best with just a few factors, for an investment shop that has very focused strategies. However, it relies heavily on human understanding of the subject and on assigning the correct weight to each factor’s contribution to the ranking. It doesn’t work well when the factors are multicollinear, that is, when they move together and their individual effects cannot be separated. And although it is not that hard to execute from a programming standpoint, in reality this approach tends to produce an overfitted model that performs well in testing but underperforms in real life.
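One simple way to spot the multicollinearity problem before it distorts the weights is to screen factor pairs for very high correlation with each other. The sketch below is an illustration only: the factor names, the data and the 0.9 cutoff are assumptions, and a pairwise screen is a cruder check than, say, variance inflation factors.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def collinear_pairs(columns, threshold=0.9):
    """Return [(factor_a, factor_b, r)] for factor pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical factor library: each list is one factor observed over five periods.
columns = {
    "per_capita_income": [50, 52, 54, 58, 61],
    "median_household_income": [70, 73, 75, 81, 86],
    "vacancy_rate": [12, 9, 13, 10, 11],
}

flagged = collinear_pairs(columns)
for a, b, r in flagged:
    print(f"{a} and {b} move together (r = {r:.3f}); consider keeping only one")
```

Here the two income measures are flagged because they carry nearly the same information, so giving each its own weight in the ranking would double-count income.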


Once you add a feedback loop, you start implementing the so-called judgement-based approach, in which machine learning is used to write rules-based software that is able to make decisions for a human.

Such an approach lets you run various sensitivity analyses in a near-real-time environment. For example, if you are tracking hyper-local submarket activity across MSAs and you know the correlation between certain types of events (like a conference or a new development announcement) and your asking rent, you can write a machine learning algorithm that optimizes the rent the next time the same type of event occurs.

Probabilistic (Quantamental) approach – alpha pursuit:

This is truly a combination of human intuition and computing power.

We have successfully implemented this approach for investing in public markets (e.g. REIT securities or the S&P 500). Benchmarking analyses show consistent outperformance of the benchmark in the 500-1000 bps range. In a similar manner, the algorithm can value physical assets and real estate portfolios. Unlike the ranking-based approach, this methodology eliminates human bias in assigning weights to each factor’s contribution to the model.

The model is based on fundamental analysis of factors specific to each building or company, using statistical and machine learning techniques, with the objective of understanding how those factors relate to the building’s value or the share price performance.

In short, the algorithm pursues a probabilistic strategy: it maximizes returns by allocating funds to properties or securities whose probability of appreciation in value or market capitalization is statistically significant and exceeds the minimum threshold of certainty set by the fund manager.
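The selection step can be sketched in a few lines. This is a simplified illustration, not our model: the probabilities are hard-coded rather than produced by a model, the asset names are hypothetical, the statistical significance test is left out, and splitting capital equally among the selected assets is an assumption made for brevity.

```python
def allocate(probabilities, threshold, capital):
    """Split capital equally among assets whose probability of appreciation
    exceeds the manager's threshold. Returns {} if nothing qualifies."""
    selected = [name for name, p in probabilities.items() if p > threshold]
    if not selected:
        return {}
    per_asset = capital / len(selected)
    return {name: per_asset for name in selected}

# Hypothetical model output: probability that each asset appreciates.
probabilities = {"REIT_A": 0.72, "REIT_B": 0.55, "REIT_C": 0.81, "REIT_D": 0.30}

# The fund manager sets the minimum level of certainty.
allocation = allocate(probabilities, threshold=0.70, capital=1_000_000)
print(allocation)
```

Raising the threshold trades breadth for conviction: fewer assets qualify, and each one receives a larger share of the capital.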

The main premise of this approach is that building values or company market capitalizations are primarily affected by a finite number of specific factors from four major fundamental categories (company overview, balance sheet, cash flow and income statement), and only loosely affected by an unlimited number of macroeconomic and space-investment market factors.

The probabilistic strategy takes all four fundamental categories into account. Factors with the highest correlation to movements in building value or market capitalization have the greatest impact on the conditional probabilities, which are then selected and rated. The machine learning algorithm (and our “secret sauce”) then returns a single probability of growth.

For example, the “best fit” blue trendline on the graph below shows that we correctly predict the general trend for a sample REIT share three quarters out, excluding volatility that falls outside the model inputs and bad news, aka Black Swan events (Covid, acts of God, etc.; isn’t Covid an act of God?).

This approach not only identifies the possible “winners”, as a ranking-based approach would, but also gives a fund manager insight into potential “losers”, opening up a plethora of investment and trading strategies.

Implementation approach

It usually takes about 3-6 months to set up a pilot for a model, and then an additional 12-18 months to train the model once you add a machine-learning loop. As always when working with data, you have to be very careful about what you put in: the garbage in, garbage out rule applies.

Knowledge of what to build

Aside from the usual issues, such as the quality of data and the elimination of human bias, there are also business-related barriers to building machine-learning products. The first is understanding where machine learning will help the company the most and where it will have the biggest effect on the bottom line. This initiative should come from the top. The executive team must realize that these days machine intelligence in the real estate investment world is not a tactical opportunity; it’s a strategy. Once implemented, it will change the operational models and permanently change how the work is distributed among people and systems within a real estate fund.

Planning / Piloting / Communication

There is no doubt that implementing machine intelligence requires disciplined execution, time and some grit. You have to think about how automation will affect the composition of your talent pool and the org chart. It’s clear that after implementation the size of your team will change (shrink), and you have to communicate openly about it. But in the end, the team will benefit greatly: they’ll perform fewer mechanical tasks, spend more time applying judgement, and acquire new skills to take on higher-value work.

Talent and Org Structure

Once the fund goes beyond the traditional technology setup (purchasing existing technology solutions from industry vendors and having a group of admins maintain them in-house), it will have to rethink its hiring strategy.

Machine learning products are built by data scientists, who are expensive, and the good ones are hard to come by. Hiring costs for PhD-level data scientists are now around $1 million a year. The CTO’s knowledge base needs to go beyond IT systems management to be able to manage the data scientists. If the fund is strategically invested in machine intelligence for the long haul, the CTO rightfully becomes a major stakeholder and a member of the executive team (we discussed this in the paper I co-authored for the Journal of Property Investment and Finance).

Machine intelligence will do more than make a few processes better. With the right approach it will make you challenge established thinking, and it will bring a substantial decrease in costs and a huge competitive advantage in the form of higher ROI and alpha for your fund.

If you want to learn some valuable tips on using machine learning in your real estate fund operations, please reach out.