MovieLens-25m Experiments (Part 1): Matrix Factorization

Matrix Factorization modeling and ablation experiments using the MovieLens-25m dataset. The aim is to better understand different factors that contribute to the effectiveness of matrix factorization for recommendation use cases.

Backtesting for Intraday Execution

Intraday execution involves buying or selling a certain quantity of shares in a given time period. Backtesting is essential for improving execution algorithms. This post explores a backtesting setup for a simplified scenario.

My Rules of Thumb for Unit/Integration Tests

RPC Frameworks: gRPC vs Thrift vs RPyC for python

I recently looked into RPC frameworks (gRPC, Thrift, and RPyC) while planning to migrate a set of Python classes to a service. This post summarizes my initial findings. Because I mostly use Python for everything, I approach these frameworks from that point of view.

Stock Movement Prediction from Tweets and Historical Prices (Paper Summary)

This paper suggests a way of using both historical prices and text data together for financial time series prediction. The authors call it Stocknet. There seem to be two major contributions here: (a) encoding market data and text data together, and (b) a generative model inspired by the VAE (Variational AutoEncoder).

Microbes

Microbes are fascinating. They are intriguing. And we're just starting to find out the relationship they have with their hosts (us humans). I recently read the book 'I Contain Multitudes'. It turned out to be much better than I expected. I attempt to highlight intriguing points from that book along with other things I picked up elsewhere.

Why is machine learning in finance so hard?

Financial markets have been one of the earliest adopters of machine learning (ML). People have been using ML to spot patterns in the markets since the 1980s. Even though ML has had enormous successes in predicting market outcomes in the past, the recent advances in deep learning haven’t helped financial market predictions much. While deep learning and other ML techniques have finally made it possible for Alexa, Google Assistant and Google Photos to work, there hasn’t been much progress when it comes to stock markets.

Python - C++ bindings

Python - C++ bindings are useful for several reasons. Performance is one of them. Exposing existing C++ classes to a Python module is another important reason.

Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals (Paper Summary)

Factor-based strategies are very common in quant funds. Doing a good job of forecasting the fundamentals directly translates into better returns in the factor strategies. The authors use US company data from 1970 to 2017. They compare MLP/RNN approaches against linear regression and a naive predictor.

A Brief Review of Reinforcement Learning

Reinforcement Learning is a mathematical framework for experience-driven autonomous learning. An RL agent interacts with its environment and, upon observing the consequences of its actions, can learn to alter its own behaviour in response to the rewards received. The goal of the agent is to learn a policy π that maximizes the expected return (cumulative, discounted reward).
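As a rough illustration of the "cumulative, discounted reward" mentioned above, here is a minimal sketch of how the return of a single episode could be computed. The function name `discounted_return`, the discount factor `gamma`, and the sample rewards are assumptions made for this example; they are not taken from the post itself.

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum over t of gamma**t * rewards[t] for one episode.

    `rewards` is a hypothetical list of per-step rewards and `gamma`
    is an assumed discount factor between 0 and 1.
    """
    total = 0.0
    # Work backwards so each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Illustrative rewards only: 1.0 + 0.5 * 0.0 + 0.25 * 2.0
episode_rewards = [1.0, 0.0, 2.0]
g = discounted_return(episode_rewards, gamma=0.5)
```

A policy that maximizes the expected return is one whose actions make this quantity as large as possible on average over episodes.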

Machine Learning for Intraday Stock Price Prediction 2: Neural Networks

This is the second in a series of posts on applying machine learning to intraday stock price/return prediction. Price prediction is crucial to most trading firms. People have been using various prediction techniques for many years. We will explore those techniques as well as recently popular algorithms like neural networks. In this post, we will focus on applying neural networks to features derived from market data.

Machine Learning for Intraday Stock Price Prediction 1: Linear Models

This is the first in a series of posts on applying machine learning to intraday stock price/return prediction. Price prediction is crucial to most trading firms. People have been using various prediction techniques for many years. We will explore those techniques as well as recently popular algorithms like neural networks. In this post, we will focus on applying linear models to features derived from market data.

StarSpace: Embed All The Things! (Paper Summary)

This paper describes a way to generate embeddings for various tasks. The algorithm is general enough to achieve strong results across very diverse tasks.

Deep Neural Networks for Youtube Recommendations (Paper Summary)

Youtube switched their recommender system from matrix factorization to neural networks a few years ago. This paper describes the neural network models as well as the overall system around them, including the data processing and deployment aspects.

Deep learning networks for stock market analysis and prediction (Paper Summary)

In this paper, deep learning techniques are applied to financial market data directly rather than to text or alternative data sources. This has been a relatively tricky dataset for any non-linear machine learning technique because of the extremely high noise-to-signal ratio. The authors use a relatively high-frequency dataset sampled every 5 minutes. They consider 38 stocks from Korea's KOSPI.

Deep Learning for Event-Driven Stock Prediction (Paper Summary)

In this post, I attempt to summarize this paper by Ding et al. The paper proposes a way to use a convolutional neural network on news events for stock direction prediction. Both the data and the prediction granularity are 1 day.

Practical Text Classification for Production Systems

This post is about using a relatively simple yet powerful text classification model for a production text classification system. Other topics, such as deployment and testing on out-of-sample texts, are also discussed. They are often not the sexiest aspects, but it makes sense to discuss them in this post.

SELU vs RELU activation in simple NLP models

The RELU activation function has become the de facto choice in neural networks these days. A few weeks ago, some researchers proposed the Scaled Exponential Linear Unit (SELU) activation function. They show far better convergence using SELU. In this post, I share a simple comparison of SELU against RELU using a simple BoW model on the SNLI dataset.
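For readers unfamiliar with the two activations being compared, here is a minimal scalar sketch of both. The constants are the ones published with the SELU proposal; the function and constant names are illustrative assumptions, not code from the post.

```python
import math

# Constants published with the SELU activation (self-normalizing networks).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x):
    """RELU: pass positive inputs through, clamp the rest to zero."""
    return max(0.0, x)


def selu(x):
    """SELU: scaled identity for x > 0, scaled exponential curve otherwise."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)
```

Unlike RELU, SELU produces negative outputs for negative inputs, saturating at `-SELU_SCALE * SELU_ALPHA`.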

Prophet - Time series prediction

Predicting daily (and intraday) volume is a classic time series problem in finance. We try out the Prophet library for this task.