Julia for Structural Econometrics

(or: how I learned to stop vectorizing and love Julia)

Romeo P. Greminger, 19.02.2020

Outline

  1. Programming language as a tool
  2. A brief introduction to Julia
  3. Simple benchmark
  4. Where does the speed come from?
  5. BLP example
  6. Logit example
  7. Resources

Programming language as a tool

  • Economists are (on average) not computer scientists
  • We use programming languages as a tool to obtain results
  • So what do we care about in a programming language?

A list of demands

  • Short time to obtain results
    • Time to code it up
    • Execution time
  • Generalizability
  • Replicability
  • (Feasibility)

A brief introduction to Julia

What is Julia and what does it offer?

  • Dynamically typed programming language with just-in-time compilation
  • Fast execution of loops
    • No vectorization required
  • Solves "two language problem"
    • "Make it run, then make it fast" within same language
    • Low level control, performance macros etc.
  • Multiple dispatch
  • Package interoperability
  • Well-implemented automatic differentiation (see the short example below)
  • Easy access to GPU
  • Open Source (MIT license)

More information: https://docs.julialang.org/en/v1/
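
As a quick illustration of the automatic differentiation point above (an added sketch, not part of the original slides; it assumes the ForwardDiff.jl package is installed):

using ForwardDiff

h(x) = sum(exp.(x) .^ 2)                 # an ordinary Julia function
ForwardDiff.gradient(h, [1.0, 2.0])      # exact gradient via forward-mode AD
ForwardDiff.derivative(sin, 0.0)         # scalar derivative, returns 1.0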

Caveats

  • Fewer off-the-shelf routines available
  • Precompilation time
  • Usage outside academia? Future?

Basic operations

In [1]:
A = [1 2 ; 3 4] ; display(A)    # 2×2 integer matrix
B = [1.0,2.0] ; display(B)      # vector of Float64
C = A[1,2] ; C = A[:,2]         # indexing: single element, then whole column
D = A * B                       # matrix-vector multiplication
E = B' * B                      # inner product
F = log.(A)                     # element-wise (broadcast) log
tup = (1,2,3) ;                 # a tuple
2×2 Array{Int64,2}:
 1  2
 3  4
2-element Array{Float64,1}:
 1.0
 2.0

Functions / Methods

In [2]:
f(x) = x ^ 2 
@show f(2) 

# Equivalent: 
function f(x) 
    y = x ^ 2
    return y
end

# Keyword arguments (with defaults)
g(x; y = 1) = x + y
@show g(2)

# Generates error 
f([2,3])
f(2) = 4
g(2) = 3
MethodError: no method matching ^(::Array{Int64,1}, ::Int64)
Closest candidates are:
  ^(!Matched::Float16, ::Integer) at math.jl:862
  ^(!Matched::Regex, ::Integer) at regex.jl:712
  ^(!Matched::Missing, ::Integer) at missing.jl:151
  ...

Stacktrace:
 [1] macro expansion at .\none:0 [inlined]
 [2] literal_pow at .\none:0 [inlined]
 [3] f(::Array{Int64,1}) at .\In[2]:6
 [4] top-level scope at In[2]:13

Multiple Dispatch

In [3]:
f(x::Array{T,1} where T<: Real) = x .^ 2

# Now no error 
@show f([2.0,3.0]) 

# Note: f([2 3]) would still give an error, as [2 3] is a 2-dimensional array

# Alternative: Dot notation 
@show f.([2.0,3.0]) ;
f([2.0, 3.0]) = [4.0, 9.0]
f.([2.0, 3.0]) = [4.0, 9.0]

Type hierarchy

  • Types form a hierarchy, e.g. Int32 is a subtype of Integer, which is a subtype of Real
  • If a method works on a type, it should work on all of its subtypes
In [4]:
# Check whether subtype  
@show Int32 <: Integer
@show Integer <: Real 
@show Array <: Real
Int32 <: Integer = true
Integer <: Real = true
Array <: Real = false
Out[4]:
false

Packages

In [5]:
using CSV,DataFrames 
@show  example_data = CSV.read("example.csv")

using Distributions
d = Normal() 
@show cdf(d,0.0) 

# Add functionality
import Base:+ 
function +(d1::Normal,d2::Normal)
    return Normal(mean(d1)+mean(d2),sqrt(var(d1)+var(d2)))
end
@show dist = Normal() + Normal(-1,2) ;
example_data = CSV.read("example.csv") = 3×2 DataFrame
│ Row │ var1     │ var2  │
│     │ Float64⍰ │ Int64 │
├─────┼──────────┼───────┤
│ 1   │ 1.2      │ 1     │
│ 2   │ 3.2      │ 1     │
│ 3   │ missing  │ 2     │
cdf(d, 0.0) = 0.5
dist = Normal() + Normal(-1, 2) = Normal{Float64}(μ=-1.0, σ=2.23606797749979)

Own types

In [6]:
# Composite type
struct ShachamParameters
    β::Array{Real,1}
    c::Float64 
end
pars = ShachamParameters([1,2],0.2)
@show pars.β

function logLik(p::ShachamParameters,d::DataFrame) 
    # DO CALCULATIONS
end ;
pars.β = Real[1, 2]

A simple benchmark

In [30]:
# Five equivalent ways to compute sum_i exp(x_i)*exp(y_i)

f_matmul(x,y) = exp.(x)' * exp.(y)          # vectorized: inner product of two temporaries

f_dot(x,y) = sum( exp.(x) .* exp.(y) )      # vectorized: broadcast, then sum

using Strided 
f_dot_strided(x,y) = @strided sum( exp.(x) .* exp.(y))   # multithreaded broadcast via Strided.jl

# Explicit loop, no temporary arrays
function f_loop(x,y)
    out = 0.0
    for ii = 1:length(x) 
        out += exp(x[ii])*exp(y[ii])
    end
    out
end

# Multithreaded loop (benefits require JULIA_NUM_THREADS > 1)
function f_loop_multithread(x,y)
    out = zeros(length(x))
    Threads.@threads for ii = 1:length(x) 
        out[ii] = exp(x[ii])*exp(y[ii])
    end
    return sum(out) 
end ;
In [31]:
using BenchmarkTools
using Random ; Random.seed!(42)
x = rand(50_000) ; y = rand(50_000) 
f_matmul(x,y) ; f_dot(x,y) ; f_loop(x,y) ; f_loop_multithread(x,y) ; f_dot_strided(x,y) # Initial run to trigger compilation
println("Matmul")
@btime f_matmul(x,y) 
println("Dot")
@btime f_dot(x,y) 
println("Dot strided")
@btime f_dot_strided(x,y)
println("Loop")
@btime f_loop(x,y) 
println("Loop multithread")
@btime f_loop_multithread(x,y)

CSV.write("xy.csv",DataFrame(x=x,y=y));
Matmul
  892.394 μs (5 allocations: 781.42 KiB)
Dot
  774.009 μs (3 allocations: 390.72 KiB)
Dot strided
  252.094 μs (110 allocations: 399.58 KiB)
Loop
  764.695 μs (1 allocation: 16 bytes)
Loop multithread
  284.845 μs (33 allocations: 394.19 KiB)
  • Benchmark on university desktop (microseconds)

                        Julia 1.3   Matlab 2019a   R 3.6.1   Python 3.6
  Matmul                      829            361     6,070          802
  Dot                         745            360     5,761        1,129
  Loop                        747          1,100    11,073       88,788
  Dot Strided                 251            ---       ---          ---
  Loop Multithreaded          280            ---       ---          ---

Where does the speed come from?

Compilation

Translate human code into efficient machine code

In [9]:
# Julia code
function f() 
    x = 0 
    x += 2
    x *= x
    x += 2
    return x
end
f() 

# Assembly code (close to machine code)
@code_native syntax = :intel f()
	.text
; ┌ @ In[9]:3 within `f'
	push	rbp
	mov	rbp, rsp
; │ @ In[9]:7 within `f'
	mov	eax, 6
	pop	rbp
	ret
	nop	dword ptr [rax + rax]
; └

Just-in-time compilation

  • Compilation happens during execution of the program
In [10]:
f(x) = x^3
@time f(3)
@time f(4) ;
  0.003183 seconds (966 allocations: 53.732 KiB)
  0.000001 seconds (4 allocations: 160 bytes)
  • Static compilation (Fortran/C): Compilation before running any code

  • Interpreted (R/Matlab/Python): No compilation, code calls subroutines

Types

  • For efficient machine code, compiler needs to know the types
    • Integer
    • Float64
    • etc.
  • Statically typed languages require type declarations
  • Julia is dynamically typed
    • For the previously defined functions, no types were specified (see the sketch below)
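
A minimal illustration (added, not from the original slides): the compiler infers the types from the arguments at the first call and compiles a separate specialized version of the same generic function for each concrete argument type.

double(x) = x + x

# One generic definition, two compiled specializations:
@code_native double(1)      # machine code specialized for Int64
@code_native double(1.0)    # machine code specialized for Float64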

Type instability

Example 1

In [11]:
p_bad(p) = p < 0 ? 0 : p           # may return Int64 (0) or Float64 (p) -> type unstable
p_stable(p) = p < 0 ? zero(p) : p  # zero(p) has the same type as p -> type stable

function outer_fun(p_fun)
    p = rand(Normal(),1000)
    return sum([p_fun(p[ii]) for ii in eachindex(p)])
end
outer_fun(p_bad)  ; outer_fun(p_stable) 

Random.seed!(42)
@btime outer_fun(p_bad)
Random.seed!(42)
@btime outer_fun(p_stable) ;
  44.169 μs (1443 allocations: 46.33 KiB)
  10.216 μs (4 allocations: 15.92 KiB)

Example 2

In [12]:
function f_bad(N)
    x = 1          # x starts as an Int64 ...
    for ii = 1:N 
        x /= 2     # ... but division turns it into a Float64 -> type unstable
    end
    return x 
end

function f_stable(N)
    x = 1.0        # x is a Float64 throughout -> type stable
    for ii = 1:N 
        x /= 2
    end
    return x
end
f_bad(10) ; f_stable(10) 
Random.seed!(42)
@btime f_bad(10) 
Random.seed!(42)
@btime f_stable(10)
  6.610 ns (0 allocations: 0 bytes)
  1.201 ns (0 allocations: 0 bytes)
Out[12]:
0.0009765625

Checking type stability

In [13]:
@code_warntype f_bad(10)
Variables
  #self#::Core.Compiler.Const(f_bad, false)
  N::Int64
  x::Union{Float64, Int64}
  @_4::Union{Nothing, Tuple{Int64,Int64}}
  ii::Int64

Body::Union{Float64, Int64}
1 ─       (x = 1)
 %2  = (1:N)::Core.Compiler.PartialStruct(UnitRange{Int64}, Any[Core.Compiler.Const(1, false), Int64])
       (@_4 = Base.iterate(%2))
 %4  = (@_4 === nothing)::Bool
 %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_4::Tuple{Int64,Int64}::Tuple{Int64,Int64}
       (ii = Core.getfield(%7, 1))
 %9  = Core.getfield(%7, 2)::Int64
       (x = x / 2)
       (@_4 = Base.iterate(%2, %9))
 %12 = (@_4 === nothing)::Bool
 %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return x

BLP Random coefficient logit

Nested fixed point algorithm: Iterate over $\delta_{jm}$ until the observed market shares equal the model-implied ones,

$$S_{jm} = s_{jm}(\delta_m) \quad \forall j,m$$

where $\delta_m$ collects the mean utilities

$$\delta_{jm} = x_{jm} \beta - \alpha p_{jm} + \xi_{jm} $$

Market share: Monte Carlo integration over the random coefficient $\alpha_i = \alpha + \nu_i \sigma_{\alpha} $

$$ s_{jm}(\delta_m) = \frac{1}{n}\sum_i^n \frac{\exp\left( \delta_{jm} + \sigma_{\alpha} \nu_i\right)}{1 + \sum_{h \in J_m} \exp \left( \delta_{hm} + \sigma_{\alpha} \nu_i \right)} $$
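
As an added sketch (hypothetical function names, not the benchmark code used below), the inner fixed point can be solved with the standard BLP contraction $\delta \leftarrow \delta + \log S - \log s(\delta)$, here for a single market:

# Model-implied shares for one market: Monte Carlo average of logit shares
# over the simulation draws ν, as in the formula above (δ holds the mean
# utilities of all products in the market)
function shares(δ, σ, ν)
    s = zero(δ)
    for νi in ν
        u = exp.(δ .+ σ * νi)
        s .+= u ./ (1 + sum(u))
    end
    return s ./ length(ν)
end

# Contraction mapping: iterate δ ← δ + log(S) − log(s(δ)) to a fixed point,
# where S are the observed market shares
function solve_delta(S, δ0, σ, ν; tol = 1e-12, maxiter = 1_000)
    δ = copy(δ0)
    for _ in 1:maxiter
        δ_new = δ .+ log.(S) .- log.(shares(δ, σ, ν))
        maximum(abs.(δ_new .- δ)) < tol && return δ_new
        δ = δ_new
    end
    return δ
end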

Nested fixed point algorithm

"Tricky" part is in calculating the following sum per market share and per simulation draw:

$$ \sum_{h \in J_m} \exp \left( \delta_{hm} + \sigma_{\alpha} \nu_i \right) $$

  • The straightforward solution is to loop over markets and, within each market, sum the terms
  • However, this will be slow in interpreted languages
  • Vectorized solution: Create additional matrices to use matrix multiplication (see the sketch after this list)
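
For reference, a minimal sketch of that vectorized alternative (my own illustration with hypothetical names, not the course code): a sparse market-by-product indicator matrix turns the within-market sums into a single matrix multiplication.

using SparseArrays

# M is a (markets × products) indicator matrix with M[m, j] = 1 when product j
# belongs to market m; numer is (products × draws). M * numer then contains the
# within-market sums for every market and every simulation draw at once.
sumshare_vectorized(numer, M) = M * numer

# Tiny example: products 1–2 in market 1, product 3 in market 2, 5 draws
M = sparse([1, 1, 2], [1, 2, 3], ones(3))
numer = rand(3, 5)
sumshare = sumshare_vectorized(numer, M)   # 2 markets × 5 draws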
In [14]:
# Looped Julia version 
function fillsumshare!(sumshare,numer,d)
    # jj: simulation draws
    # ii: markets
    # tt: products within market
    @inbounds Threads.@threads for jj = 1:size(numer,2)
        for ii = 1:maximum(d.IDmkt), tt = d.T[ii,1]:d.T[ii,2]
            sumshare[ii,jj] += numer[tt,jj]
        end
    end
end
Out[14]:
fillsumshare! (generic function with 1 method)

Benchmark

In [15]:
include("functions_setup.jl")
include("functions_BLP.jl")
d = prep_data()
startval = get_startval()
beta = startval[1:end-4]
sigma = startval[end-3:end]

BLPmom(beta,sigma,d)
@btime BLPmom(beta,sigma,d)
  1.382 s (9518988 allocations: 264.24 MiB)
Out[15]:
1×1 Array{Float64,2}:
 137502.32455933973

Matlab time (vectorized): 2.55s

Logit example

Econometric Model

$$ y_i = \mathbb{1}(\mathbf{x}_i \beta \geq \varepsilon_i ) $$

with $\varepsilon_i$ being drawn from a logistic distribution.

We estimate $\beta$ using observations $\mathbf{x}_i$ and $y_i$.
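
For reference, the negative mean log-likelihood implemented by nll below is

$$ -\frac{1}{n}\sum_{i=1}^{n}\Big[ y_i \log p_i + (1-y_i)\log(1-p_i)\Big], \qquad p_i = \frac{1}{1+\exp(-\mathbf{x}_i\beta)} $$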

In [16]:
using Distributions
# Generate data 
function gen_data(β,n)
    X = [ones(Float64,n) rand(Chisq(10),n)] 
    e = rand(Logistic(),n)
    y = [X[i,:]'*β > e[i] for i in eachindex(e)]
    return y,X
end
β = [-0.3;-0.1] 
using Random ; Random.seed!(132)
y,X = gen_data(β,1000)
Out[16]:
(Bool[0, 1, 0, 0, 0, 0, 0, 0, 0, 1  …  1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1.0 10.679635923514951; 1.0 4.817428373056074; … ; 1.0 3.4148723728709314; 1.0 10.965797991953453])
In [17]:
# Likelihood function 
p(β,x) = 1.0 / (1.0 + exp(-x*β)) # logit probability
       # = cdf(Logistic(),x*β) can also be used, but is slower 

function nll(β::Array{T,1},y,X) where T<:Real
    ll = zeros(T,length(y))
    for ii in eachindex(y)
        p_i = p(β,X[ii,:]')
        ll[ii] = y[ii]*log(p_i) + (1-y[ii])*log(1.0-p_i)
    end
    return -mean(ll)
end    
nll(β,y,X)
Out[17]:
0.5291460618733443
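
For reference, the hand-coded gradient in the next cell uses the logit score, which simplifies term by term to

$$ \frac{\partial}{\partial\beta}\Big[ y_i \log p_i + (1-y_i)\log(1-p_i)\Big] = (y_i - p_i)\,\mathbf{x}_i $$

so Δnll! stores the negative mean of these terms in g.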
In [18]:
# Gradient 
function Δnll!(g,β::Array{T,1},y,X) where T<:Real
    s = zeros(Float64,length(y),2)
    for ii in eachindex(y)
        p_i = p(β,X[ii,:]')
        s[ii,:] = y[ii]*(1.0 - p_i)*X[ii,:] - (1-y[ii])*p_i * X[ii,:]
    end
    g[:] = -mean(s,dims=1)
end
g_test = zeros(Float64,2)   # use a fresh name: g is already the function defined in In[2]
Δnll!(g_test,β,y,X) ;
In [19]:
# Optimization
using Optim
obj_fun(β) = nll(β,y,X)
grad(g,β)    = Δnll!(g,β,y,X) 

x0 = rand(2) 
res = Array{Any,2}(undef,4,1)

# Use finite differencing for gradient 
@btime res[1] = optimize(obj_fun,x0,LBFGS())
# Use own gradient definition 
@btime res[2] = optimize(obj_fun,grad,x0,LBFGS()) 
# Use Automatic differentiation 
@btime res[3] = optimize(obj_fun,x0,LBFGS();autodiff=:forward) 
# No gradient 
@btime res[4] = optimize(obj_fun,x0) 

for i = 1:4 
    println(Optim.minimizer(res[i]))
end
  17.866 ms (320926 allocations: 30.56 MiB)
  11.570 ms (288504 allocations: 27.13 MiB)
  3.901 ms (96547 allocations: 7.10 MiB)
  6.562 ms (118337 allocations: 11.27 MiB)
[-0.3240504099532686, -0.09409542033967724]
[-0.3240504127427367, -0.09409542001711972]
[-0.3240504127427371, -0.09409542001711968]
[-0.32282299576061224, -0.09424837358873464]