Reversal of Fortune: Geography and Institutions in the Making of the Modern World Income Distribution. Daron Acemoglu, Simon Johnson, and James A. Robinson. Quarterly Journal of Economics, 117, November 2002: pp. 1231-1294.

In this 2002 article, Acemoglu, Johnson, and Robinson argue that, among countries colonized by Europeans, those that were wealthier and more urbanized in the 1500s saw their fortunes reverse in the subsequent centuries. Countries such as Rwanda and Tanzania were high-density areas in the 1500s but had low GDP per capita in the 20th century. The authors argue that this is because Europeans settled in larger numbers in areas that were less developed in the 1500s and built institutions there that encouraged investment, so that these areas went on to become strong economies. A simple bivariate relationship motivates their argument.

. use reversal.dta, clear

. describe

Contains data from reversal.dta
  obs:            91                          
 vars:            10                          26 Nov 2018 22:42
 size:         7,553                          
              storage   display    value
variable name   type    format     label      variable label
countryn        str20   %-9s                  Country Name
shortnam        str3    %-9s                  Country Name
logpgp95        double  %10.0g                Log GDP per Capita in 1995
logem4          double  %10.0g                
urbz1995        double  %10.0g                Urbanization in 1995 (Proportion Population
                                                in Large Towns)
lpd1500s        double  %10.0g                Log Population Density 1500s
cu1500          double  %10.0g                Urbanization in 1000s (Chandler)
sjb1500         double  %10.0g                Urbanization in 1500s (Bairoch)
sjb1000         double  %10.0g                Urbanization in 1000s (Bairoch)
continent       long    %8.0g      continent
Sorted by: 

Simple Regression

Use the regress command (abbreviated reg) to run a regression. The dependent variable comes first, followed by the explanatory variables.

. regress logpgp95 lpd1500s 

      Source │       SS           df       MS      Number of obs   =        91
─────────────┼──────────────────────────────────   F(1, 89)        =     46.12
       Model │  30.3661927         1  30.3661927   Prob > F        =    0.0000
    Residual │  58.5990948        89  .658416795   R-squared       =    0.3413
─────────────┼──────────────────────────────────   Adj R-squared   =    0.3339
       Total │  88.9652874        90  .988503194   Root MSE        =    .81143

    logpgp95 │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    lpd1500s │  -.3766786   .0554659    -6.79   0.000    -.4868881    -.266469
       _cons │   8.090425   .0887273    91.18   0.000     7.914126    8.266725
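
As a quick check on the output above, the results that regress leaves behind can be displayed directly. This is a minimal sketch, assuming it is run immediately after the regression; _b[], e(mss), e(rss), and e(r2) are standard regress results, and the comments are ours rather than part of the original session.

. * slope: a one-unit increase in log population density in the 1500s
. * is associated with about 0.38 lower log GDP per capita in 1995
. display _b[lpd1500s]

. * R-squared is Model SS over Total SS (30.37 / 88.97 in the table above)
. display e(mss) / (e(mss) + e(rss))

. * the same quantity as stored by regress
. display e(r2)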

In a simple regression, a scatter plot with the fitted line shows the data cleanly:

. twoway scatter logpgp95 lpd1500s ||  lfit logpgp95 lpd1500s, ///
>    ytitle(Log GDP per Capita in 1995) ///
>    xtitle(Log Population Density in 1500s) ///
>    legend(off)

. graph export reversal_fit.png, width(2000) replace
(file reversal_fit.png written in PNG format)
The Best Fit Regression Line

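Options go after a comma. For example, the beta option reports standardized (beta) coefficients in place of the confidence intervals: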

. regress logpgp95 lpd1500s, beta

      Source │       SS           df       MS      Number of obs   =        91
─────────────┼──────────────────────────────────   F(1, 89)        =     46.12
       Model │  30.3661927         1  30.3661927   Prob > F        =    0.0000
    Residual │  58.5990948        89  .658416795   R-squared       =    0.3413
─────────────┼──────────────────────────────────   Adj R-squared   =    0.3339
       Total │  88.9652874        90  .988503194   Root MSE        =    .81143

    logpgp95 │      Coef.   Std. Err.      t    P>|t|                     Beta
    lpd1500s │  -.3766786   .0554659    -6.79   0.000                -.5842314
       _cons │   8.090425   .0887273    91.18   0.000                        .
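
The Beta column can be reproduced by hand: a standardized coefficient is the raw coefficient multiplied by the ratio of the standard deviation of the explanatory variable to that of the dependent variable. A minimal sketch, assuming it is run right after the regression above (the scalar names sd_x and sd_y are ours):

. quietly summarize lpd1500s if e(sample)
. scalar sd_x = r(sd)
. quietly summarize logpgp95 if e(sample)
. scalar sd_y = r(sd)
. * should match the Beta reported for lpd1500s
. display _b[lpd1500s] * sd_x / sd_y

With a single explanatory variable, the standardized coefficient equals the correlation between the two variables, which is why its square (-.5842 squared, about .341) matches the R-squared.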

Multiple Regression

For multiple regression, list the additional explanatory variables after the first one.

. reg logpgp95 lpd1500s sjb1000

      Source │       SS           df       MS      Number of obs   =        34
─────────────┼──────────────────────────────────   F(2, 31)        =     21.66
       Model │  14.2886648         2   7.1443324   Prob > F        =    0.0000
    Residual │  10.2267689        31  .329895772   R-squared       =    0.5828
─────────────┼──────────────────────────────────   Adj R-squared   =    0.5559
       Total │  24.5154337        33  .742891931   Root MSE        =    .57437

    logpgp95 │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    lpd1500s │   -.321548   .0551574    -5.83   0.000    -.4340423   -.2090538
     sjb1000 │  -.0031051   .0284966    -0.11   0.914    -.0612242     .055014
       _cons │   8.642748   .1485872    58.17   0.000     8.339703    8.945794
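
The number of observations falls from 91 to 34 here because Stata drops any country with a missing value on any variable in the model, and sjb1000 is missing for many countries. The coefficients from the two regressions therefore come from different samples. One way to check how much that matters, sketched here under the assumption that the regression above was the last one run, is to count the missing values and re-estimate the simple regression on the same 34 countries:

. count if missing(sjb1000)

. * e(sample) marks the observations used by the most recent regression
. reg logpgp95 lpd1500s if e(sample)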

What if an explanatory variable is categorical?

. tab continent

  continent │      Freq.     Percent        Cum.
     Africa │         45       49.45       49.45
   Americas │         32       35.16       84.62
       Asia │         12       13.19       97.80
    Oceania │          2        2.20      100.00
      Total │         91      100.00

. reg logpgp95 lpd1500s sjb1000 continent

      Source │       SS           df       MS      Number of obs   =        34
─────────────┼──────────────────────────────────   F(3, 30)        =     13.97
       Model │  14.2889201         3  4.76297337   Prob > F        =    0.0000
    Residual │  10.2265136        30  .340883787   R-squared       =    0.5829
─────────────┼──────────────────────────────────   Adj R-squared   =    0.5411
       Total │  24.5154337        33  .742891931   Root MSE        =    .58385

    logpgp95 │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    lpd1500s │  -.3212955   .0568227    -5.65   0.000     -.437343    -.205248
     sjb1000 │  -.0028831   .0300819    -0.10   0.924    -.0643186    .0585524
   continent │   .0041495   .1516197     0.03   0.978    -.3054991    .3137982
       _cons │   8.632706   .3968173    21.75   0.000     7.822297    9.443115

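(What's wrong with this? continent is stored as numeric codes 1 to 4 with value labels, so entered on its own it is treated as a continuous variable with a single slope, as if moving from one code to the next meant the same thing everywhere. Prefix it with i. to enter it as a set of indicator variables instead.)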

. reg logpgp95 lpd1500s sjb1000 i.continent

      Source │       SS           df       MS      Number of obs   =        34
─────────────┼──────────────────────────────────   F(5, 28)        =     11.84
       Model │  16.6431982         5  3.32863964   Prob > F        =    0.0000
    Residual │  7.87223551        28  .281151268   R-squared       =    0.6789
─────────────┼──────────────────────────────────   Adj R-squared   =    0.6215
       Total │  24.5154337        33  .742891931   Root MSE        =    .53024

    logpgp95 │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    lpd1500s │  -.4108716   .0664944    -6.18   0.000    -.5470792   -.2746641
     sjb1000 │    .017904   .0287151     0.62   0.538    -.0409162    .0767241
   continent │
   Americas  │  -.9076474   .3537367    -2.57   0.016    -1.632244   -.1830505
       Asia  │  -.5490911   .3531067    -1.56   0.131    -1.272397    .1742151
    Oceania  │  -.3875177    .548546    -0.71   0.486    -1.511163    .7361279
       _cons │   9.260819   .3231567    28.66   0.000     8.598863    9.922776
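
Each continent coefficient is a difference from the omitted base category, which here is Africa. Two follow-ups are often useful: a joint test of all the continent indicators, and a different base category. A minimal sketch, assuming continent is coded 1 to 4 in the order shown by tab (so that 2 is the Americas):

. * joint F test that all continent coefficients are zero
. testparm i.continent

. * same model with the Americas as the base category
. reg logpgp95 lpd1500s sjb1000 ib2.continent

Changing the base category changes the continent coefficients and the constant, but not the fit of the model, the fitted values, or the other slopes.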


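After a regression, predict creates a new variable in the dataset. By default it holds the fitted values from the most recent regression; countries with a missing value on any explanatory variable get a missing prediction: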

. predict yhat
(option xb assumed; fitted values)
(57 missing values generated)

. summarize logpgp95 yhat 

    Variable │        Obs        Mean    Std. Dev.       Min        Max
    logpgp95 │         91    7.918999     .994235   6.109248   10.21574
        yhat │         34    8.614128    .7101685   7.443351   10.37231

Or the residuals:

. predict residuals, residuals
(57 missing values generated)

. summarize logpgp95 yhat residuals

    Variable │        Obs        Mean    Std. Dev.       Min        Max
    logpgp95 │         91    7.918999     .994235   6.109248   10.21574
        yhat │         34    8.614128    .7101685   7.443351   10.37231
   residuals │         34   -2.74e-09    .4884185  -1.083866   .8918947
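
The two new variables are linked by a simple identity: each residual is the observed value minus the fitted value. A short check, assuming yhat and residuals were created as above (resid_check is a name of our own choosing):

. generate double resid_check = logpgp95 - yhat

. * predict stores its results as floats by default, so allow for rounding
. assert abs(resid_check - residuals) < 1e-5 if !missing(residuals)

. * residuals plotted against fitted values for the last regression
. rvfplot, yline(0)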

Formatting Regression Tables

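Instead of copy-pasting Stata output, use dedicated table commands such as esttab and eststo, which come with the user-written estout package (install it once with ssc install estout). Run without arguments, esttab tabulates the most recent regression: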

. reg logpgp95 lpd1500s sjb1000

      Source │       SS           df       MS      Number of obs   =        34
─────────────┼──────────────────────────────────   F(2, 31)        =     21.66
       Model │  14.2886648         2   7.1443324   Prob > F        =    0.0000
    Residual │  10.2267689        31  .329895772   R-squared       =    0.5828
─────────────┼──────────────────────────────────   Adj R-squared   =    0.5559
       Total │  24.5154337        33  .742891931   Root MSE        =    .57437

    logpgp95 │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    lpd1500s │   -.321548   .0551574    -5.83   0.000    -.4340423   -.2090538
     sjb1000 │  -.0031051   .0284966    -0.11   0.914    -.0612242     .055014
       _cons │   8.642748   .1485872    58.17   0.000     8.339703    8.945794

. esttab

                      (1)   
                 logpgp95   
lpd1500s           -0.322***
                  (-5.83)   

sjb1000          -0.00311   
                  (-0.11)   

_cons               8.643***
                  (58.17)   
N                      34   
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

. eststo clear // clear estimated tables

. eststo: reg logpgp95 lpd1500s

      Source │       SS           df       MS      Number of obs   =        91
─────────────┼──────────────────────────────────   F(1, 89)        =     46.12
       Model │  30.3661927         1  30.3661927   Prob > F        =    0.0000
    Residual │  58.5990948        89  .658416795   R-squared       =    0.3413
─────────────┼──────────────────────────────────   Adj R-squared   =    0.3339
       Total │  88.9652874        90  .988503194   Root MSE        =    .81143

    logpgp95 │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    lpd1500s │  -.3766786   .0554659    -6.79   0.000    -.4868881    -.266469
       _cons │   8.090425   .0887273    91.18   0.000     7.914126    8.266725
(est1 stored)

. eststo: reg logpgp95 lpd1500s sjb1000

      Source │       SS           df       MS      Number of obs   =        34
─────────────┼──────────────────────────────────   F(2, 31)        =     21.66
       Model │  14.2886648         2   7.1443324   Prob > F        =    0.0000
    Residual │  10.2267689        31  .329895772   R-squared       =    0.5828
─────────────┼──────────────────────────────────   Adj R-squared   =    0.5559
       Total │  24.5154337        33  .742891931   Root MSE        =    .57437

    logpgp95 │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    lpd1500s │   -.321548   .0551574    -5.83   0.000    -.4340423   -.2090538
     sjb1000 │  -.0031051   .0284966    -0.11   0.914    -.0612242     .055014
       _cons │   8.642748   .1485872    58.17   0.000     8.339703    8.945794
(est2 stored)

. eststo: reg logpgp95 lpd1500s sjb1000 i.continent

      Source │       SS           df       MS      Number of obs   =        34
─────────────┼──────────────────────────────────   F(5, 28)        =     11.84
       Model │  16.6431982         5  3.32863964   Prob > F        =    0.0000
    Residual │  7.87223551        28  .281151268   R-squared       =    0.6789
─────────────┼──────────────────────────────────   Adj R-squared   =    0.6215
       Total │  24.5154337        33  .742891931   Root MSE        =    .53024

    logpgp95 │      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    lpd1500s │  -.4108716   .0664944    -6.18   0.000    -.5470792   -.2746641
     sjb1000 │    .017904   .0287151     0.62   0.538    -.0409162    .0767241
   continent │
   Americas  │  -.9076474   .3537367    -2.57   0.016    -1.632244   -.1830505
       Asia  │  -.5490911   .3531067    -1.56   0.131    -1.272397    .1742151
    Oceania  │  -.3875177    .548546    -0.71   0.486    -1.511163    .7361279
       _cons │   9.260819   .3231567    28.66   0.000     8.598863    9.922776
(est3 stored)

. esttab

                      (1)             (2)             (3)   
                 logpgp95        logpgp95        logpgp95   
lpd1500s           -0.377***       -0.322***       -0.411***
                  (-6.79)         (-5.83)         (-6.18)   

sjb1000                          -0.00311          0.0179   
                                  (-0.11)          (0.62)   

1.continent                                             0   
                                                      (.)   

2.continent                                        -0.908*  
                                                  (-2.57)   

3.continent                                        -0.549   
                                                  (-1.56)   

4.continent                                        -0.388   
                                                  (-0.71)   

_cons               8.090***        8.643***        9.261***
                  (91.18)         (58.17)         (28.66)   
N                      91              34              34   
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001

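More formatting options are described at http://repec.org/bocode/e/estout/esttab.html. Here, label prints variable labels instead of variable names, b(2) rounds coefficients to two decimals, se(a1) replaces the t statistics with standard errors, and r2 and ar2 add the R-squared and adjusted R-squared: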

. esttab, label b(2) se(a1) r2 ar2

                              (1)             (2)             (3)   
                     Log GDP~1995    Log GDP~1995    Log GDP~1995   
Log Population~1500s        -0.38***        -0.32***        -0.41***
                           (0.06)          (0.06)          (0.07)   

Urbanization in 10~)                        -0.00            0.02   
                                           (0.03)          (0.03)   

Africa                                                       0.00   
                                                              (.)   

Americas                                                    -0.91*  
                                                            (0.4)   

Asia                                                        -0.55   
                                                            (0.4)   

Oceania                                                     -0.39   
                                                            (0.5)   

Constant                     8.09***         8.64***         9.26***
                           (0.09)           (0.1)           (0.3)   
Observations                   91              34              34   
R-squared                   0.341           0.583           0.679   
Adjusted R-squared          0.334           0.556           0.622   
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
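
To get the table out of Stata and into a document, esttab can write to a file with using. A minimal sketch, assuming the three stored estimates above are still in memory; the file names are placeholders:

. * rich text format, opens in Word
. esttab using reversal_table.rtf, label b(2) se(a1) r2 ar2 replace

. * LaTeX table
. esttab using reversal_table.tex, label b(2) se(a1) r2 ar2 replace

The output format is inferred from the file extension, and replace overwrites an existing file of the same name.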