How to sample points from nonrectangular domains?

Recall the previous circle problem. We needed a square that enclosed the circle. Which of the two situations will be more efficient?

Clearly the case on the right is better. We will have fewer wasted darts in the smaller blue square area on the right.
Similarly, the case on the right in the figure below will be more accurate.
But how do we sample uniformly under the half circle the right figure?
- For a rectangle, we just used a uniform random number generator to get random x and y points.
- There are routines for generating random points that fit some given distribution, such as normally distributed random points.

Approach

Let the curve we want to generate points on be $g(x)$.
Let $G(x)=\int_a^xg(x)dx,$ where $a$ is the lower bound of integration.
We sample uniformly on $G$ and invert to find the corresponding $x=G^{-1}$.
The resulting $x$ locations will be distributed consistent with $g(x)$.
To get the $y$ location of a point corresponding to $x$, we sample a random number uniformly on $[0,,g(x)]$.

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.special import erf
from scipy.optimize import fsolve
from scipy.integrate import quad

plt.rc('font', size=14)

def make_plot():

    μ = 2
    σ = 0.5
    x = np.linspace(0,4,100)
    g = 1/np.sqrt(2*np.pi*σ**2)*np.exp(-1/2/σ**2*(x-μ)**2)
    G = 0.5*(1+erf(1/σ/np.sqrt(2)*(x-μ))) 
    
    fig, (ax1, ax2) = plt.subplots(1,2, figsize=(10,5))
    ax1.plot(x,g)
    ax1.set_xlabel('x'); ax1.set_ylabel('g(x)')
    ax1.set_xlim([0,4]); ax1.set_ylim([-0.1,1])
    ax2.plot(x,G)
    ax2.set_xlabel('x'); ax2.set_ylabel('G(x)')
    ax2.set_xlim([0,4]); ax2.set_ylim([-0.1,1])
    
    def Ginv(xFind, Gwant):
        return 0.5*(1+erf(1/σ/np.sqrt(2)*(xFind-μ))) - Gwant
    for yy in np.linspace(0.01,0.99,15):
        xx = fsolve(Ginv, μ, args=yy)[0]
        ax2.arrow(4, yy, xx-4+0.1, 0,      head_width=0.02, head_length=0.1,  fc='gray', ec='gray', linestyle='dashed')
        ax2.arrow(xx, yy, 0, -yy+0.05-0.1, head_width=0.05, head_length=0.05, fc='gray', ec='gray')
    
    plt.tight_layout()

make_plot()

Example: let $g(x)$ be a normal distribution

n    = 2000                        # number of points

#---------- Define g(x)

μ    = 2                           
σ    = 0.5
maxG = 1.0
def g(x):
    return 1/np.sqrt(2*np.pi*σ**2)*np.exp(-1/2/σ**2*(x-μ)**2)

#---------- get g(x) distributed random x points

rG = np.random.rand(n)*maxG      # uniform samples on G
rx = np.empty(n)                 # corresponding x locations found next

def Ginv(xFind, Ggiven):          # inverse function: G --> x
    return 0.5*(1+erf(1/σ/np.sqrt(2)*(xFind-μ))) - Ggiven
for i, rGi in enumerate(rG):
    rx[i] = fsolve(Ginv, μ, args=rGi)
    
#---------- get y points below g(x)
    
ry = np.empty(n)
for i in range(n):
    ry[i] = np.random.rand()*g(rx[i])

#---------- plot results

/var/folders/_7/42qnnz6j17g26jhlhv2l5_580000gn/T/ipykernel_55618/596636519.py:19: DeprecationWarning: Conversion of an array with ndim > 0 to a scalar is deprecated, and will error in future. Ensure you extract a single element from your array before performing this operation. (Deprecated NumPy 1.25.)
  rx[i] = fsolve(Ginv, μ, args=rGi)

x = np.linspace(0,2*μ,100)

fig,ax = plt.subplots(figsize=(10,5))
ax.plot(x,g(x), 'k-');
ax.plot(rx, ry, 'b.', markersize=2);
ax.plot(rx, np.random.rand(len(rx))*0.04-0.02-0.04, '.', markersize=1.0, color='red')
ax.set_xlabel('x');
ax.set_ylabel('g(x)');

Sampling nonuniformly distributed random numbers

Most random number generators provide uniformly distributed random numbers on the domain $[0,,1]$.
In the previous example, we saw how to sample random points that are normally distributed.
- Specifically, the selection of $x$ points will be normally distributed.
- This means that there are more $x$ points near $\mu$ where the peak is, than on the tails.
- In the above example, we also found corresponding random $y$ points.
- However, in this section, we are only interested in the nonuniformly distributed $x$ points.
- For example, plotting a histogram of the rx points above gives:

plt.hist(rx, 40, rwidth=0.8);

The previous method is difficult if the integration of $g(x)$ is complex or expensive and/or if the subsequent inversion of $G(x)$ is complex or expensive.
Another method for sampling uniformly distributed points from a nonrectangular region is similar to the MC integration problem.
However, we are not doing integration here, only showing another method for sampling from a nonuniform distribution.
This method does not require integration of $g(x)$ or inversion of $G(x)$.
We draw a bounding curve $F(x)$ that is easy to integrate and invert. This could be a rectangle.
- $F(x)$ is everywhere greater than $g(x)$.
Then we draw an $r_x$ point from from the bounding curve, as previously described (using G(x)).
Now we accept the sample point with probability $g(r_x)/F(r_x)$.
- This is done by sampling a random number between 0 and 1. If the random number is less than $g(r_x)/F(r_x)$, then we accept the point, otherwise we reject it.
This approach is called the rejection method.

Example

Sample random normally distributed variables using the rejection method.

def make_plot2():
    n    = 8000                 # random numbers to try (not all kept)
    μ    = 2                           
    σ    = 0.5
    def g(x):
        return 1/np.sqrt(2*np.pi*σ**2)*np.exp(-1/2/σ**2*(x-μ)**2)
    def F(x):                   # bounding function, here a simple uniform profile
        return 1
    
    #-----------------
    
    nkeep = 0                   # total number kept: incremented below
    rx_all = np.empty(0)        # storage for all points: expanded below
    
    for i in range(n):
        rx = np.random.rand()*2*μ
        Paccept = g(rx)/F(rx)
        r  = np.random.rand()        
        if r<Paccept:
            rx_all=np.append(rx_all, rx)
            nkeep += 1
            
    #-----------------
    
    plt.hist(rx_all, 40, rwidth=0.8);
    plt.xlim([0,4])

make_plot2()

Example-Poisson Process

Poisson Processes are extremely important in probability theory.
Used to model random processes in time and space
- Raindrop spacing, radioactive decay, eddies in turbulence, etc.

Raindrops

Suppose it’s raining and you listen to the time spacing of droplets.
There will be some average rate.
But sometimes droplets will be very close together, and sometimes far apart.

Question: what is the distribution of the spacing between events; and can we model this?

Find the PDF of intervals between events; that is $p(\Delta t_e)$.
Assume events occur randomly in time with a mean event rate $\lambda$ (=) $s^{-1}$. That is, $\lambda$ drops per second, on average.
Assume events are statistically independent of each other.
The probability of getting an event in some interval $dt$ is $\lambda dt$, or $dt/T$, where $T=1/\lambda$ is the mean time between events.
Consider the interval $(t_0,,t_0+\Delta t_e)$, with some sub-intervals:

|  1     2     3     4  |
|-----|-----|-----|-----|
| dt    dt    dt    dt  |

$\phantom{xxx}t_0\phantom{xxxxxxxxxxxxxxxxxx}t_0+\Delta t_e$

The probability of not getting an event in sub-interval 1 is $1-\lambda dt$.
The probability of not getting an event in sub-interval 1 and sub-interval 2 and $\ldots$ sub-interval 4 is

$$(1-\lambda dt)^4,$$

or, for $n$ sub-intervals

$$(1-\lambda dt)^n \rightarrow (1-\lambda dt)^{\Delta t_e/dt}$$

Now, let $dt\rightarrow 0$.
$P(t_{\text{next event}}>\Delta t_e) = \lim_{dt\rightarrow 0}(1-\lambda dt)^{\Delta t_e/dt}$.
Now, at $dt=0$ we have $1^{\infty}$. Work on this:
Now, at $dt=0$ we have $0/0$, so use L’Hopital’s Rule.

$$\ln P = \lim_{dt\rightarrow 0}\frac{-\lambda \Delta t_e}{(1-\lambda dt)} = -\lambda \Delta t_e.$$

Hence, $$P(t_{\text{next event}}>\Delta t_e) = e^{-\lambda \Delta t_e}.$$ Or, $$P(t_{\text{next event}}<\Delta t_e) = 1- e^{-\lambda \Delta t_e}.$$

This is a cumulative distribution function.
Differentiate it to get the probability density function:

$$p(\Delta t_e) = \lambda e^{-\lambda \Delta t_e}.$$

This is an exponential distribution.
So, to generate exponentially-distributed events, randomly sample $\Delta t_e$ values from this distribution.
- As above, draw uniform samples of $P(\Delta t_e)$ (the cumulative distribution function) on $[0,,1]$. Invert this function to get $\Delta t_e$.
- The inversion, is just solving the expression for $P(\Delta t_e)$ for $\Delta t_e$, where $P$ values are taken as random numbers between 0 and 1:
$$\Delta t_e = -\frac{\ln(1-P)}{\lambda}.$$

def make_plot3():
    λ = 1                   
    n = 2000
    rP = np.random.rand(n)
    Δt_e = -np.log(1-rP)/λ
    
    x = np.linspace(0,8,1000)
    y = λ*np.exp(-λ*x)
    
    times = np.cumsum(Δt_e[:40])
    
    plt.rc('font', size=14);
    fig,(ax1,ax2) = plt.subplots(1,2,figsize=(10,5))
    
    ax1.hist(Δt_e, 80, density=True, rwidth=0.8);
    ax1.plot(x,y, linewidth=3);
    ax1.set_xlabel(r'$\Delta t_e$');
    ax1.set_ylabel(r'$p(\Delta t_e)$');
    ax1.set_xlim([0,5])
    
    ax2.plot(times, np.zeros(len(times)), '.', markersize=5)
    for t in times:
        ax2.axvline(x=t, ymin=0.45, ymax=0.55, linewidth=1, color='gray')
    ax2.set_xlabel('time')
    ax2.set_ylim([-1,1])
    
    plt.tight_layout()

make_plot3()