Seismic Inversion for the Calculation of Velocities Using the Generalized Inverse Linear Matrix, the Wave Equation, and a Non-Reflective-Boundaries Condition

Objective: This work presents the results obtained in the development of a seismic velocity inversion model. The reference times recorded on the surface are taken and using the inversion model to obtain the initial reference model (hypocenters and velocities), starting from an unknown model. Methodology: A hypothetical reference model is proposed containing 64 blocks with interval velocity, 16 recording stations on the surface, and 64 earthquakes in the center of each block. With this model, the reference arrival times are generated for each earthquake registered in each station. The inversion model is made up of two parts: the direct model, which allows calculating the arrival times of the signal registered on the surface according to the hypo-central location of the earthquake and the velocity of the P and S wave of the medium; and the inverse model, which estimates a model of the velocity of the environment and hypo-central locations of the earthquakes that are the input variables of the direct model. The direct model was developed with the wave equation, while the inverse model was developed by modifying the generalized inverse matrix by introducing a factor called “damping.” Results: The discretization model is based on the finite difference method. When estimating the values of velocity and hypo-central location with the inverse algorithm, the propagation of the wave is simulated with the direct model, and then compared with the data of reference times measured on the surface. Depending on the mean square error, we proceed to modify the mean velocities and hypocenters of the earthquakes. This process repeates Seismic Inversion for the Calculation of Velocities Using the Generalized Inverse Linear Matrix González–Veloza., J.F. Duitama–leal ., a. Castillo–lópez., l.a. Gil – Gómez., J.H. y esquiVel., R.e. Tecnura • p-ISSN: 0123-921X • e-ISSN: 2248-7638 • Vol. 24 No. 66 • Octubre Diciembre de 2020 • pp. 13-26 [ 14 ] iteratively until the calculated error is less than a tolerance of 2x10-3s2. Conclusions: It was found that the estimated values of velocity and hypocentral locations coincide well for regions closer to the surface, while for deep regions the error more significant compared to the hypothetical reference model.


INTRODUCTION
For decades, seismic tomography has been the main tool to know the inner structure of the Earth. Their origins go from characterization to seismic prospection of oil wells (Bois et al., 1971). The fundamental theory of inversion geophysics data begins with Backus & Gilbert (1967), and a decade later the first application of seismic tomography appears (Aki et al., 1977;Aki & Lee, 1976;Dziewonski et al., 1977). In Aki & Lee (1976), the arrival times of the P wave are inverted to produce velocities variation. For the purpose of this work, there are three main assumptions: a front wave plane arrives, the velocity variation can be adjusted for a regular grid, and the geometry of the rays entering each block of the grid is only affected by the velocity variation. This work is considered the first example of seismic tomography that assumes the simultaneous inversion of the hypo-central location and velocity.
During the first two decades after the work by Aki & Lee (1976), the arrival times were calculated using ray tracing. However, the wave phenomena as the dispersion of the diffraction affect the arrival times, especially in 3D models (Liu & Dong, 2012). Virieux (1986) suggests calculating arrival times simulating the seismic wave propagation through a 3D elastic medium. Tong et al. (2014) make a review of different forward model methods and conclude that the wave propagation method tends to decrease error, but better results are achieved when we use seismograms. Recently, some works have managed to invert the entire seismogram. Chen et al. (2007), Fichtner & Trampert (2011), and Tape et al. (2010 calculated the residuals of full synthetic seismograms against the observations, getting better results in different regions of study.

Direct Model
The wave equation (1) is classified as a hyperbolic differential equation (Burden et al., 2016) in three dimensions: where V is the velocity, which depends on spatial coordinates, and ϕ is the wave function. Considering some boundaries and initial conditions, such as: , 0 , where E represents the spatial coordinates, and дE is the E boundary; also f(E), g(E) are the known functions in the entire domain. In Cartesian coordinates (x, y, z) and considering integers (i, j, k, n), so that with the soace and time discretized. Using Taylor series expansion with a fourth order approximation we obtain: , for the coordinates y and z we obtain analogous expressions. By replacing the equation system (7) in (1), solving ( ! , " , # , $%& ) , and following the nomenclature proposed by Yee (1966), we obtain: Equation (8) The theoretical calculation of wave time S assumes that the wave velocities S and P are proportional in the entire medium and behave like a Poison solid (10), with proportionality constant with = √3 . Thus, ! "#$ = %&' − ( and the difference of arrival times of waves S and P is: As the station locations are fixed, T dir depends on the velocity distribution and the hypocenter of the seismic event. Therefore � ��� � ��� � � .
is the only variable that depends on the origin time.

Generalized linear inversion
In the inverse algorithm, we compared data measured on the surface with the results obtained from the simulation. Thus, we quantify the difference between the simulated values and the measured observations in the field. Then, it is possible to modify the initial values of the P wave velocity parameters in the field and the coordinates of the hypocenter. With these new initial values, we simulate the arrival times of the P wave for each one of the seismic events and compare the new results with the measured data on the surface, obtaining a new error difference, which will allow us to recalculate the new initial parameters for the next simulation.
This new methodology iterates until the difference between the simulated and the observed data on the surface is lower than the set threshold, in order to invert the velocity, the hypocenter and the origin time simultaneously and the origin time (Crosson, 1976;Ramadan, 2016). In the same way, the hypocenters inversion method was generalized in a simultaneous inversion of velocities and hypocenters (Franco et al., 2006).
In the model, we consider N seismic sources with hypocenters given by x j = longitude, y j = latitude, z j = depth, and arrival times t j , where j = {1,2,3…,N}. Additionally, we split the in L blocks space each block with velocity v j , where l = {1,2,3…,L}. These variables compose the model m 0 : where 4N + L represents the number of elements.
To determine m 0 , we consider M stations (k = 1,2,…,M) located at (x k , y k , z k , t k ) and the arrival time observed in the k-th position for the j-th source. A functional relationship T(m 0 ) allows us to reproduce the observed times according to the model m 0 . If this relation is linear, the sensitivity matrix defined as: where A j is a MX4 matrix given by: with ( , ) = + ( − 1) a and !"# is.
To find ∆m (16), we need to invert G, but G is not necessarily a square matrix. Therefore, we use the least square method to determine m, which consists of multiplying the matrix by its transpose and then we invert the result. According to this, (16) is transformed into: where ∆T is the variation between the time observed (measured on the surface) and the time simulated: .
The error between both times is calculated as follows: and ∆m is the change of the model, thus: The matrix G T G may have determinant equal zero, in which case the matrix is not invertible; this is means that the system shown in (20) does not have a unique solution, and there is at least one null eigenvalue. As proved by Schwerdtfeger (Schwerdtfeger, 1960), there is a method to obtain the subspace solution to the highest likely dimension, known as Lanczos breakdown. This technique consists of decomposing G through matrices built with eigenvalues different from zero. The generalized inverse found with this method fits better the data on the model. However, the computational method cost makes it difficult when implementing real-world applications where matrices are bigger. Furthermore, there is difficulty to formally organize the eigenvalues, due to the data uncertainty because some of them may have where ϵ is the dampening factor. If ϵ increases, the error curve presents fewer fluctuations because it depends on the iteration. This factor is found empirically, so we obtained a reasonable fit between the theoretical and observed data, and we also found that the variations of the model during each iteration are small. The W m matrix in the most straightforward cases is the identity matrix. The initial model m 0 is proposed randomly but within a range of values that make sense in the geological model. With this model, the arrival time is simulated with the direct model and is obtained the difference with the reference time ∆T 0 .
It is possible to define the smoothing matrix (Stein & Wysession, s. f.) to modify the change rate of the velocity model. For a 3D geometry composed by a rectangle with several blocks � � � � � � , the number of columns for D is L and has the following form: where the sub-matrices D x , D y , and D z represent the first derivative in the directions x, y and z with the number of rows � � � 1� � � , � � � � 1� � and � � � � � 1� , respectively. The smoothing is applied to the velocity distribution, so it does not affect the hypocenter and origin time model, hence: where I is the identity matrix with dimensions 4N x 4N D x , D y , and D z comprise the sub-matrices B x , B y , and B z , defined as: where, Usually, B x , B y , and B z , are equal, and the only difference is the separation of the columns between −1 and 1. For B x there is no separation; for B y the separation is L x ; and for B z separation is L x L y .

CONDITIONS OF NON-REFLECTIVE BOUNDARY
The boundary conditions known as Dirichlet or Neumann are the most used in the solution of partial differential equations. These boundary conditions can reflect all the energy in the boundary. However, the algorithm determine this energy reflected from any seismic source directly, causing phantom seismic events. Thus, we propose to use non-reflective boundaries, described below. Hastings et. al. (Hastings et al., 1996) proposed a method based on (Berenger, 1994a) that consists of defining potential and decompose the wave equation according to the formulation of (Yee, 1966), is used for the electromagnetic fields.
. Following the scheme used by (Yee, 1966), the following nomenclature for the indeces is defined as: and, as suggested by Gonzalez (2012), the operators of centered difference are defined as: According to the equations (30) to (32), the discretization of (27) in Taylor series with second order truncation error is: and the discretization for the system shown in (28) is: Equations (32) and (33) show the evolution rule, as well as the relation between the potential and the wave function with their neighbors.
Each component of the potential contributed to the full wave function in the corresponding direction (Figure 1). Association of ϕ x , ϕ y , and ϕ z with its respective neighbors (the potential of each case). When the space split, and the disconnected wave function is in the semi-entire points and around the corresponding potential, which is going to help define its value.

Attenuation factors
Assuming that the space formed by a square and covered by a shell of determined width, we define an inside region called PML and a contact region called interface (Figure 2). So that one wave will expand freely in the interface and will be absorbed in the PML region. So that the wave attenuation occurs, (Berenger, 1994b) proposes attenuation factors Q s1 and Q s2 in the (33) and (34), so the potential is redefined as: and for the disconnected wave function: The attenuation factors: where s = {x, y, z} and β = {i, j, k}, respectively, and where P is the length of the PML region and r(β) is the depth from the interface to the outside edge as shown in Figure 2(a) and 2(b).

Complete solution
There is a slight change in the interface between the two regions, and the wave is slowly absorbed depending on the length P. Given the calculated limit in (38), when r → 0 then q → 0. So, from the (36) and (37) it is obtained that Q s1 → 1 and Q s2 → Λ s thus the same (32) and (33) are reproduced. Then, we use the standard solution in the inside region by using finite differences (8)  Source: Authors.
where I d is the value that the index i takes in the right-side interface (Figure 2). Similarly, the equation of potential Hx is obtained in the left interface and for the rest of the potentials Hy and Hz. In other words, we have both the potential and the disconnected wave function in the PML region. In the interface we have only the potential and only the wave function in the inside (Figure 3). In most outer boundaries, we can use the Dirichlet or Neumann conditions (2).

Scope and limitations
There are three possible limitations in the proposed model. First, the velocity blocks, hypocenters, and stations must be located; then it is possible to define the space and its limits based on those locations. Additionally, the space should be small for a short computational time; conversely, ample space contains the location of elements and stations of the model and allows achieving an equilibrium. Second, it is better to have many blocks for the tomography in order to be more detailed; however, there is a limit to the information available. Finally, since the stations are located on the surface and the hypocenter is distributed in the sub-surface, there is significant uncertainty of the velocity solution the larger the distance between the stations and the hypocenter.
In the PML region, we use the solution of the disconnected wave equation (35), as well as the following equation in the interface: (a) (b)

SPACE OF THE MODEL
The model validation comprised 54 seismic events, registered in Urabá, Colombia. We developed a model in which the limits are formed by a cubic surface with height h z , width h x in heading West-East, and length h y heading South-North (Figure 4). It represents a part of the surface and sub-surface of this region, where the hypocenters of seismic events, the stations, and the velocity blocks are determined by the user. Furthermore, h x and h y are given by the location of the stations and epicenters of the seismic events (superficial projection of the hypocenters), while h z is limited by the distribution of these events.

Number of elements in the model
There is a limit for the number of elements considered in the model depending on the information available. According to the number of seismic events (N = 54), 403 observed times are determined (number of equations), including the arrival time of the P wave (TP) and the difference between the arrival times of the P and S waves (TSP). To have a better adjustment of the convergence of the generalized inversion corresponding to the observed information, the number of equations cannot exceed the number of unknown factors; otherwise, we obtain unexpected solutions. The unknown factors are the elements of the model; for every seismic event there are 4 (location x; y; z of the hypocenter and origin time), that is 4 x N = 216. They reduce the number of blocks to consider: 403 -4 x N = 187 So, the maximum number of blocks is 187. If h x , h y , and h z split up in L x , L y , and L z intervals, then a matrix forms a L = L x ×L y × L z total blocks, so that L x ×L y × L z < 187.

Uncertainty of the velocity with depth
The last limitation is the uncertainty of velocity with depth. The trajectory between the hypocenter and the station is named ray according to the Fermat principle: the way taken by a wave expanding from one point to the other is such that the time used to cover it is minimum. The more significant the depth, the littler the ray density; so, the deepest cubes have a lower possibility of being crossed by a ray; furtherore,the uncertainty of velocity is more significant.

DIRECT ALGORITHM VALIDATION AND THE NON-REFLECTIVE BOUNDARIES
A region built with dimensions 200km x 40 km x 40km and seven stations on the surface, 10km apart towards x starting from x = 20km and the seismic source at (100, 25, 25) km, with velocity V p = 5km/s as shown in Figure 5(a). Figure 5(a) shows the wave reflecting on the boundary, while in figure 5(b) the wavefield is like an open frontier. These reflections to the inside of the region in the study do not appear.

VALIDATION OF THE INVERSION PROGRAM
We built a synthetic model considering a region with dimensions 48km x 48 km x 48km divided into 64 blocks, each one characterized by velocity. The distribution is analogous to the checkerboard, with the alternate velocity 5.5 km/s and 6.5 km/s. In each block, a seismic source is at the center, and 16 stations in the center of the upper face of the superficial blocks as shown in Figure 6(a).
Using the direct algorithm, we simulate seismic events and determine the arrival times for each one (T obs ).
We run the initial inverse algorithm assuming an initial model with a uniform velocity of 5 km/s and hypocenters located in the center of the region (24 x 24 x24) km (Figure 6(b)). As expected, the initial error is significant. Figure 7(a) shows the result of 69 iterations where the error given by (21) converges after a certain number of iterations and the model does not change significantly. It was comparing the synthetic model in Figure 6(a) with the final model in Figure 7(b), it is possible to see that the difference is minimal on the surface but increases in deep parts and even more in the corners (Figure 8). Similarly, the error related to the velocity distribution increases  the deeper the hypocenter is located (Figure 7). This distribution means that a deep block has less chance to be sampled by the wave of any sources, and it will weigh less in the solution of the inversion. Therefore, a deep source location error because of the velocity error of a deep block. Figure 7(c) shows the the hypocenters in the plane xy (Slice z = 0). They are located approximately in the center of the block,like the reference model. In Figure 7(d), we have the projections of the locations over the plane xz (y = 0), where the hypocenters are approximately in the center of each block, except for the deepest blocks. The black dots show a more significant error.
A lack of information produces errors in the locations as well as errors in the estimation of velocity. This error is due to the ray paths coming to form seismic events that cut the blocks in the surface, while deeper rays only cuts deepest blocks. This lack of information from the deepest blocks creates an error in the source location of the sources, as well as higher velocities (Figure 8). In Figures 8(a)   the slices of the more superficial layers show that the velocity value is close to the value calculated by the reference model ( Figure 6(a)), but Figure 8(d) represents a deeper layer, and the values show a more significant error.

CONCLUSIONS
An inversion program was created using inverse and direct models of an algorithm with the following contributions: the inverse algorithm considers the smooth velocity variations found in the geological models, while the direct algorithm works with finite differences and non-reflective PML boundaries, which offers a useful tool to make high-quality synthetic seismograms. In order to prove the direct algorithm, we suggest making the solution of the wave equation by finite differences with non-reflective boundaries and truncation error of grade 4 or higher. This reduces the error of the theoretical arrival time without damaging the computational time.
Multiple sources run simultaneously; this would represent a significant advance. For this reason, we proposed that the sources be distinguished by their frequency, as is the case with FM radio stations. Then, the simulated stations must discriminate frequencies, so that the computational time decreases.
Better results may be achieved if the acquisition time had been more significant. Therefore, we propose to install stations in the zone for over a minimum of 5 years. We also recommended the inversion program in real-time, with this work as an initial model, and obtaining a better resolution of the velocity and a reinforce the understanding of the tectonics of the region. The inversion program can be used for stations without synchrony.