## About Summit

# Mathematics - Theses, Dissertations, and other Required Graduate Degree Essays

Receive updates for this collection## Variants of the Consecutive Ones Property: Algorithms, Computational Complexity and Applications to Genomics

Genome mapping problems in bioinformatics can be modelled as problems of finding sequences of vertices in hypergraphs, subject to consecutivity constraints. These problems are related to the \emph{consecutive ones property}, a well-studied structural property on binary matrices. Many variants of this property have been introduced to include subtleties in the model, such as upper bounds on the number of times a vertex may appear in a sequence, the distance of the input from having the property, and confidence values for the consecutivity constraints. Most problems involving these variants are intractable, and efficient solutions call for restrictions on the structure of the input, exponential time algorithms, or approximations. The following document discusses these problems, from both a theoretical perspective, and from the genomics point of view.We encounter two main classes of problems, divided into models which account for repeated elements in genomes, and those which do not. Orthogonally, we divide the problems into decision and optimization questions. For models with repeats, we discuss when the given input can be used to reconstruct the genome map of interest, and if we can discard a minimal set of encoded consecutivity information from the model to obtain an input which can be used to reconstruct this genome map. We also discuss the problem of ambiguity introduced by repeats, and introduce the concept of \emph{repeat spanning intervals} in order to address them. We show that the problem of optimizing over the set of repeat spanning intervals is NP-hard in general, and give an algorithm when the intervals are small. In models without repeated elements, we discuss the problem of optimization byfinding a solution that minimizes the distortion in the consecutivity information, by generalizing the concepts of bandwidth and minimum linear arrangement to hypergraphs. We design approximation algorithms for two versions of the latter problem, with an approximation ratio of $O\left(\sqrt{\log n}\log\log n\right)$.Finally, we provide details of implementations of some of the methods developed for genome mapping and scaffolding on ancestral genomes. We include results on real data for the genome of the Black Death agent, and for ancestral \textit{Anopheles} mosquitoes.

## Using Viral Dynamics to connect clinical markers of disease progression to sequence evolution during HIV-1 infection

HIV-1 remains a global health challenge, with over 35 million people infected. The high rates of turnover and evolutionary adaptability exhibited by HIV-1 pose a particular challenge to HIV-1 vaccine development. We developed a dynamic model of HIV-1 infection that uses equilibration, adaptation, and inheritance to model the initial infection and successive generations of viral lineages. The model allows viruses to generate new lineages in proportion to their viral load. These lineages compete for immune cells to infect. We use this model to demonstrate how viruses with a sufficiently high mutation rate could overcome the immune system, even when most changes are expected to be detrimental to viral fitness. We have calibrated our model to match averages of CD4+ T cells/mm^3 and HIV-1 RNA/ml derived from 91 HIV-infected individuals studied longitudinally during various disease stages. We also explore the use of phylogenies to validate the underlying composition of viral load.

## On excluded minors and biased graph representations of frame matroids

A biased graph is a graph in which every cycle has been given a bias, either balanced or unbalanced. Biased graphs provide representations for an important class of matroids, the frame matroids. As with graphs, we may take minors of biased graphs and of matroids, and a family of biased graphs or matroids is minor-closed if it contains every minor of every member of the family. For any such class, we may ask for the set of those objects that are minimal with respect to minors subject to not belonging to the class - i.e., we may ask for the set of excluded minors for the class. A frame matroid need not be uniquely represented by a biased graph. This creates complications for the study of excluded minors. Hence this thesis has two main intertwining lines of investigation: (1) excluded minors for classes of frame matroids, and (2) biased graph representations of frame matroids. Trying to determine the biased graphs representing a given frame matroid leads to the necessity of determining the biased graphs representing a given graphic matroid. We do this in Chapter 3. Determining all possible biased graph representations of non-graphic frame matroids is more difficult. In Chapter 5 we determine all biased graphs representa- tions of frame matroids having a biased graph representation of a certain form, subject to an additional connectivity condition. Perhaps the canonical examples of biased graphs are group-labelled graphs. Not all biased graphs are group-labellable. In Chapter 2 we give two characterisations of those biased graphs that are group labellable, one topological in nature and the other in terms of the existence of a sequence of closed walks in the graph. In contrast to graphs, which are well-quasi-ordered by the minor relation, this characterisation enables us to construct infinite antichains of biased graphs, even with each member on a fixed number of vertices. These constructions are then used to exhibit infinite antichains of frame matroids, each of whose members are of a fixed rank. In Chapter 4, we begin an investigation of excluded minors for the class of frame ma- troids by seeking to determine those excluded minors that are not 3-connected. We come close, determining a set E of 18 particular excluded minors and drastically narrowing the search for any remaining such excluded minors.

## Generalized thrackles and graph embeddings

A thrackle on a surface X is a graph of size e and order n drawn on X such that every two distinct edges of G meet exactly once either at their common endpoint, or at a proper crossing. An unsolved conjecture of Conway (1969) asserts that e≤n for every thrackle on a sphere. Until now, the best known bound is e≤1.428n. By using discharging rules we show that e≤1.4n. Furthermore we show that the following are equivalent: G has a drawing on X where every two edges meet an odd number of times (a generalized thrackle); G has a drawing on X where every two edges meet exactly once (a one-thrackle); G has a special embedding on a surface whose genus differs from the genus of X by at most one.

## Barycentric Rational Interpolation and Spectral Methods

For the numerical solution of differential equations spectral methods typically give excellent accuracy with relatively few points (small N), but certain numerical issues arise with larger N. This thesis focuses on spectral collocation methods, also known as pseudo-spectral methods, that rely on interpolation at collocation points. A relatively new class of interpolants will be considered, namely the Floater-Hormann family of rational interpolants. These interpolants and their properties will be studied, including their use in differentiation by means of differentiation matrices based on rational interpolants in the barycentric form. Then, consideration will be given to the solution of singularly perturbed boundary value problems through the use of boundary layer resolving coordinate transformations. Finally, coupled systems of singularly perturbed boundary value problems will be investigated, though only with the standard Chebyshev collocation method.

## Analysis of an Age-Structured Model of Chemotherapy-Induced Neutropenia

Neutropenia is a blood disorder characterized by low levels of neutrophils and is a common side effect of chemotherapy. Administration of granulocyte-colony stimulating factor (G-CSF) is a typical treatment that helps stabilize the level of neutrophils. However, it is not known if changes to the frequency and dosage of administered G-CSF will lead to better treatment. We analyze a nonlinear hyperbolic system of coupled integro-differential equations aimed at quantifying the effect of treatment plans on patients with chemotherapy-induced neutropenia. We show how this age-structured model can be decoupled for short time. We then investigate the equivalence of an integral equation with a related nonlinear PDE and prove existence and uniqueness of solutions of the integral equation. This is used to finally demonstrate existence and uniqueness of solutions to the full PDE system.

## The Bipartite Boolean Quadratic Programming Problem

We consider the Bipartite Boolean Quadratic Programming Problem (BQP01), which generalizes the well-known Boolean Quadratic Programming Problem (QP01). The model has applications in graph theory, matrix factorization and bioinformatics, among others. BQP01 is NP-hard. The primary focus of this thesis is on studying algorithms and polyhedral structure from a linearization of its integer programming formulation. We show that when the rank of the associated m x n cost matrix Q is fixed, BQP01 can be solved in polynomial time. Further, for the rank one case, we provide an O(n log n) algorithm. The complexity reduces to O(n) with additional assumptions. Further, we observe that BQP01 is polynomially solvable if m = O(log n). Similarly, when the minimum negative eliminator of Q is O(log n), the problem is shown to be polynomially solvable. We then develop several heuristic algorithms for BQP01 and analyze them using domination analysis. First, we give a closed-form formula for the average objective function value A(Q, c, d) of all solutions. Computing the median objective function value however is shown to be NP-hard. We prove that any solution with objective function value no worse than A(Q, c, d) dominates at least 2^(m+n-2) solutions and provide an upper bound for the dominance ratio of any polynomial time approximation algorithms for BQP01. Further, we show that some powerful local search algorithms could produce solutions with objective function value worse than A(Q, c, d) and propose algorithms that guarantee a solution with objective function value no worse than A(Q, c, d). Finally, we study the structure of the polytope BQP^(m;n) resulting from linearization of BQP01. We develop various approaches to obtain families of valid inequalities and facet-defining inequalities of BQP^(m,n) from those of other related polytopes. These approaches include rounding coeffcients, using the linear transformation between BQP^(m,n) and the corresponding cut polytope, and applying triangular elimination.

## Enumeration of Set Partitions Refined by Crossing and Nesting Numbers

The standard representation of set partitions gives rise to two natural statistics: a crossing number and a nesting number. Chen, Deng, Du, Stanley, and Yan (2007) proved, via a non-trivial bijection involving sequences of Young tableaux that these statistics have a symmetric joint distribution. Recent results by Marberg (2013) has lead to algorithmic tools for the enumeration of set partitions with fixed crossing number and fixed nesting number. In this thesis we further consider set partitions refined by these two statistics. These sub-classes can be recognized by finite automata, and consequently have rational generating functions. Our main contribution is an investigation into the structure of the automata, the corresponding adjacency matrices, and the generating functions.

## Optimization Methods for Sparse Approximation

In the last two decades, there are numerous applications in which sparse solutions are concerned. Mathematically, all these applications can be formulated into the L0 minimization problems. In this thesis, we first propose a novel augmented Lagrangian (AL) method for solving the L1-norm relaxation problems of the original L0 minimization problems and apply it to our proposed formulation of sparse principal component analysis (PCA). We next propose penalty decomposition (PD) methods for solving the original L0 minimization problems in which a sequence of penalty subproblems are solved by a block coordinate descent (BCD) method. For the AL method, we show that under some regularity assumptions, it converges to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the AL subproblems, and establish their global and local convergence. Moreover, we apply the AL method to our proposed formulation of sparse PCA and compare our approach with several existing methods on synthetic, Pitprops, and gene expression data, respectively. The computational results demonstrate that the sparse {\it principal components} (PCs) produced by our approach substantially outperform those by other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors. For the PD methods, under some suitable assumptions, we establish some convergence results for both inner (the BCD method) and outer (the PD method) iterations, respectively. We test the performance of our PD methods by applying them to sparse logistic regression, sparse inverse covariance selection, and compressed sensing problems. The computational results demonstrate that when solutions of same cardinality are sought, our approach applied to the L0-based models generally has better solution quality and/or speed than the existing approaches that are applied to the corresponding L1-based models. Finally, we adapt the PD method to solve our proposed wavelet frame based image restoration problem. Some convergence analysis of the adapted PD method for this problem are provided. Numerical results show that the proposed model solved by the PD method can generate images with better quality than those obtained by either analysis based approach or balanced approach in terms of restoring sharp features as well as maintaining smoothness of the recovered images.

## Quantifying the Effect of Open-Mindedness on Opinion Dynamics and Advertising Optimization

Group opinion dynamics shape our world in innumerable ways. Societal aspects ranging from the political parties we support to the economic decisions we make in our daily lives are all directly af- fected in some way by group opinion dynamics. This makes understanding and potentially being able to predict the complex inter-relationships between individuals’ opinions and group opinion dynam- ics invaluable both scientifically and economically. We propose an aggregation model incorporating ingroup-outgroup dynamics, as well as media influence, to establish potential causal relationships between various types of social interaction and social phenomena such as the occurrence of group consensus and the hostile media effect. We further apply our model to simplified commercial appli- cations relating to advertisement optimization to determine the optimal proportion of a population to target with advertising in order to maximize opinion shift while fixing cost.