DIMACS is very proud for the achievements of the 2020 DIMACS REU students during their summer research experience. This page includes the abstracts of the final papers that the students submitted to the program, summarizing their summer research.
The project summaries of the students who participated in the Barnard Computer Science - Summer Research program can be found here.
Abstract: In this project, we explored the notion of polynomial mutation in the context of algebraic geometry. It is conjectured that these polynomials can tell us something about the smoothing components of deformations of toric varieties. We worked on a simple combinatorial proof that an irreducible 0-mutable polynomial is rigid maximally mutable. An algorithm for reducing polynomials was developed, although it was demonstrated that such an algorithm cannot be guaranteed to find 0-mutable polynomials efficiently. Finally, we considered different notions of polytope mutation, and proved an equivalence between different definitions. Using this equivalence, we showed a fact about the number of possible mutations of a polynomial, up to isomorphisms of the mutations' convex hulls.
Abstract: We build on previous work by Teresa Ngo and Hannah Fell to help construct career pathways and determine the criminal implications of digital forensics. Federal Law Enforcement Training Center needed additional knowledge to understand the training, certification, and skills necessary at different levels of a career. We looked at the certification providers CompTIA and Global Information Assurance Certification to find certifications relevant to digital forensics. We did text analysis of Thomson Reuters Westlaw criminal case documents to determine the relevance of digital forensics. This project has not concluded, so extending the time period of Westlaw document analysis would make viewing trends available and improve understanding of cybersecurity and digital forensics career pathways.
Abstract: The current pandemic as a result of COVID-19 has created a new sense of reality. Across the nation, state governors have held press conferences related to the pandemic. Topic modeling, word frequencies, and sentiment analysis were conducted on press conference transcripts from various states. By incorporating these techniques into the study, weekly themes and word frequencies were revealed as well as monthly analyses of sentiment pertaining to the data. With these results, the researchers are better able to develop a comprehensive resource allocation model.
Abstract: Laser-Based Additive Manufacturing (LBAM) is a promising process in manufacturing that allows for capabilities in producing complex parts with multiple functionalities for a large array of engineering applications. Melt pool is a defining characteristic of the LBAM process and known defects of porosity in the melt pool and LBAM process has prevented the expansive adoption of LBAM. High-speed monitors that can capture the LBAM process have created the possibility for in-situ monitoring for defects and abnormalities. This paper focuses on augmenting knowledge of the relation between the LBAM process and porosity and providing models that could efficiently, accurately, and consistently predict defects and anomalies in-situ for the LBAM process. Two models are presented in this paper, Random Forest Classifier and Early Stopping Neural Network, which are used to classify pyrometer images and categorize if those images will result in defects. Both methods can achieve over 99% accuracy in an efficient manner, which would create an in-situ method for quality prediction in the LBAM process.
Abstract: In the recent light of the SARS-CoV-2 pandemic, the United States' response has proven to be incredibly poor with respect to policy decisions made on the local and federal levels. With deaths totaling at 142,755, this is more than the number of battle deaths in the Vietnam War, Korean War, and World War I combined. One of the easiest ways to prevent such an outbreak from reoccurring has shown to be a stronger implementation of policies on all governmental levels. Three categories which require more heavy implementation include Test, Track, and Treat (TTT), Distance, Isolate, Mask (DIM), and Prepare & Organize, Lead & Inform, Coordinate & Yaager results (POLICY). The American response was not sufficient enough to prevent mass infections and fatalities, and analysis of state and federal policies with respect to COVID-19 containment outstandingly point to a need for more drastic policy decisions.
Abstract: We studied the problem of deterministic vertex coloring in the semi-streaming model. The (Δ + 1)-coloring problem is easily solved in the classical setting via a greedy coloring, where Δ is the maximum degree. In the streaming model, we are limited to a memory of size O(n) and can only read off edges of the graph one at a time. Although there exist randomized algorithms solving the (Δ + 1)-coloring problem in the semi-streaming model, it is unknown whether there exists a deterministic such algorithm. We establish two deterministic O(poly(Δ))-coloring algorithms towards this; the first produces an O(Δ3)- coloring in two passes, while the second produces an O(Δ2)-coloring in O(log Δ) passes. We also briefly touch upon the related independent set problem, and prove a deterministic Ω(n/Δ2)-independent set single-pass algorithm.
Abstract: The ball-recycling problem, a variant of the common balls-and-bins problem, has m balls thrown into n bins, typically according to some probability distribution p. Repeatedly, one bin is selected, and its balls are removed and rethrown according to p. We want to maximize the expected number of balls rethrown at every step. I investigated worst-case performance for online ball-recycling algorithms, that is, ball-recycling when the balls are rethrown not according to p, but according to some adversary. The goal was to find an online algorithm that gets as close as possible to the offline optimum algorithm over all possible adversaries.
Abstract: Let X and Y be two finite-area hyperbolic Riemann surfaces. Since X and Y are both hyperbolic, we know that they must both be uniformized by ℍ ; however, the maps ℍ → X and ℍ → Y are not in general of finite degree. In general, guaranteeing that X and Y share conformally equivalent finite covers is too much to hope. The Ehrenpreis conjecture states that we can achieve something close: in particular, for any K>1, we can find finite covers X' → X and Y' → Y such that X' and Y' are K-quasiconformally equivalent; in other words, we can always find finite covers which have little distortion of angles relative to each other. The conjecture was recently settled in the case of closed surfaces in 2015 by Kahn and Markovic, but there has been no such resolution for punctured Riemann surfaces, and in fact the truth of the conjecture remains contested. The modular surface is a model example of a punctured Riemann surface of finite area. In this project, we numerically find and study finite covers of the modular surface constructed by gluing together immersed hyperbolic pants and "degenerate pants", in which we allow one cuff to degenerate into a cusp, subject to similar technical constraints. We hope that the work will help in generating insight towards an angle of attack in a proof of the cusped Ehrenpreis conjecture inspired by the methods used in the proof of the case of closed surfaces.
Abstract: We systematically determine infinite circle packings that arise from 8 and 9-vertex polyhedra. We then classify packings where all radii are the reciprocals of integers.
Abstract: AES is one of the best known and most widely used block ciphers to date. However, the primary argument towards the security of AES is simply the fact that it has not been broken in practice. We present a new way of theoretically analyzing the security of AES and related block ciphers by looking at the expansion of a related Cayley graph. More specifically, we provide a generalization of a single round of AES, which we consider as a generating set for the symmetric group. This then gives a Cayley graph for which one could theoretically explicitly calculate the spectral gap. We explicitly perform this calculation for small security parameters, and we compare these results to related expansion results for the ideal cipher. We hope that this analysis, when continued, will either give some indication of the security of AES or will lead to an explicit attack.
Abstract: The Riemann Hypothesis is a famous unsolved problem in mathematics first studied by Bernhard Riemann in 1859 which is important for understanding the distribution of prime numbers. In recent years, the development of type theory and automated theorem proving has made possible a new level of rigour in the ability for computers to check mathematical proofs. In this project, we formalize the statement of the Riemann Hypothesis in the Lean Theorem Prover using elementary methods in analysis and study the minimal algebraic requirements necessary to construct the statement of the Riemann Hypothesis.
Abstract: Zero-knowledge proof systems are of great interest to cryptographers, since they allow for the sharing of knowledge of secret information without divulging the actual secret. Interactive versions of these proof systems contain fundamental cryptographic problems such as discrete log and decisional Diffie-Helman. We analyzed the class of non-interactive versions of these proof systems, NISZK and a variant where the verfier in the system is limited to being a log-space machine, NISZKL. We show that EA for NC0 circuits is complete for NISZKL under AC0-many one reductions. We also show that a problem concerning the minimum time-bounded Kolmogorov complexity of strings, MKTP, is hard for co-NISZKL under P/poly many-one reductions by reduction from EA. MKTP is a candidate NP-intermediate problem and previously had only been shown to be hard under NC0 reductions for a subclass of P.
Abstract: I surveyd equilibrium arrangements, proper as well as pseudo, of four point particles on the unit circle which interact pairwise with a repulsive power law force. The bifurcation diagram which jointly exhibits all these equilibrium arrangements as functions of s features four obvious "universal" equilibria, which do not depend on s, and many more continuous families of s-dependent non-universal equilibria, some of which were known before but many were not. All equilibria are also planar special cases of the analogous four-particle equilibrium problem on the 2-sphere, some of which are also studied.
Abstract: Single cell sequencing is a new field in biology that allows for the extraction of tumor cell populations, and subsequent analysis on this data leads to a better understanding of the differences in cells from these populations. The data in regard to reads and mutant allele coverage is critical to understanding how phenotype of cells are informed. Data extracted from thousands of cells yielded wide ranges of depth of UMI (Unique Molecular Identifier) and mutant UMI reads for single cells and base pairs. The analysis showed similar patterns of depth between the genes studied with low mutant coverage, which is not unexpected with single cell data. The results of having low data coverage results in less statistical power when trying to create confidence intervals for variant allele frequency. Further analysis using known mutation data showed that a previously developed probability function can result in a negative value when variant allele frequency is greater than .5. Thus, future directions of this project will focus on a development of a probability model based on our data.
Abstract: Feder and Subi conjectured that for any 2-coloring of edges of the hypercube Qn, there always exists a pair of antipodal vertices connected by a shortest path which changes color at most once. Leader and Long proved that there always exists a path between antipodal vertices with at most n/2 changes and Dvorak improved this bound to (3/8+o(1))n. We give some partial results which may lead to further improvements of Dvorak's upper bound.
Abstract: Most data informatics researchers considered Data visualization to be just the graphic representation. Here we interacted with and represented a data set of Danish folklore by creating human understandable stories which combine topics of the different stories in the data set.
Abstract: We study sum edge coloring for multigraphs, motivated by the bi-processor scheduling problem; the jobs must be scheduled in such a way that no two jobs that share a processor may run at the same time and the average completion time is minimized. If edges are jobs and vertices processors, then the scheduling problem is precisely edge coloring with the intention to minimize the sum over all colors used. We investigate two variants of sum coloring: the Minimum Edge Chromatic Sum (MECS) problem and the Optimum Cost Chromatic Partition (OCCP) problem. The former aims to find an edge coloring that minimizes the sum of the maximum color on each edge while the latter the sum of all colors used. Despite similarity of the two objectives, they are quite different in complexity. This difference is most observable when restricted to trees; we show that the former remains NP-hard, while the latter is polynomially solvable.
Abstract: A knot is an embedding of S1 into S3. Knots are studied up to the equivalence relation of concordance; two knots are said to be concordant if their disjoint union is the boundary of an annulus in S^3 \times I. Heegaard Floer homology is a suite of invariants of knots and 3-manifolds; the knot version categories the classical Alexander polynomial. The variant we studied assigns to a knot K a filtered and graded chain complex over F2[U,U-1]. This complex admits various chain maps which carry geometric information, such as the Sarkar involution associated to a Dehn twist around the knot, and skew-filtered chain map iotaK associated to a natural symmetry on the construction of Heegaard Floer homology. Our project dealt with the Heegaard Floer homology of pretzel knots P(-2,m,n) with m and n positive odd integers. These knots have relatively simple Heegaard Floer knot homology, and we were therefore able to give a combinatorial computation of the chain map iotaK. Since the map iotaK carries geometric information, this led to new computations of invariants of knot concordance.
Abstract: Impedance control is a type of interaction control method used to control the force of resistance within a robot from an external manipulator (e.g. a human). Impedance control has motion inputs and force outputs. Impedance control can be used as a means to improve the physical and dynamic relationship between robot and manipulator and has applications in rehabilitation robots and vehicles due to their interaction with humans (fragile manipulators). The goal of this project is to work with the Baxter robot, created by Rethink Robotics, to implement an impedance controller and then a spike-based impedance controller for the limbs of the Baxter robot. The spike-based impedance controller will be executed using a spiking neural network (SNN) to control the robot. An SNN is a type of artificial neural network that is more biologically plausible and its neurons only fire when an action potential reaches a certain value.
Abstract: In the problem of object rearrangement, we have n items in an initial configuration and we want to move them to certain goal positions. In particular, we are considering the case where each item has its own respective goals and the objects may be initially positioned atop each other's goals. It has been shown how by considering the items' relationships as a dependency graph where each item is a vertex and there is an directed edge (u,v) if and only if v's initial position intersects u's goal. Finding the minimum amount of items that need to be moved to temporary locations, or buffers, in order to move all objects to their goals is then shown to be equivalent to finding a minimum feedback vertex set for this dependency graph, as the collisions which are unsolvable without buffers are cycles. The goal of this project was to look at the related problem of the minimum number of active buffers required. In other words, the minimum amount of items that need to be simultaneously stored in temporary locations in order to resolve the graph.
Abstract: The celebrated Brooks' theorem in graph theory states that every connected graph which is not a clique nor an odd-cycle can be Δ colored; here Δis the maximum degree of the graph and we use n to denote the number of vertices. We investigate the possibility of obtaining a Δ-coloring via a space-efficient streaming algorithm. That is, an algorithm that makes one pass over the edges of the input graph while using a limited memory - much smaller than the input size which can be as large as Θ(n2) - and at the end of the stream, outputs a coloring of the input graph. Our main result is a randomized algorithm for this problem that uses O(n7/4) space.
Abstract: For a classical system, momentum has been understood as the product of mass and velocity, however in a quantum system this is not the only notion for momentum. In the Bohmian interpretation of Quantum mechanics, there exists a guiding equation which helps support the classical understanding of momentum by providing both mass and velocity in a proportional relationship. However for specific particles such as photons, the classical notion of momentum is no longer applicable. To better understand momentum in a 1-dimensional quantum system I analyzed the possible different notions of momentum for systems with a finite number of particles and compared the results to strive for the overall "winner" in momentum measurement.
Abstract: In my summer work I worked with the P4 programming language and the Barefoot SDE. Information about the P4 programming language includes implementing exercises that exhibit network-level use cases of programmable switches through P4 programs. Information about the Barefoot SDE includes experiences with a real hardware device that can process data at 6.4 Tbit/s.
Abstract: Security enclaves are sections of chips that run trusted and verified code that cannot be accessed by other parts of the computer. If code running on an enclave would like to use a key-value store that is stored on the disk, it will have to go through the untrusted main OS. We investigated how to verify integrity and maintain secrecy with regards to this key-value store that the enclave is using. We first showed how to do it in the case when our data structure is a tree. Then, we used this idea to construct a faster set membership proof on disk. Finally, we made this construction write optimized using ideas drawn from Bε-trees.
Abstract: Virus spread can be simulated through computer programming which can help us to better understand how to minimize the rate at which it spreads. The coronavirus pandemic has impacted how everyone lives their day-to-day lives. The goal of this research was to develop a computer simulation to better understand how the virus spreads and to investigate different graphs such as the connected caveman graph that we can implement to reduce the spread. The idea behind the connected caveman graph was to partition the population into groups to limit the spread of the virus and quarantine specific groups if an outbreak would occur.
Abstract: Understanding gene regulatory networks is key to targeting diseases with drugs that don't include dangerous risks. In this research project, we studied the analyses generated by Huang et al. using random circuit perturbation (RACIPE). After gaining a comprehensive understanding of Huang's paper, we employed RACIPE to reproduce these results. We then worked to produce analogous results in DSGRN (Dynamic Signatures Generated by Regulatory Networks). Moreover, we found the DSGRN parameter nodes corresponding to the RACIPE data. We added weights to the parameter nodes based on what percentage of the RACIPE models occurred in each parameter node. Our preliminary low dimensional results lead to enticing conjectures. When including weighting, the DSGRN results are very similar to RACIPE for high Hill coefficients. Thus, weighting in DSGRN with sampling from RACIPE yields comparable results to RACIPE with far quicker computations. Hence, with additional work we hope to show that DSGRN can achieve comparable results to RACIPE for a fraction of the computational cost of RACIPE.
Abstract: It is known result in differential geometry that a 3-ellipsoid which is near spherical contains 4 minimal 2-spheres, and that if the 3-ellipsoid is scaled enough along one axis, additional minimal 2-spheres occur. We attempt to show that as this scaling approaches infinity, in other words, as the 3-ellipsoid approaches a higher dimensional cylinder, the number of minimal 2-spheres approaches infinity. The method we use is to take advantage of the rotational symmetry of an ellipsoid to reduce the problem to finding geodesics in a two-dimensional metric space, which can be done by finding the solutions to a system of nonlinear ordinary differential equations.
Abstract: Although tumors arise as a result of the rapid proliferation of cancerous cells, these cells comprise only a fraction of the tumor. Many types of cells, each of which possesses a specific role in the growth of solid tumors, make up the tumor microenvironment (TME). In recent years, immunotherapy, a treatment that utilizes the significant presence of immune cells within the TME, has proven effective for liquid tumors such as leukemia and lymphoma. However, in solid tumors, the complex relationships within the TME can make the response to immunotherapy challenging to predict and the treatment difficult to use effectively. Thus, we created a mathematical model of pancreatic adenocarcinoma growth which includes cancer cells, T-cells, and macrophages, and used it to study the dynamics of the cellular interactions within the TME over time and throughout treatment. The model behavior indicates that the ability of CAR T-cells to kill cancer cells is more critical than their persistence. However, when used in combination with chemotherapy, the tumor can be eliminated using less effective T-cells than would be needed if using immunotherapy alone. These results indicate that CAR T-cell research should focus on engineering T-cells with high efficacy in killing cancer cells rather than long persistence in the body. Doing so could allow for effective immunotherapy use in solid tumors, either as a lone treatment or used in combination with chemotherapy.
Abstract: The network interface card (NIC) is the hardware component that connects a computer, such as a laptop, server, or datacenter, to the internet. The NIC sits in the path of the data moving in and out of your machine, allowing it to compute over the data as it transits. For example, a NIC might automatically compress data going in and out of your machine to save bandwidth and reduce download time. With the advent of software-defined networking, along with ever-increasing network bandwidths, NICs have become increasingly flexible in the computations they support, and their architectures more complicated. The Netronome NFP-6000 is one example of these "smart NICs". We present a formal specification of arithmetic and logic instructions for the Netronome NFP-6000, and a programmatic framework for interpreting and formally validating Netronome programs. This work brings us closer to a superoptimizing compiler for Netronome smart NIC programs, and will help further understanding of the capabilities and limitations of hardware acceleration using NICs.
Abstract: In the United States alone, over 3.8 million people have contracted the Covid-19 virus. Due to the increased demand for testing, there have been shortages and issues for the tests themselves. Access to testing and the reporting of cases varies by region, so reported numbers are mainly consisting of people who exhibit symptoms of the virus. Taking into account those who are asymptomatic or do not have access to reliable tests, one can safely assume that the actual cases of this virus are actually much larger. Despite rates continuing to rise and testing becoming more scarce, the United States government pushes for the economy to reopen. This may lead to multiple consequences, to the cases of citizens surging to the negative social impact. Which makes research on methods to help protect individuals from infection a necessity. Continuing to see the changes and trends of this virus is crucial to both understanding causation of case spikes and to finding solutions to help flatten the curve as well as find effective ways to distribute resources. To understand the full scope of the Pandemic's impact on the United States, we considered a wide range of information regarding the virus. Focusing on the changing mobility patterns and state government mandates to understand the risk management of state governments as well as the social impact of Covid-19.
Abstract: Our project is inspired by the mathematical model used in the software package DSGRN. We try to find the equilibria in the systems of ODEs containing Hill models using Newton's method and validate them by radii polynomials. Finally, we introduced interval arithmetics to prove the nonexistence of other equilibria besides the ones we have found.
Abstract: In the nuclei of healthy human cells, DNA is compacted into chromosomes. However, in the nuclei of cancerous cells, there are also small closed loops of DNA called extrachromosomal DNA. These loops of DNA are believed to be responsible for the incredible adaptability of cancerous cells. Here we provide methods for analytically modeling loops of extrachromosomal DNA through the use of B-spline and Bezier curves. In each case, curves are first determined, local reference frames are attached, and then idealized base pairs corresponding to a desired chain sequence are plotted in each of these reference frames. Through the use of these models, we have studied various phenomena related to the structures of these closed loops of DNA.
Abstract: The software Dynamic Signatures Generated by Regulatory Networks (DSGRN) allows for a nearly instantaneous computational analysis of the global dynamics of (genetic) regulatory networks. There is a great need to further understand biologically relevant regulatory networks, which motivates the development of the techniques employed by DSGRN. In order to expand and enrich the abilities of DSGRN, I consider a generalized regulatory network which gives greater flexibility to the interactions between the nodes of the network. This flexibility arises from the introduction of multiple threshold interactions between species. In order to allow DSGRN to handle these generalized regulatory networks, a new kind of parameter space decomposition is required. This parameter space decomposition consists of calculating which parameter sets are possible for the regulatory network. For my project, I developed code to compute this decomposition and store the results in a database. This database can be used by DSGRN in the future in order to handle regulatory networks with multiple thresholds.
Abstract: Summary: The COVID-19 outbreak has changed the way that people interact. With the new stay at home and social distancing orders, most activities that people were used to changed drastically e.g., going to work and classes, living on campuses and travelling. In this new social environment, since early March people in the US were required to quarantine and do all their activities from home. Thus, in this project we researched how COVID-19 impacted the Internet usage in the US. While analyzing the Fixed Broadband raw data sets collected by the Federal Communications Commission (FCC), we were able to identify how aspects of the Internet e.g., amounts of data downloaded and uploaded used by users and average download and upload speed changed due to COVID-19. Moreover, through these analysis, we could also learn about how people have changed their behaviors using the Internet during COVID-19.
Abstract: It is important to form nuanced interpretations of qualitative data, but it is not always feasible because the method of affinity diagramming, which is ideal for wide-ranging, unstructured data, is inefficient and insufficiently scalable for large data sets. We propose an affinity diagramming system, called Qualitative Affinity Diagrammer (QuAD), that leverages computer-generated suggestions to address the scalability and efficiency of the diagramming process. QuAD pre-groups near duplicate notes, and also provides grouping and labeling suggestions for the notes. Additionally, our use of a shared vertical screen and individual tablets provides a digital layout that is conducive to large diagrams. We have created wireframes of QuAD, which we present in this paper. This work is the first step towards creating a powerful tool for assisting in the analysis of large qualitative data sets.
Abstract: Tweetorials are a novel way to improve science communication on social media platforms. A tweetorial is defined as a thread of tweets that explains a scientific subject while taking advantage of writing techniques, linguistic conventions, and media to increase engagement. We explore its usefulness in communicating scientific subjects by conducting a formative study with ten participants. Participants are asked to read through a tutorial on how to write hooks, or the first tweet in a tweetorial, and then write one of their own. We gather their final hooks and summarize their responses to interview questions.
Abstract: The purpose of this research was to build a tool that evaluates elegance in code. Elegant code is code that has been optimized for readability and functionality. There are common criteria that make code elegant, such as conciseness and formatting, and we defined these components as spaghetti code features. In order to accurately and consistently measure elegance, we modified and implemented the static analysis tool Checkstyle to assess java programs for spaghetti code features. If the java program did not meet our standard of elegance, an error message would be outputted. A python program would then count and group the errors by category output in order to formulate an elegance score for the code being assessed. Our next steps will be to create a grading algorithm and run linear regression on the elegance score in order to find which features correlate best with human judgement.
Abstract: Public Interest Technology (PIT) is a new field that encompasses the techniques, skills, methods, and processes used in the production of a good, service, or event that empowers a disempowered population, without disempowering those outside of that population. It is not commercial or for profit, and is ideally not government affiliated. It is owned and operated by and for the people, mutable by those who use it. It fosters community building, establishing networks of connections outside of normal institutions that hold power over us, whether that be the state or corporations. PIT can be applied to a diverse range of communities and spaces. In this research, I show a glimpse of this range through the design and application of PIT to public libraries and computer science communities.
Abstract: With an increase in Big Data and virtual collaboration in life science research, life scientists have faced restrictions and regulations about data security that affect research workflows and collaboration. Through designing a data security and access control system, we introduce a feasible way to securely share data across institutions and borders while abiding by these restrictions. We looked to past research and the experiences of potential users to develop a model for secure collaboration, that further takes into account ease of use and options for complex preferences. We further wire-framed a potential solution to visualize its function. Our system includes options to set simple yet effective access policies, quickly import fine-grained access settings, and receive feedback on how collaborators interact with the given data. By prototyping a potential system to enable the secure sharing of data for life science researchers, we present a model for future software development and research on the usability of that software.