Anthony Merlin Portfolio

Written by Anthony Merlin / GitHub / LinkedIn

Education

Glen Allen High School: Advanced Studies Diploma

Virginia Polytechnic Institute and State University (Virginia Tech): Bachelor's degree in Computational Modeling and Data Analytics (CMDA), with minors in Mathematics and Computer Science (in progress).

Experience

Dataism Laboratory for Quantitative Finance (DLQF): Quantitative Researcher I have conducted extensive research in network-based portfolio optimization, applying graph theory and machine learning to improve upon traditional financial models. My work focuses on leveraging network structures within financial markets to enhance risk-adjusted returns, increase portfolio robustness, and develop novel quantitative investment strategies. I have implemented both model-based and data-driven approaches, with particular emphasis on graph neural networks and network centrality measures to capture complex market dependencies.
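To make the centrality idea concrete, here is a minimal, hypothetical sketch (not the lab's actual model) of weighting a portfolio by inverse network centrality; the `returns` DataFrame and the 0.5 correlation threshold are illustrative assumptions:

```python
# Hypothetical sketch: build a correlation network over assets, then
# down-weight central (highly connected) assets to reduce concentration risk.
import numpy as np
import pandas as pd
import networkx as nx

def centrality_weights(returns: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    corr = returns.corr()
    G = nx.Graph()
    G.add_nodes_from(corr.columns)
    # Connect assets whose return correlation is strong in absolute value.
    for i, a in enumerate(corr.columns):
        for b in corr.columns[i + 1:]:
            if abs(corr.loc[a, b]) >= threshold:
                G.add_edge(a, b)
    cent = nx.degree_centrality(G)
    # Peripheral assets get larger weights; normalize so weights sum to 1.
    inv = pd.Series({t: 1.0 / (1.0 + cent[t]) for t in corr.columns})
    return inv / inv.sum()

rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.normal(size=(250, 5)),
                       columns=["AAA", "BBB", "CCC", "DDD", "EEE"])
print(centrality_weights(returns))
```

Down-weighting central assets is a common heuristic in the network-portfolio literature, since peripheral assets are less entangled with broad market moves.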

The Aerospace Corporation: Machine Learning Engineer
Collaborated with The Aerospace Corporation to develop a next-day wildfire spread prediction system using satellite imagery and machine learning. The project leveraged nearly a decade of historical wildfire data across the United States, combining 2D fire data with multiple environmental variables.

Key Contributions:
- Designed and implemented an improved U-Net deep learning architecture with attention gates for wildfire spread prediction, achieving 59.97% precision
- Developed a series of models with progressive improvements; our best model reached an F1 score of 0.5039 and a recall of 0.5130
- Created custom loss functions and evaluation metrics to address the extreme class imbalance in wildfire data (see the sketch after this entry)
- Implemented post-processing techniques, including Gaussian smoothing and morphological operations, to enhance prediction accuracy
- Engineered a comprehensive evaluation framework with visualization tools to interpret and analyze model performance
- Worked with a complex dataset integrating multiple variables, including elevation, weather, vegetation, drought indices, and population density

Technologies: Python, PyTorch, TensorFlow, Keras, NumPy, Matplotlib, OpenCV, Computer Vision, Satellite Imagery Analysis
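One standard remedy for the extreme background/fire pixel imbalance mentioned above is a soft Dice loss; the project's actual custom losses aren't reproduced here, so treat this PyTorch sketch as an illustration of the general technique rather than the implementation:

```python
# Soft Dice loss for pixelwise fire-spread masks: compares predicted/true
# overlap instead of per-pixel accuracy, so the vast non-fire background
# cannot dominate the gradient signal.
import torch
import torch.nn as nn

class SoftDiceLoss(nn.Module):
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits, target: (batch, 1, H, W); target is a 0/1 fire mask.
        probs = torch.sigmoid(logits)
        inter = (probs * target).sum(dim=(1, 2, 3))
        union = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
        dice = (2 * inter + self.eps) / (union + self.eps)
        return 1 - dice.mean()  # minimize 1 - Dice
```

Because Dice measures overlap, a model that predicts "no fire anywhere" scores poorly even though it would achieve high per-pixel accuracy.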

The Complex Systems Lab at Virginia Tech and the Virginia Affect & Interoception Lab at the University of Virginia: Researcher I participated in an NSF-funded collaborative research project on the role of behavioral and physiological synchrony in crowd stress contagion dynamics. I worked closely with the research team to collect and process experimental group data using motion capture and cardiovascular physiological measures, and I cleaned data using software such as MindWare and Qualtrics. I also selected, built, and implemented a model in R, applying statistical and mathematical techniques such as ARIMA detrending, cross-correlation analysis, and spline interpolation, with fully commented code for analyzing the collected data. Finally, I conducted extensive literature reviews on existing models of behavioral and physiological synchrony.
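The original pipeline was written in R; the Python sketch below restates the core detrend-then-correlate idea (the ARIMA order, lag range, and helper names are illustrative assumptions, not the project's exact settings):

```python
# Detrend each physiological series with an ARIMA fit, then cross-correlate
# the residuals at several lags to quantify synchrony between two people.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def detrended_residuals(series: np.ndarray, order=(1, 1, 0)) -> np.ndarray:
    # Fit a simple ARIMA model and keep the residuals (the detrended signal).
    return ARIMA(series, order=order).fit().resid

def cross_correlation(x: np.ndarray, y: np.ndarray, max_lag: int = 10):
    # Normalized cross-correlation of two residual series at each lag.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = min(len(x), len(y))
    return {lag: float(np.mean(x[max(0, lag):n + min(0, lag)]
                               * y[max(0, -lag):n - max(0, lag)]))
            for lag in range(-max_lag, max_lag + 1)}
```

A peak away from lag zero suggests one person's signal leads the other's, which is exactly the kind of structure synchrony analyses look for.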

Virginia Tech Academy of Data Science: Computational Modeling and Data Analytics (CMDA) Ambassador Responsible for promoting the CMDA program to prospective students and for assisting faculty and staff at student fairs and recruiting events.

American Red Cross: Data Analyst for International Services In my role at the American Red Cross, I managed blood products data across international operations using SQL to organize information from field reports and survey data. I analyzed this data to identify patterns and prevent complications in blood supply distribution during emergency situations. My responsibilities included creating regular reports to help leadership make informed decisions about resource allocation, particularly during humanitarian crises. By implementing improved data collection methods, I helped streamline operations and ensure blood products reached the areas of greatest need more efficiently. I worked closely with medical teams and field coordinators to understand their data needs and translate statistical findings into practical recommendations. This collaborative approach helped the organization respond more effectively to emergencies and ultimately contributed to the Red Cross mission of saving lives through data-driven decision making.

Certifications

Microsoft Power BI from Simplilearn

Basic Responsible Conduct of Research Course from CITI Program

Social and Behavioral Research from CITI Program

Cypher Fundamentals from Neo4j

Neo4j Fundamentals from Neo4j

Tools for Data Science from IBM

Data Science Orientation from IBM

Google Data Analytics Certificate from Google

Cvent Supplier Network from Cvent

Core Analytical & Quantitative Skills

Mathematical Modeling (including Advanced Mathematical Modeling), Numerical Analysis, Linear Algebra, Calculus, Differential Equations, Partial Differential Equations (PDEs), Inverse Problem Formulation, Optimization (Numerical & Portfolio Optimization), Scientific Machine Learning, Statistical Analysis, Advanced Statistical Modeling, Economic Data Analysis, Quantitative Finance, Financial Risk Management, Big Data Economics, Financial Performance Evaluation, Scalability.

Data Science & Machine Learning Skills

Data Analytics, Data Modeling, Data Collection, Data Cleaning, Data Visualization, Machine Learning, Deep Learning, Scientific Machine Learning (SciML), Model Evaluation & Cross-Validation, Model Validation & Optimization, Quantitative Result Analysis, Deep Learning Model Architecture, Applied ML for Environmental Science, Machine Learning for Finance, Computer Vision for Remote Sensing

Tools & Technologies Skills

Python (incl. Pandas, NumPy, Matplotlib), R & RStudio, Julia, MATLAB / GNU Octave, SQL, CUDA, C, Java, PyTorch, TensorFlow, LangChain, Neo4j (and Cypher Query Language), HTML / CSS / JavaScript, Excel, Microsoft Power BI, AWS (Amazon Web Services), Qualtrics, MindWare

Research & Domain-Specific Skills

Biomedical Instrumentation, Physiological Signal Acquisition, Physiological Monitoring, Motion Capture, Signal Processing

Communication & Soft Skills

Problem Solving, Written Communication, Interpersonal Skills, Communication, Team Leadership, CMDA Ambassador / Outreach

Current Goals

I'm currently preparing for graduate studies in Data Science, Mathematics, Statistics, or Computer Science, with a strong research and career focus on Artificial Intelligence (AI) and cloud computing. I'm passionate about the intersection of deep learning, large-scale data systems, and real-world problem solving. In the short term, I'm actively expanding my skills in cloud platforms, especially Amazon Web Services (AWS), with plans to earn the AWS Certified Cloud Practitioner certification and continue toward AWS Certified Solutions Architect - Associate. I'm also exploring other major cloud ecosystems, such as Google Cloud Platform (GCP) and Microsoft Azure, to broaden my understanding of scalable, production-ready ML systems, and I'm deepening my knowledge of relational database design, query optimization, and data engineering concepts to support real-time analytics and distributed learning pipelines. My long-term goal is to contribute to the design of intelligent, scalable systems that drive innovation in finance, environmental science, and beyond.

Signature Projects

Physics-Informed Neural Networks for the Heat Equation
This project implements Physics-Informed Neural Networks (PINNs) to solve the heat equation, a fundamental partial differential equation in physics and engineering. PINNs are a scientific machine learning approach that embeds physical laws directly into neural network training through custom loss functions. The implementation tackles two key problems:
- Forward problem: given a known diffusion coefficient κ(x) = 1.0, the neural network learns to predict the temperature field u(t, x) across space and time.
- Inverse problem: using only 100 sparse measurements (1% of total solution points), the model simultaneously recovers both the temperature field and the unknown diffusion coefficient.

The neural architecture consists of:
- Temperature network: 4 hidden layers with 20 neurons per layer and tanh activation.
- Diffusion coefficient network: 2 hidden layers with 10 neurons per layer and softplus activation.
Training uses the Adam optimizer with learning rate scheduling (a minimal PyTorch sketch appears after this entry).

A comprehensive comparison with the traditional Finite Element Method (FEM) shows that while FEM achieves slightly better accuracy on the forward problem (0.0031 vs. 0.0066 mean error), PINNs excel at the inverse problem by directly recovering the unknown diffusion coefficient from limited data. Key advantages of PINNs demonstrated in this project:
- Physics-informed constraints ensure solutions satisfy the governing equations.
- Inverse problems are handled without separate iterative optimization loops.
- Solutions are continuous and differentiable across the entire domain.
- Sparse measurements are used efficiently.
The results confirm PINNs as a powerful tool for scientific computing, especially when data is limited or physical parameters are unknown.
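As a minimal illustration of the forward problem described above, here is a PINN sketch in PyTorch for u_t = κu_xx with κ = 1.0. The layer sizes follow the project description; the domain, sampling, and training loop are simplified assumptions:

```python
# Minimal PINN for the 1D heat equation u_t = kappa * u_xx, kappa = 1.0.
# Autograd supplies u_t and u_xx, so the PDE enters the loss directly.
import torch
import torch.nn as nn

torch.manual_seed(0)

u_net = nn.Sequential(                      # temperature network u(t, x)
    nn.Linear(2, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 1),
)

def pde_residual(t: torch.Tensor, x: torch.Tensor, kappa: float = 1.0):
    t.requires_grad_(True); x.requires_grad_(True)
    u = u_net(torch.cat([t, x], dim=1))
    u_t = torch.autograd.grad(u, t, torch.ones_like(u), create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t - kappa * u_xx               # ~0 wherever the physics holds

# One Adam step on random collocation points from the unit space-time domain.
opt = torch.optim.Adam(u_net.parameters(), lr=1e-3)
t, x = torch.rand(256, 1), torch.rand(256, 1)
loss = pde_residual(t, x).pow(2).mean()     # plus data/IC/BC terms in practice
opt.zero_grad(); loss.backward(); opt.step()
```

In the full project, the loss also includes initial/boundary-condition and data terms, and the inverse problem adds the smaller softplus network that parameterizes κ(x).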

Portfolio Optimization Using Penalized Regression: Ridge, Lasso, and Elastic Net
Project overview: I developed a machine learning approach to stock portfolio optimization using penalized regression techniques (Ridge, Lasso, and Elastic Net) on S&P 500 data, addressing multicollinearity and overfitting while improving feature selection in quantitative finance.
Key achievements:
- Achieved significant outperformance with the Ridge (27.61% return, +49.73% vs. benchmark) and Elastic Net (26.14% return, +41.77% vs. benchmark) portfolios
- Demonstrated effective feature selection with Ridge (85 features) and Elastic Net (11 features)
- Identified optimal sector allocations and the key stock selections driving performance
Technical approach:
- Processed S&P 500 financial data, including company metrics, prices, and sector classifications
- Implemented an 80/20 train-test split for reliable out-of-sample validation
- Used cross-validation for hyperparameter tuning and coefficient analysis (see the sketch after this entry)
- Created sector allocation visualizations for portfolio interpretation
Investment insights:
- Top sectors: Industrials (40%), Technology (30%), Healthcare (20%)
- Consistent performers: ON Semiconductor, Axon, Salesforce, Vertex Pharmaceuticals
- Elastic Net provided the best balance between returns and diversification
The results demonstrate how machine learning can enhance portfolio management by reducing overfitting, managing feature correlation, and generating significant alpha relative to benchmarks.
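The cross-validated penalized-regression step can be sketched with scikit-learn; the data below is a synthetic stand-in, not the S&P 500 feature set, so the fitted numbers are illustrative only:

```python
# Compare Ridge, Lasso, and Elastic Net with built-in cross-validation, and
# count surviving (nonzero) coefficients to show the feature-selection effect.
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 40))              # stand-in for company features
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

models = {
    "ridge": RidgeCV(alphas=np.logspace(-3, 3, 25)),
    "lasso": LassoCV(cv=5, random_state=42),
    "enet": ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5, random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    kept = int(np.sum(np.abs(model.coef_) > 1e-8))  # L1 penalties zero features out
    print(f"{name}: test R^2 = {model.score(X_te, y_te):.3f}, nonzero coefs = {kept}")
```

Ridge shrinks all coefficients without zeroing them, while Lasso and Elastic Net drive many to exactly zero, which mirrors the 85-feature vs. 11-feature contrast reported above.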

Advanced Statistical and Machine Learning Insights into USDA Rural Housing Data
This dashboard analyzes USDA Rural Development Multi-Family Housing data using a range of statistical methods to explore relationships between demographics, income, and housing characteristics:
- Logistic regression predicts above-average-income properties from household count, showing a 40.2% probability for 30-household properties (a sketch of this model follows this entry).
- KNN analysis for income prediction finds that k = 50 provides the optimal balance, revealing a positive relationship between household size and income.
- Multiple regression shows that both elderly and disabled populations positively affect household counts, with disabled residents having the stronger impact (0.79 vs. 0.30 units).
- Ridge regression identifies Other_Income_Average and Public_Assistant_Income as the strongest predictors of annual income while effectively balancing model complexity.
- Naive Bayes classification reveals significant class imbalance (97.5% low cost burden), with annual income and minor count as the key predictors of cost-burden status.
- LOESS fitting shows a mild positive non-linear association between female household heads and annual income, with wider confidence intervals at higher counts.
- A secondary logistic regression finds that the designated-minors count strongly negatively predicts elderly-focused properties, while female household heads show a positive association.
The dashboard presents these findings through visualizations, statistical summaries, and interpretations, providing insight into rural housing patterns and the relationships between property characteristics, demographics, and economic outcomes.
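As a small illustration of the first dashboard model, here is a hypothetical logistic-regression sketch; the column and data are synthetic, so the fitted probability will differ from the 40.2% figure reported above:

```python
# Logistic regression predicting whether a property's income is above average
# from its household count; the data here is simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
households = rng.integers(5, 120, size=400).reshape(-1, 1)
above_avg = (households.ravel() + rng.normal(scale=20, size=400) > 50).astype(int)

model = LogisticRegression().fit(households, above_avg)
# Predicted probability of above-average income for a 30-household property.
print(model.predict_proba([[30]])[0, 1])
```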

CUDA-Optimized Collatz Conjecture: GPU Parallelism for Maximum Stopping Time
The GPU Collatz Project harnesses CUDA parallelism to explore the famous Collatz Conjecture with unprecedented efficiency.
The challenge. The Collatz iteration follows two simple rules for any positive integer: even numbers are divided by 2; odd numbers are multiplied by 3 and incremented by 1. Our goal: find which starting number in a massive range takes the longest path to reach 1.
Key innovations. Our solution abandons sequential processing in favor of massive parallelism (a CPU sketch of the per-thread logic appears after this entry):
- 10 billion concurrent GPU threads, each handling one number
- A bit-packing technique that combines stopping time and position into a single 64-bit value
- Precise overflow detection at the threshold 6148914691236517204
- Thread-safe atomic operations that eliminate race conditions
Implementation highlights. The kernel architecture balances multiple factors:
- Optimized thread organization: 256 threads/block × 39,062,500 blocks
- Memory-efficient bit-packing for atomic updates
- Local calculations that minimize memory traffic
- Robust overflow detection and counting
Results. Testing demonstrated remarkable performance:
- Processed billions of values in seconds, completing the full run in 33.57 seconds
- Identified 904336917279 as requiring 1421 steps to reach 1
- Detected 8129 overflow conditions during processing
Impact. This GPU-accelerated approach transforms mathematical exploration by achieving orders-of-magnitude speedups over CPU implementations, enabling exploration of previously inaccessible number ranges, providing new insight into the Collatz Conjecture's behavior, and demonstrating how engineering principles can advance pure mathematics. The project illustrates how parallel computing opens new frontiers for computational number theory, making previously intractable problems solvable through sophisticated GPU programming techniques.
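Here is a CPU reference, in Python, of the per-thread logic the CUDA kernel parallelizes; the exact bit layout below (steps in the high bits) is an illustrative assumption, not necessarily the project's encoding:

```python
# Per-thread logic: compute a stopping time with overflow detection, then
# pack (steps, start) into one 64-bit word so a single atomic max on the GPU
# can track the record; comparing packed values orders by step count first.
OVERFLOW_THRESHOLD = 6148914691236517204   # above this, 3n + 1 exceeds 64 bits

def stopping_time(n: int) -> int | None:
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2
        else:
            if n > OVERFLOW_THRESHOLD:
                return None                # the GPU version counts this overflow
            n = 3 * n + 1
        steps += 1
    return steps

def pack(steps: int, start: int) -> int:
    # Assumes start < 2**48 and steps < 2**16, which holds for these ranges.
    return (steps << 48) | start

best = max(pack(s, n) for n in range(1, 10_000)
           if (s := stopping_time(n)) is not None)
print("record start:", best & (2**48 - 1), "steps:", best >> 48)
```

On the GPU, each thread computes its own packed value and issues one atomicMax, which is what removes the race condition mentioned above.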

Quantitative Analysis of Ideological Biases in Search Engine Results
My research team conducted an extensive data analysis investigating bias in Google's search algorithm across politically charged topics. Using sentiment analysis on 109 bipolar search queries and topics, we found clear evidence of non-neutral search results.
Key findings:
- Google displays measurable sentiment bias across controversial topics, with virtually no neutral article distribution.
- Political sentiment analysis reveals that Google's results lean slightly Totalitarian (52.3%), more strongly Collectivist (59.6%), and Progressive (58.7%).
- Bias varies significantly by topic category: environmental, healthcare, and education topics show a Collectivist-Progressive bias, while gun and judicial topics display a Libertarian-Conservative bias.
- Cluster analysis confirmed these patterns aren't random: specific topic groups consistently receive similar sentiment treatment.
Methodology:
- Web scraping of Google search results for controversial topics and political questions
- Sentiment analysis using the AFINN-165 word list to measure positivity/negativity (a sketch follows this entry)
- Political axis mapping across Totalitarian-Libertarian, Collectivist-Individualist, and Progressive-Conservative dimensions
- K-means clustering to identify topic relationships
- Visualization through divergent bar charts, boxplots, MDS, and 3D political scatter plots
This research raises important questions about algorithmic neutrality in information discovery. While the overall bias is slight (+0.01 sentiment per word on each political axis), the consistent deviation from neutrality demonstrates how search engines may inadvertently shape public discourse on contentious issues. Understanding these patterns is crucial for media literacy and for developing more balanced information systems that present diverse viewpoints on complex political topics.
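The per-word sentiment measure quoted above can be sketched with the `afinn` Python package, which ships the AFINN-165 word list (the team's actual scoring pipeline may have differed):

```python
# Score a text snippet with the AFINN lexicon (-5 very negative to +5 very
# positive per word), then normalize by word count for a per-word figure.
from afinn import Afinn

afinn = Afinn()

def sentiment_per_word(text: str) -> float:
    words = text.split()
    return afinn.score(text) / max(len(words), 1)

# Example: a hypothetical search-result snippet, not real study data.
snippet = "New policy praised as a great success despite bitter opposition"
print(f"{sentiment_per_word(snippet):+.3f} sentiment per word")
```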

Modeling Consensus Dynamics in Static and Switching Networks
This project explores how groups of interacting agents reach consensus using averaging algorithms applied over network structures. Drawing on real-world examples like bird flocking and distributed systems, the study models consensus behavior on both static and switching (time-varying) networks using tools from linear algebra, graph theory, and numerical simulation.

The analysis begins with three static networks (A1, A2, A3), each represented by an adjacency matrix. The corresponding Laplacian matrices are computed, and their eigenvalues and eigenvectors are used to assess each network's connectivity. The number of zero eigenvalues indicates how many connected components exist, which directly determines whether the network can reach consensus.

Both discrete-time and continuous-time averaging models are studied. It is shown that the state where all agents share the same value is an equilibrium for both models, and conditions are derived to ensure stability in the discrete case. Specifically, the update step size must lie in a safe range set by the largest Laplacian eigenvalue: for the standard update x(k+1) = (I - εL)x(k), stability requires 0 < ε < 2/λ_max(L). Simulations confirm that proper choices lead to convergence, while poor choices cause instability or divergence (see the simulation sketch after this entry).

The project then explores switching networks, where the structure changes randomly at each time step. Simulations using different connection probabilities (such as 0.1 and 0.9) show that more connected networks lead to faster and more reliable consensus. A disagreement metric measures how quickly the agent values align over time. Overall, the project demonstrates how the structure and behavior of networks directly affect the agents' ability to reach consensus, bridging mathematical theory with computational modeling.
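A minimal NumPy simulation of the discrete-time model makes the stability condition concrete; the 4-node adjacency matrix here is illustrative, not one of the project's A1/A2/A3 networks:

```python
# Discrete-time averaging: x(k+1) = x(k) - eps * L @ x(k) on a connected graph.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A              # Laplacian: one zero eigenvalue per component

lam_max = np.linalg.eigvalsh(L).max()
eps = 1.0 / lam_max                         # safely inside the stable range (0, 2 / lam_max)

x = np.array([1.0, 4.0, -2.0, 7.0])         # initial agent values (mean = 2.5)
for _ in range(200):
    x = x - eps * (L @ x)

print("final states:", x)                   # every agent approaches the initial mean
print("disagreement:", np.ptp(x))           # max - min, one simple alignment metric
```

Setting eps above 2 / lam_max in this script reproduces the divergence behavior described above, which is a quick way to verify the derived bound.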

Google Data Analytics Course Capstone Project: Case Study 1 "Cyclistic"
This project is my approach to Case Study 1, "Cyclistic", from the Google Data Analytics capstone. The central question of the case study is: how do we convert casual riders into members? I used PySpark SQL in a Jupyter Notebook for the data cleaning, since the dataset (around 4,073,561 observations) is too large to merge and operate on otherwise; once the data was cleaned, I did the analysis in R (a PySpark sketch of the cleaning step appears after this entry). Among other things, I observed the following in the data:
- Casual usage is low on weekdays, but weekends are very popular, especially Saturday.
- The docked bike is the most popular type for both members and casuals, but casuals prefer docked bikes more than members do.
- The average distances traveled by members and casuals are almost the same; however, the members' average trip duration (~15 min) is almost a third of the casual mean trip duration (~42 min).
- Casual users tend to start and end trips at the same station, while the pattern differs slightly for members.
Considering these observations and insights, we can suggest the following:
1. Increase weekend rental prices, especially for docked bikes, to nudge casual users toward membership, since docked bikes are preferred more by casual users.
2. Provide member-only services or perks, such as free ice cream or lemonade, a free tour guide, or a fast lane for renting without waiting, to motivate casual users to become members.
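Here is a hedged PySpark sketch of the cleaning step described above; the file path and column names (started_at, ended_at, member_casual) follow the public Divvy-style trip schema and are assumptions about the exact files used:

```python
# Load monthly trip CSVs, drop bad rows, derive trip duration and weekday,
# then compare average durations by rider type.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("cyclistic-cleaning").getOrCreate()

trips = spark.read.csv("data/tripdata-*.csv", header=True, inferSchema=True)

clean = (
    trips
    .withColumn("started_at", F.to_timestamp("started_at"))
    .withColumn("ended_at", F.to_timestamp("ended_at"))
    .dropna(subset=["started_at", "ended_at", "member_casual"])
    .withColumn(
        "duration_min",
        (F.col("ended_at").cast("long") - F.col("started_at").cast("long")) / 60,
    )
    .filter(F.col("duration_min") > 0)                  # drop zero/negative rides
    .withColumn("day_of_week", F.date_format("started_at", "EEEE"))
)

# Average trip duration by rider type (the ~15 min vs. ~42 min comparison).
clean.groupBy("member_casual").agg(F.avg("duration_min").alias("avg_min")).show()
```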

Personal Projects

Robotic Arm (Project Zeta 1)
A programmable robotic arm designed to perform precise movements and tasks, including gripping, rotating, and lifting objects. The project demonstrates the use of servo motors, an Arduino microcontroller, and a PCA9685 PWM driver to control six degrees of freedom, with custom configurations and loops for real-time adjustments. Enhancements include integrating a webcam for real-time object detection and tracking, implementing advanced control algorithms like inverse kinematics, and training machine learning models on CUDA-enabled GPUs for improved object recognition and decision-making. Plans also involve exploring reinforcement learning, upgrading to more powerful microcontrollers, and adding IoT capabilities for remote control and monitoring.
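As a minimal control sketch under the stated hardware assumptions: the code below uses Adafruit's ServoKit library on a Python-capable host (such as a Raspberry Pi) to drive the PCA9685, so it is a stand-in for, not a copy of, the project's Arduino firmware:

```python
# Drive six servos on a PCA9685 toward a home pose, sweeping each joint in
# small steps so the arm moves smoothly instead of snapping to the target.
import time
from adafruit_servokit import ServoKit

kit = ServoKit(channels=16)                 # the PCA9685 exposes 16 PWM channels

HOME = [90, 90, 90, 90, 90, 45]             # illustrative home pose in degrees

def move_smoothly(channel: int, target: int, step: int = 2, delay: float = 0.02):
    """Sweep one joint toward a target angle in small increments."""
    current = int(kit.servo[channel].angle or 90)
    direction = 1 if target > current else -1
    for angle in range(current, target + direction, direction * step):
        kit.servo[channel].angle = max(0, min(180, angle))
        time.sleep(delay)

for ch, angle in enumerate(HOME):           # drive all six joints home
    move_smoothly(ch, angle)
```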

About Me

For starters, I'm 21 years old and originally from Richmond, VA. Two things I’ve always loved are spending time with my family and diving into math. My passion for mathematics really took shape during high school at Glen Allen High School, where I earned my Advanced Studies Diploma. Over time, I found that coding felt like a natural extension of my problem-solving skills—it just made sense. That led me to pursue a degree at Virginia Polytechnic Institute and State University (Virginia Tech), where I'm currently studying Computational Modeling and Data Analytics (CMDA) with minors in Mathematics and Computer Science. My coursework blends rigorous mathematical analysis with hands-on, intensive programming.

Get in touch

Feel Free to Reach Out: Thank you for visiting my website — it was built from the ground up using HTML, CSS, and JavaScript. I appreciate you taking the time to explore my work. If you have any questions, want to collaborate, or just want to connect, don’t hesitate to contact me. I typically respond to emails within 24 hours.

Check Out My Projects (Math & Coding Based) If you're interested in what I've been working on, feel free to check out my GitHub! Most of my projects focus on math, data science, and coding — everything from analytical models to interactive web tools. I’m always learning and building, so there’s usually something new.