Google Summer of Code 2024
GSoC 2024 Final Report
Author: Abhinav Kumar
Mentor: Bryan Richter
Project Name: Continuous Integration Log Explorer Tool
Proposed Problem: GSoC 2024 Ideas for Haskell
My Proposal Summary: The goal of this project is to assist developers in analyzing large CI test logs, particularly for rare intermittent failures, by creating a web-based tool. This tool extends an existing service (Spuriobot) to collect CI log data and store it in a full-text search (FTS) database. It also includes the development of an improved web UI, offering advanced search capabilities and automatic log integration from GitHub workflows.
GitLab Repository: Spuriobot GitLab Repository
Also, don’t forget to read the Acknowledgment!
Accomplished Tasks
1. CI Job Log Backfill into FTS Database
To build the Web UI for exploring rare intermittent failures in CI test logs, the first major task was to backfill all existing CI job logs into a Full Text Search (FTS) database. The system already inserted the logs of failed jobs as they finished, but it had no way to backfill historical job logs retroactively. This functionality was added to Spuriobot, so that previous job logs are also available in the FTS database for comprehensive searching.
Code for the implementation was written in Haskell and involved setting up a job-fetching process using GitLab's API, processing job data, and inserting it into the FTS database. The Haskell module responsible for this backfill operation includes functions like fetchJobsBetweenDates and initDatabase to handle the collection and insertion of job data, with support for concurrent operations and handling job trace fetching.
Key Implementation Details
-
Job Backfilling Process: The function fetchJobsBetweenDates retrieves CI job logs within a specified date range and inserts them into the FTS database for future search.
fetchJobsBetweenDates :: (UTCTime, UTCTime) -> IO ()
-
Database Initialization: The initDatabase function sets up the SQLite database with FTS5 support, creating tables for job data and traces.
initDatabase :: MonadIO m => TMVar Connection -> m ()
-
Job Staging and Trace Insertion: The insertLogToFTS function inserts the trace data associated with each job into the FTS database, making logs searchable by their trace content.
insertLogToFTS :: FinishedJob -> TMVar Connection -> Spuriobot ()
The implementation can be found in the module Spuriobot.FTS, which includes the use of SQLite and HTTP requests to interact with the GitLab API.
This feature ensures that all CI logs, past and present, are indexed and searchable in the system, laying the groundwork for the Web UI and search functionalities.
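The signatures above pass the database handle around as a TMVar Connection, which suggests that concurrent workers take turns using a single connection. The following is only a minimal sketch of that pattern, not Spuriobot's actual code: withShared and demo are hypothetical names, and an Int stands in for the real connection so the example needs nothing beyond the stm library that ships with GHC.

```haskell
import Control.Concurrent.STM
  (TMVar, atomically, isEmptyTMVar, newTMVarIO, putTMVar, takeTMVar)
import Control.Exception (bracket)

-- | Run an action with exclusive access to the value held in a TMVar.
-- Hypothetical helper (an assumption, not taken from Spuriobot).
-- The value is taken out before the action runs and put back afterwards,
-- even if the action throws, so other threads block instead of sharing it.
withShared :: TMVar a -> (a -> IO b) -> IO b
withShared var =
  bracket (atomically (takeTMVar var)) (atomically . putTMVar var)

-- | Stand-in usage: an Int plays the role of the database connection.
-- Returns the action's result and whether the TMVar was left empty.
demo :: IO (Int, Bool)
demo = do
  var <- newTMVarIO (41 :: Int)
  result <- withShared var (\n -> pure (n + 1))
  leftEmpty <- atomically (isEmptyTMVar var)
  pure (result, leftEmpty)
```

Using bracket keeps the shared value from being lost when an insert fails partway through, which matters for a long-running backfill that touches many jobs.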
2. Implemented FTS Log Explorer UI
To facilitate the exploration of CI test logs and troubleshoot intermittent failures efficiently, a Full Text Search (FTS) Log Explorer UI was implemented. The interface allows users to search for jobs in the database based on keywords, with options for advanced search using exact matches. The web UI was built using a Haskell web framework (Scotty) and renders the results using Lucid2 for HTML templating, ensuring a responsive and clean design.
The search functionality is driven by a search API that interacts with the SQLite database where job logs are stored. Users can enter search queries to filter results by job name, runner, or specific log contents. A paginated system ensures that large results are easy to navigate, and users can toggle between advanced (exact match) and regular searches. The search results display key information such as job IDs, timestamps, URLs, and project paths, offering a complete overview of the relevant logs.
Key Features of the UI:
-
Search Form: The form allows users to enter keywords and optionally enable "Advanced Search" by checking a box. When "Advanced Search" is enabled, the keyword is passed to the search backend exactly as typed, so users can write precise full-text search queries for more control over their search. When it is not selected, the keyword is wrapped in double quotes so that it is matched as a literal phrase rather than interpreted as query syntax.
wrapKeyword :: Maybe Text -> Maybe Text
wrapKeyword = fmap (\k -> "\"" <> k <> "\"")
This function applies the quoting for regular searches; "Advanced Search" skips it and leaves the keyword exactly as the user typed it.
-
Pagination: The UI supports pagination with "Next" and "Previous" buttons, enabling users to navigate through large result sets easily.
renderPage :: Maybe Text -> SearchOutcome JobInfo -> Bool -> Int -> Bool -> Html () -- Function to render the UI with paginated results and search form.
This function generates the HTML for the search form and job results, with pagination controls based on the number of results.
-
Job Information Display: Each job result is displayed with key details like job ID, creation date, URL, runner details, and project path.
renderJob :: JobInfo -> Html () -- Function to display detailed job information.
This function renders the job information in a structured format, making it easy for users to view relevant details.
-
Search Backend: The search is performed using SQL queries that interact with the SQLite database, leveraging FTS for efficient keyword searches.
searchJobs :: Connection -> Maybe Text -> Int -> Int -> IO (SearchOutcome JobInfo) -- Function to search for jobs based on the provided keyword and pagination details.
This function executes the SQL queries to fetch matching job results from the FTS-enabled database.
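One detail worth noting about the quoting step in the Search Form item: inside an FTS5 string, an embedded double quote is escaped by doubling it, so a keyword that itself contains a quote would otherwise end the string early. The sketch below shows one possible way to handle that. quoteFts and prepareKeyword are hypothetical names introduced for illustration, and whether Spuriobot escapes keywords this way is an assumption rather than something the report states.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text)
import qualified Data.Text as T

-- | Quote a keyword as an FTS5 string, doubling any embedded double quotes.
-- Hypothetical helper (an assumption, not taken from Spuriobot).
quoteFts :: Text -> Text
quoteFts k = "\"" <> T.replace "\"" "\"\"" k <> "\""

-- | Hypothetical wrapper: leave the keyword untouched in advanced mode,
-- quote it otherwise. The Bool mirrors the "Advanced Search" checkbox.
prepareKeyword :: Bool -> Maybe Text -> Maybe Text
prepareKeyword advanced
  | advanced  = id
  | otherwise = fmap quoteFts
```

For a keyword without quotes, this produces the same result as wrapKeyword; the only difference is the handling of embedded quote characters.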
The implementation can be found in the Spuriobot.SearchUI module, which includes the Scotty server setup and the integration with SQLite for full-text search.
This UI offers a user-friendly way to explore CI logs, speeding up the process of finding patterns and resolving issues with CI jobs.
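The "Next" and "Previous" buttons described above imply some mapping from a page number to a window of rows. A common way to do this, sketched here under assumptions, is to request one row more than the page size and use the extra row only to decide whether a next page exists. pageWindow and splitPage are hypothetical names, and the report does not say whether searchJobs works this way or what its two Int arguments mean.

```haskell
-- | LIMIT and OFFSET for a 1-based page number. One extra row is requested
-- so the caller can tell whether another page follows. Page numbers below 1
-- are treated as page 1. Hypothetical helper, not taken from Spuriobot.
pageWindow :: Int -> Int -> (Int, Int)
pageWindow pageSize page = (pageSize + 1, (max 1 page - 1) * pageSize)

-- | Drop the extra row and report whether a "Next" button is needed.
splitPage :: Int -> [a] -> ([a], Bool)
splitPage pageSize rows = (take pageSize rows, length rows > pageSize)
```

Fetching one extra row avoids a separate count query, at the cost of not knowing the total number of pages.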
Future Work
An important next step is to integrate Spuriobot with GitHub workflows, expanding its functionality beyond GitLab. By adding support for both platforms, the tool will become more versatile, allowing users to analyze CI job logs regardless of whether their CI runs on GitHub or GitLab. This enhancement will ensure broader applicability and make the tool a valuable resource for a wider range of developers and projects across different CI environments.
Acknowledgment
I would like to express my deepest gratitude to my mentor, Bryan Richter, whose unwavering support and invaluable guidance were instrumental in the success of this project. His commitment was evident through his hands-on involvement, from writing new lines of code to providing thoughtful feedback, and his generous investment of time was crucial to my progress. Over the past three months, I have learned an immense amount from him—not only about Haskell but also about software development life cycle (SDLC) best practices.
I would also like to extend my thanks to the Haskell.org community, whose collective efforts have significantly contributed to the development of this tool. This project is a reflection of the collaborative spirit of the community.
For more details, visit the spurious-failures project.