core

Basic File I/O


source

read_text_file

 read_text_file (filename:str)

Generic reader: read any text file and return its contents.

recc_info = read_text_file("../example/recc_info.txt") 
print(recc_info)
Reccomender Name: Teacher Person 
Title: Professor of Cleverness 

Address: 
Department of Curiosities
Generic University 
1337 Generic Pl. 
Springfield, WA 31416 USA

Phone: 555-123-1337
Email: teacher.person@generic.edu

source

read_urls_file

 read_urls_file (filename:str)

Read a text file in which each line is the URL of a submission site.

urls = read_urls_file("../example/sample_urls.txt") 
print(f"{len(urls)} urls in list")
for i, url in enumerate(urls): 
    print(f"{i+1} of {len(urls)}: {url}")
2 urls in list
1 of 2: http://localhost:8000/sample_form.html
2 of 2: http://localhost:8000/sample_form2.html
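
As a rough illustration of the one-URL-per-line format described above, here is a minimal sketch. The name `read_urls_sketch` is hypothetical, and skipping blank lines and stripping whitespace are assumptions; the actual `read_urls_file` may behave differently.

```python
from pathlib import Path

def read_urls_sketch(filename: str) -> list[str]:
    """Hypothetical stand-in for read_urls_file: one URL per line."""
    text = Path(filename).expanduser().read_text(encoding="utf-8")
    # Assumption: surrounding whitespace is stripped and blank lines are ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling `expanduser()` lets paths such as `~/recc_urls.txt` work without extra handling.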

source

read_pdf_text

 read_pdf_text (filename:str)

Read a PDF file and return its extracted text.

letter_text = read_pdf_text("../example/sample_letter.pdf")
print(letter_text)
   Dear Graduate Admissions Committee,  I am writing to recommend Student Person for admission to your graduate program. Having worked closely with them for two years in both teaching and research capacities, I can say they are among the strongest students I have encountered in over a decade of academic work.  Student Person took several of my advanced courses — Quantum Rollercoasters, Physics of Impossible Machines, and a seminar on Neural Networks for Curious Minds. They also worked with me on an independent research project. In every setting, they showed sharp intellectual ability, creative thinking, and real persistence. Their coursework went beyond surface-level competence; they clearly grasped the deeper principles at play. As a researcher, they brought fresh perspectives while staying receptive to guidance.  What stands out most is their dependability. They consistently met deadlines and produced high-quality work. During our independent project, they actually moved ahead of schedule, diving into advanced material sooner than expected. That kind of self-direction is uncommon and bodes well for graduate study.  They are also an effective communicator — clear and organized in writing, articulate in discussion. They collaborate well, contributing to group dynamics without dominating them.  Overall, I would rate Student Person as outstanding across the board: intellectual ability, research aptitude, writing, and professional potential. Their creativity, interpersonal skills, and motivation are all exceptional. I am confident they will be a valuable addition to your program.  Feel free to reach out if you would like to discuss further.  Please feel free to contact me if you require any additional information.  Sincerely,    Teacher Person, Ph.D. Professor of Cleverness  
          DEPARTMENT of  CURIOSITIES 
1337 Generic Pl Springfield, WA 31415-2654  phone 555-123-1337 fax  555-123-5555 

Parsing HTML (Form)


source

group_radio_buttons

 group_radio_buttons (soup, name)

Group radio buttons by name into a single field dict


source

scrape_form_fields

 scrape_form_fields (html:str)

Extract all fillable form fields from HTML

html = read_text_file("../example/sample_form.html") 
fields = scrape_form_fields(html) 
[f['id'] for f in fields][10:30]
['title',
 'phone',
 'email',
 'addr1',
 'addr2',
 'city',
 'state',
 'zip',
 'country',
 'months_known',
 'years_range',
 'capacity',
 'rating_intellectual',
 'rating_scientific',
 'rating_research',
 'rating_prev_work',
 'rating_lab',
 'rating_oral',
 'rating_writing',
 'rating_originality']
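
To make the idea of field extraction and radio-button grouping concrete, here is a minimal sketch using only the standard library's `html.parser`. This is a swapped-in technique: the `soup` argument above suggests the real functions work on a parsed soup object instead. The names `FieldCollector` and `scrape_fields_sketch`, and every dict key other than `id`, are assumptions made for illustration.

```python
from html.parser import HTMLParser

class FieldCollector(HTMLParser):
    """Collect fillable fields; radio buttons sharing a name become one field."""

    def __init__(self):
        super().__init__()
        self.fields = []     # field dicts, in document order
        self._radios = {}    # radio group name -> its field dict
        self._select = None  # the <select> currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        ident = a.get("id") or a.get("name") or ""
        if tag == "input":
            itype = (a.get("type") or "text").lower()
            if itype in ("hidden", "submit", "button", "reset", "image"):
                return  # assumption: these are not user-fillable
            if itype == "radio":
                name = a.get("name") or ""
                if name not in self._radios:
                    group = {"id": name, "type": "radio", "options": []}
                    self._radios[name] = group
                    self.fields.append(group)
                self._radios[name]["options"].append(a.get("value") or "")
            else:
                self.fields.append({"id": ident, "type": itype,
                                    "value": a.get("value") or ""})
        elif tag == "textarea":
            self.fields.append({"id": ident, "type": "textarea", "value": ""})
        elif tag == "select":
            self._select = {"id": ident, "type": "select", "options": []}
            self.fields.append(self._select)
        elif tag == "option" and self._select is not None:
            self._select["options"].append(a.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None

def scrape_fields_sketch(html: str) -> list[dict]:
    """Hypothetical stand-in for scrape_form_fields."""
    parser = FieldCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Grouping radio buttons by `name` means the LLM later sees one question with several options rather than several unrelated inputs.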

LLM Usage

Next we prompt the LLM to figure out which form fields apply, and how:


source

trim_fields

 trim_fields (fields:list[dict])

Remove unnecessary fields so we send fewer tokens to the LLM: remove prefilled fields and drop options from non-select fields.

import json

print(f"Fields JSON length:  {len(json.dumps(fields, separators=(',', ':')))} characters")
trimmed = trim_fields(fields)
print(f"Trimmed JSON length: {len(json.dumps(trimmed, separators=(',', ':')))} characters")
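
The trimming rule described above could look roughly like the following sketch. The keys `type`, `value`, and `options`, and the test for "prefilled" (a non-empty `value`), are assumptions; only the `id` key is confirmed by the examples on this page.

```python
import json

def trim_fields_sketch(fields: list[dict]) -> list[dict]:
    """Hypothetical stand-in for trim_fields; the field keys are assumptions."""
    trimmed = []
    for field in fields:
        # Assumption: a non-empty "value" means the field arrived prefilled.
        if field.get("value"):
            continue
        slim = dict(field)  # copy, so the caller's dicts are left untouched
        # Options are only kept where a choice is made from a fixed list.
        if slim.get("type") != "select":
            slim.pop("options", None)
        trimmed.append(slim)
    return trimmed

example_fields = [
    {"id": "title", "type": "text", "value": "", "options": []},
    {"id": "email", "type": "text", "value": "teacher.person@generic.edu", "options": []},
    {"id": "state", "type": "select", "value": "", "options": ["WA", "OR"]},
]
slimmed = trim_fields_sketch(example_fields)
before = len(json.dumps(example_fields, separators=(",", ":")))
after = len(json.dumps(slimmed, separators=(",", ":")))
print(f"{before} -> {after} characters")
```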

source

make_prompt

 make_prompt (fields:list[dict], recc_info:str, letter_text:str)

Build the prompt that will go to the LLM.

prompt = make_prompt(trimmed, recc_info, letter_text)
print(f"Prompt is {len(prompt)} characters")
prompt[2000:3000] # brief inspection
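
For readers curious what such a prompt might contain, here is a minimal sketch that combines the three inputs into one string. The wording, section headings, and requested reply format are all hypothetical; the prompt that `make_prompt` actually builds is not shown on this page.

```python
import json

def make_prompt_sketch(fields: list[dict], recc_info: str, letter_text: str) -> str:
    """Hypothetical stand-in for make_prompt; the prompt wording is invented."""
    # Compact separators keep the JSON short, as in the length check above.
    fields_json = json.dumps(fields, separators=(",", ":"))
    return "\n\n".join([
        "Fill out a recommendation form using the information below.",
        "RECOMMENDER INFO:\n" + recc_info.strip(),
        "LETTER TEXT:\n" + letter_text.strip(),
        "FORM FIELDS (JSON):\n" + fields_json,
        'Reply with a JSON list of objects with keys "id" and "value", '
        "one for each field you can fill.",
    ])
```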

source

get_field_mappings

 get_field_mappings (fields:list[dict], recc_info:str, letter_text:str,
                     model='claude-sonnet-4-20250514', debug=False)

Use an LLM to map recommender info and letter to form fields

Name         Type  Default                   Details
fields       list                            list of form fields
recc_info    str                             info on recommending person
letter_text  str                             text of recc letter
model        str   claude-sonnet-4-20250514  LLM choice, e.g. “ollama/qwen2.5:14b”
debug        bool  False                     print debugging/status info
Returns      list

‘Hybrid’ Form Verification

Optionally, Claude (via the Anthropic API) can verify the extracted form fields before the form is filled in. Passing the student’s name via the --verify CLI argument enables this check and lets the tool redact the student’s information from the blank form.

This does require ANTHROPIC_API_KEY to be set, even if a local LLM is being used for everything else.
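
The redaction step and the API-key requirement can be sketched as follows. The helper names `redact_name` and `can_verify`, the `[REDACTED]` placeholder, and the matching rules are assumptions for illustration, not the package's actual behavior. It also assumes the key is read from the environment.

```python
import os
import re

def redact_name(html: str, student_name: str, placeholder: str = "[REDACTED]") -> str:
    """Replace occurrences of the student's full name, ignoring case."""
    parts = student_name.split()
    if not parts:
        return html  # nothing to redact
    # Allow any run of whitespace between the parts of the name.
    pattern = r"\s+".join(re.escape(part) for part in parts)
    # A function replacement avoids backslash handling in the placeholder.
    return re.sub(pattern, lambda m: placeholder, html, flags=re.IGNORECASE)

def can_verify() -> bool:
    """True when ANTHROPIC_API_KEY is present in the environment."""
    return bool(os.environ.get("ANTHROPIC_API_KEY"))
```

A stricter redaction might also catch the first or last name on its own, at the cost of more false positives.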


source

trim_html

 trim_html (html:str, trim_script=False)

Remove irrelevant HTML before sending to a remote LLM.
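
One way to picture this kind of trimming is the regex-based sketch below. What counts as "irrelevant" here (comments, `<style>` blocks, optionally `<script>` blocks, and extra whitespace) is an assumption, as is the name `trim_html_sketch`. The real `trim_html` may remove different content or use an HTML parser.

```python
import re

def trim_html_sketch(html: str, trim_script: bool = False) -> str:
    """Hypothetical stand-in for trim_html."""
    flags = re.DOTALL | re.IGNORECASE
    html = re.sub(r"<!--.*?-->", "", html, flags=flags)              # comments
    html = re.sub(r"<style\b.*?</style\s*>", "", html, flags=flags)  # CSS
    if trim_script:
        html = re.sub(r"<script\b.*?</script\s*>", "", html, flags=flags)
    # Collapse whitespace runs; this also flattens <pre> and <textarea> text.
    return re.sub(r"\s+", " ", html).strip()
```

Leaving scripts in by default may matter when a form's fields are built or labeled by JavaScript.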


source

verify_form_fields

 verify_form_fields (html:str, fields:list[dict], student_name:str,
                     model='claude-sonnet-4-20250514', debug:bool=False)

Filling in the Form


source

get_element_info

 get_element_info (page, field_id, field_type=None)

Given an id or a name, find the element on the page and get its info.


source

should_skip

 should_skip (elem, tag, input_type, skip_prefilled)

Should we skip this element rather than fill it in? Yes, if there’s already a value there.


source

fill_element

 fill_element (page, elem, tag, input_type, field_id, value)

Fill in this element with the given value.


source

fill_form

 fill_form (page, mappings, fields, skip_prefilled=True, debug=False)

Fill form fields using Playwright


source

upload_pdf

 upload_pdf (page, pdf_path)

Upload the recommendation letter PDF


source

process_url

 process_url (page, url, recc_info, letter_text, pdf_path, model,
              verify='', debug=False)

Process a single recommendation URL

formalyzer CLI script


source

read_inputs

 read_inputs (recc_info:str, pdf_path:str, urls:str)

Read all input files.


source

setup_browser

 setup_browser ()

Connect to Chrome with remote debugging


source

run_formalyzer

 run_formalyzer (recc_info:str, letter_text:str, urls:list, pdf_path:str,
                 model:str, verify='', debug=False)

Main async workflow


source

main

 main (recc_info:str, pdf_path:str, urls:str,
       model:str='claude-sonnet-4-20250514', verify:str='', debug:bool=False)

Name       Type  Default                   Details
recc_info  str                             text file with recommender name, address, etc
pdf_path   str                             name of PDF recc letter
urls       str                             txt file w/ one URL per line
model      str   claude-sonnet-4-20250514  ‘ollama/qwen2.5:14b’ for local model
verify     str                             Option to verify field extraction via Claude. Value should be student name
debug      bool  False                     best to always turn this on, actually
#main("~/recc_info.txt", "~/recc_letter.pdf","~/recc_urls.txt", debug=True)
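
To show how the parameters of `main` could map onto a command line, here is a minimal `argparse` sketch. It is an illustration only: the actual formalyzer script may build its CLI with a different tool. The flag spellings `--model` and `--debug` are assumptions. Only `--verify` is mentioned elsewhere on this page.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the parameters of main()."""
    p = argparse.ArgumentParser(prog="formalyzer")
    p.add_argument("recc_info", help="text file with recommender name, address, etc")
    p.add_argument("pdf_path", help="name of PDF recc letter")
    p.add_argument("urls", help="txt file w/ one URL per line")
    p.add_argument("--model", default="claude-sonnet-4-20250514",
                   help="LLM choice, e.g. 'ollama/qwen2.5:14b' for a local model")
    p.add_argument("--verify", default="",
                   help="student name; enables verification of field extraction")
    p.add_argument("--debug", action="store_true",
                   help="print debugging/status info")
    return p

# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["recc_info.txt", "recc_letter.pdf", "recc_urls.txt",
     "--verify", "Student Person", "--debug"]
)
print(args.model, repr(args.verify), args.debug)
```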