Deep Research for Hong Kong Cases, Homemade

2026, February

Deep research agent decides to use tools, does it over several rounds. Reads, searches again, just as a lawyer would.

Writes up a note, in the exact style I want, with direct links and paragraph citations:

Product comments:

The output, the research note, should be so close in style in style to what the lawyer wants that they can copy and paste relevant parts, with minimal editing.
Human verification is still crucial. Direct links to cases, and exact paragraph references, must be provided so that the lawyer can verify the machine's work ASAP.
Mini-apps like these can also be offered as an API, to plug in to agents which are more general purpose.

Technical observations:

Early 2025 to early 2026, the overall inference costs of the LLM agent fell by, ballpark, 4x. (Gemini 2.5 Pro -> Gemini 3 Flash)
Without the big drop in inference costs, this deep research would be too expensive to run as a personal project.
The LLM agent itself is extremely simple to program. A loop with around 5 tools.

I wrote this app for exactly 1 user - myself. It's never been easier to build custom, personal apps. Python is basicallly English. AI can write all the boilerplate code for you.

Professional legaltech service providers are still absolutely necessary though. For instance, this homemade deep research app does not yet do sherpardization, i.e. flagging whether cases have been overturned or disapproved of. An LLM can have a good try at automated sherpardization. But it would be expensive for a personal project, and still not as reliable as paying humans to do it assiduously.

AI Agent for a Litigation Case Papers

2025, April

Tldr:

Like Claude Code, but the scaffolding is for all the pdf case papers for a legal case (say, ~3000 pages in total).
Tool calling in a loop, e.g. read, write, save.
Case papers can be pdf files which hasn't been OCR-ed yet. Not completely helpless with tables and images.
~1 week project.

Techniques:

Gemini 2.5 pro. Giving it the usual read, write, grep tools.
TODO: sub-agent as a tool.
Compaction of context is critical.
Scaffolding is mostly pre-processing the bundle so the agent can navigate it more easily.

...

Prompt No. 1:

write me a note on the best arguments of either side. include page and paragraph references to the submissions and to cases.

...

Output No. 1:

Note on Arguments

This note summarizes the best arguments for the Appellants (the pilots) and the Respondent (British Airways) in the dispute over the calculation of holiday pay.

Appellants' (Pilots) Arguments

The Appellants' central contention is that their holiday pay should be their "normal" pay, not just their basic salary. The strongest arguments in support of this position are:

Purposive Interpretation of "Paid Annual Leave": The Appellants argue that the phrase "paid annual leave" in the Civil Aviation (Working Time) Regulations 2004 must be interpreted in line with the purpose of the underlying European Directives (2000/79/EC and 93/104/EC) [SC/18/C]. This purpose is to ensure workers actually take their leave to protect their health and safety. If taking leave results in a significant reduction in pay, it creates a disincentive to taking leave, defeating the directive's purpose [SC/19/D].
"Normal Remuneration" Principle from EU Case Law: The Appellants lean heavily on the interpretation of "paid annual leave" from the European Court of Justice in cases concerning the Working Time Directive. This case law establishes that holiday pay must be "comparable" to a worker's "normal" remuneration. The Appellants' pay when on leave is significantly less than when they are working, due to the exclusion of the Flying Pay Supplement (FPS) and Time Away from Base (TAFB) allowance [SC/19/C-D].
Absence of Domestic Calculation Mechanism: The Appellants point out that the 2004 Regulations do not provide a specific method for calculating "a week's pay" for holiday. They argue this is a deliberate omission, requiring the courts to apply a meaning consistent with the European directives [SC/19/F-G]. The Court of Appeal judgment summary notes the Employment Appeal Tribunal's acceptance of this argument [SC/27/D].

Respondent's (British Airways) Arguments

The Respondent's case is that paying the contractually agreed basic salary during annual leave fulfils its legal obligations. Their strongest arguments are:

Literal Interpretation of the Regulations: The Respondent argues for a plain-meaning interpretation of "paid annual leave." The pilots receive their basic pay while on leave; therefore, they are receiving "paid... leave" [SC/21/1].
Contractual Agreement: The pilots' terms and conditions, including holiday pay, are governed by collective agreements negotiated with their union, BALPA [SC/14/G]. The Respondent will argue that these agreed terms satisfy the requirements of the 2004 Regulations.
Absence of Explicit Requirement for "Normal Pay": The Respondent contends that the 2004 Regulations do not explicitly state that holiday pay must include all pay components. They argue that if Parliament had intended to include supplements such as FPS and TAFB, it would have done so expressly [SC/21/1-2]. The Court of Appeal agreed, stating that the ordinary meaning of "paid annual leave" was not pay measured by what a pilot earned while working [SC/27/F].

...

Prompt No. 2:

translate the note you've just written into traditional chinese, and save it as a separate document

...

Output No. 2:

法律論點備忘錄

本備忘錄總結了上訴人（飛行員）與被上訴人（英國航空公司）在假日薪酬計算爭議中的最佳論點。

上訴人（飛行員）的論點

上訴人的核心論點是，他們的假日薪酬應為其「正常」薪酬，而不僅僅是基本工資。支持此立場的最有力論點如下：

「帶薪年假」的意旨解釋： 上訴人主張，2004年《民航（工作時間）規例》中「帶薪年假」一詞，必須根據其所依據的歐洲指令（2000/79/EC 和 93/104/EC）的意旨進行解釋 [SC/18/C]。該意旨是確保勞工確實休假，以保障其健康與安全。如果休假導致薪酬大幅減少，將會阻礙勞工休假，從而違背該指令的目的 [SC/19/D]。
源自歐盟案例法的「正常薪酬」原則： 上訴人極力倚重歐洲法院在有關《工作時間指令》的案件中對「帶薪年假」的解釋。該案例法確立，假日薪酬必須與勞工的「正常」薪酬「相當」。由於飛行薪酬補貼（FPS）和離港津貼（TAFB）被排除在外，上訴人在休假期間的薪酬遠低於其正常工作時的薪酬 [SC/19/C-D]。
缺乏本土計算機制： 上訴人指出，2004年的規例並未提供計算假日「週薪」的具體方法。他們認為這是刻意遺漏，要求法院採用與歐洲指令一致的涵義 [SC/19/F-G]。上訴法院的判決摘要指出，勞工上訴審裁處接受了此論點 [SC/27/D]。

被上訴人（英國航空公司）的論點

被上訴人的主張是，在年假期間支付合約議定的基本工資，即已履行其法律義務。其最有力的論點如下：

規例的字面解釋： 被上訴人主張對「帶薪年假」進行字面解釋。飛行員在休假期間領取基本工資，因此他們正在領取「帶薪...休假」[SC/21/1]。
合約協議： 飛行員的僱傭條款與條件，包括假日薪酬，均受其工會 BALPA 協商達成的集體協議所約束 [SC/14/G]。被上訴人將主張，這些議定條款符合2004年規例的要求。
未明確要求「正常薪酬」： 被上訴人辯稱，2004年的規例並未明確規定假日薪酬必須包含所有薪酬組成部分。他們主張，如果國會有意納入如 FPS 和 TAFB 等補貼，必定會明確規定 [SC/21/1-2]。上訴法院對此表示同意，並指出「帶薪年假」的通常涵義並非以飛行員工作時所賺取的薪酬來衡量 [SC/27/F]。

...

Litigation Bundle Used: https://supremecourt.uk/uploads/non_portal_electronic_bundle_sample_44b4f99813.pdf

RAG Over Database of Legal Cases

2024, June

Github Code Snippets

Techniques:

Use cheap llm and long context llm (gemini flash) to summarize each case and add that short summary to metadata. Improves big picture context for retrieval (bm25 + vector search).
At the last stage of throwing retrieved data into llm for relevance analysis, don't chunk. Put the entire case into the context window. Gemini flash will be able to generate paragraph references reliably enough. For lawyers, paragraph references are essential.
Use a vector database which stores multiple vectors per document (vespa). Can take the highest average score of all the vectors for an overall score of the document. And can conveniently return the entire document instead of only a chunk.

Inevitable future directions:

Automate further rounds of search. Will be feasible as models get cheaper and exhibit slightly stronger agentic behaviour. It's already possible, but it will be expensive.
o1 reasoning style llm to synthesize results. Also currently too expensive what what you'll get.

Real-time Conversational Language Learning

2024, August

Techniques:

WebRTC, same technology for streaming video conferences, used to stream data packets quickly.
Pipeline is simply speech-to-text -> large language model -> text-to-speech. WebRTC makes it fast enough for conversation.
Can interrupt the bot by talking over it.
Real-time transcription of the conversation. Added pinyin for language learning.
Web search function call so llm has access to updated information.

Why not just use openai or gemini voice chat:

The models from big labs are their spokesmen for millions of people around the world, of all ages. They are necessarily boring, like a teacher giving a speech during assembly. Mainstream v underground; corpo v street kid; empire vs rebel alliance.
Rolling your own voice chat, you can use any llm, including open models which have been finetuned to be less politically correct. Can also customize the transcript.

Jarvis on esp32

2025, February

Techniques:

esp32, which only activates on the wakeword "Jarvis". It's not always recording.
Connects to your own websocket server. You can use any speech-to-text, large language model, text-to-speech you choose.
Websocket is slower than webrtc. For use in Asia, apis in the US tend to be slow.

Telegram Bot with memgpt Memory

2024, October

Techniques:

memgpt paper / letta library implements memory as follows: (1) self-editing the prompt (user section / agent section) for memories which should be frequently accessed; (2) writes long-term archival memory and all conversational history into (vector/)database; and (3) appropriate method of reading and writing memory after each user message handled by llm function calling.
For prototyping and testing ux, telegram bot free and can be implemented in 5 minutes.

Why not just use openai or gemini assistant:

Same as for convo bot above. Big lab models = boring.
As open models became even more reliable with function calling, more complex systems of memory management, like memgpt, can be implemented for production.

Legal Translation in Microsoft Word Plugin

2025, January

Observations:

For legal translation, one tool will not be enough. Deepl may not be smooth. An llm may not be accurate. There is no knowledge of much of the local legal jargon.
This combines: deepl for basics + noun extraction and legal glossary search + llm to consolidate.
Highlight and hotkey is a nice UI.

Voice Typing for Linux

2025, March

Github Code

Observations:

Voice typing services exist for macos and windows, but not linux (as of early 2025).
Runs whisper locally. It is fast and useful enough on cpu (base.en whsiper model).
Hotkey to start recording, hotkey to stop recording, automatically stored in clipboard and ctrl+v.
New trend, as of early 2025, of using voice typing to spam an ai with context / questions, e.g. "vibe coding".

Finetuning Llama 3.2 3b for Writing Style Conversion

2024, December Huggingface Model

Download and run in ollama (system prompt below):
ollama run hf.co/iach/judgegguf3

Observations:

Task: Convert legal English writing from someone with English as a second language to the style of an erudite English Judge (e.g. Lord Denning). It pays to sound posh.
Simple prompting, even frontier models, tended to make the writing overwrought, too flowery.
Sonnet 3.5 did will with prompting and few shot examples. But this would be expensive if used frequently.
Finetuning produced good results. Only a few hundred writing snippets required.
Llama 3.2 3b would run fast enough on CPU.
For production finetuning a chatgpt model or equivalent is recommended. Or at minimum a 7b local model.
In early 2025, Zed (IDE) shipped a finetuned qwen-2.5-coder 7b model for autocomplete. Small models aren't only toys.

Sample:

System prompt:

"You will be given a piece of legal writing by the user. You will output a piece of improved legal writing. Keep exactly the same meaning, but correct any errors and infelicities, and edit it to be in the exact style of Lord Denning (the renowned English Judge). ONLY output the improved legal writing in the style of Lord Denning, and nothing else. Do not give any explanation; do not add markdown or formatting; do not add new lines."

Original text (English as second language):

"The defendant does not have enough evidence, and his argument is not very good. He can't win because he cannot show that he and his business partner had an oral agreement for him not to pay his business partner until the PRC joint venture was profitable. It just seems unlikely, because this event is not certain. It may or may not happen. Many cases say inherent probabilities are relevant when thinking about credibility."

Llama 3.2 3b (not finetuned, too flowery):

"The defendant's case hinges on a crucial issue of credibility, which appears to be in tatters. The burden of proof rests squarely upon him to establish with reasonable certainty that an oral agreement existed between himself and his business partner, prohibiting the payment of dividends to the latter until the PRC joint venture commenced operations. Unfortunately for the defendant, this claim strains credulity on several counts, particularly given the inherent uncertainties surrounding the realisation of such a venture. The fact that numerous cases have endorsed the relevance of inherent probabilities in assessing credibility lends weight to the proposition that the defendant's case is, at best, highly speculative and unlikely to succeed."

Llama 3.2 3b (finetuned, closer to a "learned" legal style):

"The defendant has insufficient evidence and a somewhat unconvincing argument. He cannot succeed because he has failed to prove that he and his partner entered into an oral agreement for him not to pay his business partner until the P.R.C. joint venture was profitable. It seems improbable, because this event is in doubt. It may or it may not occur. Many cases have held that inherent probabilities are relevant when assessing credibility."

Projects List

Deep Research for Hong Kong Cases, Homemade

AI Agent for a Litigation Case Papers

Note on Arguments

Appellants' (Pilots) Arguments

Respondent's (British Airways) Arguments

法律論點備忘錄

上訴人（飛行員）的論點

被上訴人（英國航空公司）的論點

RAG Over Database of Legal Cases

Real-time Conversational Language Learning

Jarvis on esp32

Telegram Bot with memgpt Memory

Legal Translation in Microsoft Word Plugin

Voice Typing for Linux

Finetuning Llama 3.2 3b for Writing Style Conversion