Information Extraction of Legal Case Features with Lists and Rules

In this post, we show how legal case features can be annotated using lists and rules in GATE. By features, we mean the range of detailed information that may be relevant to searching for cases or extracting information from them, such as the parties, the legal professionals involved (judges, lawyers, etc.), location, decision, case citation, legislation, and so on. In a forthcoming related post, we discuss how to use an ontology to annotate cases. Some background discussion of case-based reasoning appears in the post Information Extraction of Legal Case Factors. (See introductory notes on this and related posts.)

Features of cases

Legal cases contain a wealth of detailed information such as:

  • Case citation.
  • Names of parties.
  • Roles of parties, meaning plaintiff or defendant.
  • Sort of court.
  • Names of judges.
  • Names of attorneys.
  • Roles of attorneys, meaning the side they represent.
  • Final decision.
  • Cases cited.
  • Relation of precedents to current case.
  • Case structural features such as sections.
  • Nature of the case, meaning keywords that classify the case by subject (e.g. criminal assault, intellectual property,….).

With respect to these features, one would want to make a range of queries (using some appropriate query language).

  • In what cases has company X been a defendant?
  • In what cases has attorney Y worked for company X, where X was a defendant?
  • What are the final decisions for judge Z?
  • If the case concerns criminal assault, was a weapon used?
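
To make the first three queries concrete, here is a minimal sketch in Python of how they might run once the features have been extracted into records. This is not part of the GATE application: the `CaseRecord` structure, the function names, and the two sample cases ("Case A", "Case B") are invented for illustration, and a real system would more likely use a proper query language over the annotations.

```python
from dataclasses import dataclass, field


@dataclass
class CaseRecord:
    """Hypothetical container for features extracted from one case."""
    citation: str
    parties: dict = field(default_factory=dict)  # role -> list of party names
    counsel: dict = field(default_factory=dict)  # attorney -> party represented
    judge: str = ""
    decision: str = ""


def cases_with_defendant(cases, party):
    """In what cases has `party` been a defendant?"""
    return [c.citation for c in cases
            if party in c.parties.get("defendant", [])]


def cases_with_attorney_for_defendant(cases, attorney, party):
    """In what cases has `attorney` worked for `party`, where `party` was a defendant?"""
    return [c.citation for c in cases
            if party in c.parties.get("defendant", [])
            and c.counsel.get(attorney) == party]


def decisions_by_judge(cases, judge):
    """What are the final decisions for `judge`?"""
    return [(c.citation, c.decision) for c in cases if c.judge == judge]


# Invented sample data, not taken from any real case.
CASES = [
    CaseRecord("Case A",
               {"plaintiff": ["People"], "defendant": ["Company X"]},
               {"Attorney Y": "Company X"}, "Judge Z", "reversed"),
    CaseRecord("Case B",
               {"plaintiff": ["Company X"], "defendant": ["Company W"]},
               {"Attorney Y": "Company X"}, "Judge Z", "affirmed"),
]

print(cases_with_defendant(CASES, "Company X"))
print(cases_with_attorney_for_defendant(CASES, "Attorney Y", "Company X"))
print(decisions_by_judge(CASES, "Judge Z"))
```

The point of the sketch is only that such queries need the roles (defendant, counsel for which side) to be annotated as well as the names, which is why the lists and rules below try to capture both.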

We initially based our work on the Ph.D. thesis of Bransford-Koons (2005), commenting on, adapting, and adding to it. We used the cases from California criminal courts that were used in that work, since the lists and rules are highly specific to that material.


We have the following sample outputs from our lists and rules applied to People v. Coleman, 117 Cal App. 2d 565. In the first figure, we find the address, court district, citation, case name, counsels for each side, and the roles. There are aspects which need to be further cleaned up, but this gives a flavour of the annotations.

Case Features I

In the second figure, we focus on additional information such as structural sections (e.g. Opinion), the name of the judge, and terms having a bearing on criminal assault and weapons.

Case Features II

In the final figure, we identify the decision.

Case Features III


In the archive, we have the application, lists, JAPE rules, and graphics. The lists.def file in this archive is associated with the various other lists. The JAPE rules may have different names from those found in the application and discussed below, but (so far as we understand) this should make no difference to the functionality.


The following gazetteer lists were used; they are contained in a master list labelled DSAGaz. We give samples and comment below.

  • lists.def. The gazetteer list which contains the lists below. When importing this along with the standard ANNIE list, this list is renamed in the application.
  • attack_words.lst. Actions that can be construed as attacks such as hit, hitting, throw, thrown, threw,….
  • intention.lst. Terms for intention such as intend, intends, intending,…, expect, expects,….
  • judgements.lst. Terms related to judgment such as granted, denied, reversed, overturned, remanded,….
  • judgeindicator.lst. The indicator J., which is problematic when it is part of an individual’s name.
  • criminal_assault.lst. Terms related to assault such as assault, violent injury, ability,…. It is unclear just how cohesive this set of terms is.
  • legal_appellate_districts.lst. A list of appellate districts such as Fifth Appellate District, Fifth Dist.,….
  • legal_casenames.lst. Terms that can be used to indicate case names such as v., In Re,….
  • legal_counselnames.lst. Terms for counselor titles such as Attorney General, Deputy Public Defender,….
  • legal_general.lst. Terms for footnotes or numbering sections such as fn., No.,….
  • legal_opinion_sections.lst. Terms for sections of legal opinion such as concurring, counsel, dissenting, opinion,….
  • legal_coa.lst. Terms for causes of action such as aggravated assault, assault, breaking and entering, burglary, robbery,….
  • legal_code_citations.lst. Code citation information such as Civ. Code, Penal Code,….
  • us_district_abb_01.lst. Abbreviations for legal districts such as Cal., P., Wis.,….
  • us_context_abb_01.lst. Abbreviations for participant roles such as App., Rptr,….
  • legal_citations.lst. Abbreviations for citations and related to districts such as Cal.2d, Cal.App. 3d,….
  • legal_parties.lst. Terms for legal roles such as amicus curiae, appellant, appellee, counsel, defendant, plaintiff, victim, witness,….
  • lower_courts.lst. Phrases for other courts such as Municipal Court of, Superior Court of,….
  • possible_weapons.lst. A list of items that could be weapons such as automobile, bat, belt,….
  • weapons.lst. A list of items that are weapons such as assault rifle, axe, club, fist, gun,….
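
For readers unfamiliar with gazetteers, the following Python sketch imitates what a lookup over such lists does. It is a toy, not GATE's implementation. It assumes the usual lists.def line layout of file name, major type, and optional minor type separated by colons; the particular type labels ("weapon", "definite", "possible", "judgement") and the sample sentence are our own inventions, while the sample terms come from the descriptions above.

```python
import re

# Toy stand-in for a lists.def index: file name, major type, optional minor type.
# The type labels here are assumptions made for illustration.
LISTS_DEF = """\
weapons.lst:weapon:definite
possible_weapons.lst:weapon:possible
judgements.lst:judgement
"""

# A few entries per list, taken from the samples given in the post.
LIST_CONTENTS = {
    "weapons.lst": ["assault rifle", "axe", "club", "fist", "gun"],
    "possible_weapons.lst": ["automobile", "bat", "belt"],
    "judgements.lst": ["granted", "denied", "reversed", "overturned", "remanded"],
}


def parse_lists_def(text):
    """Map each list file name to its (major type, minor type) pair."""
    index = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(":")
        major = parts[1] if len(parts) > 1 else None
        minor = parts[2] if len(parts) > 2 else None
        index[parts[0]] = (major, minor)
    return index


def lookup(text, index, contents):
    """Return (start, end, matched text, major, minor) for every whole-word match."""
    found = []
    for filename, (major, minor) in index.items():
        for term in contents.get(filename, []):
            pattern = r"\b" + re.escape(term) + r"\b"  # case-sensitive, whole words
            for m in re.finditer(pattern, text):
                found.append((m.start(), m.end(), m.group(), major, minor))
    return sorted(found, key=lambda hit: (hit[0], hit[1]))


# Invented sample sentence.
SAMPLE = "The judgment is reversed. He swung a club and then a belt."
for hit in lookup(SAMPLE, parse_lists_def(LISTS_DEF), LIST_CONTENTS):
    print(hit)
```

Because the match is exact and case-sensitive, a form such as "Reversed" or "clubs" would be missed unless it is listed separately, which is one of the problems taken up in the discussion below.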

Discussion of Lists

We used some of the lists directly from Bransford-Koons 2005, but they are clearly in need of reconstruction and extension. A general problem is that the lists are defined for US case law and particularly the California district courts. Thus, we cannot simply apply the lists to different jurisdictions, e.g. the United Kingdom; the lists and rules must be relativised to different contexts. More technically, the lists contain alternative graphical (capital or lower case) and morphological forms, which would be better addressed using a Flexible Gazetteer.

In addition, it is unclear how one could bound the range of relevant terms appropriately and give them interpretations that are relevant to the context. In general, a lexicon or ontology could give us a better list of terms, but we must find some means to construe them as need be in the legal context. For example, we have a range of attack action terms such as hit, hitting, throw, thrown, threw,….; in some contexts, e.g. baseball, these actions need not be construed as attacks. Some means needs to be found to ascribe the appropriate interpretation in context.

A related issue is whether we must list all alternative forms of some terms (also taking into consideration spaces) or whether we can better write JAPE rules. This is relevant for the list of appellate districts, where we find both abbreviations and alternative elements of information, as in Fifth Appellate District, Fifth Appellate District Div 1, and Fifth Appellate District, Division 1. Along these lines, we would prefer a systematic means to relate abbreviations to the terms they abbreviate. In our view, more general solutions are better than specific ones which list information; lists ought to contain arbitrary information, while JAPE rules construct systematic information.
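
As an illustration of constructing systematic information by rule rather than listing it, the sketch below uses one Python regular expression to cover the appellate district variants mentioned above. It is not a JAPE rule and is not in the archive; the pattern, the list of ordinals, and the function name are assumptions made for the example.

```python
import re

# Assumed set of ordinals; extend as needed for the courts at hand.
ORDINALS = ["First", "Second", "Third", "Fourth", "Fifth", "Sixth"]

# One pattern instead of one list entry per variant:
#   Fifth Appellate District
#   Fifth Dist.
#   Fifth Appellate District Div 1
#   Fifth Appellate District, Division 1
DISTRICT = re.compile(
    r"\b(?P<ordinal>" + "|".join(ORDINALS) + r")\s+"
    r"(?:Appellate\s+District|Dist\.)"
    r"(?:,?\s+(?:Division|Div\.?)\s+(?P<division>\d+))?"
)


def find_districts(text):
    """Return a normalised (ordinal, division or None) pair for each mention."""
    return [(m.group("ordinal"), m.group("division"))
            for m in DISTRICT.finditer(text)]


for variant in ["Fifth Appellate District",
                "Fifth Dist.",
                "Fifth Appellate District Div 1",
                "Fifth Appellate District, Division 1"]:
    print(variant, "->", find_districts(variant))
```

The last two variants normalise to the same pair, so the abbreviation and the full form are related systematically rather than by two unrelated list entries.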

JAPE Rules

Given the lists, we have JAPE rules to annotate the relevant portions of text.

  • AppellantCounsel: annotates the appellant counsel.
  • RespondentCounsel: annotates the respondent counsel.
  • DSACounsellor: annotates counsels.
  • SectionsTerm: annotates sections relative to the list of section terms.
  • CaseRoleGeneral
  • DSACaseName2: annotates the case name.
  • DSACaseName: annotates the case name.
  • DSACaseCit: annotates the case citation.
  • CriminalAssault: annotates terms for criminal assault.
  • CauseOfAction: annotates for causes of action.
  • AttackTerm: annotates attack terms.
  • AppellateDistrict: annotates districts of courts.
  • DecisionStatement: annotates a sentence as the decision statement.
  • JudgementTerm: annotates terms related to judgement.
  • JudgeName: annotates the names of judges.
  • JudgeInd: annotates the judge name indicator.
  • IntentTerm: annotates terms of intent.


Some of these rules annotate sentences, while others annotate entities with respect to some property. Some of the rules do not work as well as we would wish and could stand further refinement, such as the rule for the roles of counsels, where the solution we have is rather ad hoc. Nonetheless, as a first pass, the lists and rules give some indication of what is possible.

Order of application

  • Document Reset PR
  • RegexSentenceSplitter
  • ANNIE English Tokeniser
  • ANNIE POS Tagger
  • MorphologicalAnalyzer
  • DSAGaz
  • AnnieGaz
  • Flexible Gazetteer
  • NPChunker
  • ANNIE NE Transducer
  • IntentTerm
  • JudgeInd
  • JudgeName
  • JudgementTerm
  • DecisionStatement
  • Weapons
  • AppellateDistrict
  • AttackTerm
  • CauseOfAction
  • CriminalAssault
  • DSACaseCit
  • DSACaseName
  • DSACaseName2
  • DSACaseNameAZW
  • CaseRoleGeneral
  • SectionsTerm
  • DSACounsellor
  • RespondentCounsel
  • AppellantCounsel


Despite the limitations, this gives some useful, preliminary results which can easily be built upon. Moreover, we know of no other public, open system of annotating case elements (or factors).

By Adam Wyner
Distributed under the Creative Commons
Attribution-Non-Commercial-Share Alike 2.0
