Archive for the ‘text mining’ Category

Semantic Processing of Legal Texts Workshop

Wednesday, April 21st, 2010

In this post you will find information on the Semantic Processing of Legal Texts workshop, held in conjunction with the Language Resources and Evaluation Conference. Below please find a link to the conference, information on the workshop, and the program for the workshop.

LREC
Language Resources and Evaluation Conference, May 17-23, Malta.

LREC 2010 Workshop on
SEMANTIC PROCESSING OF LEGAL TEXTS (SPLeT-2010)

23 May 2010, Malta

Workshop Description
The legal domain represents a primary candidate for web-based information distribution, exchange and management, as testified by the numerous e-government, e-justice and e-democracy initiatives worldwide. The last few years have seen a growing body of research and practice in the field of Artificial Intelligence and Law which addresses a range of topics: automated legal reasoning and argumentation, semantic and cross-language legal information retrieval, document classification, legal drafting, legal knowledge discovery and extraction, as well as the construction of legal ontologies and their application to the law domain. In this context, it is of paramount importance to use Natural Language Processing techniques and tools that automate and facilitate the process of knowledge extraction from legal texts.

Over the last two years, a number of dedicated workshops and tutorials focusing on different aspects of the semantic processing of legal texts have demonstrated the current interest in research on Artificial Intelligence and Law in combination with Language Resources (LR) and Human Language Technologies (HLT). The LREC 2008 Workshop on “Semantic processing of legal texts” was held in Marrakech, Morocco, on the 27th of May 2008. The JURIX 2008 Workshop on “the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation (NaLEA)” focused on recent advances in natural language engineering and legal argumentation. At ICAIL 2009, two workshops were held: “LOAIT ’09 – the 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with the 2nd Workshop on Semantic Processing of Legal Texts” and “NALEA’09 – Workshop on the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation”. The former focused on legal knowledge representation, with particular emphasis on ontology acquisition from legal texts; the latter tackled issues related to legal argumentation and linguistic technologies.

To continue this momentum, a 3rd Workshop on “Semantic Processing of Legal Texts” is being organised at the LREC conference to bring the specific technical challenges posed by the semantic processing of legal texts to the attention of the broader LR/HLT community, and to share with that community the motivations and objectives which make the topic of interest to researchers in legal informatics. The outcomes of these interactions are expected to advance research and applications and to foster interdisciplinary collaboration within the legal domain.

The main goals of the workshop are to provide an overview of the state-of-the-art in legal knowledge extraction and management, to explore new research and development directions and emerging trends, and to exchange information regarding legal LRs and HLTs and their applications.

Areas of Interest
The workshop will focus on the automatic extraction of information from legal texts and the structural organisation of the extracted knowledge. Particular emphasis will be given to the crucial role of language resources and human language technologies. Papers are invited on, but not limited to, the following topics:

  • Building legal resources: terminologies, ontologies, corpora
  • Ontologies of legal texts, including subareas such as ontology acquisition, ontology customisation, ontology merging, ontology extension, ontology evolution, lexical information, etc.
  • Information retrieval and extraction from legal texts
  • Semantic annotation of legal texts
  • Legal text processing
  • Multilingual aspects of legal text semantic processing
  • Legal thesauri mapping
  • Automatic classification of legal documents
  • Logical analysis of legal language
  • Automated parsing and translation of natural language arguments into a logical formalism
  • Linguistically-oriented XML markup of legal arguments
  • Dialogue protocols for argumentation
  • Legal argument ontology
  • Computational theories of argumentation that are suitable to natural language
  • Controlled language systems for law

Workshop Chairs

  • Enrico Francesconi (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)
  • Simonetta Montemagni (Istituto di Linguistica Computazionale of CNR, Pisa, Italy)
  • Wim Peters (Natural Language Processing Research Group, University of Sheffield, UK)
  • Adam Wyner (Department of Computer Science, University College London, UK)

Program Committee

  • Johan Bos (University of Rome, Italy)
  • Danièle Bourcier (Humboldt Universität, Berlin, Germany)
  • Thomas R. Bruce (Cornell Law School, Ithaca, NY, USA)
  • Pompeu Casanovas (Institut de Dret i Tecnologia, UAB, Barcelona, Spain)
  • Alessandro Lenci (Dipartimento di Linguistica, Università di Pisa, Pisa, Italy)
  • Leonardo Lesmo (Dipartimento di Informatica, Università di Torino, Torino, Italy)
  • Raquel Mochales Palau (Catholic University of Leuven, Belgium)
  • Paulo Quaresma (Universidade de Évora, Portugal)
  • Erich Schweighofer (Universität Wien, Rechtswissenschaftliche Fakultät, Wien, Austria)
  • Manfred Stede (University of Potsdam, Germany)
  • Daniela Tiscornia (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)
  • Tom van Engers (Leibniz Center for Law, University of Amsterdam, Netherlands)
  • Stephan Walter (Euroscript, Luxembourg S.a.r.l.)
  • Radboud Winkels (Leibniz Center for Law, University of Amsterdam, Netherlands)

Program

  • 14:30-14:45 Welcome and introduction
  • 14:45-15:10
    A Description Language for Content Zones of German Court Decisions
    Florian Kuhn
  • 15:10-15:35
    Controlling the language of statutes and regulations for semantic processing
    Stefan Hoefler and Alexandra Bünzli
  • 15:35-16:00
    Named entity recognition in the legal domain for ontology population
    Mírian Bruckschen, Caio Northfleet, Douglas da Silva, Paulo Bridi, Roger Granada, Renata Vieira, Prasad Rao and Tomas Sander
  • 16:00-16:30
    Coffee break

  • 16:30-16:55
    Legal Claim Identification: Information Extraction with Hierarchically Labeled Data
    Mihai Surdeanu, Ramesh Nallapati and Christopher Manning
  • 16:55-17:20
    On the Extraction of Decisions and Contributions from Summaries of French Legal IT Contract Cases
    Manuel Maarek
  • 17:20-17:45
    Towards Annotating and Extracting Textual Legal Case Factors
    Adam Wyner and Wim Peters
  • 17:45-18:10
    Legal Rules Learning based on a Semantic Model for Legislation
    Enrico Francesconi

    Information Extraction of Legal Case Features with Lists and Rules

    Thursday, January 21st, 2010

    In this post, we show how legal case features can be annotated using lists and rules in GATE. By features, we mean a range of detailed information that may be relevant to searching for cases or extracting information, such as the parties, the other legal professionals involved (judges, lawyers, etc.), location, decision, case citation, legislation, and so on. In a forthcoming related post, we discuss how to use an ontology to annotate cases. For background discussion of case based reasoning, see Information Extraction of Legal Case Factors. (See introductory notes on this and related posts.)

    Features of cases

    Legal cases contain a wealth of detailed information such as:

    • Case citation.
    • Names of parties.
    • Roles of parties, meaning plaintiff or defendant.
    • Type of court.
    • Names of judges.
    • Names of attorneys.
    • Roles of attorneys, meaning the side they represent.
    • Final decision.
    • Cases cited.
    • Relation of precedents to current case.
    • Case structural features such as sections.
    • Nature of the case, meaning using keywords to classify the case in terms of subject (e.g. criminal assault, intellectual property, ….)

    With respect to these features, one would want to make a range of queries (using some appropriate query language).

    • In what cases has company X been a defendant?
    • In what cases has attorney Y worked for company X, where X was a defendant?
    • What are the final decisions for judge Z?
    • If the case concerns criminal assault, was a weapon used?
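    The post leaves the query language open. As a rough illustration only, the sketch below stores the extracted features of each case in a plain Python record and answers the four questions with list and dictionary comprehensions. The CaseRecord type, its fields, and the sample records (Case A, Company X, Attorney Y, and so on) are hypothetical placeholders, not output of the GATE application described in this post.

```python
from dataclasses import dataclass, field

@dataclass
class CaseRecord:
    """Hypothetical record of the features extracted from one case."""
    citation: str
    parties: dict                                # party name -> role, e.g. "defendant"
    attorneys: dict                              # attorney name -> party represented
    judge: str
    decision: str
    subjects: set = field(default_factory=set)   # e.g. {"criminal assault"}
    weapons: set = field(default_factory=set)    # weapon terms found in the text

def defendant_cases(cases, company):
    """In what cases has company X been a defendant?"""
    return [c.citation for c in cases if c.parties.get(company) == "defendant"]

def attorney_for_defendant(cases, attorney, company):
    """In what cases has attorney Y worked for company X, where X was a defendant?"""
    return [c.citation for c in cases
            if c.parties.get(company) == "defendant"
            and c.attorneys.get(attorney) == company]

def judge_decisions(cases, judge):
    """What are the final decisions for judge Z?"""
    return {c.citation: c.decision for c in cases if c.judge == judge}

def assault_weapon_use(cases):
    """If the case concerns criminal assault, was a weapon used?"""
    return {c.citation: bool(c.weapons) for c in cases
            if "criminal assault" in c.subjects}

# Placeholder records, invented for illustration.
cases = [
    CaseRecord("Case A", {"Company X": "defendant"}, {"Attorney Y": "Company X"},
               "Judge Z", "affirmed"),
    CaseRecord("Case B", {"Company X": "plaintiff"}, {"Attorney Y": "Company X"},
               "Judge Z", "reversed"),
    CaseRecord("Case C", {"People": "plaintiff", "D. Doe": "defendant"}, {},
               "Judge W", "affirmed", {"criminal assault"}, {"bat"}),
]
print(defendant_cases(cases, "Company X"))                       # ['Case A']
print(attorney_for_defendant(cases, "Attorney Y", "Company X"))  # ['Case A']
print(judge_decisions(cases, "Judge Z"))   # {'Case A': 'affirmed', 'Case B': 'reversed'}
print(assault_weapon_use(cases))           # {'Case C': True}
```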

    We initially based our work on Bransford-Koons's 2005 Ph.D. thesis, commenting on, adapting, and adding to it. Since the lists and rules are highly specific, we used the same cases from the California criminal courts that were used in that work.

    Output

    We have the following sample outputs from our lists and rules applied to People v. Coleman, 117 Cal. App. 2d 565. In the first figure, we find the address, court district, citation, case name, counsel for each side, and their roles. Some aspects need further cleaning up, but this gives a flavour of the annotations.

    Case Features I

    In the second figure, we focus on additional information such as structural sections (e.g. Opinion), the name of the judge, and terms having a bearing on criminal assault and weapons.

    Case Features II

    In the final figure, we identify the decision.

    Case Features III

    GATE

    In the archive, we have the application, lists, JAPE rules, and graphics. The lists.def file in this archive refers to the various other lists. The JAPE rules may have different names from those found in the application and discussed below, but (so far as we understand) this should make no difference to the functionality.

    Lists

    The gazetteer lists used are the following; they are contained in a master list labelled DSAGaz. We give samples and comment on them below.

    • lists.def. The gazetteer list which contains the lists below. When importing this along with the standard ANNIE list, this list is renamed in the application.
    • attack_words.lst. Actions that can be construed as attacks such as hit, hitting, throw, thrown, threw,….
    • intention.lst. Terms for intention such as intend, intends, intending,…, expect, expects,….
    • judgements.lst. Terms related to judgment such as granted, denied, reversed, overturned, remanded,….
    • judgeindicator.lst. The indicator J., which is problematic when it is part of an individual’s name (e.g. as a middle initial).
    • criminal_assault.lst. Terms related to assault such as assault, violent injury, ability,…. It is unclear just how cohesive this set of terms is.
    • legal_appellate_districts.lst. A list of appellate districts such as Fifth Appellate District, Fifth Dist.,….
    • legal_casenames.lst. Terms that can be used to indicate case names such as v., In Re,….
    • legal_counselnames.lst. Terms for counselor titles such as Attorney General, Deputy Public Defender,….
    • legal_general.lst. Terms for footnotes or numbering sections such as fn., No.,….
    • legal_opinion_sections.lst. Terms for sections of legal opinion such as concurring, counsel, dissenting, opinion,….
    • legal_coa.lst. Terms for causes of action such as aggravated assault, assault, breaking and entering, burglary, robbery,….
    • legal_code_citations.lst. Code citation information such as Civ. Code, Penal Code,….
    • us_district_abb_01.lst. Abbreviations for legal districts such as Cal., P., Wis.,….
    • us_context_abb_01.lst. Abbreviations for participant roles such as App., Rptr,….
    • legal_citations.lst. Abbreviations for citations and related to districts such as Cal.2d, Cal.App. 3d,….
    • legal_parties.lst. Terms for legal roles such as amicus curiae, appellant, appellee, counsel, defendant, plaintiff, victim, witness,….
    • lower_courts.lst. Phrases for other courts such as Municipal Court of, Superior Court of,….
    • possible_weapons.lst. A list of items that could be weapons such as automobile, bat, belt,….
    • weapons.lst. A list of items that are weapons such as assault rifle, axe, club, fist, gun,….
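    The lookup step these lists feed can be sketched in Python. This is a simplified stand-in for a GATE gazetteer, not its implementation: it assumes whitespace tokenisation, exact case-sensitive matching, and an in-memory dictionary holding only a few of the sample entries quoted above, and the sentence is invented. In GATE, lists.def associates each .lst file with a majorType; here the list name plays that role. Each entry is keyed by its token sequence and the longest match at each position wins, so Superior Court of is found as one multi-word term.

```python
# Hypothetical in-memory stand-in for part of DSAGaz: list name -> sample
# entries copied from the descriptions above (the real .lst files are longer).
GAZETTEER = {
    "legal_casenames": ["v.", "In Re"],
    "legal_parties": ["appellant", "appellee", "defendant", "plaintiff"],
    "lower_courts": ["Municipal Court of", "Superior Court of"],
    "judgements": ["granted", "denied", "reversed", "overturned", "remanded"],
    "weapons": ["assault rifle", "axe", "club", "fist", "gun"],
}

def build_index(gazetteer):
    """Map each entry, as a tuple of whitespace tokens, to its list name."""
    return {tuple(entry.split()): name
            for name, entries in gazetteer.items() for entry in entries}

def lookup(tokens, index):
    """Greedy longest-match lookup; returns (start, end, list name, text) tuples."""
    max_len = max(len(key) for key in index)
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            name = index.get(tuple(tokens[i:i + n]))
            if name is not None:
                found.append((i, i + n, name, " ".join(tokens[i:i + n])))
                i += n
                break
        else:  # no entry starts at position i
            i += 1
    return found

tokens = ("In Smith v. Jones , the appellant appealed from the "
          "Superior Court of Kern County ; the judgment is reversed .").split()
for hit in lookup(tokens, build_index(GAZETTEER)):
    print(hit)
```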

    Discussion of Lists

    We used some of the lists directly from Bransford-Koons 2005, but they clearly need reconstruction and extension. A general problem is that the lists are defined for US case law, and particularly for the California district courts. Thus, we cannot simply apply the lists to other jurisdictions, e.g. the United Kingdom; the lists and rules must be relativised to different contexts. More technically, the terms in the lists have alternative orthographic (upper or lower case) or morphological forms, which would be better addressed using a Flexible Gazetteer. In addition, it is unclear how one could appropriately bound the range of relevant terms and give them interpretations relevant to the context. In general, a lexicon or ontology could give us a better list of terms, but we must find some means to construe them as needed in the legal context. For example, we have a range of attack action terms such as hit, hitting, throw, thrown, threw,….; in some contexts, e.g. baseball, these actions need not be construed as attacks. Some means must be found to ascribe the appropriate interpretation in context.

    A related issue is whether we must list all alternative forms of some terms (also taking spacing into consideration) or whether we can better write JAPE rules. This is relevant for the list of appellate districts, where we find both abbreviations and alternative arrangements of the same information, as in Fifth Appellate District, Fifth Appellate District Div 1, and Fifth Appellate District, Division 1. Along these lines, we would prefer a systematic means to relate abbreviations to the terms they abbreviate. In our view, general solutions are better than specific ones which list information: lists ought to contain arbitrary information, while JAPE rules construct systematic information.
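    To illustrate the preference for rules over exhaustive lists, the sketch below uses a single Python regular expression, standing in for a JAPE rule, to map the appellate district variants mentioned above to one canonical (district, division) pair. The set of ordinals and the accepted spellings of District and Division are assumptions drawn from those examples, not a complete account of how courts are cited.

```python
import re

# Assumed coverage: ordinal district names plus the variant spellings quoted above.
ORDINALS = r"(?:First|Second|Third|Fourth|Fifth|Sixth)"
DISTRICT = re.compile(
    rf"\b(?P<district>{ORDINALS})\s+(?:Appellate\s+)?(?:District|Dist\.)"
    r"(?:,?\s+(?:Division|Div\.?)\s+(?P<division>\d+))?",
    re.IGNORECASE,
)

def normalise_district(text):
    """Map a variant mention to a canonical (district, division) pair, or None."""
    m = DISTRICT.search(text)
    if m is None:
        return None
    division = m.group("division")
    return (m.group("district").capitalize(), int(division) if division else None)

for variant in ["Fifth Appellate District", "Fifth Appellate District Div 1",
                "Fifth Appellate District, Division 1", "Fifth Dist."]:
    print(variant, "->", normalise_district(variant))
```

    One pattern covers all four variants, so adding a new abbreviation means editing the rule rather than listing every combination.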

    JAPE Rules

    Given the lists, we have JAPE rules to annotate the relevant portions of text.

    • AppellantCounsel: annotates the appellant counsel.
    • RespondentCounsel: annotates the respondent counsel.
    • DSACounsellor: annotates counsels.
    • SectionsTerm: annotates sections relative to the list of section terms.
    • CaseRoleGeneral
    • DSACaseName2: annotates the case name.
    • DSACaseName: annotates the case name.
    • DSACaseCit: annotates the case citation.
    • CriminalAssault: annotates terms for criminal assault.
    • CauseOfAction: annotates for causes of action.
    • AttackTerm: annotates attack terms.
    • AppellateDistrict: annotates districts of courts.
    • DecisionStatement: annotates a sentence as the decision statement.
    • JudgementTerm: annotates terms related to judgement.
    • JudgeName: annotates the names of judges.
    • JudgeInd: annotates the judge name indicator.
    • IntentTerm: annotates terms of intent.
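    The JudgeInd and JudgeName rules in the archive are not reproduced here. As a hypothetical sketch of how the problem noted for judgeindicator.lst might be reduced, the Python regular expression below accepts J. only when it follows a capitalised surname and a comma (optionally with P. for a presiding justice), assuming a Surname, J. signature style. An initial such as the J. in Robert J. Smith is then not taken as a judge indicator. The pattern and the sample text are illustrative assumptions.

```python
import re

# Hypothetical JudgeName pattern: capitalised surname, comma, optional "P."
# (presiding justice), then the indicator "J." not followed by a letter.
JUDGE_NAME = re.compile(r"\b(?P<name>[A-Z][A-Za-z'-]+),\s+(?:P\.\s*)?J\.(?![A-Za-z])")

def judge_names(text):
    """Return the surnames that this sketch treats as judge names."""
    return [m.group("name") for m in JUDGE_NAME.finditer(text)]

sample = "Brown, P. J., and Green, J., concurred. Counsel: Robert J. Smith."
print(judge_names(sample))  # ['Brown', 'Green']
```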

    Discussion

    Some of these rules annotate sentences, while others annotate entities with respect to some property. Some of the rules do not work as well as we would wish and could stand further refinement, such as the rule for the roles of counsel; our current solution is rather ad hoc. Nonetheless, as a first pass, the lists and rules give some indication of what is possible.

    Order of application

    • Document Reset PR
    • RegexSentenceSplitter
    • ANNIE English Tokeniser
    • ANNIE POS Tagger
    • MorphologicalAnalyzer
    • DSAGaz
    • AnnieGaz
    • Flexible Gazetteer
    • NPChunker
    • ANNIE NE Transducer
    • IntentTerm
    • JudgeInd
    • JudgeName
    • JudgementTerm
    • DecisionStatement
    • Weapons
    • AppellateDistrict
    • AttackTerm
    • CauseOfAction
    • CriminalAssault
    • DSACaseCit
    • DSACaseName
    • DSACaseName2
    • DSACaseNameAZW
    • CaseRoleGeneral
    • SectionsTerm
    • DSACounsellor
    • RespondentCounsel
    • AppellantCounsel
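    The order matters because later rules consume annotations produced by earlier ones; for instance, a rule like DecisionStatement can only mark a sentence once sentence and judgement-term annotations exist. The toy Python pipeline below shows the same rules yielding a decision statement in one order and nothing in the other. Its stage functions only borrow names from the resources above and are not GATE processing resources; the sentence splitting and term matching are deliberately crude assumptions.

```python
import re

JUDGEMENT_WORDS = ["granted", "denied", "reversed", "overturned", "remanded"]

def sentence_splitter(doc):
    """Crude stand-in for a sentence splitter: a sentence runs up to a period."""
    for m in re.finditer(r"[^.]+\.", doc["text"]):
        doc["ann"].append(("Sentence", m.start(), m.end()))

def judgement_term(doc):
    """Annotate the judgement terms sampled from judgements.lst."""
    pattern = r"\b(?:" + "|".join(JUDGEMENT_WORDS) + r")\b"
    for m in re.finditer(pattern, doc["text"]):
        doc["ann"].append(("JudgementTerm", m.start(), m.end()))

def decision_statement(doc):
    """Mark every sentence that contains a JudgementTerm annotation."""
    terms = [(s, e) for t, s, e in doc["ann"] if t == "JudgementTerm"]
    sentences = [(s, e) for t, s, e in doc["ann"] if t == "Sentence"]
    for s, e in sentences:
        if any(s <= ts and te <= e for ts, te in terms):
            doc["ann"].append(("DecisionStatement", s, e))

def run(pipeline, text):
    """Apply the stages in order and return the DecisionStatement annotations."""
    doc = {"text": text, "ann": []}
    for stage in pipeline:
        stage(doc)
    return [a for a in doc["ann"] if a[0] == "DecisionStatement"]

text = "The appellant was convicted of assault. The judgment is reversed."
good = run([sentence_splitter, judgement_term, decision_statement], text)
bad = run([sentence_splitter, decision_statement, judgement_term], text)
print(len(good), len(bad))  # 1 0
```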

    Discussion

    Despite the limitations, this gives some useful, preliminary results which can easily be built upon. Moreover, we know of no other public, open system of annotating case elements (or factors).

    By Adam Wyner
    Distributed under the Creative Commons
    Attribution-Non-Commercial-Share Alike 2.0

    Information Extraction with ANNIC

    Wednesday, January 20th, 2010

    In Information Extraction of Legal Case Factors, we presented lists and rules for annotating legal case factors. In this post, we go one step further and use the ANNotations In Context (ANNIC) tool of GATE. This is a plugin that helps one search for annotations, visualise them, and inspect their features; it is useful for JAPE rule development. We outline how to plug in, load, and run ANNIC. (See introductory notes on this and related posts.)

    Introduction to ANNIC

    ANNIC is an annotation indexing and retrieval system. It is integrated with the data stores, where the results of annotating a corpus can be saved. Once a processing pipeline has been run over the corpus, we can use ANNIC to query and inspect the contexts where annotations appear; the queries are written in a subset of the JAPE language, so they can be complex. The results of the queries are presented graphically, making them easy to understand. As such, ANNIC is a very useful tool in the development of rules, as one can discover and test patterns in corpora. There is also an export facility, so the results can be saved to a file, but this is not a full information extraction system such as one might want with templates.

    For later, but important to know from the documentation: “Be warned that only the annotation sets, types and features initially indexed will be updated when adding/removing documents to the datastore. This means, for example, that if you add a new annotation type in one of the indexed document, it will not appear in the results when searching for it.” This implies that where one adds new annotations to the pipeline, one should delete the old data store and create another one with respect to the new results. For example, if one ran the pipeline without POS tagging, one cannot add POS annotations later and inspect them with ANNIC in the same data store.

    Further details on ANNIC are available in the GATE documentation on ANNIC, and there is an online video.

    Instantiating the serial data store

    The following steps create the requisite parts and inspect them with ANNIC. One starts with an empty GATE session, then adds processing resources, language resources, and pipelines, since these can all be related to the data store in a later step. This material is adapted or adopted from the GATE ANNIC documentation, cutting out many of the options. The first step is to instantiate a searchable serial data store (SSD), which is where the annotated documents are saved and searched. The application, lists, and rules that this example uses are from Information Extraction of Legal Case Factors.

    • Right-click on Datastores > Create datastore.
    • From the drop-down list select “Lucene Based Searchable DataStore”.
    • At the input window, provide the following parameters:
      • DataStore URL: Select an empty folder where the data store is created.
      • Index Location: Select an empty folder. This is where the index will be created.
      • Annotation Sets: Provide the annotation sets that you wish to include or exclude from being indexed. There are options here, but we want to index all the annotation sets in all the documents, so make this list empty.
      • Base-Token Type: These are the basic tokens of any document (e.g. Token), which your documents must have in order to be indexed.
      • Index Unit Type: This specifies the unit of indexing (e.g. Sentence). Only annotations lying within the boundaries of a unit annotation are indexed (e.g. in the case of Sentence, annotations that span the boundary between two sentences are not indexed). We use the Sentence unit.
      • Features: Users can specify the annotation types and features that should be included or excluded from being indexed (e.g. exclude SpaceToken, Split, or Person.matches).
    • Click OK. If all parameters are OK, a new empty searchable SSD will be created.
    • Create an empty corpus and save it to the SSD.
    • Populate the corpus with some documents. Each document in the corpus is automatically indexed and saved to the data store.
    • Load some processing resources and then a pipeline. Run the pipeline over the corpus.
    • Once the pipeline has finished (and there are no errors), save the corpus in the SSD by right clicking on the corpus, then “Save to its datastore”.
    • Double click on the SSD file under Datastores. Click on the “Lucene DataStore Searcher” tab to activate the search GUI.
    • Now you are ready to specify a search query of your annotated documents in the SSD.
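    The effect of the Index Unit Type and Features parameters can be pictured with a small Python sketch. It is not ANNIC's indexing code: the Ann record and the indexable function are hypothetical, annotations are simple (type, start, end) character spans, and the sketch keeps only annotations that lie inside a single Sentence and whose type has not been excluded.

```python
from typing import NamedTuple

class Ann(NamedTuple):
    """Hypothetical annotation: a type plus character offsets [start, end)."""
    type: str
    start: int
    end: int

def indexable(annotations, unit_type="Sentence", excluded=("SpaceToken", "Split")):
    """Keep annotations inside one unit annotation whose type is not excluded."""
    units = [a for a in annotations if a.type == unit_type]
    return [a for a in annotations
            if a.type not in excluded
            and any(u.start <= a.start and a.end <= u.end for u in units)]

anns = [Ann("Sentence", 0, 20), Ann("Sentence", 21, 40),
        Ann("Person", 5, 10), Ann("SpaceToken", 10, 11),
        Ann("Organization", 15, 25)]   # crosses the sentence boundary
print([a.type for a in indexable(anns)])  # ['Sentence', 'Sentence', 'Person']
```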

    Output

    The GUI opens with parts as shown in the following two figures:

    ANNIC search for

    ANNIC search for disclosure concept

    Working with the GUI

    The figures above show three main sections. In the top section, on the left, there is a blank text area in which one can write a query (more on this below); the search query returns the “content” of the annotations. There are options to select a corpus, an annotation set, the number of results, and the size of the context (i.e. the number of tokens to the left and right of what one searches for). The central section shows a visualisation of annotations and their values given the search query; an annotation rows manager lets one add (green plus sign) or remove (red minus sign) the annotation types and features displayed there. The bottom section contains the results table of the query, i.e. the text across the corpus that matches the query, with its left and right contexts. It also contains tabbed panes of statistics, such as how many instances of a particular annotation type appear.

    Queries

    The queries written in the blank text area are a subset of the JAPE patterns and use the annotations produced by the pipeline. Queries are run by pressing ENTER (or clicking the Search icon). Below we give some template patterns of JAPE pattern clauses which can be used as SSD queries.

    • String
    • {AnnotationType}
    • {AnnotationType == String}
    • {AnnotationType.feature == featureValue}
    • {AnnotationType1, AnnotationType2.feature == featureValue}
    • {AnnotationType1.feature == featureValue, AnnotationType2.feature == featureValue}

    Specific queries are:

    • Trandes: returns all occurrences of the string Trandes in the corpus.
    • {Person}: returns annotations of type Person.
    • {Token.string == "Microsoft"}: returns all occurrences of "Microsoft".
    • {Person}({Token})*2{Organization}: returns Person followed by zero to two tokens followed by Organization.
    • {Token.orth == "upperInitial", Organization}: returns a Token whose orth feature has the value "upperInitial" and which is also annotated as Organization.
    • {Token.string == "Trandes"}({Token})*10{Secret}: returns the string "Trandes" followed by zero to ten tokens followed by Secret.
    • {Token.string == "not"}({Token})*4{Secret}: returns the string "not", followed by at most four tokens, followed by something annotated as Secret.

    An example of a result for the last query is:

    Trandes averred nothing more than that it possessed secret.

    In ANNIC, the result of the query appears as:

    ANNIC search for negation and disclosure concept

    One can write queries using the JAPE operators | (OR), +, and *. ({A})+n means one to n occurrences of annotation {A}, and ({A})*n means zero to n occurrences of annotation {A}.
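    The bounded-repetition semantics can be illustrated with a short Python sketch of a query such as {Token.string == "not"}({Token})*4{Secret}. It is not ANNIC's search engine: it assumes each annotation covers exactly one token, represents a token as a (string, set of annotation types) pair, and uses an invented sentence. Setting min_gap=1 gives the +n variant.

```python
def find(tokens, is_first, is_last, max_gap, min_gap=0):
    """Return the text of every span: First, then min_gap..max_gap tokens, then Last."""
    hits = []
    for i, (string, types) in enumerate(tokens):
        if not is_first(string, types):
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + 1 + gap
            if j < len(tokens) and is_last(*tokens[j]):
                hits.append(" ".join(s for s, _ in tokens[i:j + 1]))
    return hits

def is_not(string, types):
    return string == "not"        # {Token.string == "not"}

def is_secret(string, types):
    return "Secret" in types      # {Secret}

words = "The plaintiff did not keep the design secret".split()
# Hypothetical annotations: every word is a Token; "secret" is also a Secret.
tokens = [(w, {"Token", "Secret"} if w == "secret" else {"Token"}) for w in words]

print(find(tokens, is_not, is_secret, max_gap=4))  # ['not keep the design secret']
print(find(tokens, is_not, is_secret, max_gap=2))  # []
```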

    Summary

    ANNIC is particularly useful in writing and refining one’s JAPE rules. One’s results can also be exported as HTML files.

    By Adam Wyner
    Distributed under the Creative Commons
    Attribution-Non-Commercial-Share Alike 2.0

    Information Extraction of Legal Case Factors

    Wednesday, January 20th, 2010

    This post reports initial steps in legal case factor annotation. We first give a very brief and highly simplified overview of case based reasoning using case factors, then present how case factors can be identified using text mining. (See introductory notes on this and related posts.)

    Case based reasoning background

    In Common Law legal systems such as those of the USA and UK, judges make decisions concerning a case; we can say the judges make the law. This is in contrast to Civil Law legal systems, as in Europe (excluding the UK) and elsewhere, in which legislatures make the law and judges must follow it. In practice, neither system is purely common law or purely civil law: the USA and UK have laws made by legislatures, while in Europe the application of legislative acts in particular circumstances (refining the law to apply to the facts) takes on aspects of common law.

    In a Common Law system, judges and lawyers use case based reasoning: they argue about a current undecided case with respect to precedent cases, which are cases that have already been decided by a court and are accepted as “good law”. In essence, if the current case were exactly like a particular precedent case in all essential ways, then the current case ought to be decided as the precedent case was. Where the current case varies, one must argue comparatively with respect to other precedents. Among the ways in which cases are compared and contrasted, we find case factors, where factors are prototypical fact patterns of a case. A judge decides a case in virtue of its facts, along with the applicable laws and precedents. It is, therefore, crucial to be able to identify the facts of a case in order to compare and contrast cases.

    In AI and Law, case based reasoning has a long and well developed history and literature (see the work of Hafner, Rissland, Ashley, and Bench-Capon, among others). We make specific reference to Aleven’s 1997 Ph.D. thesis. Given an analysis of cases in terms of factors, one can reason about how a current undecided case should, according to the precedents, be decided. However, a central problem is the knowledge bottleneck: how to analyse cases in terms of factors. By and large, this has been manual labour. In the CATO database of cases discussed in Aleven 1997 (about 140 cases concerning intellectual property), the factors are manually annotated. There has been some effort to automate the textual identification of factors in cases (see Bruninghaus and Ashley), but this is done with case summaries, not “actual” cases; moreover, the database, annotation, and other system supports are unavailable, so the results of their experiments are not independently verifiable and cannot be developed by other researchers.

    Factors in text

    In the CATO system, texts of case decisions are presented to the student along with a menu of factors; the student associates the factors with the text, in effect, annotating the case as a whole with the factors, but not the linguistic aspects which gave rise to the annotation. The factors are not extracted. The CATO system has other components to support case based argumentation, but these are not relevant to our discussion at this point.

    Factors are legal concepts that range over facts. While Aleven 1997 has 27 factors and a factor hierarchy, we only look at two factors in order to give a flavour of our approach.

    • Security-Measures
      • Description: The plaintiff took efforts to maintain the secrecy of its information.
      • The factor applies if: The plaintiff limited access to and distribution of information. Examples: nondisclosure agreements, notification that the information is confidential, securing the information with passwords and secure storage facilities, secure document distribution systems, etc.
    • Secrets-Disclosed-Outsiders
      • Description: The information was disclosed to outsiders or was in the public domain. The plaintiff either did not have secret information or did not have an interest in maintaining the secrecy of information.
      • The factor applies if: The plaintiff disclosed the product information to licensees, customers, subcontractors, etc.
      • The factor does not apply if: Plaintiff published the information in a public forum. All we know is that plaintiff marketed a product from which the information could be ascertained by reverse engineering.

    Aleven 1997 illustrates the association of factors with textual passages in a case.

    Mason v. Jack Daniels Distillery

    Given the factor description, we make lists and rules which at least highlight candidate passages in the case which might be relevant.

    Output

    The results of annotating terms and sentences appear in the following figures:

    Annotations for Secret and Disclose Terms in Trandes v. Atkinson

    Annotations for Secret and Disclose Sentences in Trandes v. Atkinson

    Note that the disclosure sentence seems to be a reasonable candidate for the disclosure factor, but the secrecy sentence is a discussion about the factor rather than a presentation of the factor itself. As we have said, at this point we provide candidate expressions for the factors; further work must be done to annotate the text automatically and more accurately.

    GATE

    The lists, JAPE rules, graphics, and application state are in the archive. See the related post Information Extraction with ANNIC which uses a GATE plugin to further analyse the results so they can be improved.

    Lists

    To highlight the relevant passages, we created Lookup lists and then JAPE rules. To create the Lookups, we turned to disclosure and secret in WordNet, taking the SynSets of each, as well as looking at hypernyms (superordinate terms). Making a selection, we created lists using the infinitival, lower case form. This gave us two lists — disclosure.lst and secret.lst.

    • disclosure.lst: announce, betray, break, bring out, communicate, confide, disclose, discover, divulge, expose, give away, impart, inform, leak, let on, let out, make known, pass on, reveal, tell
    • secret.lst: confidential, confidentiality, hidden, private, secrecy, secret

    In the gazetteer itself, disclosure.lst has the majorType disclose, and secret.lst has the majorType secret. With these lists, we homogenize the alternative words for these concepts. It is important that these particular lists are integrated into a lists.def file; in our example, this is ListGaz, which is not included in the distribution. As the application uses the Flexible Gazetteer (not discussed here), we can look up alternative morphological forms of the words in the lists.

    JAPE rules

    Then we write JAPE rules so that we can more easily identify the terms. The first pair of rules turns the majorType into an annotation in the annotation set, highlighting any occurrence of the terms; we could have skipped this, but it is worthwhile to see where and how the terms appear. The second pair of rules classifies sentences as relating to disclosure and secrecy.

    • SecretFactor01.jape: Annotates any word from the secret.lst.
    • DisclosureFactor01.jape: Annotates any word from the disclosure.lst.
    • SecretFactorSentence01.jape: Annotates any sentence which contains an annotation Secret.
    • DisclosureFactorSentence01.jape: Annotates any sentence which contains an annotation Disclosure.
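    The combined effect of the two lists and these rules can be sketched in Python. This is only an approximation under stated assumptions: a crude, hypothetical suffix stripper stands in for the morphological matching of the Flexible Gazetteer, sentences are split after periods and semicolons, the annotation names DisclosureSentence and SecretSentence are placeholders, and the sample text is invented.

```python
import re

# Entries of disclosure.lst and secret.lst above, keyed by their majorType.
LISTS = {
    "disclose": ["announce", "betray", "break", "bring out", "communicate",
                 "confide", "disclose", "discover", "divulge", "expose",
                 "give away", "impart", "inform", "leak", "let on", "let out",
                 "make known", "pass on", "reveal", "tell"],
    "secret": ["confidential", "confidentiality", "hidden", "private",
               "secrecy", "secret"],
}

def stem(word):
    """Hypothetical, very crude stand-in for morphological matching: lower-case
    the word and strip one regular suffix. Irregular forms (told, gave) are missed."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s", "e"):
        if suffix == "s" and word.endswith("ss"):
            continue  # keep "pass" and "access" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Stemmed entries (as token tuples) -> majorType.
INDEX = {tuple(stem(w) for w in entry.split()): major
         for major, entries in LISTS.items() for entry in entries}
MAX_LEN = max(len(key) for key in INDEX)

def term_types(sentence):
    """Return the majorTypes of list terms found in a sentence (longest match first)."""
    stems = [stem(t) for t in re.findall(r"[A-Za-z]+", sentence)]
    found, i = set(), 0
    while i < len(stems):
        for n in range(min(MAX_LEN, len(stems) - i), 0, -1):
            major = INDEX.get(tuple(stems[i:i + n]))
            if major is not None:
                found.add(major)
                i += n
                break
        else:
            i += 1
    return found

def classify(text):
    """Pair each sentence with the placeholder sentence-level labels it receives."""
    result = []
    for sentence in re.split(r"(?<=[.;])\s+", text.strip()):
        types = term_types(sentence)
        labels = [label for major, label in (("disclose", "DisclosureSentence"),
                                             ("secret", "SecretSentence"))
                  if major in types]
        result.append((sentence, labels))
    return result

text = ("The plaintiff disclosed the design to its licensees. "
        "The documents were kept confidential; access was limited.")
for sentence, labels in classify(text):
    print(labels, sentence)
```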

    Application order

    The order of application of the processing resources is:

    • Document Reset PR
    • ANNIE Sentence Splitter
    • ANNIE English Tokeniser
    • ListGaz
    • SecretFactor01.jape
    • DisclosureFactor01.jape
    • SecretFactorSentence01.jape
    • DisclosureFactorSentence01.jape

    Discussion

    As we have already pointed out, the annotations highlight potentially relevant passages, and further refinement is needed. The kind of refinement required would become clearer by applying the annotations to more texts. It will also be important to consider more factors, on more cases, and across more domains of case law.

    By Adam Wyner
    Distributed under the Creative Commons
    Attribution-Non-Commercial-Share Alike 2.0

    Information Extraction of Conditional Rules

    Wednesday, January 20th, 2010

    In this post, we extract conditional rules, such as If it rains, then the sidewalk is wet, both from simple examples and from a sample fragment of legislation. (See introductory notes on this and related posts.)

    Sample legislation

    In legislation (and elsewhere in the law), conditional statements of the form If P, then Q are used. A well-researched example in AI and Law is the British Nationality Act 1981. In this post, we provide some initial JAPE rules to annotate conditional statements.

    We work with several variants of simple conditional statements and a (modified) conditional statement from the British Nationality Act. For each statement, we want to annotate it as a rule and also to identify the parts of the rule.

      If Bill is happy, then Jill is happy.

      Jill is happy, if Bill is happy.

      Jill is happy if:

          1) Bill is happy;
          2) Bill and Jill are together.

      Acquisition by birth or adoption

          (1) A person born in the United Kingdom after commencement shall be a British citizen if –
          (a) at the time of the birth his father or mother is a British citizen; or
          (b) at the time of the birth his father or mother is settled in the United Kingdom.

    Output

    We want not only to identify a sentence as a rule, but also to identify the parts of the rule, namely the antecedent and the consequent. This may be useful for further processing.

    The results appear in a graphic as:

    Rule Output

    Below, we discuss some of the problems with annotating the legislative rule.

    GATE

    In the zip file we have the application state, text, graphic, and JAPE rules.

    Lists

    There are no particular lists for this section; we used the same lists as in the rulebook development.

    JAPE Rules

    We have a cascade of rules as follows.

    • AntecedentInd01: finds the token “if” in the text. We use this as an indicator that the sentence is or may be a rule. We may have a range of such rules that we take to indicate a rule. We can use them to examine results from a body of texts, refining what is identified as a rule and how. Overgenerate, then prune. After we are clear about the results from individual rules, we can gather the annotations together under another annotation, which generalises the result.
    • AntecedentInd02: finds the conditional indicator inside a sentence and annotates the enclosing sentence as a rule with a conditional. A general rule like this can be used as we refine the indicators of a rule. It is also an example of annotating a sentence with respect to properties contained in the sentence.
    • ConditionalParts01: finds the string between if and some punctuation, then labels it antecedent. This labels Bill is happy as antecedent in simple sentences such as If Bill is happy, then Jill is happy and Jill is happy, if Bill is happy. It does not work when the antecedents are presented as a list.
    • ConditionalParts02: finds the string between a preceding sentence and a comma followed by a conditional indicator, then labels it consequent. This labels Jill is happy as consequent in simple sentences such as Jill is happy, if Bill is happy.
    • ConditionalParts03: finds the string between then and the end of the sentence, labelling it consequent. This labels Jill is happy as consequent in simple sentences such as If Bill is happy, then Jill is happy.
    • ConditionalParts04: finds the string between a preceding sentence and a conditional indicator followed by a colon, then labels it consequent. This labels Jill is happy as consequent in constructions where the antecedents are presented in a list, such as Jill is happy if: Bill is happy and Bill and Jill are together.
    • ConditionalParts05: finds the strings between list indicators (see the section on legislative presentation) and some punctuation (semicolon or period), and labels them as antecedents. This labels Bill is happy and Bill and Jill are together as antecedents in Jill is happy if: Bill is happy and Bill and Jill are together.
    • ConditionalSentenceClass: annotates sentences as conditionals if they contain a conditional indicator.
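    For comparison, the three simple patterns handled by the cascade can be sketched with Python regular expressions. This is an illustration under stated assumptions, not a translation of the JAPE files: each statement is a single string, the consequent and antecedents are returned as plain text, and the order of the patterns stands in for the order of the ConditionalParts rules.

```python
import re

# One list item: "1) text;" or "2) text." up to the next flag or the end.
LIST_ITEM = re.compile(r"\d+\)\s*(.+?)\s*[;.]\s*(?=\d+\)|$)")


def parse_conditional(text):
    """Return {'consequent': str, 'antecedents': [str, ...]} or None."""
    text = " ".join(text.split())  # normalise whitespace and line breaks
    # Analogue of ConditionalParts01 + 03: "If P, then Q."
    m = re.match(r"If (.+?), then (.+?)\.$", text)
    if m:
        return {"consequent": m.group(2), "antecedents": [m.group(1)]}
    # Analogue of ConditionalParts04 + 05: "Q if: 1) P1; 2) P2."
    m = re.match(r"(.+?) if:\s*(.+)$", text)
    if m:
        return {"consequent": m.group(1), "antecedents": LIST_ITEM.findall(m.group(2))}
    # Analogue of ConditionalParts02 + 01: "Q, if P."
    m = re.match(r"(.+?), if (.+?)\.$", text)
    if m:
        return {"consequent": m.group(1), "antecedents": [m.group(2)]}
    return None  # not recognised as a conditional


print(parse_conditional("If Bill is happy, then Jill is happy."))
print(parse_conditional("Jill is happy if:\n  1) Bill is happy;\n  2) Bill and Jill are together."))
```

As with the JAPE rules, these patterns are tuned to the simple samples and would need refinement for the legislative example.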

    Application order

    The order of application of the processing resources is:

    • Document Reset PR
    • ANNIE English Tokeniser
    • ANNIE Sentence Splitter
    • ListFlagLevel1
    • AntecedentInd01
    • ConditionalParts01
    • ConditionalParts02
    • ConditionalParts03
    • ConditionalParts04
    • ConditionalParts05
    • ConditionalSentenceClass

    Comments

    While our application clearly works well for the simple samples of conditional statements, it does not do well on our sample legislation. There is a range of problems: list recognition “(x)”, and the use of “;”, “–”, and “or”. Most of these have to do with the notion of lists that we inherited from the rulebook example, so we need to refine the rules for the particular context of use. We leave this as an exercise.
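    One possible starting point for the exercise, sketched in Python rather than JAPE: the patterns below accept “(1)” and “(a)” flags as well as “1)”, a dash or colon after “if”, and a connective (“or” or “and”) after the semicolon, which is recorded separately. The sketch assumes the clause has already been separated from its heading (Acquisition by birth or adoption) and would need further testing on real legislation.

```python
import re

# Flags such as "(1)", "(a)" or "1)"; the opening parenthesis is optional.
FLAG = r"\(?(?:\d+|[a-z])\)"
# Optional clause flag, the consequent, then "if" followed by a dash or colon.
HEAD = re.compile(r"^(?:" + FLAG + r"\s*)?(.+?)\s+if\s*[–:-]\s*(.*)$")
# Each antecedent runs from a flag to ";" or "." and may be followed by a
# connective ("or"/"and") before the next flag or the end of the text.
ITEM = re.compile(FLAG + r"\s*(.+?)\s*[;.]\s*(?:(or|and)\s+)?(?=" + FLAG + r"|$)")


def parse_listed_conditional(text):
    """Split a listed conditional into consequent, antecedents and connective."""
    text = " ".join(text.split())  # join the clause onto one line
    head = HEAD.match(text)
    if not head:
        return None
    items = ITEM.findall(head.group(2))  # list of (antecedent, connective) pairs
    connectives = {conn for _, conn in items if conn}
    return {
        "consequent": head.group(1),
        "antecedents": [antecedent for antecedent, _ in items],
        # None when there is no connective, or when "or" and "and" are mixed
        "connective": connectives.pop() if len(connectives) == 1 else None,
    }


act = """(1) A person born in the United Kingdom after commencement shall be a British citizen if –
(a) at the time of the birth his father or mother is a British citizen; or
(b) at the time of the birth his father or mother is settled in the United Kingdom."""
print(parse_listed_conditional(act))
```

Recording the connective matters here: the “or” makes the antecedents alternatives rather than joint conditions.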

    By Adam Wyner
    Distributed under the Creative Commons
    Attribution-Non-Commercial-Share Alike 2.0

    Introduction to a Series of Posts on Legal Information Extraction with GATE

    Wednesday, January 20th, 2010

    This post has notes on and links to several other posts about legal information annotation and extraction using the General Architecture for Text Engineering system (GATE). The information in the posts was presented at my tutorial at JURIX 2009, Rotterdam, The Netherlands; the slides are available here. See the GATE website or my slides for introductory material about NLP and text annotation. For particulars about NLP and legal resources, see the posts and files at the links below.

    The Posts

    The following posts discuss different aspects of legal information extraction using GATE (live links indicate live posts):

    Prototypes

    The samples presented in the posts are prototypes only. No doubt there are other ways to accomplish similar tasks; the material is not as streamlined or cleanly presented as it could be; and each section is but a very small fragment of a much larger problem. In addition, there are better ways to present the lists and rules “in one piece”; however, during development and for discussion, it seems more helpful to keep the elements separate. Nonetheless, as a proof of concept, the samples make their point.

    If there are any problems, contact Adam Wyner at adam@wyner.info.

    Files

    The posts are intended to be self-contained and to work with GATE 5.0. The archive files include the .xgapp file, which is a saved application state, along with text/corpus, the lists, and JAPE rules needed to run the application. In addition, the archive files include any graph outputs as reference. As noted, one may need to ‘fiddle’ a bit with the gazetteer lists in the current version.

    Graphics

    Graphics in the posts can be viewed in a larger and clearer size by right clicking on the graphic and selecting View Image. The Back button on your browser will close the image and return you to the post.

    License

    The materials are released under the following license:

    By Adam Wyner
    Distributed under the Creative Commons
    Attribution-Non-Commercial-Share Alike 2.0

    If you want to commercially exploit the material, you must seek a separate license with me. That said, I look forward to further open development on these materials; see my post on Open Source Legal Information.

    Using XSLT to Re-represent GATE Output

    Wednesday, January 20th, 2010

    Once one has processed some documents with GATE, what can one do with the result? After all, information extraction implies that the information is extracted, not simply annotated. (See introductory notes on this and related posts.)

    There are several paths. One is to use Annotations in Context (ANNIC), which searches for and returns a display of annotated elements; we discuss how to use ANNIC in a separate post. However, ANNIC does not appear to support an “export” function for further processing of the results. Another path is to export the document with inline annotations; with a bit of manual work, this can then be processed further with EXtensible Stylesheet Language Transformations (XSLT). There are other approaches (e.g. XQuery), but this post provides an example of using XSLT to present output as a rule book.

    In the post Legislative Rule Extraction, we annotated some legislation; here we carry on with that annotated legislation.

    Output of GATE

    In addition to the graphic output from GATE’s application, we can output the results of the annotation either inline or with offsets. As we are interested in providing alternative presentations of the annotated material, we look at the inline annotation.

    In GATE, right-click on the document (after running the application on it) and choose “Save preserving document format”. For our sample text, the result is:

    <ArticleFlag> Article 1 </ArticleFlag>
    <SectionType> Subject matter </SectionType>
    <ListStateTop> This Directive lays down rules concerning the
    following </ListStateTop>:
    <ListFlagLevel1> 1) </ListFlagLevel1>
    <SubListStatementPrefinal> the taking-up and pursuit, within the Community,
    of the self-employed activities of direct insurance and
    reinsurance </SubListStatementPrefinal>;
    <ListFlagLevel1> 2) </ListFlagLevel1>
    <SubListStatementPrefinal> the supervision in the case of insurance and
    reinsurance groups </SubListStatementPrefinal >;
    <ListFlagLevel1> 3) </ListFlagLevel1>
    <SubListStatementFinal> the reorganisation and winding-up of direct
    insurance undertakings </SubListStatementFinal>.
    

    Legal XML

    The GATE output needs to be made into well-formed XML, with a single root element and properly nested elements. As there will be several rules, each extracted rule should be enclosed in its own legal XML element (here, rule). There is an issue about how to save and process a full corpus, as the only save options are XML or Datastore, but we leave this aside for the time being. For now, we ‘manually’ wrap our GATE output as below.

    I used the online XSLT editor at w3schools, which lets one experiment and get results right away. In particular, one can paste the XML rulebook (below) into the left-hand pane and the XSLT code (below) into the right-hand pane, hit the edit button, and get the transformed output. Caveat: one might have to edit the XML rulebook a little for spaces and line breaks, since what appears in WordPress does not always match what is needed to run the code.

    The XML Rulebook:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <rulebook>
    <rule>
    <ArticleFlag> Article 1 </ArticleFlag>
    <SectionType> Subject matter </SectionType>
    <ListStateTop> This Directive lays down rules concerning the
    following </ListStateTop>:
    <ListFlagLevel1> 1) </ListFlagLevel1>
    <SubListStatementPrefinal> the taking-up and pursuit, within the Community,
    of the self-employed activities of direct insurance and
    reinsurance </SubListStatementPrefinal>;
    <ListFlagLevel1> 2) </ListFlagLevel1>
    <SubListStatementPrefinal> the supervision in the case of insurance and
    reinsurance groups </SubListStatementPrefinal >;
    <ListFlagLevel1> 3) </ListFlagLevel1>
    <SubListStatementFinal> the reorganisation and winding-up of direct
    insurance undertakings </SubListStatementFinal>.
    </rule>
    </rulebook>
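    The manual wrapping could also be scripted. This minimal sketch uses Python's standard xml.etree.ElementTree (our choice, not part of GATE) to wrap each inline-annotated fragment in a rule element under a single rulebook root; parsing the result doubles as a well-formedness check. The helper name wrap_rules is hypothetical, and the fragment is shortened from the GATE output above.

```python
import xml.etree.ElementTree as ET

# A shortened fragment of the inline GATE output (no root element).
GATE_FRAGMENT = """<ArticleFlag> Article 1 </ArticleFlag>
<SectionType> Subject matter </SectionType>
<ListStateTop> This Directive lays down rules concerning the
following </ListStateTop>:
<ListFlagLevel1> 2) </ListFlagLevel1>
<SubListStatementPrefinal> the supervision in the case of insurance and
reinsurance groups </SubListStatementPrefinal >;"""


def wrap_rules(fragments):
    """Wrap each fragment in <rule> and all of them in one <rulebook> root.
    ET.fromstring raises ET.ParseError if the result is not well-formed,
    for example when two annotations cross."""
    body = "".join(f"<rule>{fragment}</rule>" for fragment in fragments)
    return ET.fromstring(f"<rulebook>{body}</rulebook>")


rulebook = wrap_rules([GATE_FRAGMENT])
print(rulebook.find("rule/SectionType").text.strip())  # Subject matter

try:
    wrap_rules(["<x><y></x></y>"])  # crossing elements
except ET.ParseError as error:
    print("not well-formed:", error)
```

ET.tostring(rulebook, encoding="unicode") would then give the rulebook as a string, ready for the XSLT step.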
    

    The XSLT code:

    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!-- Renders each rule of the rulebook as an HTML paragraph -->
    <xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
    <xsl:template match="/">
      <html>
      <body>
      <h3>My Rulebook</h3>
      <xsl:apply-templates/>
      </body>
      </html>
    </xsl:template>
    
    <xsl:template match="rule">
      <p>
        <xsl:apply-templates select="ArticleFlag"/>
        <xsl:apply-templates select="SectionType"/>
        <xsl:apply-templates select="ListStateTop"/>
        <xsl:apply-templates select="SubListStatementPrefinal"/>
        <xsl:apply-templates select="SubListStatementFinal"/>
      </p>
    </xsl:template>
    
    <xsl:template match="ArticleFlag">
      Reference Code: <span style="color:#ff0000">
      <xsl:value-of select="."/></span>
      <br />
    </xsl:template>
    
    <xsl:template match="SectionType">
      Title: <span style="color:#00ffff">
      <xsl:value-of select="."/></span>
      <br />
    </xsl:template>
    
    <xsl:template match="ListStateTop">
      Description: <span style="color:#00ff00">
      <xsl:value-of select="."/></span>
      <br />
    </xsl:template>
    
    <xsl:template match="SubListStatementPrefinal">
      Description: <span style="color:#00ff00">
      <xsl:value-of select="."/></span>
      <br />
    </xsl:template>
    
    <xsl:template match="SubListStatementFinal">
      Description: <span style="color:#00ff00">
      <xsl:value-of select="."/></span>
      <br />
    </xsl:template>
    
    </xsl:stylesheet>
    

    XSLT Output

    The result is the following:

    Output of XSLT on the XML Rulebook

    In general, one can create any number of rulebooks from the same underlying data, varying the layout and substance of the presentation. For example, we can easily change the colours or headers, or present more or less information. This is far more flexible than the static rulebook that currently exists.

    Problems and Issues

    Our example is a simple illustration of what can be done. Note that we have not yet fulfilled the requirements from our initial post since we have not numbered the sections, but this can be added later.
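    As one way to add the numbering, here is a sketch that post-processes the XML rulebook above with Python's xml.etree.ElementTree instead of XSLT and prints the target format from the Legislative Rule Extraction post. It assumes, as a simplification, that the article number is the last word of ArticleFlag and that levels follow document order (ListStateTop is N.0, each sub-list statement N.1, N.2, and so on).

```python
import xml.etree.ElementTree as ET

# The XML rulebook from above (without the XML declaration).
RULEBOOK = """<rulebook><rule>
<ArticleFlag> Article 1 </ArticleFlag>
<SectionType> Subject matter </SectionType>
<ListStateTop> This Directive lays down rules concerning the
following </ListStateTop>:
<ListFlagLevel1> 1) </ListFlagLevel1>
<SubListStatementPrefinal> the taking-up and pursuit, within the Community,
of the self-employed activities of direct insurance and
reinsurance </SubListStatementPrefinal>;
<ListFlagLevel1> 2) </ListFlagLevel1>
<SubListStatementPrefinal> the supervision in the case of insurance and
reinsurance groups </SubListStatementPrefinal >;
<ListFlagLevel1> 3) </ListFlagLevel1>
<SubListStatementFinal> the reorganisation and winding-up of direct
insurance undertakings </SubListStatementFinal>.
</rule></rulebook>"""

STATEMENTS = ("SubListStatementPrefinal", "SubListStatementFinal")


def squash(text):
    """Collapse runs of whitespace (including line breaks) to single spaces."""
    return " ".join((text or "").split())


def number_rule(rule):
    """Emit Reference Code / Title / Level / Description lines for one rule."""
    article = squash(rule.findtext("ArticleFlag"))
    number = article.split()[-1]
    lines = [f"Reference Code: {article}", f"Title: {squash(rule.findtext('SectionType'))}"]
    item = 0
    for child in rule:
        if child.tag == "ListStateTop":
            level = f"{number}.0"
        elif child.tag in STATEMENTS:
            item += 1
            level = f"{number}.{item}"
        else:
            continue  # flags and headings are not numbered
        # The punctuation (":", ";" or ".") sits in the element's tail.
        lines += [f"Level: {level}", f"Description: {squash(child.text)}{squash(child.tail)}"]
    return lines


for rule in ET.fromstring(RULEBOOK).iter("rule"):
    print("\n".join(number_rule(rule)))
```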

    An important problem is that GATE annotations do not always conform to XML's nesting requirements. In particular, XML elements must be strictly nested, as in

     <x> <y> </y> <z> </z> </x>

    There can be no crossover such as in

    <x> <y> <z> </y> </z> </x>

    though this may well occur for GATE annotations. There may be several approaches to this problem, but we leave that for future discussion.
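    Since GATE records annotations as start and end offsets over the text, crossing annotations can be detected before export. A minimal Python sketch, assuming annotations are given as hypothetical (type, start, end) tuples: two spans cross when one starts strictly inside the other and ends strictly after it.

```python
from itertools import combinations


def crossing_pairs(annotations):
    """Given (type, start, end) annotations with character offsets, return the
    pairs that partially overlap (neither span contains the other); such pairs
    cannot both be written inline as properly nested XML elements."""
    # Sort by start, and for equal starts put the longer span first.
    ordered = sorted(annotations, key=lambda a: (a[1], -a[2]))
    pairs = []
    for (type_a, start_a, end_a), (type_b, start_b, end_b) in combinations(ordered, 2):
        # After sorting, start_a <= start_b; crossing means b starts
        # inside a but ends after a does.
        if start_a < start_b < end_a < end_b:
            pairs.append((type_a, type_b))
    return pairs


# The crossing example above, <x> <y> <z> </y> </z> </x>, as offsets:
print(crossing_pairs([("x", 0, 20), ("y", 2, 10), ("z", 5, 15)]))  # [('y', 'z')]
# A properly nested arrangement yields no pairs:
print(crossing_pairs([("x", 0, 20), ("y", 2, 8), ("z", 10, 15)]))  # []
```

The check is quadratic in the number of annotations, which is adequate for single documents of this size.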

    Another problem is that “Save preserving document format” only works with documents and not corpora, and we might want to work with corpora.

    Finally, XSLT is designed for transforming XML documents rather than for querying them; for extracting information from XML files, a query language such as XQuery may be more appropriate.

    By Adam Wyner
    Distributed under the Creative Commons
    Attribution-Non-Commercial-Share Alike 2.0

    Legislative Rule Extraction

    Wednesday, January 20th, 2010

    In this post, we discuss the annotation of information in legislation, for example, to create a rule book from legislation. There are two distinct tasks and two tools. First, we want to take the original legislation and annotate it; for this, we use GATE. Second, we want to transform the output of GATE, using the annotations, into some alternative, web-compatible format; for this, we use EXtensible Stylesheet Language Transformations (XSLT). This second task is presented in a separate post on XSLT. John Cyriac of compliancetrack outlined the problem that is addressed in these two posts. (See introductory notes on this and related posts.)

    Sample legislation and text

    The text we are working with is a sample from Insurance and Reinsurance (Solvency II) from the European Parliament.

    SUBJECT MATTER AND SCOPE
    Article 1
    Subject matter
    This Directive lays down rules concerning the following:
    1) the taking-up and pursuit, within the Community, of the self-employed activities of direct insurance and reinsurance;
    2) the supervision in the case of insurance and reinsurance groups;
    3) the reorganisation and winding-up of direct insurance undertakings.
    Article 2
    Scope
    1. This Directive shall apply to direct life and non-life insurance undertakings which are established in the territory of a Member State or which wish to become established there. It shall also apply to reinsurance undertakings, which conduct only reinsurance activities, and which are established in the territory of a Member State or which wish to become established there with the exception of Title IV.

    There are additional articles which we do not work with. Article 1 is not a logical statement (an If P, then Q statement), but identifies the matters with which the Directive is concerned. Each statement of the article may be understood as a conjunct: the rules concern a, b, and c. However, this is not yet relevant to our analysis. See the separate post about rule extraction for conditionals.

    Target result

    We want to annotate the first article, picking out each section for extraction. In particular, for practitioners to use the extraction, they should have it in a format which identifies the following:

    Reference Code: Article 1
    Title: Subject Matter
    Level: 1.0
    Description: This Directive lays down rules concerning the following:
    Level: 1.1
    Description: the taking-up and pursuit, within the Community, of the self-employed activities of direct insurance and reinsurance;
    Level: 1.2
    Description: the supervision in the case of insurance and reinsurance groups;
    Level: 1.3
    Description: the reorganisation and winding-up of direct insurance undertakings.

    Output

    The output of GATE appears in the following figure:

    Annotating the structure of legislative rules

    GATE

    To get this output, we used the files and application state in GATELegislativeRulebook.tar.gz.

    Text

    The text is a fragment of the legislation above and is found in the SmallRulebookText.tex file.

    Lists

    We use the following lists in addition to the standard ANNIE lists, so a lists.def file should include them. This is the resource ListGaz given in the .xgapp file (though it may require some additional fiddling and files to work).

    • roman_numerals_i-xx.lst: It has majorType = roman_numeral. This is a list of roman numerals from i to xx.
    • rulebooksectionlabel.lst: It has majorType = rulebooksection. This is a list of section headings such as: Subject matter, Scope, Statutory systems, Exclusion from scope due to size, Operations, Assistance, Mutual undertakings, Institutions, Operations and activities.

    The list of section headings is taken from the legislation, which presumably follows standard guidelines for section heading labels. For the list of roman numerals, there are more general methods using Regex to match well-formed numerals (see Roman Numerals in Python and Regex for Roman Numerals); however, for our purposes it is simpler to use limited lists rather than Regex. In either case, several problems arise, as we see later.
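    For comparison, a regular-expression version of the roman numeral lookup might look like the Python sketch below. It restricts itself to i to xx, like roman_numerals_i-xx.lst, and only accepts a numeral that is not preceded by a letter and is followed by a closing parenthesis (as ListFlagLevel3 requires). Under these assumptions the v inside a word such as avrt (see Additional issues below) is not taken as a flag, though whether this addresses the cause of the overgeneration in the GATE application is untested.

```python
import re

# Well-formed lower-case roman numerals from i to xx: "xx", or an optional
# "x" followed by a units part (ix, iv, or an optional v plus up to three i's).
# The pattern also matches the empty string, which callers must reject.
ROMAN_I_XX = r"(?:xx|x?(?:ix|iv|v?i{0,3}))"

# A ListFlagLevel3-style flag: a numeral not preceded by a letter and
# immediately followed by a closing parenthesis, e.g. "iv)".
LEVEL3_FLAG = re.compile(r"(?<![a-z])(" + ROMAN_I_XX + r")\)")


def is_roman_i_xx(s):
    """True for the twenty numerals i .. xx, False otherwise."""
    return bool(s) and re.fullmatch(ROMAN_I_XX, s) is not None


def level3_flags(text):
    """Return the numerals used as level-3 list flags, skipping empty matches."""
    return [m.group(1) for m in LEVEL3_FLAG.finditer(text) if m.group(1)]


print(level3_flags("i) first; ii) second; iv) fourth; avrt) noise; xxi) too big"))
# ['i', 'ii', 'iv']
```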

    JAPE rules

    • ListArticleSection.jape: The word Article (from the lookup) followed by a number is annotated ArticleFlag.
    • ListFlagLevel1.jape: A number followed by a period or a closing parenthesis is annotated ListFlagLevel1.
    • ListFlagLevel1sub.jape: A number followed by a letter followed by a period is annotated ListFlagLevel1sub.
    • ListFlagLevel2.jape: A string of lower-case letters followed by a closing parenthesis is annotated ListFlagLevel2.
    • ListFlagLevel3.jape: A roman numeral from a lookup list followed by a closing parenthesis is annotated ListFlagLevel3.
    • RuleBookSectionLabel.jape: Looks up section labels from a list and annotates them SectionType. For example, Subject matter, Scope, and Statutory systems.
    • ListStatement01.jape: A string which occurs between SectionType annotation and a colon is annotated ListStateTop.
    • ListStatement02.jape: A string which occurs between a ListFlagLevel1 and a semicolon is annotated SubListStatementPrefinal.
    • ListStatement03.jape: A string which occurs between a ListFlagLevel1 and a period is annotated SubListStatementFinal.
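    The division of labour among ListArticleSection, RuleBookSectionLabel and ListStatement01 to 03 can be mimicked in a few lines of Python. This is a deliberately simplified, line-based sketch (it assumes each heading and list item is on its own line, as in the sample text) and not a re-implementation of the JAPE rules; SECTION_LABELS holds only a subset of rulebooksectionlabel.lst.

```python
import re

# A subset of rulebooksectionlabel.lst.
SECTION_LABELS = {"Subject matter", "Scope"}
# A list item: flag ("1)" or "1."), the statement, then ";" or "." at the end.
ITEM = re.compile(r"(\d+[.)])\s+(.+)([;.])")


def annotate_article(lines):
    """Return (annotation type, text) pairs for an article given line by line."""
    annotations = []
    for line in (raw.strip() for raw in lines):
        item = ITEM.fullmatch(line)
        if re.fullmatch(r"Article \d+", line):
            annotations.append(("ArticleFlag", line))
        elif line in SECTION_LABELS:
            annotations.append(("SectionType", line))
        elif line.endswith(":"):  # like ListStatement01: text up to a colon
            annotations.append(("ListStateTop", line[:-1]))
        elif item:  # like ListStatement02/03: ";" is prefinal, "." is final
            kind = "SubListStatementPrefinal" if item.group(3) == ";" else "SubListStatementFinal"
            annotations.append(("ListFlagLevel1", item.group(1)))
            annotations.append((kind, item.group(2)))
    return annotations


article = """Article 1
Subject matter
This Directive lays down rules concerning the following:
1) the taking-up and pursuit, within the Community, of the self-employed activities of direct insurance and reinsurance;
2) the supervision in the case of insurance and reinsurance groups;
3) the reorganisation and winding-up of direct insurance undertakings."""
for annotation in annotate_article(article.splitlines()):
    print(annotation)
```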

    Application order

    The order of application of the processing resources is:

    • Document Reset PR
    • ANNIE Sentence Splitter
    • ANNIE English Tokeniser
    • ListGaz
    • RulebookSectionLabel
    • ListArticleSection
    • ListStatement01
    • ListFlagLevel1
    • ListStatement02
    • ListStatement03

    Additional issues

    This example does not show the other list flag levels (e.g. using letters, roman numerals etc.), nor the results on other parts of the legislation.

    While the result for the specific text is attractive, there is much work to be done. The lists and rules overgenerate. For example, the rules indicate that avrt is a level flag because v is recognised as a roman numeral. In other cases, too long a passage is selected as the statement at the top of the list. Yet, the example is still useful to demonstrate a proof of concept, particularly in conjunction with the post on XSLT.

    By Adam Wyner
    Distributed under the Creative Commons
    Attribution-Non-Commercial-Share Alike 2.0

    Legal Informatics Start-up from Stanford University

    Thursday, January 14th, 2010

    The Stanford Daily, an online newspaper with news from Stanford University, reports the creation of Lex Machina, a spin-off start-up company that is the result of a collaboration between the Law School and the Department of Computer Science at Stanford. The focus of the company is to make intellectual property litigation more transparent; it covers patent infringement, copyright, trademark, antitrust, and certain trade secret lawsuits. There are commercial and non-commercial services.

    This is an interesting development, particularly in terms of the collaboration between a law school and department of computer science. I hope it is the first of many, and I look forward to learning more about the company and system.

    CFP: Workshop on Semantic Processing of Legal Texts

    Tuesday, January 12th, 2010

    LREC 2010 Workshop on

    SEMANTIC PROCESSING OF LEGAL TEXTS (SPLeT-2010)

    CALL FOR PAPERS

    23 May 2010, Malta

    Workshop description

    The legal domain represents a primary candidate for web-based information distribution, exchange and management, as testified by the numerous e-government, e-justice and e-democracy initiatives worldwide. The last few years have seen a growing body of research and practice in the field of Artificial Intelligence and Law which addresses a range of topics: automated legal reasoning and argumentation, semantic and cross-language legal information retrieval, document classification, legal drafting, legal knowledge discovery and extraction, as well as the construction of legal ontologies and their application to the law domain. In this context, it is of paramount importance to use Natural Language Processing techniques and tools that automate and facilitate the process of knowledge extraction from legal texts.

    Over the last two years, a number of dedicated workshops and tutorials specifically focussing on different aspects of semantic processing of legal texts have demonstrated the current interest in research on Artificial Intelligence and Law in combination with Language Resources (LR) and Human Language Technologies (HLT). The LREC 2008 Workshop on “Semantic processing of legal texts” was held in Marrakech, Morocco, on the 27th of May 2008. The JURIX 2008 Workshop on “the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation (NaLEA)” focussed on recent advances in natural language engineering and legal argumentation. At ICAIL 2009, two workshops were held, “LOAIT ’09 – the 3rd Workshop on Legal Ontologies and Artificial Intelligence Techniques joint with the 2nd Workshop on Semantic Processing of Legal Texts” and “NALEA’09 – Workshop on the Natural Language Engineering of Legal Argumentation: Language, Logic, and Computation”, the former focussing on Legal Knowledge Representation with particular emphasis on the issue of ontology acquisition from legal texts, the latter tackling issues related to legal argumentation and linguistic technologies.

    To continue this momentum, a 3rd Workshop on “Semantic Processing of Legal Texts” is being organised at the Language Resources and Evaluation Conference to bring to the attention of the broader LR/HLT community the specific technical challenges posed by the semantic processing of legal texts, and also to share with the community the motivations and objectives which make it of interest to researchers in legal informatics. The outcomes of these interactions are expected to advance research and applications and foster interdisciplinary collaboration within the legal domain.
    The main goals of the workshop are to provide an overview of the state-of-the-art in legal knowledge extraction and management, to explore new research and development directions and emerging trends, and to exchange information regarding legal LRs and HLTs and their applications.

    Areas of Interest

    The workshop will focus on the topics of the automatic extraction of information from legal texts and the structural organisation of the extracted knowledge. Particular emphasis will be given to the crucial role of language resources and human language technologies.

    Papers are invited on, but not limited to, the following topics:

    • Building legal resources: terminologies, ontologies, corpora
    • Ontologies of legal texts, including subareas such as ontology acquisition, ontology customisation, ontology merging, ontology extension, ontology evolution, lexical information, etc.
    • Information retrieval and extraction from legal texts
    • Semantic annotation of legal texts
    • Legal text processing
    • Multilingual aspects of legal text semantic processing
    • Legal thesauri mapping
    • Automatic Classification of legal documents
    • Logical analysis of legal language
    • Automated parsing and translation of natural language arguments into a logical formalism
    • Linguistically-oriented XML mark up of legal arguments
    • Dialogue protocols for argumentation
    • Legal argument ontology
    • Computational theories of argumentation that are suitable to natural language
    • Controlled language systems for law.

    Submissions

    Submissions are solicited from researchers working on all aspects of semantic processing of legal texts. Authors are invited to submit papers describing original completed work, work in progress, interesting problems, case studies or research trends related to one or more of the topics of interest listed above. The final version of the accepted papers will be published in the Workshop Proceedings.

    Short or full papers can be submitted. Short papers are expected to present new ideas or new visions that may influence the direction of future research, yet they may be less mature than full papers. While an exhaustive evaluation of the proposed ideas is not necessary, insight and in-depth understanding of the issues is expected. Full papers should be more fully developed and evaluated. Short papers will be reviewed in the same way as full papers by the Program Committee and will be published in the Workshop Proceedings.

    Full paper submissions should not exceed 10 pages, short papers 6 pages; both should be typeset using a font size of 11 points. Style files will be made available by LREC for the camera-ready versions of accepted papers. Papers should be submitted electronically, no later than February 10, 2010. The only accepted format for submitted papers is Adobe PDF. Submission will be electronic using START paper submission software available at

    SPLeT 2010 Workshop

    Note that when submitting a paper through the START page, authors will be kindly asked to provide relevant information about the resources that have been used for the work described in their paper or that are the outcome of their research. In this way, authors will contribute to the LREC2010 Map, our new feature for LREC 2010. For further information on this initiative, please refer to

    LREC2010 Map of Language Resources

    Important Dates

    Paper submission deadline: 10 February 2010
    Acceptance notification sent: 5 March 2010
    Final version deadline: 21 March 2010
    Workshop date: 23 May 2010

    Workshop Chairs

    • Enrico Francesconi (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)
    • Simonetta Montemagni (Istituto di Linguistica Computazionale of CNR, Pisa, Italy)
    • Wim Peters (Natural Language Processing Research Group, University of Sheffield, UK)
    • Adam Wyner (Department of Computer Science, University College London, UK)

    Address any queries regarding the workshop to: lrec10_legalWS@ilc.cnr.it

    Program Committee

    • Johan Bos (University of Rome, Italy)
    • Danièle Bourcier (Humboldt Universität, Berlin, Germany)
    • Thomas R. Bruce (Cornell Law School, Ithaca, NY, USA)
    • Pompeu Casanovas (Institut de Dret i Tecnologia, UAB, Barcelona, Spain)
    • Alessandro Lenci (Dipartimento di Linguistica, Università di Pisa, Pisa, Italy)
    • Leonardo Lesmo (Dipartimento di Informatica, Università di Torino, Torino, Italy)
    • Raquel Mochales Palau (Catholic University of Leuven, Belgium)
    • Paulo Quaresma (Universidade de Évora, Portugal)
    • Erich Schweighofer (Universität Wien, Rechtswissenschaftliche Fakultät, Wien, Austria)
    • Manfred Stede (University of Potsdam, Germany)
    • Daniela Tiscornia (Istituto di Teoria e Tecniche dell’Informazione Giuridica of CNR, Florence, Italy)
    • Tom van Engers (Leibniz Center for Law, University of Amsterdam, Netherlands)
    • Radboud Winkels (Leibniz Center for Law, University of Amsterdam, Netherlands)