Code Guide

Components

The code is modular. It consists of the GUI running in the browser, the web server backend connected to a database, the CAT server, and the MT server, which in turn talks to a Moses server. Each of these components may run on a different machine.

Web Server

The web server creates the HTML pages for the browser. It also hosts the JavaScript that runs in the browser and handles interactions once a document is loaded.

  • Builds on the open source Matecat implementation
  • Typical web application: LAMP (Linux, Apache, MySQL, PHP)
  • Uses a model-view-controller (MVC) breakdown

Model

  • Relevant data is stored in the MySQL database matecat_sandbox
  • Major database tables
    • Projects are stored in projects
    • Each project has a corresponding entry in jobs
    • Raw files (XLIFF) are stored in files
    • Segments are stored in segments
    • Translations of segments are stored in segment_translations
    • Log events are stored in *_event
    • etc.
  • The major change from Matecat is the logging

Controller

  • Typical request: get information about a segment:

POST http://192.168.56.2:8000/?action=getSegments&time=1446185242727

  • The script index.php selects the corresponding action in lib/controller, e.g., getSegmentsController.php

  • Response is HTML or JSON
  • Most of the logic resides in the JavaScript GUI in public/js
    • core functionality from Matecat in public/js/cat.js
    • CASMACAT extensions in public/js/casmacat
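
The mapping from the action parameter of a request to a controller script can be illustrated with a short Python sketch. It is not the code in index.php (which is PHP); the function name controller_file is hypothetical, and the naming pattern <action>Controller.php is an assumption inferred from the single getSegments example.

```python
from urllib.parse import urlparse, parse_qs


def controller_file(url, controller_dir="lib/controller"):
    """Return the controller script path implied by the `action` query parameter.

    Assumption: controllers are named <action>Controller.php, as suggested
    by the getSegments example. Returns None if no action is given.
    """
    query = parse_qs(urlparse(url).query)
    actions = query.get("action")
    if not actions:
        return None
    return f"{controller_dir}/{actions[0]}Controller.php"


# Example request taken from this guide
print(controller_file("http://192.168.56.2:8000/?action=getSegments&time=1446185242727"))
```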

CAT Server

The CAT server receives requests from the UI (the JavaScript running in the browser) and sends back responses. This interaction is handled over a web socket, with a fallback to HTTP requests.

  • Acts largely as middleware
  • Calls external services such as
    • MT server
    • word aligner
    • interactive translation prediction
  • Caches information about a sentence translation
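
The middleware-plus-cache idea can be sketched as follows. This is only an illustration of the pattern, not the CAT server's actual code: the class name CachingTranslator and the single-function backend are hypothetical, and the sketch assumes that results are keyed by the source sentence.

```python
class CachingTranslator:
    """Wrap a backend call (e.g. a request to the MT server) with a per-sentence cache."""

    def __init__(self, translate):
        # `translate` is any callable mapping a source sentence to a result
        self._translate = translate
        self._cache = {}

    def get(self, sentence):
        # Call the backend only the first time a sentence is seen
        if sentence not in self._cache:
            self._cache[sentence] = self._translate(sentence)
        return self._cache[sentence]


# Usage with a stand-in backend that records how often it is called
calls = []

def fake_backend(sentence):
    calls.append(sentence)
    return sentence.upper()

translator = CachingTranslator(fake_backend)
translator.get("un test")
translator.get("un test")
print(len(calls))  # the backend was called once
```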

MT Server

The MT server handles requests to translate sentences, and provides additional information such as lists of translation options or the search graph. It also provides component functionality such as tokenization and word alignment.

  • Google-style API to MT Server
  • Python wrapper for Moses
    • basic translation request
    • includes a pre- and post-processing pipeline
    • other functions: word alignment, incremental updating, etc.
  • Uses the mosesserver XML-RPC server

The implementation is in the script server.py.
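
For orientation, a direct XML-RPC call to mosesserver, bypassing server.py, might look like the sketch below. It follows the pattern of the sample clients shipped with Moses (a translate method that takes a struct with a text field and returns one). The helper name moses_translate is hypothetical, and the /RPC2 path and port 9010 are assumptions that must match how mosesserver was started. Note that mosesserver expects input that is already tokenized and truecased; that preprocessing is part of what server.py adds.

```python
import xmlrpc.client


def moses_translate(proxy, text, align=False):
    """Send one preprocessed sentence to mosesserver and return the raw translation."""
    params = {"text": text}
    if align:
        # Ask mosesserver to include alignment information in the result
        params["align"] = "true"
    result = proxy.translate(params)
    return result["text"]


# Usage (requires a running mosesserver; URL is an assumption):
#   proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:9010/RPC2")
#   print(moses_translate(proxy, "un test"))
```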

  • Requires mosesserver to run as a service
 mosesserver -config $MODELDIR/moses.ini --server-port 9010
  • The script server.py takes many parameters, including
    • preprocessing tools (tokenizer, truecaser, etc.)
    • IP address and port
    • URL of the mosesserver API
    • etc.
  • Request to the script
 http://127.0.0.1:9000//translate?q=Un+test&key=0&source=xx&target=xx
  • Response
 {"data": {"translations": [{"translatedText": "A test",
 "translatedTextRaw": "a test",
 "annotatedSource": "un test",
 "tokenization": {"src": [[0, 1], [3, 6]], "tgt": [[0, 0], [2, 5]]}}]}}
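
The request and response above can be handled from Python roughly as follows. This is a sketch, not part of server.py: the function names are hypothetical, the endpoint is copied verbatim from the example request, and the reading of the tokenization spans as inclusive [start, end] character offsets is an assumption based on the example values.

```python
import json
from urllib.parse import urlencode

# Endpoint copied verbatim from the example request in this guide
ENDPOINT = "http://127.0.0.1:9000//translate"

# Example response from this guide
SAMPLE_RESPONSE = """
{"data": {"translations": [{"translatedText": "A test",
 "translatedTextRaw": "a test",
 "annotatedSource": "un test",
 "tokenization": {"src": [[0, 1], [3, 6]], "tgt": [[0, 0], [2, 5]]}}]}}
"""


def build_translate_url(text, source, target, key="0", endpoint=ENDPOINT):
    """Build the GET URL for a translation request."""
    query = urlencode({"q": text, "key": key, "source": source, "target": target})
    return f"{endpoint}?{query}"


def tokens_from_spans(text, spans):
    """Cut `text` into tokens, assuming inclusive [start, end] character offsets."""
    return [text[start:end + 1] for start, end in spans]


def first_translation(payload):
    """Return the first translation entry of a JSON response string."""
    return json.loads(payload)["data"]["translations"][0]


print(build_translate_url("Un test", "xx", "xx"))
entry = first_translation(SAMPLE_RESPONSE)
print(tokens_from_spans(entry["annotatedSource"], entry["tokenization"]["src"]))
print(tokens_from_spans(entry["translatedText"], entry["tokenization"]["tgt"]))
```

To send the request, the URL can be passed to urllib.request.urlopen and the body of the reply handed to first_translation.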

Home Edition

The Home Edition is a standard installation of all CASMACAT tools and dependencies in fixed file locations. It also features an administrative interface that allows the training of engines and configuration of the workbench.

  • Moses is installed in /opt/moses
  • CASMACAT is installed in /opt/casmacat
    • web server / GUI in /opt/casmacat/web-server
    • MT server (server.py) in /opt/casmacat/mt-server
    • CAT server in /opt/casmacat/cat-server
    • installation scripts in /opt/casmacat/install
    • log files in /opt/casmacat/logs
  • Home Edition specific components
    • admin web server in /opt/casmacat/admin
    • corpus data in /opt/casmacat/data
    • prototype training in /opt/casmacat/experiment
    • engines stored in /opt/casmacat/engines

Home Edition MT Engine

Each machine translation engine in the Home Edition is stored in a single directory that contains all model files and a command to start the engine. This directory can also be packaged and shared between users of the CASMACAT Home Edition.

  • Demo engine in /opt/casmacat/engines/fr-en-upload-1
  • Files
 biconcor.1
 biconcor.1.align
 biconcor.1.src-vcb
 biconcor.1.tgt
 biconcor.1.tgt-vcb
 corpus-1.binlm.1
 fast-align.1
 fast-align.1.log
 fast-align.1.parameters
 fast-align-inverse.1
 fast-align-inverse.1.log
 fast-align-inverse.1.parameters
 info
 moses.tuned.ini.1
 phrase-table-mmsapt.1
 reordering-table.1.wbe-msd-bidirectional-fe.minlexr
 RUN
 truecase-model.1.en
 truecase-model.1.fr
  • The script RUN starts the engine
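
Because every engine directory carries its own RUN script, installed engines can be discovered by scanning the engines directory. The sketch below is only an illustration: the function name find_engines is hypothetical, and treating "contains a file named RUN" as the test for a usable engine is an assumption.

```python
import os


def find_engines(engines_dir="/opt/casmacat/engines"):
    """Return the names of subdirectories that contain a RUN script."""
    if not os.path.isdir(engines_dir):
        return []
    return sorted(
        name
        for name in os.listdir(engines_dir)
        if os.path.isfile(os.path.join(engines_dir, name, "RUN"))
    )


# Prints the engines found in the default Home Edition location
print(find_engines())
```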