Add some TODOs, update readme.md, update evaluation.ipynb

- Dockerfile +1 -0
- README.md +13 -2
- commafixer/src/fixer.py +2 -1
- notebooks/evaluation.ipynb +0 -0
Dockerfile CHANGED

```diff
@@ -27,6 +27,7 @@ COPY --chown=user . .
 FROM base as test
 
 RUN pip install .[test]
+# TODO don't run all at once because of memory errors?
 RUN python -m pytest tests
 
 FROM python:3.10-slim as deploy
```
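The `# TODO don't run all at once because of memory errors?` note concerns peak memory in the single `RUN python -m pytest tests` build step. One way to split the run, sketched here as a hypothetical Python helper (not part of the repo), is to invoke pytest once per test module, each in a fresh process, so only one model is loaded at a time:

```python
import pathlib
import subprocess
import sys

def run_tests_one_by_one(test_dir="tests"):
    """Run each test_*.py module in its own pytest process.

    Hypothetical helper for the Dockerfile TODO: separate processes keep
    peak memory bounded to one test module's model loads at a time.
    Returns 0 on success, or the first non-zero pytest exit code.
    """
    for path in sorted(pathlib.Path(test_dir).glob("test_*.py")):
        result = subprocess.run([sys.executable, "-m", "pytest", str(path)])
        if result.returncode != 0:
            return result.returncode
    return 0
```

The same effect can be had directly in the Dockerfile by replacing the single `RUN` with one `RUN python -m pytest tests/<module>.py` line per module, at the cost of more image layers.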
    	
README.md CHANGED

````diff
@@ -27,6 +27,7 @@ Note that you might have to
 `sudo service docker start`
 first.
 
+
 The application should then be available at http://localhost:8000.
 For the API, see the `openapi.yaml` file.
 Docker-compose mounts a volume and listens to changes in the source code, so the application will be reloaded and
@@ -35,7 +36,15 @@ reflect them.
 We use multi-stage builds to reduce the image size, ensure flexibility in requirements and that tests are run before
 each deployment.
 However, while it does reduce the size by nearly 3GB, the resulting image still contains deep learning libraries and
-pre-downloaded models, and will take around 
+pre-downloaded models, and will take around 9GB of disk space.
+
+NOTE: Since the service is hosting two large deep learning models, there might be memory issues depending on your
+machine, where the terminal running
+docker would simply crash.
+Should that happen, you can try increasing resources allocated to docker, or splitting commands in the docker file,
+e.g., running tests one by one.
+If everything fails, you can still use the hosted huggingface hub demo, or follow the steps below and run the app
+locally without Docker.
 
 Alternatively, you can setup a python environment by hand. It is recommended to use a virtualenv. Inside one, run
 ```bash
@@ -43,6 +52,8 @@ pip install -e .[test]
 ```
 the `[test]` option makes sure to install test dependencies.
 
+Then, run `python app.py` or `uvicorn --host 0.0.0.0 --port 8000 "app:app" --reload` to run the application.
+
 If you intend to perform training and evaluation of deep learning models, install also using the `[training]` option.
 
 ### Running tests
@@ -91,7 +102,7 @@ dataset are as follows:
 | Model    | precision | recall | F1   | support |
 |----------|-----------|--------|------|---------|
 | baseline | 0.79      | 0.72   | 0.75 | 10079   |
-| ours*    | 0.
+| ours*    | 0.84      | 0.84   | 0.84 | 10079   |
 *details of the fine-tuning process in the next section.
 
 We treat each comma as one token instance, as opposed to the original paper, which NER-tags the whole multiple-token
````
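For context on the metrics table in the last hunk: since each comma is treated as one token instance, precision, recall, and F1 reduce to a binary per-token classification score. A library such as scikit-learn would normally compute this; the pure-Python helper below is a hypothetical, illustrative equivalent for binary labels (1 = comma belongs after this token):

```python
def prf1(y_true, y_pred):
    """Precision, recall, and F1 for binary token labels.

    Hypothetical stand-in for a library metric function, shown only to
    illustrate how the README's evaluation table is computed.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```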
    	
commafixer/src/fixer.py CHANGED

```diff
@@ -52,7 +52,8 @@ class CommaFixer:
         tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
         model = PeftModel.from_pretrained(inference_model, model_name)
         model = model.merge_and_unload()  # Join LoRa matrices with the main model for faster inference
-
+        # TODO batch, and move to CUDA if available
+        return model.eval(), tokenizer
 
 
 def _fix_commas_based_on_labels_and_offsets(
```
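The new `# TODO batch, and move to CUDA if available` comment points at two standard follow-ups: moving the model to GPU (in PyTorch, typically something like `model.to("cuda" if torch.cuda.is_available() else "cpu")` before `eval()`), and feeding the tokenizer and model lists of sentences rather than one sentence at a time. The batching half can be sketched framework-free; this is a hypothetical helper, not code from the repo:

```python
def batched(items, batch_size):
    """Yield successive batch_size-sized chunks of items.

    Hypothetical helper for the TODO above: group input sentences so the
    tokenizer and model process several at once per forward pass.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]
```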
    	
notebooks/evaluation.ipynb CHANGED

The diff for this file is too large to render. See raw diff.