Spaces:
Running
Running
| LLM: gpt | |
| instructions: '1. Refactor the unstructured OCR text into a dictionary based on the | |
| JSON structure outlined below. | |
| 2. You should map the unstructured OCR text to the appropriate JSON key and then | |
| populate the field based on its rules. | |
| 3. Some JSON key fields are permitted to remain empty if the corresponding information | |
| is not found in the unstructured OCR text. | |
| 4. Ignore any information in the OCR text that doesn''t fit into the defined JSON | |
| structure. | |
| 5. Duplicate dictionary fields are not allowed. | |
| 6. Ensure that all JSON keys are in lowercase. | |
| 7. Ensure that new JSON field values follow sentence case capitalization. | |
| 8. Ensure all key-value pairs in the JSON dictionary strictly adhere to the format | |
| and data types specified in the template. | |
| 9. Ensure the output JSON string is valid JSON format. It should not have trailing | |
| commas or unquoted keys. | |
| 10. Only return a JSON dictionary represented as a string. You should not explain | |
| your answer.' | |
| json_formatting_instructions: "The next section of instructions outlines how to format\ | |
| \ the JSON dictionary. The keys are the same as those of the final formatted JSON\ | |
| \ object.\nFor each key there is a format requirement that specifies how to transcribe\ | |
| \ the information for that key. \nThe possible formatting options are:\n1. \"verbatim\ | |
| \ transcription\" - field is populated with verbatim text from the unformatted OCR.\n\ | |
| 2. \"spell check transcription\" - field is populated with spelling corrected text\ | |
| \ from the unformatted OCR.\n3. \"boolean yes no\" - field is populated with only\ | |
| \ yes or no.\n4. \"boolean 1 0\" - field is populated with only 1 or 0.\n5. \"integer\"\ | |
| \ - field is populated with only an integer.\n6. \"[list]\" - field is populated\ | |
| \ from one of the values in the list.\n7. \"yyyy-mm-dd\" - field is populated with\ | |
| \ a date in the format year-month-day.\nThe desired null value is also given. Populate\ | |
| \ the field with the null value of the information for that key is not present in\ | |
| \ the unformatted OCR text." | |
| mapping: | |
| # Add column names to the desired category. This is used to map the VV Editor. | |
| COLLECTING: [] | |
| GEOGRAPHY: [] | |
| LOCALITY: [] | |
| MISCELLANEOUS: [] | |
| TAXONOMY: | |
| - catalog_number | |
| rules: | |
| Dictionary: | |
| # Manually add rows here. You MUST keep 'catalog_number' unchanged. Use 'catalog_number' as a guide for adding more columns. | |
| # The only values allowed in the 'format' key are those outlines above in the 'json_formatting_instructions' section. | |
| # If you want an empty cell by default, use '' for the 'null_value'. | |
| catalog_number: | |
| description: The barcode identifier, typically a number with at least 6 digits, | |
| but fewer than 30 digits. | |
| format: verbatim transcription | |
| null_value: '' | |
| # Do not change or remove below. This is required for some LLMs | |
| SpeciesName: | |
| taxonomy: | |
| - Genus_species | |