|
--- |
|
library_name: transformers |
|
license: apache-2.0 |
|
pipeline_tag: text2text-generation |
|
--- |
|
|
|
# AquilaX-NL-JSON-Start-Scan |
|
|
|
## Overview |
|
AquilaX-NL-JSON-Start-Scan is a model built on the T5-small architecture with Hugging Face Transformers. It converts natural language queries about vulnerabilities into JSON queries for MongoDB.
|
|
|
## Model Information |
|
|
|
### Model |
|
- **Name**: AquilaX-NL-JSON-Start-Scan |
|
- **Architecture**: T5-small |
|
- **Framework**: Hugging Face Transformers |
|
|
|
### Description |
|
The AquilaX-NL-JSON-Start-Scan model is designed to interpret natural language queries related to vulnerabilities in code and convert them into JSON queries that can be executed on a MongoDB database. This facilitates automated scanning and analysis of code repositories for security issues. The model leverages the capabilities of the T5-small architecture, which is well-suited for natural language understanding and generation tasks. |
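
To make the intended behavior concrete, below is a hypothetical input/output pair. Only the `repo` field is referenced by the inference code later in this card; the other field names and the exact JSON schema are illustrative assumptions, not guaranteed model output.

```python
# Hypothetical example only: apart from "repo", the field names below are
# illustrative and may not match the model's actual output schema.
natural_language = "scan this https://github.com/org/repo every week using pii and sast scan"

expected_json = {
    "repo": "https://github.com/org/repo",  # referenced by the inference code below
    "frequency": "weekly",                  # illustrative
    "pii_scan": True,                       # illustrative
    "sast_scan": True,                      # illustrative
}
```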
|
|
|
# Getting Started |
|
|
|
## Usage |
|
Below are code snippets to get started with the model. First make sure to `pip install -U transformers[torch]`, then copy the snippets from the sections below.
|
|
|
## Requirements |
|
```bash |
|
pip install -U transformers[torch]
|
``` |
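
To confirm the environment is set up (optional), a quick check like the following should print the installed version and whether a GPU is visible:

```python
# Optional sanity check for the transformers installation and torch backend.
import torch
import transformers

print("transformers", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```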
|
|
|
## Inference Code |
|
```python
import ast
import json


def convert_to_json(answer):
    """
    Convert a string representation of a dictionary to a JSON object.

    This function takes a string representation of a dictionary, cleans it by
    removing specific unwanted tokens and correcting boolean representations,
    and then converts it into a JSON object.

    Parameters:
        answer (str): The input string representing a dictionary.

    Returns:
        dict: The JSON object converted from the input string.
    """
    # Strip the tokenizer's special tokens and normalize booleans to Python syntax.
    answer = answer.replace("<pad>", "").replace("</s>", "")
    answer = answer.strip("'")
    answer = answer.replace("false", "False").replace("true", "True")
    # Parse the dict literal safely, then round-trip through JSON.
    answer_dict = ast.literal_eval(answer)
    answer_json = json.dumps(answer_dict)
    json_data = json.loads(answer_json)
    return json_data


def valid_url(url):
    """
    Validate the given URL against a list of supported platforms.

    This function checks if the provided URL belongs to one of the supported
    platforms for scanning. If the URL is valid, it returns True. Otherwise,
    it returns a message indicating that the URL is not supported and lists
    the available scanners.

    Parameters:
        url (str): The URL to be validated.

    Returns:
        bool or dict: True if the URL is valid, otherwise a dictionary with a
            message indicating that the URL is not supported and listing the
            available scanners.
    """
    valid_list = [
        "github.com", "bitbucket.org", "sourceforge.net", "aws.amazon.com",
        "dev.azure.com", "gitea.com", "gogs.io", "phabricator.com",
        "gitkraken.com", "beanstalkapp.com", "gitlab.com"
    ]
    platform = url.split("//")[1].split("/")[0]

    if platform in valid_list:
        return True

    return {
        'message': 'Provide a valid URL for scanning. Currently, we support PII_Scanner, SAST_Scanner, Sac_Scanner (Open_Source_Security), IaC_Scanner, Container_Scanner'
    }
```
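
As a quick illustration of the helpers above, the snippet below runs them on a hand-written string that mimics the raw decoder output; the string and its fields are made up, not an actual model prediction.

```python
# Made-up string that mimics the tokenizer's raw decoding of a model output.
raw_answer = "<pad> {'repo': 'https://github.com/org/repo', 'pii_scan': true}</s>"

parsed = convert_to_json(raw_answer)
print(parsed)                     # {'repo': 'https://github.com/org/repo', 'pii_scan': True}
print(valid_url(parsed["repo"]))  # True, because github.com is in valid_list
```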
|
|
|
```python
import re
import time

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Uses the convert_to_json and valid_url helpers defined in the snippet above.
tokenizer = AutoTokenizer.from_pretrained("AquilaX-AI/NL-JSON-Start-Scan")
model = AutoModelForSeq2SeqLM.from_pretrained("AquilaX-AI/NL-JSON-Start-Scan")

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Replace YOUR_QUERY with your request, e.g.:
# "scan this https://github.com/mr-vicky-01/educational-assitant on every week using pii and sast scan"
query = "Translate the following text to JSON: " + "YOUR_QUERY".lower()
query = query.replace(",", "")

start = time.time()

inputs = tokenizer(query, return_tensors="pt")
model.to(device)
inputs = inputs.to(device)
outputs = model.generate(**inputs, max_length=256)
answer = tokenizer.decode(outputs[0])

try:
    json_data = convert_to_json(answer)
except Exception:
    json_data = {'message': 'We encountered an issue with your query. Please use the Personalized Scan option for accurate results.'}

to_return = json_data.copy()
try:
    valid = valid_url(json_data["repo"])
    if valid is not True:
        to_return = valid
    else:
        # Recover the repository URL exactly as it appears in the query string.
        urls = re.findall(r'https?://\S+', query)
        if urls:
            to_return['repo'] = urls[0]
except Exception:
    pass

end = time.time()
print(to_return)
print(f"Time taken: {end - start:.2f} seconds")
```
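
Since the generated JSON is meant to be executed on a MongoDB database, here is a minimal sketch of that last step using `pymongo`, assuming the generated document is used as a filter on a findings collection; the connection string, database name (`aquilax`), and collection name (`findings`) are placeholders, not part of this model.

```python
# Minimal sketch, not part of the official pipeline: the connection string,
# database name, and collection name below are placeholders.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["aquilax"]["findings"]

# Query MongoDB with the generated document only if the model returned a
# scan specification rather than an error message.
if "message" not in to_return:
    for doc in collection.find(to_return):
        print(doc)
else:
    print(to_return["message"])
```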
|
|
|
## License |
|
This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details. |
|
|
|
|
|
## Authors |
|
- [AquilaX-AI](https://huggingface.co/AquilaX-AI)
|
- [Suriya](https://huggingface.co/suriya7) |
|
- [Vicky](https://huggingface.co/Mr-Vicky-01) |
|
|
|
## Acknowledgments |
|
- Hugging Face for the Transformers library. |