Model Description
GPT-Neo 125M fine-tuned model for predicting house and apartment prices in Cali, Colombia. Download all the required files from Dropbox.
- Developed by: Nicolai Potes
- Language: Python
- Finetuned from model: gpt-neo-125M
Training Details
Num examples = 779
Num Epochs = 500
Instantaneous batch size per device = 80
Total train batch size (w. parallel, distributed & accumulation) = 80
Gradient Accumulation steps = 1
Total optimization steps = 5000
Number of trainable parameters = 125200128
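These values are consistent with each other: 500 epochs over 779 examples at a batch size of 80 is ceil(779/80) = 10 steps per epoch, i.e. the 5000 optimization steps listed. A rough sketch of how they would map onto Hugging Face TrainingArguments follows; output_dir is an assumed folder name, and model, train_dataset and eval_dataset stand for the GPTNeoForCausalLM instance and the tokenized listing datasets prepared elsewhere.

from transformers import TrainingArguments, Trainer

# Sketch only: batch size, epoch count and accumulation steps mirror the
# values listed above; output_dir is an assumed name, not from the card.
training_args = TrainingArguments(
    output_dir="modeloEntrenadoPreciosCasasApartamentos",
    num_train_epochs=500,
    per_device_train_batch_size=80,
    gradient_accumulation_steps=1,
)

trainer = Trainer(
    model=model,                   # GPTNeoForCausalLM with resized embeddings
    args=training_args,
    train_dataset=train_dataset,   # 779 formatted listings
    eval_dataset=eval_dataset,
)
trainer.train()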
Training Evaluation
{'eval_loss': 1.341125726699829,
'eval_runtime': 23.3347,
'eval_samples_per_second': 300.111,
'eval_steps_per_second': 3.771,
'epoch': 500.0}
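For context, this eval_loss corresponds to a per-token perplexity of roughly exp(1.341) ≈ 3.82 on the evaluation listings:

import math
print(math.exp(1.341125726699829))  # ≈ 3.82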
Training Data
Data scraped from https://www.metrocuadrado.com/. Each listing was formatted into a training string as shown in the examples below (a small formatting sketch follows the examples).
'meter: 3651685 \n area: 267 \n bathroom: 4 \n room: 4 \n property: 1 \n price: 975000000',
'meter: 3206498 \n area: 70 \n bathroom: 3 \n room: 4 \n property: 2 \n price: 225000000',
'meter: 2181818 \n area: 110 \n bathroom: 2 \n room: 3 \n property: 2 \n price: 240000000',
'meter: 5882352 \n area: 306 \n bathroom: 4 \n room: 4 \n property: 2 \n price: 1800000000',
'meter: 2827586 \n area: 58 \n bathroom: 2 \n room: 2 \n property: 2 \n price: 164000000',
'meter: 7382550 \n area: 149 \n bathroom: 4 \n room: 3 \n property: 2 \n price: 1100000000',
'meter: 2833333 \n area: 300 \n bathroom: 3 \n room: 3 \n property: 1 \n price: 850000000',
'meter: 3678474 \n area: 73 \n bathroom: 2 \n room: 3 \n property: 2 \n price: 270000000',
'meter: 2254901 \n area: 51 \n bathroom: 2 \n room: 2 \n property: 2 \n price: 115000000',
'meter: 2500000 \n area: 90 \n bathroom: 3 \n room: 3 \n property: 2 \n price: 225000000',
'meter: 4508196 \n area: 122 \n bathroom: 5 \n room: 4 \n property: 2 \n price: 550000000',
'meter: 3489583 \n area: 96 \n bathroom: 3 \n room: 3 \n property: 2 \n price: 335000000',
'meter: 2151898 \n area: 395 \n bathroom: 5 \n room: 5 \n property: 1 \n price: 850000000',
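The meter field is the listing price divided by the area (price per square meter, truncated to an integer), the same quantity the inference code below computes as valorMetroCuadrado. A small helper that rebuilds this training format from raw listing fields might look like the following; the function name and argument names are assumptions, not part of the original training pipeline.

def format_listing(area, bathrooms, rooms, property_type, price):
    """Build one training string in the format shown above.
    property_type: 1 = house, 2 = apartment. (Hypothetical helper.)"""
    meter = int(price / area)  # price per square meter, the 'meter' field
    return (f"meter: {meter} \n area: {area} \n bathroom: {bathrooms} "
            f"\n room: {rooms} \n property: {property_type} \n price: {price}")

# Reproduces the first example above
print(format_listing(area=267, bathrooms=4, rooms=4, property_type=1, price=975000000))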
Hardware GPU
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |
| N/A 49C P0 29W / 70W | 0MiB / 15360MiB | 5% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Required libraries
pip install transformers
pip install torch
Python code to load the model and predict the value of a property (house/apartment)
import pandas as pd
import torch
import transformers
import re
'''
# If running in Google Colab
from google.colab import drive
drive.mount('/content/drive')
path="/content/drive/My Drive/DatosMetroCuadradoPrueba/"
'''
path="[Direccion de la carpeta donde tiene el MODELO]"
path_carga= path+"modeloEntrenadoPreciosCasasApartamentos"
from transformers import GPT2Tokenizer, GPTNeoForCausalLM
new_modelPredict = GPTNeoForCausalLM.from_pretrained(path_carga).cuda()
tokenizer2 = GPT2Tokenizer.from_pretrained(path_carga)
new_modelPredict.resize_token_embeddings(len(tokenizer2))
tipo_propiedad= 1 # 1: house, 2: apartment
habitaciones= 5
baños= 5
area= 580
valor_inmueble= 1500000000
valorMetroCuadrado= int(valor_inmueble/area)
propiedad = f"<|startoftext|>meter: {valorMetroCuadrado} \n area: {area} \n bathroom: {baños} \n room: {habitaciones} \n property: {tipo_propiedad} \n price:"
print("Texto:",propiedad)
generated = tokenizer2(propiedad, # <|pad|>
return_tensors="pt").input_ids.cuda()
sample_outputs = new_modelPredict.generate(generated,
                                           do_sample=True,
                                           top_k=50,
                                           max_length=100,
                                           num_beams=7,
                                           top_p=0.95,  # must be in (0, 1]; the original 1.65 is rejected by transformers
                                           temperature=.69,
                                           num_return_sequences=1,
                                           pad_token_id=0)
price= []

for i, sample_output in enumerate(sample_outputs):
    text= tokenizer2.decode(sample_output, skip_special_tokens=True)
    try:
        # the generated text ends with "... price: <number>"
        num= text.split("\n")[-1].split("price: ")[1]
        num= re.sub(r'[^\d.]', '', num)
        price.append(num)
    except IndexError:
        pass
# pd.set_option('display.float_format', '{.2f}'.format)
priceData2= pd.DataFrame(price,columns=['price']).astype(int)
print(priceData2)
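The loading and generation calls above assume a CUDA GPU (.cuda()). A minimal device-agnostic variant, reusing path_carga and propiedad from the snippet above, would be:

import torch
from transformers import GPT2Tokenizer, GPTNeoForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
new_modelPredict = GPTNeoForCausalLM.from_pretrained(path_carga).to(device)
tokenizer2 = GPT2Tokenizer.from_pretrained(path_carga)
new_modelPredict.resize_token_embeddings(len(tokenizer2))
generated = tokenizer2(propiedad, return_tensors="pt").input_ids.to(device)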