	---
	license: apache-2.0
	tags:
	- gpt-j
	- llm
	datasets:
	- EleutherAI/pile
	---
	# MaryGPT Model Card

	MaryGPT is a text generation model and a fine-tuned version of [GPT-J 6B](https://huggingface.co/EleutherAI/gpt-j-6b).

	This model is fine-tuned exclusively on text from Mary Shelley's 1818 novel ["Frankenstein; or, The Modern Prometheus"](https://www.gutenberg.org/ebooks/84).

	This model will be used as a base model for the work of [AI Artist Yuma Kishi👤](https://obake2ai.com/).

	![image/png](https://cdn-uploads.huggingface.co/production/uploads/63042f1e3926de1f7ec623b9/H8i_CGAKcvC88uT9jhJlb.png)
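Since MaryGPT is described as a fine-tuned GPT-J 6B, it can presumably be loaded like any other causal language model in the `transformers` library. The sketch below is a minimal example under several assumptions: the repository ID `obake2ai/MaryGPT`, weights that load through the `Auto*` classes, an installed PyTorch backend, and enough memory for a 6B-parameter model. The function name `generate_text` and the sampling defaults are illustrative, not part of this model card.

```python
MODEL_ID = "obake2ai/MaryGPT"  # assumed repository ID, taken from the model page


def generate_text(prompt, max_new_tokens=60, temperature=0.9, model_id=MODEL_ID):
    """Sample a continuation of `prompt` from the fine-tuned model (illustrative helper)."""
    # Imported lazily so that defining this helper does not require the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample new tokens; the sampling settings here are arbitrary starting points.
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        temperature=temperature,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding token during generation
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("It was on a dreary night of November that")` would download the weights on first use and return the prompt followed by a sampled continuation.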

	## Training Data Sources
	All data was obtained ethically and in compliance with each source site's terms and conditions.
	No copyrighted images were used to train this model without permission.
	No AI-generated images are in the dataset.

	- GPT-J 6B was trained on [the Pile](https://pile.eleuther.ai), a large-scale curated dataset created by [EleutherAI](https://www.eleuther.ai).
	- Frankenstein; or, The Modern Prometheus, 1818 (Public domain)

	## Training procedure
	The base model, GPT-J 6B, was trained on 402 billion tokens over 383,500 steps on a TPU v3-256 pod. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
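The autoregressive objective described above can be illustrated with a small, dependency-free sketch. It is not the actual training code (which ran in JAX on TPUs); the function names `log_softmax` and `next_token_loss` and the toy inputs are assumptions made for illustration. The logits at position `t` are scored against the token at position `t + 1`, and the loss is the mean negative log-likelihood of those next tokens.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_loss(logits_per_position, token_ids):
    """Mean cross-entropy of predicting token t+1 from the logits at position t."""
    # The last position has no following token, so its logits are dropped,
    # and the first token is never a prediction target.
    predictions = logits_per_position[:-1]
    targets = token_ids[1:]
    losses = [-log_softmax(logits)[target] for logits, target in zip(predictions, targets)]
    return sum(losses) / len(losses)


# Toy vocabulary of 3 tokens and a 3-token sequence.
tokens = [0, 1, 2]
uniform = [[0.0, 0.0, 0.0]] * 3                                  # model has no preference
confident = [[0.0, 5.0, 0.0], [0.0, 0.0, 5.0], [0.0, 0.0, 0.0]]  # favors the true next token

loss_uniform = next_token_loss(uniform, tokens)      # equals ln(3) for uniform predictions
loss_confident = next_token_loss(confident, tokens)  # lower, since correct tokens get more mass
```

Minimizing this quantity is equivalent to maximizing the likelihood the model assigns to each correct next token.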

	## Developed by
	MaryGPT
	- [Yuma Kishi](https://x.com/obake_ai)

	GPT-J
	- [James Bradbury](https://twitter.com/jekbradbury) for valuable assistance with debugging JAX issues.
	- [Stella Biderman](https://www.stellabiderman.com), [Eric Hallahan](https://twitter.com/erichallahan), [Kurumuz](https://github.com/kurumuz/), and [Finetune](https://github.com/finetuneanon/) for converting the model to be compatible with the `transformers` package.
	- [Leo Gao](https://twitter.com/nabla_theta) for running zero-shot evaluations for the baseline models for the table.
	- [Laurence Golding](https://github.com/researcher2/) for adding some features to the web demo.
	- [Aran Komatsuzaki](https://twitter.com/arankomatsuzaki) for advice with experiment design and writing the blog posts.
	- [Janko Prester](https://github.com/jprester/) for creating the web demo frontend.