Update README.md
README.md
CHANGED
@@ -42,6 +42,17 @@ BigCode is an open scientific collaboration working on responsible training of l
 - [StarCoder2 Membership Test](https://stack-dev.dataportraits.org/): Blazing fast test if code was present in pretraining dataset.
 </details>
 ---
+<details>
+<summary>
+<b><font size="+1">📑The Stack v2</font></b>
+</summary>
+The Stack v2 is a 67.5TB dataset of source code in over 600 programming languages with permissive licenses or no license.
+
+- [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2): Exact deduplicated version of The Stack v2.
+- [The Stack v2 dedup](https://huggingface.co/datasets/bigcode/the-stack-v2-dedup): Near deduplicated version of The Stack v2 (recommended for training).
+- [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
+</details>
+---
 <details>
 <summary>
 <b><font size="+1">💫StarCoder</font></b>
@@ -94,13 +105,10 @@ BigCode is an open scientific collaboration working on responsible training of l
 <summary>
 <b><font size="+1">📑The Stack</font></b>
 </summary>
-The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses
-
-
+The Stack v1 is a 6.4TB dataset of source code in 358 programming languages from permissive licenses.
+
 - [The Stack](https://huggingface.co/datasets/bigcode/the-stack): Exact deduplicated version of The Stack.
-- [The Stack v2](https://huggingface.co/datasets/bigcode/the-stack-v2): Exact deduplicated version of The Stack v2.
 - [The Stack dedup](https://huggingface.co/datasets/bigcode/the-stack-dedup): Near deduplicated version of The Stack (recommended for training).
-- [The Stack v2 dedup](https://huggingface.co/datasets/bigcode/the-stack-v2-dedup): Near deduplicated version of The Stack v2 (recommended for training).
 - [Am I in the Stack](https://huggingface.co/spaces/bigcode/in-the-stack): Check if your data is in The Stack and request opt-out.
 </details>
 ---