tarsur909 committed · Commit 1b6c197 · verified · 1 Parent(s): ef5ddb5

Create README.md

  1. README.md +49 -0
README.md ADDED
@@ -0,0 +1,49 @@
+ ---
+ base_model:
+ - Snowflake/snowflake-arctic-embed-m-long
+ ---
+
+
+ # CodeRankEmbed
+
+ `CodeRankEmbed` is a 137M-parameter bi-encoder for code retrieval that supports a context length of 8192 tokens. On the CSN and CoIR benchmarks below, it outperforms the listed open-source and proprietary code embedding models.
+
+
+ # Performance Benchmarks
+
+ | Name                | Parameters | CSN      | CoIR     |
+ | :-----------------: | :--------- | :------- | :------: |
+ | **CodeRankEmbed**   | 137M       | **77.9** | **60.1** |
+ | CodeSage-Large      | 1.3B       | 71.2     | 59.4     |
+ | Jina-Code-v2        | 161M       | 67.2     | 58.4     |
+ | CodeT5+             | 110M       | 74.2     | 45.9     |
+ | Voyage-Code-002     | Unknown    | 68.5     | 56.3     |
+
+
+ # Usage
+
+ **Important**: every query *must* begin with the following *task instruction prefix*, followed by a colon and the query text (as in the example below): "Represent this query for searching relevant code"
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("cornstack/CodeRankEmbed", trust_remote_code=True)
+ queries = ['Represent this query for searching relevant code: Calculate the n-th Fibonacci number']  # query carries the task instruction prefix
+ codes = ["""def func(n):
+     if n <= 0:
+         return "Input should be a positive integer"
+     elif n == 1:
+         return 0
+     elif n == 2:
+         return 1
+     else:
+         a, b = 0, 1
+         for _ in range(2, n):
+             a, b = b, a + b
+         return b
+ """]
+ query_embeddings = model.encode(queries)
+ print(query_embeddings)
+ code_embeddings = model.encode(codes)  # code snippets are encoded without a prefix
+ print(code_embeddings)
+ ```
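
The README example stops at printing the embeddings. As a minimal sketch of the retrieval step that would typically follow, the helpers below prepend the task instruction prefix to a query and rank code snippets against a query embedding. The names `QUERY_PREFIX`, `add_query_prefix`, `cosine_similarity`, and `rank_codes` are hypothetical and not part of the model or of `sentence_transformers`. Cosine similarity is an assumption here, since the README does not state which similarity measure the model expects.

```python
import math

# Prefix quoted in the README; the ": " separator follows the README's example query.
QUERY_PREFIX = "Represent this query for searching relevant code: "


def add_query_prefix(query):
    """Prepend the task instruction prefix unless it is already present (hypothetical helper)."""
    if query.startswith(QUERY_PREFIX):
        return query
    return QUERY_PREFIX + query


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_codes(query_embedding, code_embeddings):
    """Return indices of code_embeddings sorted from most to least similar to the query.

    Assumes cosine similarity is the comparison metric (not stated in the README).
    """
    scores = [cosine_similarity(query_embedding, c) for c in code_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

With the README's variables, this could be called as `rank_codes(query_embeddings[0], code_embeddings)`, since each row returned by `model.encode` can be iterated as a sequence of floats.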