1 Star 1 Fork 0

xuweihui / SkyText-Chinese-GPT3

Create your Gitee Account
Explore and code with more than 12 million developers,Free private repositories !:)
Sign up
Clone or Download
contribute
Sync branch
Cancel
Notice: Creating folder will generate an empty file .keep, because not support in Git
Loading...
README
MIT

SkyText

SkyText is a Chinese GPT3 pre-trained large model released by Singularity-AI, which can perform different tasks such as chatting, Q&A, and Chinese-English translation.

image

Hugging Face home pages:

https://huggingface.co/SkyWork/SkyText

https://huggingface.co/SkyWork/SkyTextTiny

Here are some show cases:

Show cases:

For experience and trial, please visit Singularity-AI-trail

Chat

image

Question and answer

image

Generate recipes:

input—— image

output—— image

Couplet

image

Project Highlights

  1. Technical advantage 1: data cleaning of more than 30 processes

    With the development of NLP technology, pre-training large models has gradually become one of the core technologies of artificial intelligence. Pre-training large models usually requires a large amount of text for training, and network text naturally becomes the most important source of corpus. The quality of the training corpus undoubtedly directly affects the effect of the model. In order to train a model with outstanding capabilities, Singularity-AI has used more than 30 cleaning processes in data cleaning. Excellence in details, casting excellent model effect.

  2. Technical advantage 2: optimized and innovative Chinese coding method for Chinese

    In the field of pre-training large models, it has always been dominated by the English community, and the importance of Chinese pre-training large models is self-evident. Unlike English, the Chinese input method(pinyin text) of the Chinese pre-trained large model should obviously be different. According to the characteristics of Chinese, Singularity-AI has optimized and innovated a unique Chinese encoding method, which is more in line with Chinese language habits, and rebuilt a Chinese dictionary that is more conducive to model understanding.

News of Singularity-AI

—————————————————————————————————

Installation

Recommand
transformers>=4.18.0

Model Usage

# -*- coding: utf-8 -*-
from transformers import GPT2LMHeadModel
from transformers import AutoTokenizer
from transformers import TextGenerationPipeline

# 13Billions
model = GPT2LMHeadModel.from_pretrained("SkyWork/SkyText")
tokenizer = AutoTokenizer.from_pretrained("SkyWork/SkyText", trust_remote_code=True)

# or 2.6Billions
model = GPT2LMHeadModel.from_pretrained("SkyWork/SkyTextTiny")
tokenizer = AutoTokenizer.from_pretrained("SkyWork/SkyTextTiny", trust_remote_code=True)

text_generator = TextGenerationPipeline(model, tokenizer, device=0)
input_str = "Today is a "
max_new_tokens = 20
print(text_generator(input_str, max_new_tokens=max_new_tokens, do_sample=True))

License

[MIT License]

Developer Group

Scan the code below with WeChat to join in the developer group

text

MIT License Copyright (c) 2022 SkyWorkAIGC Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

SkyText是由奇点智源发布的中文GPT3预训练大模型,可以进行文章续写、对话、中英翻译、内容风格生成、推理、诗词对联等不同任务。| SkyText is a Chinese GPT3 pre-trained large model released by Singularity-AI, which can perform different tasks such as chatting, Q&A, and Chinese-English translation. expand collapse
MIT
Cancel

Releases

No release

Contributors

All

Activities

Load More
can not load any more
1
https://gitee.com/wei__hui/SkyText-Chinese-GPT3.git
git@gitee.com:wei__hui/SkyText-Chinese-GPT3.git
wei__hui
SkyText-Chinese-GPT3
SkyText-Chinese-GPT3
main

Search