Visual-Chinese-LLaMA-Alpaca: A Multimodal Chinese Large Model

🇨🇳中文 | 🌐English




Visual-Chinese-LLaMA-Alpaca (VisualCLA) is a multimodal Chinese large model built on the Chinese-LLaMA-Alpaca project. VisualCLA adds modules such as an image encoder to the Chinese LLaMA/Alpaca models, enabling the LLaMA model to receive visual input. On this basis, it was pretrained on Chinese image-text pairs to align image and text representations and acquire basic multimodal understanding, and was then fine-tuned on a multimodal instruction dataset to strengthen its ability to understand, follow, and converse about multimodal instructions.

This project is still under development. The current release is a preview test version, and model performance is still being optimized.

Main contents of this project:

  • 🚀 VisualCLA, a multimodal model based on Chinese-LLaMA-Alpaca, with multimodal instruction understanding and dialogue capabilities
  • 🚀 Inference code and deployment scripts based on Gradio/Text-Generation-WebUI
  • 🚀 Demonstrations of the model on multimodal instruction-understanding tasks, with translated test sets released
  • 🚀 Current open-source version: VisualCLA-7B-v0.1 (test version)

Demo Examples




Chinese-LLaMA-2&Alpaca-2 LLMs | Chinese-LLaMA&Alpaca LLMs | Multimodal VLE | Chinese MiniRBT | Chinese LERT | Chinese-English PERT | Chinese MacBERT | Chinese ELECTRA | Chinese XLNet | Chinese BERT | Knowledge distillation toolkit TextBrewer | Model pruning toolkit TextPruner

News

[2023/07/18] The demo now supports webcams, so photos can be taken directly with a camera


Model Introduction

Visual-Chinese-LLaMA-Alpaca (VisualCLA) is a multimodal Chinese model that supports both image and text input. VisualCLA adds an image encoding module on top of the Chinese Alpaca model, enabling it to understand visual information.



VisualCLA consists of three components, a Vision Encoder, a Resampler, and an LLM:

  • Vision Encoder: a ViT that encodes the input image into a sequence of representations. The released VisualCLA model uses CLIP-ViT-L/14 for the image encoder's architecture and initial weights.
  • Resampler: a 6-layer BERT-like module, similar in structure and function to the Perceiver Resampler in Flamingo or the Q-Former in BLIP-2. It resamples the image representation through trainable query vectors, shortening its length, and then aligns the image representation to the LLM's hidden dimension with a linear layer. This part is trained from scratch.
  • LLM: a LLaMA model, initialized with Chinese-Alpaca-Plus 7B.

The image is encoded by the Vision Encoder and mapped to a fixed-length representation by the Resampler. The image and text representations are then concatenated and fed into the LLM, which generates the response conditioned on both the image and the text instruction.
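Under stated assumptions, this data flow can be sketched in a few lines of Python. This is an illustrative sketch, not the project's code: the token counts (257 patch tokens from CLIP-ViT-L/14 at 224-pixel input, 64 resampler queries) are assumptions chosen for illustration, and only the sequence-length bookkeeping is modeled.

```python
def resample(image_tokens, num_queries=64):
    """Cross-attention resampling reduces a variable-length token sequence
    to a fixed number of query slots (feature dimensions omitted)."""
    # In the real model, num_queries trainable query vectors attend over
    # image_tokens; here we only model the change in sequence length.
    return ["<img>"] * num_queries

image_tokens = ["patch"] * 257       # assumed Vision Encoder output length
resampled = resample(image_tokens)   # fixed-length image representation
text_tokens = ["tok"] * 20           # tokenized text instruction
llm_input = resampled + text_tokens  # concatenated and fed to the LLM

print(len(llm_input))  # 84
```

Whatever the image resolution, the LLM always sees the same number of image slots, which keeps the prompt length bounded.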

Training Strategy

Like Chinese-LLaMA-Alpaca, VisualCLA is fine-tuned efficiently with LoRA. The trainable parameters are the LoRA parameters of the image encoder, the LoRA parameters of the LLM, and all parameters of the Resampler (see the annotations in the architecture diagram). Training proceeds in two stages:

  • Multimodal pretraining: trained on Chinese image-text pairs, where the model generates a caption for each image.
  • Multimodal instruction fine-tuning: starting from the model above, fine-tuned on a multimodal instruction dataset built from various supervised tasks, including visual question answering, visual reasoning, open-domain QA, OCR, and more. A portion of text-only instruction data is also mixed in to compensate for the scarcity of multimodal data and to mitigate forgetting of instruction-following ability. This stage uses the same instruction template as the Chinese-Alpaca models.
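For readers unfamiliar with LoRA, the idea behind this efficient fine-tuning can be sketched in pure Python. This is generic LoRA math, not VisualCLA's implementation: the frozen weight W is augmented by a trainable low-rank product B·A scaled by alpha/r, so only the small A and B matrices need gradients.

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for tiny illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r = 4, 2                  # model dimension and LoRA rank (r << d)
alpha = 4                    # LoRA scaling hyperparameter
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A_lora = [[0.1] * d for _ in range(r)]   # trainable, r x d
B_lora = [[0.0] * r for _ in range(d)]   # trainable, d x r, zero-initialized

delta = matmul(B_lora, A_lora)           # d x d low-rank update
scale = alpha / r
W_eff = [[W[i][j] + scale * delta[i][j] for j in range(d)] for i in range(d)]

# With B zero-initialized, the effective weight starts equal to W, so
# fine-tuning begins exactly from the pretrained model's behavior.
print(W_eff == W)  # True
```

Only 2·d·r parameters per adapted matrix are trained instead of d², which is why LoRA fine-tuning fits in far less memory than full fine-tuning.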

Training details of VisualCLA-7B-v0.1 are summarized in the following table:

| Training stage | Multimodal pretraining | Multimodal instruction fine-tuning |
| --- | --- | --- |
| Initialization | Chinese-Alpaca-Plus 7B | multimodal pretrained model |
| Training task | multimodal pretraining | multimodal instruction fine-tuning |
| Task type | image captioning | VQA, visual reasoning, open-domain QA, OCR, etc. |
| Prompt template | — | Alpaca prompt template |
| Training set size (samples) | 23M | 350K (multimodal instructions) + 1.3M (text-only instructions) |

Model Download

Commercial use of the LLaMA model is prohibited, so to comply with its license this project releases incremental weights, including:

  • LoRA, embedding, and LM head weights for LLaMA
  • LoRA weights for CLIP-ViT
  • all weights of the Resampler

Users need to load or merge these on top of Chinese-Alpaca-Plus and CLIP-ViT to obtain a complete, usable VisualCLA model.

| Model | Required base models | Incremental weights download |
| --- | --- | --- |
| VisualCLA-7B-v0.1 | Chinese-Alpaca-Plus 7B (HF format) + CLIP-ViT-L/14 | [Baidu Netdisk] [Google Drive] |

†: For how to obtain and merge the Chinese-Alpaca-Plus 7B model, see the model merging and conversion guide in Chinese-LLaMA-Alpaca

‡: CLIP-ViT-L/14 model download link

Model Hub

You can also download the model from the 🤗Model Hub and use VisualCLA with transformers and PEFT. The model name below is the identifier to pass to .from_pretrained(). See Model Usage for examples.

| Model | Model name for .from_pretrained() | Link |
| --- | --- | --- |
| VisualCLA-7B-v0.1 | ziqingyang/visualcla-7b-v0.1 | Hub page |

The archive contains the following files:

visualcla-7b-v0.1/
  - adapter_config.json      # LoRA config
  - adapter_model.bin        # LoRA weights
  - config.json              # VisualCLA config
  - added_tokens.json        # tokenizer config
  - special_tokens_map.json  # tokenizer config
  - tokenizer_config.json    # tokenizer config
  - tokenizer.model          # tokenizer file
  - preprocessor_config.json # ImageProcessor config

Model Usage

Colab Notebook

Besides the step-by-step instructions below, we provide a Colab notebook covering installation, merging, inference, and deployment, which users can run directly to try the model and view the results:

| Notebook | Contents | Link | Notebook file |
| --- | --- | --- | --- |
| visualcla_inference.ipynb | Installation, merging, command-line inference, and Gradio demo deployment | Open In Colab | visualcla_inference.ipynb |

Installation

Clone this repository and install the package onto your Python path:

git clone https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca
cd Visual-Chinese-LLaMA-Alpaca
pip install -e .

Merging the model (optional, recommended)

You can merge the incremental weights with the base models and save the result, which is more convenient to use and faster to load. The merged model is about 14 GB, and merging takes about 20 GB of RAM, so make sure your machine has enough disk space and memory.
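The ~14 GB figure is roughly what the parameter count implies. A back-of-the-envelope check, assuming float16 storage at 2 bytes per parameter (the vision encoder and Resampler add comparatively little on top of the 7B LLM):

```python
# Approximate on-disk size of a 7B-parameter model stored in float16.
params = 7e9                  # LLM parameter count (7B)
bytes_per_param = 2           # float16 = 2 bytes per parameter
size_gib = params * bytes_per_param / 1024**3
print(f"{size_gib:.1f} GiB")  # about 13 GiB for the LLM alone
```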

Run scripts/merge_llama_with_visualcla_lora.py from this repository to merge:

python scripts/merge_llama_with_visualcla_lora.py \
    --text_model /path/to/chinese/alpaca/plus/7b \
    --vision_model /path/to/clip/vit/14-L \
    --lora_model /path/to/visualcla/lora \
    --output_dir output_dir

Arguments:

  • --text_model: directory of the Chinese-Alpaca-Plus 7B model
  • --vision_model: directory of the CLIP-ViT-L/14 model
  • --lora_model: directory of the VisualCLA LoRA model
  • --output_dir: directory in which to save the merged model

Each model directory can also be replaced with a model name on the 🤗Model Hub.

After merging, output_dir contains:

output_dir/
 - text_encoder/             # LLM weights and config
 - image_encoder/            # Vision Encoder weights and config
 - pytorch_model.bin         # Resampler weights
 - config.json               # VisualCLA config
 - added_tokens.json         # tokenizer config
 - special_tokens_map.json   # tokenizer config
 - tokenizer_config.json     # tokenizer config
 - tokenizer.model           # tokenizer file
 - preprocessor_config.json  # ImageProcessor config

Load the merged model with visualcla.get_model_and_tokenizer_and_processor; see the next section for details.

Model Loading and Inference

API usage

If the model has been merged

You can call VisualCLA from a Python program as follows:

import torch
import visualcla
model, tokenizer, _ = visualcla.get_model_and_tokenizer_and_processor(
      visualcla_model="/path/to/the/merged/visualcla/model",
      torch_dtype=torch.float16,
      load_in_8bit=True
)
model.to(0)
history=[]
visualcla.chat(model=model, image="path/to/image/filename", text="your instruction here", history=history)

If the model has not been merged

You need to load Chinese-Alpaca-Plus-7B, CLIP-ViT-L/14, and the VisualCLA LoRA weights together:

import torch
import visualcla
from peft import PeftModel
base_model, tokenizer, _ = visualcla.get_model_and_tokenizer_and_processor(
      text_model="/path/to/chinese/alpaca/plus/7b",  # Path to the Chinese-Alpaca-Plus 7B model
      vision_model="openai/clip-vit-large-patch14",  # We can also use the Model Hub name of the model
      lora_model="/path/to/visualcla/lora",
      torch_dtype=torch.float16
)
base_model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(base_model, "/path/to/visualcla/lora", torch_dtype=torch.float16)
model.to(0)
history = []
visualcla.chat(model=model, image="path/to/image/filename", text="your instruction here", history=history)

Inference Script

A more fully featured Python inference script, inference.py, is provided under scripts/inference in this repository.

If the model has been merged

python scripts/inference/inference.py \
    --visualcla_model visualcla_model \
    --image_file image_file \
    --load_in_8bit

If the model has not been merged

python scripts/inference/inference.py \
    --text_model /path/to/chinese/alpaca/plus/7b \
    --vision_model /path/to/clip/vit/14-L \
    --lora_model /path/to/visualcla/lora \
    --image_file image_file
    # 8-bit loading is not yet supported for unmerged models

Arguments:

  • --text_model: directory of the merged Chinese-Alpaca-Plus 7B model, or its name on the 🤗Model Hub
  • --vision_model: directory of the CLIP-ViT-L/14 model, or its name on the 🤗Model Hub
  • --lora_model: directory of the VisualCLA LoRA model, or its name on the 🤗Model Hub
  • --visualcla_model: VisualCLA model produced by the merging script
    • If this argument is not provided, the script merges text_model, vision_model, and lora_model on the fly and uses the result for inference
    • If it is provided, the loaded model is taken from it, and text_model, vision_model, and lora_model are not needed
  • --image_file (optional): image file for the model to read, supporting standard formats such as png and jpg. If omitted, the model responds based on the text alone.
  • --load_in_8bit (optional): whether to use 8-bit inference for the LLM part
  • --gpus (optional): GPU device IDs to use; defaults to 0
  • --only_cpu (optional): whether to run inference on CPU only

Model Deployment

Gradio-based web demo

Install the dependencies first:

pip install gradio mdtex2html

Launch:

python scripts/inference/gradio_demo.py --visualcla_model visualcla_model --load_in_8bit

Arguments:

  • --visualcla_model: VisualCLA model produced by the merging script
  • --share (optional): whether to create a publicly accessible link
  • --load_in_8bit (optional): whether to use 8-bit inference for the LLM part
  • --gpus (optional): GPU device IDs to use; defaults to 0
  • --only_cpu (optional): whether to run inference on CPU only
  • --no_stream (optional): disable streaming output

Deployment with Text-Generation-webUI

Compared with the gradio_demo.py deployment, Text-Generation-webUI supports using multiple images across multi-turn conversations. For detailed deployment steps, see here.

Results

All results shown below are from the v0.1 test version.

Chinese Test Sets

We translated the LLaVA and OwlEval test sets into Chinese. See here for the dataset downloads and the model's results on these two sets.

Limitations

Although the model in this project has some ability to understand and generate text grounded in images, it also has limitations, including but not limited to:

  • Hallucination: it may generate content inconsistent with or unrelated to the image, such as describing objects that are not present
  • Insufficient pretraining: it may misunderstand instructions or fail to ground its answers in the image
  • Low accuracy in recognizing and understanding fine-grained text, formulas, and tables within images
  • Output quality degrades after multiple turns of dialogue
  • No online interactive demo (note: users can still deploy it locally)

Citation

If you find this project helpful for your research, or use its code or data, please cite our work:

@article{chinese-llama-alpaca,
      title={Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca}, 
      author={Cui, Yiming and Yang, Ziqing and Yao, Xin},
      journal={arXiv preprint arXiv:2304.08177},
      url={https://arxiv.org/abs/2304.08177},
      year={2023}
}

@misc{visualcla,
  author = {Yang, Ziqing and Pan, Yuchen and Cui, Yiming},
  title = {Visual-Chinese-LLaMA-Alpaca},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/airaria/Visual-Chinese-LLaMA-Alpaca/}},
}

Acknowledgments

This project is developed on top of the following open-source projects; we thank the related projects and their researchers and developers.

Disclaimer

The resources in this project are for academic research only; commercial use is strictly prohibited. When using parts that involve third-party code, strictly follow the corresponding open-source licenses. Model outputs are affected by model computation, randomness, quantization precision loss, and other factors, so this project makes no guarantee of their accuracy. This project assumes no legal liability for any model output, nor for any losses arising from the use of the related resources and outputs.

This project is initiated and maintained by individuals and collaborators in their spare time, so we cannot guarantee timely responses to issues.

