18. Named Entity Recognition (NER)

YAO's: https://github.com/liuyaox/named_entity_recognition (Keras)

18.1 Overview

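Information extraction: extracting specific events or facts from text, which helps to automatically classify, extract, and restructure large volumes of content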

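Write-up from the top three teams: pretrained models have transformed NLP, but the gains from traditional methods should not be overlooked - 2019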

Champion: GloVe + Pretrained(Flair/ELMo/BERT/XLNet) + LSTM + CNN + CRF
Runner-up: BERT + BiLSTM + CRF
Third place: Transformer + BiLSTM + CRF
【Great】https://github.com/cdjasonj/datagrand (Keras & Tensorflow)

Rank 6

Input: 2*3+1=7 kinds of embeddings: (char, bichar) * (Word2Vec, GloVe, FastText) + <char, ELMo>, each in 4 dimensionalities dim=(150, 200, 250, 300)
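
The count above can be checked by enumerating the combinations. This is only an illustrative sketch: the variable names are made up here, and it takes the note at face value that every embedding kind, including ELMo, comes in all four dimensionalities.

```python
from itertools import product

granularities = ["char", "bichar"]
static_methods = ["Word2Vec", "GloVe", "FastText"]
dims = [150, 200, 250, 300]

# (char, bichar) x (Word2Vec, GloVe, FastText), plus the single <char, ELMo> pair
embedding_kinds = list(product(granularities, static_methods)) + [("char", "ELMo")]

# Every kind paired with every dimensionality, following the note's description
embedding_configs = [(g, m, d) for (g, m) in embedding_kinds for d in dims]

print(len(embedding_kinds), len(embedding_configs))
```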

Middle layers:
- Model 1: BiLSTMs + SelfAttention
- Model 2: BiLSTM + CNNs(kernel: 3, 5, 7) + SelfAttention
- Model 3: (BiGRU + LocalAttention)s
- Model 4: BiONLSTMs + SelfAttention (not used)

Output: TimeDistributed(Dense) + CRF

YAO: Detailed Notes
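- Feature 1: the data is not padded, and no masking is used. The model is trained batch by batch, and all inputs within a batch have the same Seq_length; inference works the same way, with inputs of equal Seq_length processed together.
- Feature 2: uses ELMo and also combines the strengths of three static embeddings: Word2Vec, GloVe, and FastText.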
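
One way to get batches without padding or masking is to group samples by length before batching. The sketch below is an assumption about how that could look, not the repository's actual code; `bucket_batches` and the toy data are hypothetical.

```python
from collections import defaultdict


def bucket_batches(sequences, labels, batch_size):
    """Yield (xs, ys) batches in which every sequence has the same length."""
    buckets = defaultdict(list)
    for seq, lab in zip(sequences, labels):
        if len(seq) != len(lab):
            raise ValueError("each sequence needs exactly one label per token")
        buckets[len(seq)].append((seq, lab))
    # Batches never mix lengths, so no padding value or mask is needed
    for length in sorted(buckets):
        items = buckets[length]
        for start in range(0, len(items), batch_size):
            chunk = items[start:start + batch_size]
            yield [seq for seq, _ in chunk], [lab for _, lab in chunk]


# Toy token ids and tag ids (illustrative only)
sequences = [[1, 2], [3, 4, 5], [6, 7], [8, 9, 10], [11, 12]]
labels = [[0, 0], [1, 0, 0], [0, 1], [0, 0, 0], [1, 1]]
batches = list(bucket_batches(sequences, labels, batch_size=2))
for xs, ys in batches:
    print(len(xs[0]), xs)
```

The trade-off is that batches for rare lengths can be small, and shuffling has to happen within and across buckets rather than over the whole dataset.
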
HERE HERE HERE HERE HERE
https://github.com/lonePatient/daguan_2019_rank9 (PyTorch)

Rank 9

Model 1: BERT + LSTM + CRF
Model 2: BERT + LSTM + MDP + CRF
Model 3: BERT + LSTM + SPAN
【Great】https://github.com/renjunxiang/daguan_2019 (PyTorch)

Trains its own BERT rather than using a released pretrained BERT directly; the process is documented in detail and worth studying

Data

https://github.com/LG-1/video_music_book_datasets

9,632 annotated samples covering video, music, and books

18.2 HMM & CRF

18.3 RNN + CRF

Mainly BiLSTM + CRF

For the derivation and implementation details of the CRF and its loss, as well as decoding details, see: 04_Probabilistic_Graphical_Model

Paper

Neural Architectures for Named Entity Recognition - CMU2016

BiLSTM + CRF
HSCRF: Hybrid semi-Markov CRF for Neural Sequence Labeling - USTC2018

Code: https://github.com/ZhixiuYe/HSCRF-pytorch (PyTorch)

Practice

Keras

https://github.com/stephen-v/zh-NER-keras (Keras)

Chinese: Named entity tagging with BiLSTM and CRF in Keras - 2018

YAO: OK
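- Basics: character-based Chinese entities with the BIO tagging scheme; the entity types are Person/Location/Organization, so there are 3*2+1=7 tags in total. Model structure: Embedding -> BiLSTM -> CRF
- Data processing: X is handled as usual (vectorized, then zero-padded or truncated); Y is vectorized (each tag mapped to 0-6) and then also zero-padded or truncated. Note that the padding mask_value must be the same for X and Y (to be investigated?)
- Library versions: the CRF layer works with (tensorflow=1.10.0, keras=2.2.0, keras-contrib=0.0.2)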
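
The tag count and the pad/truncate step in the notes above can be sketched in plain Python. This is not the repository's code: the abbreviations PER/LOC/ORG, the helper names, the choice of "O" as id 0, and `max_len=8` are all assumptions made for illustration.

```python
def build_bio_tags(entity_types):
    """One O tag plus a B- and an I- tag per entity type."""
    tags = ["O"]
    for entity_type in entity_types:
        tags += ["B-" + entity_type, "I-" + entity_type]
    return tags


def pad_or_truncate(ids, max_len, pad_value=0):
    """Cut to max_len, or right-pad with pad_value up to max_len."""
    return ids[:max_len] + [pad_value] * max(0, max_len - len(ids))


tags = build_bio_tags(["PER", "LOC", "ORG"])   # 3*2+1 = 7 tags
tag2id = {tag: i for i, tag in enumerate(tags)}

# One character-level sample with its BIO tags
sample_tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
y = [tag2id[tag] for tag in sample_tags]
# Caveat of this sketch: pad_value 0 is also the id of "O" here
y_padded = pad_or_truncate(y, max_len=8)
print(tags)
print(y_padded)
```

In this sketch the padding value and the id of "O" coincide, which is one reason the padding value and any masking need to be chosen consistently for X and Y.
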
https://github.com/UmasouTTT/keras_bert_ner (Keras)

Chinese entities; model structure: BERT + BiLSTM + CRF; data and other details are the same as above (zh-NER-keras)

YAO: OK Detailed Notes

Uses keras-bert to load the BERT model from chinese_L-12_H-768_A-12, plugs it directly into the model, and trains it together with the other layers (fine-tuning)
【Great】https://github.com/AidenHuen/BERT-BiLSTM-CRF (Keras)

Keras implementation of BERT-BiLSTM-CRF. The pretrained model is chinese_L-12_H-768_A-12.zip, accessed through the BERT server and client, bert-serving-server and bert-serving-client

YAO: HERE HERE HERE HERE HERE

PyTorch

https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Sequence-Labeling (PyTorch)

Model structure: Language Model + BiLSTM + CRF, with Highway Networks

Article: Sequence labeling algorithms fully explained in one article
https://github.com/fangwater/Medical-named-entity-recognition-for-ccks2017 (PyTorch)

A LSTM+CRF model for the seq2seq task for Medical named entity recognition in ccks2017

YAO: a CRF implemented in PyTorch
https://github.com/llcing/BiLSTM-CRF-ChineseNER.pytorch (PyTorch)

PyTorch implementation of BiLSTM-CRF for Chinese NER
https://github.com/yanwii/ChinsesNER-pytorch (PyTorch)

Chinese named entity recognition based on BiLSTM + CRF
https://github.com/chenxiaoyouyou/Bert-BiLSTM-CRF-pytorch (PyTorch)

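BiLSTM-CRF sequence labeling model that uses BERT for character embeddings
NER with PyTorch BiLSTM + CRF - 2019 (PyTorch)
How to use BERT for named entity recognition - 2019
NLP in practice: Chinese named entity recognition - 2019 (PyTorch)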

Tensorflow

https://github.com/phychaos/transformer_crf (Tensorflow)

Transformer + CRF
https://github.com/Determined22/zh-NER-TF (Tensorflow)

A very simple BiLSTM-CRF model for Chinese Named Entity Recognition

Article: Sequence labeling: character-based Chinese named entity recognition with a BiLSTM-CRF model - 2017
https://github.com/shiyybua/NER (Tensorflow)

BiRNN + CRF

Article: A detailed explanation of deep-learning-based named entity recognition - 2017
https://github.com/pumpkinduo/KnowledgeGraph_NER (Tensorflow)

Named entity recognition for a Chinese medical knowledge graph; models: BiLSTM+CRF, Transformer+CRF
【Great】https://github.com/baiyyang/medical-entity-recognition (Tensorflow)

Named entity recognition on medical data, covering both a traditional statistical model (CRF) and a deep learning model (Embedding-Bi-LSTM-CRF)
https://github.com/Nrgeup/chinese_semantic_role_labeling (Tensorflow)

Chinese semantic role labeling based on Bi-LSTM and CRF
https://github.com/dkarunakaran/entity_recoginition_deep_learning (Tensorflow)

Article: Entity extraction using Deep Learning based on Guillaume Genthial work on NER - 2018

Article

【Great】CRF Layer on the Top of BiLSTM 1-8 - 2017 (Chainer)

YAO: explains Emission Score and Transition Score in detail, as well as Path Score, All Path Score, and Loss
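
The quantities named in the note above can be sketched for a linear-chain CRF as follows. This is a simplified illustration, not the article's code: it assumes no START/END tags, takes emission scores as a T x K list and transition scores as a K x K list, and all function names and numbers are made up here.

```python
import math
from itertools import product


def path_score(emissions, transitions, path):
    """Sum of emission scores along the path plus transition scores between steps."""
    score = emissions[0][path[0]]
    for t in range(1, len(path)):
        score += transitions[path[t - 1]][path[t]] + emissions[t][path[t]]
    return score


def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def all_path_score(emissions, transitions):
    """log(sum over all paths of exp(path score)), via the forward recursion."""
    n_tags = len(emissions[0])
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(n_tags)])
            + emissions[t][j]
            for j in range(n_tags)
        ]
    return log_sum_exp(alpha)


def crf_loss(emissions, transitions, gold_path):
    """Negative log-likelihood of the gold path."""
    return all_path_score(emissions, transitions) - path_score(
        emissions, transitions, gold_path
    )


emissions = [[1.0, 0.5], [0.2, 1.5], [0.3, 0.8]]   # 3 time steps, 2 tags
transitions = [[0.5, -0.2], [-0.1, 0.7]]           # transitions[i][j]: tag i -> tag j

# Sanity check: the forward recursion should match brute-force enumeration
brute_force = log_sum_exp(
    [path_score(emissions, transitions, p) for p in product(range(2), repeat=3)]
)
print(all_path_score(emissions, transitions), brute_force)
print(crf_loss(emissions, transitions, [1, 1, 1]))
```

The forward recursion costs O(T * K^2), whereas enumerating all K^T paths is only feasible for toy inputs like this one.
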
bi-LSTM + CRF with character embeddings for NER and POS - 2017

Code: https://github.com/guillaumegenthial/tf_ner (Tensorflow)

Chinese: Named entity recognition (biLSTM+crf)
How should an LSTM followed by a CRF be understood?

YAO:

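The CRF is added to model the label sequence jointly, instead of decoding each label independently.

The features the CRF needs: the transition probability matrix G consists directly of learnable model parameters; the emission probability matrix H is produced by the encoder part of the model (e.g. an LSTM) encoding X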
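
To make the "jointly, not independently" point concrete, here is a minimal Viterbi decoding sketch over an emission matrix H (T x K) and a transition matrix G (K x K), reusing the note's symbols. The tag order (O, B, I), the scores, and the function name are assumptions chosen for illustration; in a real model H would come from the encoder and G would be learned.

```python
def viterbi_decode(H, G):
    """Return the highest-scoring tag path and its score."""
    n_tags = len(H[0])
    score = list(H[0])
    backpointers = []
    for t in range(1, len(H)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for landing on tag j at step t
            best_i = max(range(n_tags), key=lambda i: score[i] + G[i][j])
            new_score.append(score[best_i] + G[best_i][j] + H[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_tags), key=lambda j: score[j])
    best_score = score[last]
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best_score


# Tags: 0 = O, 1 = B, 2 = I. G heavily penalizes the invalid move O -> I.
H = [[2.0, 1.75, 0.0], [0.5, 0.0, 1.5]]
G = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

independent = [max(range(3), key=lambda j: row[j]) for row in H]
joint_path, joint_score = viterbi_decode(H, G)
print(independent)   # per-step argmax picks O then I, an invalid sequence
print(joint_path, joint_score)
```

With these numbers the per-step argmax yields O followed by I, while joint decoding prefers B followed by I because the transition scores are taken into account.
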
What are the strengths and weaknesses of CRF and LSTM models for sequence labeling?

18.4 CNN

https://github.com/nlpdz/Medical-Named-Entity-Rec-Based-on-Dilated-CNN

A ready-trained medical named entity recognition tool based on dilated convolutional neural networks (Dilated Convolutions)
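
For readers unfamiliar with dilated convolutions, the sketch below shows a 1D version without padding and how stacking layers widens the receptive field. It is a generic illustration, not the linked tool's implementation; the function names, the kernel, and the dilation schedule [1, 2, 4] are assumptions.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """1D convolution (cross-correlation form) whose taps are `dilation` steps apart."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * x[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(x) - span)
    ]


def receptive_field(kernel_size, dilations):
    """Number of input positions seen by one output of the stacked layers."""
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


x = [1, 2, 3, 4, 5, 6, 7, 8]
out = dilated_conv1d(x, kernel=[1, 1, 1], dilation=2)
print(out)
print(receptive_field(3, [1, 2, 4]))
```

With dilation 2, each output sums inputs that are two positions apart, so a three-tap kernel covers five positions; stacking layers with growing dilation lets a tagger see wide context with few layers.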

18.5 Others

Article

Tired of CRF? Try LAN - Xihu2019

LAN: a label-attention-based network with layer-by-layer refinement

Code: https://github.com/Nealcly/BiLSTM-LAN (PyTorch)
