
openGauss / openGauss-server


[Test type: tool function] [Test version: 6.0.0] [Upgrade] Core dump after upgrading from 3.0.5 to 6.0.0

Status: To do
Type: Bug
Created: 2024-04-30 10:24

[Title]: Core dump after upgrading from 3.0.5 to 6.0.0

[OS and hardware information] (query commands: cat /etc/system-release, uname -a):
CentOS Linux release 7.6.1810 (Core)
Linux kwemhisprc10436 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
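[Test environment] (standalone / 1 primary + x standbys + cascaded standbys):
One primary, two standbys
[Feature under test]:
Upgrade
[Test type]:
Functional
[Database version] (query command: gaussdb -V):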
gsql (openGauss 3.0.5 build 4db7019a) compiled at 2024-01-04 20:18:28 commit 0 last mr
(openGauss OM 6.0.0-RC1 build e5bd52f7) compiled at 2024-04-24 00:26:10 commit 0 last mr
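[Preconditions]:

[Steps] (provide detailed steps):
1. Gray upgrade from 3.0.5 to 6.0.0
[Expected output]:
Success
[Actual output]:
After the upgrade, a core file was generated and some nodes went down.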

(gdb) bt
#0  BBOX_CreateCoredump (file_name=file_name@entry=0x0) at bbox_create.cpp:404
#1  0x000055afc672cc12 in bbox_handler (sig=<optimized out>, si=0x7f45105c7b70, uc=0x7f45105c7a40) at gs_bbox.cpp:112
#2  bbox_handler (sig=<optimized out>, si=0x7f45105c7b70, uc=0x7f45105c7a40) at gs_bbox.cpp:102
#3  <signal handler called>
#4  0x00007f46875c3387 in raise () from /lib64/libc.so.6
#5  0x00007f46875c4a78 in abort () from /lib64/libc.so.6
#6  0x000055afc64bcecd in errfinish (dummy=<optimized out>) at elog.cpp:797
#7  0x000055afc7244808 in ProcessSyncRequests () at knl_usync.cpp:440
#8  0x000055afc7244a0b in PageWriterSyncWithAbsorption () at knl_usync.cpp:834
#9  0x000055afc6aac04c in HandlePageWriterMainInterrupts () at pagewriter.cpp:1263
#10 HandlePageWriterMainInterrupts () at pagewriter.cpp:1252
#11 0x000055afc6aacf25 in ckpt_move_queue_head_after_flush () at pagewriter.cpp:730
#12 ckpt_pagewriter_main_thread_flush_dirty_page () at pagewriter.cpp:774
#13 0x000055afc6ab09a2 in ckpt_pagewriter_main_thread_loop () at pagewriter.cpp:1320
#14 ckpt_pagewriter_main () at pagewriter.cpp:1807
#15 0x000055afc6a55fea in GaussDbAuxiliaryThreadMain<(knl_thread_role)46> (arg=0x7f4557bef500) at postmaster.cpp:11954
#16 GaussDbThreadMain<(knl_thread_role)46> (arg=0x7f4557bef500) at postmaster.cpp:14183
#17 0x000055afc6a31775 in InternalThreadFunc (args=<optimized out>) at postmaster.cpp:14775
#18 0x00007f4687962ea5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007f468768bb0d in clone () from /lib64/libc.so.6
(gdb) 


Note: logical replication workload run before and during the upgrade:

select * from pg_create_logical_replication_slot('slot_pre', 'mppdb_decoding');
drop table if exists t_logical_replic_pre; create table t_logical_replic_pre(v1 int, v2 varchar(20));
insert into t_logical_replic_pre values(1, 'test');
update t_logical_replic_pre set v2 = 'text';
update t_logical_replic_pre set v2 = 'text';
delete from t_logical_replic_pre;
select * from pg_logical_slot_peek_changes('slot_pre', NULL, 11);

Note: the same problem also occurs intermittently when upgrading from 3.0.5 to 5.0.2, with an identical core dump.

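[Root cause analysis]:

  1. Root cause of the problem
  2. How the problem was diagnosed
  3. Other causes that could produce similar symptoms
  4. Whether a temporary workaround exists
  5. Solution
  6. Estimated time to fix

[Log information] (attach log files, screenshots, coredump information):

[Test code]: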

Comments (5)

lixin created the bug

Hey @lixin, Welcome to openGauss Community.
All of the projects in openGauss Community are maintained by @opengauss_bot.
That means the developers can comment below every pull request or issue to trigger Bot Commands.
Please follow the instructions here to find the details.

Hi @lixin, please use the command /sig xxx to add a SIG label to this issue.
For example: /sig sqlengine or /sig storageengine or /sig om or /sig ai and so on.
You can find more SIG labels here.
If you are not sure which label to use, please contact @zhangxubo or @xiangxinyong.

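lixin set the assignee to 张悦萌
lixin set the associated project to openGauss 6.0.0 community
lixin set the priority to Minor
lixin set the associated branch to master
lixin added collaborator 陈栋
lixin edited the description (3 times)
陈栋 edited the notes (2 times)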

Reproduced on an ARM environment and with a test script; in both cases the core file stack is:

(gdb) bt
#0 0x0000aaad150722dc in sys_wait4 (ru=0x0, options=options@entry=1073741824, stat_addr=stat_addr@entry=0xfffcd56c5abc, upid=upid@entry=3260448) at bbox_syscall_support.cpp:36
#1 sys_waitpid (pid=pid@entry=3260448, status=status@entry=0xfffcd56c5abc, options=options@entry=1073741824) at bbox_syscall_support.cpp:516
#2 0x0000aaad150736d8 in BBOX_GetClonePidResult (iClonePid=iClonePid@entry=3260448, pstArgs=pstArgs@entry=0xfffcd56c5b40, iCloneErrno=iCloneErrno@entry=9) at bbox_threads.cpp:572
#3 0x0000aaad15073a18 in BBOX_GetAllThreads (enType=<optimized out>, pDone=pDone@entry=0xaaad1506a5c0 <BBOX_FinishDumpFile(void*)>, pDoneArgs=pDoneArgs@entry=0xfffcd56c5c88, pCallback=<optimized out>) at bbox_threads.cpp:700
#4 0x0000aaad1506a87c in BBOX_CreateCoredump (file_name=0xfffcd56c5db8 "/home/corefile//core-gaussdb-3225412-2024_05_08_15_28_09-bbox.lz4", file_name@entry=0x0) at bbox_create.cpp:440
#5 0x0000aaad15075384 in bbox_handler (sig=11, si=0xfffcd56c62f0, uc=<optimized out>) at gs_bbox.cpp:112
#6 bbox_handler (sig=11, si=0xfffcd56c62f0, uc=<optimized out>) at gs_bbox.cpp:102
#7 <signal handler called>
#8 0x0000aaad14eb3650 in ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.cpp:1009
#9 0x0000aaad14dcabf8 in SearchCatCacheMiss (cache=cache@entry=0xfffcefd5e080, nkeys=nkeys@entry=1, hashValue=hashValue@entry=98600738, hashIndex=hashIndex@entry=802, v1=v1@entry=3081, v2=v2@entry=0, v3=v3@entry=0, v4=v4@entry=0, level=level@entry=13) at catcache.cpp:1665
#10 0x0000aaad14dcb560 in SearchCatCacheInternal (level=13, v4=0, v3=0, v2=0, v1=3081, nkeys=1, cache=0xfffcefd5e080) at catcache.cpp:1366
#11 SearchCatCache1 (cache=0xfffcefd5e080, v1=3081) at catcache.cpp:1247
#12 0x0000aaad14df2034 in SearchSysCache1 (cacheId=cacheId@entry=49, key1=<optimized out>) at syscache.cpp:976
#13 0x0000aaad14ddbcd4 in RelationInitIndexAccessInfo (relation=relation@entry=0xfffcd6618610, index_tuple=index_tuple@entry=0x0) at relcache.cpp:2715
#14 0x0000aaad14de8f6c in RelationBuildDescExtended (insertIt=<optimized out>, buildkey=<optimized out>, targetRelId=<optimized out>) at relcache.cpp:2348
#15 RelationBuildDesc (targetRelId=<optimized out>, insertIt=insertIt@entry=true, buildkey=buildkey@entry=true) at relcache.cpp:2199
#16 0x0000aaad14dec8f8 in RelationIdGetRelation (relationId=<optimized out>, relationId@entry=3081) at relcache.cpp:3333
#17 0x0000aaad156c3f4c in relation_open (relationId=relationId@entry=3081, lockmode=lockmode@entry=1, bucketId=<optimized out>) at heapam.cpp:1472
#18 0x0000aaad156f26c0 in index_open (relation_id=relation_id@entry=3081, lockmode=lockmode@entry=1, bucket_id=<optimized out>) at indexam.cpp:166
#19 0x0000aaad156f1038 in systable_beginscan (heap_relation=heap_relation@entry=0xfffcd662ce38, index_id=index_id@entry=3081, index_ok=index_ok@entry=true, snapshot=snapshot@entry=0x0, nkeys=nkeys@entry=1, key=key@entry=0xfffcd56c7f70) at genam.cpp:336
#20 0x0000aaad151e9028 in get_extension_oid (extname=extname@entry=0xaaad1a604af8 "age", missing_ok=missing_ok@entry=true) at extension.cpp:119
#21 0x0000aaad153d1444 in InitAGESqlPluginHookIfNeeded () at postgres.cpp:979
#22 0x0000aaad153d3c6c in LoadSqlPlugin () at postgres.cpp:7749
#23 0x0000aaad153ddd44 in PostgresMain (argc=<optimized out>, argv=argv@entry=0xfffcefdde0d8, dbname=<optimized out>, username=<optimized out>) at postgres.cpp:8228
#24 0x0000aaad153376bc in BackendRun (port=port@entry=0xfffcd56ca578) at postmaster.cpp:9589
#25 0x0000aaad15364d48 in GaussDbThreadMain<(knl_thread_role)1> (arg=<optimized out>) at postmaster.cpp:11954
#26 0x0000aaad15337750 in InternalThreadFunc (args=<optimized out>) at postmaster.cpp:14775
#27 0x0000fffe76c387ac in ?? () from /usr/lib64/libpthread.so.0
#28 0x0000fffe76b8548c in ?? () from /usr/lib64/libc.so.6


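Reproduction environment: ARM
Reproduction steps:
Before and during the upgrade, run the logical replication slot workload (the same as in the issue description) and create a B-compatible database: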
drop database if exists db_testb_pre;
create database db_testb_pre dbcompatibility 'B';
drop table if exists t_testb_pre cascade;
create table t_testb_pre(id int,name text);
insert into t_testb_pre values (generate_series(1, 1000), 'testb');
select count(*) from t_testb_pre;
update t_testb_pre set name = 'pre' where id > 500;

After removing InitAGESqlPluginHookIfNeeded, the core dump no longer occurs. Please ask the responsible developer to fix this.
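A heavily simplified, hypothetical C model of the failure both backtraces share: during backend startup (PostgresMain → LoadSqlPlugin → InitAGESqlPluginHookIfNeeded), get_extension_oid performs a catalog lookup through index 3081, which reaches ResourceOwnerEnlargeCatCacheRefs with owner=0x0; dereferencing that null owner raises SIGSEGV (sig=11). The names ResourceOwnerModel, enlarge_catcache_refs, lookup_extension_oid and init_plugin_hook_if_needed are illustrative stand-ins, not openGauss code, and the returned OIDs are made up. The guard shows only one possible defensive pattern (skip the lookup when no resource owner exists yet); whether the real fix should instead establish a resource owner or transaction before the lookup is an assumption left open here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a resource owner that remembers catcache
 * references; this is NOT openGauss's ResourceOwnerData layout. */
typedef struct ResourceOwnerModel {
    int ncatrefs;   /* references currently held */
    int maxcatrefs; /* allocated capacity of catrefs */
    int *catrefs;   /* remembered cache entries (here just OIDs) */
} ResourceOwnerModel;

/* Grows the reference array before a new reference is recorded. Like the
 * crashing frame, it dereferences owner unconditionally, so a NULL owner
 * would fault with SIGSEGV. */
static int enlarge_catcache_refs(ResourceOwnerModel *owner)
{
    if (owner->ncatrefs < owner->maxcatrefs)
        return 0;
    int newmax = owner->maxcatrefs > 0 ? owner->maxcatrefs * 2 : 16;
    int *p = realloc(owner->catrefs, (size_t)newmax * sizeof(int));
    if (p == NULL)
        return -1;
    owner->catrefs = p;
    owner->maxcatrefs = newmax;
    return 0;
}

/* Hypothetical catalog lookup: records a reference (3081 is the index OID
 * seen in the backtraces) and returns a made-up extension OID, or 0 if the
 * extension is not found. */
static unsigned lookup_extension_oid(ResourceOwnerModel *owner, const char *extname)
{
    assert(owner != NULL && "catalog lookups need a resource owner");
    if (enlarge_catcache_refs(owner) != 0)
        return 0;
    owner->catrefs[owner->ncatrefs++] = 3081;
    return strcmp(extname, "age") == 0 ? 16384u : 0u;
}

/* Guarded hook initialisation: with no resource owner yet, skip the lookup
 * (leave the hook unset) instead of dereferencing NULL. */
unsigned init_plugin_hook_if_needed(ResourceOwnerModel *owner, const char *extname)
{
    if (owner == NULL)
        return 0;
    return lookup_extension_oid(owner, extname);
}
```

With this guard, init_plugin_hook_if_needed(NULL, "age") returns 0 instead of crashing, while a call with a valid owner performs the lookup and records the reference.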

张悦萌 added collaborator 张悦萌
张悦萌 changed the assignee from 张悦萌 to liuy
张悦萌 removed collaborator 陈栋
张悦萌 removed collaborator 张悦萌

Addendum: the problem is also reproduced with a gray upgrade from 5.0.0 to 6.0.0.

(gdb) bt
#0  BBOX_CreateCoredump (file_name=file_name@entry=0x0) at bbox_create.cpp:404
#1  0x0000564d91e55f22 in bbox_handler (sig=<optimized out>, si=0x7f0ee82377f0, uc=0x7f0ee82376c0) at gs_bbox.cpp:112
#2  bbox_handler (sig=<optimized out>, si=0x7f0ee82377f0, uc=0x7f0ee82376c0) at gs_bbox.cpp:102
#3  <signal handler called>
#4  ResourceOwnerEnlargeCatCacheRefs (owner=0x0) at resowner.cpp:1009
#5  0x0000564d91b785d3 in SearchCatCacheMiss (cache=cache@entry=0x7f0ee546e080, nkeys=nkeys@entry=1, hashValue=hashValue@entry=98600738, 
    hashIndex=hashIndex@entry=802, v1=v1@entry=3081, v2=v2@entry=0, v3=0, v4=0, level=13) at catcache.cpp:1676
#6  0x0000564d91b78f14 in SearchCatCacheInternal (level=13, v4=0, v3=0, v2=0, v1=<optimized out>, nkeys=1, cache=0x7f0ee546e080)
    at catcache.cpp:1366
#7  SearchCatCache1 (cache=0x7f0ee546e080, v1=3081) at catcache.cpp:1247
#8  0x0000564d91ba44fc in SearchSysCache1 (cacheId=cacheId@entry=49, key1=<optimized out>) at syscache.cpp:976
#9  0x0000564d91b8b01d in RelationInitIndexAccessInfo (relation=relation@entry=0x7f0ee66d2b08, index_tuple=index_tuple@entry=0x0)
    at relcache.cpp:2715
#10 0x0000564d91b9adf2 in RelationBuildDescExtended (insertIt=<optimized out>, buildkey=<optimized out>, targetRelId=<optimized out>)
    at relcache.cpp:2348
#11 RelationBuildDesc (targetRelId=<optimized out>, insertIt=insertIt@entry=true, buildkey=buildkey@entry=true) at relcache.cpp:2199
#12 0x0000564d91b9e910 in RelationIdGetRelation (relationId=<optimized out>, relationId@entry=3081) at relcache.cpp:3333
#13 0x0000564d9254774a in relation_open (relationId=relationId@entry=3081, lockmode=lockmode@entry=1, bucketId=<optimized out>)
    at heapam.cpp:1472
#14 0x0000564d9257944f in index_open (relation_id=relation_id@entry=3081, lockmode=lockmode@entry=1, bucket_id=<optimized out>)
    at indexam.cpp:166
#15 0x0000564d92577ca3 in systable_beginscan (heap_relation=heap_relation@entry=0x7f0ee66c01d0, index_id=index_id@entry=3081, 
    index_ok=index_ok@entry=true, snapshot=snapshot@entry=0x0, nkeys=nkeys@entry=1, key=key@entry=0x7f0ee8238d20) at genam.cpp:336
#16 0x0000564d91fede56 in get_extension_oid (extname=extname@entry=0x564d97750d5e "age", missing_ok=missing_ok@entry=true) at extension.cpp:119
#17 0x0000564d921fba65 in InitAGESqlPluginHookIfNeeded () at postgres.cpp:979
#18 0x0000564d921fe268 in LoadSqlPlugin () at postgres.cpp:7743
#19 0x0000564d92208ca3 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x7f0f37645c68, dbname=<optimized out>, username=<optimized out>)
    at postgres.cpp:8222
#20 0x0000564d9215aca5 in BackendRun (port=port@entry=0x7f0ee823aa40) at postmaster.cpp:9589
#21 0x0000564d9218a388 in GaussDbThreadMain<(knl_thread_role)1> (arg=0x7f0f4c6494e0) at postmaster.cpp:11954
#22 0x0000564d9215ad35 in InternalThreadFunc (args=<optimized out>) at postmaster.cpp:14775
#23 0x00007f10809d4fed in ?? () from /usr/lib64/libpthread.so.0
#24 0x00007f108090818f in clone () from /usr/lib64/libc.so.6


王恬静 added collaborator 申正
lixin removed collaborator 申正
lixin added collaborator pengjiong
pengjiong removed collaborator pengjiong
pengjiong added collaborator douxin
lixin edited the description
