A Collection of Terminology Databases and Corpora

2022/11/21 20:59
In translation practice, we often run into words and expressions that no dictionary covers. Terminology databases and corpora can help in these cases.


Online Terminology Databases

China Keywords:

http://www.china.org.cn/chinese/china_key_words/

Standardized Terminology Database for the Foreign Translation of Discourse with Chinese Characteristics:

http://210.72.20.108/index/index.jsp

China Core Vocabulary:

https://www.cnkeywords.net/index

Key Concepts in Chinese Thought and Culture:

https://www.chinesethought.cn/TermBase.aspx

UNTERM (United Nations Terminology Database):

https://unterm.un.org/UNTERM/portal/welcome

TermOnline:

https://www.termonline.cn/index

National Academy for Educational Research (NAER) terminology database:

https://terms.naer.edu.tw/download/

Chinese-English Dictionary of Ming Government Official Titles:

https://escholarship.org/uc/uci_libs

Chinese Standardized Terminology:

https://shuyu.cnki.net/#/

Grand Dictionnaire Terminologique:

https://gdt.oqlf.gouv.qc.ca/

TERMIUM:

https://www.btb.termiumplus.gc.ca/tpv2alpha/alpha-eng.html?lang=eng

LingoSail TermBox:

http://termbox.lingosail.com/

Microsoft terminology database:

https://www.microsoft.com/zh-cn/language

World Health Organization (WHO) terminology database:

https://www.who.int/home/cms-decommissioning

Electronic engineering glossary:

https://www.maximintegrated.com/cn/glossary/definitions.mvp/terms/all

FreeMdict, 100 GB of offline dictionaries for download:

https://downloads.freemdict.com/

OneDict (patent terminology database):

http://www.onedict.com/

Chinese national standard "Logistics Terms" (《物流术语》):

https://logistics.nankai.edu.cn/_upload/article/76/83/1c5da71e4b8e9838ae0843c8cb3d/3a1617ed-acfb-4504-9e18-c079e98e6154.pdf

Winter Olympics terminology lookup site:

owgt.lingosail.com/

Music terminology lookup:

http://dictionary.t-classical.com/

European Union Language and terminology:

https://eur-lex.europa.eu/summary/glossary.html?locale=en

IATE (Interactive Terminology for Europe), the EU's terminology database:

https://iate.europa.eu/home

Hong Kong legal glossary (Chinese-English):

https://www.elegislation.gov.hk/glossary/chi

Magic Search:

https://magicsearch.org/

Linguee:

https://www.linguee.com/

The Free Dictionary:

https://www.thefreedictionary.com/

Glosbe:

https://glosbe.com/

Online Corpora

China

BCC Corpus:

http://bcc.blcu.edu.cn/

Corpus Online:

http://www.aihanyu.org/cncorpus/index.aspx

Center for Chinese Linguistics, Peking University (CCL):

ccl.pku.edu.cn

BFSU Corpus Linguistics (Beijing Foreign Studies University):

bfsu-corpus.org/

Balanced Corpus of Modern Chinese (Sinica Corpus):

https://www.sinica.edu.tw/SinicaCorpus/

Classical Chinese corpus / tagged corpus of early Mandarin Chinese / electronic archive of Chinese classical texts:

https://www.sinica.edu.tw/ch

Treebank database (Sinica Treebank):

http://treebank.sinica.edu.tw/

Sou Wen Jie Zi (搜文解字):

http://words.sinica.edu.tw/

Media Language Corpus (MLC):

https://ling.cuc.edu.cn/RawPub/

Shared corpus resources from the Information Retrieval Lab, Harbin Institute of Technology:

http://ir.hit.edu.cn/demo/ltp/Sharing_Plan.htm

LiVaC, a synchronous corpus of Chinese across Chinese-speaking regions:

http://www.livac.org/index.php?lang=sc

Chinese Linguistic Data Consortium (Chinese LDC):

http://www.chineseldc.org/

Academia Sinica Tagged Corpus of Early Mandarin Chinese:

http://lingcorpus.iis.sinica.edu.tw/early/

Chinese-English parallel corpus of Dream of the Red Chamber (《红楼梦》):

http://corpus.usx.edu.cn/hongloumeng/images/shiyongshuoming.htm

International

BNC (British National Corpus):

http://www.natcorp.ox.ac.uk/

BOE (the Bank of English, the Collins English corpus):

http://www.collinslanguage.com/language-resources/dictionary-datasets/

ANC (American National Corpus):

https://www.anc.org/

Lancaster Corpus of Mandarin Chinese (LCMC):

http://ota.oucs.ox.ac.uk/scripts/download.php?otaid=2474

Sketch Engine multilingual corpora:

https://www.sketchengine.eu/

BASE (British Academic Spoken English Corpus):

https://warwick.ac.uk/fac/soc/al-archive-deleted/research/base

Lextutor:

http://www.lextutor.ca/

MyMemory:

https://mymemory.translated.net/

TAUS:

https://datamarketplace.taus.net/

TTMEM:

https://www.ttmem.com/terminology/download-translation-memory/

TinyTM:

http://tinytm.sourceforge.net/

DGT Translation Memory:

https://magmatranslation.com/en/free-translation-memory/

European Parliament Proceedings Parallel Corpus 1996-2011:

https://statmt.org/europarl/

University of Maryland Parallel Corpus Project: The Bible:

http://users.umiacs.umd.edu/~resnik/parallel/bible.html

Aligned Hansards of the 36th Parliament of Canada:

https://www.isi.edu/research_groups/nlg/home

EU Publication Offices:

https://op.europa.eu/en/web/general-publications/publications

Wikimedia Downloads:

https://dumps.wikimedia.org/backup-index.html

United Nations Parallel Corpus:

https://conferences.unite.un.org/UNCorpus/

European language pairs:

https://www.statmt.org/wmt13/translation-task.html#download

parallel corpus search:

http://paralela.clarin-pl.eu/

UM-Corpus: A Large English-Chinese Parallel Corpus (Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory):

http://nlp2ct.cis.umac.mo/um-corpus/um-corpus-license.html

Clarin Parallel corpora:

https://www.clarin.eu/resource-families/parallel-corpora

The PKU 863 Chinese-English Parallel Corpus:

https://www.lancaster.ac.uk/fass/projects/corpus/863parallel/

BYU corpora: 

https://corpus.byu.edu/


Other sub-corpora


A collection of translated literature:

https://opus.nlpl.eu/Books.php

A collection of EU Translation Memories provided by the JRC:

https://opus.nlpl.eu/DGT.php

Documents from the Catalan Government:

https://opus.nlpl.eu/DOGC.php

European Central Bank corpus:

https://opus.nlpl.eu/ECB.php

European Medicines Agency documents:

https://opus.nlpl.eu/EMEA.php

The EU bookshop corpus:

https://opus.nlpl.eu/EUbookshop.php

The European constitution/European Parliament Proceedings:

https://opus.nlpl.eu/EUconst.php

French-English Giga-Word Corpus:

https://opus.nlpl.eu/giga-fren.php

GNOME localization files:

https://opus.nlpl.eu/GNOME.php

News stories in various languages:

https://opus.nlpl.eu/GlobalVoices.php

English WaC corpus:

https://opus.nlpl.eu/hrenWaC.php

JRC-Acquis- legislative EU texts:

https://opus.nlpl.eu/JRC-Acquis.php

KDE4 – KDE4 localization files (v.2):

https://opus.nlpl.eu/KDE4.php

KDEdoc – the KDE manual corpus:

https://opus.nlpl.eu/KDEdoc.php

MBS – Belgisch Staatsblad corpus:

https://opus.nlpl.eu/MBS.php

memat – Xhosa/English parallel data:

https://opus.nlpl.eu/memat.php

MontenegrinSubs – Montenegrin movie subtitles:

https://opus.nlpl.eu/MontenegrinSubs.php

MultiUN – Translated UN documents:

https://opus.nlpl.eu/MultiUN.php

News Commentary, v9.0, v9.1:

https://opus.nlpl.eu/News-Commentary-v11.php

OfisPublik – Breton – French parallel texts:

https://opus.nlpl.eu/OfisPublik.php

OO – the OpenOffice.org corpus:

https://opus.nlpl.eu/OpenOffice-v2.php

OpenOffice.org 3 corpus:

https://opus.nlpl.eu/OpenOffice-v3.php

OpenSubtitles – the opensubtitles.org corpus:

https://opus.nlpl.eu/OpenSubtitles-v1.php

OpenSubtitles2016 – snapshot from 2016:

https://opus.nlpl.eu/OpenSubtitles-v2016.php

OpenSubtitles2018 – new complete version:

http://opus.nlpl.eu/OpenSubtitles-v2018.php

ParaCrawl corpus:

https://opus.nlpl.eu/ParaCrawl.php

ParCor – A Parallel Pronoun-Coreference Corpus/PHP – the PHP manual corpus:

http://opus.nlpl.eu/ParCor

Regeringsförklaringen – a tiny example corpus:

http://opus.nlpl.eu/RF.php

SETIMES – A parallel corpus of the Balkan languages:

http://opus.nlpl.eu/SETIMES.php

SPC – Stockholm Parallel Corpora:

https://opus.nlpl.eu/SPC.php

Tatoeba – A DB of translated sentences:

http://opus.nlpl.eu/Tatoeba.php

TedTalks hr-en:

http://opus.nlpl.eu/TedTalks.php

TED Talks 2013:

http://opus.nlpl.eu/TED2013.php

Tanzil – A collection of Quran translations:

http://opus.nlpl.eu/Tanzil.php

TEP – The Tehran English-Persian subtitle corpus:

http://opus.nlpl.eu/TEP.php

Ubuntu – Ubuntu localization files:

http://opus.nlpl.eu/Ubuntu.php

UN – Translated UN documents:

http://opus.nlpl.eu/UN.php

Wikipedia – translated sentences from Wikipedia:

http://opus.nlpl.eu/Wikipedia.php

WikiSource – (small en-sv sample only):

http://opus.nlpl.eu/WikiSource.php

WMT News Test Sets:

http://opus.nlpl.eu/WMT-News.php

The Xhosa – English Navy corpus:

http://opus.nlpl.eu/XhosaNavy.php