| Neural Network Databases and Learning Data |
|
| Repositories and Databases |
| CMU AI Repository - Neural Networks |
|
This directory contains neural networks, connectionist systems, and neural systems
software and related materials.
|
| DELVE |
|
Delve is a standardised environment designed to evaluate the performance of methods that learn
relationships based primarily on empirical data. Delve makes it possible for users to compare
their learning methods with other methods on many datasets. The Delve learning methods and
evaluation procedures are well documented, such that meaningful comparisons can be made.
|
| EconData |
|
Several hundred thousand economic time series, produced by a number of U.S. Government agencies
and distributed in a variety of formats and media, can be found here. They have been put into
a standard, highly efficient, easy-to-use form for personal computers and made publicly available
through this site. These series include national income and product accounts (NIPA), labor statistics,
price indices, current business indicators, industrial production, information on states and regions,
as well as international data.
|
| FuNet - Neural |
|
Artificial neural network repository at FuNet.
|
| National Space Science Data Center |
|
The National Space Science Data Center serves as the permanent archive for NASA space science
mission data. "Space science" means astronomy and astrophysics, solar and space plasma physics,
and planetary and lunar science. As the permanent archive, NSSDC teams with NASA's discipline-specific
space science "active archives" which provide access to data to researchers and, in some cases,
to the general public.
|
|
Collection of various documents, software, data, etc. related to Neural Networks.
|
| Netlib Repository |
|
Netlib is a collection of mathematical software, papers, and databases.
|
| Neuroprose |
|
Old archive of neural network papers collected during the period 1990-2000. A mirror can also be
found here.
|
| NIST Technical Databases |
|
NIST provides well-documented numeric data to scientists and engineers for use in technical
problem-solving, research, and development. These recommended values are based on data which have
been extracted from the world's literature, assessed for reliability, and then evaluated to select
the preferred values. These data activities are conducted by scientists at NIST and in university
data centers.
|
| RISE |
|
RISE is a distributed repository of online information sources that are used for the empirical analysis
of learning algorithms that generate extraction patterns. The sources included in this repository are
provided by people from the information extraction (IE) and wrapper generation (WG) communities. Both
communities use machine learning algorithms to generate extraction patterns for online information sources.
|
| Reinforcement Learning Repository |
|
The purpose of this web site is to provide a centralized resource for research on Reinforcement Learning
(RL), which is currently an actively researched topic in artificial intelligence. This site contains
resources on both RL research and applications to areas such as robotics and industrial problems.
Resources available include technical publications, sample testbeds, implementations of various algorithms,
online simulation packages, workshop information, and discussion forums for a variety of research areas
within RL.
|
| GANN |
|
WWW repository of resources on Evolutionary Design of Neural Architectures (EDNA).
|
| Solar Data Services |
|
Data pertaining to solar activity and the upper atmosphere. Beyond the obvious considerations
of heat and light, some examples of these direct and indirect solar influences are the effects
on short-wave radio communications, navigation, use of satellites for communication and navigation,
hazards to humans and instruments in space, electrical power transmission, geomagnetic prospecting,
gas pipeline monitoring, and possibly weather and human and animal behaviour.
|
| StatLib |
|
StatLib is a system for distributing statistical software, datasets, and information by electronic mail,
FTP and WWW. StatLib started out as an e-mail service and some of the organization still reflects that
heritage.
|
| Time Series Data Library |
|
This is a collection of about 800 time series, drawn from many different fields.
The time series may be freely copied and used, provided this source is clearly acknowledged.
|
| UCI Machine Learning Repository |
|
This is a repository of databases, domain theories and data generators that are used by the machine
learning community for the empirical analysis of machine learning algorithms.
|
| WIPO Automated Categorization Datasets |
|
WIPO automated categorization datasets provides information about the datasets' contents, how to get
access, how to subscribe to the discussion list, and links to various relevant background sources
of information.
|
|
| Specific Task Data Sets |
| 20 Newsgroups |
|
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned
(nearly) evenly across 20 different newsgroups. To the best of my knowledge, it was originally collected
by Ken Lang, probably for his paper "Newsweeder: Learning to filter netnews", though he does not explicitly
mention this collection.
|
| Document Understanding Database |
|
There are five concepts, expressed as predicates, to be learned.
They concern five logical components that it is possible to
identify in a sample of business letters, namely sender, receiver,
logotype, reference number and date.
|
| DMEF |
|
Four individual data sets, each containing customer buying history for about 100,000 customers of
nationally known catalog and non-profit database marketing businesses are available through DMEF
to approved academic researchers for use within academic situations. Corporate names are anonymous
and customer names and addresses have been removed, but the business type is indicated. ZIP codes
have been retained (if possible) to provide a potential link to Census ZIP level demographics.
|
| Faces |
|
Face recognition using neural networks. 32 images of each of 20 students in the class were taken
with a variety of head positions and facial expressions. These images were then used to train and
test neural networks to recognize individual people, and to recognize different face poses.
|
| KDD Cup Data
2000,
2001,
2002,
2003,
2004
|
|
Three real-world datasets are available.
|
| Mesh Design |
|
Training set for learning on a Finite Element Mesh Design problem. This is a complete set of
the training examples. Ten different FE mesh models have been used as a source of examples.
|
| Natural Language Learning Data |
|
This directory contains data for natural-language learning experiments in a form suitable for
Inductive Logic Programming (ILP) systems.
|
| NIST Special Database 4 |
|
The NIST database of fingerprint images contains 2000 8-bit gray scale fingerprint image pairs.
Each image is 512-by-512 pixels with 32 rows of white space at the bottom and classified using
one of the five classes. The database is evenly distributed over each of the five classifications
with 400 fingerprint pairs from each class.
|
| PKDD'99 Discovery Challenge Datasets |
|
PKDD'99 Discovery Challenge conference data. Data is divided into 2 groups: financial and medical.
|
| Sensor Data of a Mobile Robot |
|
A set of data sets, where each data set is represented in first order logic. Valid restrictions of
all data sets are: facts can be linked using the argument of type TIME, and there are never two
different facts concerning the same sensor and the same point in time. Each data set corresponds
to learning disjoint concepts at one level. The levels are organized in a hierarchy.
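
The restriction on sensor and time can be checked mechanically. The following is a minimal sketch, assuming each fact has been flattened into a hypothetical `Fact(predicate, sensor, time)` record; the record layout, the predicate names, and the sensor names are illustrative assumptions, not the data set's actual format.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Fact(NamedTuple):
    # Hypothetical flattened form of one first-order fact.
    predicate: str
    sensor: str
    time: int


def find_conflicts(facts: Iterable[Fact]) -> Dict[Tuple[str, int], Set[Fact]]:
    """Group facts by (sensor, time); return the groups with more than one distinct fact."""
    by_key: Dict[Tuple[str, int], Set[Fact]] = defaultdict(set)
    for fact in facts:
        by_key[(fact.sensor, fact.time)].add(fact)
    return {key: group for key, group in by_key.items() if len(group) > 1}


# Illustrative facts only; names are made up for this sketch.
valid = [
    Fact("measures_near", "s1", 10),
    Fact("measures_far", "s2", 10),
    Fact("measures_far", "s1", 11),
]
invalid = valid + [Fact("measures_far", "s1", 10)]

print(find_conflicts(valid))          # no (sensor, time) key is shared
print(sorted(find_conflicts(invalid)))  # the shared key for sensor s1 at time 10
```

An empty result means the data set satisfies the restriction; exact duplicates of the same fact are not counted as conflicts, since the restriction concerns two different facts.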
|
| Stage Transitions Data Set |
|
The work on reverse engineering a human controlling a simulation of an F-16
aircraft required the definition of a flight plan (sequence of flight
stages). In that work a controller was semi-automatically induced from
behavioural traces using Quinlan's C4.5. For each stage and for each aircraft
control a tree was induced. When using the induced controller as an auto-pilot
the values for the aircraft controls are obtained by interpreting the
corresponding decision tree for that control and for that flight stage. The
transitions from stage to stage were made by means of hand-coded
rules.
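
The controller structure described here (one induced tree per stage and per control, plus hand-coded stage transitions) might be sketched as follows. Everything concrete in this sketch is an assumption made for illustration: the stage names, the control names, the state variables, the thresholds, and the use of plain functions as stand-ins for C4.5 decision trees. None of it is taken from the data set.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, float]
Tree = Callable[[State], float]

# One stand-in "tree" per (stage, control) pair; all values are hypothetical.
TREES: Dict[Tuple[str, str], Tree] = {
    ("takeoff", "elevator"): lambda s: 0.3 if s["airspeed"] >= 100.0 else 0.0,
    ("takeoff", "throttle"): lambda s: 1.0,
    ("climb", "elevator"): lambda s: 0.1 if s["altitude"] < 5000.0 else 0.0,
    ("climb", "throttle"): lambda s: 0.8,
}
CONTROLS = ("elevator", "throttle")


def next_stage(stage: str, state: State) -> str:
    """Hand-coded transition rule (hypothetical threshold)."""
    if stage == "takeoff" and state["altitude"] >= 2000.0:
        return "climb"
    return stage


def autopilot_step(stage: str, state: State) -> Tuple[str, Dict[str, float]]:
    """Apply the transition rule, then evaluate the tree for each control in that stage."""
    stage = next_stage(stage, state)
    outputs = {control: TREES[(stage, control)](state) for control in CONTROLS}
    return stage, outputs


print(autopilot_step("takeoff", {"airspeed": 120.0, "altitude": 500.0}))
print(autopilot_step("takeoff", {"airspeed": 150.0, "altitude": 2500.0}))
```

The stage transitions themselves are the part that was hand-coded rather than learned, which is what this data set targets.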
|
| Text Categorization |
|
Test collections are standard data sets used to measure the effectiveness
of information retrieval systems. Most were originally developed to support
research on IR, but practitioners often find them useful as well.
|
| TIC |
|
The Insurance Company (TIC) Benchmark. This datamining benchmark dataset is ideally
suited for testing your datamining algorithms or using it as a case for datamining
lab sessions. The data was supplied by Sentient Machine Research. The main question is:
Can you predict who would be interested in buying a caravan insurance policy and give
an explanation why?
|
| The 4 Universities Data Set |
|
This data set contains WWW-pages collected from computer science departments
of various universities in January 1997 by the World Wide Knowledge Base project
of the CMU text learning group. The 8,282 pages were manually classified into 7 categories.
|
| The Penn Treebank Project |
|
The Penn Treebank Project annotates naturally occurring text for linguistic structure.
|
| TREC |
|
Text REtrieval Conference data. The organizations supplying data have
either provided it free of charge, or for a fee. The data is copyrighted, and
also has commercial value as data, so one must be careful to use it only for
research purposes rather than for its informational uses.
|
| Web Index Recommendation Data |
|
Data for multi-instance learning based web index recommendation. The 113 web index pages are labeled
by 9 volunteers according to their interests. Therefore there are 9 data sets. If the volunteer is
interested in at least one linked page of the index, then the web index page is labeled as positive.
Otherwise the index page is labeled as negative. There is no label for the linked pages.
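
The labeling rule above is the standard multi-instance assumption: a bag (index page) is positive if at least one of its instances (linked pages) is positive. A minimal sketch of that rule follows, assuming hypothetical per-link interest flags; the data set itself provides no labels for linked pages, so these flags exist only to illustrate how the page-level labels arise.

```python
from typing import Dict, List, Sequence


def label_index_page(linked_page_interest: Sequence[bool]) -> int:
    """Return 1 (positive) if at least one linked page is of interest, else 0 (negative)."""
    return 1 if any(linked_page_interest) else 0


# Hypothetical index pages with made-up per-link interest flags for one volunteer.
pages: Dict[str, List[bool]] = {
    "index_a": [False, True, False],
    "index_b": [False, False],
}

labels = {name: label_index_page(flags) for name, flags in pages.items()}
print(labels)
```

A learner working with this data sees only the page-level labels and must infer which linked pages are responsible for a positive label.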
|
| Web Page Data |
|
Our dataset consists of 10,000 web documents classified into 10 equally-sized categories each
containing 1,000 web documents. The categories were chosen by first selecting 4 distinct themes from
the real world, namely Banking and Finance, Programming Languages, Science and Sport. The second
step was to select 2 or 3 categories from each theme, giving a total of 10 categories. This method
of category selection allows the user to select a wide range of categorization experiments with
varying degrees of categorization difficulty.
|
| WordSimilarity-353 |
|
The WordSimilarity-353 Test Collection contains two sets of English word pairs along with
human-assigned similarity judgements. The collection can be used to train and/or test computer
algorithms implementing semantic similarity measures (i.e., algorithms that numerically estimate
similarity of natural language words).
|
|
| See Also... |
|
DatGen,
DBCat,
CoCoMac,
CoCoDat,
NeuroDatabase,
FlyBrain,
IBVD,
INDP,
NeuronDB,
SynapseWEB,
WormBase,
ZFIN,
PDB
|
|