
    Datasets for Data Mining and Data Science




    See also

    Data repositories


    • : a collection of crawled Chinese news and blogs in JSON format.
    • , historical data of Macroeconomic Indicators and Market Data.
    • on github, curated by caesar0301.
    • , provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications.
    • .
    • , described in "Virtual screening of bioassay data" by Amanda Schierz (Journal of Cheminformatics), with 21 bioassay datasets (active/inactive compounds) available for download.
    • , anonymized clicks on gov links.
    • , pilot project with many government and geospatial datasets.
    • data repository.
    • at the Texas Advanced Computing Center, supporting data-centric science.
    • a: a home equity loans credit data set, mortgage loan level data set, Loss Given Default (LGD) data set and corporate ratings data set.
    • library.
    • , A Guide to Public Data, by Pete Warden, O'Reilly (Jan 2011).
    • , open government data from US, EU, Canada, CKAN, and more.
    • , publicly available data from UK (also .)
    • , central guide for education data resources including high-value data sets, data visualization tools, resources for the classroom, applications created from open data and more.
    • , visualize the world's economy, societies, nature, and industries, with 100 million time series from UN, World Bank, Eurostat and other important data providers.
    • , public data put to good use.
    • , the largest repository of standardized and structured statistical data, with over 25 billion data points, 4.3 billion datasets, and 400+ source databases.
    • , datasets for data geeks, find and share Machine Learning datasets.
    • , a clearinghouse of datasets available from the City & County of San Francisco, CA.
    • , a data mining tool that accesses and manipulates TheDataWeb, a collection of many on-line US Government datasets.
    • , Data for Evaluating Learning in Valid Experiments.
    • , thousands of economic time series, produced by a number of US Government agencies.
    • , discover and share cool data, connect with interesting people, and work together to solve problems faster.
    • , data from about 150 users, mostly senior management of Enron.
    • , contains open metadata on 20 million texts, images, videos and sounds gathered by Europeana - the trusted and comprehensive resource for European cultural heritage content.
    • , a comprehensive source of US statistics and more.
    • , implementations and datasets.
    • , a large catalog of financial data sets.
    • : The Global Data on Events, Location and Tone, described by the Guardian as "a big data history of life, the universe and everything."
    • , a gene expression/molecular abundance repository supporting MIAME-compliant data submissions, and a curated, online resource for gene expression data browsing, query and retrieval.
    • , geographical and spatial data.
    • , text from millions of books scanned by Google.
    • , financial data including stocks, futures, etc.
    • , comprehensive data on 10,000 UK companies randomly sampled from HitCompanies, updated automatically using AI/Machine Learning.
    • contains 44 million blog posts made between August 1st and October 1st, 2008.
    • , an open catalog and marketplace for data. You can share, sell, curate, and download data about anything and everything.
    • , includes financial data.
    • .
    • , with all data, tasks, and results.
    • , the Koblenz Network Collection, with large network datasets of all types in order to perform research in the area of network mining.
    • project, aimed at making data freely available to everyone.
    • , free access to data for editors and academics to mine stats on the retail industry.
    • , tracking 10 million global fashion searches a month, easily and freely accessible to academics.
    • , from MIT Whitehead Center for Genome Research.
    • , the data repository of the EU Pascal2 networks.
    • , provides access to market data.
    • , data, reports, statistical yearbooks, press releases, and more from about 70 websites, covering countries in Africa, Europe, Asia, and Latin America.
    • (NSSDC), NASA data sets from planetary exploration, space and solar physics, life sciences, astrophysics, and more.
    • , has many collections of graph and networks from social science, machine learning, scientific computing, and other areas.
    • , assesses the state of open data around the world.
    • , access to over 10,000 datasets including business, education, government, and fun.
    • , many sports databases, including Baseball, Football, Basketball, and Hockey.
    • , a database of genomics-related publications.
    • , a collaboratively curated portal to millions of financial and economic time-series datasets.
    • on housing, stock market, and more from his book Irrational Exuberance.
    • , stores raw and normalized data from microarray experiments.
    • , with Finance, Government, Machine Learning, Science, and other data.
    • , includes historical and current statistics on approximately 100,000 projects and the activities of over 1 million registered users at the project management website.
    • , with data for Soccer, NBA, NFL, NHL, and more.
    • , CMU Datasets Archive.
    • .
    • for large datasets used in machine learning and knowledge discovery research.
    • .
    • , offering datasets, papers, links, and code.
    • , UK/British postcodes with easting, northing, latitude, and longitude.
    • .
    • , structured data from the Common Crawl, the largest public web corpus.
    • , a (virtual) amalgamation of (mostly financial) data from many different sites, allowing users to merge data from different sources.
    • .
    • , with Language, Graph, Ratings, Advertising and Marketing, and Competition datasets.
    • , all the data and reviews of the 250 closest businesses for 30 universities for students and academics to explore and research.


    By subscribing you accept KDnuggets Privacy Policy
