Points to be considered before choosing a Data Repository

  1. What is the type of data: structured, semi-structured, or unstructured?
  2. The schema of the data
  3. What are its performance requirements?
  4. Whether you're working with data at rest or streaming data.
  5. Data Encryption needs
  6. Volume of data and whether you need a Big Data system.
  7. Storage Requirements
  8. Frequency of data access: is the data updated frequently, or kept in the vault for a long time?
  9. Standards set by organizations on the databases and database repositories that can be used.
  10. Capacity the data repository is required to handle.
  11. Types of access: access at short intervals, or long-running queries.
  12. Purpose of Repository: Transactional, Analytical, Archival, Data Warehousing.
  13. Compatibility of the Data repository with the existing ecosystem of programming languages, tools, and processes.
  14. Security features of the data repository
  15. Scalability from a long-term perspective.
  16. Nature of Application
  17. Volume of data being ingested
  18. Depending on the use case, a relational DB may not be a good fit:
      a. For ingesting large volumes of data: document stores such as MongoDB, or wide-column stores such as Cassandra.
      b. For a product recommendation engine or a network of people on social media: graph databases such as Neo4j, or graph frameworks such as Apache TinkerPop.
      c. For mining data for analytics: Hadoop with MapReduce may be a good fit.
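
The use-case examples at the end of the list above can be sketched as a simple lookup. This is only an illustration: the key names and the function `suggest_store_families` are hypothetical, and falling back to a relational database when no special case applies is an assumption, not a rule from these notes.

```python
from typing import Dict, List

# Hypothetical use-case keys mapped to the store families named in the list.
STORE_FAMILIES: Dict[str, List[str]] = {
    "high_volume_ingest": ["document store", "wide-column store"],
    "recommendations_or_social_graph": ["graph database"],
    "analytics_mining": ["Hadoop with MapReduce"],
}


def suggest_store_families(use_case: str) -> List[str]:
    """Return candidate repository families for a use case.

    Assumption: anything not listed falls back to a relational database.
    """
    return STORE_FAMILIES.get(use_case, ["relational database"])


if __name__ == "__main__":
    for case in ("high_volume_ingest", "analytics_mining", "order_processing"):
        print(case, "->", suggest_store_families(case))
```

A real decision would weigh the other points in the list as well (schema, security, scalability, existing ecosystem), so a lookup like this is at most a starting point for discussion.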

Note:

  1. Very few organizations use only one data repository.
  2. We have a preferred enterprise relational database, an open-source relational DB, and an unstructured data source.
  3. It is important to think about the skills you have or want to foster.
  4. Cost of the various solutions.
  5. The hosting platform is an important consideration: AWS RDS, Amazon Aurora, Google's relational offerings.