Points to be considered before choosing a Data Repository
- What is the type of data: structured, semi-structured, or unstructured?
- The schema of the data
- What are its performance requirements?
- Whether you're working with data at rest or streaming data.
- Data Encryption needs
- Volume of data and whether you need a Big Data system.
- Storage Requirements
- Frequency of data access: frequent updates, or kept in the vault for a long time with little access.
- Standards set by the organization on which databases and data repositories can be used.
- Capacity the data repository is required to handle.
- Types of access: frequent access at short intervals, or long-running queries.
- Purpose of Repository: Transactional, Analytical, Archival, Data Warehousing.
- Compatibility of the Data repository with the existing ecosystem of programming languages, tools, and processes.
- Security features of the data repository
- Scalability from a long-term perspective.
- Nature of Application
- Volume of data being ingested
- Depending on the use case, a relational database may not be a good fit.
- For ingesting large volumes of data: document stores such as MongoDB, or wide-column stores such as Cassandra.
- For a product recommendation engine or a network of people on social media: graph databases such as Neo4j, or graph frameworks such as Apache TinkerPop.
- For mining data for analytics: Hadoop with MapReduce may be a good fit.
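The use-case guidance above can be sketched as a small lookup. This is only an illustrative sketch: the `Workload` enum, its member names, and `suggest_repository` are hypothetical names introduced here, and a real decision would also weigh the other points in the checklist (cost, skills, security, scalability).

```python
from enum import Enum


class Workload(Enum):
    """Hypothetical labels for the use cases named in the list above."""
    TRANSACTIONAL = "transactional"
    HIGH_VOLUME_INGEST = "high_volume_ingest"
    RELATIONSHIP_HEAVY = "relationship_heavy"  # recommendations, social networks
    ANALYTICS_MINING = "analytics_mining"


# Mapping taken directly from the notes; it is a starting point, not a rule.
SUGGESTIONS = {
    Workload.HIGH_VOLUME_INGEST: "document store (e.g. MongoDB) or wide-column store (e.g. Cassandra)",
    Workload.RELATIONSHIP_HEAVY: "graph database (e.g. Neo4j)",
    Workload.ANALYTICS_MINING: "Hadoop with MapReduce",
}


def suggest_repository(workload: Workload) -> str:
    """Return a candidate repository type, defaulting to a relational database."""
    return SUGGESTIONS.get(workload, "relational database")


if __name__ == "__main__":
    for workload in Workload:
        print(f"{workload.value}: {suggest_repository(workload)}")
```

Defaulting to a relational database reflects the notes' framing that the alternatives apply only when a relational DB is not a good fit.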
Note:
- Very few organizations use one data repository
- For example, an organization may have a preferred enterprise relational database, an open-source relational DB, and an unstructured data source.
- It is important to think about the skills you have or want to foster.
- Cost of the various solutions.
- The hosting platform is an important consideration, for example AWS RDS, Amazon Aurora, or Google's relational offerings.