CompTIA Data+ DA0-001
- 1.0 Data Concepts and Environments
- 2.0 Data Mining
- 3.0 Data Analysis
- 4.0 Visualization
- 4.1 Given a scenario, translate business requirements to form a report
- 4.2 Given a scenario, use appropriate design components for reports and dashboards
- 4.3 Given a scenario, use appropriate methods for dashboard development
- 4.4 Given a scenario, apply the appropriate type of visualization
- 4.5 Compare and contrast types of reports
- 5.0 Data Governance, Quality, and Controls
- CompTIA Data+ (DA0-001) Acronym List
- Hardware
- Software
- Reference
1.0 Data Concepts and Environments
1.1 Identify basic concepts of data schemas and dimensions
- Databases
- Relational
- Non-relational
- Data mart/data warehousing/data lake
- Online transactional processing (OLTP)
- Online analytical processing (OLAP)
- Schema concepts
- Snowflake
- Star
- Slowly changing dimensions
- Keep current information
- Keep historical and current information
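The two slowly-changing-dimension strategies above can be sketched in Python. This is an illustrative example with a hypothetical customer dimension table (field names are invented): Type 1 overwrites in place and keeps only current information, while Type 2 expires the old row and appends a new one, keeping history.

```python
from datetime import date

# Hypothetical customer dimension: one row per version of a customer record
dim = [{"customer_id": 1, "city": "Austin",
        "valid_from": date(2023, 1, 1), "valid_to": None, "current": True}]

def scd_type1(rows, customer_id, new_city):
    """Type 1: overwrite in place -- only current information is kept."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["city"] = new_city

def scd_type2(rows, customer_id, new_city, change_date):
    """Type 2: expire the old row and append a new one -- history is kept."""
    for row in rows:
        if row["customer_id"] == customer_id and row["current"]:
            row["valid_to"] = change_date
            row["current"] = False
    rows.append({"customer_id": customer_id, "city": new_city,
                 "valid_from": change_date, "valid_to": None, "current": True})

scd_type2(dim, 1, "Denver", date(2024, 6, 1))
current = [r for r in dim if r["current"]]   # only the Denver row is current
```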
1.2 Compare and contrast different data types
- Date
- Numeric
- Alphanumeric
- Currency
- Text
- Discrete vs. continuous
- Categorical/dimension
- Images
- Audio
- Video
1.3 Compare and contrast common data structures and file formats
- Structures
- Structured
- Defined rows/columns
- Key value pairs
- Unstructured
- Undefined fields
- Machine data
- Data file formats
- Text/Flat file
- Tab delimited
- Comma delimited
- JavaScript Object Notation (JSON)
- Extensible Markup Language (XML)
- Hypertext Markup Language (HTML)
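As an illustrative sketch, the same small record (sample data, not from the exam) can be read from three of the file formats above using only the Python standard library. Note that delimited text carries no type information, so CSV values arrive as strings:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same record expressed in three formats (sample data)
csv_text = "name,score\nAda,90\n"
json_text = '{"name": "Ada", "score": 90}'
xml_text = "<record><name>Ada</name><score>90</score></record>"

# Comma-delimited flat file: every value is a string
csv_row = next(csv.DictReader(io.StringIO(csv_text)))

# JSON: types survive (90 is an int)
json_obj = json.loads(json_text)

# XML: values are element text, retrieved by tag name
xml_name = ET.fromstring(xml_text).findtext("name")
```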
2.0 Data Mining
2.1 Explain data acquisition concepts
- Integration
- Extract, transform, load (ETL)
- Extract, load, transform (ELT)
- Delta load
- Application programming interfaces (APIs)
- Data collection methods
- Web scraping
- Public databases
- Application programming interface (API)/web services
- Survey
- Sampling
- Observation
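A minimal ETL sketch of the integration flow above, using invented sample data and Python's standard library: extract from a delimited source (standing in for a file or API response), transform by casting types and converting units, and load into a SQLite target.

```python
import csv
import io
import sqlite3

# Extract: pretend this CSV text came from a source file or API response
raw = "id,amount\n1,10.5\n2,3.0\n"

# Transform: cast types and convert dollars to cents
rows = [(int(r["id"]), float(r["amount"]) * 100)
        for r in csv.DictReader(io.StringIO(raw))]

# Load: insert into the target store
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, cents REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
total = conn.execute("SELECT SUM(cents) FROM sales").fetchone()[0]
```

In an ELT flow the same steps run in a different order: the raw rows are loaded first and the cast/convert step is done with SQL inside the target system.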
2.2 Identify common reasons for cleansing and profiling datasets
- Duplicate data
- Redundant data
- Missing values
- Invalid data
- Non-parametric data
- Data outliers
- Specification mismatch
- Data type validation
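A profiling pass can surface several of the issues above at once. The following Python sketch uses invented survey records; the 1.5-standard-deviation outlier threshold is an arbitrary choice for this tiny sample, not a standard rule:

```python
import statistics

# Hypothetical records: a duplicate ID, a missing value, and an outlier
records = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 2, "age": 36},
           {"id": 3, "age": 35}, {"id": 4, "age": 212}, {"id": 5, "age": 33}]

# Duplicate data: IDs seen more than once
seen, duplicate_ids = set(), []
for r in records:
    if r["id"] in seen:
        duplicate_ids.append(r["id"])
    seen.add(r["id"])

# Missing values
missing_ids = [r["id"] for r in records if r["age"] is None]

# Data outliers: flag values far from the mean
ages = [r["age"] for r in records if r["age"] is not None]
mean, sd = statistics.mean(ages), statistics.stdev(ages)
outliers = [a for a in ages if abs(a - mean) > 1.5 * sd]
```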
2.3 Given a scenario, execute data manipulation techniques
- Recoding data
- Numeric
- Categorical
- Derived variables
- Data merge
- Data blending
- Concatenation
- Data append
- Imputation
- Reduction/aggregation
- Transpose
- Normalize data
- Parsing/string manipulation
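Several of the techniques above can be sketched in a few lines of Python. The data and field names here are invented for illustration:

```python
import statistics

rows = [{"name": "Ada", "score": 90, "region": "N"},
        {"name": "Bo",  "score": None, "region": "S"},
        {"name": "Cy",  "score": 70, "region": "N"}]

# Recoding data (categorical -> numeric) via a lookup table
region_code = {"N": 1, "S": 2}
for r in rows:
    r["region_num"] = region_code[r["region"]]

# Imputation: fill missing scores with the mean of observed values
observed = [r["score"] for r in rows if r["score"] is not None]
fill = statistics.mean(observed)
for r in rows:
    if r["score"] is None:
        r["score"] = fill

# Derived variable: a pass/fail flag computed from an existing field
for r in rows:
    r["passed"] = r["score"] >= 75

# Transpose: rows of records -> one list per column
columns = {k: [r[k] for r in rows] for k in rows[0]}
```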
2.4 Explain common techniques for data manipulation and query optimization
- Data manipulation
- Filtering
- Sorting
- Date functions
- Logical functions
- Aggregate functions
- System functions
- Query optimization
- Parametrization
- Indexing
- Temporary table in the query set
- Subset of records
- Execution plan
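Parameterization, indexing, and execution plans can be demonstrated with Python's built-in SQLite driver. The table and index names below are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, f"cust{i % 100}", i * 1.5) for i in range(1000)])

# Parameterization: placeholders let the engine reuse the compiled statement
# and keep untrusted input out of the SQL text
target = "cust7"
rows = conn.execute("SELECT id FROM orders WHERE customer = ?",
                    (target,)).fetchall()

# Indexing: an index on the filtered column avoids a full table scan
conn.execute("CREATE INDEX idx_customer ON orders(customer)")

# Execution plan: confirm the query now searches via the index
plan = conn.execute("EXPLAIN QUERY PLAN SELECT id FROM orders WHERE customer = ?",
                    (target,)).fetchall()
uses_index = any("idx_customer" in str(step) for step in plan)
```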
3.0 Data Analysis
3.1 Given a scenario, apply the appropriate descriptive statistical methods
- Measures of central tendency
- Mean
- Median
- Mode
- Measures of dispersion
- Range
- Max
- Min
- Distribution
- Variance
- Standard deviation
- Frequencies/percentages
- Percent change
- Percent difference
- Confidence intervals
- Range
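The measures above map directly onto Python's `statistics` module. A small worked example with sample data:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # 5.0
median = statistics.median(data)        # 4.5 (midpoint of the two middle values)
mode = statistics.mode(data)            # 4 (most frequent value)
data_range = max(data) - min(data)      # 7
variance = statistics.pvariance(data)   # population variance: 4.0
stdev = statistics.pstdev(data)         # population standard deviation: 2.0

# Percent change between an old and a new measurement
old, new = 80, 100
pct_change = (new - old) / old * 100    # 25.0
```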
3.2 Explain the purpose of inferential statistical methods
- t-tests
- Z-score
- p-values
- Chi-squared
- Hypothesis testing
- Type I error
- Type II error
- Simple linear regression
- Correlation
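Two of the quantities above (z-score and Pearson correlation) can be computed from their definitions with the standard library. The samples are invented; `y` is built as an exact multiple of `x` so the correlation comes out at 1.0:

```python
import statistics

# Z-score: how many standard deviations a point sits from the sample mean
sample = [98, 102, 101, 99, 100, 104, 96]
mu, sigma = statistics.mean(sample), statistics.pstdev(sample)
z = (sample[0] - mu) / sigma            # negative: 98 is below the mean

# Pearson correlation r = cov(x, y) / (sd(x) * sd(y))
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]                    # perfectly linear in x
mx, my = statistics.mean(x), statistics.mean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
r = cov / (statistics.pstdev(x) * statistics.pstdev(y))
```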
3.3 Summarize types of analysis and key analysis techniques
- Process to determine type of analysis
- Review/refine business questions
- Determine data needs and sources to perform analysis
- Scoping/gap analysis
- Type of analysis
- Trend analysis
- Comparison of data over time
- Performance analysis
- Tracking measurements against defined goals
- Basic projections to achieve goals
- Exploratory data analysis
- Use of descriptive statistics to determine observations
- Link analysis
- Connection of data points or pathway
3.4 Identify common data analytics tools
(The intent of this objective is NOT to test specific vendor feature sets nor the purposes of the tools.)
- Structured Query Language (SQL)
- Python
- Microsoft Excel
- Rapid mining
- IBM Cognos
- IBM SPSS Modeler
- IBM SPSS
- SAS
- Tableau
- Power BI
- Qlik
- MicroStrategy
- BusinessObjects
- Apex (Salesforce: a strongly typed, object-oriented programming language that lets developers execute flow and transaction control statements on Salesforce servers in conjunction with API calls)
- Dataroma
- Domo
- AWS QuickSight
- Stata
- Minitab
4.0 Visualization
4.1 Given a scenario, translate business requirements to form a report
- Data content
- Filtering
- Views
- Data range
- Frequency
- Audience for report
- Distribution list
4.2 Given a scenario, use appropriate design components for reports and dashboards
- Report cover page
- Instructions
- Summary
- Observations and insights
- Design elements
- Color schemes
- Layout
- Font size and style
- Key chart elements
- Titles
- Labels
- Legends
- Corporate reporting standards/style guide
- Branding
- Color codes
- Logos/trademarks
- Watermark
- Documentation elements
- Version number
- Reference data sources
- Reference dates
- Report run date
- Data refresh date
- Frequently asked questions (FAQs)
- Appendix
4.3 Given a scenario, use appropriate methods for dashboard development
- Dashboard considerations
- Data sources and attributes
- Field definitions
- Dimensions
- Measures
- Continuous/live data feed vs. static data
- Consumer types
- C-level executives
- Management
- External vendors/stakeholders
- General public
- Technical experts
- Development process
- Mockup/wireframe
- Layout/presentation
- Flow/navigation
- Data story planning
- Approval granted
- Develop dashboard
- Deploy to production
- Delivery considerations
- Subscription
- Scheduled delivery
- Interactive (drill down/roll up)
- Saved searches
- Filtering
- Static
- Web interface
- Dashboard optimization
- Access permissions
4.4 Given a scenario, apply the appropriate type of visualization
- Line chart
- Pie chart
- Bubble chart
- Scatter plot
- Bar chart
- Histogram
- Waterfall
- Heat map
- Geographic map
- Tree map
- Stacked chart
- Infographic
- Word cloud
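In practice these charts come from the tools in 3.4 (Tableau, Power BI, etc.), but the binning a histogram performs can be sketched directly. This toy example groups invented values into fixed-width bins and renders a text chart:

```python
data = [1, 2, 2, 3, 3, 3, 4, 4, 7, 8]
bin_width = 2

# Count values per bin: each value belongs to the bin starting at its left edge
bins = {}
for v in data:
    low = (v // bin_width) * bin_width
    bins[low] = bins.get(low, 0) + 1

# Render one text bar per bin, e.g. " 2-3  #####"
chart = ["{:>2}-{:<2} {}".format(low, low + bin_width - 1, "#" * count)
         for low, count in sorted(bins.items())]
```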
4.5 Compare and contrast types of reports
- Static vs. dynamic reports
- Point-in-time
- Real time
- Ad-hoc/one-time report
- Self-service/on demand
- Recurring reports
- Compliance reports (e.g., financial, health, and safety)
- Risk and regulatory reports
- Operational reports [e.g., performance, key performance indicators (KPIs)]
- Tactical/research report
5.0 Data Governance, Quality, and Controls
5.1 Summarize important data governance concepts
- Access requirements
- Role-based
- User group-based
- Data use agreements
- Release approvals
- Security requirements
- Data encryption
- Data transmission
- De-identify data/data masking
- Storage environment requirements
- Shared drive vs. cloud based vs. local storage
- Use requirements
- Acceptable use policy
- Data processing
- Data deletion
- Data retention
- Entity relationship requirements
- Record link restrictions
- Data constraints
- Cardinality
- Data classification
- Personally identifiable information (PII)
- Personal health information (PHI)
- Payment card industry (PCI)
- Jurisdiction requirements
- Impact of industry and governmental regulations
- Data breach reporting
- Escalate to appropriate authority
5.2 Given a scenario, apply data quality control concepts
- Circumstances to check for quality
- Data acquisition/data source
- Data transformation/intrahops
- Pass through
- Conversion
- Data manipulation
- Final product (report/dashboard, etc.)
- Automated validation
- Data field to data type validation
- Number of data points
- Data quality dimensions
- Data consistency
- Data accuracy
- Data completeness
- Data integrity
- Data attribute limitations
- Data quality rule and metrics
- Conformity
- Non-conformity
- Rows passed
- Rows failed
- Methods to validate quality
- Cross-validation
- Sample/spot check
- Reasonable expectations
- Data profiling
- Data audits
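The rule-and-metric idea above (conformity checks yielding rows passed/rows failed) can be sketched as a small automated validation pass. The rules and records here are invented for illustration:

```python
# Validation rules: field name -> predicate the value must satisfy
rules = {
    "id": lambda v: isinstance(v, int),                      # data type validation
    "email": lambda v: isinstance(v, str) and "@" in v,      # conformity check
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,   # attribute limits
}

rows = [{"id": 1, "email": "a@x.com", "age": 30},
        {"id": "2", "email": "b@x.com", "age": 40},   # id has the wrong type
        {"id": 3, "email": "c.x.com", "age": 200}]    # bad email and bad age

# A row passes only if every field satisfies its rule
passed = [r for r in rows if all(check(r[f]) for f, check in rules.items())]
failed = [r for r in rows if r not in passed]
```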
5.3 Explain master data management (MDM) concepts
- Processes
- Consolidation of multiple data fields
- Standardization of data field names
- Data dictionary
- Circumstances for MDM
- Mergers and acquisitions
- Compliance with policies and regulations
- Streamline data access
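The two MDM processes above (standardizing field names via a data dictionary, then consolidating records from multiple sources) can be sketched as follows. The source systems, field names, and precedence rule are all hypothetical:

```python
# Data dictionary mapping source-specific field names to one master name
data_dictionary = {"cust_nm": "customer_name", "CustomerName": "customer_name",
                   "cust_id": "customer_id", "CustID": "customer_id"}

crm = [{"CustID": 1, "CustomerName": "Ada Corp"}]         # hypothetical CRM export
erp = [{"cust_id": 1, "cust_nm": "Ada Corporation"}]      # hypothetical ERP export

def standardize(record):
    """Rename fields to their master names; unknown fields pass through."""
    return {data_dictionary.get(k, k): v for k, v in record.items()}

# Consolidate: one master record per customer_id; first source listed wins
master = {}
for source in (crm, erp):
    for rec in map(standardize, source):
        master.setdefault(rec["customer_id"], rec)
```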
CompTIA Data+ (DA0-001) Acronym List
ACRONYM | DEFINITION |
---|---|
ANOVA | Analysis of Variance |
API | Application Programming Interface |
AWS | Amazon Web Services |
BI | Business Intelligence |
CRM | Customer Relationship Management |
CSV | Comma-separated Values |
ELT | Extract, Load, Transform |
ETL | Extract, Transform, Load |
FAQs | Frequently Asked Questions |
GDPR | General Data Protection Regulation |
HTML | Hypertext Markup Language |
JSON | JavaScript Object Notation |
KPI | Key Performance Indicator |
MDM | Master Data Management |
NoSQL | Not Only Structured Query Language |
OLAP | Online Analytical Processing |
OLTP | Online Transaction Processing |
P&L | Profit and Loss |
PCI | Payment Card Industry |
PDF | Portable Document Format |
PHI | Personal Health Information |
PII | Personally Identifiable Information |
RDBMS | Relational Database Management System |
SDLC | Software Development Life Cycle |
SQL | Structured Query Language |
XML | Extensible Markup Language |
Hardware
- Desktop/laptop
- High processing power for large volume analyses
- Lower processing power for smaller volume analyses
- Internet access
- Cloud environment
Software
- SQL environment to run scripts (SQLite, Management Studio, etc.)
- Eclipse
- Anaconda
- R Studio
- Database modeling tool
- Microsoft Office Suite
- Visualization tools (Tableau, Power BI, etc.)
- Reporting tools
- Sample datasets (Kaggle)