Individual and Group Assignment: BUSS6002 Data Science in Business

Learning Outcomes Assessed
1. Identify types and sources of data, data quality issues and interact with data storage systems
2. Explain and apply foundational techniques of data analysis to business problems
3. Categorise business problems in order to select appropriate data analysis techniques and tools
4. Interpret and evaluate the outputs of data analysis techniques and tools
5. Evaluate data science capabilities of businesses and apply data science process models
7. Communicate effectively with technical and non-technical audiences

General Notices
1. All plots, analyses and technical work must be completed using Python.
2. The late penalty is 5% of the assignment mark per day, starting at the due date.
3. The assignment is marked anonymously.
4. Collusion and plagiarism are obvious to markers and will not be tolerated.

Academic Integrity
Please be aware of the University's academic integrity policies. Issues of academic integrity are taken seriously by the University and the BUSS6002 team. If you are suspected of dishonest behaviour, you will be referred to the Academic Integrity Office, who will process your case. This may result in delayed results, mark reduction, failure of the unit or expulsion.

Dishonest behaviour includes but is not limited to:
- using contract cheating services
- plagiarism, such as copying phrases, paragraphs, etc.
- not appropriately referencing

It is unfair to leave your group members feeling like they need to complete work you were supposed to complete. It is even more unfair (and dishonest) to accept marks for group work if you have let others do your work for you.
You are encouraged to refer to the full policy and guidelines on the University of Sydney website: https://www.sydney.edu.au/students/academic-integrity.html

You can access the Academic Honesty module on Canvas at any time: https://canvas.sydney.edu.au/courses/29833

Group Work
This assignment requires you to work with others in a group. You are expected to contribute equally.

2021/9/20 7:04 PM Individual and Group Assignment: BUSS6002 Data Science in Business https://canvas.sydney.edu.au/courses/36716/pages/individual-and-group-assignment 2/9

From the University's website: "Acting with academic honesty in group work also means that you have to commit fully to participating in group discussions and meeting agreed deadlines. It is unfair to leave your group members feeling like they need to complete work you were supposed to complete. It is even more unfair (and dishonest) to accept marks for group work if you have let others do your work for you."

Each group will be awarded a group mark per the marking criteria. Individual adjustments to grades may be made if there is a dispute in a group or if the quality or quantity of contributions made by individuals differs significantly. In such a case, the unit coordinator will seek meeting minutes and peer review reports from individuals within the group to decide on individual marks.

Our recommendations for effective group work:
1. Discuss problems with your group members as soon as you can.
2. Immediately notify the unit coordinator (buss6002.admin@sydney.edu.au) if you have problems that cannot be resolved; do not wait until after the submission date.
3. Work collaboratively on each section of the final report.
   1. Do not divide the group report into sections for each student to complete.
   2. Doing so might be disastrous if one group member does not pull their weight or is unable to complete the work due to changes in personal circumstances.
4. Set deadlines well ahead of the due date.
   1. Do not wait until the day before to finalise the report.
   2. This will lead to a poor-quality report or, in the worst case, an incomplete submission.
5. Keep accurate meeting minutes.
   1. In case of a dispute between group members, we will require some evidence of the issue.
   2. If you do not record meeting minutes or collect other forms of evidence, it will be difficult for us to take any action.
   3. We also suggest that you communicate with your group members in English.

Group Size and Forming Groups
Groups have been formed. Refer to this announcement for details: https://canvas.sydney.edu.au/courses/36716/discussion_topics/702139

Meeting Minutes
Each group is required to submit at least 3 meeting minutes as an appendix attached to the final report. A template will be provided for preparing meeting minutes. You may use the template provided or a template of your choice.

Peer Feedback
We may ask for peer review from each student within a group. The instructions about how to do this will be released later.

Allowed Packages
For the EDA component of this task you must use only the Python packages covered in tutorials, e.g. Pandas, NumPy, SciPy, Matplotlib, sqlite3, scikit-learn or statsmodels. Use of automatic EDA packages will result in a Fail grade for this component.

For modelling you may use more advanced models or feature engineering, and packages that provide such functionality. However, you are required to provide a technical explanation of the model and feature engineering.
If you are not confident in your understanding or explanation, then we recommend that you use simpler techniques.

Description
Please note that the scenario for this assignment is fictional.

Modelling and forecasting the volatility of financial asset returns is of great importance to financial institutions worldwide. Volatility forecasts are used in risk management, derivative pricing and hedging, market making, market timing, portfolio selection and many other financial activities (Engle and Patton, 2000) (https://sydney.primo.exlibrisgroup.com/permalink/61USYD_INST/2rsddf/cdi_repec_primary_tafquantf_v_3a1_3ay_3a2001_3ai_3a2_3ap_3a237_245_htm). Volatility is usually interpreted as a measure of risk: an increase in volatility points to an increase in the dispersion of returns, which in turn increases the investment risk of the underlying asset.

Recent research has shown that "realised variance (RV)" is an efficient measure of volatility for daily returns of a financial asset. The RV of day $t$ is given by the sum of squared intra-daily returns:

$$RV_t = \sum_{j=1}^{m} r_{t,j}^2,$$

where $r_{t,j}$ is the $j$-th intra-daily return on day $t$ (typically calculated using prices sampled at 5-minute intervals) and $m$ is the number of intra-daily returns in that day. Because $RV_t$ is always positive, it is often more convenient to model its natural logarithm, $\log RV_t$.

You are hired by a Sydney-based hedge fund, Alpha Capitals, to develop a predictive model for the log RV of daily stock returns of the Commonwealth Bank of Australia (CBA). The management team at Alpha Capitals has provided you with a dataset containing the observed values of log RV and a set of potential features constructed using intra-daily returns (based on prices sampled at 5-minute intervals) of CBA from January 7, 2003 to August 20, 2021. Note that you are not required to calculate the values of log RV yourself; they are provided in the dataset.

All the features are constructed using only past information to ensure that they are free of look-ahead bias. For example, a feature for 2021-08-20 only uses information up to and including 2021-08-19, and this is true for all of the provided features. Please pay attention to the time indices inside "[ ]" after each variable description in the data dictionary file; they tell you exactly which days are used in the calculation of the feature.

The features are organised into three groups:
1. Log RV based: features constructed using past values of the log RV itself. These features are designed to capture any serial dependence in log RV, i.e., past values of log RV might be predictive of future values of log RV.
2. Range based: features constructed using past values of the range of intra-daily returns, calculated by taking the difference between the highest and the lowest returns within a day. The range based features provide a measure of the dispersion or scale of past intra-daily returns: the higher the range, the more dispersed the returns are within a day.
3. Quantile range based: features constructed using past values of the quantile range of intra-daily returns. The quantile range is the difference between the 95th percentile and the 5th percentile of the returns within a day. It can be interpreted as an outlier-robust version of the range, since it only considers the middle 90% of the observations.

You are hired as a data analytics professional to help Alpha Capitals address two key questions:
1. Which features are predictive of future log RV of CBA at least one-day-ahead?
2. Can a model be built to forecast one-day-ahead the log RV of CBA?

Dataset
The management team of Alpha Capitals has provided you with two files:
1. cba_log_rv.sqlite (https://canvas.sydney.edu.au/courses/36716/files/19476840/download)
2. data_dictionary.txt (https://canvas.sydney.edu.au/courses/36716/files/19476846/download)

The first file is an SQLite file containing the daily observations of log RV and the features; the second file contains descriptions of the variables and the database structure.

Individual Task
Due Date: 23:59 September 27, 2021
Weight: 15%
Max Length: 500 words +/- 10% (excluding code)
Type: Individual Submission
Submission Type: Jupyter Notebook (.ipynb) via Canvas
Submit Here (https://canvas.sydney.edu.au/courses/36716/assignments/328384)

In this task you are required to produce a short EDA vignette that explores how the features from one of the following sets are predictive of future log RV at least one-day-ahead:
1. Log RV based features
2. Range based features
3. Quantile range based features

Your vignette will consist of a Jupyter Notebook, in which you will use Markdown cells to provide commentary on your EDA process. Due to the small word limit, we do not expect a large amount of detail. This vignette should be used as a way to share your preliminary findings with your group, and it is to be expanded on in the group report.

As a team, you will need to assign each team member to one of the three feature sets. Each team member is expected to work independently, as you will be marked separately for this component of the assignment. However, you are allowed to discuss your hypotheses and approaches, and to seek suggestions or feedback from your team members.

Suggested Structure
1. Heading: include which attribute you are exploring
2. Main body (approx. 400 words): the plots, tables, corresponding commentary and Python code that make up your exploratory analysis
3. Further work (approx. 100 words): a short note to your teammates about how your analysis could be further refined and used in your final report

Submission Items
These items are to be submitted via Canvas individually:
1. A Jupyter Notebook
   1. Filename format: BUSS6002_FEATURESETX_SID.ipynb
   2. If you chose feature-set 2 (range based features) and your SID is 55554444, then your filename would be BUSS6002_FEATURESET2_55554444.ipynb

Marking Criteria
View on Google Docs (https://docs.google.com/document/d/1G6G8AL80ujChRejJwN0n23N6OVrj6OkquyiCVoJXR44/edit?usp=sharing)

Rubric excerpt (only partially captured in this copy; FA = Fail, PS = Pass, CR = Credit, DI = Distinction, HD = High Distinction):

Context (row partially captured): descriptors range from "The context has not clearly informed the analysis." to "The context is carefully considered and has clearly informed the analysis."

Results Presentation (4 marks; LO2, LO6)
- FA: The figures or tables produced are inappropriate or do not support the analysis. The figures produced are of substandard quality. Grammar and spelling pose significant barriers to the reader's comprehension.
- PS: The figures or tables produced are mostly appropriate and support the analysis. The figures produced are of poor visual quality. Written language presents some barrier to the reader's comprehension.
- CR: The figures or tables produced are mostly appropriate and support the analysis. The figures produced are of average visual quality. The written analysis may contain some grammatical or spelling errors, but none that pose any significant barrier to reader comprehension.
- DI: The figures or tables produced are appropriate and support the analysis. The figures produced are of good visual quality and easy to read. Written language demonstrates outstanding precision, clarity, and concision.
- HD: The figures or tables produced are appropriate and support the analysis. The figures produced are of high visual quality, are well appointed and easy to read. Written language demonstrates outstanding precision, clarity, and concision.

Notebook Presentation (3 marks; descriptors cut off in this copy)
- FA: Features of the Jupyter Notebook are not used appropriately. The notebook is …
- PS: Features of the Jupyter Notebook are used mostly appropriately. The notebook is …
- CR: Features of the Jupyter Notebook have been used appropriately. The notebook is …
- DI: Features of the Jupyter Notebook have been used appropriately …
- HD: Features of the Jupyter Notebook have been used to pleasing effect. The notebook …

Group Task
Due Date: 23:59 November 5, 2021
Weight: 25%
Max Length: 15 pages, excluding references, meeting minutes and appendices
Type: Group Submission
Submission Type: PDF and Jupyter Notebook (.ipynb) via Canvas

This task requires you to produce a final report, which is the deliverable for the Alpha Capitals project. This report must address the two key questions. To increase the integrity of your findings, others must be able to replicate your work. Therefore, the management team of Alpha Capitals has asked that you provide all code used to produce the report. This code should be runnable without error, neat and documented.

Q1: Which features are predictive of future log RV of CBA at least one-day-ahead?
To answer this question, you should provide a cohesive EDA that combines and extends your individual analyses. You may use any combination of EDA techniques and traditional statistical testing.
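A minimal sketch of one possible starting point for Q1, assuming only the Python standard library (in the notebook, `pandas.read_sql_query` and `DataFrame.corr` would do the same job in fewer lines). It lists the tables in an SQLite database without guessing their names, reads selected columns, and computes the Pearson correlation between each feature and log RV on the same row. Because each provided feature for day t uses only information up to day t-1, a same-row correlation already measures one-day-ahead association; shifting the features again would measure a two-day-ahead link instead. The table name `demo`, the column names `log_rv`, `feat_a` and `feat_b`, and their values are hypothetical toy placeholders, not CBA data; the real names are listed in data_dictionary.txt.

```python
import math
import sqlite3


def list_tables(con):
    """Return the names of all tables in an open SQLite connection."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def read_columns(con, table, columns):
    """Read the given columns of `table` into a dict: column name -> list of values."""
    # Table and column names cannot be bound as SQL parameters, so quote them.
    col_sql = ", ".join(f'"{c}"' for c in columns)
    rows = con.execute(f'SELECT {col_sql} FROM "{table}"').fetchall()
    return {c: [row[i] for row in rows] for i, c in enumerate(columns)}


def pearson(x, y):
    """Pearson correlation of two sequences, skipping pairs containing None (SQL NULL).

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two complete pairs")
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def feature_correlations(data, target, features):
    """Correlate each feature with the target on the same row.

    Each provided feature for day t already uses only information up to day t-1,
    so no further shift is applied here.
    """
    return {f: pearson(data[f], data[target]) for f in features}


if __name__ == "__main__":
    # Toy in-memory database; the table and column names are hypothetical.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE demo (log_rv REAL, feat_a REAL, feat_b REAL)")
    con.executemany(
        "INSERT INTO demo VALUES (?, ?, ?)",
        [(2.0, 1.0, 4.0), (4.0, 2.0, 3.0), (6.0, 3.0, 2.0), (8.0, 4.0, 1.0)],
    )
    print(list_tables(con))  # ['demo']
    data = read_columns(con, "demo", ["log_rv", "feat_a", "feat_b"])
    print(feature_correlations(data, "log_rv", ["feat_a", "feat_b"]))
    # {'feat_a': 1.0, 'feat_b': -1.0}
```

For the real file, the connection would be opened with `sqlite3.connect("cba_log_rv.sqlite")`. Correlation only captures linear association and says nothing about statistical significance, so it works best as a screening step ahead of plots and any formal testing.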
The EDA might include non-graphical or graphical EDA, clustering or preliminary statistical testing. You must provide accompanying written explanations of your intentions and subsequent findings. Finally, you should provide a summary statement. It is up to you how you choose to answer this question. You will be assessed on the depth and quality of the written analysis/justification and on the clarity and presentation of results.

We have not covered how to do statistical testing in tutorials. However, we have equipped you with the skills and knowledge to research this on your own should you feel the need to take this approach.

Q2: Can a model be built to forecast one-day-ahead the log RV of CBA?
This question will require you to carefully consider the goal of the problem and the available data. You must provide a description of how one could build a model to answer Q2, including:
- a technical explanation of the model
- a discussion of how such a model could be evaluated
- a thorough discussion of the assumptions and shortcomings of the model
- a discussion of how to interpret the model generally, and an example interpretation of the prototype model

It is expected that you will have to make simplifications or assumptions to make the model work. It is important that you clearly explain your steps and provide justification for your work. You must include a proof of concept in your Jupyter Notebook. The prototype only needs to be a close facsimile of the model you describe in your report that implements the core components. In your report you may describe more advanced techniques which could be incorporated later.

Reflection and Future Work
Provide a reflection on how you have or have not followed the CRISP-DM process model during this project and which types of analytical capabilities you exercised.
You must:
- explain how the work you have done in each stage or question of the project aligns with the different phases of the process model
- identify where and when you were exercising specific analytic capabilities
- provide references where appropriate to justify your reflection

Suggested Structure
1. Q1
2. Q2
3. Reflection
4. References
5. Appendices
6. Meeting Minutes (DOWNLOAD TEMPLATE: https://canvas.sydney.edu.au/courses/36716/files/18423711/download?download_frd=1)

Submission Items
These items are to be submitted via Canvas. Only one group member needs to submit:
1. A report in PDF format
   1. Filename format: BUSS6002_REPORT_GROUPX.pdf
   2. If you are in Group 10, then your filename will be BUSS6002_REPORT_GROUP10.pdf
2. A Jupyter Notebook that contains all the code used in the development of your report
   1. The notebook will be used to assist our marking process and as proof of work.
   2. Filename format: BUSS6002_GROUPX.ipynb
   3. If you are in Group 42, then your filename will be BUSS6002_GROUP42.ipynb

Marking Criteria
View on Google Docs (https://docs.google.com/document/d/1_RryxHPTwuFP-MltwGqvekirs7AkEB4fg6mLG0JWW5w/edit?usp=sharing)
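For the Q2 proof of concept, one simple baseline (an illustrative choice, not a model prescribed by this brief) is a linear regression of log RV on a single provided feature, fitted on the earlier part of the sample and evaluated on the later part. Since each feature for day t already uses only information up to day t-1, regressing same-row log RV on the feature gives a genuine one-day-ahead forecast. The split is chronological rather than random, so no later observations leak into the fit, and the out-of-sample mean squared error is compared with a benchmark that always forecasts the training-sample mean. The function names, the 80/20 split and the toy series are assumptions for illustration; in the notebook, scikit-learn's `LinearRegression` or statsmodels' `OLS` would typically replace the hand-written fit and also report standard errors.

```python
def fit_ols_1d(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def mse(actual, predicted):
    """Mean squared forecast error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def chronological_evaluation(feature, target, train_frac=0.8):
    """Fit on the earliest rows and score one-day-ahead forecasts on the rest.

    Rows must already be sorted by date. The test window lies strictly after
    the training window, so no later observations are used to fit the model.
    """
    split = int(len(target) * train_frac)
    if not 1 < split < len(target):
        raise ValueError("train_frac leaves too few training or test rows")
    x_tr, y_tr = feature[:split], target[:split]
    x_te, y_te = feature[split:], target[split:]
    a, b = fit_ols_1d(x_tr, y_tr)
    model_pred = [a + b * xi for xi in x_te]
    # Benchmark: always forecast the training-sample mean of the target.
    mean_pred = [sum(y_tr) / len(y_tr)] * len(y_te)
    return {
        "intercept": a,
        "slope": b,
        "mse_model": mse(y_te, model_pred),
        "mse_mean_benchmark": mse(y_te, mean_pred),
    }


if __name__ == "__main__":
    # Toy series with an exact linear link (not CBA data); real inputs would be
    # one provided feature and log RV, read from the SQLite file in date order.
    x = [float(i) for i in range(10)]
    y = [1.0 + 2.0 * xi for xi in x]
    print(chronological_evaluation(x, y))
```

The fitted slope gives a direct interpretation: a one-unit increase in the feature is associated with a change of `slope` in the one-day-ahead forecast of log RV. Moving to several features means switching to multiple regression, but the same chronological evaluation against a benchmark still applies.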