Source:
Nankai Business Review (南開管理評論)
Authors:
Yan (Anthea) Zhang
Rice University
Jason D. Shaw
University of Minnesota
Translation review:
Zhou Xuan, Editorial Office, Nankai Business Review
Original source:
Academy of Management Journal, 2012, Vol. 55, No. 1: 8–12.
English original:
FROM THE EDITORS
PUBLISHING IN AMJ—PART 5: CRAFTING THE METHODS AND RESULTS
Once the arduous, but exciting, work of selecting an intriguing and appropriate topic, designing and executing a sound data collection, crafting a compelling “hook,” and developing a solid theory is finished, it is tempting to sit back, relax, and cruise through the Methods and Results. It seems straightforward, and perhaps a little mundane, to report to the readers (1) how and why the data were obtained; (2) how the data were analyzed and what was found. Indeed, it is unlikely that many readers of AMJ have waited with bated breath for an entertaining narrative in this installment of the Publishing in AMJ editorial series. If we fall short of being compelling, therefore, we hope to at least be informative.
As authors ourselves, we have, admittedly, succumbed to the temptation of relaxing our concentration when it is time to write these sections. We have heard colleagues say that they pass off these sections to junior members of their research teams to “get their feet wet” in manuscript crafting, as though these sections were of less importance than the opening, hypothesis development, and Discussion sections. Perhaps this is so. But as members of the current editorial team for the past two years, we have come face-to-face with the reality that the Methods and Results sections, if not the most critical, often play a major role in how reviewers evaluate a manuscript. Instead of providing a clear, detailed account of the data collection procedures and findings, these sections often leave reviewers perplexed and raise more questions than they answer about the research procedures and findings that the authors used. In contrast, an effective presentation can have a crucial impact on the extent to which authors can convince their audiences that their theoretical arguments (or parts of them) are supported. High-quality Methods and Results sections also send positive signals about the conscientiousness of the author(s). Knowing that they were careful and rigorous in their preparation of these sections may make a difference for reviewers debating whether to recommend a rejection or a revision request.
To better understand the common concerns raised by reviewers, we evaluated each of our decision letters for rejected manuscripts to this point in our term. We found several issues arose much more frequently in rejected manuscripts than they did in manuscripts for which revisions were requested. The results of our evaluation, if not surprising, revealed a remarkably consistent set of major concerns for both sections, which we summarize as “the three C’s”: completeness, clarity, and credibility.
THE METHODS
Completeness
In the review of our decision letters, perhaps the most common theme related to Methods sections was that the authors failed to provide a complete description of the ways they obtained the data, the operationalizations of the constructs that they used, and the types of analyses that they conducted. When authors have collected their data—a primary data collection—it is important for them to explain in detail not only what happened, but why they made certain decisions. A good example is found in Bommer, Dierdorff, and Rubin’s (2007) study of group-level citizenship behaviors and job performance. We learn in their Methods how the participants were contacted (i.e., on site, by the study’s first author), how the data were obtained (i.e., in an on-site training room, from groups of 20–30 employees), what kinds of encouragement for participation were used (i.e., letters from both the company president and the researchers), and who reported the information for different constructs in the model (i.e., employees, supervisors, and managers of the supervisors). In addition, these authors reported other relevant pieces of information about their data collection. For example, they noted that employees and their supervisors were never scheduled to complete their questionnaires in the same room together. In addition, they reported a system of “checks and balances” to make sure supervisors reported performance for all of their direct reports. Providing these details, in addition to a full description of the characteristics of the analysis sample at the individual and team levels, allows reviewers to evaluate the strengths and weaknesses of a research design. Although it is reasonable to highlight the strengths of one’s research, reporting sufficient details on the strengths and potential weaknesses of the data collection is preferred over an approach that conceals important details, because certain compromises or flaws can also yield advantages. 
Consider the example of data collected with a snowball sampling approach in two waves separated by a few months. A disadvantage of this approach would likely be that the sample matched over the two waves will be smaller than the sample resulting if the researchers only contact wave 1 participants to participate in wave 2. But, this approach also has certain advantages. In particular, large numbers of one-wave participants (i.e., those that participated either in the first wave or the second wave) can be used to address response bias and representativeness issues straightforwardly.
In many other cases, the data for a study were obtained from archival sources. Here a researcher may not have access to all the nitty-gritty details of the data collection procedures, but completeness in reporting is no less important. Most, if not all, archival data sets come with technical reports or usage manuals that provide a good deal of detail. Armed with these, the researcher can attempt to replicate the detail of the data collection procedures and measures that is found in primary data collections. For a good example, using the National Longitudinal Survey and Youth Cohort (NLSY79), see Lee, Gerhart, Weller, and Trevor (2008). For other archival data collections, authors construct the dataset themselves, perhaps by coding corporate filings, media accounts, or building variables from other sources. In these cases, a complete description of how they identified the sample, how many observations were lost for different reasons, how they conducted the coding, and what judgment calls were made are necessary.
Regardless of the type of data set a researcher has used, the goals in this section are the same. First, authors should disclose the hows, whats, and whys of the research procedures. Including an Appendix with a full list of measures (and items, where appropriate), for example, is often a nice touch. Second, completeness allows readers to evaluate the advantages and disadvantages of the approach taken, which on balance, creates a more positive impression of the study. Third, a primary goal of the Methods section should be to provide sufficient information that someone could replicate the study and get the same results, if they used exactly the same procedure and data. After reading the Methods section, readers should have confidence that they could replicate the primary data collection or compile the same archival database that the authors are reporting.
Clarity
Far too often, authors fail to clearly explain what they have done. Although there are many potential examples, a typical, very common, problem concerns descriptions of measures. Reviewers are often concerned with language such as “we adapted items” or “we used items from several sources.” Indeed, not reporting how measures were adapted was the modal issue related to measurement in the evaluation of our decision letters. Ideally, authors can avoid these problems simply by using the full, validated measures of constructs when they are available. When this is not possible, it is imperative to provide a justification for the modifications and, ideally, to provide additional, empirical validation of the altered measures. If this information is not initially included, reviewers will invariably ask for it; providing the information up front improves the chances of a revision request.
Another very common clarity issue concerns the justification for variable coding. Coding decisions are made in nearly every quantitative study, but are perhaps most frequently seen in research involving archival data sets, experimental designs, and assignment of numerical codes based on qualitative responses. For example, Ferrier (2001) used structured content analysis to code news headlines for measures of competitive attacks. In an excellent example of clarity, Ferrier described in an organized fashion and with straightforward language how the research team made the coding decisions for each dimension and how these decisions resulted in operationalizations that matched the constitutive definitions of the competitive attack dimensions.
Credibility
Authors can do several uncomplicated things to enhance perceptions of credibility in their Methods sections. First, it is important to address why a particular sample was chosen. Reviewers often question why a particular sample was used, especially when it is not immediately obvious why the phenomenon of interest is important in the setting used. For example, in Tangirala and Ramanujam’s study of voice, personal control, and organizational identification, the authors opened the Methods by describing why they chose to sample front-line hospital nurses to test their hypotheses, noting (1) “they are well positioned to observe early signs of unsafe conditions in patient care and bring them to the attention of the hospital” and (2) “there is a growing recognition that the willingness of nurses to speak up about problems in care delivery is critical for improving patient safety and reducing avoidable medical errors (such as administration of the wrong drug), a leading cause of patient injury and death in the United States” (2008: 1,193). Second, it is always good practice to summarize the conceptual definition of a construct before describing the measure used for it. This not only makes it easier for readers—they don’t have to flip back and forth in the paper to find the constitutive definitions—but when done well will lessen reader concerns about whether the theory a paper presents matches the tests that were conducted. Third, it is always important to explain why a particular operationalization was used. For example, organizational performance has numerous dimensions. Some may be relevant to the hypotheses at hand, and others are not. We have often seen authors “surprise” reviewers by introducing certain dimensions with no justification. In cases in which alternative measures are available, authors should report what other measures they considered and why they were not chosen.
If alternative measures are available in the data set, it is often a good idea to report the findings obtained when those alternative measures were used. Fourth, it is crucial to justify model specification and data analysis approaches. We have often seen authors include control variables without sufficiently justifying why they should be controlled for. For some types of data, multiple possible methods for analysis exist. Authors need to justify why one method rather than the other(s) was used. Panel data, for example, can be analyzed using fixed-effect models or random-effect models. Multiple event history analysis methods can analyze survival data. Each method has its specific assumption(s). In some cases, additional analysis is warranted to make the choice (for example, doing a Hausman test to choose between fixed- and random-effect models for panel data).
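As a minimal numerical sketch of the choice the paragraph above mentions, the Hausman statistic can be computed directly from the fixed- and random-effects estimates. The coefficient vectors and covariance matrices below are invented for illustration, not drawn from any real study:

```python
import numpy as np
from scipy import stats

# Hypothetical estimates from a fixed-effects and a random-effects model
b_fe = np.array([0.52, -0.31])                      # FE coefficients
b_re = np.array([0.48, -0.25])                      # RE coefficients
V_fe = np.array([[0.010, 0.001], [0.001, 0.012]])   # FE covariance matrix
V_re = np.array([[0.008, 0.001], [0.001, 0.009]])   # RE covariance matrix

# Hausman statistic: (b_fe - b_re)' [V_fe - V_re]^{-1} (b_fe - b_re)
diff = b_fe - b_re
H = diff @ np.linalg.inv(V_fe - V_re) @ diff
p = stats.chi2.sf(H, df=len(diff))  # chi-squared with k degrees of freedom
```

A significant statistic indicates that the random-effects assumption (regressors uncorrelated with the unit effects) is untenable, so the fixed-effects model would be reported; here the point is only that the choice can be justified with a test rather than asserted.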
THE RESULTS
Completeness
Effectively writing a Results section is not an easy task, especially when one’s theoretical framework and/or research design is complex, making completeness all the more important. For starters, including a table of means, standard deviations, and correlations is a piece of “low-hanging fruit.” The information in this table may not have directly tested hypotheses, yet it paints an overall picture of the data, which is critical for judging the credibility of findings. For example, high correlations between variables often raise concerns about multicollinearity. A large standard deviation relative to the mean of a variable can raise concerns about outliers. Indeed it is a good practice to check data ranges and outliers in the process of data analyses so as to avoid having significant findings mainly driven by a few outliers. Distributional properties of variables (such as means and minimum and maximum values) reported in a table are informative by themselves. For example, in a study on CEO succession, means of variables that measured different types of CEO successions can tell the distribution of new CEOs in the sample recruited from different sources. These distributional properties describe the phenomenon of CEO successions and have important practical implications.
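The descriptive table recommended above takes only a few lines to produce. The variable names and simulated values in this sketch are invented purely for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 200
# Simulated study variables (hypothetical names and parameters)
df = pd.DataFrame({
    "voice": rng.normal(3.5, 0.8, n),
    "personal_control": rng.normal(3.0, 0.7, n),
    "performance": rng.normal(4.0, 0.6, n),
})

# Means, standard deviations, and ranges for each variable...
desc = df.agg(["mean", "std", "min", "max"]).T.round(2)
# ...and the zero-order correlation matrix
corr = df.corr().round(2)
print(desc)
print(corr)
```

Scanning `desc` for implausible minima/maxima and `corr` for very high correlations is exactly the outlier and multicollinearity check the paragraph describes.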
In reporting results, it is important to specify the unit of analysis, sample size, and dependent variable used in each model. This is especially crucial when such information varies across models. Take Arthaud-Day, Certo, Dalton, and Dalton (2006) as an example. These authors examined executive and director turnover following corporate financial restatements. They had four dependent variables: CEO turnover, CFO turnover, outside director turnover, and audit committee member turnover. In models of CEO and CFO turnover, because they were able to identify the month of the turnover, they constructed the data using “CEO/CFO” as the unit of analysis and used a Cox model to examine the timing of the executive turnover. The sample size of the model on CEO turnover was 485, and the sample size of the model on CFO turnover was 407. In comparison, in examining turnover of outside directors and audit committee members, because Arthaud-Day and her colleagues were unable to determine the month in which outside directors and audit committee members left office, they constructed the data using director/audit committee member-year as the unit of analysis and used logistic regression to examine the likelihood of their turnover. The sample size of the model on outside director turnover was 2,668, and the sample size for audit committee member turnover was 1,327. The take-away here is that careful descriptions such as those Arthaud-Day and colleagues provided help readers calibrate their interpretations of results and prevent reviewers from raising questions about clarification.
Clarity
The purpose of a Results section is to answer the research questions that have been posed and provide empirical evidence for the hypotheses (or note that evidence is lacking). We often see, however, that authors do not relate their findings to the study’s hypotheses. We also see that authors report the results in the Results section, but discuss their linkage with hypotheses in the Discussion section or, conversely, begin to discuss the implications of the findings in the Results prematurely, rather than doing this in the Discussion. In these cases, the authors fail to describe what the results indicate with respect to the focal topic of the study in a clear manner. To avoid this problem, it helps to summarize each hypothesis before reporting the related results. Try this format: “Hypothesis X suggests that . . . We find that . . . in model . . . in Table . . . Thus, Hypothesis X is (or isn’t) supported.” Although this format may sound mechanical or even boring, it is a very effective way to clearly report results (see also Bem, 1987). We encourage and welcome authors to experiment with novel and clear ways to present results. We also suggest that authors report the results associated with their hypotheses in order, beginning with the first hypothesis and continuing sequentially to the last one, unless some compelling reasons suggest that a different order is better.
In many studies, the results do not support all the hypotheses. Yet results that are not statistically significant and those with signs opposite to prediction are just as important as those that are supported. However, as one editor noted, “If the results are contrary to expectations, I find authors will often try to ‘sweep them under the rug.’” Of course, reviewers will catch this immediately. Needless to say, sometimes such results reflect inadequate theorizing (e.g., the hypotheses are wrong, or at least there are alternative arguments and predictions). Other times, however, unsupported results are great fodder for new, critical thinking in a Discussion section. The point is that all results—significant or not, supporting or opposite to hypotheses— need to be addressed directly and clearly.
It is also a good practice to reference variables across sections in the same order—for example, describe their measures in the Methods section, list them in tables, and discuss results in the Results section all in the same order. Such consistency improves the clarity of exposition and helps readers to both follow the manuscript and find information easily. It also provides authors with a checklist so that they will remember to include relevant information (e.g., a variable included in the models is not mentioned in the Methods section and/or in the correlation matrix).
Credibility
Although every part of a paper plays an important role in helping or hurting its credibility (e.g., adequate theorizing and rigorous research design), there are some things authors can do in their Results sections to enhance the perceived credibility of findings. First, it is crucial to demonstrate to readers why one’s interpretations of results are correct. For example, a negative coefficient for an interaction term may suggest that the positive effect of the predictor became weaker, or disappeared, or even became negative as the value of the moderator increased. Plotting a significant interaction effect helps one visualize the finding and thus demonstrate whether the finding is consistent with the intended hypothesis. Aiken and West (1991) provided some “golden rules” on how to plot interaction effects in regressions. Beyond these, determining whether the simple slopes are statistically significant is often important in assessing whether one’s results fully support hypotheses; techniques developed by Preacher, Curran, and Bauer (2006) are helpful in these calculations.
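To make the simple-slope check mentioned above concrete, here is a small sketch using invented regression estimates; the coefficients, variances, and covariance below merely stand in for whatever one's own fitted model produces:

```python
import numpy as np

# Hypothetical estimates from y = b0 + b1*x + b2*z + b3*x*z
b1, b3 = 0.40, -0.15             # predictor and interaction coefficients
var_b1, var_b3 = 0.004, 0.002    # squared standard errors of b1 and b3
cov_b13 = 0.0005                 # covariance of b1 and b3
sd_z = 1.0                       # standard deviation of the moderator z

# Simple slope of x at the moderator +/- 1 SD (Aiken & West, 1991)
slopes = {}
for z in (-sd_z, sd_z):
    slope = b1 + b3 * z
    se = np.sqrt(var_b1 + z**2 * var_b3 + 2 * z * cov_b13)
    slopes[z] = (slope, slope / se)  # slope and its t-ratio
```

Plotting the two predicted lines and reporting whether each simple slope differs from zero shows readers directly whether the pattern matches the hypothesized form of the interaction, rather than leaving them to infer it from the sign of `b3`.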
Second, if alternative measurements, methods, and/or model specifications could be used for a study, but authors only report results using one possible choice, readers may have the impression that the authors “cherry-picked” findings that were consistent with the hypotheses. Supplementary analyses and robustness checks can address these concerns. For example, Tsai and Ghoshal (1998) examined the value creation role of a business unit’s position in intrafirm networks. Although they proposed the hypotheses at the individual business unit level, they generated several measures of business units’ attributes from data at the dyadic level. These steps raised some concerns about level of analysis and the reliability of the results. To address these concerns, they also analyzed data at the dyadic level and obtained consistent results.
Third, even if a result is statistically significant, readers may still ask, So what? A statistically significant effect is not necessarily a practically important effect. Authors typically discuss the practical implications of a study in their Discussion; they can, however, conduct and report additional analyses in Results to demonstrate the practical relevance of findings. A good example is found in Barnett and King’s (2008) study of spillover harm. These authors stated the following Hypothesis 1: “An error at one firm harms other firms in the same industry” (Barnett & King, 2008: 1,153). In addition to reporting the statistical significance of the predictor, the authors provided information to communicate the average scale of such spillovers. They reported that “following an accident that injured an average number of employees (3.5), a chemical firm with operations in the same industry as that in which an accident occurred could expect to lose 0.15 percent of its stock price” and that “after an accident that caused the death of an employee, the firm could expect to lose an additional 0.83 percent” (Barnett & King, 2008: 1,160). In other cases, authors may want to discuss the implications of small effect sizes, perhaps by noting how difficult it is to explain variance in a given dependent variable or, in the case of an experiment, noting that a significant effect was found even though the manipulation of the independent variable was quite minimal (Prentice & Miller, 1992).
Conclusions
Crafting Methods and Results sections may not sound exciting or challenging. As a result, authors tend to pay less attention in writing them. Sometimes these sections are delegated to the junior members of research teams. However, in our experience as editors, we find that these sections often play a major, if not a critical, role in reviewers’ evaluations of a manuscript. We urge authors to take greater care in crafting these sections. The three-C rule—completeness, clarity, and credibility—is one recipe to follow in that regard.
REFERENCES
Aiken, L. S., & West, S. G. 1991. Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage.
Arthaud-Day, M. L., Certo, S. T., Dalton, C. M., & Dalton, D. R. 2006. A changing of the guard: Executive and director turnover following corporate financial restatements. Academy of Management Journal, 49: 1119–1136.
Barnett, M. L., & King, A. A. 2008. Good fences make good neighbors: A longitudinal analysis of an industry self-regulatory institution. Academy of Management Journal, 51: 1150–1170.
Bem, D. J. 1987. Writing the empirical journal article. In M. P. Zanna & J. M. Darley (Eds.), The compleat academic: A practical guide for the beginning social scientist: 171–201. New York: Random House.
Bommer, W. H., Dierdorff, E. C., & Rubin, R. S. 2007. Does prevalence mitigate relevance? The moderating effect of group-level OCB on employee performance. Academy of Management Journal, 50: 1481–1494.
Ferrier, W. J. 2001. Navigating the competitive landscape: The drivers and consequences of competitive aggressiveness. Academy of Management Journal, 44: 858–877.
Lee, T. H., Gerhart, B., Weller, I., & Trevor, C. O. 2008. Understanding voluntary turnover: Path-specific job satisfaction effects and the importance of unsolicited job offers. Academy of Management Journal, 51: 651–671.
Preacher, K. J., Curran, P. J., & Bauer, D. J. 2006. Computational tools for probing interaction effects in multiple linear regression, multilevel modeling, and latent curve analysis. Journal of Educational and Behavioral Statistics, 31: 437–448.
Prentice, D. A., & Miller, D. T. 1992. When small effects are impressive. Psychological Bulletin, 112: 160–164.
Tangirala, S., & Ramanujam, R. 2008. Exploring nonlinearity in employee voice: The effects of personal control and organizational identification. Academy of Management Journal, 51: 1189–1203.
Tsai, W., & Ghoshal, S. 1998. Social capital and value creation: The role of intrafirm networks. Academy of Management Journal, 41: 464–474.